RES FUTURAS

Adventures in AI

I’ve been watching the AI space for a while now. I was already convinced that AI was a game changer, but what it meant and when it was happening were up in the air. In 2024 it was obvious SEO was dead; a “paradigm shift” in search had happened. Not to mention that “content” was dead too, but that’s a topic for another day.

The next target was images and videos, and that actually affected a lot of jobs and small creative work, but hasn’t had a huge collective impact yet. A tool in the toolbox. There is a whole philosophical side to this conversation, but let’s put that aside for now too.

In late 2025, December to be precise, Opus 4.5 (generally paired with Claude Code) was released along with OpenAI’s next frontier model, GPT-5.1 (generally paired with Codex). To me this was the turning point for agentic AI: moving from AI that can tell you why birds caw instead of barking, to AI that can actually get shit done. If you haven’t tried an AI coding assistant powered by a frontier model, you won’t believe the hype. If you have, you’ll realize that if anything it’s under-hyped!

In 2026, agentic AI became real in a way that lets you practically one-shot a small “production-ready” project. That forced me to rethink everything; you can read my initial thoughts after spending about three weeks in this newfound world: Inflection Point for AI Coding

Unlimited Power

Wielding this superpower, I started building a unique set of projects, technically known as random shit. Because I inherently care about privacy and avoid storing private information in the cloud, my first task was to build some local-first AI projects. Here is a rough timeline.

  • Introducing AI capabilities to a toy Robot (PuppyPi)
    Learning ROS2 and deploying code to it. ROS2 is a specialized robotics framework (despite the name, it’s middleware rather than an actual OS); while it’s overkill for hobby projects, I enjoyed playing with it.

  • A local AI Chat Agent
    Effectively a knock-off of ChatGPT powered by Qwen3 (a Chinese open-weight model). <Insert a joke here about model being Chinese and project being a knock-off, but make it subtle>

  • TTS/STT
    This has ranged from real-time and batch STT processing to experimenting with various TTS/STT models. It’s pretty fascinating how good and lightweight some of these models are.

  • Hacking a bunch of open source projects
    I want to explore this topic more and will write about it later.

Building a RAG

As part of the local AI agent, I’ve added features like long-term memory, tool/function calling, and MCP support.
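The tool/function-calling part is conceptually simple: the model emits a structured call, the agent executes it, and the result goes back into the conversation. A minimal sketch of the dispatch side, with hypothetical tool names and no real model in the loop:

```python
import json

# Toy tool registry. In practice the model decides which tool to call and
# emits a JSON payload; the tool names and schemas here are made up.
TOOLS = {
    "get_time": lambda args: "2026-01-01T12:00:00Z",
    "add": lambda args: str(args["a"] + args["b"]),
}

def handle_tool_call(raw: str) -> str:
    """Parse a model-emitted tool call like {"name": "add", "args": {...}}."""
    call = json.loads(raw)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return f"error: unknown tool {call['name']}"
    return tool(call.get("args", {}))

# The returned string would be appended to the context as a "tool" message.
print(handle_tool_call('{"name": "add", "args": {"a": 2, "b": 3}}'))  # 5
```

The hard part isn’t the dispatch itself; it’s deciding when to call tools and how to inject results back without blowing up the context.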

While working on this I realized that my local agent is an idiot, and that RAG cannot be just a simple vector database.

Pipelines and Orchestration

RAG needs a RAG pipeline, which can be something like this:

ingestion → normalization → chunking → embedding → indexing → retrieval → reranking
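To make the middle of that pipeline concrete (chunking, embedding, retrieval), here is a toy version that uses a bag-of-words counter as a stand-in for real embeddings; everything here is illustrative, not a production recipe:

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Naive fixed-size chunking; real pipelines split on document structure."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag of words. A real pipeline calls a model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2):
    q = embed(query)
    scored = sorted(index, key=lambda e: cosine(q, e[1]), reverse=True)
    return [text for text, _ in scored[:k]]

docs = ["ROS2 is a robotics framework", "Qwen3 is an open-weight model"]
index = [(c, embed(c)) for d in docs for c in chunk(d)]
print(retrieve("which model is open weight?", index, k=1))
```

Swap the `embed` stub for an actual embedding model and the list for a vector store, add reranking on top, and you have the skeleton of the pipeline above; getting each stage to not degrade quality is where the real work is.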

That’s a lot of complexity to deliver very mediocre results. I used to complain about how dumb Gemini is: it has all my data and is practically unable to even search it properly. I have changed my mind; this is a big fucking problem, and solving it well at scale is genuinely hard.

Challenges of general purpose AI

A production-quality general-purpose LLM product like ChatGPT or Gemini does a lot more than simply connect a model to a chat window.

It generally has an FSM (Finite State Machine) and many moving pieces. Here is an over-simplified version of it.

Intake → Router → (Planner || chunking) → Executor (<Tools>) → Verifier → Presenter 

Even if you implement only the most obvious Planner → Executor → Presenter loop, you are suddenly using two different models and making three LLM calls for one request, while dealing with the complexity of tool calls/MCP and injecting their results into the context.

I’m completely ignoring multimodal challenges like working with images and text.
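To make the “three LLM calls for one request” point concrete, here is a stubbed Planner → Executor → Presenter loop. `call_llm` is a hypothetical stand-in for any chat-completion API; the canned responses just make the flow visible:

```python
# Sketch only: each role would normally be a real LLM call, possibly to a
# different model. Here the calls are stubbed with canned strings.
def call_llm(role: str, prompt: str) -> str:
    canned = {
        "planner": "1. look up weather tool 2. summarize",
        "executor": "tool(weather) -> 14C, cloudy",
        "presenter": "It's 14C and cloudy right now.",
    }
    return canned[role]

def answer(user_msg: str) -> str:
    plan = call_llm("planner", f"Plan steps for: {user_msg}")
    result = call_llm("executor", f"Execute: {plan}")  # tool calls happen here
    return call_llm("presenter", f"Write a reply from: {result}")

print(answer("what's the weather?"))
```

One user message, three model invocations, and this is before adding a Router, a Verifier, or any retries.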

ChatGPT is a great example of good design with hidden complexity. It makes the product feel like magic: what appears to be a simple LLM response with some tool calling is actually very good UX hiding layer upon layer of complexity on top of a state-of-the-art model.

Multiple LLM calls to do one thing

Context Management

If you are going to understand one thing about LLMs, it has to be context. It’s literally the most important aspect of integrating LLMs into your flows. LLM conversations are ephemeral: you write something, it answers, you close the session, and it’s all gone, never to be seen or mean anything ever again.

This introduces two key problems:

  1. When the context limit is reached, you have to decide what to do. Actually, even before reaching it, an LLM’s capabilities degrade once the context grows past a certain point. See: Context Rot
  2. Providing long-term memory. When a user starts a new conversation, you need to surface the relevant memories.

Both are deceptively hard problems to solve well.
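For problem 1, a common (and crude) approach is to fold the oldest turns into a running summary once the transcript nears the limit. A minimal sketch, using word counts in place of a real tokenizer and a stub instead of an actual summarization LLM call:

```python
# Everything here is a simplification: real systems count tokens with the
# model's tokenizer and produce the summary with another LLM call.
def count_tokens(msg: dict) -> int:
    return len(msg["content"].split())

def summarize(msgs: list[dict]) -> dict:
    # Stub: a real implementation would ask an LLM to compress these turns.
    return {"role": "system", "content": f"[summary of {len(msgs)} older turns]"}

def trim_context(history: list[dict], limit: int = 50) -> list[dict]:
    while sum(count_tokens(m) for m in history) > limit and len(history) > 2:
        # Fold the two oldest turns into a running summary.
        history = [summarize(history[:2])] + history[2:]
    return history
```

Even this toy version surfaces the real trade-off: every fold loses detail, and deciding what is safe to lose is exactly the part that’s deceptively hard.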

LangGraph

All of this complexity led me to solutions like LangGraph, which effectively simplifies defining flows for agents. I’ve experimented with it in my local LLM agent, and it helps a lot with trying out different workflows. There is a whole ecosystem of these libraries and tools.
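This is not the real LangGraph API, but a plain-Python analogy of the pattern these libraries formalize: nodes that transform a shared state, wired into a directed flow:

```python
from typing import Callable

# Hypothetical mini graph runner, just to illustrate the pattern.
State = dict
Node = Callable[[State], State]

def run_graph(nodes: dict[str, Node], edges: dict[str, str],
              entry: str, state: State) -> State:
    current = entry
    while current != "END":
        state = nodes[current](state)   # each node updates the shared state
        current = edges[current]        # then control follows an edge
    return state

nodes = {
    "plan": lambda s: {**s, "plan": f"steps for {s['q']}"},
    "act":  lambda s: {**s, "out": f"did: {s['plan']}"},
}
edges = {"plan": "act", "act": "END"}
print(run_graph(nodes, edges, "plan", {"q": "demo"})["out"])
```

Real frameworks add conditional edges, checkpointing, and streaming on top, but the core idea is the same: the agent flow becomes data you can inspect and swap instead of tangled control flow.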

So? What did we learn?

So, like everything else… building good stuff is hard, and shit is complicated.

Also:

  • We can develop real things extremely fast now.
  • The AI ecosystem is very strong, and there is tremendous open-source adoption.
  • Building good local AI solutions is very hard. Beyond the complexity, it requires far too much inference power to be practical for real work. An RTX 5090 might help, but it won’t cut it beyond light personal usage.

This is as chaotic as a blog post gets, but treat this one as a diary entry and the introduction to a longer story. Now that you have some background, I’ll dive into specific topics: building a coding agent, looking into agentic AI’s limitations and reality, and exploring the privacy and security problems of AI agents.