Giving an LLM Tools and the Room to Plan

Chanda Charan Reddy · May 20, 2026 · 8 min read

A chatbot answers. An agent acts.

The first time I really felt that difference, I'd stopped asking a model questions and started handing it a job — "pull this invoice, check it against the purchase order, flag anything that doesn't line up" — and then watching it work out the steps on its own. No flowchart. No if-this-then-that I'd written in advance. Just a goal, a set of tools, and enough room to plan.

That gap — between answering and acting — is where agentic AI lives. It's also where I've spent most of my last year, and where I've learned that the demo is easy and the trust is hard.

The unglamorous definition

Strip away the hype and an "agent" is embarrassingly simple: a loop. The model looks at a goal, decides on one action, takes it, reads the result, and decides again — until it thinks it's done.

The magic isn't the model. It's the three things you give it:

Tools — functions it can call. Search a database. Read a file. Hit an API. The model doesn't do these things; it asks your code to, and your code reports back.
Memory — what's happened so far, so step five knows what step two found.
A stopping condition — the part everyone forgets until their agent is on its fortieth loop, confidently going nowhere.

Most "agent frameworks" are just opinionated wrappers around that loop. Understanding the loop first saved me from cargo-culting a framework I didn't need.
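
For readers who want to see the loop on the page, here is a minimal sketch in Python. Everything named here is an assumption made for illustration: the `TOOLS` registry, the `scripted_model` stand-in (which replaces a real model call so the example runs offline), and the shape of the decision dictionaries.

```python
from typing import Callable

# Hypothetical tool registry: plain functions the loop runs on the model's behalf.
TOOLS: dict[str, Callable[..., str]] = {
    "read_file": lambda path: f"(contents of {path})",
}


def scripted_model(goal: str, memory: list[dict]) -> dict:
    """Stand-in for a real LLM call: read one file, then declare the goal done."""
    if not memory:
        return {"action": "read_file", "args": {"path": "invoice.txt"}}
    return {"action": "finish", "answer": f"Done after {len(memory)} step(s)."}


def run_agent(goal: str, decide: Callable[[str, list[dict]], dict], max_steps: int = 10) -> str:
    memory: list[dict] = []          # memory: what has happened so far
    for _ in range(max_steps):       # stopping condition: a hard step limit
        decision = decide(goal, memory)
        if decision["action"] == "finish":
            return decision["answer"]
        # The model only asks; this code performs the call and reports back.
        result = TOOLS[decision["action"]](**decision["args"])
        memory.append({"call": decision, "result": result})
    return f"Stopped after {max_steps} steps without finishing."


print(run_agent("Summarize the invoice", scripted_model))
```

Swapping `scripted_model` for a function that calls an actual model is the only change needed to make this a real (if bare) agent; the three ingredients stay the same.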

What actually broke

The demo worked on day one. Production took the next three months. Here's what went wrong, roughly in order of how much it hurt.

It would loop forever. Given a goal it couldn't reach, the model didn't give up — it kept trying, each attempt a little more creative and a little more wrong. The fix wasn't smarter prompting. It was a hard budget: a maximum number of steps, after which the agent stops and reports "I couldn't finish this — here's how far I got." A confused human who asks for help beats a confident agent that doesn't.
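
One way to make the budget report useful is to keep a trail of what each attempt did, so the final message can say how far the run got. The sketch below assumes a hypothetical `step` callable that performs one agent step and returns a short note, or `None` once the goal is reached; the `RunReport` name and its fields are illustrative, not taken from any framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunReport:
    finished: bool
    steps_taken: int
    trail: list[str] = field(default_factory=list)

    def summary(self) -> str:
        if self.finished:
            return f"Finished in {self.steps_taken} step(s)."
        progress = "; ".join(self.trail) or "nothing yet"
        return f"I couldn't finish this. Here's how far I got: {progress}"


def run_with_budget(step: Callable[[int], Optional[str]], max_steps: int) -> RunReport:
    """Call step(i) until it returns None (done) or the budget runs out."""
    trail: list[str] = []
    for i in range(max_steps):
        note = step(i)
        if note is None:
            return RunReport(True, len(trail), trail)
        trail.append(note)
    # Budget exhausted: stop and hand back the partial progress.
    return RunReport(False, len(trail), trail)


# An unreachable goal: every attempt produces a note and never finishes.
stuck = run_with_budget(lambda i: f"attempt {i + 1} failed", max_steps=3)
print(stuck.summary())
```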

It hallucinated tool calls. It would invent a function that didn't exist, or pass arguments in a shape my code never expected. I stopped trusting the model to format anything and started validating every tool call against a strict schema before executing it. If the call doesn't parse, the model gets the error back and tries again — which, surprisingly, it's quite good at recovering from.
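
A rough sketch of that validation step, using only the standard library. The schema format here (tool name mapped to required argument names and types) is a simplification invented for this example; a production setup would more likely use JSON Schema or a validation library. The key idea is that a failed check returns an error message meant to be sent back to the model rather than raising.

```python
import json

# Hypothetical schema: tool name -> required argument names and their types.
TOOL_SCHEMAS = {
    "find_invoice": {"vendor": str, "month": str},
}


def validate_tool_call(raw: str):
    """Return (call, None) if the call is valid, else (None, error_for_model)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"Tool call is not valid JSON: {exc.msg}"
    if not isinstance(call, dict):
        return None, "Tool call must be a JSON object."

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        return None, f"Unknown tool {name!r}. Available: {sorted(TOOL_SCHEMAS)}"

    args = call.get("args")
    if not isinstance(args, dict):
        return None, "'args' must be a JSON object."

    schema = TOOL_SCHEMAS[name]
    missing = sorted(set(schema) - set(args))
    extra = sorted(set(args) - set(schema))
    if missing or extra:
        return None, f"Bad arguments for {name}: missing={missing}, unexpected={extra}"

    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            return None, f"Argument {key!r} must be {expected.__name__}"
    return call, None


# An invented function is rejected before anything executes.
print(validate_tool_call('{"name": "delete_everything", "args": {}}')[1])
```

Only calls that come back with no error should be executed; everything else goes back to the model as feedback for its next attempt.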

It was confidently wrong on the things that mattered most. This is the one that keeps you up at night. The fix is structural, not clever: route low-confidence decisions to a human. In the document pipeline I built at work, the agent handles the clear cases on its own and quietly escalates the ambiguous ones. That single design choice is the difference between "neat prototype" and "something a business will actually run."
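
The routing itself can be very small. The sketch below assumes each decision carries a confidence score between 0 and 1; where that score comes from (a classifier, a self-rating, agreement between runs) is not specified here, and the 0.85 threshold is an arbitrary placeholder that would need tuning against real cases.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    label: str
    confidence: float  # assumed to be a score in [0, 1]


def route(decision: Decision, threshold: float = 0.85) -> str:
    """Return 'auto' for clear cases and 'human_review' for ambiguous ones."""
    return "auto" if decision.confidence >= threshold else "human_review"


for d in (Decision("invoice matches PO", 0.97), Decision("invoice matches PO", 0.55)):
    print(d.label, d.confidence, "->", route(d))
```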

Tools are an interface, not a list

The biggest lever on agent quality wasn't the model or the prompt. It was the tools.

A tool called get_data that returns a wall of JSON is a bad tool — the model drowns in it. A tool called find_invoice(vendor, month) that returns three clean fields is a great one. Designing tools for an agent feels a lot like designing an API for a junior engineer who is brilliant, tireless, and has no memory of yesterday: be explicit, return only what's needed, and make the failure messages teach.

I started writing tool descriptions like documentation, because that's exactly what they are — the model reads them to decide what to reach for. Vague description, wrong tool. Every time.
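
To make the contrast concrete, here is one possible shape for a `find_invoice` tool. The spec layout, the in-memory `_INVOICES` data, and the three returned fields are all assumptions for illustration; the point is the narrow return value, the documentation-style description, and a failure message that tells the model what to try next.

```python
# Hypothetical data store; a real tool would query a database or an API.
_INVOICES = [
    {
        "vendor": "Acme",
        "month": "2026-04",
        "invoice_id": "INV-1",
        "total": 120.0,
        "po_number": "PO-9",
        "internal_notes": "not useful to the agent",
    },
]

# The description the model reads when deciding which tool to reach for.
FIND_INVOICE_SPEC = {
    "name": "find_invoice",
    "description": (
        "Look up a single invoice by vendor name and billing month. "
        "Returns invoice_id, total, and po_number. "
        "Use this before comparing an invoice to a purchase order."
    ),
    "parameters": {
        "vendor": "Exact vendor name, e.g. 'Acme'",
        "month": "Billing month as YYYY-MM",
    },
}


def find_invoice(vendor: str, month: str) -> dict:
    for row in _INVOICES:
        if row["vendor"] == vendor and row["month"] == month:
            # Return only the fields the agent needs, not the whole record.
            return {k: row[k] for k in ("invoice_id", "total", "po_number")}
    # A failure message that teaches: say what went wrong and how to fix it.
    return {
        "error": (
            f"No invoice for vendor={vendor!r} in month={month!r}. "
            "Check the vendor spelling and use YYYY-MM for the month."
        )
    }


print(find_invoice("Acme", "2026-04"))
print(find_invoice("Acme", "April"))
```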

Plan, then act — but verify

The pattern that finally clicked was separating thinking from doing.

First, let the model plan: "To do this, I'll need to look up X, compare it to Y, then summarize." Planning out loud, before touching a single tool, made the whole run easier to debug — when something went sideways, I could see where the reasoning went wrong, not just that the output was wrong.

Then act, one step at a time, feeding real results back in. And critically: verify at the end. An agent that grades its own homework against the original goal catches a shocking number of its own mistakes. "Did I actually answer what was asked?" is a cheap, powerful final step.
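
The three phases can be sketched as three small functions. This is a toy, assuming a fixed plan and hard-coded step results in place of real model and tool calls; `plan`, `STEPS`, and `verify` are hypothetical names, and in a real system both planning and verification would likely be model calls.

```python
def plan(goal: str) -> list[str]:
    """Stand-in for a planning call to the model: returns ordered step names."""
    return ["look_up_invoice", "look_up_purchase_order", "compare_totals"]


# Hypothetical step implementations; each one writes its result into shared state.
STEPS = {
    "look_up_invoice": lambda state: state.update(invoice_total=120.0),
    "look_up_purchase_order": lambda state: state.update(po_total=100.0),
    "compare_totals": lambda state: state.update(
        mismatch=state["invoice_total"] != state["po_total"]
    ),
}


def verify(goal: str, state: dict) -> bool:
    """Final check against the original goal: was the comparison actually made?"""
    return "mismatch" in state


def run(goal: str) -> dict:
    steps = plan(goal)                     # plan before touching any tool
    print("Plan:", " -> ".join(steps))     # a visible plan is easier to debug
    state: dict = {}
    for name in steps:                     # act one step at a time
        STEPS[name](state)
    return {"state": state, "verified": verify(goal, state)}


result = run("Check the invoice against the purchase order")
print(result)
```

Printing the plan before acting is what makes a bad run diagnosable: the log shows which step the reasoning went wrong at, not only that the final answer was off.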

Where I've landed

Agentic systems are not magic, and they're not nearly as autonomous as the word suggests. The good ones are humble: they plan visibly, act in small steps, validate constantly, and escalate the moment they're unsure. The bad ones are the opposite — opaque, overconfident, and impossible to debug at 2 a.m.

If you're starting out, build the loop by hand once before you reach for a framework. Give the model a couple of well-described tools, a step budget, and a way to say "I'm not sure." You'll learn more from that one weekend than from any amount of reading — including this post.

The future I'm building toward isn't AI that replaces the human in the loop. It's AI that's earned the right to be trusted in the loop. That trust is engineered, one guardrail at a time.

Tags: Agentic AI, LLMs, RAG, Engineering