Deep Dive · LLM Engineering

Building a Real Agent,
Step-by-Step

Agents are everywhere — in demos, product launches, research papers, and every developer thread online. But "agent" has become one of those overloaded words that can mean anything. This post ignores the hype and builds one from first principles, piece by piece, without any framework — so we can see what an agent actually is under the hood, and how it works.

30 – 60 min read · Node.js · OpenAI-compatible APIs · MCP
📦 Repo: agentic-loop on GitHub

What "agentic loop" actually means

A normal LLM app does this:

  1. send prompt
  2. get answer
  3. stop

An agentic loop does this instead:

  1. send prompt plus a list of available tools
  2. let the model decide whether it needs a tool
  3. execute the requested tool call in our own code
  4. give the tool result back to the model
  5. repeat until the model stops asking for tools

That repeat cycle is the loop. The model chooses, and our runtime executes the chosen tool(s).

What we are building

We are not building a toy chatbot that answers once and exits. We are building a loop that:

  • keeps conversation state across turns
  • supports tool calls, including multi-step tool chains
  • records messages and usage metrics in SQLite
  • can resume a previous session by ID
  • can use both built-in tools and MCP tools side by side

That last point matters a lot. The built-in tools are just a teaching step — they make the mechanics easy to see because we can read the tool definition and its handler in the same file. The real power is MCP. Once the loop speaks MCP, the agent can discover and use capabilities from external tool servers declared in config.json, all without changing the core loop. That is what gives the agentic loop its reach, up to and including the coding capabilities we will get to near the end.


Start with one plain chat completion

Before we talk about agents, let us start with the simplest thing that actually works.

Create a file called src/index.js. We are going to need a raw HTTP function because the whole project uses zero runtime dependencies at this stage (we will bring in exactly one package much later, when we actually need it).

js
'use strict';

const http = require('http');
const https = require('https');

function httpPost(url, headers, bodyObj) {
  return new Promise((resolve, reject) => {
    const payload = JSON.stringify(bodyObj);
    const lib = url.startsWith('https') ? https : http;
    const u = new URL(url);

    const req = lib.request({
      hostname: u.hostname,
      port: u.port || (u.protocol === 'https:' ? 443 : 80),
      path: u.pathname + u.search,
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(payload),
        ...headers,
      },
    }, (res) => {
      const chunks = [];
      res.on('data', (c) => chunks.push(c));
      res.on('end', () => {
        const raw = Buffer.concat(chunks).toString('utf8');
        if (res.statusCode < 200 || res.statusCode >= 300) {
          return reject(new Error(`HTTP ${res.statusCode}: ${raw.slice(0, 300)}`));
        }
        resolve(JSON.parse(raw));
      });
    });

    req.on('error', reject);
    req.write(payload);
    req.end();
  });
}

async function chatCompletions(baseURL, apiKey, body) {
  return httpPost(
    `${baseURL.replace(/\/$/, '')}/chat/completions`,
    { Authorization: `Bearer ${apiKey}` },
    body
  );
}

Nothing clever there. Raw Node.js HTTP, a JSON body, a Bearer token. Now add a main() to make it runnable:

js
async function main() {
  const baseURL = process.env.BASE_URL || 'https://api.openai.com/v1';
  const apiKey = process.env.OPENAI_API_KEY;

  if (!apiKey) {
    throw new Error('Set OPENAI_API_KEY first.');
  }

  const resp = await chatCompletions(baseURL, apiKey, {
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: 'What is an agentic loop?' }
    ],
  });

  console.log(resp.choices[0].message.content);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

Run it and you get a response. Valid LLM app. Not an agent. Why not? No loop. No tool execution. No growing transcript. The conversation starts and ends in a single round-trip. That is the specific thing we are going to fix, step by step.

Note

The code uses the raw http/https modules instead of fetch or axios. You could absolutely use fetch here (it is built into Node 18+). The raw approach was chosen because it makes the mechanics explicit and introduces zero dependencies. In a production codebase you would probably just reach for fetch or your HTTP client of choice.
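
If you would rather use fetch, the same call is only a few lines. Here is a minimal sketch for comparison (chatCompletionsFetch is an illustrative name, not something from the repo):

js
// A fetch-based equivalent (Node 18+); a minimal sketch for comparison
async function chatCompletionsFetch(baseURL, apiKey, body) {
  const res = await fetch(`${baseURL.replace(/\/$/, '')}/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    const raw = await res.text();
    throw new Error(`HTTP ${res.status}: ${raw.slice(0, 300)}`);
  }
  return res.json();
}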

Add config and CLI input

Hardcoding credentials and model names is fine for ten minutes and annoying after that. Let us add:

  • config.json with all tunables
  • CLI prompt input so we do not have to edit the file every time
  • an optional system prompt

js
// fs and path join http/https in the requires at the top of src/index.js
const fs = require('fs');
const path = require('path');

function loadConfig() {
  const argv = process.argv.slice(2);
  let configPath = path.resolve('./config.json');
  const promptTokens = [];

  for (let i = 0; i < argv.length; i++) {
    if ((argv[i] === '--config' || argv[i] === '-c') && argv[i + 1]) {
      configPath = path.resolve(argv[++i]);
    } else {
      promptTokens.push(argv[i]);
    }
  }

  const file = fs.existsSync(configPath)
    ? JSON.parse(fs.readFileSync(configPath, 'utf8'))
    : {};

  const cfg = {
    model: file.model || process.env.MODEL || 'gpt-4o',
    baseURL: (file.baseURL || process.env.BASE_URL || 'https://api.openai.com/v1').replace(/\/$/, ''),
    apiKey: file.apiKey || process.env.OPENAI_API_KEY || process.env.API_KEY || '',
    maxAgenticLoops: Number(file.maxAgenticLoops ?? process.env.MAX_AGENTIC_LOOPS ?? 10),
    maxTotalInputTokens: Number(file.maxTotalInputTokens ?? process.env.MAX_TOTAL_INPUT_TOKENS ?? 100000),
    maxTotalOutputTokens: Number(file.maxTotalOutputTokens ?? process.env.MAX_TOTAL_OUTPUT_TOKENS ?? 20000),
    sqliteDbPath: file.sqliteDbPath || process.env.SQLITE_DB_PATH || './agentic-loop.db',
    systemPrompt: file.systemPrompt || process.env.SYSTEM_PROMPT || null,
    mcpServers: file.mcpServers || file.mcpServer || {},
  };

  const userPrompt = promptTokens.join(' ').trim();

  if (!cfg.apiKey) throw new Error('No API key. Set config.apiKey or OPENAI_API_KEY.');
  if (!userPrompt) throw new Error('Usage: node src/index.js "your prompt"');

  return { cfg, userPrompt };
}

Notice the config layering: config.json wins, then environment variables, then hardcoded defaults. This is a common pattern that keeps things flexible without being complicated.
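
A quick illustration of the precedence, assuming the config.json below sits next to the script (commands are illustrative):

terminal
$ MODEL=gpt-4o-mini node src/index.js "hi"
# "model" present in config.json  -> file wins (gpt-4o)
# "model" absent from config.json -> env wins (gpt-4o-mini)
# neither set                     -> hardcoded default wins (gpt-4o)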

The config.json to put alongside the file:

json
{
  "model": "gpt-4o",
  "baseURL": "https://api.openai.com/v1",
  "apiKey": "sk-...",
  "maxAgenticLoops": 10,
  "maxTotalInputTokens": 100000,
  "maxTotalOutputTokens": 20000,
  "systemPrompt": "You are a helpful coding assistant.",
  "mcpServers": {}
}

Still not agentic. But now the app is configurable, usable from the command line, and ready to grow.

Turn one request into a real loop

This is the first big shift.

Instead of one request and stop, we keep a messages array and send the whole thing back on every round. The model gets to see:

  • the original user request
  • its own previous replies
  • tool requests it made
  • the outputs of those tool requests

That growing transcript is the memory of the loop. Nothing is forgotten between rounds — it is all in the array.

js
let finalText = null;
let loopCount = 0;

while (loopCount < cfg.maxAgenticLoops) {
  loopCount++;

  const resp = await chatCompletions(cfg.baseURL, cfg.apiKey, {
    model: cfg.model,
    messages,
  });

  const choice = resp.choices?.[0];
  if (!choice) throw new Error('No choices in API response');

  const msg = choice.message;
  if (msg.content) finalText = msg.content;

  messages.push({
    role: 'assistant',
    content: msg.content || null,
    tool_calls: msg.tool_calls,
  });

  const toolCalls = msg.tool_calls || [];
  if (toolCalls.length === 0) break;

  // We will execute tool calls in the next step.
}

console.log(finalText || '[No text response generated]');

Note

At this point the loop has no tools to offer the model, so tool_calls will always be empty and the loop will always exit after exactly one round. That is fine. The shape is right and we are about to fill in the interesting part.

Add two tiny built-in tools first

Before connecting to external MCP servers, we will start with tools that live right here in the same file. That removes all network and process complexity so we can focus on the contract itself:

  • the model sees a tool definition (name, description, parameters schema)
  • the model returns a tool call with arguments
  • our runtime executes it
  • we append the result and loop back

Two tools: get_current_time (no args, dead simple) and read_text_file (one arg, does real I/O).

js
function getInternalTools() {
  return [
    {
      name: 'get_current_time',
      description: 'Return the current server time in ISO 8601 format.',
      parameters: { type: 'object', properties: {}, required: [] },
      handler: async () => new Date().toISOString(),
    },
    {
      name: 'read_text_file',
      description: 'Read a UTF-8 text file from local disk using an absolute or relative path.',
      parameters: {
        type: 'object',
        properties: {
          filePath: {
            type: 'string',
            description: 'Absolute or relative path to the text file to read',
          },
        },
        required: ['filePath'],
      },
      handler: async (args) => {
        const rawPath = String(args?.filePath || '').trim();
        if (!rawPath) throw new Error('filePath is required');
        const resolvedPath = path.resolve(rawPath);
        if (!fs.existsSync(resolvedPath)) throw new Error(`File not found: ${resolvedPath}`);
        const stat = fs.statSync(resolvedPath);
        if (!stat.isFile()) throw new Error(`Not a file: ${resolvedPath}`);
        return fs.readFileSync(resolvedPath, 'utf8');
      },
    },
  ];
}

function bootInternalTools() {
  const toolMap = {};
  const tools = [];

  for (const tool of getInternalTools()) {
    tools.push({
      type: 'function',
      function: {
        name: tool.name,
        description: tool.description || '',
        parameters: tool.parameters || { type: 'object', properties: {}, required: [] },
      },
    });
    toolMap[tool.name] = { kind: 'internal', execute: tool.handler };
  }

  return { toolMap, tools };
}

Two different things are happening here and both matter:

  • tools is the list we hand to the model. It is pure schema — name, description, parameter shapes. No implementation.
  • toolMap is our private dispatch table. It is keyed by tool name and holds the actual executor function.

The model never sees the executor. It only sees the schema. Your runtime keeps the implementation. That separation is not just good design — it is required. The model is running in a datacenter somewhere. It literally cannot call your function. It can only tell you what it wants called, and you run it on its behalf.
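
One consequence worth making explicit: the loop's request body has to carry the schema list, or the model will never know the tools exist. A sketch of the updated call — this is the standard chat completions tools field, so the earlier loop gains exactly one line:

js
const resp = await chatCompletions(cfg.baseURL, cfg.apiKey, {
  model: cfg.model,
  messages,
  tools, // pure schema from bootInternalTools(); handlers never leave our process
});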

Execute tool calls

Now we wire those tools into the loop. This is the heart of the whole system:

js
const toolCalls = msg.tool_calls || [];
if (!toolCalls.length) break;

for (const tc of toolCalls) {
  const fnName = tc.function?.name ?? '?';
  let args = {};

  try {
    args = JSON.parse(tc.function?.arguments || '{}');
  } catch {
    // Malformed JSON from the model: fall back to empty args
  }

  const binding = toolMap[fnName];
  let resultText;

  if (!binding) {
    resultText = `Error: no handler found for tool "${fnName}"`;
  } else {
    try {
      resultText = await binding.execute(args);
    } catch (e) {
      resultText = `Error: ${e.message}`;
    }
  }

  messages.push({
    role: 'tool',
    tool_call_id: tc.id,
    content: resultText,
    name: fnName,
  });
}

If you only understand one thing from this whole walkthrough, make it this pattern:

  1. the model asks for a tool by name with arguments
  2. your code looks up that name in toolMap
  3. your code runs the executor with the provided arguments
  4. your code appends a role: "tool" message with the result and the matching tool_call_id
  5. the model sees that output on the next round and continues from there

Notice also that errors are handled gracefully: instead of crashing, we send the error message back as the tool result. The model can then decide what to do — retry with different arguments, explain the problem to the user, or try something else.
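
To make the contract concrete, here is roughly what the transcript looks like after one tool round (IDs and values illustrative):

json
[
  { "role": "user", "content": "What time is it?" },
  { "role": "assistant", "content": null,
    "tool_calls": [{ "id": "call_abc", "type": "function",
      "function": { "name": "get_current_time", "arguments": "{}" } }] },
  { "role": "tool", "tool_call_id": "call_abc", "name": "get_current_time",
    "content": "2031-01-01T00:00:00.000Z" }
]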

Add budgets and stopping conditions

Agent loops need guardrails. Without them, a loop can run for too many rounds, spend more tokens than intended, or get caught in a cycle where the model keeps making tool calls that go nowhere.

js
class TokenBudget {
  constructor(maxIn, maxOut) {
    this.maxIn = maxIn; this.maxOut = maxOut;
    this.totalIn = 0; this.totalOut = 0; this.totalCached = 0;
  }

  record(usage) {
    if (!usage) return { prompt_tokens: 0, completion_tokens: 0, cached_tokens: 0 };
    const pt = usage.prompt_tokens ?? usage.input_tokens ?? 0;
    const ct = usage.completion_tokens ?? usage.output_tokens ?? 0;
    const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
    this.totalIn += pt; this.totalOut += ct; this.totalCached += cached;
    return { prompt_tokens: pt, completion_tokens: ct, cached_tokens: cached };
  }

  check() {
    if (this.totalIn > this.maxIn)
      throw new Error(`Input token budget exceeded: ${this.totalIn} > ${this.maxIn}`);
    if (this.totalOut > this.maxOut)
      throw new Error(`Output token budget exceeded: ${this.totalOut} > ${this.maxOut}`);
  }
}

// Once, before the loop:
const budget = new TokenBudget(cfg.maxTotalInputTokens, cfg.maxTotalOutputTokens);

// Inside the loop, after each response:
const metrics = budget.record(resp.usage);
budget.check();

The token accounting handles both OpenAI naming (prompt_tokens, completion_tokens) and Anthropic naming (input_tokens, output_tokens). Because the loop is built against any OpenAI-compatible API, this small normalization step makes it portable across providers.

Beyond the token budget, the loop should also stop when:

  • tool_calls is empty — the model has nothing more to do
  • loopCount >= cfg.maxAgenticLoops — hard ceiling on iterations
  • finish_reason === "length" — the model's response was truncated

These are simple controls but they genuinely matter in production. A loop without them is a loop you will eventually regret.
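
A sketch of the truncation check, using the standard finish_reason field on the choice:

js
// Inside the loop, after reading choice:
if (choice.finish_reason === 'length') {
  // Output was cut off by the token limit; another round would build on a truncated reply.
  break;
}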

Add SQLite persistence

The loop now works correctly, but every run is ephemeral. When the process exits, the conversation is gone. We fix that by recording everything to SQLite. Three tables:

  • sessions — one row per conversation
  • messages — every message in every session, in order
  • iterations — per-round token usage for cost tracking and debugging

This is where the only external dependency comes in:

terminal
$ npm install better-sqlite3

js
function initDB(db) {
  db.exec(`
    CREATE TABLE IF NOT EXISTS sessions (
      id TEXT PRIMARY KEY,
      created_at DATETIME DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE IF NOT EXISTS messages (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      session_id TEXT,
      role TEXT,
      content TEXT,
      tool_calls TEXT,
      tool_call_id TEXT,
      name TEXT,
      created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
      FOREIGN KEY(session_id) REFERENCES sessions(id)
    );
    CREATE TABLE IF NOT EXISTS iterations (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      session_id TEXT,
      loop_index INTEGER,
      prompt_tokens INTEGER,
      completion_tokens INTEGER,
      cached_tokens INTEGER,
      created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
      FOREIGN KEY(session_id) REFERENCES sessions(id)
    );
  `);
}

With that in place, we have real history. We can open the database with any SQLite client and inspect exactly what happened in every run, including which tools were called with what arguments and what they returned.
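
The write path is small with better-sqlite3's synchronous prepared statements. A sketch, with helper names of my own choosing rather than the repo's:

js
const Database = require('better-sqlite3');

const db = new Database(cfg.sqliteDbPath);
initDB(db);

const insertMessage = db.prepare(`
  INSERT INTO messages (session_id, role, content, tool_calls, tool_call_id, name)
  VALUES (?, ?, ?, ?, ?, ?)
`);

// Call this right after every messages.push(...) in the loop.
function saveMessage(sessionId, m) {
  insertMessage.run(
    sessionId,
    m.role,
    m.content ?? null,
    m.tool_calls ? JSON.stringify(m.tool_calls) : null,
    m.tool_call_id ?? null,
    m.name ?? null
  );
}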

Add resume support

Resume does exactly what the word says: it lets us continue a past conversation. We can close the terminal, come back tomorrow, and pick up exactly where we left off. All that work, all those tool calls, all that cost need not be repeated. We simply continue with the next task, building on the earlier conversation, just as you would in ChatGPT or Claude Code.

js
let resumeSessionId = null;

for (let i = 0; i < argv.length; i++) {
  if ((argv[i] === '--config' || argv[i] === '-c') && argv[i + 1]) {
    configPath = path.resolve(argv[++i]);
  } else if (argv[i] === '--resume' && argv[i + 1]) {
    resumeSessionId = argv[++i];
  } else {
    promptTokens.push(argv[i]);
  }
}

// In main(), branch on resumeSessionId:
// - if set: load all messages for that session from SQLite into the messages array
// - if a new prompt was also given, append it as a new user message
// - if no new prompt, just re-enter the loop with the loaded history
// - if not set: generate a fresh UUID and start clean
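
A sketch of the load side, mirroring how messages were serialized above (loadSessionMessages is an illustrative name):

js
function loadSessionMessages(db, sessionId) {
  const rows = db
    .prepare('SELECT role, content, tool_calls, tool_call_id, name FROM messages WHERE session_id = ? ORDER BY id')
    .all(sessionId);

  // Rebuild the exact message shapes the API expects.
  return rows.map((r) => {
    const m = { role: r.role, content: r.content };
    if (r.tool_calls) m.tool_calls = JSON.parse(r.tool_calls);
    if (r.tool_call_id) m.tool_call_id = r.tool_call_id;
    if (r.name) m.name = r.name;
    return m;
  });
}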

Add MCP — where this gets powerful

Up to now the tools lived inside our own file. That was useful for learning, but the more powerful pattern is: keep the same agent loop exactly as it is, keep the same tool call contract, and plug in external tool servers through MCP.

Our loop needs to:

  1. connect to MCP servers defined in config.json
  2. ask each server which tools it exposes
  3. present those tools to the model alongside the built-ins
  4. dispatch calls back to the right MCP client when the model requests them

This project supports both standard transport types:

  • stdio — the server runs as a child process, communicates over stdin/stdout using newline-delimited JSON-RPC. Tools like the official @modelcontextprotocol/server-filesystem work this way.
  • SSE / Streamable HTTP — the server is an HTTP service. The client first tries the newer Streamable HTTP protocol, and falls back to legacy SSE if that does not work. Used by remote or hosted MCP servers.
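
To make the stdio transport less abstract, here is a rough sketch of the mechanics: spawn the child, write one JSON-RPC message per line, and match responses back to requests by id. The real client also performs the MCP initialize handshake and handles notifications and errors; startStdioServer and request are illustrative names.

js
const { spawn } = require('child_process');

function startStdioServer(def) {
  const child = spawn(def.command, def.args || [], {
    env: { ...process.env, ...(def.env || {}) },
    stdio: ['pipe', 'pipe', 'inherit'],
  });

  let nextId = 1;
  const pending = new Map();
  let buf = '';

  // Every line on stdout is one complete JSON-RPC message.
  child.stdout.on('data', (chunk) => {
    buf += chunk.toString('utf8');
    let nl;
    while ((nl = buf.indexOf('\n')) !== -1) {
      const line = buf.slice(0, nl);
      buf = buf.slice(nl + 1);
      if (!line.trim()) continue;
      const msg = JSON.parse(line);
      if (msg.id != null && pending.has(msg.id)) {
        pending.get(msg.id)(msg);
        pending.delete(msg.id);
      }
    }
  });

  // Send a request and resolve with the matching response.
  function request(method, params) {
    const id = nextId++;
    child.stdin.write(JSON.stringify({ jsonrpc: '2.0', id, method, params }) + '\n');
    return new Promise((resolve) => pending.set(id, resolve));
  }

  return { child, request };
}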

The conversion step: MCP tool schema to OpenAI tool schema

js
function mcpToOpenAITool(serverName, tool) {
  return {
    type: 'function',
    function: {
      name: `${serverName}__${tool.name}`,
      description: tool.description || '',
      parameters: tool.inputSchema || { type: 'object', properties: {}, required: [] },
    },
  };
}

The serverName__toolName naming convention is the key design choice. If we have a server called filesystem with a tool called read_file, it becomes filesystem__read_file. This namespacing is how the runtime knows which MCP client to dispatch to when the model calls the tool.

Merging built-in and MCP tools into one registry

js
const internal = bootInternalTools();
const mcp = await bootMcpServers(cfg.mcpServers);

const toolMap = { ...internal.toolMap, ...mcp.toolMap };
const tools = [...internal.tools, ...mcp.tools];

To the model, all tools look the same. To the runtime, each entry in toolMap has a kind field that says whether it is 'internal' or 'mcp', and for MCP tools it holds a reference to the client that can execute it. The loop dispatches through binding.execute(args) regardless — the routing is invisible at the call site.
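
For a feel of what an MCP-backed binding might look like, here is a sketch that reuses the request helper from the stdio sketch above. The tools/call method and the content-block result shape come from the MCP spec; the rest of the names are illustrative.

js
toolMap[`${serverName}__${tool.name}`] = {
  kind: 'mcp',
  execute: async (args) => {
    // JSON-RPC "tools/call" per the MCP spec
    const resp = await server.request('tools/call', { name: tool.name, arguments: args });
    const blocks = resp.result?.content || [];
    // Results arrive as content blocks; keep the text parts for the transcript.
    return blocks.filter((b) => b.type === 'text').map((b) => b.text).join('\n');
  },
};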

This is the architectural payoff: built-in tools are the first teaching layer. MCP tools are the scalable layer. The loop itself does not care which one it is calling.

Why this becomes a coding agent

At a high level, a coding agent is just an agentic loop with coding-relevant tools attached. If the toolset includes things like:

  • read file / write file
  • search code (grep, semantic search)
  • run tests or shell commands
  • inspect logs
  • call external development services (GitHub, CI systems, issue trackers)

...then the same loop that would otherwise be a generic assistant starts behaving like a coding agent. It can read code, modify files, run tests, inspect what broke, and iterate — all within a single session.

The loop is not the coding-specific part. The tool ecosystem is.

This is another reason MCP is such a strong idea here. It decouples the loop from the capabilities. The loop stays generic. The capabilities are pluggable by configuration.


Steps summary

The full system in 8 lines

  1. load config.json and parse CLI args
  2. open SQLite, create tables if needed
  3. boot built-in tools, build toolMap and tools list
  4. connect MCP servers from config, discover their tools, merge into the same toolMap and tools list
  5. build initial messages array (system prompt + user message, or load from DB if resuming)
  6. enter the loop: call model → record tokens → append assistant message → execute any tool calls → repeat
  7. close MCP clients, close DB
  8. print the final answer and session ID for resuming

That is a real agentic loop. Not a framework. Not an abstraction. Just a well-structured while-loop with a dispatch table.
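
Or, in code: the same system compressed into its skeleton. This sketch elides persistence and MCP boot, but it is the shape everything above reduces to.

js
while (loopCount < cfg.maxAgenticLoops) {
  loopCount++;

  const resp = await chatCompletions(cfg.baseURL, cfg.apiKey, { model: cfg.model, messages, tools });
  budget.record(resp.usage);
  budget.check();

  const msg = resp.choices[0].message;
  if (msg.content) finalText = msg.content;
  messages.push({ role: 'assistant', content: msg.content || null, tool_calls: msg.tool_calls });

  const toolCalls = msg.tool_calls || [];
  if (!toolCalls.length) break; // the model chose to answer instead of act

  for (const tc of toolCalls) {
    let out;
    try {
      out = await toolMap[tc.function.name].execute(JSON.parse(tc.function.arguments || '{}'));
    } catch (e) {
      out = `Error: ${e.message}`;
    }
    messages.push({ role: 'tool', tool_call_id: tc.id, content: String(out), name: tc.function.name });
  }
}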

The finished implementation

The complete implementation uses a small entrypoint plus a few focused modules.

Reading through src/index.js after following these steps should feel familiar. Every major piece still maps directly to one of the steps above, with helpers moved into small supporting files.

Example config.json with MCP servers

Here is a more interesting config that connects two real tool servers:

json
{
  "model": "google/gemini-3.1-flash-lite-preview",
  "baseURL": "https://openrouter.ai/api/v1",
  "apiKey": "<sk-or-v1- ... YOUR LLM PROVIDER KEY>",
  "maxAgenticLoops": 20,
  "maxTotalInputTokens": 500000,
  "maxTotalOutputTokens": 200000,
  "sqliteDbPath": "./agentic-loop.db",
  "systemPrompt": "You are a helpful assistant. Use available tools when needed.",
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./"]
    },
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"],
      "env": {}
    }
  }
}

  • filesystem is a stdio server that gives the agent read and write access to ./.
  • fetch is a stdio server that gives the agent the ability to make external HTTP calls.

An important point: the loop does not change when you add more servers. Only the configuration changes. The same agentic code can take on any number of new capabilities simply by declaring more MCP servers.

⚠ Important: token cost & tool scaling
  1. Every tool definition is sent with every request and counts toward input tokens, whether or not the tool is actually used. More tools means a higher cost per round.
  2. A general pattern to tackle this: use a wrapper MCP server that exposes just two tools. The LLM then follows a two-step approach — first it uses the exposed search tool to find the tool it needs, then it uses a call tool to actually run it with the required arguments. A hypothetical sketch of such a wrapper's schema follows this list.
  3. To see how such a wrapper MCP server can work, refer to one-mcp — another open-source project by the same author.
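
As a rough illustration only (these tool names are hypothetical, not one-mcp's actual API), the wrapper's two schemas might look like:

json
[
  {
    "name": "search_tools",
    "description": "Search the catalog of available tools by task description.",
    "parameters": {
      "type": "object",
      "properties": { "query": { "type": "string" } },
      "required": ["query"]
    }
  },
  {
    "name": "call_tool",
    "description": "Invoke a tool found via search_tools, by name, with JSON arguments.",
    "parameters": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "arguments": { "type": "object" }
      },
      "required": ["name"]
    }
  }
]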

How to run it

Start a new session

terminal
$ node src/index.js "Read README.md and explain what this project does."

Resume a previous session (the session ID is printed when the program ends)

terminal
$ node src/index.js --resume <session_id> "Continue from where you left off and now, <YOUR CONTINUATION PROMPT>."

Use a different config file

terminal
$ node src/index.js --config ./my-config.json "What tools are available to you?"

Live demo recording

[asciinema recording]

Final takeaway

If you strip away all the noise, an agentic loop is not mysterious. It is just a cycle:

  • send state
  • let the model choose actions
  • execute those actions outside the model
  • append results
  • repeat until the model responds without requesting any tools

The MCP integration is what turns the same loop into a genuinely capable system. Plug in a filesystem server and it can read and write files. Plug in a GitHub server and it can inspect issues and pull requests. Plug in a database server and it can query your schema. The loop does not change. The capabilities grow by configuration.

You can also write your own tools — either with the MCP SDK, exposing them through stdio/SSE/HTTP, or by hardcoding them into your agentic code the way we did with the built-ins. If you are just testing or playing around, prefer hardcoding for simplicity. For any production-level setup, go the MCP route.