
Build a GPT agent

A step-by-step tutorial: tokenize text, train a small GPT-style transformer, generate text, and wire it into an agent with tools and memory using serez-agentai.

What you'll learn: character tokenization, building and training a GPTModel, sampling strategies, and assembling an Agent with a tool registry and episodic memory.

Step 1 — Set up

mkdir mini-gpt
cd mini-gpt
sz init --y
sz install serez-agentai

serez-agentai builds on serez-ai, which is installed automatically as a dependency. sz init --y also creates a serez.json with a dev script, which we'll use to run the program at the end. Create index.sz:

import "serez-agentai"

// seed the random number generator for reproducible runs
Random.seed(42)

Step 2 — Tokenize

A model works on token IDs, not text. CharTokenizer maps each character to an integer. Build the vocabulary from your training corpus:

let corpus = "hello world. hello serez. hello agents."

let tok = new CharTokenizer()
tok.buildVocab(corpus)

let ids = tok.encode("hello")
out "tokens: {ids}"               // [1, ...chars..., 2]  (BOS … EOS)
out "vocab size: {tok.vocab_size}"
out "decoded: {tok.decode(ids)}"  // "hello"

IDs 0–3 are reserved: [PAD], [BOS], [EOS], [UNK]. encode wraps text with BOS/EOS automatically.
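To make the reserved-ID scheme concrete, here is a minimal Python sketch of a character tokenizer in the same spirit. It is not the serez-agentai implementation: the method names (build_vocab, encode, decode), assigning IDs to characters in sorted order, and mapping unseen characters to [UNK] are assumptions chosen for illustration.

```python
# Hypothetical Python analogue of CharTokenizer (not the serez-agentai code).
PAD, BOS, EOS, UNK = 0, 1, 2, 3
SPECIALS = ["[PAD]", "[BOS]", "[EOS]", "[UNK]"]


class CharTokenizer:
    def __init__(self):
        self.itos = list(SPECIALS)  # id -> symbol; IDs 0-3 are reserved
        self.stoi = {s: i for i, s in enumerate(self.itos)}

    @property
    def vocab_size(self):
        return len(self.itos)

    def build_vocab(self, corpus):
        # Give each new character the next free ID, in sorted order (an assumption)
        for ch in sorted(set(corpus)):
            if ch not in self.stoi:
                self.stoi[ch] = len(self.itos)
                self.itos.append(ch)

    def encode(self, text):
        # Wrap the text with BOS/EOS; characters outside the vocabulary become UNK
        return [BOS] + [self.stoi.get(ch, UNK) for ch in text] + [EOS]

    def decode(self, ids):
        # Skip the reserved special tokens and join the remaining characters
        return "".join(self.itos[i] for i in ids if i >= len(SPECIALS))


tok = CharTokenizer()
tok.build_vocab("hello world. hello serez. hello agents.")
ids = tok.encode("hello")
print(ids)              # [1, 10, 8, 11, 11, 13, 2]
print(tok.vocab_size)   # 19
print(tok.decode(ids))  # hello
```

The corpus has 15 distinct characters, so with the 4 reserved IDs this sketch ends up with a vocabulary of 19.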

Step 3 — Build the model

GPTModel is a decoder-only transformer with causal masking, positional encoding, and a linear output head. Keep it tiny for a quick demo:

let model = new GPTModel(
    tok.vocab_size,  // vocab_size — output classes
    64,              // d_model    — embedding width
    4,               // n_heads    — attention heads
    128,             // d_ff       — feed-forward inner width
    2,               // n_layers   — stacked transformer blocks
    64               // max_seq    — longest sequence
)
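Causal masking means position t can attend only to positions 0 through t, so the model never sees the token it is trying to predict. The Python sketch below shows this for a single attention head; it illustrates the technique and is not GPTModel's code. Restricting the scores to the visible prefix is equivalent to setting the masked scores to negative infinity before the softmax. In a standard multi-head layout, the settings above would give each of the 4 heads 64 / 4 = 16 dimensions, though how GPTModel splits d_model is an assumption here.

```python
import math


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of T vectors (each a list of floats; q and k share width d).
    Position t only attends to positions 0..t.
    """
    T, d = len(q), len(q[0])
    out = []
    for t in range(T):
        # Scores against positions 0..t only; later positions are masked out
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - m) for x in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted sum of the visible value vectors
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(len(v[0]))])
    return out
```

A quick way to see the mask at work: changing the value vector at the last position leaves every earlier output unchanged, and position 0 simply returns its own value vector.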

Step 4 — Train

Language modeling means predicting the next token: the target for position i is the token at position i+1. Each epoch starts recording on the autodiff tape, runs the forward pass, computes the cross-entropy loss, backpropagates, and updates the weights:

let seq = tok.encode(corpus)

// inputs = all but last token, targets = all but first (shifted by one)
let inputs  = []
let targets = []
let i = 0
while (i < seq.length() - 1) {
    inputs.push(seq[i])
    targets.push(seq[i + 1])
    i = i + 1
}

let n = inputs.length()
let epoch = 0
while (epoch < 300) {
    Autodiff.tape()
    let logits = model.forward(inputs)
    let loss   = Autodiff.crossEntropyLoss(logits, targets, n, tok.vocab_size)
    Autodiff.backward(loss)
    model.update(0.01)
    if (epoch % 50 == 0) { out "epoch {epoch} — loss {loss.get(0)}" }
    epoch = epoch + 1
}
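The shift and the loss can be written out in a few lines of Python. This is a hypothetical sketch: it assumes the logits come back as a flat, row-major list of n * vocab_size scores (which is what passing n and tok.vocab_size to crossEntropyLoss suggests), and it computes the mean negative log-probability of each target token.

```python
import math


def shift_pairs(seq):
    # inputs drop the last token, targets drop the first: targets[i] == seq[i + 1]
    return seq[:-1], seq[1:]


def cross_entropy(logits, targets, n, vocab_size):
    """Mean next-token cross-entropy over n positions.

    Assumes logits is a flat, row-major list of n * vocab_size scores.
    """
    total = 0.0
    for i in range(n):
        row = logits[i * vocab_size:(i + 1) * vocab_size]
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[targets[i]]  # equals -log softmax(row)[target]
    return total / n


inputs, targets = shift_pairs([1, 10, 8, 11, 11, 13, 2])
# inputs  = [1, 10, 8, 11, 11, 13]
# targets = [10, 8, 11, 11, 13, 2]
uniform = [0.0] * (len(targets) * 19)
print(cross_entropy(uniform, targets, len(targets), 19))  # ln 19, about 2.944
```

A model that assigns equal probability to every token scores exactly ln(vocab_size), so the first logged loss typically starts in that neighborhood and should fall as training proceeds.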

Tip: for larger corpora, use a WarmupCosineScheduler and pass sched.step(epoch) as the learning rate to model.update().
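A warmup-cosine schedule ramps the learning rate up linearly for the first few steps, then decays it along a half cosine toward a floor. The Python sketch below is the standard formula; WarmupCosineScheduler's constructor arguments are not documented in this guide, so the parameter names (base_lr, warmup_steps, total_steps, min_lr) are assumptions.

```python
import math


def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        # Ramp from base_lr / warmup_steps up to base_lr
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr after total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In the training loop above, the returned value would take the place of the constant 0.01 passed to model.update().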

Step 5 — Generate text

generate runs the model autoregressively, feeding each predicted token back in as input, and decodes the result. The strategy argument controls randomness. Start with "greedy" (deterministic), then try sampling:

// generate(model, tokenizer, prompt, max_tokens, strategy, temperature, k, p)
let greedy_out = generate(model, tok, "hello", 20, "greedy", 1.0, 40, 0.9)
out "greedy: {greedy_out}"

let sampled = generate(model, tok, "hello", 20, "topp", 0.8, 40, 0.9)
out "top-p:  {sampled}"

Strategy	Behavior
`"greedy"`	Always picks the highest-probability token (deterministic)
`"temperature"`	Samples from the full distribution with logits divided by temperature (higher = more random)
`"topk"`	Samples from the k most likely tokens
`"topp"`	Nucleus sampling: samples from the smallest set of most likely tokens whose probabilities sum to at least p
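The four strategies differ only in which tokens stay eligible before a token is drawn. Below is a hypothetical Python sketch of that selection step plus a minimal autoregressive loop; next_logits stands in for a model forward pass, and applying temperature before top-k/top-p filtering is an assumption about how generate combines its arguments.

```python
import math
import random


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, strategy="greedy", temperature=1.0, k=40, p=0.9, rng=random):
    if strategy == "greedy":
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if strategy == "temperature":
        keep = ranked                      # every token stays eligible
    elif strategy == "topk":
        keep = ranked[:k]                  # only the k most likely tokens
    elif strategy == "topp":
        keep, cum = [], 0.0
        for i in ranked:                   # smallest prefix whose mass reaches p
            keep.append(i)
            cum += probs[i]
            if cum >= p:
                break
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # random.choices renormalizes the kept probabilities as relative weights
    return rng.choices(keep, weights=[probs[i] for i in keep], k=1)[0]


def generate(next_logits, prompt_ids, max_tokens, eos_id, **sampling):
    """Append one sampled token at a time until EOS or max_tokens."""
    ids = list(prompt_ids)
    for _ in range(max_tokens):
        nxt = sample_next(next_logits(ids), **sampling)
        if nxt == eos_id:
            break
        ids.append(nxt)
    return ids
```

With k=1, or a very small p, both sampling strategies collapse to greedy decoding, which makes a handy sanity check.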

Step 6 — Add tools

An agent can call functions. Register tools in a ToolRegistry; each tool is a name, a description, and a handler that receives the raw argument string:

let tools = new ToolRegistry()

tools.register(new Tool("weather", "Get weather for a city",
    fn(args) { return "Sunny, 22C in " + args }
))
tools.register(new Tool("echo", "Echo the input back",
    fn(args) { return args }
))

out tools.describe()
out tools.callByName("weather", "Paris")   // "Sunny, 22C in Paris"
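A registry is essentially a name-to-handler map plus a description for each entry. This Python sketch mirrors the calls above (register, describe, and callByName as call_by_name); the describe() output format and returning an error string for an unknown tool are assumptions, not documented serez-agentai behavior.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[[str], str]  # receives the raw argument string


class ToolRegistry:
    def __init__(self):
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool  # re-registering a name replaces the old tool

    def describe(self) -> str:
        # One "name: description" line per tool (format is an assumption)
        return "\n".join(f"{t.name}: {t.description}" for t in self._tools.values())

    def call_by_name(self, name: str, args: str) -> str:
        tool = self._tools.get(name)
        if tool is None:
            return f"error: unknown tool '{name}'"
        return tool.handler(args)


tools = ToolRegistry()
tools.register(Tool("weather", "Get weather for a city", lambda a: "Sunny, 22C in " + a))
tools.register(Tool("echo", "Echo the input back", lambda a: a))
print(tools.call_by_name("weather", "Paris"))  # Sunny, 22C in Paris
```

Returning an error string instead of raising lets an agent hand the failure back to the model as an ordinary observation.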

Step 7 — Add memory and assemble the agent

EpisodicMemory remembers past exchanges. The Agent ties together the model, tokenizer, tools, and memory into a perception → reasoning → action → observation loop:

let mem = new EpisodicMemory(100)

// config is a typed dict — build it with ({"key", value}) entries
let cfg <string, any> = ({"max_turns", 3}, {"max_tokens", 40}, {"strategy", "greedy"}, {"temperature", 1.0})

let agent = new Agent(model, tok, tools, mem, cfg)

let answer = agent.run("hello")
out "agent: {answer}"

// memory persists across turns; reset() clears the conversation
agent.reset()

When the model emits a [TOOL:name|arg] marker, the agent calls that tool, appends the result as an observation, and keeps going. Otherwise it returns the response and stores it in memory.
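That loop can be sketched in Python as follows. Everything here is hypothetical: llm stands in for the model plus tokenizer, tools is a plain dict of name to handler to keep the sketch self-contained, the [OBS:...] observation format is invented for illustration, and treating the argument of EpisodicMemory(100) as a capacity is an assumption.

```python
import re
from collections import deque

# Matches a [TOOL:name|arg] marker anywhere in the model's output
TOOL_RE = re.compile(r"\[TOOL:([^|\]]+)\|([^\]]*)\]")


class EpisodicMemory:
    def __init__(self, capacity):
        # Assumed bounded store: the oldest exchange is dropped when full
        self.episodes = deque(maxlen=capacity)

    def store(self, user, response):
        self.episodes.append((user, response))

    def clear(self):
        self.episodes.clear()


def run_agent(llm, tools, memory, user_input, max_turns=3):
    """llm: callable(prompt) -> str; tools: dict of name -> callable(str) -> str."""
    context = user_input
    response = ""
    for _ in range(max_turns):
        response = llm(context)
        match = TOOL_RE.search(response)
        if match is None:
            break  # plain answer: stop looping
        name, arg = match.group(1), match.group(2)
        handler = tools.get(name)
        result = handler(arg) if handler else f"error: unknown tool '{name}'"
        # Append the tool call and its result as an observation, then continue
        context += f"\n{response}\n[OBS:{result}]"
    memory.store(user_input, response)
    return response
```

If max_turns runs out while the model is still requesting tools, this sketch returns the last response as-is; capping the turns keeps a confused model from looping forever.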

Run it

Run through the dev script from serez.json:

sz run dev

Reality check: a model this small, trained on a few sentences, won't produce coherent prose. The goal here is to see the full pipeline working end to end. Scale up d_model, n_layers, and the corpus for real results.

Next steps

Use AgentDataLoader for shuffled mini-batch training.
Add KVCache to speed up generation.
See the serez-agentai reference for the full API.
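AgentDataLoader's API is not covered in this guide, so the Python sketch below only illustrates the idea behind shuffled mini-batch training: cut the token sequence into fixed-length windows with next-token targets (the same shift as Step 4), shuffle the windows, and group them into batches. The function name and its parameters are hypothetical.

```python
import random


def shuffled_batches(seq, block_size, batch_size, seed=0):
    """Yield shuffled batches of (inputs, targets) windows cut from seq."""
    pairs = []
    # Only full windows: each needs block_size inputs plus one extra target token
    for start in range(0, len(seq) - block_size, block_size):
        window = seq[start:start + block_size + 1]
        pairs.append((window[:-1], window[1:]))  # targets are inputs shifted by one
    random.Random(seed).shuffle(pairs)
    for i in range(0, len(pairs), batch_size):
        yield pairs[i:i + batch_size]
```

Whatever loader you use, block_size should stay at or below the model's max_seq.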