Skip to content

examples/ — Runnable hands-on exercises

← Back to main path README

Every stage in the learning roadmap has a "Hands-on Exercises" section that tells you what to do. This folder adds the actual runnable starter code — copy → install deps → python starter.py → see expected output.

Directory layout

examples/
├── stage-3/                     # Tool Use & Agent intro
│   ├── 03-react-from-scratch/   # Exercise 3: ReAct from scratch
│   │   ├── starter.py           # Main program (~70 LOC runnable)
│   │   ├── test.py              # Self-check (pure assert, no pytest)
│   │   ├── README.md            # 200-400-word walkthrough (+.zh-Hans.md +.en.md)
│   │   └── requirements.txt     # Pinned deps
│   └── ...
├── stage-1/
└── ...

Short exercises (≤30 LOC) stay inline as <details> blocks in the stage doc — no folder. Longer ones (>30 LOC) get their own folder so stage docs don't get bloated by code blocks.

How to run any example

bash
cd examples/stage-3/03-react-from-scratch
pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-ant-...   # Each example header lists the key it needs
python starter.py                     # Hits the real API to see output (~$0.001 in credits)
python test.py                        # Runs validation (mock-based, free)

Design rules

DimensionRule
Program lengthstarter ≤80 LOC, split if longer
Dependenciesstdlib + ≤2 pip packages, pinned versions
TestsPlain assert, no pytest; reader runs python test.py to see ✅
CommentsChinese (zh-TW primary), English variable / function names
Self-checkEvery starter.py ends with a # === Self-check === block
Environment varsHeader comment must list required keys
Free-tier friendlyUse the cheapest model (claude-haiku / Ollama); note how to switch to Sonnet
Windows encodingEvery .py must reconfigure stdout to UTF-8 (see below)

Windows cp950 encoding fix (mandatory in every starter.py / test.py)

Windows consoles default to cp950 (Big5) and can't print emoji or non-Big5 Chinese. Add this right after imports in every .py:

python
import sys
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

Without it, Windows readers running in PowerShell / cmd hit UnicodeEncodeError: 'cp950' codec can't encode character '✅'.

Three paths — default is Ollama (cost-driven)

💰 Why default to Ollama? Running 1000 practice iterations on Sonnet costs ~$4; on haiku ~$0.25; on local Ollama $0. API cost should not block learning. Reserve cloud LLMs for "want to see high-quality answers / production deployment".

Every exercise ships with all three paths:

  • Default starter.py / first inline <details> block uses a local model
  • Requires Ollama; pull a model based on the stage:
    • Stage 1 + 2 (plain chat / prompt eng): ollama pull gemma4:e4b (~7.5 GB; multimodal (text + image + audio); CPU-friendly)
    • Stage 3+ (tool use / agent): ollama pull qwen2.5:3b (1.9 GB; reliable tool-use support)
  • $0, offline, fine for privacy-sensitive data
  • SDK uses the openai package (OpenAI-compatible API) with base_url="http://localhost:11434/v1"
  • Best for: all readers (this is the default recommendation)

Path B (optional) — Anthropic API (when you want cloud quality)

  • Companion starter_anthropic.py (folder) or the second inline <details> block
  • Requires ANTHROPIC_API_KEY; ~$0.001 per run (haiku) / ~$0.004 (sonnet)
  • Higher answer quality and lower latency than local 3-4B Ollama models
  • Best for: production-quality demands, long-context work, the Stage 7 production tier

Path C (verify logic, no API call)

  • Every test.py uses unittest.mock; python test.py validates code logic without spending
  • Complements A / B — mock first, then real call

Trade-offs

DimensionA Ollama (default)B AnthropicC Mock
Cost per call$0~$0.001-0.004$0
RequiresOllama installAPI keynothing
Answer qualitymedium (3-4B model)highcanned, unrepresentative
Speed5-30 s/call (no GPU)~1-3 s/call<0.1 s
Offline
Privacy-sensitive data
Stage 3+ tool use✅ (qwen2.5 / llama3.2)
Best fordefault, no budget pressureproduction upgradelogic verification

Recommended flow: C first (validate logic, no cost), then A (see real model behaviour locally), then B at the Stage 7 production stage if cloud quality is needed.

Local + cloud, user-perspective.
💡 You don't need to install every model — this table shows "which to use for practice" and "which to upgrade to for production". Claude is the canonical / production reference; Ollama is the practice default.

Local LLMs (practice default, via Ollama)

ModelDownloadRecommended RAMStageTool-useSpeed (CPU/GPU)Primary use
gemma4:e4b7.5 GB8 GB1+2basicslow / medStage 1-2 plain chat / prompt eng (default)
qwen2.5:3b1.9 GB4 GB3+reliablemed / fastStage 3+ tool use / agent (default)
llama3.2:3b2.0 GB4 GB3+reliablemed / fastqwen2.5:3b alternative
mistral-nemo:12b7.1 GB16 GB3+strongslow / medWhen you want closer-to-cloud quality
qwen2.5:14b9.0 GB16 GBadvancedstrongslow / medLarger-model comparison (GPU preferred)
gemma4:e2b4.0 GB4 GB1+2basicmed / fast4 GB-RAM-machine alternative

Install: ollama pull <model> + ollama serve. Hardware tuning details: resources/cli-agents-guide.en.md.

Cloud LLMs (canonical / production stack, via Anthropic)

Model$/1M input$/1M outputContextPrimary use
claude-haiku-4-5$1$5200kCheapest; fine for Stage 1-7 cloud-quality comparisons
claude-sonnet-4-6$3$151MProduction default; Stage 5+ agent development
claude-opus-4-7$5$251MHighest quality; complex reasoning / long-context refactors

Subscription alternative: Claude Pro $20/month (includes Sonnet usage); Claude Max $100/month (includes Opus). Details: resources/cli-agents-guide.en.md.

Cloud LLM Chinese / open-source alternatives (region limits / budget / Chinese-language scenarios)

Can't or don't want to use Anthropic? These APIs are all OpenAI-compatible — change base_url and model name to run the same exercises.

ProviderMain model$/1M input$/1M outputOpenAI-compat?Key selling point
DeepSeekdeepseek-chat (V3)$0.27$1.10Cheapest cloud (4× cheaper than haiku $1/$5); strong CN & EN; free web at chat.deepseek.com
DeepSeek R1deepseek-reasoner$0.55$2.19Reasoning model (o1-class), still 1/30 the price of OpenAI o1
Moonshot Kimikimi-k2-turbo-preview$5-10$15-301M-token context (key selling point); good for large files / long conversations. Free web at kimi.com
Qwen (Alibaba)qwen-max / qwen-turbo$0.50-1.50$1.50-6✅ (DashScope)Native Chinese; same models also run locally via Ollama (cloud + local both work)
GLM (ZhipuAI)glm-4.5 / glm-4-plus$0.30-2$1.50-9China-native, has free tier. Free web chatglm.cn
NVIDIA NIMLlama / Mistral / DeepSeek / Qwen etc. hostedfree tier 1000 credits(same)Hosts 10+ open models; new accounts get credits; no local GPU needed. build.nvidia.com

API endpoints (OpenAI SDK usage):

python
# DeepSeek
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com/v1")
r = client.chat.completions.create(model="deepseek-chat", messages=[...])

# Moonshot Kimi (China endpoint; international uses .ai)
client = OpenAI(api_key=os.environ["MOONSHOT_API_KEY"], base_url="https://api.moonshot.cn/v1")
r = client.chat.completions.create(model="kimi-k2-turbo-preview", messages=[...])

# Qwen (Alibaba DashScope)
client = OpenAI(api_key=os.environ["DASHSCOPE_API_KEY"],
                base_url="https://dashscope.aliyuncs.com/compatible-mode/v1")
r = client.chat.completions.create(model="qwen-turbo", messages=[...])

# GLM (ZhipuAI)
client = OpenAI(api_key=os.environ["ZHIPUAI_API_KEY"], base_url="https://open.bigmodel.cn/api/paas/v4")
r = client.chat.completions.create(model="glm-4.5-flash", messages=[...])

# NVIDIA NIM (hosted open-source)
client = OpenAI(api_key=os.environ["NVIDIA_API_KEY"], base_url="https://integrate.api.nvidia.com/v1")
r = client.chat.completions.create(model="meta/llama-3.3-70b-instruct", messages=[...])

How to pick:

ScenarioPickWhy
Mainland China, no cloud accessOllama local / DeepSeek APILocal is free; DeepSeek has an in-China endpoint
Tight budget (< $1/month)DeepSeek API4× cheaper than haiku; quality close
Large files / long-doc RAGMoonshot Kimi1M-token context
Chinese-native task (classical Chinese, CN search)Qwen / GLMHigher Chinese training corpus ratio
Want to try 10+ open models without GPUNVIDIA NIMOne key, play with Llama / Mixtral / Qwen / DeepSeek
Production agent (tool use)Anthropic Claude (canonical)This repo's Path B default; tool calling most reliable

Budget estimate (completing all 54 exercises across Stage 1-7)

Learning pathTotal timeTotal costBest for
All local Ollama~30 hr (CPU) / ~10 hr (GPU)$0Budget-conscious, privacy needs, China-mainland no-cloud-access
Mixed: local practice + haiku final review~30 hr$2-5Recommended default — practice locally, run final 1-2 iterations on haiku to see cloud quality
All haiku~10 hr$5-15Want speed, budget allows, want full cloud experience
All sonnet~8 hr$20-50Production-grade practice, want high-quality answers
Mixed: sonnet + opus on hard problems~8 hr$30-80Already a production agent developer

🎯 Beginner default: run everything locally first; cap budget at $5. Only consider upgrading to sonnet at the Stage 7 production tier.

Index by stage

StageExercisesExample location
1 LLM basics6inline 4 + folder 2 (examples/stage-1/)
2 Prompt engineering4all inline
3 Tool use6inline 1 + folder 5 (examples/stage-3/)
4 Frameworks5all folder (examples/stage-4/)
5 Claude Code ecosystem11inline 6 + folder 5 (examples/stage-5/)
6 Memory/RAG5all folder (examples/stage-6/)
7 Multi-agent5inline 1 + folder 4 (examples/stage-7/)
Track A1-A312all inline + 2 small folders (CLI-9 / CLI-10)

→ T1 scope: Stage 3 全 6 exercises only (remaining stages roll out per plan tiers).

Contributing / reporting issues

If something doesn't run, output doesn't match expectations, or you want to add a new example:

  • File an issue tagged examples
  • Or open a PR following the "Design rules" table above

Why this split (instead of stuffing everything into stage docs)

  1. Stage docs stay readable — roadmap readers don't always want code, they want concepts; long code blocks break that
  2. Examples evolve independently — SDK bumps, model rename, example needs its own commit without polluting the roadmap's git log
  3. Readers can clone one examplesvn export or git clone --filter=tree:0 grabs a single folder
  4. Future CI — example failures shouldn't block mdbook deploy; this split lets CI run examples conditionally