Live Coding Showcase
From an empty repo to a published policy brief in one session
On April 16, 2026, we ran a live coding showcase at the Kiel Institute AI Bootcamp. The goal: build a complete policy brief on China’s zero-tariff policy for African countries — from pulling data, through general equilibrium modeling, to a LaTeX document on Overleaf — using Claude Code as the primary tool.
This page documents what we did, how we did it, and what researchers at different experience levels can take away from it. Terminal logs are included throughout to show what the interaction actually looked like.
This demo uses R, but the workflow patterns (planning, subagents, spec review, multi-session state) apply identically to Python, Stata, or any language Claude Code supports. What you'll learn:

- How a real research workflow looks when driven by an AI coding agent
- The planning-first approach: design spec, then implementation plan, then code
- How to use a general equilibrium trade model from R
- How to generate publication-ready figures with ggplot2
- How to write a policy brief and compile it with LaTeX
- How to use AI-generated "expert reviewers" to critique your own work
The Setup
Starting point: A nearly empty GitHub repository with one R script from a collaborator producing a single scatter plot, plus raw BACI trade data and CEPII gravity data.
Ending point (90 minutes later):
- 4 numbered R scripts forming a complete analysis pipeline
- 9 publication-ready figures (descriptives + GE model results + map)
- A 14-page policy brief compiled on Overleaf
- 5 AI-generated expert reviews with a synthesis document
- Everything committed and pushed to GitHub
Tools used:
| Tool | Purpose |
|---|---|
| Claude Code (Opus 4.6, 1M context) | AI coding agent driving the entire workflow |
| R + data.table + ggplot2 | Data analysis and visualization |
| KITE package | General equilibrium trade model |
| LaTeX (Overleaf via Dropbox) | Policy brief typesetting |
| Git + GitHub | Version control |
| Subagent architecture | Parallel task execution and multi-perspective review |
Phase 1: Research and Planning
Pulling the latest code
We started by pulling the collaborator’s latest changes. A small hiccup: git-lfs wasn’t installed, so Claude diagnosed the error, installed the dependency, and retried:
$ claude
> pull the latest changes from paula
Claude ran git remote -v
⎿ origin https://github.com/[...]/KITE-PB-China-Africa-Tariffs.git
⚠ There's no remote called "paula". Only origin exists.
? What is the URL for Paula's repository?
> just pull from main
Claude ran git pull origin main
⎿ error: git-lfs filter-process: git-lfs: command not found
fatal: the remote end hung up unexpectedly
Claude ran brew install git-lfs
⎿ 🍺 /opt/homebrew/Cellar/git-lfs/3.7.1: 82 files, 14MB
Claude ran git lfs install && git pull origin main
⎿ Git LFS initialized.
Merge made by the 'ort' strategy.
19 files changed, 6190 insertions(+)
The takeaway: you don't need to know the fix yourself, but you do need to read the error and approve the action.
Parallel research
We asked Claude to build a policy brief on China’s zero-tariff policy for African countries. Before writing any code, it dispatched two subagents in parallel:
> I need to write a policy brief on the potential impact of China
dropping its tariffs on imports from African countries.
◐ Agent "Explore KITE-PB repos and styleguide" is running...
◐ Agent "Research China-Africa tariff news" is running...
One agent explored the codebase (existing policy brief projects for India, Hormuz, Ukraine), the other searched the web. This is a key pattern: use subagents for independent research tasks so you don’t wait sequentially.
The research agent came back with a comprehensive timeline:
- September 2024: FOCAC summit — Xi announces zero tariffs for 33 African LDCs
- December 2024: LDC zero-tariff measure takes effect
- February 2026: Extended to all 53 African countries (AU Summit)
- May 1, 2026: Full implementation (two weeks away at time of showcase)
- Context: US “Liberation Day” tariffs (10–50% on Africa), AGOA expired September 2025
The codebase exploration agent found the established pattern across existing projects:
- Consistent pipeline: `scenarios.R` → `run.R` → `outputs.R`
- Model invocation via `update_equilibrium(model = caliendo_parro_2015, ...)`
- Tariff convention: `1 + ad_valorem_rate` (so zero tariff = `1.0`, not `0`)
- A detailed project styleguide defining coding conventions
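Based on the pattern the agent extracted, a minimal model run might look like the sketch below. Only `update_equilibrium(model = caliendo_parro_2015, ...)` is confirmed from the existing projects; the other argument names and file paths are illustrative assumptions.

```r
# Sketch of the run.R step, assuming the KITE package API.
library(KITE)

initial_conditions <- readRDS("input/initial_conditions_GTAP11_241129.rds")
scenario           <- readRDS("temp/scenarios/S1_china_zero.rds")

# Confirmed invocation pattern; argument names beyond `model` are assumptions
results <- update_equilibrium(
  model              = caliendo_parro_2015,
  initial_conditions = initial_conditions,
  scenario           = scenario
)

saveRDS(results, "temp/simulations/S1_china_zero.rds")
```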
Interactive design decisions
Rather than making assumptions, Claude asked structured questions:
? Which KITE model should we use for the GE analysis?
● caliendo_parro_2015 (Recommended)
○ hinz_mahlkow_sogalla_2026
? What scenarios should the policy brief compare?
● China-only (Recommended) — S1: China removes tariffs. S2: Same + US tariffs.
○ Broader trade war
○ Phased rollout
? What descriptive figures should we produce?
● Comprehensive set (Recommended) — 4 charts + scatter plot
○ Minimal
○ Comparative focus
? What time horizons should we model?
○ Short-run + Long-run
● Long-run only
○ Three horizons
Claude Code can present multiple-choice questions with trade-offs explained. This is more efficient than open-ended prompting. You make the research judgment calls; the tool handles the implementation.
The design spec
Claude wrote a formal design spec covering the pipeline structure, scenario design, figure descriptions, data requirements, and coding conventions. Then it dispatched a spec reviewer subagent that caught critical issues:
◐ Agent "Review spec document" is running...
Spec reviewer found 3 critical issues:
1. TARIFF VALUE SEMANTICS: The spec says value = 0 for zero tariff,
but GTAP stores 1 + ad_valorem_rate. Zero tariff = 1.0.
Setting to 0 would mean a -100% tariff (a subsidy!).
2. MISSING copy() PATTERN: In R's data.table, modifying by reference
can corrupt shared objects. Every scenario must use
copy(initial_conditions$tariff).
3. GTAP REGIONAL AGGREGATION: Many small African countries are
aggregated into composite regions (XNF, XWF, etc.). The scenario
code needs GTAP region codes, not raw ISO3 codes.
Verdict: REVISE BEFORE IMPLEMENTATION.
These are exactly the kinds of bugs that would cause silent wrong results if caught later (or not at all).
The implementation plan
After the spec was approved, Claude wrote a detailed implementation plan with 5 tasks, each broken into numbered steps with complete code. A plan reviewer then checked:
- Does `XCF` (Rest of Central Africa) actually exist in GTAP11? → No. Removed.
- Should `XSC` (Rest of SACU, includes Eswatini) be included? → Yes, with a footnote.
- Does `process_results()` exist in the KITE package? → Yes. The raw output needs processing before welfare data is available.
Phase 2: Implementation
We chose subagent-driven development: Claude dispatches a fresh subagent for each task, reviews results between tasks, and course-corrects.
Task 0: Repository setup
A fast subagent created the project infrastructure — .gitignore, Makefile, directory structure — and copied the GTAP11 initial conditions from an existing project:
◐ Agent "Implement Task 0: repo setup" is running...
✓ Created .gitignore (ignoring /input, /output, /temp)
✓ Created Makefile with targets: descriptives, scenarios, run, outputs, clean
✓ Created directories: input/, temp/scenarios/, temp/simulations/, output/figures/
✓ Copied initial_conditions_GTAP11_241129.rds to input/
✓ Removed old script (incorporated into 01-descriptives.R)
✓ Committed
Task 1: Descriptive figures (01-descriptives.R)
A subagent wrote the full script and ran it. Five figures produced:
- Top 15 African exporters to China — South Africa ($29B), Angola ($19B), DRC ($14B) dominate
- Sectoral composition — 60.7% mineral products, 16.4% base metals, 14.0% precious metals
- Trade trend 2022–2024 — stable around $103–110 billion
- China vs. US as destination — China dwarfs the US for nearly every African exporter
- GDP vs. imports scatter — refactored from the collaborator’s original script
◐ Agent "Implement Task 1: descriptives" is running...
Claude ran Rscript code/01-descriptives.R
⎿ Reading BACI 2022...
Reading BACI 2023...
Reading BACI 2024...
Reading Gravity...
✓ 5 PNG files + 5 PDF files generated in output/figures/
✓ Committed as "Add descriptive figures script (BACI HS22)"
After each figure was generated, we visually inspected it by reading the PNG directly — Claude Code can display images inline. When the China-vs-US chart had a cut-off title, we widened it from 16cm to 20cm and re-rendered.
Key patterns worth noting:
- Packages loaded via `pacman::p_load()` (auto-installs if missing)
- Data manipulation uses `data.table` with `magrittr` pipes (e.g., `dt[filter] %>% .[, .(x = sum(y)), by = group]`)
- Figures saved in both PNG (for web) and PDF (for LaTeX), with `rm()` cleanup after each save
- Custom theme `theme_kiel()` ensures all figures match the Kiel Institute background color (#F5F1E7)
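A condensed sketch of this figure pattern is below. The `baci` table (with `importer`, `exporter`, `value` columns), the file names, and the figure dimensions are illustrative assumptions; only the package choices and the PNG+PDF+`rm()` convention come from the actual script.

```r
# Load packages, auto-installing any that are missing
pacman::p_load(data.table, magrittr, ggplot2)

# Aggregate exports to China with a data.table + magrittr pipe
# (assumes a `baci` data.table with importer/exporter/value columns)
top_exporters <- baci[importer == "CHN"] %>%
  .[, .(exports = sum(value)), by = exporter] %>%
  .[order(-exports)][1:15]

p <- ggplot(top_exporters, aes(x = reorder(exporter, exports), y = exports)) +
  geom_col() +
  coord_flip() +
  theme_kiel()  # project theme wrapping theme_minimal() with Kiel beige (#F5F1E7)

# Save both formats: PNG for web, PDF for LaTeX; clean up after each save
ggsave("output/figures/top_exporters.png", p, width = 20, height = 12, units = "cm")
ggsave("output/figures/top_exporters.pdf", p, width = 20, height = 12, units = "cm")
rm(p)
```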
Task 2: Scenario definition (02-scenarios.R)
The scenario script defines 42 African GTAP regions (37 individual countries + 5 regional aggregates) and constructs two scenarios — baseline and tariff removal:
◐ Agent "Implement Task 2: scenarios" is running...
Claude ran Rscript code/02-scenarios.R
⎿ African GTAP regions: 42
Tariff rows (Africa -> China): 2,730
Non-zero tariff rows: 339
Tariff lines set to zero: 339
Scenarios saved: baseline, S1_china_zero
The copy() trap
In R’s data.table, assignment by reference means a <- b does NOT create an independent copy. If you then modify a, you also modify b. Every scenario must use copy(initial_conditions$tariff) to avoid corrupting the shared baseline. The spec reviewer caught this before any code was written.
The tariff convention is also a trap: GTAP stores tariffs as 1 + ad_valorem_rate. So zero tariff = 1.0, not 0. Setting to 0 would model a -100% tariff — effectively a subsidy. Again, the spec reviewer caught this.
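Putting both traps together, the scenario construction looks roughly like this. The column names (`origin`, `destination`, `value`) and the region vector are assumptions about the KITE tariff table; the `copy()` call and the `1.0` convention are the confirmed fixes.

```r
library(data.table)

# WRONG: `tariff <- initial_conditions$tariff` is assignment by reference;
# modifying `tariff` afterwards would silently corrupt the shared baseline.

# RIGHT: take an independent copy first
tariff <- copy(initial_conditions$tariff)

# Zero tariff means value = 1.0, because GTAP stores 1 + ad_valorem_rate.
# Setting value = 0 would model a -100% tariff, i.e. a subsidy.
tariff[origin %in% african_gtap_regions & destination == "chn",
       value := 1.0]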
Task 3: KITE model execution (03-run.R)
The model runs both scenarios through the Caliendo-Parro (2015) general equilibrium framework:
◐ Agent "Implement Task 3: KITE model run" is running...
Claude ran Rscript code/03-run.R
⎿
========================================
Running scenario: baseline
========================================
Converged in 1 iteration.
Saved: temp/simulations/260416_china_africa_baseline.rds
========================================
Running scenario: S1_china_zero
========================================
Iteration 1: criterion = 0.0423
Iteration 10: criterion = 0.00147
Iteration 20: criterion = 0.000089
Iteration 45: criterion = 0.0000018
Converged in 45 iterations.
Saved: temp/simulations/260416_china_africa_S1_china_zero.rds
All scenarios complete.
After the model ran, we inspected the output structure to discover that process_results() is required before welfare data becomes available, and that welfare values are ratios (~1.0), not percentages. Percent change = (value - 1) * 100.
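In code, the post-processing step looks roughly like this. That `process_results()` must run first and that welfare comes back as a ratio are from the session; the structure of the processed object (`res$welfare` with a `value` column) is an assumption.

```r
library(KITE)

sim <- readRDS("temp/simulations/260416_china_africa_S1_china_zero.rds")

# Raw simulation output does not expose welfare directly;
# process_results() must run first.
res <- process_results(sim)

# Welfare values are ratios around 1.0, not percentages
welfare <- res$welfare
welfare[, change_pct := (value - 1) * 100]
```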
Task 4: GE output figures (04-outputs.R)
Three figures plus a welfare map:
◐ Agent "Implement Task 4: GE outputs" is running...
Claude ran Rscript code/04-outputs.R
⎿
=== Headline Results ===
African countries - mean welfare change: 0.0915 %
African countries - median welfare change: 0.0159 %
China - welfare change: -0.0081 %
Top 5 African beneficiaries:
COD +1.8225%
CIV +0.3448%
COG +0.3047%
NAM +0.2575%
BEN +0.2254%
Tables saved to output/tables/
✓ 4 PNG + 4 PDF figures generated
✓ 2 CSV welfare tables generated
✓ Committed
The map was added on the fly, after the initial plan, following the pattern from an existing policy brief project:
- Uses the `sf` package with a Robinson projection
- Choropleth color scale: green (gains) to red (losses), capped at +/-2%
- DRC stands out clearly as the largest beneficiary
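A sketch of the map construction under these conventions. The shapefile source (`rnaturalearth`), the join columns, and the exact color choices are assumptions; the `sf` package, Robinson projection, and the +/-2% cap are from the actual figure.

```r
pacman::p_load(sf, ggplot2, rnaturalearth)

world <- ne_countries(scale = "medium", returnclass = "sf")

# Join welfare percent changes onto country geometries by ISO3
# (assumes a `welfare` table with `country` and `change_pct` columns)
map_data <- merge(world, welfare,
                  by.x = "iso_a3", by.y = "country", all.x = TRUE)

ggplot(map_data) +
  # Cap the fill at +/-2% so the DRC outlier doesn't wash out the scale
  geom_sf(aes(fill = pmin(pmax(change_pct, -2), 2))) +
  scale_fill_gradient2(low = "red", mid = "white", high = "darkgreen",
                       limits = c(-2, 2), name = "Welfare change (%)") +
  coord_sf(crs = "+proj=robin")  # Robinson projection
```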
Key results
| Country | Welfare change | Mechanism |
|---|---|---|
| DRC | +1.82% | Cobalt/copper tariffs removed; mining-concentrated economy |
| Cote d’Ivoire | +0.34% | Agricultural products gain access |
| Congo | +0.30% | Oil and minerals |
| Namibia | +0.26% | Diversified minerals |
| South Africa | +0.19% | Broad export basket |
| Malawi | -0.03% | Trade diversion (was already duty-free) |
| China | -0.008% | Negligible tariff revenue loss |
Average African welfare gain: 0.09%. Modest — because 94.5% of exports were already duty-free.
Phase 3: The Policy Brief
Writing on Overleaf
The Overleaf project syncs via Dropbox. Claude created the directory, copied all PDF figures, and wrote a complete 14-page LaTeX document following the template from an existing Kiel Institute policy brief:
Claude ran mkdir -p .../Overleaf/PB China Africa Zero Tariffs/figures
Claude ran cp output/figures/260416_*.pdf .../Overleaf/.../figures/
✓ Created policy_brief.tex (246 lines)
✓ Created policy_brief.bib (3 references)
✓ Copied 9 PDF figures
Claude ran pdflatex policy_brief.tex && bibtex policy_brief && pdflatex ...
⎿ Output written on policy_brief.pdf (14 pages, 737507 bytes).
The brief has 7 sections: Overview, Introduction (FOCAC, AGOA, geopolitics), Descriptive Analysis (4 figures), KITE Simulations (welfare + trade creation), Discussion (opportunities, limits, risks), Policy Implications, Conclusion.
The background color bug
When we compiled the PDF, the figure backgrounds didn’t match the Kiel beige page color. The issue: each ggplot explicitly called theme_minimal(), which overrode the global theme settings. We verified programmatically:
Claude ran python3 -c "from PIL import Image; ..."
⎿ 260416_top_exporters.png: RGB(255, 255, 255) - WRONG
260416_sectoral_composition.png: RGB(255, 255, 255) - WRONG
260416_welfare_map.png: RGB(245, 241, 231) - OK
...
Fix: defined theme_kiel() wrapping theme_minimal() with Kiel bg,
replaced all theme_minimal() calls, regenerated all figures.
Claude ran python3 -c "from PIL import Image; ..."
⎿ 260416_top_exporters.png: RGB(245, 241, 231) - OK
260416_sectoral_composition.png: RGB(245, 241, 231) - OK
260416_trade_creation.png: RGB(245, 241, 231) - OK
...
All 9/9 figures verified ✓
The figures looked right in an earlier render, but a pixel-level check revealed white backgrounds. When appearance matters for publication, automated verification catches what the eye misses.
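The same pixel-level check can be done in R instead of Python, for example with the `png` package (a sketch; the file path is illustrative, and checking the top-left pixel assumes the plot margin reaches the image corner):

```r
pacman::p_load(png)

# Read the rendered figure and inspect the top-left background pixel.
# Kiel beige is #F5F1E7 = RGB(245, 241, 231); pure white means a
# theme_minimal() call overrode the intended background.
img <- readPNG("output/figures/260416_top_exporters.png")
rgb_vals <- round(img[1, 1, 1:3] * 255)

if (all(rgb_vals == c(245, 241, 231))) {
  message("OK: background matches Kiel beige")
} else {
  message("WRONG: background is RGB(", paste(rgb_vals, collapse = ", "), ")")
}
```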
Phase 4: AI-Assisted Peer Review
We used Claude Code’s multi-agent architecture to generate 5 reviewer personas with different backgrounds:
| Reviewer | Affiliation | Lens |
|---|---|---|
| Prof. Karanja | Nairobi / AERC | African trade realities on the ground |
| Prof. Nicoletti | Bocconi | GE modeling rigor and scenario design |
| Dr. Voss | BMWK Berlin | German/EU policy relevance |
| Dr. Okonkwo | Brookings Africa | Actionable policy recommendations |
| Tariq El-Mansouri | LSE MSc | Can a non-specialist follow this? |
All 5 ran as parallel subagents. Each read the full LaTeX source and wrote a structured review. The synthesis identified consensus issues (flagged by 3+ reviewers):
- Trade creation percentages need absolute dollar values alongside them
- The AfCFTA interaction is completely absent
- Rules of Origin aren’t discussed
- GTAP sector codes (`wol`, `pfb`, `sgr`) are unintelligible to policymakers
- The DRC's 1.82% result needs sensitivity analysis
And perspective-specific insights that only one reviewer type would catch:
- Nicoletti (methods): trade creation may be computed on trade shares, not trade flows — potentially the wrong metric entirely
- Voss (BMWK): the entire EU dimension is missing — the Kiel Institute’s main audience gets no policy recommendations
- Karanja (Africa): Africa is presented as a passive recipient, not a strategic actor
What We Learned
For researchers new to AI coding tools
- You don’t need to know how to code to direct a coding agent. You need to know what you want, what the data looks like, and how to evaluate output.
- Visual inspection matters. We caught a cut-off title, wrong background colors, and a dubious figure by looking at the output. Don’t trust — verify.
- The planning step is not overhead. The spec and plan caught 3 critical bugs (tariff convention, `copy()` trap, wrong GTAP regions) before any code was written.
For researchers who already code
- Subagents parallelize independent work. Research + codebase exploration ran simultaneously. Five reviewers ran simultaneously. Don’t serialize what can be parallelized.
- Pattern reuse across projects is powerful. Claude explored 3 existing projects and extracted the common pattern (scenarios → run → outputs). Your old projects are templates for new ones.
- The AI makes mistakes that matter. Setting tariffs to `0` instead of `1.0` would have been silently wrong. The `copy()` issue would have corrupted the baseline. Code review, even by another AI agent, catches these.
For everyone
- The ratio of thinking to typing has shifted. Most of our time was spent deciding what to analyze, which scenarios to run, and whether the results made sense. Almost none was spent writing code.
- A policy brief is not just code output. The writing, framing, and editorial judgment are still human. Claude wrote the LaTeX, but the argument structure, the geopolitical framing, and the decision to lead with “this is mostly symbolic” came from research judgment.
- AI-generated peer review is surprisingly useful. Five diverse personas found issues that a single reviewer might miss. The BMWK adviser’s critique (“where is the EU?”) and the methodologist’s concern about trade shares vs. flows were genuinely valuable feedback.
The Full Pipeline
1. Pull collaborator's code + data
2. Research: news + existing repo patterns (parallel)
3. Design spec → spec review → fix
4. Implementation plan → plan review → fix
5. Task 0: Repo setup (.gitignore, Makefile, data)
6. Task 1: Descriptive figures (BACI HS22) → 5 figs
7. Task 2: Scenario definition (GTAP11) → RDS
8. Task 3: KITE model run (Caliendo-Parro 2015) → RDS
9. Task 4: GE output figures → 4 figs
10. Write policy brief (LaTeX on Overleaf) → PDF
11. Fix figure backgrounds (theme_kiel)
12. Expert review (5 parallel personas) → synthesis
13. Commit and push
See Also
- Writing a CLAUDE.md — the most important file in any Claude Code project
- Advanced Track — try building your own project in 40 minutes
- Context Windows — why sessions should be short and state should live in files
- claudeblattman.com — a non-coder’s complete AI workflow system