Live Coding Showcase

From an empty repo to a published policy brief in one session

On April 16, 2026, we ran a live coding showcase at the Kiel Institute AI Bootcamp. The goal: build a complete policy brief on China’s zero-tariff policy for African countries — from pulling data, through general equilibrium modeling, to a LaTeX document on Overleaf — using Claude Code as the primary tool.

This page documents what we did, how we did it, and what researchers at different experience levels can take away from it. Terminal logs are included throughout to show what the interaction actually looked like.

Note: What you’ll learn from reading this (~20 min read)

This demo uses R, but the workflow patterns — planning, subagents, spec review, multi-session state — apply identically to Python, Stata, or any language Claude Code supports.

  • How a real research workflow looks when driven by an AI coding agent
  • The planning-first approach: design spec, then implementation plan, then code
  • How to use a general equilibrium trade model from R
  • How to generate publication-ready figures with ggplot2
  • How to write a policy brief and compile it with LaTeX
  • How to use AI-generated “expert reviewers” to critique your own work


The Setup

Starting point: A nearly empty GitHub repository with one R script from a collaborator producing a single scatter plot, plus raw BACI trade data and CEPII gravity data.

Ending point (90 minutes later):

  • 4 numbered R scripts forming a complete analysis pipeline
  • 9 publication-ready figures (descriptives + GE model results + map)
  • A 14-page policy brief compiled on Overleaf
  • 5 AI-generated expert reviews with a synthesis document
  • Everything committed and pushed to GitHub

Tools used:

Tool Purpose
Claude Code (Opus 4.6, 1M context) AI coding agent driving the entire workflow
R + data.table + ggplot2 Data analysis and visualization
KITE package General equilibrium trade model
LaTeX (Overleaf via Dropbox) Policy brief typesetting
Git + GitHub Version control
Subagent architecture Parallel task execution and multi-perspective review

Phase 1: Research and Planning

Pulling the latest code

We started by pulling the collaborator’s latest changes. A small hiccup: git-lfs wasn’t installed on the machine:

$ claude
> pull the latest changes from paula

  Claude ran  git remote -v
  ⎿  origin  https://github.com/[...]/KITE-PB-China-Africa-Tariffs.git

  ⚠ There's no remote called "paula". Only origin exists.
  ? What is the URL for Paula's repository?
    > just pull from main

  Claude ran  git pull origin main
  ⎿  error: git-lfs filter-process: git-lfs: command not found
     fatal: the remote end hung up unexpectedly

  Claude ran  brew install git-lfs
  ⎿  🍺 /opt/homebrew/Cellar/git-lfs/3.7.1: 82 files, 14MB

  Claude ran  git lfs install && git pull origin main
  ⎿  Git LFS initialized.
     Merge made by the 'ort' strategy.
     19 files changed, 6190 insertions(+)
Tip: Claude handles tooling issues

When git pull failed because of missing git-lfs, Claude diagnosed the error, installed the dependency, and retried — all in one flow. You don’t need to know the fix — but you do need to read the error and approve the action.

Parallel research

We asked Claude to build a policy brief on China’s zero-tariff policy for African countries. Before writing any code, it dispatched two subagents in parallel:

> I need to write a policy brief on the potential impact of China
  dropping its tariffs on imports from African countries.

  ◐ Agent "Explore KITE-PB repos and styleguide" is running...
  ◐ Agent "Research China-Africa tariff news" is running...

One agent explored the codebase (existing policy brief projects for India, Hormuz, Ukraine), the other searched the web. This is a key pattern: use subagents for independent research tasks so you don’t wait sequentially.

The research agent came back with a comprehensive timeline:

  • September 2024: FOCAC summit — Xi announces zero tariffs for 33 African LDCs
  • December 2024: LDC zero-tariff measure takes effect
  • February 2026: Extended to all 53 African countries (AU Summit)
  • May 1, 2026: Full implementation (two weeks away at time of showcase)
  • Context: US “Liberation Day” tariffs (10–50% on Africa), AGOA expired September 2025

The codebase exploration agent found the established pattern across existing projects:

  • Consistent pipeline: scenarios.R → run.R → outputs.R
  • Model invocation via update_equilibrium(model = caliendo_parro_2015, ...)
  • Tariff convention: 1 + ad_valorem_rate (so zero tariff = 1.0, not 0)
  • A detailed project styleguide defining coding conventions

Interactive design decisions

Rather than making assumptions, Claude asked structured questions:

  ? Which KITE model should we use for the GE analysis?
    ● caliendo_parro_2015 (Recommended)
    ○ hinz_mahlkow_sogalla_2026

  ? What scenarios should the policy brief compare?
    ● China-only (Recommended) — S1: China removes tariffs. S2: Same + US tariffs.
    ○ Broader trade war
    ○ Phased rollout

  ? What descriptive figures should we produce?
    ● Comprehensive set (Recommended) — 4 charts + scatter plot
    ○ Minimal
    ○ Comparative focus

  ? What time horizons should we model?
    ○ Short-run + Long-run
    ● Long-run only
    ○ Three horizons
Tip: Let the tool ask you questions

Claude Code can present multiple-choice questions with trade-offs explained. This is more efficient than open-ended prompting. You make the research judgment calls; the tool handles the implementation.

The design spec

Claude wrote a formal design spec covering the pipeline structure, scenario design, figure descriptions, data requirements, and coding conventions. Then it dispatched a spec reviewer subagent that caught critical issues:

  ◐ Agent "Review spec document" is running...

  Spec reviewer found 3 critical issues:

  1. TARIFF VALUE SEMANTICS: The spec says value = 0 for zero tariff,
     but GTAP stores 1 + ad_valorem_rate. Zero tariff = 1.0.
     Setting to 0 would mean a -100% tariff (a subsidy!).

  2. MISSING copy() PATTERN: In R's data.table, modifying by reference
     can corrupt shared objects. Every scenario must use
     copy(initial_conditions$tariff).

  3. GTAP REGIONAL AGGREGATION: Many small African countries are
     aggregated into composite regions (XNF, XWF, etc.). The scenario
     code needs GTAP region codes, not raw ISO3 codes.

  Verdict: REVISE BEFORE IMPLEMENTATION.

These are exactly the kinds of bugs that would cause silent wrong results if caught later (or not at all).

The implementation plan

After the spec was approved, Claude wrote a detailed implementation plan with 5 tasks, each broken into numbered steps with complete code. A plan reviewer then checked:

  • Does XCF (Rest of Central Africa) actually exist in GTAP11? → No. Removed.
  • Should XSC (Rest of SACU, includes Eswatini) be included? → Yes, with a footnote.
  • Does process_results() exist in the KITE package? → Yes. The raw output needs processing before welfare data is available.

Phase 2: Implementation

We chose subagent-driven development: Claude dispatches a fresh subagent for each task, reviews results between tasks, and course-corrects.

Task 0: Repository setup

A fast subagent created the project infrastructure — .gitignore, Makefile, directory structure — and copied the GTAP11 initial conditions from an existing project:

  ◐ Agent "Implement Task 0: repo setup" is running...

  ✓ Created .gitignore (ignoring /input, /output, /temp)
  ✓ Created Makefile with targets: descriptives, scenarios, run, outputs, clean
  ✓ Created directories: input/, temp/scenarios/, temp/simulations/, output/figures/
  ✓ Copied initial_conditions_GTAP11_241129.rds to input/
  ✓ Removed old script (incorporated into 01-descriptives.R)
  ✓ Committed

Task 1: Descriptive figures (01-descriptives.R)

A subagent wrote the full script and ran it:

  ◐ Agent "Implement Task 1: descriptives" is running...

  Claude ran  Rscript code/01-descriptives.R
  ⎿  Reading BACI 2022...
     Reading BACI 2023...
     Reading BACI 2024...
     Reading Gravity...

  ✓ 5 PNG files + 5 PDF files generated in output/figures/
  ✓ Committed as "Add descriptive figures script (BACI HS22)"

Five figures were produced:

  1. Top 15 African exporters to China — South Africa ($29B), Angola ($19B), DRC ($14B) dominate
  2. Sectoral composition — 60.7% mineral products, 16.4% base metals, 14.0% precious metals
  3. Trade trend 2022–2024 — stable around $103–110 billion
  4. China vs. US as destination — China dwarfs the US for nearly every African exporter
  5. GDP vs. imports scatter — refactored from the collaborator’s original script

After each figure was generated, we visually inspected it by reading the PNG directly — Claude Code can display images inline. When the China-vs-US chart had a cut-off title, we widened it from 16cm to 20cm and re-rendered.

Key patterns worth noting:

  • Packages loaded via pacman::p_load() (auto-installs if missing)
  • Data manipulation uses data.table with magrittr pipes (e.g., dt[filter] %>% .[, .(x = sum(y)), by = group])
  • Figures saved in both PNG (for web) and PDF (for LaTeX), with rm() cleanup after each save
  • Custom theme theme_kiel() ensures all figures match the Kiel Institute background color (#F5F1E7)
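A minimal sketch of what such a theme helper could look like, together with the dual-format save convention. This is an illustration only: the project’s actual theme_kiel() likely also sets fonts, grid lines, and legend placement.

```r
# Hypothetical sketch of a theme_kiel() helper; assumes ggplot2 is installed.
library(ggplot2)

kiel_bg <- "#F5F1E7"  # Kiel Institute background color

theme_kiel <- function(base_size = 11) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.background   = element_rect(fill = kiel_bg, colour = NA),
      panel.background  = element_rect(fill = kiel_bg, colour = NA),
      legend.background = element_rect(fill = kiel_bg, colour = NA)
    )
}

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + theme_kiel()

# Save in both formats: PNG for the web, PDF for LaTeX
ggsave("figure.png", p, width = 20, height = 12, units = "cm", dpi = 300)
ggsave("figure.pdf", p, width = 20, height = 12, units = "cm")
rm(p)  # cleanup after each save, per the project convention
```

Because the theme is applied last in each plot, it overrides defaults consistently; the background bug later in this writeup came from individual plots calling theme_minimal() directly instead of a wrapper like this.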

Task 2: Scenario definition (02-scenarios.R)

The scenario script defines 42 African GTAP regions (37 individual countries + 5 regional aggregates) and constructs two scenarios — baseline and tariff removal:

  ◐ Agent "Implement Task 2: scenarios" is running...

  Claude ran  Rscript code/02-scenarios.R
  ⎿  African GTAP regions: 42
     Tariff rows (Africa -> China): 2,730
     Non-zero tariff rows: 339
     Tariff lines set to zero: 339
     Scenarios saved: baseline, S1_china_zero
Important: The copy() trap

In R’s data.table, a <- b does NOT create an independent copy: both names point to the same object, so modifying a by reference with := also modifies b. Every scenario must therefore start from copy(initial_conditions$tariff) to avoid corrupting the shared baseline. The spec reviewer caught this before any code was written.

The tariff convention is also a trap: GTAP stores tariffs as 1 + ad_valorem_rate. So zero tariff = 1.0, not 0. Setting to 0 would model a -100% tariff — effectively a subsidy. Again, the spec reviewer caught this.
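Both traps fit in a few lines of data.table. The objects below are toy data standing in for the real GTAP11 initial conditions, which have many more rows and columns.

```r
library(data.table)

# Toy stand-in for the KITE initial conditions object
initial_conditions <- list(
  tariff = data.table(
    exporter = c("zaf", "cod", "deu"),
    importer = c("chn", "chn", "chn"),
    value    = c(1.05, 1.12, 1.03)   # GTAP convention: 1 + ad valorem rate
  )
)

african <- c("zaf", "cod")  # toy region list, not the full 42 GTAP codes

# WRONG: tariff_s1 <- initial_conditions$tariff
#   -> the := edit below would also modify the shared baseline

# RIGHT: take an independent copy before editing by reference
tariff_s1 <- copy(initial_conditions$tariff)

# RIGHT: zero tariff is value = 1.0 (value = 0 would model a -100% tariff,
# i.e. a subsidy)
tariff_s1[importer == "chn" & exporter %in% african, value := 1.0]

# The baseline is untouched
stopifnot(initial_conditions$tariff[exporter == "zaf", value] == 1.05)
```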

Task 3: KITE model execution (03-run.R)

The model runs both scenarios through the Caliendo-Parro (2015) general equilibrium framework:

  ◐ Agent "Implement Task 3: KITE model run" is running...

  Claude ran  Rscript code/03-run.R
  ⎿
     ========================================
     Running scenario: baseline
     ========================================
     Converged in 1 iteration.
     Saved: temp/simulations/260416_china_africa_baseline.rds

     ========================================
     Running scenario: S1_china_zero
     ========================================
     Iteration 1: criterion = 0.0423
     Iteration 10: criterion = 0.00147
     Iteration 20: criterion = 0.000089
     Iteration 45: criterion = 0.0000018
     Converged in 45 iterations.
     Saved: temp/simulations/260416_china_africa_S1_china_zero.rds

     All scenarios complete.

After the model ran, we inspected the output structure to discover that process_results() is required before welfare data becomes available, and that welfare values are ratios (~1.0), not percentages. Percent change = (value - 1) * 100.
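The conversion is trivial but easy to get wrong. A sketch with toy values mirroring the headline numbers; the actual structure returned by process_results() may differ.

```r
# Welfare comes back as ratios around 1.0, not percentages.
# Toy values chosen to match the headline results below.
welfare <- data.frame(
  region = c("cod", "civ", "chn"),
  value  = c(1.018225, 1.003448, 0.999919)
)

# Percent change = (value - 1) * 100
welfare$pct_change <- (welfare$value - 1) * 100
round(welfare$pct_change, 4)
# 1.8225  0.3448  -0.0081
```

Forgetting the subtraction (reporting ~100% "gains") or treating the ratio itself as a percent are the two obvious failure modes a quick sanity check like this rules out.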

Task 4: GE output figures (04-outputs.R)

Three figures plus a welfare map:

  ◐ Agent "Implement Task 4: GE outputs" is running...

  Claude ran  Rscript code/04-outputs.R
  ⎿
     === Headline Results ===
     African countries - mean welfare change: 0.0915 %
     African countries - median welfare change: 0.0159 %
     China - welfare change: -0.0081 %
     Top 5 African beneficiaries:
        COD  +1.8225%
        CIV  +0.3448%
        COG  +0.3047%
        NAM  +0.2575%
        BEN  +0.2254%

     Tables saved to output/tables/

  ✓ 4 PNG + 4 PDF figures generated
  ✓ 2 CSV welfare tables generated
  ✓ Committed

The map was added on the fly, after the initial plan, following the pattern from an existing policy brief project:

  • Uses the sf package with a Robinson projection
  • Choropleth color scale: green (gains) to red (losses), capped at +/-2%
  • DRC stands out clearly as the largest beneficiary

Key results

Country Welfare change Mechanism
DRC +1.82% Cobalt/copper tariffs removed; mining-concentrated economy
Cote d’Ivoire +0.34% Agricultural products gain access
Congo +0.30% Oil and minerals
Namibia +0.26% Diversified minerals
South Africa +0.19% Broad export basket
Malawi -0.03% Trade diversion (was already duty-free)
China -0.008% Negligible tariff revenue loss

Average African welfare gain: 0.09%. Modest — because 94.5% of exports were already duty-free.


Phase 3: The Policy Brief

Writing on Overleaf

The Overleaf project syncs via Dropbox. Claude created the directory, copied all PDF figures, and wrote a complete 14-page LaTeX document following the template from an existing Kiel Institute policy brief:

  Claude ran  mkdir -p .../Overleaf/PB China Africa Zero Tariffs/figures
  Claude ran  cp output/figures/260416_*.pdf .../Overleaf/.../figures/

  ✓ Created policy_brief.tex (246 lines)
  ✓ Created policy_brief.bib (3 references)
  ✓ Copied 9 PDF figures

  Claude ran  pdflatex policy_brief.tex && bibtex policy_brief && pdflatex ...
  ⎿  Output written on policy_brief.pdf (14 pages, 737507 bytes).

The brief has 7 sections: Overview, Introduction (FOCAC, AGOA, geopolitics), Descriptive Analysis (4 figures), KITE Simulations (welfare + trade creation), Discussion (opportunities, limits, risks), Policy Implications, Conclusion.

The background color bug

When we compiled the PDF, the figure backgrounds didn’t match the Kiel beige page color. The issue: each ggplot explicitly called theme_minimal(), which overrode the global theme settings. We verified programmatically:

  Claude ran  python3 -c "from PIL import Image; ..."
  ⎿  260416_top_exporters.png:     RGB(255, 255, 255) - WRONG
     260416_sectoral_composition.png: RGB(255, 255, 255) - WRONG
     260416_welfare_map.png:        RGB(245, 241, 231) - OK
     ...

  Fix: defined theme_kiel() wrapping theme_minimal() with Kiel bg,
  replaced all theme_minimal() calls, regenerated all figures.

  Claude ran  python3 -c "from PIL import Image; ..."
  ⎿  260416_top_exporters.png:     RGB(245, 241, 231) - OK
     260416_sectoral_composition.png: RGB(245, 241, 231) - OK
     260416_trade_creation.png:    RGB(245, 241, 231) - OK
     ...
     All 9/9 figures verified ✓
Tip: Verify visually and programmatically

The figures looked right in an earlier render, but a pixel-level check revealed white backgrounds. When appearance matters for publication, automated verification catches what the eye misses.


Phase 4: AI-Assisted Peer Review

We used Claude Code’s multi-agent architecture to generate 5 reviewer personas with different backgrounds:

Reviewer Affiliation Lens
Prof. Karanja Nairobi / AERC African trade realities on the ground
Prof. Nicoletti Bocconi GE modeling rigor and scenario design
Dr. Voss BMWK Berlin German/EU policy relevance
Dr. Okonkwo Brookings Africa Actionable policy recommendations
Tariq El-Mansouri LSE MSc Can a non-specialist follow this?

All 5 ran as parallel subagents. Each read the full LaTeX source and wrote a structured review. The synthesis identified consensus issues (flagged by 3+ reviewers):

  • Trade creation percentages need absolute dollar values alongside them
  • The AfCFTA interaction is completely absent
  • Rules of Origin aren’t discussed
  • GTAP sector codes (wol, pfb, sgr) are unintelligible to policymakers
  • The DRC’s 1.82% result needs sensitivity analysis

And perspective-specific insights that only one reviewer type would catch:

  • Nicoletti (methods): trade creation may be computed on trade shares, not trade flows — potentially the wrong metric entirely
  • Voss (BMWK): the entire EU dimension is missing — the Kiel Institute’s main audience gets no policy recommendations
  • Karanja (Africa): Africa is presented as a passive recipient, not a strategic actor

What We Learned

For researchers new to AI coding tools

  1. You don’t need to know how to code to direct a coding agent. You need to know what you want, what the data looks like, and how to evaluate output.
  2. Visual inspection matters. We caught a cut-off title, wrong background colors, and a dubious figure by looking at the output. Don’t trust — verify.
  3. The planning step is not overhead. The spec and plan caught 3 critical bugs (tariff convention, copy() trap, wrong GTAP regions) before any code was written.

For researchers who already code

  1. Subagents parallelize independent work. Research + codebase exploration ran simultaneously. Five reviewers ran simultaneously. Don’t serialize what can be parallelized.
  2. Pattern reuse across projects is powerful. Claude explored 3 existing projects and extracted the common pattern (scenarios → run → outputs). Your old projects are templates for new ones.
  3. The AI makes mistakes that matter. Setting tariffs to 0 instead of 1.0 would have been silently wrong. The copy() issue would have corrupted the baseline. Code review — even by another AI agent — catches these.

For everyone

  1. The ratio of thinking to typing has shifted. Most of our time was spent deciding what to analyze, which scenarios to run, and whether the results made sense. Almost none was spent writing code.
  2. A policy brief is not just code output. The writing, framing, and editorial judgment are still human. Claude wrote the LaTeX, but the argument structure, the geopolitical framing, and the decision to lead with “this is mostly symbolic” came from research judgment.
  3. AI-generated peer review is surprisingly useful. Five diverse personas found issues that a single reviewer might miss. The BMWK adviser’s critique (“where is the EU?”) and the methodologist’s concern about trade shares vs. flows were genuinely valuable feedback.

The Full Pipeline

 1. Pull collaborator's code + data
 2. Research: news + existing repo patterns (parallel)
 3. Design spec → spec review → fix
 4. Implementation plan → plan review → fix
 5. Task 0: Repo setup (.gitignore, Makefile, data)
 6. Task 1: Descriptive figures (BACI HS22)        → 5 figs
 7. Task 2: Scenario definition (GTAP11)           → RDS
 8. Task 3: KITE model run (Caliendo-Parro 2015)   → RDS
 9. Task 4: GE output figures                       → 4 figs
10. Write policy brief (LaTeX on Overleaf)          → PDF
11. Fix figure backgrounds (theme_kiel)
12. Expert review (5 parallel personas)             → synthesis
13. Commit and push

See Also