Writing a CLAUDE.md

The most important file in your project

What It Is

A CLAUDE.md file lives in your project’s root folder. Claude Code reads it automatically every time you start a session in that directory — before you type a single prompt.

Think of it as a standing briefing memo for a new research assistant. Instead of re-explaining your data structure, coding conventions, and constraints every session, you write it once and Claude just knows.

Tip

A good CLAUDE.md takes about 10 minutes to write and saves you hours of repeated context-setting across sessions.

You do not need one to use Claude Code. But without it, Claude will make plausible-but-wrong assumptions: it might reach for lm() when you always use feols(), write figures to the wrong folder, or try to install packages without asking. The CLAUDE.md is where you prevent all of that.

Best external resources

Blattman’s CLAUDE.md Guide — Chris Blattman’s own guide, written for non-coders
A Real CLAUDE.md (Annotated) — Blattman’s actual production file. The Confirmation Guidelines section is especially worth studying.
Matt Van Horn — Every Claude Code Hack I Know — CLAUDE.md + plan.md as a system. “No IDE. Just plan.md files and voice.”

Template: Economics Research Project

This template is built around a typical empirical trade/macro workflow — R, fixest, ggplot2, BACI-style bilateral data. Adapt to your own project.

# [Project Title]

## Project Context
[One paragraph: what this project is about, the core research question,
and where it currently stands. Be concrete: "We estimate a structural
gravity model to test whether geopolitical alignment affects German export
flows at the product level, using HS6 data from BACI 2000–2022."]

Researcher: [Your name], [Your institution]
Collaborators: [Names if applicable]
Status: [e.g., "Data cleaning complete, running baseline regressions"]

## Data

### Sources
- `data/baci_sample.csv` — BACI bilateral trade flows
  - Columns: year, exporter_iso3, importer_iso3, hs6, value_usd
  - Filtered to: Germany as exporter or importer, 2000–2022
  - Product level: HS 6-digit
  - Size: ~2.4M rows

- `data/cepii_distances.csv` — CEPII GeoDist bilateral distances
  - Columns: iso_o, iso_d, dist, contig, comlang_off, colony
  - dist = simple distance between each country's most populated cities, in km (GeoDist's population-weighted measure is `distw`)

- `data/unga_idealpoints.csv` — UNGA ideal point estimates (Bailey et al.)
  - Columns: year, iso3, idealpoint
  - Political distance = |idealpoint_i - idealpoint_j|

- `data/controls.csv` — [describe your control variables]

### Rules
- Never modify raw data files in `data/` — they are read-only
- Processed/merged data goes in `output/data/`
- Final datasets used for regressions go in `output/data/final/`
- All operations must be reproducible from raw inputs
- If you need a new derived variable, create it in `02_merge.R`, not inline

## Code Conventions

### Language & Packages
- R is the only language for this project
- Use `dplyr` and `tidyr` for data manipulation (or `data.table` — pick one per project)
- Use `fixest` for all regressions (`feols`, `fepois`)
- Use `ggplot2` for all figures — no base R plots
- Use `modelsummary` for regression tables
- Use `readr` for CSV I/O, `haven` for Stata files

### Script Structure
- One script per task, numbered: `01_clean.R`, `02_merge.R`,
  `03_gravity.R`, `04_figures.R`
- Comment sections with `# ---- Section Name ----`
- snake_case for all variable and function names
- Never use `attach()` or `setwd()`
- Use explicit package prefixes for ambiguous functions: `dplyr::select()`

### Regressions (fixest conventions)
- Use PPML (`fepois`) as the preferred estimator
- Include exporter-year and importer-year fixed effects for structural gravity
- Report OLS on log flows as a comparison specification
- Cluster standard errors at the country-pair level: `cluster = ~iso_o^iso_d`
- Distance elasticity should be negative (typically around -1)
- Use `etable()` or `modelsummary()` for output, never print raw model objects

### Figures
- Theme: `theme_minimal()` with these customizations:
  - Kiel Institute blue: `#003366`
  - Accent orange: `#E67E22`
  - Axis labels: 11pt, titles: 13pt
- Export both PDF and PNG at 300 DPI
- Save to `output/figures/`
- File names match the script that produced them: `fig_gravity_main.pdf`

## What NOT to Do
- Do not install packages without asking first
- Do not modify any file in `data/`
- Do not push to git without explicit instruction
- Do not interpret causal relationships — describe patterns only
  (this is a gravity model, not a causal identification strategy)
- Do not mix languages — stick to R for this project
- Do not use `lm()` for gravity — always `feols` or `fepois`
- Do not hard-code file paths — use `here::here()` for portability

Template: Text-Heavy / Policy Project

For writing-focused projects — literature reviews, policy briefs, working papers — where there’s no data pipeline.

# [Project Title]

## Project Context
[One paragraph: what you're writing, for what audience, at what stage.
"This is a policy brief for the Kiel Policy Brief series on deglobalization
risks. Target audience: policymakers, not academics. Currently in first-draft
stage, section 2 is missing."]

Author: [Name], [Institution]
Target publication: [Journal / Series / Event]
Deadline: [Date]
Word limit: [If applicable]

## Document Structure

- `main.tex` — master LaTeX file (or `main.qmd` for Quarto)
- `sections/` — one file per section
- `refs/refs.bib` — BibTeX references
- `figures/` — all figures (tikz preferred, no external images)

## Voice & Style

- Audience: senior economists and policy advisors — not undergraduates,
  not the general public
- Tone: precise, direct, non-polemical
- No contractions, no rhetorical questions
- Short paragraphs, no more than 5 sentences
- Active voice preferred
- First reference to a dataset: give full name, source, and year
  (e.g., "We use data from BACI (CEPII, 2023)")
- Cite using BibTeX keys already in refs.bib — do not invent new ones

## What NOT to Do
- Do not add hedging filler ("It is worth noting that...")
- Do not change section headings without asking
- Do not alter the conclusions section — that is where I am most specific
- Do not suggest running regressions or creating tables — this project
  has no empirical component
- Do not add figures unless explicitly asked
- Do not change citation style or BibTeX formatting

## In-Progress Notes
[Use this section as a scratchpad — paste current TODO items,
unresolved questions, decisions made. Update it as you work.]

- [ ] Section 2 needs 400 words on supply chain fragmentation
- [ ] Check whether Freund & Manova (2012) is already cited
- [ ] Paula to review conclusions before submission

Principles for Good CLAUDE.md Files

1. Be specific about your data. Give column names, units, row counts, and what “country” means (ISO3 code or name?). A line such as “The key columns are exporter_iso3, importer_iso3, hs6, value_usd” saves Claude from guessing, and from guessing wrong.

2. State your conventions explicitly. If you use feols() instead of lm(), say so. If figures go to output/figures/, say so. Claude will follow established patterns it sees in your code — but it might see multiple patterns. Resolve ambiguity upfront.

3. Include anti-patterns. The “What NOT to Do” section is one of the most valuable parts, because a prohibition is often clearer than a positive instruction. “Do not interpret causally” and “Do not install packages without asking” prevent the most common frustrations.

4. Keep it current. The CLAUDE.md is only useful if it’s accurate. After adding a new data file, add it to the Data section. After a major refactor, update the script structure. An outdated CLAUDE.md is worse than none — it actively misinforms.

5. Don’t over-engineer it. A 10-line CLAUDE.md that accurately describes your project is better than a 200-line one that’s aspirational. Start small, then add rules as you discover what Claude gets wrong.
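Principle 5 can be made concrete with a deliberately small starting file. The sketch below is one possible minimal version, assembled from the conventions in the economics template above. The bracketed fields are placeholders, and the selection of rules is an assumption about what matters most, not a recommendation for every project.

```markdown
# [Project Title]

## Project Context
[One or two sentences: research question, data source, current status.]

## Data
- `data/baci_sample.csv`: BACI bilateral trade flows
  (year, exporter_iso3, importer_iso3, hs6, value_usd)
- Never modify files in `data/`; write derived data to `output/data/`

## Code Conventions
- R only; `fixest` for regressions, `ggplot2` for figures
- Save figures to `output/figures/`

## What NOT to Do
- Do not install packages without asking first
- Do not push to git without explicit instruction
- Do not hard-code file paths; use `here::here()`
```

A file like this can grow one line at a time: whenever Claude gets something wrong that a standing instruction would have prevented, add that instruction.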

Common Mistakes

Warning

Vague project context. “This is a research project” tells Claude nothing. Write a real sentence about your research question and data source.

Warning

Missing data rules. If you don’t say “never modify data/”, Claude may write cleaned data back to the raw folder. Always state the raw vs. output distinction.

Warning

Assuming Claude knows your packages. fixest, modelsummary, wbstats, plm — these are not universal defaults. State what you use, especially if you have a preference between competing packages (e.g., fixest vs lfe for high-dimensional fixed effects).

Warning

Forgetting to update after major changes. If you restructure your folder layout or rename scripts, an outdated CLAUDE.md sends Claude looking in the wrong places.

Warning

No “What NOT to Do” section. This is where you prevent the most painful mistakes. Always include at least three prohibitions.

FAQ

What if I use Stata?

CLAUDE.md works the same way. Specify .do files instead of .R scripts, your conventions for naming locals and globals, which table command you use (estout or esttab, both from the estout package), and how you handle preserve/restore. Claude Code can read and write .do files. One caveat: running .do files requires a licensed Stata installation that can be called from the terminal. Without one, Claude Code can still write the code, but you will need to run it yourself.
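As an illustration, the Code Conventions section for a Stata project might look like the sketch below. Every choice in it (the `reghdfe` and `ppmlhdfe` estimators, `esttab`, the macro rules, the do-file names) is an assumed example; replace each with your own conventions.

```markdown
## Code Conventions (Stata)

### Language & Packages
- Stata is the only language for this project
- Use `reghdfe` for linear models with fixed effects, `ppmlhdfe` for PPML
- Use `esttab` for regression tables
- Do not run `ssc install` without asking first

### Script Structure
- One do-file per task, numbered: `01_clean.do`, `02_merge.do`, `03_gravity.do`
- `00_master.do` runs the numbered do-files in order
- Locals for values used within one do-file; globals only for project
  paths, defined in `00_master.do`
- Wrap any temporary change to the data in memory in `preserve` / `restore`

### Execution
- Write and edit do-files, but do not try to run Stata unless I confirm
  it is available from the terminal
```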

What if I use Python?

Same concept, different specifics. Swap dplyr/data.table for pandas, fixest for statsmodels or linearmodels, ggplot2 for matplotlib/seaborn, and here::here() for pathlib.Path. Specify your dependency management (requirements.txt, pyproject.toml, or conda), your formatter (black, ruff), and your test runner (pytest). Claude Code handles .py files as naturally as .R files.
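A sketch of the corresponding section for a Python project follows. The specific picks (`linearmodels` for panel regressions, `matplotlib` only, `pyproject.toml`, `ruff format`) are assumptions chosen for illustration from the options listed above, not a recommended stack.

```markdown
## Code Conventions (Python)

### Language & Packages
- Python is the only language for this project
- Use `pandas` for data manipulation
- Use `linearmodels` for panel regressions, `statsmodels` for everything else
- Use `matplotlib` for all figures
- Dependencies are declared in `pyproject.toml`; do not add one without
  asking first

### Script Structure
- One script per task, numbered: `01_clean.py`, `02_merge.py`, `03_gravity.py`
- snake_case for all variable and function names
- Build paths with `pathlib.Path` relative to the project root; no
  hard-coded absolute paths
- Format with `ruff format`; tests live in `tests/` and run with `pytest`
```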

Can I have multiple CLAUDE.md files?

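Yes. A CLAUDE.md in a subdirectory adds to the root one rather than replacing it. This is useful for projects with distinct subcomponents: a code/CLAUDE.md for coding conventions and a writing/CLAUDE.md for manuscript style. The root file is loaded at the start of every session; a subdirectory file is picked up when Claude works with files in that subdirectory, so keep rules that must always apply in the root file.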
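A sketch of how the split might look, using the `code/` and `writing/` folders named in the answer above. The two files are shown in one block, separated by comments that give each file's hypothetical path; the rules themselves are borrowed from the two templates. Each file holds only what is specific to its folder, while project-wide rules stay in the root CLAUDE.md.

```markdown
<!-- code/CLAUDE.md -->
# Code conventions
- Use `fixest` for all regressions, `ggplot2` for all figures
- One script per task, numbered: `01_clean.R`, `02_merge.R`
- Save figures to `output/figures/`

<!-- writing/CLAUDE.md -->
# Manuscript style
- No contractions, no rhetorical questions
- Short paragraphs, no more than 5 sentences
- Cite using BibTeX keys already in `refs/refs.bib`
```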

How long should it be?

For a typical research project: 50–150 lines. That is long enough to cover data, conventions, and prohibitions, and short enough that you’ll actually keep it updated. Treat the templates above as upper bounds on detail, not minimums.

Does it work with Codex (OpenAI)?

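Not automatically. Codex looks for a file named AGENTS.md by default, not CLAUDE.md. The format is compatible, since both are plain Markdown instructions, so you can reuse the same content, for example by copying or symlinking CLAUDE.md to AGENTS.md. See the costs guide for a comparison of the tools.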

Should I commit it to Git?

Yes, always. It belongs in version control alongside your code. Your collaborators benefit from it too.

Examples

Demo project CLAUDE.md — the file used in the AI Bootcamp workshop (gravity model, BACI data, R/fixest)
Blattman’s CLAUDE.md guide — Chris Blattman’s own guidance, written for non-coders
A Real CLAUDE.md (annotated) — a sanitized version of Blattman’s actual production file. The Confirmation Guidelines section (when Claude should ask permission vs proceed) is the part that took him the most iteration and is well worth studying.
claudeblattman.com — Blattman’s full site with workflows, skills, and templates

Policy files — a Blattman pattern worth stealing

For larger workflows, Blattman recommends writing policy files (e.g. email-policy.md, calendar-policy.md, data-policy.md) before you build skills around them. Each takes 15–30 minutes to write and saves hours of debugging. They define what Claude is allowed to do without asking, what requires approval, and what is off-limits. Reference them from your CLAUDE.md.
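A minimal sketch of what such a file could contain, using `data-policy.md` as the example. The three headings mirror the three questions a policy file answers. The individual rules are illustrative assumptions drawn from the economics template above, not Blattman's own wording.

```markdown
# Data Policy

## Allowed without asking
- Read any file in `data/`
- Write processed data to `output/data/`
- Re-run existing numbered scripts

## Requires approval
- Installing or updating packages
- Adding a new raw data source
- Overwriting anything in `output/data/final/`
- Pushing to git

## Off-limits
- Modifying or deleting files in `data/`
```

In CLAUDE.md, a single line such as "Follow `data-policy.md` for all data handling" is enough to point Claude to the policy.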