Data Privacy & Confidentiality
Using AI tools with sensitive research data
How Claude Code Actually Handles Your Data
Understanding the data flow removes a lot of anxiety and helps you make good decisions.
Here is what happens when you use Claude Code:
Files stay on your machine. Claude Code runs in your terminal. Nothing is automatically uploaded. Your data files are not sent anywhere unless Claude explicitly reads them as part of a prompt.
When Claude reads a file, its contents go to Anthropic’s API. If you ask Claude to “look at data.csv” or it reads a file to answer your question, the file contents become part of the API request sent to Anthropic’s servers.
The conversation is processed on Anthropic’s infrastructure. Your prompt + file contents + Claude’s response travel over HTTPS to Anthropic, where the model generates a response, and the result is returned to your terminal.
Content is not retained for training. Anthropic does not use Claude Code API conversations for model training. They are processed and discarded. There is no persistent storage of your data on Anthropic’s side between sessions.
Each session is independent. When you close a Claude Code session and start a new one, the previous conversation is gone. Claude has no memory of what was in last week’s session.
The practical implication: the risk is not “Anthropic will store my data.” The risk is “this file’s contents were included in an API call” — which is the same risk as sending it in an email or pasting it into a web form.
The Dropbox Rule
If you wouldn’t put it on Dropbox, don’t let Claude read it.
This is a useful heuristic. When Claude reads a file, treat it like you’ve sent that file to a cloud service. Apply the same judgment you’d use for any external data processing: institutional policy, data agreements, IRB protocol, GDPR obligations.
The question is not “is this technically encrypted?” (it is). The question is: “do my data agreements and institutional rules permit sending this data to an external processor?”
German and EU Context: GDPR and DSGVO
If you work at a German institution — including the Kiel Institute — GDPR (in German: DSGVO, Datenschutz-Grundverordnung) applies to any personal data you handle. Using Claude Code with personal data has legal implications worth understanding.
What counts as personal data under GDPR:
- Names, addresses, email addresses, phone numbers
- ID numbers (Personalausweis, passport, social security equivalents)
- IP addresses and device identifiers
- Health, financial, or survey data that can identify individuals
- Firm names (if firms are small enough to identify individuals)
- Any combination of variables that could single out a specific person
What doesn’t count: Properly aggregated data where individuals cannot be re-identified. Country-level trade flows, sector-level employment statistics, anonymized survey indices — these are generally fine.
The DSGVO-Auftragsverarbeitung issue. Sending personal data to an external processor (like Anthropic) in principle requires an Auftragsverarbeitungsvertrag (AVV) — a data processing agreement. Anthropic offers a Data Processing Addendum for enterprise customers. For standard Claude subscriptions, you are subject to Anthropic’s standard Terms of Service and Privacy Policy, which may or may not satisfy your institution’s legal department.
If your dataset contains personal data as defined by GDPR — survey microdata with identifiers, patient records, firm-level data with identifiable respondents — do not send it to Claude without first checking with your data steward and, if applicable, your DPO (Datenschutzbeauftragter).
Practical guidance for researchers:
- Aggregated, anonymized data: proceed normally
- Published microdata from official statistics (Eurostat, World Bank): generally fine
- BACI, CEPII, UN Comtrade, WDI: no personal data, no issue
- Firm-level survey data with identifiers: check your data transfer agreement (DTA) first
- Any IRB-restricted data: read your protocol before using cloud tools
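Before letting Claude read a file, a quick automated scan of column names can catch obvious identifiers. A minimal sketch in base R — the pattern list and the toy column names are illustrative assumptions, not an exhaustive GDPR test:

```r
# Flag columns whose names suggest personal identifiers.
# The pattern list is an illustrative assumption; extend it for your data.
flag_identifier_columns <- function(df) {
  patterns <- c("name", "email", "phone", "address", "ip", "id_number", "passport")
  cols <- tolower(names(df))
  hit <- sapply(cols, function(col) any(sapply(patterns, grepl, x = col, fixed = TRUE)))
  names(df)[hit]
}

# Toy example: a survey extract with one obvious identifier column
survey <- data.frame(respondent_email = c("a@x.de", "b@y.de"),
                     industry = c("C28", "C29"),
                     employment = c(120, 450))
flag_identifier_columns(survey)  # flags respondent_email
```

A clean scan is not proof of anonymity (combinations of innocuous variables can still single out individuals, as noted above), but a flagged column is a clear stop sign.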
The .claudeignore Pattern
Create a .claudeignore file in your project root to block Claude from reading specific files or directories. It works exactly like .gitignore.
A realistic example for a project with mixed data:
```
# Confidential firm-level microdata (NDA)
data/firm_survey/
data/confidential/
data/restricted/

# Patent application data with inventor identifiers
data/patents/raw/

# IRB-restricted survey responses
data/survey/raw_responses/
data/survey/identified/

# Any Stata dataset (conservative default — too easy to have identifiers)
*.dta

# Credentials and secrets
.env
credentials.json
api_keys.R
*.pem
```
With this file in place, Claude will never read these files — even if you accidentally ask it to.
Create your .claudeignore before you start working, not after. It’s a seatbelt: you put it on before you need it.
Working with Aggregated Data: A Concrete Workflow
You have confidential firm-level data but want Claude’s help with the analysis. Here is how to structure this:
Step 1: Do the sensitive aggregation yourself.
Run your own R/Stata script to aggregate to the level you can share: industry × year, region × sector, or similar. Save the aggregated file to output/data/aggregated.csv.
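In base R, that aggregation step might look like the following sketch. The variable names and the industry-year grouping are illustrative assumptions:

```r
# Aggregate confidential firm-level data to the industry-by-year level
# before sharing anything with Claude. Variable names are hypothetical.
firms <- data.frame(
  firm_id  = c(1, 2, 3, 4),
  industry = c("C28", "C28", "C29", "C29"),
  year     = c(2020, 2020, 2020, 2021),
  exports  = c(10, 20, 5, 8)
)

agg <- aggregate(exports ~ industry + year, data = firms, FUN = sum)

# agg no longer contains firm_id or any firm-level observation
dir.create("output/data", recursive = TRUE, showWarnings = FALSE)
write.csv(agg, "output/data/aggregated.csv", row.names = FALSE)
```

Run this yourself, outside of Claude, so the microdata never enters a prompt. Only the aggregated file is ever mentioned to Claude.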
Step 2: Start Claude with the aggregated data.
“I have firm-level data I can’t share directly. I’ve aggregated it to the industry-year level. The file is output/data/aggregated.csv. Can you help me model this?”
Step 3: Have Claude write the analysis code.
Claude writes and debugs the analysis code using the aggregated file. It never needs to see the underlying microdata.
Step 4: You run the code on the confidential data.
When Claude has written the full analysis pipeline, you run it locally against the real data. Claude gets the output (aggregated results, not raw observations).
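One way to make Step 4 mechanical is to parameterize the input path, so the identical script runs on the aggregated file during development and on the confidential file when you run it yourself. The DATA_PATH environment variable is a hypothetical convention, not a Claude Code feature:

```r
# Resolve the input path from an environment variable, defaulting to the
# shareable aggregated file. DATA_PATH is a hypothetical convention.
input_path <- function() {
  Sys.getenv("DATA_PATH", unset = "output/data/aggregated.csv")
}

# During development (DATA_PATH unset), Claude works against the aggregated
# file. For the real run, you point DATA_PATH at the confidential microdata:
#   Sys.setenv(DATA_PATH = "data/confidential/firms.csv")
#   df <- read.csv(input_path())
```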
This workflow lets you use Claude for the 80% of work that doesn’t require seeing sensitive observations.
The Separate Projects Approach
If you have a project with truly sensitive data that shouldn’t mix with Claude at all, keep it in a completely separate directory and never start Claude Code there.
Claude Code’s scope is the directory you launch it from. If you’re in ~/projects/gravity_model/, it has no access to ~/projects/confidential_survey/ unless you explicitly navigate there or point Claude to it.
If you work on multiple projects — some sensitive, some not — it helps to be deliberate about which terminal window you’re in before typing any prompts. A moment of confusion can lead to Claude reading a file it shouldn’t.
What to Never Let Claude See
These categories are absolute — no exceptions, no “just this once”:
| Category | Example | Risk |
|---|---|---|
| API keys | ANTHROPIC_API_KEY=sk-ant-... | Immediate financial exposure |
| Passwords | Database credentials, SSH passphrases | Account compromise |
| SSH private keys | ~/.ssh/id_rsa | Full server access |
| Personal identifiers | Names + IDs in microdata | GDPR liability |
| IRB-restricted data | Survey responses with names | Ethical/legal violation |
| Proprietary code | Licensed firm algorithms | Contract violation |
The API key / password issue is particularly common. Never put credentials in a file that Claude might read. Use a .env file, add it to .claudeignore and .gitignore, and use Sys.getenv("API_KEY") in your R code to access it.
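In practice that pattern might look like the following sketch; the file names follow the conventions above, and the wrapper function is a hypothetical helper, not part of any library:

```r
# .env (listed in .claudeignore and .gitignore, never read by Claude):
#   API_KEY=sk-...

# In your R code, read the key from the environment instead of hard-coding it.
# A small wrapper that fails loudly when the key is missing:
get_api_key <- function(var = "API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) stop(sprintf("Environment variable %s is not set", var))
  key
}
```

Note that base R does not read .env files automatically; load the variable via your shell, your .Renviron file, or a package such as dotenv before calling the helper.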
Institutional Guidance
Before using Claude Code with research data, check three things:
Your data transfer agreement (DTA) or data license. Many dataset licenses explicitly prohibit sending data to third-party processors. BACI and CEPII data are generally permissive; survey microdata with NDAs often is not. When in doubt, the license is the authority.
Your IRB protocol. If you have IRB approval covering your data, the protocol typically specifies how data can be stored and processed. AI assistants are new enough that many older protocols don’t address them. If your protocol is silent, ask your IRB coordinator before proceeding with sensitive data.
Your institution’s data steward or DPO. The Kiel Institute, like most German research institutions, has a data protection officer (Datenschutzbeauftragter). If you’re unsure whether your data use is compliant, that’s the right person to ask. A quick email is faster than assuming.
Anthropic’s Privacy Policy
For the authoritative source on what Anthropic does and does not retain: anthropic.com/privacy
Key points as of April 2026 (check the link for the latest version): API calls are not used for training. Consumer (Claude.ai) conversations may be reviewed for safety; API/Claude Code conversations have more stringent protections. Enterprise customers can negotiate additional data processing agreements.
On MCP and Connected Services
If you connect Claude Code to your email, calendar, or cloud storage via MCP (Model Context Protocol), you are giving the tool full account access — Gmail MCP can read any email, Drive MCP reaches every file. Chris Blattman writes about this candidly.
- Full account access. Each MCP integration grants Claude Code access to your entire connected account, not just selected items.
- Shared computer risk. Anyone with terminal access on your machine can run claude and reach all your connected services. If you share a workstation, treat MCP very carefully or use separate user accounts.
- API transit. Content from connected services passes through Anthropic’s API. It is not used for training (per current policy), but it does leave your machine.
If you set up MCP, treat Claude Code as the most-privileged app on your computer. Audit which services are connected. Disable ones you no longer use.
Blattman’s MCP setup page is the most practical walkthrough we know of, with verbatim configuration JSON for Google Workspace, Zotero, Apple apps, and WhatsApp. Read the security warnings before enabling anything beyond local file access.
→ See also: chrisblattman.com.