Data Privacy & Confidentiality
Using AI tools with sensitive research data
How Claude Code Actually Handles Your Data
Understanding the data flow removes a lot of anxiety and helps you make good decisions.
Here is what happens when you use Claude Code:
Files stay on your machine. Claude Code runs in your terminal. Nothing is automatically uploaded. Your data files are not sent anywhere unless Claude explicitly reads them as part of a prompt.
When Claude reads a file, its contents go to Anthropic’s API. If you ask Claude to “look at data.csv” or it reads a file to answer your question, the file contents become part of the API request sent to Anthropic’s servers.
The conversation is processed on Anthropic’s infrastructure. Your prompt + file contents + Claude’s response travel over HTTPS to Anthropic, where the model generates a response, and the result is returned to your terminal.
Content is not retained for training. Anthropic does not use Claude Code API conversations for model training. They are processed and discarded. There is no persistent storage of your data on Anthropic’s side between sessions.
Each session is independent. When you close a Claude Code session and start a new one, the previous conversation is gone. Claude has no memory of what was in last week’s session.
The practical implication: the risk is not “Anthropic will store my data.” The risk is “this file’s contents were included in an API call” — which is the same risk as sending it in an email or pasting it into a web form.
The Dropbox Rule
If you wouldn’t put it on Dropbox, don’t let Claude read it.
This is a useful heuristic. When Claude reads a file, treat it like you’ve sent that file to a cloud service. Apply the same judgment you’d use for any external data processing: institutional policy, data agreements, IRB protocol, GDPR obligations.
The question is not “is this technically encrypted?” (it is). The question is: “do my data agreements and institutional rules permit sending this data to an external processor?”
German and EU Context: GDPR and DSGVO
If you work at a German institution — including the Kiel Institute — GDPR (in German: DSGVO, Datenschutz-Grundverordnung) applies to any personal data you handle. Using Claude Code with personal data has legal implications worth understanding.
What counts as personal data under GDPR:
- Names, addresses, email addresses, phone numbers
- ID numbers (Personalausweis, passport, social security equivalents)
- IP addresses and device identifiers
- Health, financial, or survey data that can identify individuals
- Firm names (if firms are small enough to identify individuals)
- Any combination of variables that could single out a specific person
What doesn’t count: Properly aggregated data where individuals cannot be re-identified. Country-level trade flows, sector-level employment statistics, anonymized survey indices — these are generally fine.
The DSGVO-Auftragsverarbeitung issue. Sending personal data to an external processor (like Anthropic) in principle requires an Auftragsverarbeitungsvertrag (AVV) — a data processing agreement. Anthropic offers a Data Processing Addendum for enterprise customers. For standard Claude subscriptions, you are subject to Anthropic’s standard Terms of Service and Privacy Policy, which may or may not satisfy your institution’s legal department.
If your dataset contains personal data as defined by GDPR — survey microdata with identifiers, patient records, firm-level data with identifiable respondents — do not send it to Claude without first checking with your data steward and, if applicable, your DPO (Datenschutzbeauftragter).
Practical guidance for researchers:
- Aggregated, anonymized data: proceed normally
- Published microdata from official statistics (Eurostat, World Bank): generally fine
- BACI, CEPII, UN Comtrade, WDI: no personal data, no issue
- Firm-level survey data with identifiers: check your data transfer agreement (DTA) first
- Any IRB-restricted data: read your protocol before using cloud tools
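Before letting Claude read a file, a quick automated scan of column names can catch obvious identifiers. A minimal sketch in base R — the pattern list and the toy column names are illustrative assumptions, not an exhaustive GDPR test:

```r
# Flag columns whose names suggest personal identifiers.
# The pattern list is an illustrative assumption; extend it for your data.
flag_identifier_columns <- function(df) {
  patterns <- c("name", "email", "phone", "address", "ip", "id_number", "passport")
  cols <- tolower(names(df))
  hit <- sapply(cols, function(col) any(sapply(patterns, grepl, x = col, fixed = TRUE)))
  names(df)[hit]
}

# Toy example: a survey extract with one obvious identifier column
survey <- data.frame(respondent_email = c("a@x.de", "b@y.de"),
                     industry = c("C28", "C29"),
                     employment = c(120, 450))
flag_identifier_columns(survey)  # flags respondent_email
```

A clean scan is not proof of anonymity (combinations of innocuous variables can still single out individuals, as noted above), but a flagged column is a clear stop sign.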
The .claudeignore Pattern
Create a .claudeignore file in your project root to block Claude from reading specific files or directories. It works exactly like .gitignore.
A realistic example for a project with mixed data:
```
# Confidential firm-level microdata (NDA)
data/firm_survey/
data/confidential/
data/restricted/

# Patent application data with inventor identifiers
data/patents/raw/

# IRB-restricted survey responses
data/survey/raw_responses/
data/survey/identified/

# Any Stata dataset (conservative default — too easy to have identifiers)
*.dta

# Credentials and secrets
.env
credentials.json
api_keys.R
*.pem
```
With this file in place, Claude will never read these files — even if you accidentally ask it to.
Create your .claudeignore before you start working, not after. It’s a seatbelt: you put it on before you need it.
Working with Aggregated Data: A Concrete Workflow
You have confidential firm-level data but want Claude’s help with the analysis. Here is how to structure this:
Step 1: Do the sensitive aggregation yourself.
Run your own R/Stata script to aggregate to the level you can share: industry × year, region × sector, or similar. Save the aggregated file to output/data/aggregated.csv.
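In base R, that aggregation step might look like the following sketch. The variable names and the industry-year grouping are illustrative assumptions:

```r
# Aggregate confidential firm-level data to the industry-by-year level
# before sharing anything with Claude. Variable names are hypothetical.
firms <- data.frame(
  firm_id  = c(1, 2, 3, 4),
  industry = c("C28", "C28", "C29", "C29"),
  year     = c(2020, 2020, 2020, 2021),
  exports  = c(10, 20, 5, 8)
)

agg <- aggregate(exports ~ industry + year, data = firms, FUN = sum)

# agg no longer contains firm_id or any firm-level observation
dir.create("output/data", recursive = TRUE, showWarnings = FALSE)
write.csv(agg, "output/data/aggregated.csv", row.names = FALSE)
```

Run this yourself, outside of Claude, so the microdata never enters a prompt. Only the aggregated file is ever mentioned to Claude.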
Step 2: Start Claude with the aggregated data.
“I have firm-level data I can’t share directly. I’ve aggregated it to the industry-year level. The file is output/data/aggregated.csv. Can you help me model this?”
Step 3: Have Claude write the analysis code.
Claude writes and debugs the analysis code using the aggregated file. It never needs to see the underlying microdata.
Step 4: You run the code on the confidential data.
When Claude has written the full analysis pipeline, you run it locally against the real data. Claude gets the output (aggregated results, not raw observations).
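One way to make Step 4 mechanical is to parameterize the input path, so the identical script runs on the aggregated file during development and on the confidential file when you run it yourself. The DATA_PATH environment variable is a hypothetical convention, not a Claude Code feature:

```r
# Resolve the input path from an environment variable, defaulting to the
# shareable aggregated file. DATA_PATH is a hypothetical convention.
input_path <- function() {
  Sys.getenv("DATA_PATH", unset = "output/data/aggregated.csv")
}

# During development (DATA_PATH unset), Claude works against the aggregated
# file. For the real run, you point DATA_PATH at the confidential microdata:
#   Sys.setenv(DATA_PATH = "data/confidential/firms.csv")
#   df <- read.csv(input_path())
```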
This workflow lets you use Claude for the 80% of work that doesn’t require seeing sensitive observations.
The Separate Projects Approach
If you have a project with truly sensitive data that shouldn’t mix with Claude at all, keep it in a completely separate directory and never start Claude Code there.
Claude Code’s scope is the directory you launch it from. If you’re in ~/projects/gravity_model/, it has no access to ~/projects/confidential_survey/ unless you explicitly navigate there or point Claude to it.
If you work on multiple projects — some sensitive, some not — it helps to be deliberate about which terminal window you’re in before typing any prompts. A moment of confusion can lead to Claude reading a file it shouldn’t.
What to Never Let Claude See
These categories are absolute — no exceptions, no “just this once”:
| Category | Example | Risk |
|---|---|---|
| API keys | ANTHROPIC_API_KEY=sk-ant-... | Immediate financial exposure |
| Passwords | Database credentials, SSH passphrases | Account compromise |
| SSH private keys | ~/.ssh/id_rsa | Full server access |
| Personal identifiers | Names + IDs in microdata | GDPR liability |
| IRB-restricted data | Survey responses with names | Ethical/legal violation |
| Proprietary code | Licensed firm algorithms | Contract violation |
The API key / password issue is particularly common. Never put credentials in a file that Claude might read. Use a .env file, add it to .claudeignore and .gitignore, and use Sys.getenv("API_KEY") in your R code to access it.
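In practice that pattern might look like the following sketch; the file names follow the conventions above, and the wrapper function is a hypothetical helper, not part of any library:

```r
# .env (listed in .claudeignore and .gitignore, never read by Claude):
#   API_KEY=sk-...

# In your R code, read the key from the environment instead of hard-coding it.
# A small wrapper that fails loudly when the key is missing:
get_api_key <- function(var = "API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) stop(sprintf("Environment variable %s is not set", var))
  key
}
```

Note that base R does not read .env files automatically; load the variable via your shell, your .Renviron file, or a package such as dotenv before calling the helper.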
Institutional Guidance
Before using Claude Code with research data, check three things:
Your data transfer agreement (DTA) or data license. Many dataset licenses explicitly prohibit sending data to third-party processors. BACI and CEPII data are generally permissive; survey microdata with NDAs often is not. When in doubt, the license is the authority.
Your IRB protocol. If you have IRB approval covering your data, the protocol typically specifies how data can be stored and processed. AI assistants are new enough that many older protocols don’t address them. If your protocol is silent, ask your IRB coordinator before proceeding with sensitive data.
Your institution’s data steward or DPO. The Kiel Institute, like most German research institutions, has a data protection officer (Datenschutzbeauftragter). If you’re unsure whether your data use is compliant, that’s the right person to ask. A quick email is faster than assuming.
Anthropic’s Privacy Policy
For the authoritative source on what Anthropic does and does not retain: anthropic.com/privacy
Key points as of April 2026 (check the link for the latest version): API calls are not used for training. Consumer (Claude.ai) conversations may be reviewed for safety; API/Claude Code conversations have more stringent protections. Enterprise customers can negotiate additional data processing agreements.
On MCP and Connected Services
If you connect Claude Code to your email, calendar, or cloud storage via MCP (Model Context Protocol), you are giving the tool full account access — Gmail MCP can read any email, Drive MCP reaches every file. Chris Blattman writes about this candidly.
- Full account access. Each MCP integration grants Claude Code access to your entire connected account, not just selected items.
- Shared computer risk. Anyone with terminal access on your machine can run claude and reach all your connected services. If you share a workstation, treat MCP very carefully or use separate user accounts.
- API transit. Content from connected services passes through Anthropic’s API. It is not used for training (per current policy), but it does leave your machine.
If you set up MCP, treat Claude Code as the most-privileged app on your computer. Audit which services are connected. Disable ones you no longer use.
Blattman’s MCP setup page is the most practical walkthrough we know of, with verbatim configuration JSON for Google Workspace, Zotero, Apple apps, and WhatsApp. Read the security warnings before enabling anything beyond local file access.
→ See also: chrisblattman.com.