Notebooks

Interactive notebooks live in notebooks/ at the repo root. They are the primary narrative format for investigations — code, charts, and commentary together.

Viewing notebooks

GitHub renders .ipynb files natively. Click any link below to read the notebook in your browser without cloning.

Execution note

These notebooks load artifacts from specific experiment runs (run IDs and artifact paths are hardcoded to the original research environment). To re-run them end-to-end, first reproduce the relevant sweep with mech sweep --execute, then update the artifact path constants at the top of each notebook.
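Before editing anything, it can help to see which constants a notebook defines. The following is a minimal sketch, not part of the mech tooling: it assumes the notebooks are standard nbformat 4 JSON files and that the hardcoded run IDs and paths are plain UPPER_CASE assignments in the first few code cells. The helper names (`cell_source`, `find_constants`) are hypothetical.

```python
import json
import re
from pathlib import Path

# Matches simple top-level assignments to UPPER_CASE names, e.g. REPORT_PATH = "..."
# (assumption: the notebooks declare their run IDs and paths this way).
CONSTANT_RE = re.compile(r"^([A-Z][A-Z0-9_]*)\s*=\s*(.+)$")


def cell_source(cell):
    """Return a cell's source as one string (nbformat stores a str or a list of str)."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def find_constants(notebook, max_cells=3):
    """Collect UPPER_CASE assignments from the first `max_cells` code cells."""
    found = {}
    code_cells = [c for c in notebook.get("cells", []) if c.get("cell_type") == "code"]
    for cell in code_cells[:max_cells]:
        for line in cell_source(cell).splitlines():
            match = CONSTANT_RE.match(line)
            if match:
                found[match.group(1)] = match.group(2).strip()
    return found


def load_notebook(path):
    """Parse a .ipynb file into a plain dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

For example, `find_constants(load_notebook("notebooks/06_sae_replication_crisis.ipynb"))` would return a dict mapping each constant name to the source text of its current value, which gives a checklist of what to update after reproducing a sweep.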

Notebook index

Notebook	Investigation	Status
06 — SAE Replication Crisis	Investigation #3	Complete — pairwise cosine heatmaps, distribution histograms, 4-condition comparison
07 — GPT-2 Factual Recall Story	Investigation #2	Complete — logit lens, DLA, circuit patching, SAE feature cluster
08 — Feature Splitting	Investigation #4	Complete — 4-size comparison, fidelity bar chart
09 — Refusal Audit	Investigation #1	Complete — direction quality sweep, CAA compliance plots
10 — SAE at Scale	Investigation #5	Complete — 2048-feature dictionary, feature cluster analysis
05 — Research Walkthrough	General	Tutorial — walks through a full experiment from spec to artifact
01 — Circuit Patching	General	Tutorial — activation patching basics
02 — Polysemanticity SAE	General	Tutorial — train and inspect a small SAE
03 — ACDC Lite	General	Tutorial — automated circuit discovery
04 — Agentic Loop	General	Tutorial — orchestration and iterate loop

Reproducing a notebook from scratch

SAE Replication Crisis (notebook 06)

# 1. Train 5 SAEs at layer 0, one per seed
mech sweep \
  --base experiments/polysemanticity.yaml \
  --axis "parameters.seed=1,2,3,4,5" \
  --output experiments/sweeps/sae_seed_stability_layer0.yaml \
  --execute

# 2. Generate the stability report
mech analyze-sae-stability \
  --sweep experiments/sweeps/sae_seed_stability_layer0.yaml \
  --output artifacts/stability_layer0_128.json \
  --live-only

# 3. Update REPORT_PATH in notebook cell 1 to point at your new artifact

# 4. Re-execute the notebook in place (only after completing step 3)
jupyter nbconvert --execute --to notebook --inplace notebooks/06_sae_replication_crisis.ipynb
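
Step 3 can be done by hand in the Jupyter UI, or scripted. The sketch below is one possible way to script it, assuming `REPORT_PATH` is assigned a string literal on a single line and the notebook is a standard nbformat 4 JSON file. The functions `set_constant` and `update_notebook` are hypothetical helpers, not mech commands.

```python
import json
import re
from pathlib import Path


def set_constant(notebook, name, value):
    """Rewrite `name = ...` in the first code cell that assigns it; return True if changed."""
    pattern = re.compile(rf"^{re.escape(name)}\s*=.*$", flags=re.MULTILINE)
    new_line = f"{name} = {value!r}"
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        text = "".join(src) if isinstance(src, list) else src
        if pattern.search(text):
            # A callable replacement avoids backslash-escape surprises in paths.
            updated = pattern.sub(lambda _m: new_line, text, count=1)
            # Keep the list-of-lines layout that nbformat 4 files normally use.
            cell["source"] = updated.splitlines(keepends=True)
            return True
    return False


def update_notebook(path, name, value):
    """Load a notebook file, set one constant, and write the file back."""
    path = Path(path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    if not set_constant(notebook, name, value):
        raise KeyError(f"{name} is not assigned in any code cell of {path}")
    path.write_text(
        json.dumps(notebook, indent=1, ensure_ascii=False) + "\n", encoding="utf-8"
    )
```

With the artifact from step 2, the call would be `update_notebook("notebooks/06_sae_replication_crisis.ipynb", "REPORT_PATH", "artifacts/stability_layer0_128.json")`. If the notebook builds the path some other way (for example with `Path(...)` or across several lines), edit the cell manually instead.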

GPT-2 Factual Recall (notebook 07)

# Reproduce the six runs (adjust run IDs after execution)
mech run --name logit-lens-factual
mech run --name direct-logit-attribution-factual
mech run --name attribution-patching-factual
mech run --name circuit-patching-factual
mech run --name polysemanticity-sae-layer9-factual
mech run --name causal-scrubbing-factual

# Then update run ID constants at top of notebook 07
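
The six commands above can also be driven from a short script that stops at the first failure, so a broken run is not silently followed by the rest. This is a sketch only: it assumes `mech` is on the PATH and that the runs can be executed one after another in the listed order (the source does not say whether any run depends on another). `build_commands` and `run_all` are hypothetical names.

```python
import subprocess

# Run names copied from the commands above.
RUN_NAMES = [
    "logit-lens-factual",
    "direct-logit-attribution-factual",
    "attribution-patching-factual",
    "circuit-patching-factual",
    "polysemanticity-sae-layer9-factual",
    "causal-scrubbing-factual",
]


def build_commands(names=RUN_NAMES):
    """Return one `mech run --name <name>` argument list per run."""
    return [["mech", "run", "--name", name] for name in names]


def run_all(commands, runner=subprocess.run):
    """Run commands in order; check=True raises on the first non-zero exit."""
    for argv in commands:
        print("running:", " ".join(argv))
        runner(argv, check=True)
```

Calling `run_all(build_commands())` executes the runs sequentially. The `runner` parameter exists so the loop can be dry-run or tested without invoking `mech`. The run ID constants in notebook 07 still need to be updated afterwards.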