Notebooks

Interactive notebooks live in notebooks/ at the repo root. They are the primary narrative format for investigations — code, charts, and commentary together.

Viewing notebooks

GitHub renders .ipynb files natively. Click any link below to read the notebook in your browser without cloning.

Execution note

These notebooks load artifacts from specific experiment runs (run IDs and artifact paths are hardcoded to the original research environment). To re-run them end-to-end, first reproduce the relevant sweep with mech sweep --execute, then update the artifact path constants at the top of each notebook.
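To locate the constants that need updating, one option is to search the notebook files for upper-case assignments. This is a rough sketch, assuming the constants follow an upper-case naming pattern like `REPORT_PATH` (the only constant named on this page); `find_notebook_constants` is a hypothetical helper, not part of the `mech` CLI.

```shell
# Hypothetical helper: list lines in .ipynb files that look like upper-case
# constant assignments, e.g. REPORT_PATH = "...".
# Assumption: constants are upper-case and sit at the start of a source line,
# which the notebook JSON stores as a quoted string.
find_notebook_constants() {
  # -H prints the file name, -n the line number within the .ipynb JSON.
  # [^=] after the equals sign skips comparisons such as X == 1.
  grep -H -n -E '"[A-Z][A-Z0-9_]* *=[^=]' "$@"
}
```

For example, `find_notebook_constants notebooks/*.ipynb` prints each candidate line with its file name and its line number in the raw JSON, which is not the same as the cell number.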

Notebook index

| Notebook | Investigation | Status |
|---|---|---|
| 06 - SAE Replication Crisis | Investigation #3 | Complete: pairwise cosine heatmaps, distribution histograms, 4-condition comparison |
| 07 - GPT-2 Factual Recall Story | Investigation #2 | Complete: logit lens, DLA, circuit patching, SAE feature cluster |
| 08 - Feature Splitting | Investigation #4 | Complete: 4-size comparison, fidelity bar chart |
| 09 - Refusal Audit | Investigation #1 | Complete: direction quality sweep, CAA compliance plots |
| 10 - SAE at Scale | Investigation #5 | Complete: 2048-feature dictionary, feature cluster analysis |
| 05 - Research Walkthrough | General | Tutorial: walks through a full experiment from spec to artifact |
| 01 - Circuit Patching | General | Tutorial: activation patching basics |
| 02 - Polysemanticity SAE | General | Tutorial: train and inspect a small SAE |
| 03 - ACDC Lite | General | Tutorial: automated circuit discovery |
| 04 - Agentic Loop | General | Tutorial: orchestration and iterate loop |

Reproducing a notebook from scratch

SAE Replication Crisis (notebook 06)

# 1. Train 5 SAEs at layer 0
mech sweep \
  --base experiments/polysemanticity.yaml \
  --axis "parameters.seed=1,2,3,4,5" \
  --output experiments/sweeps/sae_seed_stability_layer0.yaml \
  --execute

# 2. Generate the stability report
mech analyze-sae-stability \
  --sweep experiments/sweeps/sae_seed_stability_layer0.yaml \
  --output artifacts/stability_layer0_128.json \
  --live-only

# 3. Update REPORT_PATH in notebook cell 1 to point at your new artifact
# 4. Once REPORT_PATH is updated, re-execute the notebook in place:
#    jupyter nbconvert --execute --to notebook --inplace notebooks/06_sae_replication_crisis.ipynb
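
Before re-executing the notebook, it can help to confirm that the stability report from step 2 was actually written. This is a minimal sketch, assuming a POSIX shell; `check_report` is a hypothetical helper, not part of the `mech` CLI, and it checks only that the file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper: succeed only if the report file exists and is non-empty.
check_report() {
  if [ ! -s "$1" ]; then
    echo "missing or empty report: $1" >&2
    return 1
  fi
  echo "report found: $1"
}
```

Running `check_report artifacts/stability_layer0_128.json` before the `jupyter nbconvert` command would catch a failed or skipped step 2 early, rather than partway through notebook execution.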

GPT-2 Factual Recall (notebook 07)

# Reproduce the six runs (adjust run IDs after execution)
mech run --name logit-lens-factual
mech run --name direct-logit-attribution-factual
mech run --name attribution-patching-factual
mech run --name circuit-patching-factual
mech run --name polysemanticity-sae-layer9-factual
mech run --name causal-scrubbing-factual

# Then update the run ID constants at the top of notebook 07
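
The six runs above can also be wrapped in a loop that stops at the first failure, so a broken run is not followed by runs whose results would be incomplete. This is a sketch under one assumption that this page does not confirm: that `mech run` exits with a non-zero status when a run fails. `run_factual_recall` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: run the six factual-recall experiments in order and
# stop at the first failure.
# Assumption: `mech run --name <name>` exits non-zero when a run fails.
run_factual_recall() {
  for name in \
    logit-lens-factual \
    direct-logit-attribution-factual \
    attribution-patching-factual \
    circuit-patching-factual \
    polysemanticity-sae-layer9-factual \
    causal-scrubbing-factual
  do
    echo "running: $name"
    if ! mech run --name "$name"; then
      echo "run failed: $name" >&2
      return 1
    fi
  done
}
```

Calling `run_factual_recall` after defining it runs the experiments in the order listed. The run IDs still need to be copied into notebook 07 by hand afterwards.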