technical-ai-safety
an archive of posts with this tag
| Date | Post |
|---|---|
| Oct 08, 2025 | Capability-based evals for chain-of-thought monitorability |
| Jun 20, 2025 | Early ideas for mechanistically investigating emergent misalignment |
| Jun 17, 2025 | Circuit tracing for chain-of-thought |
| Jun 07, 2025 | Cross-layer transcoders in a battery foundation model |