Automated Interpretability

The automated-interpretability repository implements tools and pipelines for automatically generating, simulating, and scoring explanations of neuron (or latent feature) behavior in neural networks. Rather than relying solely on manual, ad hoc probing, it aims to scale interpretability with algorithmic methods that produce candidate explanations and assess their quality. Its “neuron explainer” component takes a target neuron or latent feature and proposes a natural language explanation or heuristic (e.g. “this neuron activates when the input has property X”), then simulates the activations that explanation predicts across example inputs and compares them with the real activations to test whether the explanation holds. The project also contains a “neuron viewer” web component for interactively browsing neurons, explanations, and activation patterns.

Features

A neuron explainer module that proposes natural language or rule-based explanations for neuron/latent feature behavior
Simulation and scoring of explanations by comparing predicted activations against true activations across inputs
A neuron viewer UI to browse neurons, see activations, and inspect explanations
Demo notebooks illustrating how explanations are generated and evaluated (e.g. explain_puzzles.ipynb)
Infrastructure for activation capture and analysis (e.g. modules like activations.py)
Ranking and scoring heuristics for deciding which explanations are more faithful or useful
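
The simulate-then-score idea in the features above can be illustrated with a small, self-contained sketch. It does not use the repository's own API: the names `pearson`, `keyword_simulator`, and `rank_explanations` are hypothetical, the "simulator" is a toy keyword matcher standing in for a model-based one, and Pearson correlation between simulated and true activations is assumed here as the scoring rule.

```python
from __future__ import annotations

from math import sqrt
from typing import Callable, Sequence

# Hypothetical type: a simulator maps tokens to predicted activations.
Simulator = Callable[[Sequence[str]], list]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def keyword_simulator(keywords: set) -> Simulator:
    """Toy stand-in for a simulator: predicts 1.0 on listed words, else 0.0."""
    def simulate(tokens: Sequence[str]) -> list:
        return [1.0 if tok.lower() in keywords else 0.0 for tok in tokens]
    return simulate


def rank_explanations(candidates: dict, tokens: Sequence[str],
                      true_activations: Sequence[float]) -> list:
    """Score each candidate explanation and sort from best to worst."""
    scored = [(text, pearson(simulate(tokens), true_activations))
              for text, simulate in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up example data: per-token activations of an imaginary neuron.
tokens = ["The", "cat", "sat", "on", "the", "dog"]
true_activations = [0.0, 0.9, 0.1, 0.0, 0.0, 0.8]

candidates = {
    "activates on animal words": keyword_simulator({"cat", "dog"}),
    "activates on the word 'the'": keyword_simulator({"the"}),
}

ranking = rank_explanations(candidates, tokens, true_activations)
for text, score in ranking:
    print(f"{score:+.2f}  {text}")
```

In this toy data the "animal words" explanation ranks first because its predicted activations track the true ones closely, while the competing explanation correlates negatively. A real pipeline would replace the keyword matcher with a simulator driven by the natural language explanation itself.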

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Artificial Intelligence Software, Python Large Language Models (LLM)

Registered

2025-10-03

Code for the paper “Language models can explain neurons in language models.”
