Work overview

This page provides an overview of CAIS's AI safety research and field-building projects.


Research

We conduct impactful research on AI safety.

CAIS conducts technical and conceptual research. Our team develops benchmarks and methods designed to improve the safety of existing systems. We prioritize transparency and accessibility, publishing our findings at top conferences and sharing our resources with the global community.

Learn More

Field-building

We are building the AI safety research field.

CAIS builds infrastructure and pathways into AI safety. We empower researchers with compute resources, funding, and educational materials while organizing workshops and competitions to promote safety research. Our goal is to create a thriving research ecosystem that will drive progress toward safe AI.

Learn More

Featured Research:

Representation Engineering: A Top-Down Approach to AI Transparency

Andy Zou, Long Phan*, Sarah Chen*, James Campbell*, Phillip Guo*, Richard Ren*, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

View Research


Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

Alexander Pan*, Chan Jun Shern*, Andy Zou*, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks

View Research


Scaling Out-of-Distribution Detection for Real-World Settings

Dan Hendrycks*, Steven Basart*, Mantas Mazeika, Andy Zou, Joe Kwon, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song

View Research


PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

Dan Hendrycks*, Andy Zou*, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, Jacob Steinhardt

View Research

Featured Field-building Projects:

SafeBench

The SafeBench competition stimulates research on new benchmarks that assess and reduce risks from artificial intelligence. We are providing $250,000 in prizes: five $20,000 prizes and three $50,000 prizes for the top benchmarks.

View Project

Statement on AI Risk

Hundreds of AI experts and public figures expressed their concern about AI risk in this open letter, which was covered globally by publications including The New York Times, The Wall Street Journal, and The Washington Post.

View Project

Compute Cluster

To support progress and innovation in AI safety, we offer researchers free access to our compute cluster, which supports training and running large-scale AI systems.

View Project

Philosophy Fellowship

The CAIS Philosophy Fellowship is a seven-month research program that investigates the societal implications and potential risks associated with advanced AI.

View Project

ML Safety Course

The ML Safety course offers a comprehensive introduction to ML safety, covering topics such as anomaly detection, alignment, and risk engineering.

View Project