Work overview

This page provides an overview of CAIS's AI safety research and field-building projects.


Research

We conduct impactful research on AI safety.

CAIS conducts technical and conceptual research. Our team develops benchmarks and methods designed to improve the safety of existing systems. We prioritize transparency and accessibility, publishing our findings at top conferences and sharing our resources with the global community.

Learn More

Field-building

We are building the AI safety research field.

CAIS builds infrastructure and pathways into AI safety. We empower researchers with compute resources, funding, and educational materials while organizing workshops and competitions to promote safety research. Our goal is to create a thriving research ecosystem that will drive progress toward safe AI.

Learn More

Featured Research:

Representation Engineering: A Top-Down Approach to AI Transparency

Andy Zou, Long Phan*, Sarah Chen*, James Campbell*, Phillip Guo*, Richard Ren*, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

View Research


Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

Alexander Pan*, Chan Jun Shern*, Andy Zou*, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks

View Research


Scaling Out-of-Distribution Detection for Real-World Settings

Dan Hendrycks*, Steven Basart*, Mantas Mazeika, Andy Zou, Joe Kwon, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song

View Research


PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

Dan Hendrycks*, Andy Zou*, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, Jacob Steinhardt

View Research

Featured Field-building Projects:

SafeBench

The SafeBench competition stimulates research on new benchmarks that assess and reduce risks from artificial intelligence. We are providing $250,000 in prizes: five $20,000 prizes and three $50,000 prizes for the top benchmarks.

View Project

Statement on AI Risk

Hundreds of AI experts and public figures expressed their concern about AI risk in this open letter, which was covered globally by publications including The New York Times, The Wall Street Journal, and The Washington Post.

View Project

Compute Cluster

To support progress and innovation in AI safety, we offer researchers free access to our compute cluster, which supports training and running large-scale AI systems.

View Project

Philosophy Fellowship

The CAIS Philosophy Fellowship is a seven-month research program that investigates the societal implications and potential risks associated with advanced AI.

View Project

ML Safety Course

The ML Safety course offers a comprehensive introduction to ML safety, covering topics such as anomaly detection, alignment, and risk engineering.

View Project