CAIS Compute Cluster

The Center for AI Safety is providing compute resources for ML safety research.

What is the CAIS Compute Cluster?

The Center for AI Safety runs an initiative to provide free compute support for research projects in ML safety. We offer access to a GPU cluster with:

  • 256 A100 GPUs with 80GB memory
  • 1,600 Gbit/s inter-node network speeds
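
For illustration, the short Python sketch below shows how a job running on one of these nodes might confirm which GPUs it can see. This is a hypothetical example assuming a PyTorch installation with CUDA support; it is not official usage documentation for the cluster.

    # Hypothetical sketch: list the GPUs visible to this process.
    # Assumes PyTorch with CUDA support; the cluster's actual setup may differ.
    import torch

    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            # On an A100 80GB node this prints e.g. "GPU 0: NVIDIA A100-SXM4-80GB, 80 GB"
            print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")
    else:
        print("No CUDA devices are visible to this process.")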

We hope this initiative enables researchers to pursue novel lines of research that would otherwise be infeasible.

Any questions can be directed to compute@safe.ai

Who is Eligible for Access?

  • The CAIS Compute Cluster is specifically designed for researchers working on the safety of machine learning systems. For a non-exhaustive list of topics we are excited about, see Unsolved Problems in ML Safety or the ML Safety Course. We may also consider other research areas if appropriate justification of their impact on ML safety is provided.
  • We are particularly excited to support work on LLM adversarial robustness and transparency [1, 2], and may give preference to those proposals.
  • Work that improves general capabilities, or that improves safety as a consequence of improving general capabilities, is not in scope. “General capabilities” of AI refers to concepts such as a model’s accuracy on typical tasks, sequential decision-making abilities in typical environments, reasoning abilities on typical problems, and so on.
  • The current application process is primarily intended for professors.

Application Process

We expect to process applications and allocate the majority of computing resources around three application deadlines in February, June and October each year. For the current cycle, the deadline will be February 9th. Later applications may be considered if sufficient resources remain available.

Instructions on what your proposal should contain, along with other required details, can be found in the application form. Project proposals can be brief; in many cases, we expect that 250 words will be sufficient.

Applicants will need to specify how long they require access for their project, up to a maximum initial term of 4 months. Users of the cluster may request to extend their access at the end of this term, provided suitable progress is demonstrated.

Proposals will be assessed based on the following criteria:

  • Impact: If the proposed questions are successfully answered, how valuable would this be for improving the safety of AI systems?
  • Feasibility: How likely is it that this project will be successfully executed? Does the research team involved have the right experience, skills and track record to execute it successfully?
  • Risks: How likely is it that this project could contribute to potentially harmful outcomes such as accelerating the development of more generally capable systems? How are such risks being managed?

Our Collaborators

We support leading experts in a diverse range of ML safety research directions, some of whom are listed below.

Matthias Hein

Professor of Machine Learning, University of Tübingen

Jinwoo Shin

Professor of AI, Korea Advanced Institute of Science & Technology (KAIST)

Dawn Song

Professor of Computer Science, University of California, Berkeley

David Wagner

Professor of Computer Science, University of California, Berkeley

Percy Liang

Associate Professor of Computer Science, Stanford University

Scott Niekum

Associate Professor of Computer Science, University of Massachusetts Amherst

David Bau

Assistant Professor of Computer Science, Northeastern Khoury College

Robin Jia

Assistant Professor of Computer Science, University of Southern California

Bo Li

Assistant Professor of Computer Science, University of Illinois at Urbana-Champaign

Sergey Levine

Assistant Professor of Computer Science, UC Berkeley

Carl Vondrick

Assistant Professor of Computer Science, Columbia University

Florian Tramèr

Assistant Professor of Computer Science, ETH Zurich

Yizheng Chen

Assistant Professor of Computer Science, University of Maryland

Graham Neubig

Assistant Professor of Computer Science, Carnegie Mellon University

Cihang Xie

Assistant Professor of Computer Science, UC Santa Cruz

Research Produced Using the CAIS Compute Cluster

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson

Publication link

Under review for conference

Representation Engineering: A Top-Down Approach to AI Transparency

Andy Zou, Long Phan*, Sarah Chen*, James Campbell*, Phillip Guo*, Richard Ren*, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

Publication link

Under review for conference

Defending Against Transfer Attacks From Public Models

Chawin Sitawarin, Jaewon Chang*, David Huang*, Wesson Altoyan, David Wagner

Publication link

Under review for conference

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Boxin Wang*, Weixin Chen*, Hengzhi Pei*, Chulin Xie*, Mintong Kang*, Chenhui Zhang*, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li*

Publication link

Under review for conference

Continuous Learning for Android Malware Detection

Yizheng Chen, Zhoujie Ding, David Wagner

Publication link

Under review for conference

Robust Semantic Segmentation: Strong Adversarial Attacks and Fast Training of Robust Models

Francesco Croce*, Naman D Singh*, Matthias Hein

Publication link

Under review for conference

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

Alexander Pan*, Jun Shern Chan*, Andy Zou*, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks

Publication link

Under review for conference

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Lorenzo Pacchiardi*, Alex J. Chan*, Sören Mindermann, Ilan Moscovitz, Alexa Y. Pan, Yarin Gal, Owain Evans, Jan Brauner*

Publication link

Under review for conference

Out-of-context meta-learning in Large Language Models

David Krueger*, Dmitrii Krasheninnikov*, Egor Krasheninnikov

Publication link

Under review for conference

Taken out of context: On measuring situational awareness in LLMs

Lukas Berglund*, Asa Cooper Stickland*, Mikita Balesni*, Max Kaufmann*, Meg Tong*, Tomasz Korbak, Daniel Kokotajlo, Owain Evans

Publication link

Under review for conference

DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection

Yizheng Chen, Zhoujie Ding, Lamya Alowain, Xinyun Chen, David Wagner

Publication link

Under review for conference

Multi-scale Diffusion Denoised Smoothing

Jongheon Jeong, Jinwoo Shin

Publication link

Under review for conference

Testing Robustness Against Unforeseen Adversaries

Max Kaufmann*, Daniel Kang*, Yi Sun*, Steven Basart, Xuwang Yin, Mantas Mazeika, Akul Arora, Adam Dziedzic, Franziska Boenisch, Tom Brown, Jacob Steinhardt, Dan Hendrycks

Publication link

Under review for conference

Unified Concept Editing in Diffusion Models

Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, David Bau

Publication link

Under review for conference

Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics

Haoqin Tu*, Bingchen Zhao*, Chen Wei, Cihang Xie

Publication link

Under review for conference

Seek and You Will Not Find: Hard-To-Detect Trojans in Deep Neural Networks

Mantas Mazeika, Andy Zou, Akul Arora, Pavel Pleskov, Dawn Song, Dan Hendrycks, Bo Li, David Forsyth

Publication link

Under review for conference

Linearity of Relation Decoding in Transformer Language Models

Evan Hernandez*, Arnab Sen Sharma*, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov, David Bau

Publication link

Under review for conference

BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning

Xuan Chen, Wenbo Guo, Guanhong Tao, Xiangyu Zhang, Dawn Song

Publication link

Under review for conference

Benchmarking Neural Network Proxy Robustness to Optimisation Pressure

Andy Zou, Long Phan, Nathaniel Li, Jun Shern Chan, Mantas Mazeika, Aidan O’Gara, Steven Basart, Jonathan Ng, Scott Emmons, Zico Kolter, Matt Fredrikson, Dan Hendrycks

Publication link

Under review for conference

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

Marwa Abdulhai, Isadora White, Charlie Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine

Publication link

Under review for conference

Future Lens: Anticipating Subsequent Tokens from a Single Hidden State

Koyena Pal, Jiuding Sun, Andrew Yuan, Byron C. Wallace, David Bau

Publication link

Under review for conference

Can LLMs Follow Simple Rules?

Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Dan Hendrycks, David Wagner

Publication link

Under review for conference

Function Vectors in Large Language Models

Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau

Publication link

Under review for conference

D^3: Detoxing Deep Learning Dataset

Lu Yan, Siyuan Cheng, Guangyu Shen, Guanhong Tao, Kaiyuan Zhang, Xuan Chen, Yunshu Mao, Xiangyu Zhang

Publication link

Under review for conference

Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models

Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan

Publication link

Under review for conference

Universal Jailbreak Backdoors from Poisoned Human Feedback

Javier Rando, Florian Tramèr

Publication link

Under review for conference

Generalization Analogies (GENIES): A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains

Joshua Clymer, Garrett Baker, Rohan Subramani, Sam Wang

Publication link

Under review for conference

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans

Publication link

Under review for conference