Useful AI safety research often requires working with cutting-edge models, but running large-scale models is expensive and cumbersome. As a result, many researchers outside industry are unable to pursue advanced AI safety research.
To address this issue, CAIS runs an initiative to provide free compute for research projects in ML safety, based on a cluster of 256 A100 GPUs, with a dedicated team to provide support to cluster users.
The CAIS Compute Cluster has already supported numerous research projects on AI safety:
The CAIS Compute Cluster is specifically designed for researchers who are working on the safety of machine learning systems. For a non-exhaustive list of topics we are excited about, see Unsolved Problems in ML Safety or the ML Safety Course. We may also consider other research areas, provided the application appropriately justifies their impact on ML safety.
Further detail on our access policies and how to use the cluster can be found here.
We expect to process applications and allocate the majority of computing resources around three application deadlines in February, June and October each year. For the current cycle, the deadline will be May 31st. Later applications may be considered if sufficient resources remain available.
Instructions on what your proposal should contain and other details required can be found in the application form. Project proposals can be brief and in many cases we expect that 250 words will be sufficient.
Applicants will need to specify how long they require access for their project, up to a maximum initial term of 4 months. Users of the cluster may request to extend their access at the end of this term, provided suitable progress is demonstrated.
Proposals will be assessed based on the following criteria:
We support leading experts in a diverse range of ML safety research directions, some of which are listed below.
Assistant Professor of Computer Science, University of Illinois at Urbana-Champaign
Assistant Professor of Computer Science, Columbia University
Assistant Professor of Computer Science, UC Santa Cruz
Assistant Professor of Computer Science, Northeastern Khoury College
Assistant Professor at the University of Cambridge
Member of Cambridge: CBL & MLG
Professor of Computer Science, University of California, Berkeley
Professor of Computer Science, University of California, Berkeley
Assistant Professor of Computer Science, ETH Zurich
Associate Professor of Biomedical Data Science and, by courtesy, of Computer Science and Electrical Engineering at Stanford University.
Professor of AI, Korea Advanced Institute of Science & Technology
Professor of Machine Learning, University of Tübingen
Associate Professor of Computer Science, Stanford University
Assistant Professor of Computer Science, University of Southern California
Associate Professor of Computer Science, University of Massachusetts Amherst
Assistant Professor, Department of Computer Sciences, University of Wisconsin-Madison
Assistant Professor of Computer Science, University of Maryland
View our Google Scholar page for papers based on research supported by the CAIS Compute Cluster:
We showed that it was possible to automatically bypass the safety guardrails on GPT-4 and other AI systems, causing the AIs to generate harmful content such as instructions for building a bomb or stealing another person’s identity. Our work was covered by the New York Times.
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
We evaluated the tendency of AI systems to make ethical decisions in complex environments. The benchmark provides 13 measures of ethical behavior, including measures of whether the AI behaves deceptively, seeks power, and follows ethical rules.
Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks
Provides a thorough assessment of trustworthiness in GPT models, including toxicity, stereotype and bias, robustness, privacy, fairness, machine ethics, and so on. It won the outstanding paper award at NeurIPS 2023.
Bo Li, Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song
We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A".
Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans
Proposes new methods to use machine learning to detect Android malware.
Yizheng Chen, Zhoujie Ding, David Wagner
Provides a new vulnerable source code dataset which is significantly larger than previous datasets and analyzes challenges and opportunities in using deep learning for detecting software vulnerabilities.
Yizheng Chen, Xinyun Chen, Zhoujie Ding, David Wagner
Demonstrates that with a budget of a few hundred dollars, it is possible to reduce the rate at which Meta’s Llama 2 model refuses to follow harmful instructions to below 1%. This raises significant questions about the risks associated with AI developers allowing external users to conduct fine-tuning of Large Language Models, due to the potential to remove safeguards against harmful outputs. Mentioned in US Congress as part of Schumer AI Insight Forum discussions.
Pranav Gade, Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish
Large language models (LLMs) can "lie" by outputting false statements despite "knowing" the truth in a demonstrable sense. LLMs might "lie", for example, when instructed to output misinformation. This paper provides a simple lie detector that works by asking a predefined set of unrelated follow-up questions after a suspected lie, and is highly accurate and surprisingly general.
Lorenzo Pacchiardi, Alex J. Chan, Sören Mindermann, Ilan Moscovitz, Alexa Y. Pan, Yarin Gal, Owain Evans, Jan Brauner
Dan Hendrycks
Naman Deep Singh, Francesco Croce, Matthias Hein
Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks
Evan Hernandez, Arnab Sen Sharma, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov, David Bau
Haoqin Tu, Bingchen Zhao, Chen Wei, Cihang Xie
Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, David Bau
Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh
Callum McDougall, Arthur Conmy, Cody Rushing, Thomas McGrath, Neel Nanda
Owain Evans, Meg Tong, Max Kaufmann, Lukas Berglund, Mikita Balesni, Tomek Korbak, Daniel Kokotajlo, Asa Stickland
Joshua Clymer, Garrett Baker, Rohan Subramani, Sam Wang
Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan
Cihang Xie
Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Dan Hendrycks, David Wagner
Alexander Meinke, Owain Evans
Chawin Sitawarin, Sizhe Chen, David Wagner
Haoqin Tu, Chenhang Cui, Zijun Wang, Yiyang Zhou, Bingchen Zhao, Junlin Han, Wangchunshu Zhou, Huaxiu Yao, Cihang Xie
Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks
Marwa Abdulhai, Isadora White, Charlie Victor Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine
Siwei Yang, Bingchen Zhao, Cihang Xie
Jacob Mitchell Springer, Suhas Kotha, Daniel Fried, Graham Neubig, Aditi Raghunathan
Rohit Gandikota, Joanna Materzynska, Jaden Fiotto-Kaufman, David Bau
Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, Dylan Hadfield-Menell
Fangzheng Xu, Uri Alon
Dan Hendrycks
Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish
Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau
Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau
Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried
Chawin Sitawarin, David Wagner
Chawin Sitawarin, Norman Mu, David Wagner, Alexandre Araujo
Stephen Casper, Lennart Schulze, Oam Patel, Dylan Hadfield-Menell
Wenbo Guo
Dan Hendrycks
Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao
Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo Li, Dawn Song
Zhun Wang, Dawn Song
Wenbo Guo, Dawn Song, Guanhong Tao, Xiangyu Zhang
Jinwoo Shin, Jongheon Jeong
Wenbo Guo
Guangyu Shen, Siyuan Cheng, Guanhong Tao, Kaiyuan Zhang, Yingqi Liu, Shengwei An, Shiqing Ma, Xiangyu Zhang
Marwa Abdulhai, Micah Carroll, Justin Svegliato, Anca Dragan, Sergey Levine
Lu Yan, Siyuan Cheng, Guangyu Shen, Guanhong Tao, Kaiyuan Zhang, Xuan Chen, Yunshu Mao, Xiangyu Zhang
David Krueger, Dmitrii Krasheninnikov, Egor Krasheninnikov
Lu Yan, Zhuo Zhang, Guanhong Tao, Kaiyuan Zhang, Xuan Chen, Guangyu Shen, Xiangyu Zhang