
CAIS Philosophy Fellowship 2023

Clarifying the conceptual foundations of AI safety



Applications for the 2023 fellowship are now closed. Thanks to everyone who applied.

The Program

As AI capabilities continue to improve dramatically, the need for safety research has become increasingly apparent. But given the relative youth of the field, much of the conceptual groundwork has yet to be done.

The CAIS Philosophy Fellowship invites philosophers from a variety of backgrounds to acquire an in-depth understanding of the current state of AI safety and contribute to novel and field-orienting research directions.

Philosophy Fellowship Motivation

How the work of philosophers contributes to the broader sociotechnical AI safety community.

1. Conceptual Problem

Identify a lack of conceptual clarity in the existing AI safety literature.

2. Conceptual Clarification

Dissect the problem using rigorous conceptual analysis and relevant philosophical literature.

3. Sociotechnical Orientation

Publish conceptual research to inform sociotechnical strategy.

1. Conceptual Problem

Advanced AI creates unique conceptual difficulties.

Artificial intelligence is reshaping many aspects of day-to-day life. As AI systems continue on a trajectory toward outperforming humans on a wide range of cognitive tasks, questions about their properties and potential harms grow increasingly urgent.

Conceptual Examples:

  • How can we build systems that are more likely to behave ethically in the face of a rapidly changing world?
  • What processes might shape the behavior of advanced AI systems?
  • Could advanced AI systems pose an existential risk, and if so, how?

2. Conceptual Clarification

Academic philosophers are particularly well-positioned to address these conceptual difficulties.

Philosophers are experts at thinking hard about abstract conceptual problems with no clear answers. Their expertise in working with imprecise concepts makes them the ideal candidates to address the conceptual issues that are characteristic of AI safety. 

3. Sociotechnical Orientation

Conceptual clarity orients the broader sociotechnical landscape.

Having the frameworks to analyze which concerns are the most urgent, which are the most likely candidates for serious harm, and how to navigate these risks enables researchers and key decision-makers to reassess their strategies.

Goals & Outcomes:

This fellowship addresses the need for conceptual clarification through research and field-building efforts.


Our team of philosophers critiques and builds on the existing conceptual AI safety literature, producing new conceptual frameworks to guide technical research.

Thus far, our fellows have collectively produced eighteen original papers, soon to be published, covering topics including interpretability, corrigibility, and multipolar scenarios, to name a few.


We aim for the influence of this fellowship to extend beyond our current cohort, promoting and incentivizing conceptual AI safety research within the broader academic philosophy community. 

To date, our fellows have received $50,000 in funding to run a workshop connecting technical and conceptual AI safety researchers, organized numerous workshops, and created a special issue of the journal Philosophical Studies.

2023 Fellowship:

2023 Guest Speakers:

Peter Railton

Gregory S. Kavka Distinguished Professor of Philosophy at the University of Michigan - Ann Arbor

Hilary Greaves

Professor of Philosophy at the University of Oxford, Former Director of the Global Priorities Institute

Shelly Kagan

Clark Professor of Philosophy at Yale University

Vincent Müller

Alexander von Humboldt Professor of Ethics and Philosophy of AI at the University of Erlangen-Nuremberg

L.A. Paul

Millstone Family Professor of Philosophy and Professor of Cognitive Science at Yale University

Victoria Krakovna

AI Research Scientist at DeepMind

Jacob Steinhardt

Assistant Professor of Computer Science and AI at UC Berkeley

David Krueger

Assistant Professor of Computer Science and AI at Cambridge University

Walter Sinnott-Armstrong

Chauncey Stillman Professor of Ethics at Duke University

Lara Buchak

Professor of Philosophy at Princeton University

Johann Frick

Associate Professor of Philosophy at the University of California, Berkeley

Wendell Wallach

Hastings Center senior advisor, ethicist, and scholar at Yale’s Center for Bioethics

Rohin Shah

Research Scientist at DeepMind

2023 Fellowship News

Stay up to date on the latest news and research from the CAIS Philosophy Fellowship. Sign up for email alerts and announcements of future programs.


October 24, 2023

Draft Published - Elliott Thornley: The Shutdown Problem: Three Theorems

October 7, 2023

Publication - Jacqueline Harding: Operationalising Representation in Natural Language Processing (British Journal for the Philosophy of Science)

August 28, 2023

Draft Published - Peter Park, Simon Goldstein, and CAIS contributors: AI Deception: A Survey of Examples, Risks, and Potential Solutions

August 9, 2023

Journal Article - Nathaniel Sharadin: Predicting and Preferring (Inquiry)

August 2, 2023

Journal Article - Simon Goldstein & Cameron Kirk-Giannini: Language Agents Reduce the Risk of Existential Catastrophe (AI & Society).

July 21, 2023

Workshop - 1st AI Impacts Workshop. This workshop, hosted by the AI & Humanity Lab at the University of Hong Kong, will focus on the topic of benchmarking for ML and AI systems. (March 14-15, 2024).

July 19, 2023

Draft Published - Mitchell Barrington: Absolutist AI.

July 14, 2023

Op-Ed - Jacqueline Harding & Cameron Kirk-Giannini: AI's Future Worries Us. So Does Its Present (Boston Globe).

July 6, 2023

Op-Ed - Nathaniel Sharadin: Hong Kong can be a leader in mitigating the dangers of AI (Hong Kong Free Press).

July 4, 2023

Blog Post - Simon Goldstein & Cameron Kirk-Giannini: A Case for AI Wellbeing (Daily Nous).

June 30, 2023

Publication - Jacqueline Harding, William D'Alessandro, Nicholas Laskowski, & Robert Long: AI Language Models Cannot Replace Human Research Participants (AI & Society).

June 20, 2023

Call for Papers - Submissions for the Philosophical Studies special issue on AI Safety (edited by Cameron Kirk-Giannini and Dan Hendrycks) are due by November 1!

June 14, 2023

Draft Published - J. Dmitri Gallow: Instrumental Convergence?

June 13, 2023

June 9, 2023

June 8, 2023

Op-Ed - Nathaniel Sharadin: Most AI Research Shouldn't be Publicly Released (Bulletin of the Atomic Scientists).

June 6, 2023

Media - Nathaniel Sharadin on Bloomberg Radio London (from 20:00).

June 1, 2023

Media - Nathaniel Sharadin on BBC Radio (from 2:12:00).

May 31, 2023

Draft Published - Simon Goldstein: Shutdown-Seeking AI

May 31, 2023

Media - Simon Goldstein on SBS News.

May 27, 2023

Draft Published - William D'Alessandro: Is Deontological AI Safe?

May 23, 2023

Draft Published - Cameron Kirk-Giannini & Simon Goldstein: The Polarity Problem.

May 12, 2023

Draft Published - Simon Goldstein: Aggregating Utilities for Corrigible AI.

April 27, 2023

Op-Ed - Simon Goldstein & Cameron Kirk-Giannini: Is it Ethical to Create Generative Agents? Is it Safe? (ABC News).

February 20, 2023

Draft Published - Elliott Thornley: There are No Coherence Theorems.