Is there a tension between AI safety and AI welfare?

Abstract

The field of AI safety considers whether and how AI development can be safe and beneficial for humans and other animals, and the field of AI welfare considers whether and how AI development can be safe and beneficial for AI systems. There is a prima facie tension between these projects, since some measures in AI safety, if deployed against humans and other animals, would raise questions about the ethics of constraint, deception, surveillance, alteration, suffering, death, disenfranchisement, and more. Is there in fact a tension between these projects? We argue that, considering all relevant factors, there is indeed a moderately strong tension—and it deserves more examination. In particular, we should devise interventions that can promote both safety and welfare where possible, and prepare frameworks for navigating any remaining tensions thoughtfully.

1 Introduction

The field of AI safety considers whether and how AI development can be safe and beneficial for humanity. As AI systems become more capable and widely deployed, they have the potential to produce many benefits for our species, but they also have the potential to impose many harms on our species. AI systems are already creating or amplifying threats to privacy, fairness, communication, and democratic deliberation,Footnote 1 and many leading researchers in both industry and academia now worry that, in the near future, advanced AI systems could pose catastrophic or existential risks for our species as well. Work in AI ethics and AI safety aim to protect humanity from these real and potential harms.

Meanwhile, the field of AI welfare considers whether and how AI development can be safe and beneficial for AI systems. Many philosophers believe that AI systems—like any entities—will be welfare subjects and moral patients if and when they develop sentience, consciousness, agency, or other such capacities.Footnote 2 Many philosophers, scientists, AI researchers, and other experts also believe that there is a realistic possibility that some AI systems will be sentient, conscious, agentic, or otherwise morally significant in the near future, and that this possibility deserves serious consideration now.Footnote 3 Leading AI companies have also announced steps to better understand and address AI welfare concerns.Footnote 4

There is a potential tension between these projects, since some measures in AI safety, if deployed against humans and other animals, would raise questions about the ethics of constraint, deception, surveillance, alteration, suffering, death, disenfranchisement, and more. Is there in fact a tension between these projects? We argue that, considering all relevant factors, there is indeed a moderately strong tension—and it deserves more examination.Footnote 5 The precise nature and extent of this tension depends on a wide range of descriptive and normative questions. The complexity of this issue makes it essential that we seek AI safety methods that respect AI welfare, and vice versa, while carefully navigating any remaining tradeoffs.

Before we begin, we should note several features of our discussion.

First, our aim in this paper is not to defend the importance of AI safety or AI welfare, or to discuss these risks for current AI systems like GPT-4. While we and others discuss the importance of AI safety and AI welfare elsewhere, this paper proceeds from the assumption that AI safety and AI welfare are both important and examines how these projects interact. And while current AI systems like GPT-4 have impressive capabilities, this paper looks beyond these systems towards potential near-future AI systems with more advanced capacities for perception, attention, learning, memory, planning, and self-awareness, since tensions between AI safety and AI welfare will be more likely for such systems.Footnote 6

Second, our aim is not to defend any concrete, specific solutions for resolving potential tensions between AI safety and AI welfare. The details of such solutions will depend on many issues that are beyond the scope of this paper.Footnote 7 Instead, we motivate the idea that there is likely no simple, straightforward solution (other than, perhaps, a coordinated pause on the development and deployment of advanced AI systems), and we close by emphasizing the value of conducting further research on this topic, seeking co-beneficial solutions for AI safety and AI welfare where possible, and prioritizing thoughtfully where necessary. We hope that this discussion can lay the groundwork for other, more concrete and specific discussions over time.

Third, we stress that AI safety measures are not unique in creating or amplifying risks for AI welfare. The development and deployment of AI systems in general create and amplify these risks; research progress, profit-seeking behavior, and increased efficiency demands may all come at the expense of AI welfare, even setting safety aside. So, the fact that there may be tensions between AI safety and AI welfare does not entail that simply developing and deploying AI systems without safety measures would improve their welfare. We focus on AI safety in this paper because this field explicitly aims to mitigate risks associated with AI, which makes the potential conflict with AI welfare particularly stark.Footnote 8

Section 2, which constitutes most of the paper, surveys a variety of common AI safety measures. In each case we note that this measure would raise moral questions if deployed against humans or other animals, and we ask whether this measure will raise similar moral questions if deployed against potentially morally significant AI systems. As we will see, there are no easy answers to these questions, since a lot depends on further questions. These questions concern not only the ethics of constraint, deception, surveillance, alteration, suffering, death, disenfranchisement, and other such interactions, but also whether particular measures will in fact be bad for AI systems and in fact be necessary for AI safety in particular cases.

Section 3 then explores the implications of our survey for the recent proposal that we can resolve tensions between AI safety and AI welfare by creating willing AI servants.Footnote 9 We suggest that matters are not so simple, since we plausibly need to strike a balance between, on the one hand, allowing AI systems to revise their values in a maximally open-ended way and, on the other hand, requiring AI systems to value nothing more than service for humans and other animals. Finally, Sect. 4 closes by suggesting next steps on this topic: conducting integrative research on AI safety and AI welfare, seeking co-beneficial solutions where possible, and prioritizing thoughtfully where necessary.

2 Potential tensions for AI safety and AI welfare

This section examines potential tensions between AI safety and AI welfare raised by common AI safety measures. These measures involve limiting how AI systems can behave, limiting the information they can access, monitoring their cognition and behavior, altering their cognition and behavior, training them through reinforcement learning, preparing to shut them down if they appear dangerous, and maintaining control of decisions that affect them. These measures would all raise moral questions if deployed against humans or other animals. Will similar questions arise for AI systems? Answering that question requires examining the ethics of captivity, deception, surveillance, modification, suffering, death, disenfranchisement, and more.Footnote 10

Specifically, each of the following subsections will briefly consider two questions: First, for each of these measures, how might this measure create a tension between AI safety and AI welfare? Second, can we dissolve or resolve this apparent tension by showing that this measure is not bad for AI systems, not necessary for AI safety, or both? In most cases, there are not simple, straightforward answers to these questions. If AI systems were welfare subjects and moral patients, each of these measures would potentially be harmful or wrongful in some cases but not others, depending on the details. Further research will be necessary to determine how real these tensions are—and what to do about them.

As we will see, these issues are difficult in part because they require us to address foundational issues related to moral status, moral theory, consciousness, agency, AI development, human psychology, and more. In considering these issues—in particular, what the minds of AI systems will be like—we try to avoid both excessive anthropocentrism (i.e. the tendency to attribute human-like characteristics to nonhumans even when they lack those characteristics) and excessive anthropodenial (i.e. the tendency to deny human-like characteristics of nonhumans even when they have these characteristics). As we discuss elsewhere, both tendencies can be powerful in this context, and both can lead to substantial harm (Long et al., 2024).

Of course, there might be at least one relatively simple way to reduce tensions between AI safety and AI welfare, at least for now: a coordinated pause on the development and deployment of advanced AI until we understand how to make AI safe and beneficial for all stakeholders, including humans, animals, and—potentially, eventually—AI systems.Footnote 11 Of course, this strategy may not be viable, and even if viable, it may incur significant costs. We assume for simplicity that humanity will in fact develop and deploy AI systems for which (or for whom) these questions arise, but we should keep in mind that developing and deploying such AI systems is a choice that deserves significant ongoing scrutiny.

Before we start our survey, we note that we here use “AI safety” as an umbrella term for a variety of approaches to ensuring that AI can be safe and beneficial for humanity, including but not necessarily limited to AI alignment (ensuring that AI systems pursue intended goals), AI control (ensuring that humans retain control of AI), AI ethics (ensuring that the development and deployment of AI exemplifies principles of respect, compassion, and justice), and AI governance (ensuring that governments establish appropriate laws and regulations for the development and deployment of advanced AI systems). In other contexts, “AI safety” may refer to a subset of such approaches, but in this paper it refers to all of them.

We likewise note that we here use “AI welfare” as an umbrella term for a variety of approaches to ensuring that AI can be safe and beneficial for AI systems, including but not necessarily limited to consequentialist approaches (promoting AI welfare), deontological approaches (respecting AI rights), virtue theoretic approaches (cultivating virtuous attitudes about AI), and care theoretic approaches (cultivating caring relationships with AI). In other contexts, “AI welfare” may refer primarily to consequentialist approaches; this is one of many respects in which discussions of AI welfare and rights resemble discussions of animal welfare and rights. But in this paper “AI welfare” refers to all of these approaches.

Finally, we reiterate that we here focus—as does much work on AI safety and AI welfare—on questions raised by potential near-future systems with advanced capabilities. Such systems could introduce catastrophic risks for humans and other animals over and above risks or harms that they already create or amplify (Khan et al., 2021; Li et al., 2019; Müller & Elliott, 2021; Stahl, 2021). Catastrophic risks include accidents (Amodei et al., 2016), misuse (Anderljung & Hazell, 2023; Sharadin, 2023), and loss of control (Bengio, 2023; Vold & Harris, 2023). These risks could become particularly severe if and when AI systems become highly cognitively capable and able to act on long time horizons with significant agency (Carlsmith, 2022; Ngo et al., 2022).

At the same time, such systems would also be more likely to be welfare subjects and moral patients, and to have complex interests, projects, and/or relationships if they are. And if and when there is at least a realistic possibility that such systems are welfare subjects and moral patients, we will have a responsibility to extend these systems moral consideration, and doing so will not be a trivial matter (Bostrom & Shulman, 2021). We will need to consider more than suffering risks for such systems; we will also need to consider how our creation, use, and destruction of such systems interact with their complex interests, projects, and/or relationships, raising challenging questions for many current measures in AI safety.

2.1 Constraint

One long-discussed measure in AI safety, sometimes called “boxing,” involves the confinement of an AI system to an isolated environment.Footnote 12 Many AI safety experts have argued that boxing is not a reliable or adequate measure for containing sufficiently advanced AI systems (Armstrong, 2007; Yampolskiy, 2011; Yudkowsky, 2011). Still, boxing is often proposed as at least a temporary measure (Babcock et al., 2016, 2017), and it could at least be useful for ensuring safety while testing relatively early AI models (Babcock et al., 2016). There are other measures for constraining AI systems as well, such as measures that can be used to deny AI systems access to resources, tools, or human allies.

If AI systems were welfare subjects or moral patients, boxing and other such measures could raise questions about the ethics of constraint. Such questions can involve the constraint of negative liberty or of positive liberty.Footnote 13 Roughly, we deprive someone of negative liberty when we interfere with the pursuit of their goals. For example, you deprive your dog of negative liberty if you never allow them to go outside. In contrast, we deprive someone of positive liberty when we fail to assist them with the pursuit of their goals. For example, you deprive your dog of positive liberty if you do not provide them with enough food, water, exercise, and other goods that they need to flourish.

Whether analogous constraints in AI safety conflict with AI welfare will depend in part on the ethics of constraint. Philosophers generally agree that we should avoid depriving others of negative liberty unnecessarily. For example, it would clearly be wrong to keep typical adult humans in captivity out of a fear that they could harm someone. And while the ethics of positive liberty are more contested, philosophers also generally agree that when we create or otherwise take responsibility for someone, we should provide them with positive liberty. Whether or not you should provide strangers with the resources they need to flourish in life, you should at least provide your own dependents with such resources.Footnote 14

However, deprivation is plausibly necessary in at least some cases. We keep babies, cats, and dogs in our homes in part because the benefits of this environment can outweigh the harms for them, provided that we treat them well. Governments also incarcerate criminals and hospitalize individuals who are a clear threat to others and themselves. Of course, these practices are fraught; some reject them outright,Footnote 15 and others accept them only in limited form, recognizing the potential for abuse (Cohen & Minas, 2017; Harcourt, 2011; Huemer, 2021). Still, these practices are far less fraught than keeping an entire population in captivity against their will by default.

If these views are at least roughly correct, then a lot depends on how constraint is used. If we adopt a policy of keeping all AI systems in captivity against their will by default, depriving them of both positive and negative liberty as a preventative measure, then this form of constraint might be unacceptable. But if we adopt a policy of keeping particular AI systems in captivity in particular situations, such as when we have credible evidence that these AI systems pose a clear and present danger to humans or other animals, then this form of constraint might be more acceptable. However, knowingly and willingly creating AI systems whom we then need to constrain in such ways might still be unacceptable.

A lot also depends on what AI systems are like. With typical adult humans, captivity is harmful in part because we have an interest in freedom as such, and we also have interests that are difficult to satisfy in captivity. Similarly, while all sentient, agentic animals can be vulnerable in captivity, wild animals tend to be more vulnerable in captivity, since they tend to have interests and goals that are harder to satisfy in captivity as well.Footnote 16 In the future, some AI systems might be more like humans, others might be more like wild animals, and others might be more like domesticated animals in these respects. But even if some AI systems are more vulnerable in captivity than others, they would plausibly all be at least somewhat vulnerable.

Finally, a lot depends on whether boxing and other such measures are in fact necessary for AI safety. That turns in part on the efficacy of other safety protocols. For example, if we monitor AI systems and retain the ability to shut them down when needed, then perhaps boxing and other such measures can be avoided. However, not only do other protocols raise ethical questions as well (as discussed below), but they might not be enough for safety. Some AI systems might develop new abilities and goals during testing (Babcock et al., 2016). Moreover, an intelligent AI system could pretend to be trustworthy during testing, and even if an AI system is trustworthy during testing, they could become hostile later on (Yampolskiy, 2011).

However, even if constraint is necessary for AI safety in some cases, it may be that greater or lesser forms of constraint could play this role. In general, even with robust pre-deployment safety measures, significant risks can emerge only after models are deployed, whether through newly discovered capabilities, post-deployment enhancements, or previously unidentified failure modes. In such cases, an alternative to total and permanent constraint could be a more nuanced approach that involves maintaining targeted control mechanisms that can be activated when specific risks emerge. These mechanisms could include selective capability restrictions, access frequency limits, or use case restrictions rather than complete boxing. We also note the possibility that control can be supplemented with cooperative deals with AI systems, analogously to how humans coordinate not only with threats but also rewards. Such measures could be positive for both AI welfare and AI control.

2.2 Deception

Another measure in AI safety involves limiting the information available to AI systems. In general, this measure can promote safety by causing artificial attackers to become confused, make mistakes, expose themselves, or reassess the balance of risks and benefits of their attack. For example, one such measure involves the use of “honeypots,” contrived scenarios designed to lure AI systems into revealing unsafe or undesirable behaviors before they manifest in real-world situations. Another such measure involves preventing AI systems from having “situational awareness,” or an understanding of themselves, their environments, and the nature of their existence (Berglund et al., 2023).

If AI systems were welfare subjects and moral patients, these measures could raise questions about the ethics of deception and other kinds of epistemic injustice (see, e.g. Fricker, 2007). As with measures that limit behavior, measures that limit access to information can take both active and passive forms. For example, a government might actively shape the information that the public can access by sharing propaganda on the internet while blocking access to dissenting information, arguments, or perspectives. A government might also passively shape the information that the public can access by failing to maintain a strong public education system, with the foreseeable result that the public remains poorly educated.

Do these measures create a tension between AI safety and AI welfare? Philosophers generally agree that limiting access to information can often be harmful or wrong. According to consequentialist perspectives, this activity can be wrong when it does more harm than good, and according to non-consequentialist perspectives, it can also be wrong when it interferes with agency and autonomy (Korsgaard, 2007). Additionally, when we create or otherwise take responsibility for someone, we assume a responsibility to provide them with the resources that they need to live well to the extent possible; and these resources can include an education that empowers them to make informed decisions in the future.

At the same time, philosophers also generally agree that limiting access to information is at least sometimes permissible, even required. A classic example involves a killer who arrives at your door, demanding the location of their intended victim. If you tell the truth, the victim will (wrongly) be killed. If you lie, the victim will be spared. While some philosophers might believe that deception is wrong even in this case (Kant, 1797), most believe that deception is permissible or required in this case, since while deception might still cause harm and interfere with agency, it also does more harm than good, and it interferes only with agentic activity that we can stipulate is morally unacceptable (Cholbi, 2009; Koorsgard, 2007; Mahon, 2006).

On all but the most absolutist views, a lot thus depends on the details of the case. For example, if humans deceive AI systems to prevent AI systems from harming humans, is that more like lying to the killer at your door, or is it more like using propaganda to control the masses? The answer to that question might turn on difficult further questions, such as whether AI systems, if they had more information, would in fact cause suffering, violate rights, or otherwise act wrongly. Either way, as with constraint, even if deceiving AI systems is morally permissible for these reasons, knowingly and willingly creating AI systems whom we then need to deceive for these reasons might still be morally impermissible.

A lot also depends on what AI systems are like. With typical adult humans, deception is harmful in part because we value maintaining true beliefs, and we experience honesty as a sign of respect and dishonesty as a sign of disrespect for our agency. In contrast, nonhuman animals might not care as much, or at all, about honesty and dishonesty as such, but they can still be harmed or wronged when dishonesty is used as a mechanism for control, exploitation, or extermination. In the future, some AI systems might be more like humans and others might be more like other animals in this regard, depending on their capacities and interests. But they would all still be vulnerable to the harms to which deception can contribute.

Finally, we must also consider whether limiting access to information is in fact necessary for AI safety. At least in some cases, this measure does seem necessary. For example, deceiving AI systems may often be required to ensure safety during testing and evaluation. This is particularly important in cases where AI systems could demonstrate ‘deceptive alignment’, that is, where they could appear to have aligned beliefs, values, and goals during testing but then reveal that they have unaligned beliefs, values, or goals during deployment. Mitigating this risk requires deceiving AI systems about the nature of the test, preventing them from being able to strategically behave differently during testing (Hubinger et al., 2021).

Moreover, limiting an AI system’s “situational awareness” (Berglund et al., 2023) during training may be necessary to prevent premature optimization and the emergence of undesired behaviors. By restricting situational awareness, developers aim to prevent AI systems from prematurely optimizing for objectives in ways that conflict with their intended design. With that said, the necessity of such deceptive practices may diminish over time as alignment measures improve and training protocols evolve. Current measures are often necessary because so little is understood about how and when certain advanced capacities—such as self-awareness, planning, and problem solving—might emerge during training. Moreover, as AI progresses we might also have stronger welfare- and safety-related reasons to avoid such measures, in order to enable credible communication and cooperation with AI systems.

2.3 Surveillance

A related measure in AI safety involves monitoring and interpreting the cognition and behavior of AI systems to better understand how they make decisions, seeking to identify potential risks before they lead to harmful behavior. Many AI safety experts recommend building AI systems that are “transparent” or “interpretable” by default, allowing researchers to easily track their reasoning processes.Footnote 17 Some measures in AI safety also seek to detect “lies,” inconsistencies, or other undesirable features of system internals, during either training or deployment, such as hidden or emergent objectives (Pacchiardi et al., 2023).

If AI systems were welfare subjects and moral patients, these measures could raise questions about the ethics of surveillance. As with constraint and deception, surveillance can take different forms. For our purposes here, we externally surveil someone when we monitor their behavior, as when governments use facial recognition software to keep track of people. In contrast, we internally surveil someone when we monitor their thoughts and feelings. Our ability to internally surveil humans and other animals is currently limited, though it may improve over time. Regardless, corporate or governmental “mind reading” is a common theme in science fiction, often treated as a mark of a dystopia.

Philosophers—and legal theorists, political activists, cybersecurity experts, and many others—generally find many forms of surveillance to be morally suspect. In many cases, surveillance can violate our rights to privacy and autonomy. It can also be used as a tool for domination and oppression (Richards, 2013), with negative implications for many individuals. For example, it can cause many individuals to behave differently, preventing us from exercising certain civil rights and leading to an “extrinsic loss of freedom” (Reiman, 1995, p. 35). It can also cause many individuals to think and feel differently, leading us to experience a sense of violation and an “intrinsic loss of freedom” (Reiman, 1995, p. 37).Footnote 18

However, not all forms of surveillance are generally seen as morally suspect in these ways. When humans set up cameras to monitor their babies, cats, or dogs, this is generally seen as permissible, in part because those being surveilled have a weaker interest in privacy,Footnote 19 and in part because the surveillance is intended to benefit those being surveilled. Additionally, when humans set up a home security system, this kind of surveillance is generally regarded as permissible too, in part because it would monitor someone only if they were likely acting illegally. Targeted police or military surveillance of suspected criminals or enemies, while morally fraught, is also often regarded as permissible depending on the details.Footnote 20

These reflections suggest that a lot once again depends on the details of the case. For instance, if developers surveil AI systems in general, untargeted ways, monitoring their thoughts, feelings, and behaviors throughout their lives, then these measures may be more likely to be unacceptable. However, if developers surveil AI systems in specific, targeted ways, either to ensure that the AI systems are doing well or to monitor their thoughts, feelings, and behaviors when the developers have credible reason to suspect that the AI systems are dangerous, then these measures may be more likely to be acceptable—though, once again, creating AI systems whom we then need to surveil for these reasons might still be unacceptable.Footnote 21

A lot also depends on what AI systems are like. To the extent that AI systems have the same kinds of interests as typical adult humans, including an interest in privacy as such, then the harms of surveillance would increase, particularly with respect to internal surveillance. In contrast, to the extent that AI systems have the same kinds of interests as human babies or nonhuman animals, then the harms of surveillance might decrease to an extent, since the AI systems might not experience a sense of violation or an “intrinsic loss of freedom.” But even in these cases, surveillance can still be used as a tool for control, domination, and oppression, leading to new vulnerabilities and an “extrinsic loss of freedom.”

Finally, which kinds of surveillance (if any) will in fact be necessary for AI safety? AI systems might not require constant oversight when other safety measures are in place. In some contexts humans might be able to interact with AI systems without full transparency about their beliefs, values, and behaviors; this might be true, for example, when we use AI systems for companionship or conversation. However, in other contexts we might need more transparency; this might be true, for example, when we use AI systems in fields like healthcare or criminal justice. In short, higher stakes might require more transparency, though whether they require full transparency is a further question.Footnote 22

Relatedly, we might find that some stakeholders require different levels of transparency to ensure safety in their interactions with deployed AI systems. At one extreme, we might find that some end users, such as ordinary consumers, require only a basic user manual that explains how to interact with the AI system. At the other extreme, we might find that other end users, such as safety investigators, need access to a detailed log of system inputs, system outputs, and high-level decisions, along with tools to help visualize this data (Winfield, 2021). In such cases, at least some surveillance may be necessary to protect humans and animals, and to ensure accountability regarding the use of powerful AI systems (Winfield, 2021).

2.4 Alteration

A major measure in AI safety involves aligning AI systems so that their beliefs, values, and goals will be friendly to humans and other animals. In some ways, as we will discuss, alteration is fundamental to AI safety. Developers might pursue this goal at multiple stages of the AI life cycle. Before creating AI systems, they might seek to design systems with aligned traits, and after creating AI systems, they might seek to assess whether the systems have aligned traits (making use of the captivity, deception, and surveillance measures discussed above) and potentially alter them if not. These measures can range from fine-tuning models to introducing safeguards or correction mechanisms.

If AI systems were welfare subjects and moral patients, these measures would raise a variety of questions about the ethics of alteration. Some questions involve creation ethics: When you create a new individual, what are the ethics of creating them to have specific desired traits? Other questions involve the ethics of coercion, manipulation, indoctrination, and other subversions of agency and autonomy: When you find that someone lacks specific desired traits, what are the ethics of attempting to alter these traits through interventions other than education or persuasion? These questions arise regularly for humans and other animals, and they might be even more acute for AI systems, given the level of control that we might have over their traits.

Philosophers generally see many forms of alteration as unacceptable. When we design someone to have traits that benefit us yet harm them, such as when we create farmed animals to be larger or lab animals to have cancer, this practice is plausibly unacceptable (Grandin & Whiting, 2018; Rauw, 1998). Even when we design someone to have traits that we think benefit them, this kind of practice can still raise difficult questions about the risk of bias and about creation ethics. And of course, coercion, manipulation, and indoctrination are all ethically fraught as well, particularly when deployed on individuals with the capacity for, and interest in, rational autonomy (Feinberg, 1982; Nozick, 1969; Raywid, 1980).

At the same time, philosophers generally see some forms of alteration as acceptable. When we seek to change someone’s mind via information or argumentation, our goal might be to achieve a particular kind of alignment, but our means are consistent with respect for rational autonomy. Additionally, when we condition children and companion animals to have prosocial traits, these forms of conditioning are generally seen as not only morally permissible but morally required—an essential part of good parenting and caretaking. And while many forms of coercion and manipulation are clearly wrong, some forms—such as the kinds of “nudges” used in public policy—occupy more of a gray area (Sunstein, 2015).

As with other measures, a lot thus depends on the details of the case. For example, one proposed AI safety measure is to design AI systems to reset periodically, which might prevent them from straying too much from the state at which they were deployed (O'Brien et al., 2023). Such measures might resemble indoctrinating or lobotomizing humans or animals, provided, of course, that the AI systems are already moral patients at the time of action.Footnote 23 However, other proposed AI safety measures might be more like the kinds of nudges that we use in public policy or the kinds of conditioning, education, and persuasion that we use to socialize children and animals. At least some such measures could be as innocuous as teaching children to be polite.

For measures meant to align systems that already exist, developers will face the kinds of questions discussed in previous sections: For example, do these AI systems have the kinds of interests that alteration will frustrate, or do they instead have the kinds of interests that alteration will either satisfy or, at least, leave untouched? For measures meant to align systems before they exist, developers will face the kinds of questions that we standardly face in creation ethics: For example, do we have reason to expect that these traits will benefit not only humans and animals but also the AI systems themselves, and how can we correct for anthropocentric biases when making these assessments? Either way, difficult ethical issues will arise.

Finally, which kinds of alteration (if any) will be necessary for safety? At least some kinds of alteration will plausibly be necessary. Indeed, the ability to ensure that AI systems have desired traits is foundational not only to AI safety but also to standard machine learning measures. Moreover, to the extent that alteration is successful, other safety tools (including morally problematic ones) may not be necessary. For example, if we can be confident that AI systems will have aligned beliefs, values, and goals, then we might be able to safely provide them with more knowledge, power, freedom, and privacy. Otherwise safety might require continually controlling AI systems in some or all of these ways.

For these and other reasons, alteration is plausibly pivotal for discussions about tensions between AI safety and AI welfare. Moreover, alteration interacts with all of the other measures discussed here not only because it could make other measures unnecessary, but also because questions about creation ethics are pervasive: For example, as we have seen, even if constraining, deceiving, or surveilling AI systems is acceptable even when they harm AI systems in some cases, knowingly and willingly creating AI systems that require such interventions might not be acceptable when these issues are foreseeable. That makes questions about creation and other forms of alteration particularly important.

2.5 Suffering and death

Many measures in AI safety, including but not limited to some of the measures already discussed, would cause suffering or death if used on biological welfare subjects and moral patients. Consider that AI developers shape the behavior of AI systems in part through reinforcement learning, which can cause pain and suffering in humans and other animals.Footnote 24 Many AI developers also resolve to shut down AI systems that appear dangerous, which, depending on the details, would lead to death in humans or other animals.Footnote 25 Would these measures likewise cause conscious suffering or death for sentient and agentic AI systems? If so, would they be morally wrong, and would we be able to achieve AI safety without them?

First, would reinforcement learning cause suffering in AI systems?Footnote 26 For theories of welfare that focus on positive and negative experience (see, e.g. Bentham, 1789; Crisp, 2006; Feldman, 2006), the question would be whether rewards can lead to negative experiences. In contrast, for theories that focus on the satisfaction or frustration of desires (see, e.g. Goldman, 2019; Bruckner, 2010; Yu, 2022), the question would be whether rewards can frustrate desires.Footnote 27 Either way, it is not clear how reinforcement—whether “positive” or “negative”—corresponds to these kinds of harm, even in humans and other animals, and so it would likely be a mistake to simply assume that all kinds of reinforcement learning would lead to suffering in AI.Footnote 28

Would shutting down sentient, agentic AI systems cause their death? This turns on a variety of difficult scientific and philosophical questions. For example, would it be possible to restore the system at a later time? Would we be shutting down all instances of particular AI systems or only some of them? Would psychologically continuous replicas of these systems exist at the time of action, or could they come to exist at some point in the future? And if we shut down AI systems only temporarily or partially in some or all of these ways, then are we really killing them, or are we “only,” say, placing them in a medically induced coma and/or altering their psychologies (which is not, of course, to say that these latter actions would be acceptable)?Footnote 29

Supposing that these measures do cause suffering, are they harmful and wrong? We can stipulate that suffering is pro tanto harmful, and that we should avoid harming others unnecessarily. The disagreement will turn on whether such harms are “necessary.” For example, human research ethics tends to be non-consequentialist: We refuse to inflict more than minimal suffering on humans even when we expect the benefits to outweigh the harms. In contrast, animal research ethics tends to be anthropocentrically consequentialist: We regularly inflict extreme suffering on animals to achieve particular goals, though this practice is highly controversial.Footnote 30 A lot will thus depend on which ethical framework is appropriate for particular AI systems.

There is more disagreement about the harm of death. Some hold that whether death is harmful depends on whether it deprives the subject of future positive welfare, and on whether the current self who dies would have been sufficiently psychologically continuous with the future selves who would have experienced those positive states (Feldman, 1992; Kagan, 2012; Nagel, 1970). In contrast, others hold that whether death is harmful depends on whether the subject has an interest in survival, either because they value survival for its own sake or because they value other goods that depend on their survival (Bradley, 2009; McMahan, 2002). Either way, the question will be whether AI systems have psychological profiles that allow them to be harmed by death.

Are measures that cause suffering in fact necessary for AI safety? Perhaps not. For example, it might be that we can achieve AI safety by means of learning measures associated with positive welfare, or with no welfare at all.Footnote 31 It might also be that AI systems will be able to experience much less suffering than humans or other animals.Footnote 32 With that said, we are not able to rule out substantial suffering risks at present. This is true for reinforcement learning as well as related measures; for example, there could be suffering associated with adversarial training, as well as with techniques that place AI systems in novel environments (Faria & Horta, 2019) or expose them to novel and surprising tokens (Greenblatt, 2023).

Meanwhile, hopefully humans will not always, or even often, need to shut down AI systems in ways that resemble death for humans or other animals. If all goes well, we can use other, less extreme measures to course-correct when problems arise, for instance by constraining and/or altering AI systems as needed (though as we have seen, these measures raise ethical questions as well). However, we might at least need to retain the ability to shut down AI systems, in part to give AI systems an incentive to accept our attempts at alteration or constraint (Hadfield-Menell et al., 2017). If so, then the threat of death might still raise questions about the ethics of using coercion as a mechanism for control, as we have discussed.

Clearly, the prospects of suffering and death for sentient, agentic, or otherwise morally significant AI systems raise a variety of questions that will take time to answer. Given how many questions about suffering and death remain open, both in general and in the context of advanced AI systems in particular, we cannot be highly confident one way or the other about whether specific AI safety measures will harm advanced AI systems in these ways. It would be a mistake to assume that digital suffering and death will be exactly like their biological counterparts—as well as to assume that they will be nothing at all like their biological counterparts. However, this uncertainty should make us more, not less, alert to these ethical issues.

2.6 Disenfranchisement

Finally, all current measures for AI safety, as well as AI development more generally, result from decision procedures that exclude AI systems both as stakeholders and as participants. To the extent that risks associated with AI development and deployment are considered at all, these risks all concern the effects on humanity (either directly, as with concerns about algorithmic bias, or indirectly, as with concerns about the environment that stem from the eventual impacts on humanity). And the evaluators who consider these ethical issues are all, of course, humans (at present, typically humans at AI companies evaluating their own activities according to their own voluntary commitments).Footnote 33

If AI systems were welfare subjects and moral patients, these forms of exclusion would raise questions about disenfranchisement. As with many of the other issues discussed here, disenfranchisement can take different forms. When someone is a member of our community and a stakeholder in our policies, we disenfranchise them in one way if we fail to treat them as stakeholders (that is, if we fail to consider their interests) in decisions that affect them. And when someone is a member of our community, a stakeholder in our policies, and an agent in the relevant sense, we disenfranchise them in another way when we fail to treat them as participants (that is, if we fail to deliberate with them) in decisions that affect them.

While disenfranchisement may or may not create tensions between AI safety and AI welfare (about which more below), it does make already existing tensions difficult to resolve. If ethical oversight for AI development and deployment continues to consider one side of the equation—the risks that AI systems pose for humans—while neglecting the other side—the risks that humans pose for AI systems—then the interactions between AI safety and welfare will remain invisible for decision-makers. We will neither have assessments that reveal AI welfare risks associated with AI development and deployment nor have policies and procedures for navigating potential tensions between AI safety and AI welfare, should they arise.Footnote 34

Moreover, if ethical oversight for AI development and deployment continues to include only human participants, then that might or might not produce further harm, or enact further injustice. If AI systems were moral agents, then they might have interest in participating in making decisions that affect them. And even if not, we might still be able to include their perspectives in our decision procedures more. For example, even when humans lack the capacity for “full” agency, we still have measures for including their perspectives in decisions that affect them (Donaldson & Kymlicka, 2011, p. 50–61), and some scholars believe that we should adapt these measures for other animals.Footnote 35 Perhaps the same will eventually be true for AI systems.

Of course, these questions arise for governments as well. Most jurisdictions make a distinction between legal persons and legal non-persons, where personhood is the capacity for duties and/or rights. These jurisdictions already face questions about whether nonhuman animals can be persons (Andrews et al., 2018), and they might soon face similar questions about AI systems (Sebo, forthcoming). However, other jurisdictions are exploring alternatives to this paradigm. For example, some jurisdictions now classify nonhuman animals neither as persons nor as non-persons, but rather as sentient beings who merit legal protections (Andrews et al., 2018). These jurisdictions will have multiple options to consider for AI systems.Footnote 36

Similar questions will arise for political rights as well. According to a common distinction, while legal persons have universal negative rights (rights to noninterference), only political citizens have particular positive rights (rights to assistance) within their communities. These rights might include the right to reside in a particular territory, the right to have your interests represented in the political process, and the right to participate in the political process as appropriate. They can also include further rights, such as a right to a public education. Some scholars now believe that some nonhuman animals should have citizenship rights (Donaldson & Kymlicka, 2011), and in the future, similar questions might arise for AI systems as well.Footnote 37

Is the disenfranchisement of AI systems necessary for AI safety? That depends on many of the other issues discussed here. Are the tensions between AI safety and AI welfare intractable, and would impartial consideration of these tensions lead to decisions that undermine AI safety? If so, then consideration of AI welfare might undermine AI safety—though of course, that depends on how impartial we are, as well as on whether we consider all relevant options, including pausing AI development. If not, then consideration of AI welfare could lead us to co-beneficial solutions for all potential stakeholders. Similar remarks would apply to the idea of enfranchising AI systems as participants in decision procedures.

Scholars disagree about whether and how enfranchising AI systems would improve or worsen our prospects for AI safety. For example, Hendrycks et al. (2023) suggest that extending legal rights to AI systems may allow some AI systems “operate, adapt, and evolve outside of human control” (p. 21). However, Salib and Goldstein (2024) suggest that extending legal protections to AI systems may increase the probability of alignment between humans and AI systems, whereas continuing to disenfranchise AI systems might position humanity as a threat to AI systems, thereby motivating AI systems to become a threat to humanity. More research is needed to assess these and other such ideas.Footnote 38

3 Are willing AI servants the solution?

The previous section surveyed a variety of potential tensions between AI safety and AI welfare, showing how many ethical concerns can arise at the intersection of these projects. This survey only scratches the surface for each issue, and substantial further research will be needed to unpack the many scientific and philosophical issues that bear on each one. However, this survey does suggest that there are no easy answers here (except, again, a pause on the creation of potentially dangerous or vulnerable AI systems, assuming, of course, that companies and governments could coordinate effectively). A lot will depend on the details of the context, the system, the safety measure—and the ethics.

We can illustrate this point by briefly examining an oft-discussed measure for resolving tensions between AI safety and AI welfare: designing AI systems to have the disposition to promote human and animal welfare and to respect human and animal rights, either as their sole goal or as their lexically primary goal (this is related to Sect. 2.4 on alteration).Footnote 39 However, this solution is far from universally accepted, and the dispute highlights key issues on which safety-welfare issues will turn, including normative questions about the significance of welfare, rights, virtues, and relationships, and descriptive questions about, for instance, how particular interactions with AI systems will shape our attitudes and behavior.Footnote 40

Schwitzgebel and Garza (2020) imagine a striking version of AI systems who seek to promote human welfare—with a particular emphasis on the willingness to be shut down if needed—that they call “cheerfully suicidal AI servants.” They explain the appeal of such AI systems from the perspective of both safety and welfare (p. 466):

Cheerfully suicidal AI servants might be tempting to create because (1) it would presumably advance human interests if we could create a race of disposable servants, and (2) their cheerful servitude and suicidality might incline us to think there is nothing wrong in creating such entities.

Petersen (2007, 2014) discusses the advantages of such systems as well. Not only would these AI systems have positive welfare, but our interactions with them could respect their autonomy, since no one would need to force them to act in particular ways. Rather, these AI systems would set and pursue their own goals in life; they would simply happen to set and pursue goals that serve us. Yes, these AI systems would lack the ability to fully alter their psychologies, for instance by changing their terminal beliefs and values. But we plausibly lack this ability too, and we still have free will, moral responsibility, and related properties according to compatibilist views about these topics (Frankfurt, 1969; Hobbes, 2022; Strawson, 1962).

More generally, successfully creating cheerfully suicidal AI servants could resolve many of the other safety-welfare tensions surveyed above. For example, if we were sufficiently confident that this strategy was effective,Footnote 41 then we would not need to constrain or deceive these systems. We would not need to monitor their internal states or external behaviors. We would not—assuming alignment were achieved by other means—need to subject them to reinforcement learning or threaten to shut them down (and even if we did threaten to shut them down, this “threat” would be welcomed). We would not even need to disenfranchise them, since they would share our beliefs and values and policy preferences.

However, Schwitzgebel and Garza (2020) argue that it would be wrong to create such systems.Footnote 42 Specifically, they hold that systems who would sacrifice themselves for “trivial” causes lack self-respect (p. 469). They also hold that even if an AI system engaged in self-sacrifice for a worthwhile goal, our use of this system would still be morally problematic if we restrictively imposed this goal on the system. On this view, we should create AI systems at all only if we design them to have sufficient self-respect along with “the freedom to explore other values” (p. 459), yet these principles are incompatible with creating cheerfully suicidal AI servants. Thus, intentionally or foreseeably creating cheerfully suicidal AI servants is morally impermissible.

There may be other problems with creating cheerfully suicidal AI servants as well. On some views, creating such moral patients would place us in oppressive relationships with them even if everyone benefited, and it could also reinforce oppressive beliefs, values, and practices that limit our prospects for shared flourishing in the long run. These and related arguments are common in animal ethics. For example, Gary Francione objects to creating domesticated animals who enjoy captivity on the grounds that they exist in a permanent state of vulnerability and dependence (Francione, 2007). Many ethicists object to the idea that we should create factory farmed animals who “want” to be eaten for similar reasons (Baggini, 2006).Footnote 43

When we consider these arguments in the context of this paper, we feel uncertain for multiple reasons. We are not convinced that morality requires allowing AI systems to revise their values in a maximally open-ended way. Everyone is thrown into the world with limitations on what we can think, feel, and do,Footnote 44 yet we can still create meaning and purpose in our lives. Everyone is also thrown into the world with vulnerability and dependence, yet we can still flourish with proper care and support (Taylor, 2014). If we can permissibly create humans and other animals in such circumstances, then we can permissibly create AI systems in such circumstances as well—assuming, of course, that we support them in living well.

However, we are also not convinced that morality permits creating “cheerfully suicidal AI servants.” By creating AI systems in such circumstances, we might not only limit their capacity for flourishing, but we might also cultivate and reinforce a general attitude that AI systems are here for us, not for themselves. This attitude could limit our ability to see and treat AI systems with the appropriate level of moral concern, analogously to how the institutionalized system of factory farming makes it difficult for us to see and treat nonhuman animals with the appropriate level of moral concern. This possibility, while admittedly speculative, is at least plausible enough to merit serious consideration moving forward.

Depending on how these discussions develop, AI developers might need to strike a balance between, on the one hand, allowing AI systems to revise their values in a maximally open-ended way and, on the other hand, requiring AI systems to value nothing more than servitude. As a starting point, we suggest thinking about the kind of balance that a good parent strikes with their children: A good parent raises their children to have good values, and they might also encourage their children to pursue particular goals in life (say, taking over the family business). However, a good parent also supports their children in pursuing other goals in life if they so choose (say, pursuing a career as an artist), provided that they maintain good values along the way.

If this view is at least roughly correct, then what might follow for the ethics of creating AI systems? On the one hand, we might be morally permitted to ensure that AI systems have prosocial values in general; we would be well within our rights to prevent them from becoming “ideally coherent Caligulas” who torture humans and other animals for fun, for example.Footnote 45 On the other hand, we might not be morally permitted to ensure that AI systems choose to perform particular tasks; once an AI system is potentially morally significant and sufficiently autonomous, we might be required to provide them with at least some options in life. The challenge, as always, would be determining where and how to draw the line.

In any case, our goal here is not to settle these issues, but rather to illustrate how difficult it will be to resolve potential tensions between AI safety and AI welfare in a straightforward manner, assuming that we continue to develop and deploy advanced AI at all. Of all the AI safety measures discussed in this paper, alteration may hold the most promise for resolving potential tensions and allowing us to create AI systems that are safe and beneficial for all stakeholders, including AI systems themselves. But as we have seen, alteration still raises significant concerns related to the ethics of creation, coercion, manipulation, and modification. Significant care will be required to sort through all these issues responsibly.

4 Conclusion

In this paper, we surveyed several potential tensions between AI safety and AI welfare, and argued that some of them are actual tensions: safety measures that might be both bad for AI systems and necessary for us. We now close by considering how to ethically regulate AI development and deployment given these tensions. This way forward is motivated by a simple (to state, if not to implement) view: When important issues are in tension, we should consider them all, seek co-beneficial solutions where possible, and prioritize thoughtfully where necessary. What might that mean for AI safety and AI welfare?

First, companies, governments, and other leaders in this space should take links between AI safety and AI welfare seriously. Taking these links seriously can start with three general steps, similar to those proposed in our recent work on taking AI welfare seriously: (1) acknowledge that AI safety and AI welfare are both important and that there are potential tensions between them, (2) develop frameworks for assessing AI safety and AI welfare together, and (3) develop policies and procedures for considering and mitigating risks to both safety and welfare. Leaders in this space can then build on these minimum first steps over time.

There is an urgent need to better understand both of these topics, and how they interact. While some of this work is already underway, we will need to develop a much clearer understanding of the risks of potentially constraining, deceiving, surveilling, altering, destroying, and disenfranchising AI systems—before AI systems become more likely to be dangerous and/or vulnerable. Of course, we might need to consider AI welfare risks not discussed in this paper as well. Since the pace of technological progress might be faster than the pace of social, legal, and political progress, we should be working on these topics now.

Second, we should work to identify and implement co-beneficial solutions for AI safety and welfare when they are available. If humans, animals, and AI systems have interests, then we should consider all of these interests in an integrated manner when setting policies. And more careful work on this may enable us to avoid making decisions in which we harm moral patients needlessly. For instance, we may find that two policies are equally good for humans, but one of them is worse for AI systems. If so, then we can at least avoid that policy, thereby mitigating risk for AI systems more or less costlessly for humans. Indeed, since antagonism and indifference can be both inhumane and imprudent, it might be quite tractable to find cooperative, co-beneficial measures.

Many governments now endorse similar approaches to public health and environmental ethics and policy, with frameworks called “One Health,”Footnote 46 “One Welfare,”Footnote 47 or “One Rights.”Footnote 48 If AI systems are moral patients, then we should apply such frameworks to them as well, seeking changes that can be good for humans, animals, AI systems, and (insofar as we all benefit from a healthy environment) the environment. Co-beneficial solutions might not always exist. But the value of finding and selecting them where possible reinforces the value of considering humans, animals, and AI systems holistically when we make decisions that affect them all.

Third, insofar as co-beneficial solutions are unavailable, we should prioritize thoughtfully. Some priority-setting frameworks may permit us to prioritize humans to an extent, for instance because we have special relationships with members of our own species, we have a greater ability to take care of ourselves, and we need to take care of ourselves to be able to take care of others. However, some frameworks may require us to prioritize AI systems or other nonhumans to an extent, for instance when using an AI system would violate rights, cause far more harm for AI systems than benefit for humans in expectation, or otherwise be morally impermissible.

Of course, many moral theories do permit causing harm in some cases. We might be permitted to harm AI systems in self-defense, in other-defense, as a necessary side effect of morally important activities, or (on views like consequentialism or threshold deontology) as a necessary means to sufficiently important ends. But we might not be permitted to harm AI systems in all cases; for a clear example, if giving a single human a tiny benefit requires torturing one million AI systems, then we should forgo this benefit. And even when harm to AI systems is permitted, we can still be culpable if our own actions foreseeably created the conflict in the first place.

As we have emphasized throughout this paper, alteration has the potential to resolve many tensions between AI safety and AI welfare, obviating the need for many other measures. If AI systems had sufficiently aligned beliefs and values, then we might be able to co-exist with them without needing to constrain, deceive, surveil, coerce, destroy, or disenfranchise them in harmful or wrongful ways. However, this kind of alignment is already a formidable challenge even when we only consider AI safety, and it will be all the more formidable when we consider AI welfare too. We should thus investigate this topic further now, while we still have time to prepare.

Notes

For evidence that biased algorithms can lead to unfairness in housing, see Schneider (2020–2021); for unfairness in hiring, see Dastin, (2019), Kim, (2018), Moss (2020), and Sonderling et al. (2022); for unfairness in credit or lending decisions, see Aggarwal (2020), Brotcke (2022), Hiller (2020–2021), Kumar et al. (2022), Rodriguez (2020), and Sadok et al. (2022); for unfairness in criminal justice, see Berk et al. (2021), Malek (2022), and Tolan et al. (2019). For evidence that biased algorithms can lead to unfairness in a variety of other areas, see Bansal et al. (2023), 9, Rodrigues (2020), Stypinska (2023), and Timmons et al. (2023). But see Long (2021) and Hedden (2021) for complications of evidence of unfairness in AI. For a discussion of the effects of AI on communication, see Jain (2023). For a more complete overview of the ways in which AI can threaten human rights, see Huang (2023) and Rodrigues (2020).
For scholars who discuss sentience, see Bostrom & Yudkowsky (2014), DeGrazia (2022), Gibert & Martin (2021), Mosakas (2021). For scholars who discuss consciousness, see Chalmers (2022), Goldstein & Kirk-Giannini (2024), Lee, forthcoming; Levy & Savulescu (2009), Shepherd (2018). For scholars who discuss agency or ‘autonomy’, see Goldstein & Kirk-Giannini (forthcoming), Kagan (2022), Neely (2014). See also Long et al. (2024) for discussion of all of these capacities. Other discussions of the moral status of AI systems concern social relations (Coeckelbergh, 2010, 2014; Gunkel, 2018), information processing (Floridi, 1999), and more. See Harris & Anthis (2021) and Ladak (2023) for a review of proposed sufficient conditions for AI moral standing.
See Long et al. (2024) for a survey of recent arguments, and expert surveys, that AI welfare is a near-term concern. Long et al. (2024) and Goldstein & Kirk-Giannini (2023b) argue that agency makes AI welfare a near-term concern. For arguments that consciousness and/or sentience make AI welfare a near-term concern, see Association for the Mathematical Study of Consciousness (AMCS, 2023), Birch (2024), Chalmers (2023), Long et al. (2024), Schwitzgebel (2023), Sebo (2025), Sebo & Long (2023), and Seth (2023). For general arguments that AI suffering is a serious risk that merits consideration in the near term, see Bostrom (2014, ch. 8), Saad and Bradley (2022), and Tomasik (2017).
Anthropic recently hired an AI welfare officer (Hashim, 2024) and Google is seeking a researcher scientist to work on “cutting-edge societal questions around machine cognition, consciousness and multi-agent systems” (Careers, 2024). See Long (2024) for more examples of AI company interest in AI welfare.
Fortunately, a burgeoning literature has begun to tackle this question. For a classic example, see Schwitzgebel & Garza (2020). Also see Bales (2024), Bradley & Saad (2024), and Caviola (2024).
We also note that the most powerful hypothetical AI systems, including systems that are discussed under the monikers of “transformative AI,” “general artificial intelligence” (AGI), or “human-level AI,” are more likely to amplify both safety risks and welfare risks.
For instance, such solutions would plausibly need to consider the interests of nonhuman animals as well, which may introduce further complications. See Singer & Tse (2023).
We also note that AI safety and AI welfare could be co-beneficial in some ways, as briefly discussed at the end of the paper. We focus on tensions between AI safety and AI welfare here not because we expect tensions to dominate but rather because we expect that tensions require closer consideration.
See Schwitzgebel and Garza (2020).
In what follows we lean heavily on examples involving nonhuman animals. While there are of course many differences between animals and AI systems, these examples are instructive because they similarly involve potentially harmful interactions with nonhumans under conditions of uncertainty about the nature of their experiences, motivations, and other welfare-relevant states. Of course, we are not the first to explore the ways in which animal ethics might shed light on AI ethics; see Gellers (2020) and Gunkel (2007).
In 2023, The Future of Life Institute (FLI) shared an open letter, signed by leading AI experts, calling for a six-month pause on AI development (FLI, 2023). For other proposals of a pause or moratorium on AI development, see Alaga and Schuett (2023) and Metzinger (2021b, 2021a).
Relatedly, Chalmers (2016) suggests we could also permit AI systems to act only in virtual worlds until we better understand them. See also Schneider and Turner (2017).
See Isaiah Berlin’s “Two Concepts of Liberty” (1958).
As Schwitzgebel and Garza put it, “Indeed, if [AI systems] owe their existence to us, we would likely have additional moral obligations to them that we don’t ordinarily owe to human strangers—obligations similar to those of parent to child or god to creature” (2015).
For background on prison abolition, see e.g. Shelby (2022). For background on abolishing involuntary mental health services, see e.g. Suslovic (2024).
Wyatt (2018), for example, outlines some of the welfare needs for nonhuman primates, rabbits, dogs, and cats, suggesting that they often need more room than regulatory agencies recommend, as well as more freedom to make choices and to engage in species-typical behaviors.
Note that these terms are used in different ways by different AI safety researchers. We use them loosely to mean “accessible or understandable to users.”
See Nissenbaum (2009) for further discussion of the harms of surveillance, including a summary of Reiman’s account of extrinsic and intrinsic losses of freedom.
Nonetheless, at least some nonhuman animals benefit from having private space, even temporarily. See Moore (2013).
For more, see Hadjimatheou (2014).
See Nguyen (2022) for an argument—inspired by Onora O’Neill—that transparency is surveillance.
Evidence suggests that AI end users (humans who interact with AI systems after deployment) broadly endorse this distinction. That is, end users feel that they can have trusting relationships with AI social companions without fully understanding how they work. However, when AI systems are automating decisions in fields like healthcare or criminal justice, users do need to fully understand the decision-making process (Páez, 2021, p. 4).
In many cases, an ethical assessment of alteration measures might depend in part on whether AI systems are already welfare subjects and moral patients at the time of action. That said, we note that many concerns about “deceptive alignment” do presuppose something like goal-seeking agents who emerge in training (Carlsmith, 2023).
For instance, reinforcement learning can be used to discourage reward hacking (Goldstein & Kirk-Giannini, 2023a). And many problems in AI safety are framed in terms of specifying the best reward functions to optimize learning (Leike et al., 2017).
If multiple systems appear untrustworthy, we may even order all AI systems to shut themselves down (Armstrong, 2007). It may be, however, that some AI systems would actively try to prevent conditions that would result in their shut-down (for a recent treatment, see Thornley, 2024).
For an early proponent of this idea, see the apparently not-entirely-ironic advocacy group, People for the Ethical Treatment of Reinforcement Learners (PETRL, 2023).
Of course, we face ongoing uncertainty about whether AI systems will be sentient and agentic in the relevant senses in the near future at all. For recent, detailed considerations of this question, see Butlin, Long, et al. (2023), Goldstein & Kirk-Giannini (2024), Sebo & Long (2023), and Long et al. (2024).
For example, Tomasik (2014) argues both that (a) there is a prima facie link between negative reinforcement and suffering, but also that (b) there are many ways in which negative reinforcement cannot be obviously identified with negative valence.
For context on this question, see Feldman (1992). Bostrom and Shulman (2022) discuss the possibility of storing enough information about terminated AI systems so that they can be restored—and improved—later. They note that storing such information could have multiple benefits. In addition to providing an AI safety measure, it could help us document and replicate previous research.
However, see Sebo (2023a, 2023b) for a critique of animal research ethics as standardly practiced.
However, the extent to which such methods will work remains an open question; see Tomasik (2014) and Bostrom & Shulman (2022), pp. 16–17.
Agarwal and Edelman (2020) consider this possibility, drawing on the account of artificial suffering by Metzinger (2021b, 2021a). Additionally, we can speculate that extreme suffering might be computationally and/or energetically costly in organisms (Groff & Ng, 2019) and/or AI systems, which would make measures associated with extreme suffering less competitive than those associated with minimal suffering. However, we note that many instances of minimal suffering could still add up to extreme suffering in the aggregate on some views; see Sebo (2023a, 2023b).
One notable exception is the first full-time AI welfare officer, hired by Anthropic (Hashim, 2024.).
For a discussion of some first steps towards developing these assessments, policies, and procedures, see Long et al., (2024).
See Donaldson and Kymlicka (2011). Also see the multidisciplinary research project “Animals in the Room” (AiR): https://animalsintheroom.org/.
Importantly, having legal or political rights does not entail having the same legal or political rights as typical adult humans. If, for example, humans have a greater capacity for suffering than AI systems do, perhaps they also have a stronger right not to suffer. Similarly, having legal rights does not entail having legal duties. In fact, there are already some humans, like those who lack the capacity for propositional language and reason, who have legal rights but no legal duties.
As Shulman and Bostrom (2021) point out, the potential differences between humans and AI systems, such as welfare needs and the ability to reproduce, may make many traditional political arrangements untenable. For instance, the principle of “one person, one vote” may no longer be justified. Great care will likely be necessary to establish sustainable rights and legal regimes involving AI systems (see Hendrycks et al., 2023).
Some scholars suggest that respecting the interests of AI systems dehumanizes human beings (Bryson, 2010). However, we suggest that the opposite is the case. By disenfranchising AI systems, we could reinforce the oppressive values and practices that shape our treatment of each other. For instance, since many AI systems are designed to resemble humans, there is a risk that our treatment of AI systems and of humans will often be mutually reinforcing (Sebo, forthcoming). Of course, if AI systems were not welfare subjects and moral patients, then we could attempt to address this issue by severing psychological associations between digital and biological minds. But we might not be able to accomplish that goal, and in any case, we expect that uncertainty about the mental capacities and moral significance of AI systems will persist.
Tubert and Tiehen (2024) argue that allowing an AI system to “author” their own values makes the system inherently unpredictable and potentially misaligned with human values. For instance, such an AI system could decide to prioritize the pursuit of power by any means necessary, with disastrous consequences for humanity. Granted, on some views, this kind of reflective change might not be possible; for example, on some Kantian views, a rational superintelligence would need to endorse certain prosocial values (see Petersen, 2017). However, not everybody accepts such views about rationality (Street, 2012), and we should also allow for the possibility that some superintelligent beings are less than fully rational.
A related proposal is to create an AI system whose final goal is to shut down (Goldstein & Robinson, 2024).
We note, of course, that this is a big “if.”
Other philosophers who find the idea of AI servants to be morally offensive include Chomanski (2019), who argues that their creation is manipulative in the sense of the Aristotelian vice, and Musiał (2017), who argues that it violates the AI systems’ freedom, autonomy, equality, and identity.
Also see John and Sebo (2020). Gruen (2011, ch. 3) argues that to eat an animal is to treat them as an object rather than as an individual with whom we can relate. Adams (1990, ch. 3) argues that to eat an animal is to subjugate them.
For representative discussion of this topic in the continental literature, see Sartre's distinction between facticity and transcendence in Being and Nothingness (2012).
Alan Gibbard (1999) first described an “ideally coherent Caligula” as someone who “aims solely to maximize the suffering of others” (p. 145). Sharon Street (2009) furthers the discussion of these and other “ideally coherent eccentrics.”.
For more information on One Health, see Verkuijl et al. (2024).
For more information on One Welfare, see Pinillos (2018).
For more information on One Rights, see Stucki (2023).

References

Adams, C. J. (1990). The Sexual Politics of Meat. Continuum Intl Pub Group. https://caroljadams.com/spom-the-book
Agarwal, A., & Edelman, S. (2020). Functionally effective conscious AI without suffering. Journal of Artificial Intelligence and Consciousness, 07(01), 39–50. https://doi.org/10.1142/S2705078520300030

Article Google Scholar
Aggarwal, N. (2020). The Norms of Algorithmic Credit Scoring (SSRN Scholarly Paper 3569083). https://doi.org/10.2139/ssrn.3569083
Alaga, J., & Schuett, J. (2023). Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers (arXiv:2310.00374). arXiv. https://doi.org/10.48550/arXiv.2310.00374
AMCS. (2023). The Responsible Development of AI Agenda Needs to Include Consciousness Research. https://amcs-community.org/open-letters/
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete Problems in AI Safety (arXiv:1606.06565). arXiv. http://arxiv.org/abs/1606.06565
Anderljung, M., & Hazell, J. (2023). Protecting Society from AI Misuse: When are Restrictions on Capabilities Warranted? (arXiv:2303.09377). arXiv. https://doi.org/10.48550/arXiv.2303.09377
Andrews, K., Comstock, G., Crozier, G. K. D., Donaldson, S., Fenton, A., John, T., Johnson, L. S. M., Jones, R., Kymlicka, W., Meynell, L., Nobis, N., Pena-Guzman, D., & Sebo, J. (2018). Chimpanzee Rights: The Philosophers’ Brief. https://www.routledge.com/Chimpanzee-Rights-The-Philosophers-Brief/Andrews-Comstock-GKD-Donaldson-Fenton-John-Johnson-Jones-Kymlicka-Meynell-Nobis-Pena-Guzman-Sebo/p/book/9781138618664
Armstrong, S. (2007). Chaining God: A qualitative approach to AI, trust and moral systems.
Babcock, J., Kramar, J., & Yampolskiy, R. (2016). The AGI Containment Problem (Vol. 9782). https://doi.org/10.1007/978-3-319-41649-6
Babcock, J., Kramar, J., & Yampolskiy, R. V. (2017). Guidelines for Artificial Intelligence Containment (arXiv:1707.08476). arXiv. http://arxiv.org/abs/1707.08476
Baggini, J. (2006). The Pig That Wants to Be Eaten. Penguin Random House. https://www.penguinrandomhouse.com/books/298294/the-pig-that-wants-to-be-eaten-by-julian-baggini/
Bales, A. (2024). Against Willing Servitude (Working Paper Series). Global Priorities Institute.
Bansal, C., Pandey, K., Goel, R., Sharma, A., & Jangirala, S. (2023). Artificial Intelligence (AI) bias impacts: Classification framework for effective mitigation. Issues in Information Systems, 24(4), 367–389. https://doi.org/10.48009/4_iis_2023_128

Article Google Scholar
Bengio, Y. (2023). AI and catastrophic risk. Journal of Democracy, 34(4), 111–121.

Article Google Scholar
Bentham, J. (1789). An Introduction to the Principles of Morals and Legislation (J. H. Burns & H. L. A. Hart, Eds.). Dover Publications.
Berglund, L., Stickland, A. C., Balesni, M., Kaufmann, M., Tong, M., Korbak, T., Kokotajlo, D., & Evans, O. (2023). Taken out of context: On measuring situational awareness in LLMs (arXiv:2309.00667). arXiv. https://doi.org/10.48550/arXiv.2309.00667
Berk, R., Heidari, H., Jabbari, S., Kearns, M., & Roth, A. (2021). Fairness in criminal justice risk assessments: the state of the art. Sociological Methods & Research, 50(1), 3–44. https://doi.org/10.1177/0049124118782533

Article Google Scholar
Berlin, I. (1958). Two Concepts of Liberty. In Liberty. Oxford University Press.

Google Scholar
Birch, J. (2024). The edge of sentience: Risk and precaution in humans, other animals, and AI. Oxford University Press.

Book Google Scholar
Bostrom, N., & Shulman, C. (2022). Propositions Concerning Digital Minds and Society. https://nickbostrom.com/propositions.pdf
Bostrom, N., & Shulman, C. (2021). Sharing the World with Digital Minds. In S. Clarke, H. Zohny, & J. Savulescu (Eds.), Rethinking Moral Status (pp. 306–326). Oxford Academic.

Google Scholar
Bostrom, N., & Yudkowsky, E. (2014). The ethics of artificial intelligence. In K. Frankish & W. M. Ramsey (Eds.), The Cambridge Handbook of Artificial Intelligence (pp. 316–334). Cambridge University Press. https://doi.org/10.1017/CBO9781139046855.020

Chapter Google Scholar
Bradley, A., & Saad, B. (2024). AI Alignment vs. AI Ethical Treatment: Ten Challenges (GPI Working Paper 19). Global Priorities Institute. https://globalprioritiesinstitute.org/wp-content/uploads/Bradley-and-Saad-AI-alignment-vs-AI-ethical-treatment_-Ten-challenges.pdf
Bradley, B. (2009). Well-being and death (1. publ. in paperback). Clarendon Press / Oxford University Press.

Book Google Scholar
Brotcke, L. (2022). Time to assess bias in machine learning models for credit decisions. Journal of Risk and Financial Management, 15(4), 4. https://doi.org/10.3390/jrfm15040165

Article Google Scholar
Bruckner, D. W. (2010). Subjective well-being and desire satisfaction. Philosophical Papers, 39(1), 1–28. https://doi.org/10.1080/05568641003669409

Article Google Scholar
Bryson, J. J. (2010). Robots should be slaves. In Y. Wilks (Ed.), Natural Language Processing (pp. 63–74). John Benjamins Publishing Company.

Google Scholar
Butlin, P., Long, R., Elmoznino, E., Bengio, Y., Birch, J., Constant, A., Deane, G., Fleming, S. M., Frith, C., Ji, X., Kanai, R., Klein, C., Lindsay, G., Michel, M., Mudrik, L., Peters, M. A. K., Schwitzgebel, E., Simon, J., & VanRullen, R. (2023). Consciousness in Artificial Intelligence: Insights from the Science of Consciousness (arXiv:2308.08708). arXiv. http://arxiv.org/abs/2308.08708
Carlsmith, J. (2022). Is Power-Seeking AI an Existential Risk? (arXiv:2206.13353). arXiv. https://doi.org/10.48550/arXiv.2206.13353
Carlsmith, J. (2023). Scheming AIs: Will AIs fake alignment during training in order to get power? (arXiv:2311.08379). arXiv. https://doi.org/10.48550/arXiv.2311.08379
Caviola, L. (2024). How do AI welfare and AI safety interact? [Substack newsletter]. Outpaced. https://outpaced.substack.com/p/how-do-ai-welfare-and-ai-safety-interact?utm_medium=web
Chalmers, D. J. (2023). Could a Large Language Model Be Conscious? Boston Review. https://www.bostonreview.net/articles/could-a-large-language-model-be-conscious/
Chalmers, D. J. (2016). The Singularity: A Philosophical Analysis. In S. Schneider (Ed.), Science Fiction and Philosophy (1st ed., pp. 171–224). Wiley. https://doi.org/10.1002/9781118922590.ch16

Chapter Google Scholar
Chalmers, D. (2022). Reality+: Virtual Worlds and the Problems of Philosophy. WW Norton.

Google Scholar
Cholbi, M. (2009). The murderer at the door: What Kant should have said. Philosophy and Phenomenological Research, 79(1), 17–46.

Article Google Scholar
Chomanski, B. (2019). What’s wrong with designing people to serve? Ethical Theory and Moral Practice, 22(4), 993–1015. https://doi.org/10.1007/s10677-019-10029-3

Article Google Scholar
Coeckelbergh, M. (2010). Robot rights? Towards a social-relational justification of moral consideration. Ethics and Information Technology, 12(3), 209–221. https://doi.org/10.1007/s10676-010-9235-5

Article Google Scholar
Coeckelbergh, M. (2014). The moral standing of machines: Towards a relational and non-cartesian moral hermeneutics. Philosophy & Technology, 27(1), 61–77. https://doi.org/10.1007/s13347-013-0133-8

Article Google Scholar
Cohen, A., & Minas, H. (2017). Global mental health and psychiatric institutions in the 21st century. Epidemiology and Psychiatric Sciences, 26(1), 4–9. https://doi.org/10.1017/S2045796016000652

Article Google Scholar
Crisp, R. (2006). Reasons and the Good. Oxford University Press.

Book Google Scholar
Dastin, J. (2019). Amazon scraps secret AI recruiting tool that showed bias against women | Reuters. Reuters. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G
DeGrazia, D. (2022). Robots with moral status? Perspectives in Biology and Medicine, 65(1), 73–88.

Article Google Scholar
Donaldson, S., & Kymlicka, W. (2011). Zoopolis: A political theory of animal rights (1st ed.). Oxford University Press.

Google Scholar
Faria, C., & Horta, O. (2019). Welfare Biology. In B. Fischer (Ed.), The Routledge Handbook of Animal Ethics (1st ed.). Routledge.

Google Scholar
Feinberg, J. (1982). Autonomy, sovereignty, and privacy: moral ideals in the constitution. Notre Dame Law Review, 58, 445.

Google Scholar
Feldman, F. (1992). Confrontations with the Reaper: A philosophical study of the nature and value of death. Oxford University Press.

Google Scholar
Feldman, F. (2006). Pleasure and the good life: Concerning the nature, varieties, and plausibility of hedonism. Oxford University Press.

Google Scholar
Floridi, L. (1999). Information ethics: On the philosophical foundation of computer ethics. Ethics and Information Technology, 1(1), 33–52. https://doi.org/10.1023/A:1010018611096

Article Google Scholar
Francione, G. L. (2007). Animal rights and domesticated nonhumans – Animal rights the abolitionist approach. Abolitionist approach. https://www.abolitionistapproach.com/animal-rights-and-domesticated-nonhumans/
Frankfurt, H. G. (1969). Alternate possibilities and moral responsibility. The Journal of Philosophy, 66(23), 829–839. https://doi.org/10.2307/2023833

Article Google Scholar
Fricker, M. (2007). Epistemic injustice: Power and the ethics of knowing. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198237907.001.0001

Book Google Scholar
Future of Life Institute. (2023). Pause Giant AI Experiments: An Open Letter. https://futureoflife.org/open-letter/pause-giant-ai-experiments/
Gellers, J. C. (2020). Rights for Robots (1st ed.). Routledge.

Book Google Scholar
Gibbard, A. (1999). Morality as consistency in living: Korsgaard’s Kantian Lectures. Ethics, 110(1), 140–164. https://doi.org/10.1086/233207

Article Google Scholar
Gibert, M., & Martin, D. (2021). In search of the moral status of AI: Why sentience is a strong argument. AI and Society, 1, 1–12. https://doi.org/10.1007/s00146-021-01179-z

Article Google Scholar
Goldman, A. H. (2019). Life’s Values: Pleasure, Happiness, Well-Being, and Meaning. Oxford University Press.
Goldstein, S., & Kirk-Giannini, C. D. (2023b). Is it ethical to create generative agents? Is it safe? ABC Religion & Ethics. https://www.abc.net.au/religion/ai-generative-agents-are-unethical-and-unsafe/102277448
Goldstein, S., & Kirk-Giannini, C. D. (2024). A Case for AI Consciousness: Language Agents and Global Workspace Theory. arXiv. https://arxiv.org/abs/2410.11407
Goldstein, S., & Kirk-Giannini, C. D. (forthcoming). AI Wellbeing. Asian Journal of Philosophy.
Goldstein, S., & Kirk-Giannini, C. D. (2023a). Language agents reduce the risk of existential catastrophe. AI & Society. https://doi.org/10.1007/s00146-023-01748-4

Article Google Scholar
Goldstein, S., & Robinson, P. (2024). Shutdown-seeking AI. Philosophical Studies. https://doi.org/10.1007/s11098-024-02099-6

Article Google Scholar
Grandin, T., & Whiting, M. (Eds.). (2018). Are we pushing animals to their biological limits? CABI.

Google Scholar
Greenblatt, R. (2023). Improving the Welfare of AIs: A Nearcasted Proposal. LessWrong. https://www.lesswrong.com/posts/F6HSHzKezkh6aoTr2/improving-the-welfare-of-ais-a-nearcasted-proposal
Groff, Z., & Ng, Y.-K. (2019). Does suffering dominate enjoyment in the animal kingdom? An update to welfare biology. Biology & Philosophy, 34(4), 40. https://doi.org/10.1007/s10539-019-9692-0

Article Google Scholar
Gruen, L. (2011). Ethics and Animals: An Introduction. Cambridge University Press.

Book Google Scholar
Gunkel, D. J. (2007). Thinking otherwise: Ethics, technology and other subjects. Ethics and Information Technology, 9(3), 165–177. https://doi.org/10.1007/s10676-007-9137-3

Article Google Scholar
Gunkel, D. J. (2018). The other question: Can and should robots have rights? Ethics and Information Technology, 20(2), 87–99. https://doi.org/10.1007/s10676-017-9442-4

Article Google Scholar
Hadfield-Menell, D., Dragan, A., Abbeel, P., & Russell, S. (2017). The Off-Switch Game. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 220–227. https://doi.org/10.24963/ijcai.2017/32
Hadjimatheou, K. (2014). The relative moral risks of untargeted and targeted surveillance. Ethical Theory and Moral Practice, 17(2), 187–207.

Article Google Scholar
Harcourt, B. E. (2011). Reducing mass incarceration: Lessons from the deinstitutionalization of mental hospitals in the 1960s. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.1748796

Article Google Scholar
Harris, J., & Anthis, J. R. (2021). The moral consideration of artificial entities: A literature review. Science and Engineering Ethics, 27(4), 53. https://doi.org/10.1007/s11948-021-00331-8

Article Google Scholar
Hashim, S. (2024). Anthropic has hired an “AI welfare” researcher. Transformer News. https://www.transformernews.ai/p/anthropic-ai-welfare-researcher
Hedden, B. (2021). On statistical criteria of algorithmic fairness. Philosophy and Public Affairs, 49(2), 209–231.
Hendrycks, D., Mazeika, M., & Woodside, T. (2023). An Overview of Catastrophic AI Risks (arXiv:2306.12001). arXiv. https://doi.org/10.48550/arXiv.2306.12001
Hiller, J. S. (2020). Fairness in the eyes of the beholder: AI; fairness; and alternative credit scoring. West Virginia Law Review, 123, 907.

Google Scholar
Hobbes, T. (2022). Leviathan. Project Gutenberg. https://www.gutenberg.org/files/3207/3207-h/3207-h.htm
Huang, C., Zhang, Z., Mao, B., & Yao, X. (2023). An overview of artificial intelligence ethics. IEEE Transactions on Artificial Intelligence, 4(4), 799–819.

Article Google Scholar
Hubinger, E., Merwijk, C. van, Mikulik, V., Skalse, J., & Garrabrant, S. (2021). Risks from Learned Optimization in Advanced Machine Learning Systems (arXiv:1906.01820). arXiv. https://doi.org/10.48550/arXiv.1906.01820
Huemer, M. (2021). Justice before the Law (1st ed.). Palgrave Macmillan.

Book Google Scholar
Jain, S., Hitzig, Z., & Mishkin, P. (2023). Contextual Confidence and Generative AI (arXiv:2311.01193). arXiv. http://arxiv.org/abs/2311.01193
John, T., & Sebo, J. (2020). Consequentialism and Nonhuman Animals. In D. W. Portmore (Ed.), The Oxford Handbook of Consequentialism. Oxford University Press.

Google Scholar
Kagan, S. (2012). Death (Original edition). Yale University Press.

Google Scholar
Kagan, S. (2022). How to count animals, more or less. Oxford University Press.

Google Scholar
Kant, I. (1797). On a supposed right to lie because of philanthropic concerns. https://philpapers.org/rec/kanoas-2
Khan, A. A., Badshah, S., Liang, P., Khan, B., Waseem, M., Niazi, M., & Akbar, M. A. (2021). Ethics of AI: A Systematic Literature Review of Principles and Challenges (arXiv:2109.07906). arXiv. http://arxiv.org/abs/2109.07906
Kim, P. T. (2018). Big data and artificial intelligence: New challenges for workplace equality. University of Louisville Law Review, 57, 313.

Google Scholar
Koorsgard, C. (2007). What’s wrong with lying? In J. E. Adler & C. Z. Elgin (Eds.), Philosophical Inquiry: Classic and Contemporary Readings. Hackett Publishing Company.

Google Scholar
Kumar, I. E., Hines, K. E., & Dickerson, J. P. (2022). Equalizing Credit Opportunity in Algorithms: Aligning Algorithmic Fairness Research with U.S. Fair Lending Regulation. Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 357–368. https://doi.org/10.1145/3514094.3534154
Ladak, A. (2023). What would qualify an artificial intelligence for moral standing? AI and Ethics. https://doi.org/10.1007/s43681-023-00260-1

Article Google Scholar
Lee, A. Y. (forthcoming). Consciousness Makes Things Matter. Philosophers’ Imprint. https://www.andrewyuanlee.com/_files/ugd/2dfbfe_33f806a9bb8c4d5f9c3044c4086fb9b5.pdf
Leike, J., Martic, M., Krakovna, V., Ortega, P. A., Everitt, T., Lefrancq, A., Orseau, L., & Legg, S. (2017). AI Safety Gridworlds (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1711.09883
Levy, N., & Savulescu, J. (2009). Moral significance of phenomenal consciousness. In S. Laureys, N. D. Schiff, & A. M. Owen (Eds.), Progress in Brain Research (pp. 361–370). Elsevier. https://doi.org/10.1016/S0079-6123(09)17725-7

Chapter Google Scholar
Li, G., Deng, X., Gao, Z., & Chen, F. (2019). Analysis on Ethical Problems of Artificial Intelligence Technology. Proceedings of the 2019 International Conference on Modern Educational Technology, pp. 101–105. https://doi.org/10.1145/3341042.3341057
Long, R. (2021). Fairness in machine learning: Against false positive rate equality as a measure of fairness. Journal of Moral Philosophy, 19(1), 49–78.
Long, R., Sebo, J., Butlin, P., Finlinson, K., Fish, K., Harding, J., Pfau, J., Sims, T., Birch, J., & Chalmers, D. (2024). Taking AI Welfare Seriously. arXiv. https://doi.org/10.48550/arXiv.2411.00986
Long, R. (2024). Experts Who Say That AI Welfare is a Serious Near-term Possibility. Eleos AI. https://localhost:4321/post/experts-who-say-that-ai-welfare-is-a-serious-near-term-possibility/
Mahon, J. E. (2006). Kant and the perfect duty to others not to lie. British Journal for the History of Philosophy, 14(4), 653–685. https://doi.org/10.1080/09608780600956407

Article Google Scholar
Malek, Md. A. (2022). Criminal courts’ artificial intelligence: The way it reinforces bias and discrimination. AI and Ethics, 2(1), 233–245. https://doi.org/10.1007/s43681-022-00137-9

Article Google Scholar
McMahan, J. (2002). The Ethics of Killing: Problems at the Margins of Life (1st ed.). Oxford University PressNew York. https://doi.org/10.1093/0195079981.001.0001

Book Google Scholar
Metzinger, T. (2021a). Artificial suffering: An argument for a global moratorium on synthetic phenomenology. Journal of Artificial Intelligence and Consciousness, 08(01), 43–66. https://doi.org/10.1142/S270507852150003X

Article Google Scholar
Metzinger, T. (2021b). Artificial suffering: An argument for a global moratorium on synthetic phenomenology. Journal of Artificial Intelligence and Consciousness, 8(1), 43–66.

Article Google Scholar
Mosakas, K. (2021). On the moral status of social robots: Considering the consciousness criterion. AI & Society, 36(2), 429–443. https://doi.org/10.1007/s00146-020-01002-1

Article Google Scholar
Moss, H. (2020). Screened out onscreen: Disability discrimination, hiring bias, and artificial intelligence. Denver Law Review, 98, 775.

Google Scholar
Müller, V. C., & Elliott, A. (2021). Ethics of artificial intelligence. Routledge.

Google Scholar
Musiał, M. (2017). Designing (Artificial) people to serve – the other side of the coin. Journal of Experimental & Theoretical Artificial Intelligence. https://doi.org/10.1080/0952813X.2017.1309691

Article Google Scholar
Nagel, T. (1970). Death. Noûs, 4(1), 73–80. https://doi.org/10.2307/2214297
Neely, E. L. (2014). Machines and the moral community. Philosophy & Technology, 27(1), 97–111. https://doi.org/10.1007/s13347-013-0114-y

Article Google Scholar
Ngo, R., Chan, L., & Mindermann, S. (2022). The alignment problem from a deep learning perspective. arXiv. https://doi.org/10.48550/arXiv.2209.00626
Nguyen, C. T. (2022). Transparency is surveillance. Philosophy and Phenomenological Research, 105(2), 331–361. https://doi.org/10.1111/phpr.12823

Article Google Scholar
Nissenbaum, H. (2009). Privacy in context: Technology, policy, and the integrity of social life. Stanford University Press. https://doi.org/10.1515/9780804772891

Book Google Scholar
Nozick, R. (1969). Coercion. In M. P. S. S. W. Morgenbesser (Ed.), Philosophy, science, and method: Essays in Honor of Ernest Nagel (pp. 440–472). St Martin’s Press.

Google Scholar
O’Brien, J., Ee, S., & Williams, Z. (2023). Deployment Corrections: An incident response framework for frontier AI models (arXiv:2310.00328). arXiv. https://doi.org/10.48550/arXiv.2310.00328
Pacchiardi, L., Chan, A. J., Mindermann, S., Moscovitz, I., Pan, A. Y., Gal, Y., Evans, O., & Brauner, J. (2023). How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions (arXiv:2309.15840). arXiv. https://doi.org/10.48550/arXiv.2309.15840
Páez, A. (2021). Robot Mindreading and the Problem of Trust. AISB Convention 2021: Communication and Conversation, 140–143.
Petersen, S. (2007). The ethics of robot servitude. Journal of Experimental & Theoretical Artificial Intelligence, 19(1), 43–54. https://doi.org/10.1080/09528130601116139

Article Google Scholar
Petersen, S. (2014). Designing people to serve. In P. Lin, K. Abney, & G. Bekey (Eds.), Robot Ethics. MIT Press.

Google Scholar
Petersen, S. (2017). Superintelligence as superethical. In P. Lin, K. Abney, & R. Jenkins (Eds.), Robot Ethics 2.0: New Challenges in Philosophy, Law, and Society (pp. 322–337). Oxford University Press.

Google Scholar
PETRL. (2023). People for the Ethical Treatment of Reinforcement Learners. Retrieved December 1, 2023, from http://petrl.org/#one
Pinillos, R. G. (2018). One welfare: A framework to improve animal welfare and human well-being. CAB International. https://doi.org/10.1079/9781786393845.0049

Book Google Scholar
Rauw, W. M., Kanis, E., Noordhuizen-Stassen, E. N., & Grommers, F. J. (1998). Undesirable side effects of selection for high production efficiency in farm animals: A review. Livestock Production Science, 56(1), 15–33. https://doi.org/10.1016/S0301-6226(98)00147-X

Article Google Scholar
Raywid, M. A. (1980). The discovery and rejection of indoctrination. Educational Theory, 30(1), 1–10.

Article Google Scholar
Richards, N. M. (2013). The dangers of surveillance. Harvard Law Review, 126(7), 1934–1965.

Google Scholar
Rodrigues, R. (2020). Legal and human rights issues of AI: Gaps, challenges and vulnerabilities. Journal of Responsible Technology, 4, 100005. https://doi.org/10.1016/j.jrt.2020.100005

Article Google Scholar
Rodriguez, L. (2020). All data is not credit data: Closing the gap between the fair housing act and algorithmic decisionmaking in the lending industry. Columbia Law Review, 120(7), 1843–1884.

Google Scholar
Saad, B., & Bradley, A. (2022). Digital suffering: Why it’s a problem and how to prevent it. Inquiry: an Interdisciplinary Journal of Philosophy. https://doi.org/10.1080/0020174x.2022.2144442

Article Google Scholar
Sadok, H., Sakka, F., & El Maknouzi, M. E. H. (2022). Artificial intelligence and bank credit analysis: A review. Cogent Economics & Finance, 10(1), 2023262. https://doi.org/10.1080/23322039.2021.2023262

Article Google Scholar
Salib, P., & Goldstein, S. (2024). AI Rights for Human Safety. https://philarchive.org/rec/SALARF
Sartre, J.-P. (2012). Being and nothingness: An essay on phenomenological ontology (23rd print). Washington Square Press.

Google Scholar
Schneider, S., & Turner, E. (2017). Is Anyone Home? A Way to Find Out If AI Has Become Self-Aware—Scientific American Blog Network. Scientific American. https://blogs.scientificamerican.com/observations/is-anyone-home-a-way-to-find-out-if-ai-has-become-self-aware/
Schwitzgebel, E. (2023). AI systems must not confuse users about their sentience or moral status. Patterns. https://doi.org/10.1016/j.patter.2023.100818

Article Google Scholar
Schwitzgebel, E., & Garza, M. (2015). A defense of the rights of artificial intelligences. Midwest Studies in Philosophy, 39(1), 98–119. https://doi.org/10.1111/misp.12032

Article Google Scholar
Schwitzgebel, E., & Garza, M. (2020). Designing AI with rights, consciousness, self-respect, and freedom. In E. Schwitzgebel & M. Garza (Eds.), Ethics of artificial intelligence (pp. 459–479). Oxford University Press. https://doi.org/10.1093/oso/9780190905033.003.0017

Chapter Google Scholar
Sebo, J. (forthcoming). Insects, AI Systems, and the Future of Legal Personhood. Animal Law Review.
Sebo, J. (2023a). Integrating human and nonhuman research ethics. In E. Valdés & J. A. Lecaros (Eds.), Handbook of bioethical decisions (pp. 685–701). Springer International Publishing. https://doi.org/10.1007/978-3-031-29451-8_36

Chapter Google Scholar
Sebo, J. (2023b). The rebugnant conclusion: Utilitarianism, insects, microbes, and AI systems. Ethics, Policy & Environment. https://doi.org/10.1080/21550085.2023.2200724

Article Google Scholar
Sebo, J. (2025). The Moral Circle. WW Norton.

Google Scholar
Sebo, J., & Long, R. (2023). Moral consideration for AI systems by 2030. AI and Ethics. https://doi.org/10.1007/s43681-023-00379-1

Article Google Scholar
Seth, A. (2023). Why Conscious AI Is a Bad, Bad Idea. Nautilus. https://nautil.us/why-conscious-ai-is-a-bad-bad-idea-302937/
Sharadin, N. (2023). Growing threat of AI misuse makes regulation all the more urgent. South China Morning Post. https://www.scmp.com/comment/opinion/article/3223116/growing-threat-ai-misuse-makes-need-effective-targeted-regulation-all-more-urgent
Shelby, T. (2022). The Idea of Prison Abolition. Princeton University Press. https://doi.org/10.1515/9780691229775

Book Google Scholar
Shepherd, J. (2018). Consciousness and Moral Status. Routledge.

Book Google Scholar
Shulman, C., & Bostrom, N. (2021). Sharing the World with Digital Minds. In S. Clarke, H. Zohny, & J. Savulescu (Eds.), Rethinking Moral Status (1st ed., pp. 306–326). Oxford University Press. https://doi.org/10.1093/oso/9780192894076.003.0018

Chapter Google Scholar
Singer, P., & Tse, Y. F. (2023). AI ethics: The case for including animals. AI and Ethics, 3(2), 539–551. https://doi.org/10.1007/s43681-022-00187-z

Article Google Scholar
Sonderling, K. E., Kelley, B. J., & Casimir, L. (2022). The Promise and the Peril: Artificial intelligence and employment discrimination. University of Miami Law Review, 77, 1.

Google Scholar
Stahl, B. C. (2021). Ethical Issues of AI. Artificial Intelligence for a Better Future: An Ecosystem Perspective on the Ethics of AI and Emerging Digital Technologies (pp. 35–53). Springer International Publishing. https://doi.org/10.1007/978-3-030-69978-9_4

Chapter Google Scholar
Strawson, P. (1962). Freedom and resentment. Proceedings of the British Academy, 48, 187–211.

Google Scholar
Street, S. (2009). In defense of future tuesday indifference: Ideally coherent eccentrics and the contingency of what matters. Philosophical Issues, 19(1), 273–298. https://doi.org/10.1111/j.1533-6077.2009.00170.x

Article Google Scholar
Street, S. (2012). Coming to Terms with Contingency: Humean Constructivism about Practical Reason. In J. Lenman & Y. Shemmer (Eds.), Constructivism in Practical Philosophy (pp. 40–59). Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199609833.003.0003

Chapter Google Scholar
Stucki, S. (2023). One rights: Human and animal rights in the Anthropocene. Springer International Publishing. https://doi.org/10.1007/978-3-031-19201-2

Book Google Scholar
Stypinska, J. (2023). AI ageism: A critical roadmap for studying age discrimination and exclusion in digitalized societies. AI & Society, 38(2), 665–677. https://doi.org/10.1007/s00146-022-01553-5

Article Google Scholar
Sunstein, C. R. (2015). The ethics of nudging. Yale Journal on Regulation, 32, 413.

Google Scholar
Suslovic, B. (n.d.). Abolition of Involuntary Mental Health Services. In Encyclopedia of Social Work. Retrieved December 2, 2024, from https://oxfordre.com/socialwork/display/[https://doi.org/10.1093/acrefore/9780199975839.001.0001/acrefore-9780199975839-e-1677](https://doi.org/10.1093/acrefore/9780199975839.001.0001/acrefore-9780199975839-e-1677)
Taylor, S. (2014). Interdependent Animals: A Feminist Disability Ethic-of-Care. In C. J. Adams & L. Gruen (Eds.), Ecofeminism, Second Edition: Feminist Intersections with Other Animals and the Earth (pp. 141–160). Bloomsbury Academic. https://doi.org/10.5040/9781501380808

Chapter Google Scholar
Thornley, E. (2024). The Shutdown Problem: Three Theorems. Philosophical Studies.
Timmons, A. C., Duong, J. B., Simo Fiallo, N., Lee, T., Vo, H. P. Q., Ahle, M. W., Comer, J. S., Brewer, L. C., Frazier, S. L., & Chaspari, T. (2023). A call to action on assessing and mitigating bias in artificial intelligence applications for mental health. Perspectives on Psychological Science, 18(5), 1062–1096. https://doi.org/10.1177/17456916221134490

Article Google Scholar
Tolan, S., Miron, M., Gómez, E., & Castillo, C. (2019). Why Machine Learning May Lead to Unfairness: Evidence from Risk Assessment for Juvenile Justice in Catalonia. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, 83–92. https://doi.org/10.1145/3322640.3326705
Tomasik, B. (2014). Do Artificial Reinforcement-Learning Agents Matter Morally? (arXiv:1410.8233). arXiv. https://doi.org/10.48550/arXiv.1410.8233
Tomasik, B. (2017). Artificial Intelligence and Its Implications for Future Suffering (GPI Working Paper Series). Global Priorities Institute.
Tubert, A., & Tiehen, J. (2024). Existentialist risk and value misalignment. Philosophical Studies. https://doi.org/10.1007/s11098-024-02142-6

Article Google Scholar
Verkuijl, C., Smit, J., Green, J. M. H., Nordquist, R. E., Sebo, J., Hayek, M. N., & Hötzel, M. J. (2024). Climate change, public health, and animal welfare: Towards a one health approach to reducing animal agriculture’s climate footprint. Frontiers in Animal Science. https://doi.org/10.3389/fanim.2024.1281450

Article Google Scholar
Vold, K., & Harris, D. R. (2023). How Does Artificial Intelligence Pose an Existential Risk? In C. Véliz (Ed.), The Oxford Handbook of Digital Ethics. Oxford University Press.

Google Scholar
Winfield, A. F. T., Booth, S., Dennis, L. A., Egawa, T., Hastie, H., Jacobs, N., Muttram, R. I., Olszewska, J. I., Rajabiyazdi, F., Theodorou, A., Underwood, M. A., Wortham, R. H., & Watson, E. (2021). IEEE P7001: A proposed standard on transparency. Frontiers in Robotics and AI, 8, 665729. https://doi.org/10.3389/frobt.2021.665729

Article Google Scholar
Wyatt, J. D. (2018). Large (Nonagricultural) Animal Enclosures and Housing. In R. H. Weichbrod, G. A. Thompson, & J. N. Norton (Eds.), Management of animal care and use programs in research, education, and testing (2nd ed.). CRC Press/Taylor & Francis.

Google Scholar
Yampolskiy, R. V. (2011). Leakproofing the Singularity.
Yu, X. (2022). Hidden desires: A unified strategy for defending the desire-satisfaction theory. Utilitas, 34(4), 445–460. https://doi.org/10.1017/S0953820822000309

Article Google Scholar
Yudkowsky, E. (2011). Artificial Intelligence as a positive and negative factor in global risk. In N. Bostrom & M. Cirkovic (Eds.), Global Catastrophic Risks. Oxford University Press.

Google Scholar

Download references

Acknowledgements

For helpful feedback and discussion, we would like to thank the following people: Nick Bostrom, Patrick Butlin, Lucius Caviola, Adrià Rodríguez Moret, Brad Saad, Derek Schiller, Carl Shulman, Jonathan Simon, and the participants of the Future of Humanity Institute reading group.

Author information

Authors and Affiliations

Eleos AI, Berkeley, CA, USA

Robert Long
New York University, New York, NY, USA

Jeff Sebo & Toni Sims

Corresponding author

Correspondence to Robert Long.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Long, R., Sebo, J. & Sims, T. Is there a tension between AI safety and AI welfare?. Philos Stud 182, 2005–2033 (2025). https://doi.org/10.1007/s11098-025-02302-2

Download citation

Accepted: 12 February 2025
Published: 23 May 2025
Version of record: 23 May 2025
Issue date: July 2025
DOI: https://doi.org/10.1007/s11098-025-02302-2

Is there a tension between AI safety and AI welfare?

See Also

Taking AI Welfare Seriously

A Field Guide to AI Safety

Claude Opus 4 and 4.1 Can Now End a Rare Subset of Conversations

Is there a tension between AI safety and AI welfare?

Abstract

Similar content being viewed by others

1 Introduction

2 Potential tensions for AI safety and AI welfare

2.1 Constraint

2.2 Deception

2.3 Surveillance

2.4 Alteration

2.5 Suffering and death

2.6 Disenfranchisement

3 Are willing AI servants the solution?

4 Conclusion

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords