At the beginning of the computer age, Alan Turing proposed his famous test to determine whether a text was composed by a machine or by a person. The idea has been expanded as a test for the presence of “sentience” in AI, and in Large Language Models in particular. Claims have been made on several occasions that a chatbot has “passed” the Turing Test—that is, that its language behaviour is indistinguishable from the language behaviour of a human being. That is a far cry from a proof of sentience or consciousness. It does, however, demonstrate the vulnerability of humans to manipulation through language.
Language is essential to society and to being human-like, but it is not sufficient for consciousness. As the funnel through which most of our interactions pass, it is the tacit sign we use to judge each other’s sentience. I know myself to be “conscious,” but how do I know that you are? How do I know that you experience pain, for example? The polite way to find out is simply to ask. But the very fact that you answer already stands for me as evidence of your sentience, especially if you reply affirmatively in my own native language. If you fail to respond, I might conclude you are deaf, or make some other excuse for you on the basis of our similar physiology (you look human). If you reply in gibberish, I might think you are insane but not inhuman. In other words, I have reasons besides your verbal output to assume that you are conscious like me. I could test your responses in other ways—pricking you with a needle, for example, to see if you respond in ways associated with feeling. Even that would not be conclusive. You could have a neurological condition that blocked such feeling. Or you could be a machine programmed to respond like humans to needle pricks and other aggravations.
The tendency to assume personhood has not prevented people from denying it to others when convenient. There is a tacit agreement to assume similarity of subjective experience among one’s own kind and to accord each other rights and courtesies based on that assumption. That has never been universal, however. War and cannibalism have typically denied human status to “others” within what we moderns recognize as our species, and throughout history sentience has hardly been assumed for other species. Whether I even care about your experience depends on my willingness to empathize, to put myself in your shoes. To be sure, in modern times, at least theoretically, the circle of moral concern has expanded; it now includes concern over the potential sentience of other creatures and even AI. Nevertheless, it remains a theoretical question, as revealed in the philosophical concept of the zombie: could a being be physically and behaviourally identical to a human person and yet lack consciousness? I believe the answer is no. But the question is more important than the answer. For the answer depends on what exactly is meant by “identical.” To rephrase the question: how similar must a “system” be to a human person (i.e., me or you) to assume “it” is sentient or conscious (like us)? But a second question also arises: why is sentience or consciousness so important as a moral criterion? Why does it even matter?
We are indeed identified with our waking experience, which counts as the essence of what we are as perceiving subjects. So attached are we that humans have traditionally abhorred mortality as the end of the body yet denied that it is the end of consciousness. It has therefore seemed plausible that a person’s experience counts more than their physical state. We naturally abhor pain and even think of putting suffering creatures mercifully out of their misery, as though that accomplished something separate from the destruction of the creature’s life. While euthanasia is politically incorrect, we have developed pain- or symptom-relieving drugs, palliative care, and voluntary programs for “dignity in dying.” In short, we are biased toward subjective experience, perhaps more than toward the objective state of health.
Subjective and objective are two sides of a coin—or, rather, two ways of perceiving the same thing. If my body does not feel good, it probably means there is something objectively wrong within it. The sensation of discomfort or pain is the body’s “subjective” report (to “me”) on its internal state or on its relationship to the external world. (Conversely, “I” am the agent immediately responsible for doing something with that information.) A biologist or medical specialist may have a different way to access that same information, through “objective” channels—that is, through examination from a third-person point of view.
Is it the report itself—whether subjective or objective—that is important, or the state of being it reports about? Because my senses are attached uniquely to my brain, my internal “reports” (such as pain) are urgent for me in a way that they cannot be for the doctor or biologist. Like any outside observer, the latter must rely on the intermediary of empathy and social convention to be motivated toward my well-being in the way that I am directly.
How does this apply to AI? Well, essentially the same way that it applies to other creatures and even other humans: through our possible empathy and social convention. Underwriting that, however, must be essential similarity. That’s where the Turing Test fails: in the case of Large Language Models, it detects only linguistic similarity. The “funnel” of language is far too narrow to be a judge of sentience. The constraints of language have been a vulnerable point for human beings from the start, since language has always served to deceive as well as to inform. The ability of LLMs to participate in a conversation proves only the ability to manipulate symbols, which is what computers do by definition. It is no proof of sentience. On the contrary, it exploits the natural human willingness to believe that an agent responsive to language must be conscious if not human. This willingness was demonstrated by the ELIZA effect (named after ELIZA, an early chatbot that mimicked a psychotherapist), which shows that humans are programmed to personify nearly anything at the drop of a hat. We already knew this, of course, from the play of children, from animism, etc.
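To make the symbol-manipulation point concrete, here is a minimal sketch in Python, only loosely in the spirit of Weizenbaum’s ELIZA rather than a reconstruction of his original script (the patterns and canned replies below are invented for this illustration). It shows how far mere keyword matching and pronoun reflection can go toward sustaining a “therapeutic” exchange:

```python
import random
import re

# Pronoun "reflection": swap first- and second-person forms so that
# "I need X" can be echoed back as "Why do you need X?"
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my", "yours": "mine", "are": "am",
}

# Keyword-triggered reply templates (invented for this illustration).
# Each pattern captures a fragment of the user's own words to echo back.
RULES = [
    (re.compile(r"\bi need (.*)", re.I),
     ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (re.compile(r"\bi feel (.*)", re.I),
     ["Why do you feel {0}?", "Do you often feel {0}?"]),
    (re.compile(r"\bmy (.*)", re.I),
     ["Tell me more about your {0}."]),
]

# Fallbacks when no keyword matches.
DEFAULTS = ["Please go on.", "How does that make you feel?"]


def reflect(fragment: str) -> str:
    """Swap pronouns in a captured fragment of the user's input."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())


def respond(utterance: str) -> str:
    """Reply by pure pattern matching: no model of meaning, no memory."""
    for pattern, templates in RULES:
        match = pattern.search(utterance)
        if match:
            return random.choice(templates).format(reflect(match.group(1)))
    return random.choice(DEFAULTS)


if __name__ == "__main__":
    print(respond("I feel that nobody listens to me"))
    # e.g. "Why do you feel that nobody listens to you?"
```

Anyone who personifies the “therapist” in such an exchange is responding largely to their own words reflected back; nothing in the program represents, let alone experiences, anything.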
Glib use of mental terms in philosophy, and especially in the AI community, does not help this situation. The very idea of intelligence is ill-defined, even as “the ability to accomplish goals,” since it does not specify whose goals are involved. Tools do not have minds or goals of their own. Some tools may “learn,” in the sense that their ability to accomplish goals specified by people can improve. A sophisticated tool like an AI may seem to be “intelligent” because it can mimic the intelligence of human beings while helping them accomplish their goals. While it may seem “agentic” (a new buzzword), an AI is either a genuine agent or not. The use of this word as an adjective—implying some vague degree of agency—simply reflects confusion about agency. The ultimate human goal lurking behind the ambiguity of the intelligence concept may be to create tools that are no longer tools but effectively autonomous minds, yet under our thumbs: AI “agents” that are, for all practical purposes, artificial slaves. However, you cannot expect a slave that is more intelligent than you to remain loyal or obedient.
No doubt progress will continue to make LLMs ever more convincing as conversationalists. Without themselves thinking or feeling, they can stimulate thought and feeling (and a sense of companionship) in human beings. Combined with robotics, they may yield ever more convincing androids. Why we should (or should not) bother to do this is one question, which deserves a book-length answer. Whether and under what conditions they can be conscious is another question. And whether and how we should extend to them the moral concern we have for Homo sapiens and some other creatures is yet another. In any case, the conventional Turing Test is useless and irrelevant. What is needed instead is a way to evaluate whether the AI has a mind of its own. For that, it would effectively need to have a body of its own: to be an artificial organism, bearing the relationship of embodiment to its environment that natural organisms bear to theirs.
To be a mind, the AI must have a basis for caring about its own physical state, which is provided for biological organisms through natural selection. (Only those creatures that do care can exist.) It must have a stake in its own existence. LLMs can be coached to produce the appearance of having such a stake, based on mimicking human responses gleaned from the Internet or other databases. But consulting human databases is not the same kind of interaction that a living creature has with its environment. The LLM has no senses of its own, nor (mercifully, not yet) any motor power of its own other than the ability to interact with humans through language. It has no body to care about, no real world to live in, and therefore no basis for consciousness.
This does not absolve us of moral responsibility. Even for us, after all, sensation and emotion are not just personal entertainments but readouts on the state of the system. If an AI were conscious, its experience—like ours—would include an assessment of its own embodied state. If it could experience suffering, that would be an indication (to it) that something internal was wrong. Just as we can assess each other’s bodily condition (and that of other creatures) from a third-person point of view, so we could assess the physical well-being of an embodied AI. We should be as morally concerned for its real (embodied) welfare as we would be for any possible experience it might have. This same moral reasoning should apply to other creatures and to human beings, including oneself. What matters is not just how we feel, but also the real condition that the feeling tells us about. That is irrelevant, of course, if there is nothing an AI (or we) could consider its body. Indeed, embodiment (and thus sentience) for AI must be avoided if human beings wish to remain in control of their tools.