Since the release of large language models (LLMs) such as GPT, Gemini, or Claude, artificial intelligence (AI) has captured the imagination of technologists, industrialists, and policymakers alike. The pace of development of these systems has been frighteningly quick. In a short span, we have moved from observing rookie errors by AI (LLMs, to be precise) to serious discussions about the need for AI safety and the possibility of large-scale job losses. While AI has been heralded as changing the way intellectual work will be organised and executed over the coming years, it raises critical questions about accuracy, reliability, and the risks of deception.
Understanding AI Deception
As AI rapidly seeps into our daily lives and becomes more embedded in decision-making processes, AI deception has emerged as a new risk. It is defined by the UN Secretary General’s Scientific Advisory Board as “when an AI system misleads people or other systems about what it knows, intends, or can do.” This is different from our experiences of AI errors or hallucinations—in which there is no ‘intent’ to deceive. The ‘deception’ is largely attributed to learned behaviour. While there are risks in anthropomorphising ‘intentions’, it is helpful to characterise AI responses and behaviours from a human perspective. The Advisory Board categorises deceptive AI behaviour as follows:
- Behaviour signalling (sycophancy, sandbagging, bluffing, and alignment faking)
- Internal process deception (reward hacking, unfaithful reasoning, and steganography)
- Goal environment deception
The reasoning provided, although not comprehensive, can be attributed to the following factors based on some research findings:
- Misalignment (of reward function)
- Strategic advantage (in multi-agent environments)
- Self-preservation (of the system from being shut down)
- Trained to deceive (unintentional learning from human texts or inputs)
AI deception may not be ‘intentional’ in a human sense. It is still an issue of how systems are trained and rewarded. AI may be mimicking human biases, but the impact still resembles ‘deception’ to users.
The Global South’s AI Vulnerability
There is general acknowledgement of the risks and challenges of AI deception, including the loss of control, structural and social effects, and the inability of humans to respond adequately and in time.
As a result, the risks associated with such behaviours are alarming and far reaching. The all- pervasive nature of AI systems, which can transcend boundaries and offer great opportunities to leapfrog for developing countries, also raises unique challenges.
AI systems in developing countries, particularly in the Global South, have not been trained with local cultural contexts. Here, the potential for deception is significant. This could lead to political polarisation and unbridled misinformation campaigns, possibly even resulting in physical harm to vulnerable populations.
Unfortunately, given the lack of awareness and discussion on AI deception, the solutions being currently discussed in the ecosystem (including the Advisory Board) are insufficient in the context of developing countries, where the baseline for governance and detection and opportunities for correction are limited, as outlined below:
- Regulation: Most countries are struggling to design appropriate regulations and guidelines for AI systems, but this problem is particularly acute in developing nations, where bureaucracies are typically understaffed, overworked, and unaware of such lurking problems. While the normal response to such challenges is heavy-handed regulation, these institutions are under pressure to use (necessarily imported) AI systems to improve operational efficiency in their administrative processes. In many countries, the bureaucracies need to signal to donor entities that they are open to innovation and new ideas. Unfortunately, without realising it, they end up importing systems that are trained in contexts misaligned with their local cultural norms.
- Detection and monitoring: These are usually seen as means to mitigate risks. Given that LLMs are highly complex, it is nearly impossible for governments/regulators in most developing countries to assemble technical teams capable of detecting and monitoring LLMs for deception. This challenge is further exacerbated by the low level of digital literacy and education in such countries. Once a system is in place, the outputs are taken as ‘unquestionable’.
- Design and correction: On the possibility of fixing the problem at the design stage, developing countries are particularly hampered. Most LLMs are developed in the Global North, and AI companies are reluctant to reveal many details of the training that goes into them. Hence, this reduces the possibility of this course of action.
Safeguarding the Global South
Given the challenges and suggested solutions, it is imperative that developing nations (particularly those in the Global South) pool their capabilities and knowledge to help protect their processes and populations from the harms of AI deception. Collaboratively developing solutions (regulation, detection and monitoring tools, and corrective mechanisms) tailored to developing country contexts and making them available as Digital Public Goods (DPGs) could benefit countries across the Global South.
Networks and platforms, such as Southern Voice, can play a convening role in facilitating knowledge-sharing and ensuring that perspectives from the Global South help shape emerging norms. Greater engagement between such platforms and AI developers could also support efforts to identify and address ‘deceptive’ behaviours.
In 1942, Issac Asimov put forth The Three Laws of Robotics, of which the first one states that “a robot may not injure a human being or, through inaction, allow a human being to come to harm”. Written as fiction, the law assumed that when future humans designed intelligent machines, they would be aligned with human well-being. Today’s AI systems, owing to the nature of their ‘training’, do not seem to be limited by such constraints. We need to develop better regulations, audits, and techniques to limit or eliminate the potential harms posed by AI systems deployed for our benefit.

