Artificial Intelligence is evolving beyond narrow, task-specific applications into agentic AI—systems capable of making autonomous decisions, adapting to dynamic environments and taking independent actions to achieve goals. This paradigm shift presents unprecedented opportunities for automation, efficiency and innovation. However, as organisations move toward deploying AI agents in critical operations, technology leaders must address several fundamental concerns.
For CTOs and tech executives in general, the question is no longer whether to implement agentic AI but how to do so responsibly and securely. The risks of unchecked autonomy, biased decision-making and unpredictable behaviour demand a structured approach to AI governance, validation and human oversight.
This article explores the core challenges of agentic AI, backed by real-world case studies, and outlines the best mitigation strategies to ensure safe, accountable and effective AI deployment.
In 2023, Samsung engineers inadvertently leaked confidential company code by using ChatGPT to optimise their programming scripts. The AI model retained sensitive trade secrets, which could have been accessed by OpenAI or other users, highlighting the risks of AI-enabled data leaks.
When users share data with AI chatbots, it is stored on the servers of companies like OpenAI, Microsoft and Google—often without a straightforward way to access or delete it. This raises concerns about sensitive information being shared with chatbots like ChatGPT that could unintentionally become accessible to other users.
By default, ChatGPT saves chat history and uses conversations to improve its models. Users can manually disable this feature, but it is unclear whether the setting applies retroactively to past conversations, or whether it works at all, because it is virtually impossible to audit the data that OpenAI and other providers use to train their models.
Technology leaders face a dilemma here: we either act in good faith and use these products, or ban the use of Gen AI tools altogether, as Samsung did. If we do use them, we must accept three possibilities: that our data will be stored on third-party servers, that it may be used to train future models, and that it could unintentionally become accessible to others.
That is, unfortunately, the reality, because we have limited control over data protection when using a third-party SaaS. But what can we do to prevent agentic AI systems from acting erratically?
Agentic AI systems, and AI in general, can act unpredictably, often by pursuing objectives misaligned with our intentions. The concern is amplified in high-stakes scenarios because we entrust complex, “black box” code to make decisions on our behalf.
Malfunctions can have a wide range of consequences. For example:
On March 18, 2018, an Uber self-driving test vehicle in Tempe, Arizona, struck and killed a pedestrian, Elaine Herzberg. This was the first recorded fatality involving a fully autonomous vehicle, raising serious concerns about loss of control in AI-driven systems. The vehicle’s onboard AI was designed to detect and react to obstacles autonomously, but a failure in decision-making and override mechanisms led to a tragic accident.
The AI incorrectly classified the pedestrian as an unknown object rather than a human, delaying its response. To make matters worse, Uber had disabled the vehicle’s built-in emergency braking system, relying entirely on AI-driven decision-making. The system had also been tuned to reduce false positives, meaning it hesitated before deciding to stop, which turned out to be a fatal miscalculation.
A human safety driver was present but not paying attention at the critical moment, as the AI was expected to handle the situation. The software did eventually order the car to brake 1.3 seconds before the collision, but by then it was too late.
This incident shows that blind reliance on agentic AI, systems that are ultimately programmed by humans, can have devastating outcomes.
One mitigation is to keep humans actively involved in shaping the system’s behaviour. A good example is OpenAI’s approach with reinforcement learning from human feedback (RLHF), which uses active human guidance to ensure that the model’s autonomous decisions align with human intentions.
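To make the idea concrete, here is a minimal sketch of the pairwise preference loss commonly used when training RLHF reward models. The reward scores below are hypothetical and the snippet is purely illustrative; it is not OpenAI’s implementation.

```python
import numpy as np

def reward_model_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    """Bradley-Terry style loss: push the reward of the human-preferred
    response above the reward of the rejected one."""
    margin = r_chosen - r_rejected
    # -log(sigmoid(margin)) averaged over the batch, computed stably
    return float(np.mean(np.logaddexp(0.0, -margin)))

# Hypothetical reward-model scores for pairs of responses to the same prompt
chosen = np.array([2.1, 0.7, 1.5])     # responses human raters preferred
rejected = np.array([0.3, 0.9, -0.2])  # responses human raters rejected

print(f"preference loss: {reward_model_loss(chosen, rejected):.3f}")
```

Minimising this loss teaches the reward model to rank outputs the way human raters do; that reward model then guides the policy during fine-tuning.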
In autonomous vehicle development, for example, companies like Tesla include manual steering wheel overrides, allowing drivers to take control when necessary.
IBM’s Watson Health, for example, uses explainable AI to assist doctors in diagnosing diseases by showing the reasoning behind its recommendations. The approach builds trust in its outputs because users have more control over the AI.
A good example of extensive simulation testing is DeepMind’s AlphaGo, which played millions of simulated games. That training allowed researchers to fine-tune its behaviour and prevent erratic strategies.
Difficult as it can sometimes be, following industry standards and regulatory frameworks ensures the safe development and deployment of agentic AI. Both developers and end users should work continuously with policymakers and standards organisations to enforce safety protocols and regular audits.
And the prerequisite for that is monitoring and updating; in other words, deploying systems with continuous monitoring capabilities to detect and address deviations from expected behaviour. For example, AWS and Azure allow developers to update and retrain deployed models to maintain performance and control.
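As a minimal illustration (not tied to any specific AWS or Azure API), a deployed model can be wrapped with a simple accuracy monitor that flags when live performance drifts below a threshold and a retraining job should be triggered. The window size, threshold and retraining hook are all placeholders.

```python
from collections import deque

class AccuracyMonitor:
    """Tracks rolling accuracy of a deployed model and flags drift."""

    def __init__(self, window: int = 500, threshold: float = 0.90):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = incorrect
        self.threshold = threshold

    def record(self, prediction, ground_truth) -> None:
        self.outcomes.append(1 if prediction == ground_truth else 0)

    def needs_retraining(self) -> bool:
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return sum(self.outcomes) / len(self.outcomes) < self.threshold

# Usage: after each labelled production sample arrives
monitor = AccuracyMonitor(window=500, threshold=0.90)
# monitor.record(model_prediction, human_label)
# if monitor.needs_retraining():
#     trigger_retraining_pipeline()  # hypothetical hook into your MLOps stack
```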
Agentic AI systems face ethical dilemmas, such as deciding whose safety to prioritise or whether to follow instructions that conflict with moral principles. Decisions may not align with societal values, leading to public backlash or regulatory scrutiny.
Facebook experienced this kind of backlash in 2016, when its News Feed algorithm inadvertently promoted fake news and divisive content, raising concerns about the ethical implications of its design. It was a blatant example of missing oversight of the algorithm’s impact on public discourse and an absence of ethical considerations: the algorithm simply prioritised engagement over truth.
To mitigate this, Facebook implemented fact-checking partnerships with third-party organisations to address misinformation and started conducting regular ethical reviews to identify and mitigate unintended harms. Additional tools were developed to prioritise high-quality information and limit the spread of harmful content.
Google’s AI Principles explicitly prohibit building AI systems that cause harm or reinforce bias, ensuring ethical guardrails. They collaborated with ethicists, domain experts and diverse stakeholders to define moral principles and embed them into the AI’s decision-making algorithms.
As noted earlier, OpenAI employs RLHF for ChatGPT, training the model to align its responses with user-defined ethical standards. It is a proven approach to ensuring AI systems reflect human values, and it relies on regular feedback from diverse groups of users, because it is imperative that an AI system reflects a broad range of perspectives.
Microsoft’s AI, Ethics, and Effects in Engineering and Research (Aether) committee regularly reviews the company’s AI projects for ethical risks, conducting regular ethical audits and AI impact assessments (AIIAs) to evaluate the social, environmental and moral implications of AI deployments. Any organisation can adopt this practice by establishing independent review boards to assess ethical risks and provide actionable recommendations.
IBM’s Watson Health, mentioned earlier, faced criticism for recommending different cancer treatments based on biased training data. The company addressed this by revising its datasets and involving clinicians in the training process; in other words, bias was reduced by curating more representative data and keeping domain experts in the loop.
Similar to IBM’s example, DARPA’s Explainable AI (XAI) program focuses on developing systems that justify their decisions, enabling users to identify ethical concerns. These systems utilise tools like LIME (Local Interpretable Model-agnostic Explanations) to make AI decisions interpretable and assess their ethical soundness.
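For illustration, this is roughly how the open-source lime package can be used to explain a single prediction of a tabular classifier. The model, data and feature names here are placeholders standing in for a real clinical or risk dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Toy training data standing in for a real dataset
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] + X_train[:, 2] > 0).astype(int)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=["age", "blood_pressure", "glucose", "bmi"],
    class_names=["low risk", "high risk"],
    mode="classification",
)

# Explain one prediction: which features pushed the score up or down?
explanation = explainer.explain_instance(X_train[0], model.predict_proba, num_features=4)
for feature, weight in explanation.as_list():
    print(f"{feature}: {weight:+.3f}")
```

The weighted feature list is what a reviewer would inspect to judge whether the model’s reasoning is ethically and clinically sound.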
Autonomous vehicle companies like Waymo conduct ethical scenario testing to evaluate how their systems handle life-critical situations, such as whom to prioritise in a potential collision. This testing takes place in simulated environments that mimic real-world ethical conflicts, allowing engineers to analyse the system’s decision-making process before deployment.
Agentic AI systems can be manipulated, hacked or even weaponised, with autonomous decision-making amplifying their destructive potential. We all saw that ChatGPT-powered gun on YouTube, didn’t we?
In 2020, the SolarWinds cyberattack demonstrated the risks of a compromised software supply chain, risks that apply equally to AI systems. Malicious actors injected malware into the Orion software platform, impacting thousands of clients, including government agencies.
This case demonstrated a serious lack of robust monitoring in the software update process and insufficient measures to detect and prevent supply chain attacks. To mitigate this and re-establish trust, the company had to implement code-signing practices and enhanced monitoring tools while partnering with security agencies and commissioning third-party audits.
We must identify potential threats specific to the AI system and its deployment environment, including adversarial attacks and data poisoning. To achieve that, we can use comprehensive threat modelling techniques, such as STRIDE (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege), to evaluate risks and develop countermeasures.
Google DeepMind, for instance, employs advanced threat modelling for AI systems to assess and mitigate vulnerabilities.
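As a lightweight starting point, the STRIDE categories can be captured as a simple checklist applied to each component of an AI pipeline. The questions and the example review below are illustrative, not an exhaustive or standard threat model.

```python
# Illustrative STRIDE checklist for an AI pipeline component
STRIDE = {
    "Spoofing": "Can an attacker impersonate a legitimate data source or user?",
    "Tampering": "Can training data or model weights be modified (data poisoning)?",
    "Repudiation": "Are model decisions logged so actions can be attributed?",
    "Information disclosure": "Can prompts or training data leak sensitive information?",
    "Denial of service": "Can crafted inputs or request floods make the model unavailable?",
    "Elevation of privilege": "Can the agent be tricked into actions beyond its permissions?",
}

def review_component(name: str, answers: dict[str, str]) -> None:
    """Print each STRIDE question and flag categories with no documented mitigation."""
    print(f"Threat review: {name}")
    for category, question in STRIDE.items():
        mitigation = answers.get(category, "NO MITIGATION DOCUMENTED")
        print(f"  {category}: {question}\n    -> {mitigation}")

# Hypothetical review of a model-serving API
review_component("inference API", {
    "Spoofing": "API keys plus mutual TLS",
    "Denial of service": "Rate limiting per client",
})
```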
OpenAI adopted secure development practices to minimise risks in GPT-based models, including API rate-limiting to prevent misuse. They employ techniques such as differential privacy and secure multiparty computation to protect sensitive data used in AI training and deployment.
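Rate limiting an AI endpoint can be as simple as a token bucket per API key. The sketch below is a generic pattern, not OpenAI’s implementation, and the capacities shown are arbitrary examples.

```python
import time

class TokenBucket:
    """Allows up to `capacity` requests in a burst, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should return HTTP 429 to the client

# One bucket per API key: bursts of 60 requests, refilled at 1 request/second
buckets = {"api-key-123": TokenBucket(capacity=60, rate=1.0)}
if not buckets["api-key-123"].allow():
    print("Too many requests, slow down")
```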
Tesla tests its autonomous vehicle systems against adversarial inputs, such as altered road signs, to ensure the AI behaves correctly in manipulated environments. Adversarial examples are used to evaluate how the system reacts to maliciously crafted inputs. These simulations of real-world attacks have two goals: to expose weaknesses before attackers find them and to verify that the system degrades safely when its inputs are manipulated.
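Real adversarial testing of perception models typically uses gradient-based attacks and is far more sophisticated, but the shape of the check can be sketched as a robustness test: perturb inputs slightly and measure how often the model’s prediction flips. The function below assumes a classifier exposing a predict() call over features scaled to [0, 1].

```python
import numpy as np

def robustness_rate(model_predict, X: np.ndarray, epsilon: float = 0.05, trials: int = 20) -> float:
    """Fraction of samples whose prediction stays stable under small random perturbations."""
    baseline = model_predict(X)
    stable = np.ones(len(X), dtype=bool)
    rng = np.random.default_rng(42)
    for _ in range(trials):
        noise = rng.uniform(-epsilon, epsilon, size=X.shape)
        perturbed_pred = model_predict(np.clip(X + noise, 0.0, 1.0))
        stable &= (perturbed_pred == baseline)
    return float(stable.mean())

# Usage with any classifier whose features are scaled to [0, 1]:
# rate = robustness_rate(model.predict, X_test)
# assert rate > 0.95, "model is too sensitive to small input perturbations"
```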
By default, AI systems should integrate robust monitoring and alert mechanisms, enabling swift responses to potential security threats. These mechanisms detect anomalies and security breaches and route alerts to dedicated incident response teams, which follow established protocols to address incidents as they occur.
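A minimal version of such monitoring flags statistical anomalies in operational metrics (request volume, error rates, unusual output patterns) and routes them to responders. The alert hook below is a placeholder for whatever paging system an organisation actually uses.

```python
import statistics

def detect_anomaly(metric_history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag the latest metric value if it deviates strongly from recent history."""
    if len(metric_history) < 30:
        return False  # not enough history to judge
    mean = statistics.fmean(metric_history)
    stdev = statistics.stdev(metric_history) or 1e-9
    return abs(latest - mean) / stdev > z_threshold

def alert_incident_response(message: str) -> None:
    # Placeholder: in practice this would page the on-call team (PagerDuty, Opsgenie, ...)
    print(f"[SECURITY ALERT] {message}")

requests_per_minute = [120.0] * 60  # steady baseline traffic
latest = 2400.0                     # sudden spike, e.g. automated abuse
if detect_anomaly(requests_per_minute, latest):
    alert_incident_response(f"Request volume spiked to {latest}/min")
```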
Back to basic cybersecurity – limit access to AI systems and their underlying infrastructure using strong authentication methods and role-based access controls. Zero-trust policies are still the best first line of defence.
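In code, role-based access control around an AI system’s sensitive operations can be as simple as a permission check before each action. The roles, permissions and function names below are examples only; production systems would load them from an identity provider.

```python
from functools import wraps

# Example role-to-permission mapping; real deployments would load this from an IdP
ROLE_PERMISSIONS = {
    "ml_engineer": {"deploy_model", "view_logs"},
    "analyst": {"view_logs"},
    "admin": {"deploy_model", "view_logs", "delete_model"},
}

def requires_permission(permission: str):
    def decorator(func):
        @wraps(func)
        def wrapper(user_role: str, *args, **kwargs):
            if permission not in ROLE_PERMISSIONS.get(user_role, set()):
                raise PermissionError(f"role '{user_role}' lacks '{permission}'")
            return func(user_role, *args, **kwargs)
        return wrapper
    return decorator

@requires_permission("deploy_model")
def deploy_model(user_role: str, model_id: str) -> None:
    print(f"deploying {model_id}")

deploy_model("ml_engineer", "fraud-detector-v7")   # allowed
# deploy_model("analyst", "fraud-detector-v7")     # raises PermissionError
```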
The additional mitigation strategies are:
It’s often difficult to understand or explain the decisions made by complex AI systems, creating a “black box” problem. This causes challenges in assigning responsibility for errors or harm and complicates regulatory compliance and legal proceedings.
The COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) AI system was used in US courts to predict the likelihood of criminal reoffending. However, an investigative report found that COMPAS was biased against African Americans and lacked transparency in its decision-making. The report identified three major problems: racially biased risk scores, poor predictive accuracy and a proprietary model that could not be independently inspected.
Based on this case, AI models used in legal decision-making now require explainability, auditing, human oversight, regulatory compliance and stakeholder engagement. By implementing these practices, AI systems become more accountable and transparent.
Tesla’s Autopilot system, an advanced driver-assistance AI, has been involved in multiple fatal accidents where drivers over-relied on the AI and disengaged from their driving responsibilities. Despite the manufacturer’s warnings, drivers believed the system was fully autonomous and even ignored alerts prompting them to keep their hands on the wheel.
The problem was that Autopilot did not always escalate warnings forcefully when drivers became unresponsive.
To solve this issue, Tesla now requires drivers to periodically touch the steering wheel to ensure engagement. The system was also updated to activate more aggressive visual and auditory warnings if the driver fails to take control.
But there is another underlying problem. Over-reliance on agentic AI can erode critical human skills through blind trust in automated systems, and when the AI malfunctions, the resulting system-wide failures can even turn deadly.
AI should assist rather than replace human decision-makers, especially in high-risk sectors, and human operators must maintain their expertise rather than become entirely dependent on AI. For example, after the Air France Flight 447 crash in 2009, where pilots failed to react properly when the autopilot disengaged, airlines introduced mandatory manual flying hours to prevent skill degradation. The same thing could happen to software development and software evolution if we fail to address this problem in time.
To sum up, to prevent dependence and over-reliance on agentic AI, organisations should keep humans in the loop for critical decisions, enforce engagement checks and escalating alerts, and maintain human expertise through regular manual practice and training.
Agentic AI systems may fail to make consistent, accurate decisions in dynamic, uncertain or adversarial environments. Consequently, they may cause catastrophic errors in critical domains.
Regardless, AI-powered chatbots are increasingly used for medical symptom analysis, for example. Yet AI lacks real-world clinical experience, hallucinates, can fail to identify rare conditions and has no self-checking mechanism: most LLMs we use daily do not verify their own answers before returning results.
Let’s use case studies and real-world examples to see how to improve accuracy so we can rely more on Agentic AI.
Google’s Med-PaLM 2, for instance, initially struggled with accuracy due to biased training data. The company was forced to improve reliability by training on diverse multi-institutional datasets.
Uber’s self-driving car fatally struck a pedestrian in 2018 due to poor real-world validation. Waymo, by contrast, conducted millions of real-world and simulated test miles, reducing failure rates before public deployment. Waymo proved that AI models must undergo rigorous validation and real-world scenario testing before deployment.
IBM Watson for Oncology initially provided incorrect treatment recommendations due to limited training data. The company introduced real-time physician feedback loops, allowing the model to improve through expert corrections. Thanks to these feedback loops and improved confidence scoring, the system can now detect errors and self-correct in near real time.
Another way to improve the decision accuracy of agentic AI is to use multiple models. In ensemble learning, several models provide independent predictions and vote on the final decision, with backup rule-based systems reserved for high-risk calls. The best example is NASA’s Mars Rover AI navigation, which uses redundant models to cross-validate terrain analysis before making navigation decisions, preventing mission-critical failures caused by single-model inaccuracies.
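A minimal sketch of this voting-plus-fallback pattern, assuming three already-trained classifiers and a hand-written conservative rule for high-risk cases; the lambdas simply stand in for real models.

```python
from collections import Counter

def ensemble_decision(models, x, high_risk: bool, safety_rule) -> str:
    """Majority vote across models; defer to a rule-based fallback for high-risk inputs
    when the models disagree."""
    votes = [m(x) for m in models]
    decision, count = Counter(votes).most_common(1)[0]
    unanimous = count == len(votes)
    if high_risk and not unanimous:
        return safety_rule(x)  # conservative rule-based backup
    return decision

# Hypothetical stand-ins for trained models and a safety rule
model_a = lambda x: "proceed"
model_b = lambda x: "proceed"
model_c = lambda x: "stop"
conservative_rule = lambda x: "stop"  # when in doubt on a high-risk input, stop

print(ensemble_decision([model_a, model_b, model_c], x={"speed": 42},
                        high_risk=True, safety_rule=conservative_rule))  # -> "stop"
```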
Arguably the best approach to developing reliable and accurate agentic AI is to force the system to explain its decisions and to flag uncertain predictions for human review. This can be done by incorporating XAI techniques and implementing confidence thresholds that trigger human intervention for low-confidence results. For example, DeepMind’s kidney disease prediction model in healthcare flagged high-risk cases with explainability reports, allowing doctors to verify predictions before acting.
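The escalation logic itself is straightforward. The sketch below assumes the model returns a confidence score and a short explanation alongside its prediction, and that low-confidence cases are queued for a human reviewer; the threshold and field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str
    confidence: float   # 0.0 - 1.0
    explanation: str    # e.g. top features from an XAI tool such as LIME or SHAP

def route_prediction(pred: Prediction, threshold: float = 0.85) -> str:
    """Auto-accept confident predictions; escalate uncertain ones to a human."""
    if pred.confidence >= threshold:
        return f"AUTO: {pred.label}"
    # Below threshold: attach the explanation so the reviewer sees the model's reasoning
    return f"HUMAN REVIEW NEEDED: {pred.label} ({pred.confidence:.0%}) - {pred.explanation}"

print(route_prediction(Prediction("low risk", 0.93, "normal creatinine trend")))
print(route_prediction(Prediction("high risk", 0.61, "rising creatinine, low urine output")))
```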
The bottom line is that AI should never operate fully autonomously in critical situations. In other words, deploy AI as decision support rather than an autonomous agent, and mandate manual approval for AI-generated recommendations in high-risk industries. It brings us back to the Boeing 737 MAX MCAS incidents, where a faulty automated flight-control system repeatedly overrode pilot inputs, leading to fatal crashes.
To improve reliability and accuracy, organisations should train on diverse, representative datasets, validate models against rigorous real-world and simulated scenarios, build in expert feedback loops and confidence scoring, add ensemble and rule-based redundancy for high-risk decisions, and keep humans in the approval path for critical recommendations.
Agentic AI presents immense opportunities but also introduces critical risks: data leakage, loss of control and unpredictable behaviour, ethical misalignment, security vulnerabilities, lack of accountability and transparency, over-reliance and skill erosion, and unreliable decision-making.
To mitigate these, technology leaders must prioritise human oversight, robust security measures and explainability while enforcing strict governance frameworks.
AI should be an assistive tool, not an autonomous decision-maker in high-risk domains. In other words, human expertise remains central.
Success in deploying agentic AI hinges on continuous validation, adversarial testing, regulatory alignment and adaptive learning models. Organisations that proactively address these challenges will drive trustworthy, resilient and high-impact AI adoption, positioning themselves as industry leaders in safe and scalable AI innovation.