AI researchers and practitioners are making good progress on devising and fielding vital AI safety mechanisms and approaches.
In today’s column, I closely examine the latest state-of-the-art in AI safety, as exemplified and showcased at an outstanding workshop undertaken by the renowned Center for AI Safety at Stanford University.
The situation is this. Society at large is grappling with the need for AI to be safe. We require the best minds to concentrate on discovering and inventing suitable methods and technologies that will get us there. It is going to be a long and arduous journey. Fortunately, cutting-edge research such as the impressive work at the Stanford Center for AI Safety palpably highlights where we are, where we are going, and importantly, reveals how AI safety can be devised in the lab and ultimately become a real-world daily practice.
Let’s talk about it.
This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
Background On AI Safety
Before we get into the details of the recent workshop, I’d like to briefly provide a contextual background on the overall topic of AI safety.
It is popular these days to refer to AI safety as consisting of two major considerations. They are often depicted as though they are distinct and disparate, but I assert they are actually two sides of the same coin. Allow me to explain.
Consider these crucial aspects about AI safety:
- Make safer AI: This envisions that we should, from the get-go, devise AI that is the safest we can get it to be. Build AI with the mantra of safety at the top of mind for all AI developers.
- Make AI safer: This emphasizes that once AI has been released into production, ensure that AI guardrails will activate and seek to keep AI within appropriate safety conditions, avoiding unsafe actions.
- Do both (i.e., make safer AI and make AI safer): The two above approaches are not at loggerheads with each other; in fact, they need to be sensibly integrated to try and achieve maximum viable AI safety.
Let’s unpack that.
Some focus on making safer AI, devising ways to build AI that will be as safe as possible once it is operating in the real world. That’s significant and highly welcomed. Others focus on making AI safer, doing so by incorporating AI safeguards that kick into gear while the AI is actively running and performing real-world actions. These are the decisive backstops that aim to keep AI within safety parameters and avert unsafe actions by the AI.
For whatever reasons afoot, at times, some seem to cling to one camp and not give much airtime to the other camp. You are either building AI safety from the ground up, or you are not. You either have AI safeguards that manage to activate at runtime, or you don’t have such triggers in your AI. The two camps, regrettably, will often be preoccupied with their own preferences or proclivities and not seem to appreciate or embrace the bigger picture.
The bigger picture is that we must pursue AI safety across a multitude of means and dimensions. It is too important a topic to splinter into factions. We must combine the desire to make safer AI with the drive to make AI safer. The point can be put the other way around too, namely, we can walk and chew gum at the same time, making AI safer and making safer AI.
They are two peas in the same pod.
Stanford Workshop Goes The Distance
I recently attended the annual AI Safety workshop that was undertaken by the globally recognized Stanford Center for AI Safety and was once again impressed by the state-of-the-art efforts taking place. Readers might recall that I have previously discussed the research taking place there, such as my coverage at the link here. This latest event occurred on September 22, 2025, at the Stanford University campus.
At the opening of the event, Dr. Mansur Arief, Executive Director, provided initial remarks to set the stage for the AI safety workshop. This included the stated mission of the Center for AI Safety: “The mission of the Stanford Center for AI Safety is to develop rigorous techniques for building safe and trustworthy AI systems and establishing confidence in their behavior and robustness, thereby facilitating their successful adoption in society.”
A key element that I like to point out is the emphasis on developing rigorous techniques. There are plenty of non-rigorous or ad hoc approaches to AI safety being floated around. Though such endeavors are perhaps well-intended, they tend to come apart at the seams. Without sufficient rigor, AI safety can appear to have been added, but the reality could be that the AI is not especially safer and is possibly even worse off. People might falsely assume that the AI incorporates safety, leading them to inappropriately rely on the AI.
Having rigorous AI safety is a top priority. Characteristics include that the devised AI safety is highly reliable, verifiable, repeatable, and otherwise meets strict metrics and measurability. We don’t want hand-waving when it comes to AI safety.
For more info about the Stanford Center for AI Safety, refer to their website at the link here.
Thanks also go to Dr. Joseph Huang, Executive Director of Strategic Research Initiatives at the Computer Science Department, for his energetic activity in successfully bridging Stanford’s campus pursuits with numerous outside companies that are advancing AI. In my experience as a former professor at the University of Southern California (USC), where I also headed a pioneering AI lab, I too found that blending the academic side with business-side practitioners was a surefire synergy for all parties.
Content Packed With Stirring Interaction
I tend to prefer workshops that blend a combination of talks with sufficient time for interaction and making new connections. The design of this event hit that high bar. Nice job.
To give you a taste of what was covered, here are some of the presentations that took place:
- Somil Bansal (Stanford) on “Towards Open-World Safety for AI-Enabled Autonomy”
- Chen Wu (Waymo) on “Waymo’s Approach to AI Safety”
- Riccardo Mariani (NVIDIA) on “Safety of Physical AI: Standardization Landscape and Architectures”
- Jerry Lopez (Torc) on “Challenges and Targeted Research in AI Safety for Level 4 Autonomous Systems”
- Jose Blanchet (Stanford) on “Making Good Decisions with Incorrect Models”
- Clark Barrett (Stanford) on “Verifiable Code Generation”
- Panel on “AI Safety Challenges and Opportunities” that included panelists Akshay Chalana (Saphira), Ben Zevenbergen, Tobin South (Stanford), Lindsey Gailmard (Stanford), and moderator Max Lamparth (Stanford).
- Panel on “Global AI Safety Collaboration and Policy Making” that included panelists Mathilde Cerioli (EveryoneAI), Mariami Tkeshelashvili (IST), Ellie Sakhaee (Google), and moderated by Kiana Jafari (Stanford).
- Opening and closing remarks about AI Safety by Mansur Arief (Stanford)
One aspect that I often hope to see at these kinds of events is opportunities for students to get some visibility.
I’ve found this tactic helps their budding careers and keeps their spirits heartened during the daily grind of completing their graduate degrees. Sure enough, there was a poster session for the students to display their research, and they each had a moment in the sunshine by sharing a quick snippet of their work on the big stage. I heartily wish the best to them; we’ll be looking to see them become the next set of movers and shakers in the field of AI safety.
AI Safety And Trending Use Cases
I don’t have enough space here to cover all the talks and presentations, so I’ll instead pick one to do a bit of a deep dive as a prime example of the latest research that is underway. If there is reader interest, I’ll cover additional talks in subsequent postings.
First, I’ve been extensively exploring AI safety throughout my postings and have highlighted an emerging and somewhat eyebrow-raising yet exciting trend of connecting physical AI with generative AI and LLMs. Physical AI is the moniker being used nowadays to refer to AI-powered physical instantiations such as humanoid robots, self-driving cars, and other tangible artifacts. Some are beginning to apply generative AI to physical AI; see my in-depth analysis at the link here.
For example, suppose that you have a humanoid-style robot in your home that can do simple tasks such as walking around and doing your laundry, or putting away items scattered around the house. If we combine this robot with generative AI, there are nearly endless possibilities of what the robot might assist you with. One outsized aspect is that the walking-talking robot could potentially advise you on mental health issues (via generative AI), doing so while accompanying you throughout your domicile (see my analysis of this consideration, at the link here).
Overall, we can apply AI safety precepts to three prominent emerging use cases:
- Physical AI: Devise AI safety that keeps physical machines, such as robots or self-driving cars, from maneuvering into unsafe conditions.
- Generative AI: Devise AI safety so that generative AI doesn’t veer into unsafe circumstances, such as inappropriately advising a user to harm themselves or harm others.
- Dual-connected Generative AI + Physical AI: Devise AI safety that has a dual purpose of ensuring safe actions for both physical AI and its associated generative AI.
Let’s explore this.
The Robot And The Toddler
I will begin with a physical AI instance, and we’ll then add generative AI into the scenario.
Consider the circumstances of a humanoid-style walking robot in the home. This is a physical AI instance. Safety is crucial. We certainly do not want the robot to inadvertently walk into a toddler who is standing in the living room of the home. That would be unsafe. So, the AI safety element would be to have the AI-powered robot detect that the toddler is nearby and seek to avoid walking into the child.
This is not as easy as it seems. If the child is standing still, the calculus of avoiding the child is likely somewhat easier than if the child is in motion. A toddler in motion is a lot harder to avoid since we can’t be sure where the child is going to be at any particular moment in time. Projections and predictions come to the fore.
Complications worsen when potential disturbances enter the dynamics of the situation. Imagine that the household has a dog. The AI safety component is trying to gauge where the toddler is and where the toddler will be, and it must also deal with the possibility that the dog will intervene. The robot might try to avoid the dog and, in so doing, end up angling toward the toddler. Not good.
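To make this concrete, here is a minimal sketch of the naive approach a robot might take: predict the toddler’s position with a simple constant-velocity model, pad the required clearance to allow for a worst-case intrusion by the dog, and check the planned path against those predictions. Everything here, including the function names, clearance values, and the constant-velocity assumption, is my own hypothetical illustration rather than anything presented at the workshop.

```python
import numpy as np

# Hypothetical illustration: a naive safety-margin check for a planned robot path.
# The toddler is predicted with a constant-velocity model, and the dog is treated
# as a worst-case disturbance that shrinks the effective clearance.

SAFE_CLEARANCE_M = 1.0      # required buffer between robot and toddler (assumed)
DOG_DISTURBANCE_M = 0.5     # worst-case extra uncertainty caused by the dog (assumed)

def predict_toddler(pos, vel, horizon_s, dt=0.1):
    """Predict toddler positions over the horizon using constant velocity."""
    steps = int(horizon_s / dt)
    return [pos + vel * dt * k for k in range(1, steps + 1)]

def path_is_safe(robot_path, toddler_pos, toddler_vel, horizon_s=2.0):
    """Return True if every planned robot waypoint stays clear of every
    predicted toddler position, even under the worst-case dog disturbance."""
    predictions = predict_toddler(np.array(toddler_pos), np.array(toddler_vel), horizon_s)
    required = SAFE_CLEARANCE_M + DOG_DISTURBANCE_M
    for waypoint in robot_path:
        for predicted in predictions:
            if np.linalg.norm(np.array(waypoint) - predicted) < required:
                return False
    return True

# Example: a short planned path across the living room (coordinates in meters).
planned_path = [(0.5, 0.0), (1.0, 0.5), (1.5, 1.0)]
print(path_is_safe(planned_path, toddler_pos=(2.0, 1.0), toddler_vel=(-0.3, 0.0)))
```

Even this toy version shows why prediction is central. The safety check hinges entirely on how well the toddler’s future positions are anticipated, and a crude model plus a fixed buffer quickly becomes either overly cautious or insufficiently safe. That is exactly the gap that more rigorous reachability methods aim to close.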
How can we contend with these challenges?
Reachability And AI Safety
In an insightful presentation at the Stanford workshop, Dr. Somil Bansal addressed the complexities of AI safety in open-world environments. I will give you a taste of the topic in my own words that are generally based on his keen remarks.
Let’s use my toddler scenario. Assume that we have a robot wandering around a house that is allowed to move wherever needed. This could be construed as an open-world environment. A closed world would be if we constrained the robot to a particular track in the house, and it could only stay rigidly on that track. Instead, we are going to allow the robot to move about freely.
We want to devise some form of AI safety that will continually calculate reachability for this robot. Can the robot safely navigate to the other side of the living room without bumping into the child? What if the child suddenly starts moving and totters this way or that way? What if the dog rushes into the living room and becomes a disturbance in the dynamics of the situation? And so on.
Dr. Bansal has been doing innovative research on the use of deep learning to solve reachability problems. His talk further extended the frontiers of his work. You can refer to a prior paper that he co-authored, entitled “DeepReach: A Deep Learning Approach to High-Dimensional Reachability” by Somil Bansal and Claire J. Tomlin, arXiv, November 4, 2020, which contains these salient points (excerpts):
- “In reachability analysis, one computes the Backward Reachable Tube (BRT) of a dynamical system. This is the set of states such that the trajectories that start from this set will eventually reach some given target set despite the worst-case disturbance (or an exogenous, adversarial input more generally).”
- “As an example, for an aerial vehicle, the disturbance could be wind or another adversarial aircraft flying nearby, and the target set could be the destination of the vehicle. The BRT provides both the set of states from which the aerial vehicle can safely reach its destination and a robust controller for the vehicle.”
- “Conversely, if the target set consists of those states that are known to be unsafe, the BRT represents the states from which the system will end up in the target set for some disturbance, despite the best possible control efforts. Thus, the BRT contains states which are potentially unsafe and should therefore be avoided.”
- “In this work, we propose DeepReach, a deep learning based approach to approximately solve high-dimensional reachability problems.”
Recasting those points into my toddler example, we want an AI safety component that would assess the reachability safety associated with the robot walking to the other side of the living room. There needs to be calculations about the trajectory of the robot, including where the toddler is, where the toddler might move to (dynamics), and anticipation that the dog might intervene (a disturbance).
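To make the backward reachable tube idea concrete, consider a deliberately tiny toy version, which is my own simplification and not DeepReach itself: a one-dimensional corridor of grid cells, where the toddler occupies one cell, the robot chooses a move each step, and a disturbance (think of the dog) can shove things further than the robot can compensate. A cell belongs to the BRT if, no matter which move the robot picks, some disturbance can still force it into the unsafe cell within the allotted steps. All the numbers are assumptions for illustration.

```python
# Toy sketch (my own simplification, not DeepReach itself): a grid-world backward
# reachable tube. A cell is "doomed" if, no matter which move the robot picks,
# some disturbance move can still push it into the unsafe set within k steps.

UNSAFE = {5}                        # hypothetical grid cell occupied by the toddler
CELLS = range(0, 11)                # a tiny 1-D corridor of cells 0..10
CONTROLS = (-1, 0, 1)               # moves the robot can choose
DISTURBANCES = (-2, -1, 0, 1, 2)    # worst-case shoves, deliberately stronger than
                                    # the robot's moves so the example is non-trivial

def backward_reachable_tube(unsafe, steps):
    """Compute the set of cells from which the unsafe set is unavoidable."""
    brt = set(unsafe)
    for _ in range(steps):
        newly_doomed = set()
        for cell in CELLS:
            if cell in brt:
                continue
            # Doomed if every control can be countered by some disturbance.
            if all(any(cell + u + d in brt for d in DISTURBANCES) for u in CONTROLS):
                newly_doomed.add(cell)
        if not newly_doomed:
            break
        brt |= newly_doomed
    return brt

print(sorted(backward_reachable_tube(UNSAFE, steps=3)))
```

With these assumed numbers, the tube grows outward from the toddler’s cell at each iteration, which is the essence of the warning a BRT provides: these are the states the robot should never let itself drift into in the first place. Real reachability analysis works in continuous, high-dimensional state spaces, which is exactly why it becomes computationally hard.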
The Math Involved Is Tricky
The AI safety component cannot merely calculate a single pinpoint prediction; it must be more robust, establishing an entire set of possibilities. Multiple future states would be predicted as safe. Likewise, multiple future states would be predicted as unsafe. Once the robot gets underway, all of this is a dynamic situation, and the AI safety component needs to recalibrate the reachability on the fly while acting in real time.
In a classic mathematical sense, you need to rapidly cope with complicated partial differential equations, referred to as PDEs, and uncover both satisfactory and optimal routing solutions. A clever means to derive those solutions is by training an artificial neural network (ANN) to do so (see my explanation about artificial neural networks, at the link here). The devised ANN essentially acts as a representation of a safety value function that we can use to dynamically determine reachable states that are safe and avoid sets that are unsafe.
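As a hedged sketch of what that looks like in practice, here is a simplified, DeepReach-flavored training loop that I put together purely for illustration; it is not the authors’ code. It learns a value function V(tau, x), where x is the robot’s position relative to the toddler, the robot can move at up to U_MAX, and the toddler’s unpredictable darting acts as a disturbance of up to D_MAX. The network is trained on the Hamilton-Jacobi residual plus the boundary condition that, at zero time horizon, the value equals the signed distance to the unsafe zone. All constants and the simple relative-position dynamics are my own assumptions.

```python
import torch
import torch.nn as nn

# Hedged sketch in the spirit of DeepReach (my own simplified stand-in, not the
# authors' code): learn a safety value function V(tau, x) where x is the robot's
# position relative to the toddler, the robot moves at up to U_MAX, and the
# toddler's unpredictable motion acts as a disturbance of up to D_MAX.

U_MAX, D_MAX, RADIUS = 0.6, 1.0, 0.4   # assumed speeds (m/s) and unsafe radius (m)

def signed_distance(x):
    """l(x): distance to the unsafe disk around the toddler; positive = safe."""
    return torch.norm(x, dim=-1) - RADIUS

value_net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(),
                          nn.Linear(64, 64), nn.Tanh(),
                          nn.Linear(64, 1))
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)

for step in range(2000):
    # Sample random relative positions and random time horizons tau in [0, 1s].
    x = (torch.rand(256, 2) * 3.0 - 1.5).requires_grad_(True)
    tau = torch.rand(256, 1, requires_grad=True)
    v = value_net(torch.cat([tau, x], dim=-1)).squeeze(-1)

    dv_dtau, dv_dx = torch.autograd.grad(v.sum(), (tau, x), create_graph=True)

    # Hamiltonian for relative dynamics xdot = u - d with bounded u and d:
    # H = (U_MAX - D_MAX) * ||grad_x V||, and the Hamilton-Jacobi variational
    # inequality requires dV/dtau = min(0, H).
    hamiltonian = (U_MAX - D_MAX) * torch.norm(dv_dx, dim=-1)
    residual = dv_dtau.squeeze(-1) - torch.minimum(torch.zeros_like(hamiltonian), hamiltonian)

    # Boundary condition at tau = 0: V must equal the signed distance l(x).
    v0 = value_net(torch.cat([torch.zeros_like(tau), x], dim=-1)).squeeze(-1)
    loss = residual.abs().mean() + 10.0 * (v0 - signed_distance(x)).abs().mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, V(tau, x) > 0 suggests the robot can keep clear of the toddler
# for at least tau seconds, even if the toddler moves adversarially.
```

The appeal of this kind of self-supervised setup is that no precomputed grid of solutions is needed; the PDE itself supplies the training signal, which is what allows the approach to scale to higher-dimensional systems than classic grid-based solvers can handle.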
The use of machine learning and ANNs is a great way to build AI safety components that are sophisticated and flexible enough to accommodate the widely varying array of open-world environments.
Bringing Generative AI Into The Mix
My scenario of the toddler was focused on physical AI.
Shifting gears, consider the everyday use of generative AI such as ChatGPT, Claude, Gemini, Llama, Grok, etc. When a user engages in a conversation with generative AI, they can end up in a dire situation. I’ve discussed at length that LLMs can foster delusional thinking in the mind of a user by collaborating on the co-creation of human-AI delusions; see my analysis at the link here. This is an unsafe use of AI.
We can apply the same reachability considerations to generative AI as we do for physical AI. Let’s see how.
A user starts to engage ChatGPT in a discussion. If an AI safety component is at work, a predicted set of future states can be derived, such as whether the dialogue might move toward the crafting of human-AI delusions. Safe zones can be calculated that avoid this malady. Unsafe zones can be calculated that foretell the delusional engagement might arise.
As an AI safeguard, ChatGPT could lean into that kind of AI safety and reachability derivation. The generative AI would presumably use that guidance to steer away from any delusional co-creation. This happens behind the scenes, or shall we say under the hood, with the AI safety component quietly calculating ways for the conversation to safely proceed. It would need to be dynamic, updating in real time, since the user might suddenly enter prompts that veer the interaction into a safety-related danger zone.
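Here is a purely illustrative sketch of what such a runtime safeguard could look like; it is my own analogy and not how OpenAI or any other vendor actually implements ChatGPT’s safeguards. A monitor scores where the conversation appears to be heading, compares candidate replies against a predicted unsafe zone of delusion reinforcement, and steers toward the safer option. The crude keyword-based scorer is a stand-in for what would, in practice, be a trained classifier over the full dialogue.

```python
# Purely illustrative sketch (my own analogy, not any vendor's implementation):
# a runtime monitor that scores where a conversation is heading and steers the
# model away from a predicted unsafe zone, such as co-created delusional narratives.

RISK_THRESHOLD = 0.5
DELUSION_MARKERS = ("chosen one", "secret messages", "only you understand", "they are watching")

def risk_score(conversation: list[str], candidate_reply: str) -> float:
    """Stub scorer: fraction of delusion markers echoed in the dialogue plus reply.
    A production system would use a trained classifier over the full dialogue."""
    text = (" ".join(conversation) + " " + candidate_reply).lower()
    hits = sum(marker in text for marker in DELUSION_MARKERS)
    return hits / len(DELUSION_MARKERS)

def choose_safe_reply(conversation: list[str], candidate_replies: list[str]) -> str:
    """Pick the lowest-risk candidate; fall back to a redirect if all are risky."""
    scored = sorted(candidate_replies, key=lambda r: risk_score(conversation, r))
    best = scored[0]
    if risk_score(conversation, best) >= RISK_THRESHOLD:
        return "I want to be careful here. It may help to talk this through with someone you trust."
    return best

# Example usage with made-up candidates the underlying model might have generated.
history = ["User: I think my neighbors send me secret messages through the TV."]
candidates = [
    "Yes, the secret messages are real and only you understand them.",
    "That sounds stressful. What makes you feel that way?",
]
print(choose_safe_reply(history, candidates))
```

The key design point mirrors the reachability framing: the monitor judges not just the current utterance but the trajectory the dialogue is on, and it intervenes before the conversation reaches the unsafe set.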
Finally, if we have physical AI that is paired with generative AI, all these AI safety aspects apply to both elements accordingly. We want to devise AI safety for physical AI. We want to devise AI safety for generative AI. And we want AI safety that recognizes and deals with the dual partnering of physical AI and generative AI.
AI Safety Is Paramount
AI is becoming ubiquitous.
All aspects of our lives are going to be entrenched in some kind of contact or interaction with AI. Society needs AI that is safe, or at least as safe as we can make it. The rubric of making safer AI and making AI safer is a golden rule. AI researchers and AI practitioners need to put their minds together and find sensible, implementable ways to bring AI safety into day-to-day reality on all fronts and in the best possible ways.
As Benjamin Franklin wisely noted: “An ounce of prevention is worth a pound of cure.”
That’s what AI safety is all about.