Google (NASDAQ: GOOGL) has announced its latest artificial intelligence (AI) models that give its robots the ability to perceive, reason, and learn, advancing its quest to bring AI agents into the physical world. The new models also enable the robots to search the Internet and share learned skills with other agents.
Meanwhile, Microsoft (NASDAQ: MSFT) is accelerating its AI reinvention, with CEO Satya Nadella admitting he’s “haunted” by the impending danger that the technology poses to his company’s market dominance. The Redmond-based company announced its first AI models under its in-house AI division, but Nadella says this might not be enough to preserve the company’s position as the world’s largest tech giant.
Google’s reasoning robots
AI has taken great leaps since ChatGPT debuted to a historic reception three years ago, with two in three people now using the technology in the workplace. However, the technology’s application in the physical world has remained limited because the leap from code to machines has been costly and complex.
Not anymore. Google has launched Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, its two latest models in its Gemini Robotics family, which merge language, vision, and action, enabling robots to understand and interact with their physical surroundings.
Gemini Robotics 1.5 is a visual-language-action (VLA) model that DeepMind says is the most capable in the market. It enables the robots to process instructions and visual information into motor commands. The robot can ‘think’ before taking an action, and all the decisions are documented for easy auditing.
Gemini Robotics-ER 1.5, on the other hand, is a vision-language model that handles the reasoning and creates the steps required for the robot to complete an assigned task. DeepMind describes this model as a “high-level brain” optimized to make logical decisions in the physical world. This model can also interact with elements around it in natural language and search the Internet for any information.
Essentially, Gemini Robotics-ER 1.5 reasons, perceives, learns, and seeks the best tools for a task, then directs Gemini Robotics 1.5 through natural language instructions.
According to DeepMind, its Gemini Robotics-ER 1.5 beat all its rivals from OpenAI and more, emerging as the leader on 15 academic benchmarks.
Beyond reasoning, the new models give the robots the ability to learn across embodiments, a crucial factor given their vastly different shapes, sizes, and use cases they come in. DeepMind says the models can transfer motions learned by one robot to all the others.
The models “mark an important milestone towards solving AGI in the physical world,” the company says.
“This is a foundational step toward building robots that can navigate the complexities of the physical world with intelligence and dexterity, and ultimately, become more helpful and integrated into our lives,” it added.
Microsoft’s AI reinvention
Microsoft is one of the largest players in the AI revolution, with a significant stake in OpenAI, a dominance in enterprise AI hosting on Azure, the widely used Copilot, and a slew of partnerships and acquisitions.
However, CEO Nadella says the company faces an existential risk from AI. In a recent town hall meeting with employees, Nadella said he’s constantly worried that the company might be rendered obsolete by new AI tools if it fails to keep up. He referred to Digital Equipment Corporation as an example; the American firm was a pioneer in developing personal computers, but was overtaken by rapid advancements led by IBM (NASDAQ: IBM) and others.
Nadella’s concerns come on the backdrop of layoffs that affected 9,000 employees in July, its biggest in a decade. Internal documents revealed that the layoffs were intended to facilitate a pivot to becoming an AI-focused company.
As Nadella pointed out, the company wants to position itself as the industry leader in AI, and recently, it launched its first two AI models under its in-house AI division, known as Microsoft AI.
The two—MAI-Voice-1 AI and MAI-1-preview—are Microsoft’s first stab at in-house AI models after years of relying on the affiliated OpenAI and others. According to Mustafa Suleyman, who leads Microsoft AI, they are the first of many as the company seeks to compete with other market leaders.
“We are one of the largest companies in the world. We have to be able to have the in-house expertise to create the strongest models in the world,” he told one outlet.
MAI-Voice-1 AI is a natural speech generation model, and the company says it can generate a one-minute audio in under a second on one GPU. The model is now powering Copilot Daily and Podcasts.
MAI-1-preview targets text use cases and is being integrated into the Copilot AI assistant, which has been relying on OpenAI’s models.
In order for artificial intelligence (AI) to work right within the law and thrive in the face of growing challenges, it needs to integrate an enterprise blockchain system that ensures data input quality and ownership—allowing it to keep data safe while also guaranteeing the immutability of data. Check out CoinGeek’s coverage on this emerging tech to learn more why Enterprise blockchain will be the backbone of AI.
Watch: Adding the human touch behind AI
title=”YouTube video player” frameborder=”0″ allow=”accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share” referrerpolicy=”strict-origin-when-cross-origin” allowfullscreen=””>
Source: https://coingeek.com/google-bots-reason-search-web-microsoft-speeds-up-ai-reinvention/