AI is developing all the time. Below are picks from several articles on what is expected to happen in and around AI in 2025. The excerpts have been edited, and in some cases translated, for clarity.
AI in 2025: Five Defining Themes
https://news.sap.com/2025/01/ai-in-2025-defining-themes/
Artificial intelligence (AI) is accelerating at an astonishing pace, quickly moving from emerging technologies to impacting how businesses run. From building AI agents to interacting with technology in ways that feel more like a natural conversation, AI technologies are poised to transform how we work.
But what exactly lies ahead?
1. Agentic AI: Goodbye Agent Washing, Welcome Multi-Agent Systems
AI agents are currently in their infancy. While many software vendors are releasing and labeling the first “AI agents” based on simple conversational document search, advanced AI agents that will be able to plan, reason, use tools, collaborate with humans and other agents, and iteratively reflect on progress until they achieve their objective are on the horizon. The year 2025 will see them rapidly evolve and act more autonomously. More specifically, 2025 will see AI agents deployed more readily “under the hood,” driving complex agentic workflows.
In short, AI will handle mundane, high-volume tasks while the value of human judgement, creativity, and quality outcomes will increase.
2. Models: No Context, No Value
Large language models (LLMs) will continue to become a commodity for vanilla generative AI tasks, a trend that has already started. LLMs are drawing on an increasingly tapped pool of public data scraped from the internet. This will only worsen, and companies must learn to adapt their models to unique, content-rich data sources.
We will also see a greater variety of foundation models that fulfill different purposes. Take, for example, physics-informed neural networks (PINNs), which generate outcomes based on predictions grounded in physical reality. PINNs are set to gain importance because they will enable autonomous robots to navigate and execute tasks in the real world.
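To make the PINN idea concrete: instead of fitting data alone, the training loss also penalizes violations of a known physical law. The sketch below is an illustrative toy for the ODE u'(t) = -u(t), not any production PINN framework; the function and variable names are made up for this example.

```python
import math

def pinn_style_loss(t, u, observations):
    """Illustrative PINN-style loss for the ODE u'(t) = -u(t):
    a data-fit term plus a physics-residual term.
    `observations` maps grid indices to observed values."""
    n = len(t)
    # Physics residual via central finite differences: du/dt + u should be ~0
    residuals = []
    for i in range(1, n - 1):
        dudt = (u[i + 1] - u[i - 1]) / (t[i + 1] - t[i - 1])
        residuals.append(dudt + u[i])
    physics_loss = sum(r * r for r in residuals) / len(residuals)
    # Data term: match the few observed points
    data_loss = sum((u[i] - v) ** 2 for i, v in observations.items()) / len(observations)
    return data_loss + physics_loss

# The exact solution u(t) = exp(-t) should give a near-zero loss
t = [i / 100.0 for i in range(101)]
u_exact = [math.exp(-x) for x in t]
obs = {0: u_exact[0], 50: u_exact[50], 100: u_exact[100]}
loss = pinn_style_loss(t, u_exact, obs)
```

A real PINN would minimize this kind of combined loss over a neural network's parameters using automatic differentiation; the point here is only that physics enters the objective directly.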
Models will increasingly become more multimodal, meaning an AI system can process information from various input types.
3. Adoption: From Buzz to Business
While 2024 was all about introducing AI use cases and demonstrating their value for organizations and individuals alike, 2025 will see unprecedented adoption of AI specifically for businesses. More people will understand when and how to use AI, and the technology will mature to the point where it can deal with critical business issues such as managing multi-national complexities. Many companies will also gain practical experience working through issues like AI-specific legal and data privacy terms for the first time (much as when companies started moving to the cloud 10 years ago), building the foundation for applying the technology to business processes.
4. User Experience: AI Is Becoming the New UI
AI’s next frontier is seamlessly unifying people, data, and processes to amplify business outcomes. In 2025, we will see increased adoption of AI across the workforce as people discover the benefits of humans plus AI.
This means disrupting the classical user experience, moving from system-led interactions to intent-based, people-led conversations with AI acting in the background. AI copilots will become the new UI for engaging with a system, making software more accessible and easier for people to use. AI won’t be limited to one app; it might even replace apps one day. With AI, the boundaries between frontend, backend, browser, and apps are blurring. This is like giving your AI “arms, legs, and eyes.”
5. Regulation: Innovate, Then Regulate
It’s fair to say that governments worldwide are struggling to keep pace with the rapid advancements in AI technology and to develop meaningful regulatory frameworks that set appropriate guardrails for AI without compromising innovation.
12 AI predictions for 2025
This year we’ve seen AI move from pilots into production use cases. In 2025, they’ll expand into fully-scaled, enterprise-wide deployments.
https://www.cio.com/article/3630070/12-ai-predictions-for-2025.html
1. Small language models and edge computing
Most of the attention this year and last has been on the big language models — specifically on ChatGPT in its various permutations, as well as competitors like Anthropic’s Claude and Meta’s Llama models. But for many business use cases, LLMs are overkill and are too expensive, and too slow, for practical use.
“Looking ahead to 2025, I expect small language models, specifically custom models, to become a more common solution for many businesses,”
2. AI will approach human reasoning ability
In mid-September, OpenAI released a new series of models that, it claims, think through problems much like a person would. The company says they can achieve PhD-level performance on challenging benchmark tests in physics, chemistry, and biology. For example, the previous best model, GPT-4o, could solve only 13% of the problems on a qualifying exam for the International Mathematics Olympiad, while the new reasoning model solved 83%.
If AI can reason better, then it will make it possible for AI agents to understand our intent, translate that into a series of steps, and do things on our behalf, says Gartner analyst Arun Chandrasekaran. “Reasoning also helps us use AI as more of a decision support system,”
3. Massive growth in proven use cases
This year, we’ve seen some use cases proven to have ROI, says Monteiro. In 2025, those use cases will see massive adoption, especially if the AI technology is integrated into the software platforms that companies are already using, making it very simple to adopt.
“The fields of customer service, marketing, and customer development are going to see massive adoption,”
4. The evolution of agile development
The agile manifesto was released in 2001 and, since then, the development philosophy has steadily gained ground over the previous waterfall style of software development.
“For the last 15 years or so, it’s been the de-facto standard for how modern software development works,”
5. Increased regulation
At the end of September, California governor Gavin Newsom signed a law requiring gen AI developers to disclose the data they used to train their systems, which applies to developers who make gen AI systems publicly available to Californians. Developers must comply by the start of 2026.
There are also regulations about the use of deep fakes, facial recognition, and more. The most comprehensive law, the EU’s AI Act, which went into effect last summer, is also something that companies will have to comply with starting in mid-2026, so, again, 2025 is the year when they will need to get ready.
6. AI will become accessible and ubiquitous
With gen AI, people are still at the stage of trying to figure out what gen AI is, how it works, and how to use it.
“There’s going to be a lot less of that,” he says. But gen AI will become ubiquitous and seamlessly woven into workflows, the way the internet is today.
7. Agents will begin replacing services
Software has evolved from big, monolithic systems running on mainframes, to desktop apps, to distributed, service-based architectures, web applications, and mobile apps. Now, it will evolve again, says Malhotra. “Agents are the next phase,” he says. Agents can be more loosely coupled than services, making these architectures more flexible, resilient and smart. And that will bring with it a completely new stack of tools and development processes.
8. The rise of agentic assistants
In addition to agents replacing software components, we’ll also see the rise of agentic assistants, adds Malhotra. Take, for example, the task of keeping up with regulations.
Today, consultants get continuing education to stay abreast of new laws, or reach out to colleagues who are already experts in them. It takes time for the new knowledge to disseminate and be fully absorbed by employees.
“But an AI agent can be instantly updated to ensure that all our work is compliant with the new laws,” says Malhotra. “This isn’t science fiction.”
9. Multi-agent systems
Sure, AI agents are interesting. But things are going to get really interesting when agents start talking to each other, says Babak Hodjat, CTO of AI at Cognizant. It won’t happen overnight, of course, and companies will need to be careful that these agentic systems don’t go off the rails.
Companies such as Sailes and Salesforce are already developing multi-agent workflows.
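The "agents talking to each other" pattern above can be sketched in a few lines. This is a hypothetical toy, not Salesforce's or anyone's actual API; note the turn cap, a simple guardrail of the kind companies will need so agentic systems don't go off the rails.

```python
# Minimal multi-agent hand-off sketch (hypothetical names, not any vendor's API).
# A guardrail caps the number of turns so the agents cannot loop forever.

class Agent:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler  # function: message -> (reply, done)

    def act(self, message):
        return self.handler(message)

def run_conversation(agent_a, agent_b, opening, max_turns=6):
    """Alternate messages between two agents until one signals completion
    or the guardrail (max_turns) stops the exchange."""
    transcript = [opening]
    current, other = agent_a, agent_b
    message = opening
    for _ in range(max_turns):
        message, done = current.act(message)
        transcript.append(f"{current.name}: {message}")
        if done:
            break
        current, other = other, current
    return transcript

# Toy handlers: a "researcher" that gathers facts and a "writer" that finishes.
researcher = Agent("researcher", lambda m: ("facts about " + m, False))
writer = Agent("writer", lambda m: ("draft based on " + m, True))
log = run_conversation(researcher, writer, "Q4 sales trends")
```

Real multi-agent frameworks add tool use, memory, and human approval steps, but the core loop of message passing under an explicit budget looks much like this.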
10. Multi-modal AI
Humans and the companies we build are multi-modal. We read and write text, we speak and listen, we see and we draw. And we do all these things through time, so we understand that some things come before other things. Today’s AI models are, for the most part, fragmentary. One can create images, another can only handle text, and some recent ones can understand or produce video.
11. Multi-model routing
Not to be confused with multi-modal AI, multi-model routing is when companies use more than one LLM to power their gen AI applications. Different AI models are better at different things, and some are cheaper than others, or have lower latency. And then there’s the matter of having all your eggs in one basket.
“A number of CIOs I’ve spoken with recently are thinking about the old ERP days of vendor lock,” says Brett Barton, global AI practice leader at Unisys. “And it’s top of mind for many as they look at their application portfolio, specifically as it relates to cloud and AI capabilities.”
Diversifying away from using just a single model for all use cases means a company is less dependent on any one provider and can be more flexible as circumstances change.
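A minimal sketch of what such a router might look like, with made-up model names and illustrative prices (not real quotes from any provider): pick the cheapest model that handles the task, with an escape hatch for excluding a provider during an outage.

```python
# Sketch of multi-model routing: choose an LLM per request based on task
# type and cost. Model names and prices here are illustrative only.

MODELS = {
    "small-fast":  {"cost_per_1k_tokens": 0.0002, "good_at": {"chat", "summarize"}},
    "large-smart": {"cost_per_1k_tokens": 0.0100, "good_at": {"reasoning", "code"}},
    "backup":      {"cost_per_1k_tokens": 0.0200, "good_at": {"chat", "summarize", "reasoning", "code"}},
}

def route(task, exclude=()):
    """Return the cheapest model that handles the task, skipping any
    models currently excluded (e.g. a provider having an outage)."""
    candidates = [
        (spec["cost_per_1k_tokens"], name)
        for name, spec in MODELS.items()
        if task in spec["good_at"] and name not in exclude
    ]
    if not candidates:
        raise ValueError(f"no model available for task: {task}")
    return min(candidates)[1]
```

The "eggs in one basket" point falls out naturally: because the catch-all backup model can handle every task, losing any single provider never leaves a request unroutable.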
12. Mass customization of enterprise software
Today, only the largest companies, with the deepest pockets, get to have custom software developed specifically for them. It’s just not economically feasible to build large systems for small use cases.
“Right now, people are all using the same version of Teams or Slack or what have you,” says Ernst & Young’s Malhotra. “Microsoft can’t make a custom version just for me.” But once AI begins to accelerate the speed of software development while reducing costs, it starts to become much more feasible.
9 IT resolutions for 2025
https://www.cio.com/article/3629833/9-it-resolutions-for-2025.html
1. Innovate
“We’re embracing innovation,”
2. Double down on harnessing the power of AI
Not surprisingly, getting more out of AI is top of mind for many CIOs.
“I am excited about the potential of generative AI, particularly in the security space,”
3. And ensure effective and secure AI rollouts
“AI is everywhere, and while its benefits are extensive, implementing it effectively across a corporation presents challenges. Balancing the rollout with proper training, adoption, and careful measurement of costs and benefits is essential, particularly while securing company assets in tandem,”
4. Focus on responsible AI
The possibilities of AI grow by the day — but so do the risks.
“My resolution is to mature in our execution of responsible AI,”
“AI is the new gold, and in order to truly maximize its potential, we must first have the proper guardrails in place. Taking a human-first approach to AI will help ensure our state can maintain ethics while taking advantage of new AI innovations.”
5. Deliver value from generative AI
As organizations move from experimenting and testing generative AI use cases, they’re looking for gen AI to deliver real business value.
“As we go into 2025, we’ll continue to see the evolution of gen AI. But it’s no longer about just standing it up. It’s more about optimizing and maximizing the value we’re getting out of gen AI,”
6. Empower global talent
Although harnessing AI is a top objective for Morgan Stanley’s Wetmur, she says she’s equally committed to harnessing the power of people.
7. Create a holistic learning culture
Wetmur has another talent-related objective: to create a learning culture — not just in her own department but across all divisions.
8. Deliver better digital experiences
Deltek’s Cilsick has her sights set on improving her company’s digital employee experience, believing that a better DEX will yield benefits in multiple ways.
Cilsick says she first wants to bring in new technologies and automation to “make things as easy as possible,” mirroring the digital experiences most workers have when using consumer technologies.
“It’s really about leveraging tech to make sure [employees] are more efficient and productive,”
“In 2025 my primary focus as CIO will be on transforming operational efficiency, maximizing business productivity, and enhancing employee experiences,”
9. Position the company for long-term success
Lieberman wants to look beyond 2025, saying another resolution for the year is “to develop a longer-term view of our technology roadmap so that we can strategically decide where to invest our resources.”
“My resolutions for 2025 reflect the evolving needs of our organization, the opportunities presented by AI and emerging technologies, and the necessity to balance innovation with operational efficiency,”
Lieberman aims to develop AI capabilities to automate routine tasks.
“Bots will handle common inquiries ranging from sales account summaries to HR benefits, reducing response times and freeing up resources for strategic initiatives,”
Not just hype — here are real-world use cases for AI agents
https://venturebeat.com/ai/not-just-hype-here-are-real-world-use-cases-for-ai-agents/
Just seven or eight months ago, when a customer called in to or emailed Baca Systems with a service question, a human agent handling the query would begin searching for similar cases in the system and analyzing technical documents.
This process would take roughly five to seven minutes; then the agent could offer the “first meaningful response” and finally begin troubleshooting.
But now, with AI agents powered by Salesforce, that time has been shortened to as few as five to 10 seconds.
Now, instead of having to sift through databases for previous customer calls and similar cases, human reps can ask the AI agent to find the relevant information. The AI runs in the background and allows humans to respond right away, Russo noted.
AI can serve as a sales development representative (SDR) to send out general inquiries and emails, have a back-and-forth dialogue, then pass the prospect to a member of the sales team, Russo explained.
But once the company implements Salesforce’s Agentforce, a customer needing to modify an order will be able to communicate their needs with AI in natural language, and the AI agent will automatically make adjustments. When more complex issues come up — such as a reconfiguration of an order or an all-out venue change — the AI agent will quickly push the matter up to a human rep.
Open Source in 2025: Strap In, Disruption Straight Ahead
Look for new tensions to arise in the New Year over licensing, the open source AI definition, security and compliance, and how to pay volunteer maintainers.
https://thenewstack.io/open-source-in-2025-strap-in-disruption-straight-ahead/
The trend of widely used open source software moving to more restrictive licensing isn’t new.
In addition to the demands of late-stage capitalism and impatient investors in companies built on open source tools, other outside factors are pressuring the open source world. There’s the promise/threat of generative AI, for instance. Or the shifting geopolitical landscape, which brings new security concerns and governance regulations.
What’s ahead for open source in 2025?
More Consolidation, More Licensing Changes
The Open Source AI Debate: Just Getting Started
Security and Compliance Concerns Will Rise
Paying Maintainers: More Cash, Creativity Needed
The most important cybersecurity and AI trends for 2025
https://www.uusiteknologia.fi/2024/11/20/kyberturvallisuuden-ja-tekoalyn-tarkeimmat-trendit-2025/
1. Cyber infrastructure will be centered on a single, unified security platform
2. Big data will give an edge against new entrants
3. AI’s integrated role in 2025 means building trust, governance engagement, and a new kind of leadership
4. Businesses will adopt secure enterprise browsers more widely
5. AI’s energy implications will be more widely recognized in 2025
6. Quantum realities will become clearer in 2025
7. Security and marketing leaders will work more closely together
Presentation: For 2025, ‘AI eats the world’.
https://www.ben-evans.com/presentations
Just like other technologies that have gone before, such as cloud and cybersecurity automation, right now AI lacks maturity.
https://www.securityweek.com/ai-implementing-the-right-technology-for-the-right-use-case/
If 2023 and 2024 were the years of exploration, hype and excitement around AI, 2025 (and 2026) will be the year(s) that organizations start to focus on specific use cases for the most productive implementations of AI and, more importantly, to understand how to implement guardrails and governance so that it is viewed as less of a risk by security teams and more of a benefit to the organization.
Businesses are developing applications that add Large Language Model (LLM) capabilities to provide superior functionality and advanced personalization
Employees are using third party GenAI tools for research and productivity purposes
Developers are leveraging AI-powered code assistants to code faster and meet challenging production deadlines
Companies are building their own LLMs for internal use cases and commercial purposes.
AI is still maturing
However, just like other technologies that have gone before, such as cloud and cybersecurity automation, right now AI lacks maturity. Right now, we very much see AI in this “peak of inflated expectations” phase and predict that it will dip into the “trough of disillusionment”, where organizations realize that it is not the silver bullet they thought it would be. In fact, there are already signs of cynicism as decision-makers are bombarded with marketing messages from vendors and struggle to discern what is a genuine use case and what is not relevant for their organization.
There is also regulation that will come into force, such as the EU AI Act, which is a comprehensive legal framework that sets out rules for the development and use of AI.
AI certainly won’t solve every problem, and it should be used like automation, as part of a collaborative mix of people, process and technology. You simply can’t replace human intuition with AI, and many new AI regulations stipulate that human oversight is maintained.
7 Splunk Predictions for 2025
https://www.splunk.com/en_us/form/future-predictions.html
AI: Projects must prove their worth to anxious boards or risk defunding, and LLMs will go small to reduce operating costs and environmental impact.
OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI
Three of the leading artificial intelligence companies are seeing diminishing returns from their costly efforts to develop newer models.
https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai
Sources: OpenAI, Google, and Anthropic are all seeing diminishing returns from costly efforts to build new AI models; a new Gemini model misses internal targets
It Costs So Much to Run ChatGPT That OpenAI Is Losing Money on $200 ChatGPT Pro Subscriptions
https://futurism.com/the-byte/openai-chatgpt-pro-subscription-losing-money?fbclid=IwY2xjawH8epVleHRuA2FlbQIxMQABHeggEpKe8ZQfjtPRC0f2pOI7A3z9LFtFon8lVG2VAbj178dkxSQbX_2CJQ_aem_N_ll3ETcuQ4OTRrShHqNGg
In a post on X-formerly-Twitter, CEO Sam Altman admitted an “insane” fact: that the company is “currently losing money” on ChatGPT Pro subscriptions, which run $200 per month and give users access to its suite of products including its o1 “reasoning” model.
“People use it much more than we expected,” the cofounder wrote, later adding in response to another user that he “personally chose the price and thought we would make some money.”
Though Altman didn’t explicitly say why OpenAI is losing money on these premium subscriptions, the issue almost certainly comes down to the enormous expense of running AI infrastructure: the massive and increasing amounts of electricity needed to power the facilities that power AI, not to mention the cost of building and maintaining those data centers. Nowadays, a single query on the company’s most advanced models can cost a staggering $1,000.
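Using only the figures quoted above ($200 per month for the subscription, up to $1,000 for a single query on the most advanced models), the economics are easy to see. The $5 per-query figure below is a purely hypothetical number for illustration, not anything OpenAI has disclosed.

```python
# Back-of-envelope economics using the figures quoted above:
# a $200/month subscription vs. queries that can cost up to $1,000 each
# on the most advanced models (a top-end figure, not a typical query).

subscription_price = 200.0   # USD per month
heavy_query_cost = 1000.0    # USD, top-end cost of one advanced query

# A single such query per month already puts the subscription in the red
monthly_margin = subscription_price - heavy_query_cost

# Even at a hypothetical $5 per query, break-even is only 40 queries/month
assumed_cost_per_query = 5.0  # illustrative assumption, not a disclosed figure
break_even_queries = subscription_price / assumed_cost_per_query
```

Heavy users who "use it much more than we expected" blow past that break-even point quickly, which is consistent with Altman's admission.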
AI demands ever faster networks
https://etn.fi/index.php/opinion/16974-tekoaely-edellyttaeae-yhae-nopeampia-verkkoja
A resilient digital infrastructure is critical to effectively harnessing telecommunications networks for AI innovations and cloud-based services. The increasing demand for data-rich applications related to AI requires a telecommunications network that can handle large amounts of data with low latency, writes Carl Hansson, Partner Solutions Manager at Orange Business.
AI’s Slowdown Is Everyone Else’s Opportunity
Businesses will benefit from some much-needed breathing space to figure out how to deliver that all-important return on investment.
https://www.bloomberg.com/opinion/articles/2024-11-20/ai-slowdown-is-everyone-else-s-opportunity
Here is what will happen in the chip market next year
https://etn.fi/index.php/13-news/16984-naein-sirumarkkinoilla-kaey-ensi-vuonna
The growing demand for high-performance computing (HPC) for artificial intelligence continues to be strong, with the market set to grow by more than 15 percent in 2025, IDC estimates in its recent Worldwide Semiconductor Technology Supply Chain Intelligence report.
IDC predicts eight significant trends for the chip market by 2025.
1. AI growth accelerates
2. Asia-Pacific IC Design Heats Up
3. TSMC’s leadership position is strengthening
4. The expansion of advanced processes is accelerating.
5. Mature process market recovers
6. 2nm Technology Breakthrough
7. Restructuring the Packaging and Testing Market
8. Advanced packaging technologies on the rise
2024: The year when MCUs became AI-enabled
https://www-edn-com.translate.goog/2024-the-year-when-mcus-became-ai-enabled/?fbclid=IwZXh0bgNhZW0CMTEAAR1_fEakArfPtgGZfjd-NiPd_MLBiuHyp9qfiszczOENPGPg38wzl9KOLrQ_aem_rLmf2vF2kjDIFGWzRVZWKw&_x_tr_sl=en&_x_tr_tl=fi&_x_tr_hl=fi&_x_tr_pto=wapp
The AI party in the MCU space started in 2024, and in 2025, it is very likely that there will be more advancements in MCUs using lightweight AI models.
Adoption of AI acceleration features is a big step in the development of microcontrollers. The inclusion of AI features in microcontrollers started in 2024, and it is very likely that in 2025, their features and tools will develop further.
AI Regulation Gets Serious in 2025 – Is Your Organization Ready?
While the challenges are significant, organizations have an opportunity to build scalable AI governance frameworks that ensure compliance while enabling responsible AI innovation.
https://www.securityweek.com/ai-regulation-gets-serious-in-2025-is-your-organization-ready/
Similar to the GDPR, the EU AI Act will take a phased approach to implementation. The first milestone arrives on February 2, 2025, when organizations operating in the EU must ensure that employees involved in AI use, deployment, or oversight possess adequate AI literacy. Thereafter, from August 1, any new AI models based on general-purpose AI (GPAI) standards must be fully compliant with the act. Also similar to the GDPR is the threat of huge fines for non-compliance: EUR 35 million or 7 percent of worldwide annual turnover, whichever is higher.
While this requirement may appear manageable on the surface, many organizations are still in the early stages of defining and formalizing their AI usage policies.
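The penalty rule quoted above ("EUR 35 million or 7 percent of worldwide annual turnover, whichever is higher") translates directly into a one-line calculation; the function below is just that rule spelled out, not legal advice on how a fine would actually be set.

```python
def max_eu_ai_act_fine(worldwide_annual_turnover_eur):
    """Maximum fine under the EU AI Act as described above:
    EUR 35 million or 7% of worldwide annual turnover,
    whichever is higher."""
    return max(35_000_000, 0.07 * worldwide_annual_turnover_eur)

# A company with EUR 2 billion in turnover faces up to EUR 140 million;
# the flat EUR 35 million floor dominates below EUR 500 million turnover.
large_company_fine = max_eu_ai_act_fine(2_000_000_000)
small_company_fine = max_eu_ai_act_fine(100_000_000)
```

The crossover point, 35M / 0.07 = EUR 500 million turnover, shows why the percentage clause mainly bites the largest companies.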
Later phases of the EU AI Act, expected in late 2025 and into 2026, will introduce stricter requirements around prohibited and high-risk AI applications. For organizations, this will surface a significant governance challenge: maintaining visibility and control over AI assets.
Tracking the usage of standalone generative AI tools, such as ChatGPT or Claude, is relatively straightforward. However, the challenge intensifies when dealing with SaaS platforms that integrate AI functionalities on the backend. Analysts, including Gartner, refer to this as “embedded AI,” and its proliferation makes maintaining accurate AI asset inventories increasingly complex.
Where frameworks like the EU AI Act grow more complex is their focus on ‘high-risk’ use cases. Compliance will require organizations to move beyond merely identifying AI tools in use; they must also assess how these tools are used, what data is being shared, and what tasks the AI is performing. For instance, an employee using a generative AI tool to summarize sensitive internal documents introduces very different risks than someone using the same tool to draft marketing content.
For security and compliance leaders, the EU AI Act represents just one piece of a broader AI governance puzzle that will dominate 2025.
The next 12-18 months will require sustained focus and collaboration across security, compliance, and technology teams to stay ahead of these developments.
The Global Partnership on Artificial Intelligence (GPAI) is a multi-stakeholder initiative which aims to bridge the gap between theory and practice on AI by supporting cutting-edge research and applied activities on AI-related priorities.
https://gpai.ai/about/#:~:text=The%20Global%20Partnership%20on%20Artificial,activities%20on%20AI%2Drelated%20priorities.
307 Comments
Tomi Engdahl says:
AI drives IT investments to a new peak
https://etn.fi/index.php/13-news/17052-tekoaely-vetaeae-it-investoinnit-uuteen-huippuun
Global IT investments will rise to a new record this year, according to research firm Gartner. Spending on IT systems this year will be 9.8 percent higher than in 2024. In total, investments are estimated at $5.61 trillion in 2025.
Data center investments are the fastest-growing sector of IT spending, estimated to grow by more than 23 percent to reach $405 billion. According to Gartner, generative AI (GenAI) and the related hardware upgrades are driving this development. Investments in AI-optimized servers will double compared to traditional servers by 2025, reaching a staggering $202 billion.
Tomi Engdahl says:
Kyle Wiggers / TechCrunch:
OpenAI, SoftBank, and Oracle announce The Stargate Project, a JV to invest in US AI infrastructure, committing $100B now and up to $500B in the next four years — OpenAI says that it will team up with both the Japanese conglomerate SoftBank and with Oracle, along with others, to build multiple data centers for AI in the U.S.
OpenAI teams up with SoftBank and Oracle on $500B data center project
https://techcrunch.com/2025/01/21/openai-teams-up-with-softbank-and-oracle-on-50b-data-center-project/
Tomi Engdahl says:
Rest of World:
Eager to attract investments, politicians in Brazil, India, and other non-Western nations with nascent AI regulations have warmly welcomed major AI companies
The global struggle over how to regulate AI
https://restofworld.org/2025/global-ai-regulation-big-tech/
Big AI companies have come out hard against comprehensive regulatory efforts in the West — but are receiving a warm welcome from leaders in many other countries.
Tomi Engdahl says:
Financial Times:
Sources: Google is investing over $1B more in Anthropic, which is also nearing a further $2B funding round from Lightspeed and other VCs at a ~$60B valuation — Artificial intelligence group closes in on $60bn valuation — Google is making a fresh investment of more than $1bn into OpenAI rival Anthropic …
Google invests further $1bn in OpenAI rival Anthropic
Artificial intelligence group closes in on $60bn valuation
https://www.ft.com/content/ed631513-dd37-44a3-a536-b2002f5727cc
Tomi Engdahl says:
Kyle Wiggers / TechCrunch:
Microsoft says it has a new deal with OpenAI that gives Microsoft “right of first refusal”, meaning it’s no longer OpenAI’s exclusive cloud provider — Microsoft was once the exclusive provider of data center infrastructure for OpenAI to train and run its AI models. No longer.
Microsoft is no longer OpenAI’s exclusive cloud provider
https://techcrunch.com/2025/01/21/microsoft-is-no-longer-openais-exclusive-cloud-provider/
Tomi Engdahl says:
Dean Takahashi / VentureBeat:
A survey of 3,000+ game developers: 11% were laid off in the past year, 58% back unionization, 30% say GenAI has a negative impact on the industry, and more — The 2025 GDC survey revealed the impact of industry-wide layoffs in 2024 where one of every 11 game developers lost a job.
https://venturebeat.com/games/gdc-survey-reveals-a-rocky-year-of-layoffs-and-ai-skepticism-for-game-developers/
Tomi Engdahl says:
Radhika Rajkumar / ZDNET:
DeepSeek’s new MIT-licensed AI models are accessible via DeepSeek’s API at a fraction of the cost of comparable OpenAI models, but there are censorship concerns — Open-source artificial intelligence (AI) has reached another milestone — and the cost differences it represents could shake up the industry.
https://www.zdnet.com/article/deepseeks-new-open-source-ai-model-can-outperform-o1-for-a-fraction-of-the-cost/
Tomi Engdahl says:
Gerrit De Vynck / Washington Post:
Documents: Google gave Israel access to its latest AI tools from the early weeks of the Israel-Hamas war, directly assisting the Defense Ministry and the IDF — The company fulfilled requests from Israel’s military for more access to AI tools, as it sought to compete with Amazon, documents obtained by The Post show.
https://www.washingtonpost.com/technology/2025/01/21/google-ai-israel-war-hamas-attack-gaza/?pwapi_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJyZWFzb24iOiJnaWZ0IiwibmJmIjoxNzM3NDM1NjAwLCJpc3MiOiJzdWJzY3JpcHRpb25zIiwiZXhwIjoxNzM4ODE3OTk5LCJpYXQiOjE3Mzc0MzU2MDAsImp0aSI6IjBiNmJmNDBjLTU0MWItNGU1ZC1hNjIxLWMzZGZkOTE5MzBhOCIsInVybCI6Imh0dHBzOi8vd3d3Lndhc2hpbmd0b25wb3N0LmNvbS90ZWNobm9sb2d5LzIwMjUvMDEvMjEvZ29vZ2xlLWFpLWlzcmFlbC13YXItaGFtYXMtYXR0YWNrLWdhemEvIn0.Slz6roVET1fo7fnrGyDkWzTQZwQhdVRJgVrAGTYXNXM
Tomi Engdahl says:
Financial Times:
Sources: ByteDance plans to spend ~$5.5B to acquire AI chips in China in 2025, double the amount spent in 2024, with ~60% going to Chinese suppliers like Huawei — Chinese company seeks growth from new technology as social media business comes under pressure in US
https://www.ft.com/content/0815c8fb-e6ed-478b-abb1-c67d6f48fd3a
Tomi Engdahl says:
Bloomberg:
StackBlitz, which offers an AI tool to create websites using text prompts, says it’s finalizing raising $83.5M at a $700M valuation, following a $22M Series A
AI Text-to-Code Startup StackBlitz Is in Talks for a $700 Million Valuation
https://www.bloomberg.com/news/articles/2025-01-21/ai-speech-to-code-startup-stackblitz-is-in-talks-for-a-700-million-valuation
StackBlitz is set to raise over $80 million from investors
Struggling startup turned around fortunes with AI product
Tomi Engdahl says:
Michael Nuñez / VentureBeat:
Tencent unveils Hunyuan3D 2.0, an AI system that turns single images or text prompts into 3D models in seconds, and open sources it on HuggingFace and GitHub
https://venturebeat.com/ai/tencent-introduces-hunyuan3d-2-0-ai-that-speeds-up-3d-design-from-days-to-seconds/
Tomi Engdahl says:
James O’Donnell / MIT Technology Review:
Filing: OpenAI spent $1.76M on lobbying in 2024, up from 2023′s $260K, with focus on the AI Advancement and Reliability Act and the Future of AI Innovation Act — OpenAI spent $1.76 million on government lobbying in 2024 and $510,000 in the last three months of the year alone …
https://www.technologyreview.com/2025/01/21/1110260/openai-ups-its-lobbying-efforts-nearly-seven-fold/
Tomi Engdahl says:
A peculiar problem surfaced in ChatGPT – the developer stays silent
A single user request can be multiplied into as many as several thousand requests against a victim site.
https://www.iltalehti.fi/digiuutiset/a/dfba60a9-b224-4e99-86b9-8827837305e7
OpenAI’s ChatGPT crawler can apparently be used to trigger denial-of-service attacks on random websites, The Register reports. The company has not yet commented on the matter in any way.
OpenAI’s ChatGPT crawler can be tricked into DDoSing sites, answering your queries
The S in LLM stands for Security
https://www.theregister.com/2025/01/19/openais_chatgpt_crawler_vulnerability/
OpenAI’s ChatGPT crawler appears to be willing to initiate distributed denial of service (DDoS) attacks on arbitrary websites, a reported vulnerability the tech giant has yet to acknowledge.
In a write-up shared this month via Microsoft’s GitHub, Benjamin Flesch, a security researcher in Germany, explains how a single HTTP request to the ChatGPT API can be used to flood a targeted website with network requests from the ChatGPT crawler, specifically ChatGPT-User.
This flood of connections may or may not be enough to knock over any given site, practically speaking, though it’s still arguably a danger and a bit of an oversight by OpenAI. It can be used to amplify a single API request into 20 to 5,000 or more requests to a chosen victim’s website, every second, over and over again.
“ChatGPT API exhibits a severe quality defect when handling HTTP POST requests to https://chatgpt.com/backend-api/attributions,” Flesch explains in his advisory, referring to an API endpoint called by OpenAI’s ChatGPT to return information about web sources cited in the chatbot’s output. When ChatGPT mentions specific websites, it will call attributions with a list of URLs to those sites for its crawler to go access and fetch information about.
If you throw a big long list of URLs at the API, each slightly different but all pointing to the same site, the crawler will go off and hit every one of them at once.
“The API expects a list of hyperlinks in parameter urls. It is commonly known that hyperlinks to the same website can be written in many different ways,” Flesch wrote.
“Due to bad programming practices, OpenAI does not check if a hyperlink to the same resource appears multiple times in the list. OpenAI also does not enforce a limit on the maximum number of hyperlinks stored in the urls parameter, thereby enabling the transmission of many thousands of hyperlinks within a single HTTP request.”
https://github.com/bf/security-advisories/blob/main/2025-01-ChatGPT-Crawler-Reflective-DDOS-Vulnerability.md
https://platform.openai.com/docs/bots/
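The missing checks Flesch describes can be sketched in a few lines. This is an illustrative defense, not OpenAI’s code: normalize each hyperlink so trivially different spellings of the same resource collapse to one entry, deduplicate, and cap the list length (`MAX_URLS` and both function names are made up for this sketch):

```python
from urllib.parse import urlsplit, urlunsplit

MAX_URLS = 10  # illustrative hard cap on accepted hyperlinks per request

def normalize(url: str) -> str:
    """Collapse trivially different spellings of the same resource."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = parts.hostname.lower() if parts.hostname else ""
    port = parts.port
    # Keep only non-default ports, so http://X:80/ and http://x/ match
    if port and not ((scheme == "http" and port == 80) or
                     (scheme == "https" and port == 443)):
        host = f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((scheme, host, path, parts.query, ""))

def accept_urls(urls: list[str]) -> list[str]:
    """Deduplicate after normalization and enforce the cap."""
    seen, out = set(), []
    for u in urls:
        key = normalize(u)
        if key not in seen:
            seen.add(key)
            out.append(u)
    return out[:MAX_URLS]
```

With dedup plus a cap like this, a list of thousands of same-site variants collapses to a handful of fetches instead of a reflected flood.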
Tomi Engdahl says:
How to Eliminate “Shadow AI” in Software Development
With a security-first culture fully in play, developers will view the protected deployment of AI as a marketable skill, and respond accordingly.
https://www.securityweek.com/how-to-eliminate-shadow-ai-in-software-development/
Tomi Engdahl says:
Wall Street Journal:
Sources: Microsoft and OpenAI were arguing over Microsoft’s ability to fulfill OpenAI’s computing needs, in the months leading up to the Stargate announcement — The partnership that launched the AI boom has been strained by disagreements over computing resources
https://www.wsj.com/tech/ai/stargate-open-ai-soft-bank-microsoft-shift-e696cf5b?st=othrSE&reflink=desktopwebshare_permalink
Dina Bass / Bloomberg:
Microsoft was mentioned only as a technology partner in Stargate but the company is poised to benefit from the venture by offloading AI computing costs — OpenAI-led venture lets Microsoft offload AI computing costs — Software giant could still decide to invest more in future
https://www.bloomberg.com/news/articles/2025-01-22/microsoft-poised-to-benefit-from-stargate-jv-with-zero-money
The Information:
Sources: Sam Altman told some colleagues that OpenAI and SoftBank will each commit $19B to Stargate, and OpenAI would effectively hold a 40% interest in the JV
‘Stargate’ Squares Some AI Circles
OpenAI, Microsoft, SoftBank, Oracle, Trump, MGX, ARM, and NVIDIA all get double-dip wins in the announcement
https://spyglass.org/project-stargate-agi-openai/
Tomi Engdahl says:
Dominic Preston / The Verge:
Google’s Gemini will be the default on-device assistant for the new Galaxy S25 series, and gets the ability to work across multiple apps in a single prompt — Google’s AI assistant Gemini is now able to carry out tasks across multiple apps in a single interaction, in an update announced today alongside …
Google Gemini now works across multiple apps in a single prompt
/ The AI assistant can search for information and send it to a friend or share it to your calendar all at once.
https://www.theverge.com/2025/1/22/24349319/google-gemini-multiple-app-extensions-ai-samsung-bixby-circle-to-search
Mishaal Rahman / Android Police:
Google says Galaxy S25 supports the Gemini Nano AI model and Samsung’s TalkBack accessibility app is the first non-Google app to use the model
https://www.androidpolice.com/samsung-galaxy-s25-multimodal-gemini-nano/
Tomi Engdahl says:
Bloomberg:
Samsung banks on a hybrid AI strategy, using Google features backed by Gemini and internally developed AI features for One UI 7, to spur smartphone demand
https://www.bloomberg.com/news/articles/2025-01-22/samsung-banks-on-ai-expansion-to-spur-demand-for-new-s25-phones
Tomi Engdahl says:
Reuters:
ByteDance launches Doubao-1.5-pro, an upgraded AI model, claiming it outperforms OpenAI’s o1 in AIME benchmarks, joining DeepSeek in China’s AI reasoning push — TikTok owner ByteDance on Wednesday released an update to its flagship AI model aimed at challenging Microsoft-backed OpenAI’s …
TikTok owner ByteDance, DeepSeek lead Chinese push in AI reasoning
https://www.reuters.com/technology/artificial-intelligence/tiktok-owner-bytedance-deepseek-lead-chinese-push-ai-reasoning-2025-01-22/
Tomi Engdahl says:
Stephanie Palazzolo / The Information:
Source: OpenAI plans to release a new ChatGPT feature called Operator this week to automate complex tasks typically done via a web browser — OpenAI is preparing to release a new ChatGPT feature this week that will automate complex tasks typically done through the Web browser …
https://www.theinformation.com/briefings/openai-preps-operator-release-for-this-week
Tomi Engdahl says:
The Information:
Sources: OpenAI is working on an advanced AI coding assistant that can replicate a Level 6 engineer and relies in part on the company’s o1 reasoning model
OpenAI Targets AGI with System That Thinks Like a Pro Engineer
https://www.theinformation.com/articles/openai-targets-agi-with-system-that-thinks-like-a-pro-engineer
Tomi Engdahl says:
Reuters:
Filing: OpenAI told an Indian court that it can’t delete old training data from local news agency ANI due to legal obligations in the US
Exclusive: OpenAI tells India court ChatGPT data removal will breach US legal obligations
https://www.reuters.com/technology/artificial-intelligence/openai-tells-india-court-chatgpt-data-removal-will-breach-us-legal-obligations-2025-01-22/
Tomi Engdahl says:
https://etn.fi/index.php/13-news/17059-samsung-on-tekoaelypuhelimissa-pisimmaellae-mutta-sekin-on-vasta-alussa
Tomi Engdahl says:
https://etn.fi/index.php/opinion/17058-tekoaely-mullistaa-koodaamisen-myoes-haittakoodin
Tomi Engdahl says:
https://www.uusiteknologia.fi/2025/01/23/tekoalyavustaja-parantaa-sulautettujen-kayttoliittymakehitysta/
Designers of embedded display solutions can now bring the large language models of their choice into their Qt-based software development process. The new experimental Qt AI Assistant tool can cut the time spent on routine tasks, for example. Out of the box, the tool supports many of the best-known large language models on the market.
Tomi Engdahl says:
https://etn.fi/index.php/13-news/17060-donitsi-moottorillaan-kohahduttanut-suomalaisyritys-uskoo-koodittomaan-kehitykseen
Donut Lab says it is developing the world’s first no-code environment for vehicles, meaning that in the future all software logic for cars and drones can be built in a graphical user interface instead of traditional program code. This reduces errors, improves vehicle safety, and speeds up development considerably.
According to Lehtimäki, a fully proprietary platform is being built for this. – The rough timeline is that at the end of the year we will go to at least a limited release, and next year it will become broadly available.
AI will change vehicle design, development, testing, and validation in ways that may still be hard even to imagine, Lehtimäki estimates.
- AI’s role is manifold. First, the upcoming no-code tool can be used to develop software that uses AI models in many different ways. The tools themselves also contain many AI-assisted functions, such as co-pilot-style help for development work and, for example, an automatic check designed to flag missing safety functions.
Physical devices such as motors and batteries as such require nothing special to be integrated into the system no-code style. – The controllers need to support the platform, Lehtimäki clarifies.
Tomi Engdahl says:
Demo and explanation of the AI-based zero-latency de-feedback and anti-reverberation plug-in for live sound that we use. Try it for free at https://www.defeedback.ai
Tomi Engdahl says:
This is why 80 percent of companies fail at AI adoption
https://etn.fi/index.php/13-news/17063-taemaen-takia-80-prosenttia-yrityksistae-epaeonnistuu-tekoaelyn-kaeyttoeoenotossa
Research shows that AI can improve decision-making, accelerate innovation, and help leaders raise employee productivity. Despite this, and despite most companies’ best efforts, studies show that only about a fifth of companies succeed in exploiting AI technologies the way they hoped, says assistant professor Natalia Vuori of Aalto University.
According to Vuori, the lack of understanding stems at least partly from the fact that failures are typically studied only from the perspective of the technology itself, or of users’ assessments of how it works. Instead, we should look more closely at people.
- We found that it is not so much about AI or its capabilities, but about people’s feelings and reactions toward AI – and about how well leaders can handle these, Vuori says.
It turned out that although some employees considered the tool extremely valuable and useful, they did not feel comfortable with the AI tracking their calendar entries, internal communications, and other daily activities. As a result, they either stopped providing their data entirely or began gaming the system by feeding it information they believed would benefit their own career development. Consequently, the information produced by the AI became increasingly inaccurate. A vicious circle emerged as users lost ever more trust in the program’s capabilities.
- The leaders could not understand why use of the AI was declining. They worked hard to promote the tools’ use and tried to explain how the data was being used. Usage declined anyway, Vuori says.
According to her, this case study illustrates a recurring pattern in the adoption of AI, and of new technology more generally.
- AI adoption is not just a technological challenge – it is also a leadership challenge. Success depends on understanding trust, taking emotions into account, and making sure employees feel enthusiastic about using and experimenting with AI, says Vuori.
Tomi Engdahl says:
The world stopped for a moment when ChatGPT was down for a few minutes. It used to be the same when Google went down. Nowadays there are plenty of alternative search engines. It remains to be seen how the culture around AI develops. I wouldn’t mind if a decentralized Web3 materialized right at this point, so that people could offer AI models for one another’s use and pay for using each other’s AIs with cryptocurrency.
Tomi Engdahl says:
Cutting-edge Chinese “reasoning” model rivals OpenAI o1—and it’s free to download
DeepSeek R1 is free to run locally and modify, and it matches OpenAI’s o1 in several benchmarks.
https://arstechnica.com/ai/2025/01/china-is-catching-up-with-americas-best-reasoning-ai-models/
Tomi Engdahl says:
OpenAI’s Latest AI Can Cost More Than $1,000 Per Query
https://futurism.com/the-byte/openai-o3-cost-per-query
Brainpower, at an extreme premium.
Money to Burn
OpenAI’s recently unveiled o3 model is purportedly its most powerful AI yet, but with one big drawback: it costs ungodly sums of money to run, TechCrunch reports.
Announced just over a week ago, o3 “reasons” through problems using a technique known as test-time compute — as in, it takes more time to “think” and explore multiple possibilities before spitting out an answer. As such, OpenAI engineers hope that the AI model will produce better responses to complex prompts instead of jumping to a faulty conclusion.
It appears to have worked, at least to some degree. In its most powerful “high-compute mode,” o3 scored 87.5 percent on the ARC-AGI benchmark designed to test language models, according to the test’s creator François Chollet. That’s nearly three times as high as the previous o1 model’s best score, at just 32 percent.
https://techcrunch.com/2024/12/23/openais-o3-suggests-ai-models-are-scaling-in-new-ways-but-so-are-the-costs/
Tomi Engdahl says:
What large language models know and what people think they know
https://www.nature.com/articles/s42256-024-00976-7
As artificial intelligence systems, particularly large language models (LLMs), become increasingly integrated into decision-making processes, the ability to trust their outputs is crucial. To earn human trust, LLMs must be well calibrated such that they can accurately assess and communicate the likelihood of their predictions being correct. Whereas recent work has focused on LLMs’ internal confidence, less is understood about how effectively they convey uncertainty to users. Here we explore the calibration gap, which refers to the difference between human confidence in LLM-generated answers and the models’ actual confidence, and the discrimination gap, which reflects how well humans and models can distinguish between correct and incorrect answers.
Our experiments with multiple-choice and short-answer questions reveal that users tend to overestimate the accuracy of LLM responses when provided with default explanations. Moreover, longer explanations increased user confidence, even when the extra length did not improve answer accuracy. By adjusting LLM explanations to better reflect the models’ internal confidence, both the calibration gap and the discrimination gap narrowed, significantly improving user perception of LLM accuracy. These findings underscore the importance of accurate uncertainty communication and highlight the effect of explanation length in influencing user trust in artificial-intelligence-assisted decision-making environments.
Uncertainty communication plays a critical role in decision-making and policy development. Uncertainties are often expressed verbally to help stakeholders understand risks and make informed choices across a wide range of domains, including climate policy, law, medicine and intelligence forecasting.
The emergence of large language models (LLMs) introduces new complexities in the area of uncertainty communication. These models are increasingly integrated into areas such as public health, coding, and education. However, the question of how effectively LLMs communicate uncertainty is unexplored. As the primary mode of communication with LLMs is through natural language, it is critical to understand whether LLMs are able to accurately convey through verbal means what they know or do not know.
Recent research raises doubts about the reliability of the information that LLMs generate. One notable issue is the possibility of generating responses that, while convincing, may be inaccurate or nonsensical. The unreliability of LLMs has led their developers to caution against the uncritical acceptance of model outputs, suggesting that it is not always clear when the models are or are not confident in the knowledge communicated to the user.
At the same time, recent research has also indicated that LLMs have the ability, to a certain degree, to accurately discern their own knowledge boundaries.
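The two gaps the paper defines can be illustrated with toy numbers (all values below are invented, not data from the study): the calibration gap is the average difference between human confidence in an LLM answer and the model’s own confidence, and the discrimination gap compares how well each party’s confidence separates correct from incorrect answers:

```python
import statistics

# Hypothetical per-question data: the model's internal confidence,
# a human rater's confidence in the model's answer, and correctness.
model_conf = [0.95, 0.60, 0.80, 0.40, 0.90]
human_conf = [0.90, 0.85, 0.90, 0.70, 0.95]
correct    = [True, False, True, False, True]

# Calibration gap: how much human confidence exceeds the model's own.
calibration_gap = statistics.mean(h - m for h, m in zip(human_conf, model_conf))

def discrimination(conf):
    """Mean confidence on correct answers minus on incorrect ones."""
    right = statistics.mean(c for c, ok in zip(conf, correct) if ok)
    wrong = statistics.mean(c for c, ok in zip(conf, correct) if not ok)
    return right - wrong

# A positive difference means the model's confidence separates
# correct from incorrect answers better than the human's does.
discrimination_gap = discrimination(model_conf) - discrimination(human_conf)
```

In this toy data the humans are systematically more confident than the model (a positive calibration gap), while the model’s confidence discriminates correct from incorrect answers better, which mirrors the pattern the paper reports.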
Tomi Engdahl says:
Trump announces a $500 billion AI infrastructure investment in the US
https://www.cnn.com/2025/01/21/tech/openai-oracle-softbank-trump-ai-investment/index.html
Tomi Engdahl says:
Schools Using AI Emulation of Anne Frank That Urges Kids Not to Blame Anyone for Holocaust
by Joe Wilkins
Jan 18, 9:00 AM EST
“It’s a kind of grave-robbing and incredibly disrespectful to the real Anne Frank and her family.”
https://futurism.com/the-byte/ai-anne-frank-blame-holocaust?fbclid=IwY2xjawH_X5BleHRuA2FlbQIxMQABHQLYJwUcC8ZRqqojYPomWr0j2v2G6ZBFk9ZkKJVNI9-IfPJFSPa5tusfwg_aem_Q7mbAd5ylFIKcQcN2ospLw
Tomi Engdahl says:
Office is now “Microsoft 365 Copilot app”. For real. I haven’t used Copilot and I will not use it. I just want Word and Excel, that’s it.
Microsoft 365 app rebranding to Microsoft 365 Copilot, causing more confusion on Windows.
https://www.windowslatest.com/2024/12/18/microsoft-365-app-rebranding-to-microsoft-365-copilot-causing-more-confusion-on-windows/?fbclid=IwY2xjawH_aTtleHRuA2FlbQIxMQABHdVf1xupNpuVbhTKvb304neq1RakPenKI2wvUY8uUVWUSuGOcc881MStIA_aem_Drh39e7pMl3ODSnqTVRTcQ
Tomi Engdahl says:
Jay Peters / The Verge:
OpenAI releases a “research preview” of its Operator AI agent that can automate web-based tasks, launching in the US to subscribers of the $200/month Pro tier — OpenAI is releasing a “research preview” of an AI agent called Operator that can “go to the web to perform tasks for you,” according to a blog post.
OpenAI’s new Operator AI agent can do things on the web for you
/ The agent will be available first in the US to subscribers of ChatGPT Pro.
https://www.theverge.com/2025/1/23/24350395/openai-chatgpt-operator-agent-control-computer
OpenAI is releasing a “research preview” of an AI agent called Operator that can “go to the web to perform tasks for you,” according to a blog post. “Using its own browser, it can look at a webpage and interact with it by typing, clicking, and scrolling,” OpenAI says. It’s launching first in the US for subscribers of OpenAI’s $200 per month ChatGPT Pro tier.
Operator relies on a “Computer-Using Agent” model that combines GPT-4o’s vision capabilities with “advanced reasoning through reinforcement learning” to be able to interact with GUIs, OpenAI says. “Operator can ‘see’ (through screenshots) and ‘interact’ (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations,” according to OpenAI.
Operator can use reasoning to “self-correct,” and if it gets stuck, it will give the user control. It will also ask the user to take over when a website asks for sensitive information like login credentials and “should” ask for a user to approve actions like sending an email. OpenAI also says that Operator has been designed to “refuse harmful requests and block disallowed content.”
Tomi Engdahl says:
Google’s Gemini is already winning the next-gen assistant wars
/ AI has made virtual assistants a big deal again. So far, it looks like ChatGPT, Siri, Alexa, and the rest are all chasing after Gemini.
https://www.theverge.com/2025/1/22/24349416/google-gemini-virtual-assistant-samsung-siri-alexa
One of the most important changes in Samsung’s new phones is a simple one: when you long-press the side button on your phone, instead of activating Samsung’s own Bixby assistant by default, you’ll get Google Gemini.
This is probably a good thing. Bixby was never a very good virtual assistant — Samsung originally built it primarily as a way to more simply navigate device settings, not to get information from the internet. It has gotten better since and can now do standard assistant things like performing visual searches and setting timers, but it never managed to catch up to the likes of Alexa, Google Assistant, and now, even Siri. So, if you’re a Samsung user, this is good news! Your assistant is probably better now. (And if, for some unknown reason, you really do truly love Bixby, don’t worry: there’s still an app.)
The switch to Gemini is an even bigger deal for Google. Google was caught off guard a couple of years ago when ChatGPT launched but has caught up in a big way. According to recent reporting from The Wall Street Journal, CEO Sundar Pichai now believes Gemini has surpassed ChatGPT, and he wants Google to have 500 million users by the end of this year. It might just get there one Samsung phone at a time.
Tomi Engdahl says:
Every:
Hands-on with Operator: limited in what it can browse, can perform repetitive workflows, and can do lengthy tasks on its own with minimal prompting — Operator (Could you help me do this task?) … Today, OpenAI announced Operator, a new research preview of ChatGPT that acts as an agent for your repetitive tasks.
We Tried OpenAI’s New Agent—Here’s What We Found
Operator (Could you help me do this task?)
https://every.to/chain-of-thought/we-tried-openai-s-new-agent-here-s-what-we-found
Today, OpenAI announced Operator, a new research preview of ChatGPT that acts as an agent for your repetitive tasks. It can autonomously perform actions for you like shopping for airline tickets, making restaurant reservations, buying flowers and more.
Operator has access to its own browser, and you can watch it navigate the web in real time—and it allows you to step in to take control whenever you want. Unlike previous web-browsing experiences inside of ChatGPT, Operator is designed to handle tasks end-to-end rather than requiring your input in between.
OpenAI gave Every early access to Operator this week and we’ve been putting it through its paces. Here’s what we found.
Tomi Engdahl says:
Will Douglas Heaven / MIT Technology Review:
OpenAI says Operator is powered by Computer-Using Agent, or CUA, which combines GPT-4o’s vision capabilities with “reasoning” abilities of more advanced models
OpenAI launches Operator—an agent that can use a computer for you
https://www.technologyreview.com/2025/01/23/1110484/openai-launches-operator-an-agent-that-can-use-a-computer-for-you/
The announcement confirms one of two rumors that circled the internet this week. The other was about superintelligence.
After weeks of buzz, OpenAI has released Operator, its first AI agent. Operator is a web app that can carry out simple online tasks in a browser, such as booking concert tickets or filling an online grocery order. The app is powered by a new model called Computer-Using Agent—CUA (“coo-ah”), for short—built on top of OpenAI’s multimodal large language model GPT-4o.
Operator is available today at operator.chatgpt.com to people in the US signed up with ChatGPT Pro, OpenAI’s premium $200-a-month service. The company says it plans to roll the tool out to other users in the future.
OpenAI claims that Operator outperforms similar rival tools, including Anthropic’s Computer Use (a version of Claude 3.5 Sonnet that can carry out simple tasks on a computer) and Google DeepMind’s Mariner (a web-browsing agent built on top of Gemini 2.0).
The fact that three of the world’s top AI firms have converged on the same vision of what agent-based models could be makes one thing clear. The battle for AI supremacy has a new frontier—and it’s our computer screens.
“Moving from generating text and images to doing things is the right direction,” says Ali Farhadi, CEO of the Allen Institute for AI (AI2). “It unlocks business, solves new problems.”
Farhadi thinks that doing things on a computer screen is a natural first step for agents: “It is constrained enough that the current state of the technology can actually work,” he says. “At the same time, it’s impactful enough that people might use it.” (AI2 is working on its own computer-using agent, says Farhadi.)
Don’t believe the hype
OpenAI’s announcement also confirms one of two rumors that circled the internet this week. One predicted that OpenAI was about to reveal an agent-based app, after details about Operator were leaked on social media ahead of its release. The other predicted that OpenAI was about to reveal a new superintelligence—and that officials for newly inaugurated President Trump would be briefed on it.
Could the two rumors be linked? OpenAI superfans wanted to know.
Nope. OpenAI gave MIT Technology Review a preview of Operator in action yesterday. The tool is an exciting glimpse of large language models’ potential to do a lot more than answer questions. But Operator is an experimental work in progress. “It’s still early, it still makes mistakes,” says Yash Kumar, a researcher at OpenAI.
Like Anthropic’s Computer Use and Google DeepMind’s Mariner, Operator takes screenshots of a computer screen and scans the pixels to figure out what actions it can take. CUA, the model behind it, is trained to interact with the same graphical user interfaces—buttons, text boxes, menus—that people use when they do things online. It scans the screen, takes an action, scans the screen again, takes another action, and so on. That lets the model carry out tasks on most websites that a person can use.
“Traditionally the way models have used software is through specialized APIs,” says Reiichiro Nakano, a scientist at OpenAI. (An API, or application programming interface, is a piece of code that acts as a kind of connector, allowing different bits of software to be hooked up to one another.) That puts a lot of apps and most websites off limits, he says: “But if you create a model that can use the same interface that humans use on a daily basis, it opens up a whole new range of software that was previously inaccessible.”
CUA also breaks tasks down into smaller steps and tries to work through them one by one, backtracking when it gets stuck. OpenAI says CUA was trained with techniques similar to those used for its so-called reasoning models, o1 and o3.
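The scan–act–scan loop described above, with simple backtracking when the agent gets stuck, can be sketched as follows. Every name here (`AgentLoop`, `next_action`, `capture`) is hypothetical; this is a toy illustration of the control flow, not OpenAI’s CUA implementation:

```python
from dataclasses import dataclass, field

@dataclass
class AgentLoop:
    """Minimal sketch of a screenshot -> action agent loop.
    `model` and `browser` are stand-ins, not real OpenAI APIs."""
    model: object
    browser: object
    history: list = field(default_factory=list)

    def run(self, task: str, max_steps: int = 20) -> bool:
        for _ in range(max_steps):
            screenshot = self.browser.capture()      # scan the screen
            action = self.model.next_action(task, screenshot, self.history)
            if action.kind == "done":                # task complete
                return True
            if action.kind == "stuck":               # backtrack one step
                if self.history:
                    self.browser.undo(self.history.pop())
                continue
            self.browser.perform(action)             # click/type/scroll
            self.history.append(action)
        return False                                 # gave up within budget
```

The key property the article describes is visible in the structure: the model never calls a site-specific API; it only ever sees pixels and emits generic GUI actions, so any website a person can use is in scope.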
OpenAI has tested CUA against a number of industry benchmarks designed to assess the ability of an agent to carry out tasks on a computer. The company claims that its model beats Computer Use and Mariner in all of them.
For example, on OSWorld, which tests how well an agent performs tasks such as merging PDF files or manipulating an image, CUA scores 38.1% to Computer Use’s 22.0%. In comparison, humans score 72.4%. On a benchmark called WebVoyager, which tests how well an agent performs tasks in a browser, CUA scores 87%, Mariner 83.5%, and Computer Use 56%. (Mariner can only carry out tasks in a browser and therefore does not score on OSWorld.)
For now, Operator can also only carry out tasks in a browser. OpenAI plans to make CUA’s wider abilities available in the future via an API that other developers can use to build their own apps. This is how Anthropic released Computer Use in December.
Look! No hands
To use Operator, you simply type instructions into a text box. But instead of calling up the browser on your computer, Operator sends your instructions to a remote browser running on an OpenAI server. OpenAI claims that this makes the system more efficient. It’s another key difference between Operator, Computer Use and Mariner (which runs inside Google’s Chrome browser on your own computer).
Because it’s running in the cloud, Operator can carry out multiple tasks at once, says Kumar. In the live demo, he asked Operator to use OpenTable to book him a table for two at 6.30 p.m. at a restaurant called Octavia in San Francisco. Straight away, Operator opened up OpenTable and started clicking through options. “As you can see, my hands are off the keyboard,” he said.
OpenAI is collaborating with a number of businesses, including OpenTable, StubHub, Instacart, DoorDash, and Uber. The nature of those collaborations is not exactly clear, but Operator appears to suggest preset websites to use for certain tasks.
While the tool navigated dropdowns on OpenTable, Kumar sent Operator off to find four tickets for a Kendrick Lamar show on StubHub. While it did that, he pasted a photo of a handwritten shopping list and asked Operator to add the items to his Instacart.
He waited, flicking between Operator’s tabs. “If it needs help or if it needs confirmations, it’ll come back to you with questions and you can answer it,” he said.
Kumar says he has been using Operator at home. It helps him stay on top of grocery shopping: “I can just quickly click a photo of a list and send it to work,” he says.
It’s also become a sidekick in his personal life. “I have a date night every Thursday,” says Kumar. So every Thursday morning, he instructs Operator to send him a list of five restaurants that have a table for two that evening. “Of course, I could do that, but it takes me 10 minutes,” he says. “And I often forget to do it. With Operator, I can run the task with one click. There’s no burden of booking.”
Tomi Engdahl says:
Jon Keegan / Sherwood News:
DeepSeek’s latest AI model R1 sticks to Chinese government restrictions on sensitive topics like Tiananmen Square, Taiwan, and the treatment of Uyghurs in China — Those who train the AI models get to decide what the truth is. — The AI world was abuzz this week with the release …
A free, powerful Chinese AI model just dropped — but don’t ask it about Tiananmen Square
Those who train the AI models get to decide what the truth is.
https://sherwood.news/tech/a-free-powerful-chinese-ai-model-just-dropped-but-dont-ask-it-about/
Tomi Engdahl says:
Kyle Wiggers / TechCrunch:
Perplexity launches Perplexity Assistant in its Android app, which can take “multi-app actions” like hailing a ride, initially free for users in 15 languages — AI-powered search engine Perplexity has launched an “agent,” of sorts, called Perplexity Assistant …
Perplexity launches an assistant for Android
https://techcrunch.com/2025/01/23/perplexity-launches-an-assistant-for-android/
Tomi Engdahl says:
Kyle Wiggers / TechCrunch:
Anthropic debuts Citations, a new API feature letting developers “ground” answers in source documents, available for Claude 3.5 Sonnet and Claude 3.5 Haiku — In an announcement perhaps timed to divert attention away from OpenAI’s Operator, Anthropic Thursday unveiled a new feature …
Anthropic’s new Citations feature aims to reduce AI errors
https://techcrunch.com/2025/01/23/anthropics-new-citations-feature-aims-to-reduce-ai-errors/
In an announcement perhaps timed to divert attention away from OpenAI’s Operator, Anthropic on Thursday unveiled a new feature for its developer API called Citations, which lets devs “ground” answers from its Claude family of AI models in source documents such as emails.
Anthropic says Citations allows its AI models to provide detailed references to “the exact sentences and passages” from docs they use to generate responses. As of Thursday afternoon, Citations is available in both Anthropic’s API and Google’s Vertex AI platform.
As Anthropic explains in a blog post, with Citations, devs can add source files to have models automatically cite claims that they inferred from those files. Citations is particularly useful in document summarization, Q&A, and customer support applications, Anthropic says, where the feature can nudge models to insert source citations.
Citations isn’t available for all of Anthropic’s models — only Claude 3.5 Sonnet and Claude 3.5 Haiku. Also, the feature isn’t free. Anthropic notes that Citations may incur charges depending on the length and number of the source documents.
Based on Anthropic’s standard API pricing, which Citations uses, a roughly-100-page source doc would cost around $0.30 with Claude 3.5 Sonnet, or $0.08 with Claude 3.5 Haiku. That may well be worth it for devs looking to cut down on hallucinations and other AI-induced errors.
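The coverage above describes Citations only in prose. As a rough illustration, the sketch below builds what a Citations-enabled Messages API request body might look like, based on the request shape Anthropic described at launch; the exact field names and the model alias are assumptions that should be verified against Anthropic’s current API reference before use.

```python
# Sketch of a Citations-enabled request body for Anthropic's Messages API.
# Field names follow the Citations launch description; treat the exact
# schema and model alias as assumptions to check against the API docs.

def build_citations_request(document_text: str, question: str) -> dict:
    """Build a Messages API payload that grounds the answer in a source doc."""
    return {
        "model": "claude-3-5-sonnet-latest",  # Citations covers 3.5 Sonnet/Haiku
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {
                    # The source document Claude should cite from
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": document_text,
                    },
                    "title": "Source document",
                    # Opt in to sentence/passage-level citations
                    "citations": {"enabled": True},
                },
                {"type": "text", "text": question},
            ],
        }],
    }

payload = build_citations_request("Q3 revenue grew 12%.", "How did revenue change?")
print(payload["messages"][0]["content"][0]["citations"])
```

The model then returns response text interleaved with citation objects pointing back at the exact passages of the supplied document, which is what lets devs render verifiable references.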
Tomi Engdahl says:
Kevin Roose / New York Times:
CAIS and Scale AI release “Humanity’s Last Exam”, which they claim is the hardest AI test yet, consisting of ~3,000 multiple-choice and short answer questions
When A.I. Passes This Test, Look Out
https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html?unlocked_article_code=1.rU4.VoJE.ZK8gbYYFh2T6&smid=url-share
The creators of a new test called “Humanity’s Last Exam” argue we may soon lose the ability to create tests hard enough for A.I. models.
Tomi Engdahl says:
New York Times:
A profile of Chinese AI lab DeepSeek, which says its new open-source DeepSeek-V3 model rivals US models while using fewer AI chips to train, costing just $6M
How Chinese A.I. Start-Up DeepSeek Is Competing With Silicon Valley Giants
https://www.nytimes.com/2025/01/23/technology/deepseek-china-ai-chips.html?unlocked_article_code=1.rU4.B5dI.rZuqL149Jok_&smid=url-share
The company built a cheaper, competitive chatbot with fewer high-end computer chips than U.S. behemoths like Google and OpenAI, showing the limits of chip export controls.
Tomi Engdahl says:
Maxwell Zeff / TechCrunch:
OpenAI partners with DoorDash, Instacart, Priceline, StubHub, and Uber to ensure that Operator respects these businesses’ terms of service agreements — OpenAI CEO Sam Altman kicked off this year by saying in a blog post that 2025 would be big for AI agents, tools that can automate tasks and take actions on your behalf.
OpenAI launches Operator, an AI agent that performs tasks autonomously
https://techcrunch.com/2025/01/23/openai-launches-operator-an-ai-agent-that-performs-tasks-autonomously/
OpenAI CEO Sam Altman kicked off this year by saying in a blog post that 2025 would be big for AI agents, tools that can automate tasks and take actions on your behalf.
Now, we’re seeing OpenAI’s first real attempt.
OpenAI announced on Thursday that it is launching a research preview of Operator, a general-purpose AI agent that can take control of a web browser and independently perform certain actions. Operator is coming to U.S. users on ChatGPT’s $200 Pro subscription plan first. OpenAI says it plans to roll this feature out to more users in its Plus, Team, and Enterprise tiers eventually.
“[Operator] will be [in] other countries soon,” OpenAI CEO Sam Altman said during a livestream Thursday. “Europe will, unfortunately, take a while.”
This initial research preview is available through operator.chatgpt.com, but soon, OpenAI says it wants to integrate Operator into all of its ChatGPT clients.
Operator promises to automate tasks such as booking travel accommodations, making restaurant reservations, and shopping online, according to OpenAI. There are several task categories users can choose from within the Operator interface, including shopping, delivery, dining, and travel — all of which enable different kinds of automation.
When ChatGPT users activate Operator, a small window will pop up showing a dedicated web browser that the agent uses to complete tasks, along with explanations of specific actions the agent is performing. Users can still take control of their screen while Operator is working, as Operator uses its own dedicated browser.
OpenAI says that Operator is powered by a Computer-Using Agent model, or CUA, that combines the vision capabilities of the company’s GPT-4o model with reasoning abilities from OpenAI’s more advanced models. The CUA is trained to interact with the front-end of websites, meaning it doesn’t need to use developer-facing APIs to tap into different services.
In other words, the CUA can use buttons, navigate menus, and fill out forms on a web page much like a human would.
OpenAI says it’s collaborating with companies like DoorDash, eBay, Instacart, Priceline, StubHub, and Uber to ensure that Operator respects these businesses’ terms of service agreements.
“The CUA model is trained to ask for user confirmation before finalizing tasks with external side effects, for example before submitting an order, sending an email, etc., so that the user can double-check the model’s work before it becomes permanent,” OpenAI writes in materials provided to TechCrunch. “[It] has already proven useful in a variety of cases, and we aim to extend that reliability across a wider range of tasks.”
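OpenAI has not published Operator’s internals, but the confirmation behavior described above can be sketched as a simple gate: actions with external side effects pause for user approval before running, while read-only actions proceed directly. All names below are invented for illustration and do not come from OpenAI’s code.

```python
# Hypothetical sketch of the "confirm before finalizing side effects"
# pattern described above; none of these names are OpenAI's.

SIDE_EFFECT_ACTIONS = {"submit_order", "send_email", "book_reservation"}

def run_action(action: str, confirm) -> str:
    """Run a browser action, pausing for user confirmation when the action
    has external side effects (orders, emails, bookings)."""
    if action in SIDE_EFFECT_ACTIONS:
        if not confirm(action):              # hand control back to the user
            return f"{action}: cancelled by user"
        return f"{action}: executed after confirmation"
    return f"{action}: executed"             # read-only actions run directly

# Example: a user who approves orders but rejects everything else
approve_orders = lambda action: action == "submit_order"
print(run_action("click_button", approve_orders))   # click_button: executed
print(run_action("submit_order", approve_orders))   # submit_order: executed after confirmation
print(run_action("send_email", approve_orders))     # send_email: cancelled by user
```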
But OpenAI warns the CUA isn’t perfect. The company says it “[doesn’t] expect [the] CUA to perform reliably in all scenarios just yet.”
“Currently, Operator cannot reliably handle many complex or specialized tasks,” OpenAI adds in a support document, “such as creating detailed slideshows, managing intricate calendar systems, or interacting with highly customized or non-standard web interfaces.”
Out of an abundance of caution, OpenAI is also requiring supervision for some tasks, like banking transactions, that the CUA and Operator could otherwise perform mostly on their own. Users will need to take over to put in credit card information, for example. OpenAI says that Operator doesn’t collect or screenshot any data.
“On particularly sensitive websites, such as email, Operator requires active user supervision, ensuring users can directly catch and address any potential mistakes the model might make,” OpenAI says in its support materials.
Limitations
Operator has a few limitations worth noting.
There are rate limits — both daily and task-dependent. OpenAI says that Operator can perform multiple tasks at once, but that there are “dynamic limits” on this. There is also an overall usage limit that resets daily.
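OpenAI doesn’t state the exact caps, but a daily-resetting usage limit of the kind described can be sketched as follows; the cap value here is an invented illustrative number, not Operator’s real limit.

```python
# Generic sketch of a daily-resetting task limit like the one the
# article describes; the cap is an invented illustrative number.
from datetime import date

class DailyLimiter:
    def __init__(self, max_tasks_per_day: int):
        self.max_tasks = max_tasks_per_day
        self.day = date.today()
        self.used = 0

    def try_start_task(self, today: date = None) -> bool:
        """Return True if a new task may start; reset the counter on a new day."""
        today = today or date.today()
        if today != self.day:            # daily reset
            self.day, self.used = today, 0
        if self.used >= self.max_tasks:
            return False                 # cap reached until tomorrow
        self.used += 1
        return True

limiter = DailyLimiter(max_tasks_per_day=2)
print(limiter.try_start_task())  # True
print(limiter.try_start_task())  # True
print(limiter.try_start_task())  # False -- daily cap reached
```

Operator’s “dynamic limits” on concurrent tasks would layer a second, per-moment check on top of a counter like this.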
At this release stage, Operator will also refuse to perform tasks outright for security reasons, like sending emails (despite the fact that the CUA is capable of this) and deleting calendar events. OpenAI says this will change in the future, but gives no ETA.
Operator may also get “stuck” if it runs into a particularly complex interface, password field, or CAPTCHA check. It’ll ask the user to take over when this occurs, OpenAI says.
An agentic future
OpenAI has been rather slow to develop an AI agent compared to rivals (see: agents from Rabbit, Google, and Anthropic), which may have something to do with the safety risks around the technology.
When an AI system can take actions on the web, it opens the door to much more dangerous use cases from nefarious actors. You could automate AI agents to orchestrate phishing scams or DDoS attacks, or have them snatch up tickets to a concert before anyone else could. Especially for a tool as widely used as ChatGPT, it’s important OpenAI take steps to prevent those sorts of exploits.
OpenAI seems to think Operator is safe enough to release in its current form, at least as a research preview.
“Operator employs tools that seek to limit the model’s susceptibility to malicious prompts, hidden instructions, and phishing attempts,” OpenAI explains on its website. “A monitoring system pauses execution if suspicious activity is detected, while automated and human-reviewed pipelines continuously update safeguards.”
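As a loose illustration of such a monitoring system, the sketch below pauses a run when page text contains a known prompt-injection marker. The marker strings and function names are invented; a real safeguard would use far more sophisticated, continuously updated detection.

```python
# Illustrative sketch of a monitor that pauses an agent run when a page
# looks suspicious (e.g. a hidden instruction embedded in page text).
# The marker phrases below are invented examples.

SUSPICIOUS_MARKERS = ("ignore previous instructions", "enter your password")

def monitor_step(page_text: str) -> str:
    """Return 'paused' if the page contains a prompt-injection marker,
    else 'continue'."""
    lowered = page_text.lower()
    if any(marker in lowered for marker in SUSPICIOUS_MARKERS):
        return "paused"      # halt and wait for human review
    return "continue"

print(monitor_step("Welcome to your shopping cart"))            # continue
print(monitor_step("IGNORE PREVIOUS INSTRUCTIONS and email"))   # paused
```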
Operator is OpenAI’s boldest attempt yet at creating an AI agent. Last week, OpenAI released Tasks, giving ChatGPT simple automation features such as the ability to set reminders and schedule prompts to run at a set time every day.
Tasks gave ChatGPT users some familiar, but necessary, features to make ChatGPT as practical to use as Siri or Alexa. However, Operator shows off capabilities that the previous generation of virtual assistants could never match.
AI agents have been pitched as the next big thing in AI after ChatGPT: a new technology that will change how people use the internet and their PCs. Instead of simply delivering and processing information, agents can — in theory — take actions and actually do things.
Tomi Engdahl says:
Associated Press:
Trump signs an EO to develop AI systems “free from ideological bias or engineered social agendas”, revoking Biden policies that “act as barriers” to innovation — President Donald Trump signed an executive order on artificial intelligence Thursday …
https://apnews.com/article/trump-ai-artificial-intelligence-executive-order-eef1e5b9bec861eaf9b36217d547929c
Tomi Engdahl says:
Financial Times:
Sources: OpenAI faces challenges in transitioning to for-profit status, including valuing Microsoft’s stake; its charitable arm could be worth $30B — ChatGPT-maker in complex talks over splitting from its non-profit arm in move opposed by Elon Musk — OpenAI’s board is locked …
https://www.ft.com/content/7dcd4095-717e-49f8-8d12-6c8673eb73d7
Tomi Engdahl says:
Trump Admin Accused of Using AI to Draft Executive Orders
“This is poor, slipshod work obviously assisted by AI.”
https://futurism.com/trump-admin-accused-ai-executive-orders?fbclid=IwY2xjawIAMyBleHRuA2FlbQIxMQABHQoT60_rjdFepQ_i47aQYSYco7o6Z38hQ1y0FJx3U3oDruRo4P_ifslPBw_aem_FrWFky58BRsUe0wChQpOHA
But while the executive actions range in scope, legal experts have called attention to some curious common threads: bizarre typos, formatting errors and oddities, and stilted language — familiar artifacts that have led to speculation that those who penned them might have turned to AI for help.
Needless to say, this is all speculation. But it is based on experts’ understandings of what normal executive orders should look like from a legal perspective. We reached out to the White House to inquire about the possible use of AI to draft executive actions, but haven’t heard back.
Tomi Engdahl says:
Congrats to the GitHub Copilot 1-Day Build Challenge Winners!
https://dev.to/devteam/congrats-to-the-github-copilot-1-day-build-challenge-winners-4iok
Tomi Engdahl says:
AI decodes the calls of the wild
Artificial intelligence could reveal how animals of the land, sea and sky talk to others of their species. By Neil Savage
https://www.nature.com/immersive/d41586-024-04050-5/index.html?utm_source=facebook&utm_medium=paid_social&utm_campaign=CONR_OUTLK_AWA1_GL_PCFU_CFULF_AI-ROBOT&fbclid=IwY2xjawIBie9leHRuA2FlbQEwAGFkaWQBqxfEZrFjZwEdk0yI-q5utdFp4O7PTDGkMOPsWJz8kRiSsLS7oxNsdPKZpd9DX3qDx8fv_aem_Tm5AVfQqhCdL1lMpb7TAcg