AI trends 2025

AI is developing all the time. Below are picks from several articles on what is expected to happen in and around AI in 2025. The excerpts have been edited, and in some cases translated, for clarity.

AI in 2025: Five Defining Themes
https://news.sap.com/2025/01/ai-in-2025-defining-themes/
Artificial intelligence (AI) is accelerating at an astonishing pace, quickly moving from emerging technologies to impacting how businesses run. From building AI agents to interacting with technology in ways that feel more like a natural conversation, AI technologies are poised to transform how we work.
But what exactly lies ahead?
1. Agentic AI: Goodbye Agent Washing, Welcome Multi-Agent Systems
AI agents are currently in their infancy. While many software vendors are releasing and labeling the first “AI agents” based on simple conversational document search, advanced AI agents that will be able to plan, reason, use tools, collaborate with humans and other agents, and iteratively reflect on progress until they achieve their objective are on the horizon. The year 2025 will see them rapidly evolve and act more autonomously. More specifically, 2025 will see AI agents deployed more readily “under the hood,” driving complex agentic workflows.
In short, AI will handle mundane, high-volume tasks while the value of human judgement, creativity, and quality outcomes will increase.
2. Models: No Context, No Value
Large language models (LLMs) will continue to become a commodity for vanilla generative AI tasks, a trend that has already started. LLMs are drawing on an increasingly tapped pool of public data scraped from the internet. This will only worsen, and companies must learn to adapt their models to unique, content-rich data sources.
We will also see a greater variety of foundation models that fulfill different purposes. Take, for example, physics-informed neural networks (PINNs), which generate outcomes based on predictions grounded in physical reality or robotics. PINNs are set to gain more importance in the job market because they will enable autonomous robots to navigate and execute tasks in the real world.
Models will increasingly become more multimodal, meaning an AI system can process information from various input types.
3. Adoption: From Buzz to Business
While 2024 was all about introducing AI use cases and their value for organizations and individuals alike, 2025 will see the industry’s unprecedented adoption of AI specifically for businesses. More people will understand when and how to use AI, and the technology will mature to the point where it can deal with critical business issues such as managing multi-national complexities. Many companies will also work through issues like AI-specific legal and data privacy terms for the first time (much as when companies started moving to the cloud 10 years ago), building the foundation for applying the technology to business processes.
4. User Experience: AI Is Becoming the New UI
AI’s next frontier is seamlessly unifying people, data, and processes to amplify business outcomes. In 2025, we will see increased adoption of AI across the workforce as people discover the benefits of humans plus AI.
This means disrupting the classical user experience from system-led interactions to intent-based, people-led conversations with AI acting in the background. AI copilots will become the new UI for engaging with a system, making software more accessible and easier for people. AI won’t be limited to one app; it might even replace them one day. With AI, frontend, backend, browser, and apps are blurring. This is like giving your AI “arms, legs, and eyes.”
5. Regulation: Innovate, Then Regulate
It’s fair to say that governments worldwide are struggling to keep pace with the rapid advancements in AI technology and to develop meaningful regulatory frameworks that set appropriate guardrails for AI without compromising innovation.

12 AI predictions for 2025
This year we’ve seen AI move from pilots into production use cases. In 2025, they’ll expand into fully-scaled, enterprise-wide deployments.
https://www.cio.com/article/3630070/12-ai-predictions-for-2025.html
1. Small language models and edge computing
Most of the attention this year and last has been on the big language models — specifically on ChatGPT in its various permutations, as well as competitors like Anthropic’s Claude and Meta’s Llama models. But for many business use cases, LLMs are overkill: too expensive and too slow for practical use.
“Looking ahead to 2025, I expect small language models, specifically custom models, to become a more common solution for many businesses,”
2. AI will approach human reasoning ability
In mid-September, OpenAI released a new series of models that, it claims, think through problems much like a person would. The company says they can achieve PhD-level performance on challenging benchmark tests in physics, chemistry, and biology. For example, the previous best model, GPT-4o, could only solve 13% of the problems on the International Mathematics Olympiad, while the new reasoning model solved 83%.
If AI can reason better, then it will make it possible for AI agents to understand our intent, translate that into a series of steps, and do things on our behalf, says Gartner analyst Arun Chandrasekaran. “Reasoning also helps us use AI as more of a decision support system,”
3. Massive growth in proven use cases
This year, we’ve seen some use cases proven to have ROI, says Monteiro. In 2025, those use cases will see massive adoption, especially if the AI technology is integrated into the software platforms that companies are already using, making it very simple to adopt.
“The fields of customer service, marketing, and customer development are going to see massive adoption,”
4. The evolution of agile development
The agile manifesto was released in 2001 and, since then, the development philosophy has steadily gained ground over the earlier waterfall style of software development.
“For the last 15 years or so, it’s been the de-facto standard for how modern software development works,”
5. Increased regulation
At the end of September, California governor Gavin Newsom signed a law requiring gen AI developers to disclose the data they used to train their systems, which applies to developers who make gen AI systems publicly available to Californians. Developers must comply by the start of 2026.
There are also regulations about the use of deep fakes, facial recognition, and more. The most comprehensive law, the EU’s AI Act, which went into effect last summer, is also something that companies will have to comply with starting in mid-2026, so, again, 2025 is the year when they will need to get ready.
6. AI will become accessible and ubiquitous
With gen AI, people are still at the stage of trying to figure out what gen AI is, how it works, and how to use it.
“There’s going to be a lot less of that,” he says. But gen AI will become ubiquitous and seamlessly woven into workflows, the way the internet is today.
7. Agents will begin replacing services
Software has evolved from big, monolithic systems running on mainframes, to desktop apps, to distributed, service-based architectures, web applications, and mobile apps. Now, it will evolve again, says Malhotra. “Agents are the next phase,” he says. Agents can be more loosely coupled than services, making these architectures more flexible, resilient and smart. And that will bring with it a completely new stack of tools and development processes.
8. The rise of agentic assistants
In addition to agents replacing software components, we’ll also see the rise of agentic assistants, adds Malhotra. Take, for example, the task of keeping up with regulations.
Today, consultants get continuing education to stay abreast of new laws, or reach out to colleagues who are already experts in them. It takes time for the new knowledge to disseminate and be fully absorbed by employees.
“But an AI agent can be instantly updated to ensure that all our work is compliant with the new laws,” says Malhotra. “This isn’t science fiction.”
9. Multi-agent systems
Sure, AI agents are interesting. But things are going to get really interesting when agents start talking to each other, says Babak Hodjat, CTO of AI at Cognizant. It won’t happen overnight, of course, and companies will need to be careful that these agentic systems don’t go off the rails.
Companies such as Sailes and Salesforce are already developing multi-agent workflows.
10. Multi-modal AI
Humans and the companies we build are multi-modal. We read and write text, we speak and listen, we see and we draw. And we do all these things through time, so we understand that some things come before other things. Today’s AI models are, for the most part, fragmentary. One can create images, another can only handle text, and some recent ones can understand or produce video.
11. Multi-model routing
Not to be confused with multi-modal AI, multi-model routing is when companies use more than one LLM to power their gen AI applications. Different AI models are better at different things, and some are cheaper than others, or have lower latency. And then there’s the matter of having all your eggs in one basket.
“A number of CIOs I’ve spoken with recently are thinking about the old ERP days of vendor lock,” says Brett Barton, global AI practice leader at Unisys. “And it’s top of mind for many as they look at their application portfolio, specifically as it relates to cloud and AI capabilities.”
Diversifying away from using just a single model for all use cases means a company is less dependent on any one provider and can be more flexible as circumstances change.
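In practice, multi-model routing usually means a thin layer in front of several models that picks one per request based on task type, latency, and cost. The sketch below is a minimal, hypothetical Python illustration of that idea; the model names, prices, latencies, and task categories are made-up assumptions, not figures from the article.

from dataclasses import dataclass

@dataclass
class ModelOption:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # USD, illustrative
    avg_latency_ms: int        # illustrative
    good_at: set               # task types this model handles well

CATALOG = [
    ModelOption("small-local-model", 0.0002, 150, {"classification", "extraction"}),
    ModelOption("mid-tier-model", 0.002, 400, {"summarization", "drafting"}),
    ModelOption("frontier-model", 0.03, 1200, {"reasoning", "code", "analysis"}),
]

def route(task_type, latency_budget_ms):
    """Pick the cheapest model that covers the task within the latency budget."""
    candidates = [m for m in CATALOG
                  if task_type in m.good_at and m.avg_latency_ms <= latency_budget_ms]
    if not candidates:
        return CATALOG[-1]  # fall back to the most capable model
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)

print(route("classification", 300).name)  # -> small-local-model
print(route("reasoning", 2000).name)      # -> frontier-model

A routing layer like this also softens the vendor lock-in concern quoted above, since swapping or adding a model only changes the catalog, not the application code.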
12. Mass customization of enterprise software
Today, only the largest companies, with the deepest pockets, get to have custom software developed specifically for them. It’s just not economically feasible to build large systems for small use cases.
“Right now, people are all using the same version of Teams or Slack or what have you,” says Ernst & Young’s Malhotra. “Microsoft can’t make a custom version just for me.” But once AI begins to accelerate the speed of software development while reducing costs, it starts to become much more feasible.

9 IT resolutions for 2025
https://www.cio.com/article/3629833/9-it-resolutions-for-2025.html
1. Innovate
“We’re embracing innovation,”
2. Double down on harnessing the power of AI
Not surprisingly, getting more out of AI is top of mind for many CIOs.
“I am excited about the potential of generative AI, particularly in the security space,”
3. And ensure effective and secure AI rollouts
“AI is everywhere, and while its benefits are extensive, implementing it effectively across a corporation presents challenges. Balancing the rollout with proper training, adoption, and careful measurement of costs and benefits is essential, particularly while securing company assets in tandem,”
4. Focus on responsible AI
The possibilities of AI grow by the day — but so do the risks.
“My resolution is to mature in our execution of responsible AI,”
“AI is the new gold and in order to truly maximize its potential, we must first have the proper guardrails in place. Taking a human-first approach to AI will help ensure our state can maintain ethics while taking advantage of the new AI innovations.”
5. Deliver value from generative AI
As organizations move beyond experimenting with and testing generative AI use cases, they’re looking for gen AI to deliver real business value.
“As we go into 2025, we’ll continue to see the evolution of gen AI. But it’s no longer about just standing it up. It’s more about optimizing and maximizing the value we’re getting out of gen AI,”
6. Empower global talent
Although harnessing AI is a top objective for Morgan Stanley’s Wetmur, she says she’s equally committed to harnessing the power of people.
7. Create a holistic learning culture
Wetmur has another talent-related objective: to create a learning culture — not just in her own department but across all divisions.
8. Deliver better digital experiences
Deltek’s Cilsick has her sights set on improving her company’s digital employee experience, believing that a better DEX will yield benefits in multiple ways.
Cilsick says she first wants to bring in new technologies and automation to “make things as easy as possible,” mirroring the digital experiences most workers have when using consumer technologies.
“It’s really about leveraging tech to make sure [employees] are more efficient and productive,”
“In 2025 my primary focus as CIO will be on transforming operational efficiency, maximizing business productivity, and enhancing employee experiences,”
9. Position the company for long-term success
Lieberman wants to look beyond 2025, saying another resolution for the year is “to develop a longer-term view of our technology roadmap so that we can strategically decide where to invest our resources.”
“My resolutions for 2025 reflect the evolving needs of our organization, the opportunities presented by AI and emerging technologies, and the necessity to balance innovation with operational efficiency,”
Lieberman aims to develop AI capabilities to automate routine tasks.
“Bots will handle common inquiries ranging from sales account summaries to HR benefits, reducing response times and freeing up resources for strategic initiatives,”

Not just hype — here are real-world use cases for AI agents
https://venturebeat.com/ai/not-just-hype-here-are-real-world-use-cases-for-ai-agents/
Just seven or eight months ago, when a customer called in to or emailed Baca Systems with a service question, a human agent handling the query would begin searching for similar cases in the system and analyzing technical documents.
This process would take roughly five to seven minutes; then the agent could offer the “first meaningful response” and finally begin troubleshooting.
But now, with AI agents powered by Salesforce, that time has been shortened to as few as five to 10 seconds.
Now, instead of having to sift through databases for previous customer calls and similar cases, human reps can ask the AI agent to find the relevant information. The AI runs in the background and allows humans to respond right away, Russo noted.
AI can serve as a sales development representative (SDR) to send out general inquiries and emails, have a back-and-forth dialogue, then pass the prospect to a member of the sales team, Russo explained.
But once the company implements Salesforce’s Agentforce, a customer needing to modify an order will be able to communicate their needs with AI in natural language, and the AI agent will automatically make adjustments. When more complex issues come up — such as a reconfiguration of an order or an all-out venue change — the AI agent will quickly push the matter up to a human rep.

Open Source in 2025: Strap In, Disruption Straight Ahead
Look for new tensions to arise in the New Year over licensing, the open source AI definition, security and compliance, and how to pay volunteer maintainers.
https://thenewstack.io/open-source-in-2025-strap-in-disruption-straight-ahead/
The trend of widely used open source software moving to more restrictive licensing isn’t new.
In addition to the demands of late-stage capitalism and impatient investors in companies built on open source tools, other outside factors are pressuring the open source world. There’s the promise/threat of generative AI, for instance. Or the shifting geopolitical landscape, which brings new security concerns and governance regulations.
What’s ahead for open source in 2025?
More Consolidation, More Licensing Changes
The Open Source AI Debate: Just Getting Started
Security and Compliance Concerns Will Rise
Paying Maintainers: More Cash, Creativity Needed

The most important cybersecurity and AI trends for 2025
https://www.uusiteknologia.fi/2024/11/20/kyberturvallisuuden-ja-tekoalyn-tarkeimmat-trendit-2025/
1. Cyber infrastructure will be centered on a single, unified security platform
2. Big data will give an edge against new entrants
3. AI’s integrated role in 2025 means building trust, governance engagement, and a new kind of leadership
4. Businesses will adopt secure enterprise browsers more widely
5. AI’s energy implications will be more widely recognized in 2025
6. Quantum realities will become clearer in 2025
7. Security and marketing leaders will work more closely together

Presentation: For 2025, ‘AI eats the world’.
https://www.ben-evans.com/presentations

Just like other technologies that have gone before, such as cloud and cybersecurity automation, right now AI lacks maturity.
https://www.securityweek.com/ai-implementing-the-right-technology-for-the-right-use-case/
If 2023 and 2024 were the years of exploration, hype and excitement around AI, 2025 (and 2026) will be the year(s) that organizations start to focus on specific use cases for the most productive implementations of AI and, more importantly, to understand how to implement guardrails and governance so that it is viewed as less of a risk by security teams and more of a benefit to the organization.
Businesses are developing applications that add Large Language Model (LLM) capabilities to provide superior functionality and advanced personalization
Employees are using third party GenAI tools for research and productivity purposes
Developers are leveraging AI-powered code assistants to code faster and meet challenging production deadlines
Companies are building their own LLMs for internal use cases and commercial purposes.
AI is still maturing
However, just like other technologies that have gone before, such as cloud and cybersecurity automation, right now AI lacks maturity. Right now, we very much see AI in this “peak of inflated expectations” phase and predict that it will dip into the “trough of disillusionment”, where organizations realize that it is not the silver bullet they thought it would be. In fact, there are already signs of cynicism as decision-makers are bombarded with marketing messages from vendors and struggle to discern what is a genuine use case and what is not relevant for their organization.
There is also regulation that will come into force, such as the EU AI Act, which is a comprehensive legal framework that sets out rules for the development and use of AI.
AI certainly won’t solve every problem, and it should be used like automation, as part of a collaborative mix of people, process and technology. You simply can’t replace human intuition with AI, and many new AI regulations stipulate that human oversight is maintained.

7 Splunk Predictions for 2025
https://www.splunk.com/en_us/form/future-predictions.html
AI: Projects must prove their worth to anxious boards or risk defunding, and LLMs will go small to reduce operating costs and environmental impact.

OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI
Three of the leading artificial intelligence companies are seeing diminishing returns from their costly efforts to develop newer models.
https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai
Sources: OpenAI, Google, and Anthropic are all seeing diminishing returns from costly efforts to build new AI models; a new Gemini model misses internal targets

It Costs So Much to Run ChatGPT That OpenAI Is Losing Money on $200 ChatGPT Pro Subscriptions
https://futurism.com/the-byte/openai-chatgpt-pro-subscription-losing-money?fbclid=IwY2xjawH8epVleHRuA2FlbQIxMQABHeggEpKe8ZQfjtPRC0f2pOI7A3z9LFtFon8lVG2VAbj178dkxSQbX_2CJQ_aem_N_ll3ETcuQ4OTRrShHqNGg
In a post on X-formerly-Twitter, CEO Sam Altman admitted an “insane” fact: that the company is “currently losing money” on ChatGPT Pro subscriptions, which run $200 per month and give users access to its suite of products including its o1 “reasoning” model.
“People use it much more than we expected,” the cofounder wrote, later adding in response to another user that he “personally chose the price and thought we would make some money.”
Though Altman didn’t explicitly say why OpenAI is losing money on these premium subscriptions, the issue almost certainly comes down to the enormous expense of running AI infrastructure: the massive and increasing amounts of electricity needed to power the facilities that power AI, not to mention the cost of building and maintaining those data centers. Nowadays, a single query on the company’s most advanced models can cost a staggering $1,000.

AI requires ever faster networks
https://etn.fi/index.php/opinion/16974-tekoaely-edellyttaeae-yhae-nopeampia-verkkoja
A resilient digital infrastructure is critical to effectively harnessing telecommunications networks for AI innovations and cloud-based services. The increasing demand for data-rich applications related to AI requires a telecommunications network that can handle large amounts of data with low latency, writes Carl Hansson, Partner Solutions Manager at Orange Business.

AI’s Slowdown Is Everyone Else’s Opportunity
Businesses will benefit from some much-needed breathing space to figure out how to deliver that all-important return on investment.
https://www.bloomberg.com/opinion/articles/2024-11-20/ai-slowdown-is-everyone-else-s-opportunity

Here is what will happen in the chip market next year
https://etn.fi/index.php/13-news/16984-naein-sirumarkkinoilla-kaey-ensi-vuonna
The growing demand for high-performance computing (HPC) for artificial intelligence remains strong, with the market set to grow by more than 15 percent in 2025, IDC estimates in its recent Worldwide Semiconductor Technology Supply Chain Intelligence report.
IDC predicts eight significant trends for the chip market in 2025.
1. AI growth accelerates
2. Asia-Pacific IC design heats up
3. TSMC’s leadership position is strengthening
4. The expansion of advanced processes is accelerating
5. Mature process market recovers
6. 2nm technology breakthrough
7. Restructuring of the packaging and testing market
8. Advanced packaging technologies on the rise

2024: The year when MCUs became AI-enabled
https://www-edn-com.translate.goog/2024-the-year-when-mcus-became-ai-enabled/?fbclid=IwZXh0bgNhZW0CMTEAAR1_fEakArfPtgGZfjd-NiPd_MLBiuHyp9qfiszczOENPGPg38wzl9KOLrQ_aem_rLmf2vF2kjDIFGWzRVZWKw&_x_tr_sl=en&_x_tr_tl=fi&_x_tr_hl=fi&_x_tr_pto=wapp
The AI party in the MCU space started in 2024, and in 2025, it is very likely that there will be more advancements in MCUs using lightweight AI models.
Adoption of AI acceleration features is a big step in the development of microcontrollers. The inclusion of AI features in microcontrollers started in 2024, and it is very likely that in 2025, their features and tools will develop further.

AI Regulation Gets Serious in 2025 – Is Your Organization Ready?
While the challenges are significant, organizations have an opportunity to build scalable AI governance frameworks that ensure compliance while enabling responsible AI innovation.
https://www.securityweek.com/ai-regulation-gets-serious-in-2025-is-your-organization-ready/
Similar to the GDPR, the EU AI Act will take a phased approach to implementation. The first milestone arrives on February 2, 2025, when organizations operating in the EU must ensure that employees involved in AI use, deployment, or oversight possess adequate AI literacy. Thereafter, from August 1, any new general-purpose AI (GPAI) models must be fully compliant with the act. Also similar to GDPR is the threat of huge fines for non-compliance – EUR 35 million or 7 percent of worldwide annual turnover, whichever is higher.
While this requirement may appear manageable on the surface, many organizations are still in the early stages of defining and formalizing their AI usage policies.
Later phases of the EU AI Act, expected in late 2025 and into 2026, will introduce stricter requirements around prohibited and high-risk AI applications. For organizations, this will surface a significant governance challenge: maintaining visibility and control over AI assets.
Tracking the usage of standalone generative AI tools, such as ChatGPT or Claude, is relatively straightforward. However, the challenge intensifies when dealing with SaaS platforms that integrate AI functionalities on the backend. Analysts, including Gartner, refer to this as “embedded AI,” and its proliferation makes maintaining accurate AI asset inventories increasingly complex.
Where frameworks like the EU AI Act grow more complex is their focus on ‘high-risk’ use cases. Compliance will require organizations to move beyond merely identifying AI tools in use; they must also assess how these tools are used, what data is being shared, and what tasks the AI is performing. For instance, an employee using a generative AI tool to summarize sensitive internal documents introduces very different risks than someone using the same tool to draft marketing content.
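One way to operationalize this is an AI asset inventory that records not just which tool is in use but how it is used, what data it sees, and what tasks it performs. The Python sketch below is a minimal, hypothetical example of such a record; the field names and risk tiers are illustrative assumptions, not terms defined by the EU AI Act or any specific governance product.

from dataclasses import dataclass

@dataclass
class AIAssetRecord:
    tool: str          # standalone tool or an AI feature embedded in a SaaS platform
    vendor: str
    embedded: bool     # True if the AI ships inside another SaaS product
    tasks: list        # what the AI is used for
    data_shared: list  # categories of data sent to the tool
    risk_tier: str = "unclassified"  # e.g. minimal / limited / high, assessed per use

inventory = [
    AIAssetRecord("gen-ai-assistant", "ExampleVendor", embedded=False,
                  tasks=["summarize internal documents"],
                  data_shared=["sensitive internal documents"],
                  risk_tier="high"),
    AIAssetRecord("gen-ai-assistant", "ExampleVendor", embedded=False,
                  tasks=["draft marketing copy"],
                  data_shared=["public product descriptions"],
                  risk_tier="limited"),
]

# The same tool appears twice with different risk tiers, mirroring the point that
# risk depends on how a tool is used and what data it sees, not just which tool it is.
for record in inventory:
    print(record.tool, record.tasks, "->", record.risk_tier)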
For security and compliance leaders, the EU AI Act represents just one piece of a broader AI governance puzzle that will dominate 2025.
The next 12-18 months will require sustained focus and collaboration across security, compliance, and technology teams to stay ahead of these developments.

The Global Partnership on Artificial Intelligence (GPAI) is a multi-stakeholder initiative which aims to bridge the gap between theory and practice on AI by supporting cutting-edge research and applied activities on AI-related priorities.
https://gpai.ai/about/#:~:text=The%20Global%20Partnership%20on%20Artificial,activities%20on%20AI%2Drelated%20priorities.

886 Comments

  1. Tomi Engdahl says:

    Hayden Field / CNBC:
    SoftBank commits $3B annually for itself and subsidiaries to use OpenAI’s tech, and launches SB OpenAI Japan to market OpenAI’s enterprise tech in Japan

    SoftBank commits to joint venture with OpenAI, will spend $3 billion per year on OpenAI’s tech
    https://www.cnbc.com/2025/02/03/softbank-commits-to-joint-venture-with-openai.html

  2. Tomi Engdahl says:

    Anthony Ha / TechCrunch:
    OpenAI unveils Deep Research, an AI agent for creating in-depth reports, available to subscribers of the $200 ChatGPT Pro tier and limited to 100 queries/month

    OpenAI unveils a new ChatGPT agent for ‘deep research’
    https://techcrunch.com/2025/02/02/openai-unveils-a-new-chatgpt-agent-for-deep-research/
    OpenAI is announcing a new AI “agent” designed to help people conduct in-depth, complex research using ChatGPT, the company’s AI-powered chatbot platform.

    Appropriately enough, it’s called deep research.

    OpenAI said in a blog post published Sunday that this new capability was designed for “people who do intensive knowledge work in areas like finance, science, policy, and engineering and need thorough, precise, and reliable research.” It could also be useful, the company added, for anyone making “purchases that typically require careful research, like cars, appliances, and furniture.”

    Basically, ChatGPT deep research is intended for instances where you don’t just want a quick answer or summary, but instead need to assiduously consider information from multiple websites and other sources.

    OpenAI said it’s making deep research available to ChatGPT Pro users today, limited to 100 queries per month, with support for Plus and Team users coming next, followed by Enterprise. (OpenAI is targeting a Plus rollout in about a month from now, the company said, and the query limits for paid users should be “significantly higher” soon.) It’s a geo-targeted launch; OpenAI had no release timeline to share for ChatGPT customers in the U.K., Switzerland, and the European Economic Area.

    What exactly is an AI agent?
    https://techcrunch.com/2024/12/15/what-exactly-is-an-ai-agent/

    AI agents are supposed to be the next big thing in AI, but there isn’t an exact definition of what they are. To this point, people can’t agree on what exactly constitutes an AI agent.

    At its simplest, an AI agent is best described as AI-fueled software that does a series of jobs for you that a human customer service agent, HR person, or IT help desk employee might have done in the past, although it could ultimately involve any task. You ask it to do things, and it does them for you, sometimes crossing multiple systems and going well beyond simply answering questions. For example, Perplexity last month released an AI agent that helps people do their holiday shopping (and it’s not the only one). And Google last week announced its first AI agent, called Project Mariner, which can be used to find flights and hotels, shop for household items, find recipes, and other tasks.

    Seems simple enough, right? Yet it is complicated by a lack of clarity. Even among the tech giants, there isn’t a consensus. Google sees them as task-based assistants depending on the job: coding help for developers; helping marketers create a color scheme; assisting an IT pro in tracking down an issue by querying log data.

    For Asana, an agent may act like an extra employee, taking care of assigned tasks like any good co-worker. Sierra, a startup founded by former Salesforce co-CEO Bret Taylor and Google vet Clay Bavor, sees agents as customer experience tools, helping people achieve actions that go well beyond the chatbots of yesteryear to help solve more complex sets of problems.

    This lack of a cohesive definition does leave room for confusion over exactly what these things are going to do, but regardless of how they’re defined, the agents are for helping complete tasks in an automated way with as little human interaction as possible.

    Rudina Seseri, founder and managing partner at Glasswing Ventures, says it’s early days and that could account for the lack of agreement. “There is no single definition of what an ‘AI agent’ is. However, the most frequent view is that an agent is an intelligent software system designed to perceive its environment, reason about it, make decisions, and take actions to achieve specific objectives autonomously,” Seseri told TechCrunch.

    She says they use a number of AI technologies to make that happen. “These systems incorporate various AI/ML techniques such as natural language processing, machine learning, and computer vision to operate in dynamic domains, autonomously or alongside other agents and human users.”
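
    To make that definition concrete, the toy Python sketch below implements a bare perceive-reason-act loop; the goal, tools, and decision policy are hypothetical stand-ins, not any vendor's actual agent framework.

    state = {"notes": ""}

    tools = {
        "search": lambda: state.update(notes="3 relevant articles found") or "ok",
        "summarize": lambda: state.update(notes=state["notes"] + "; summary written") or "ok",
    }

    def decide(goal, observation):
        # Toy policy: gather material first, then summarize, then stop.
        if "articles" not in observation:
            return "search"
        if "summary" not in observation:
            return "summarize"
        return "done"

    def run_agent(goal, max_steps=5):
        for step in range(max_steps):
            observation = state["notes"]        # perceive the environment
            action = decide(goal, observation)  # reason about what to do next
            if action == "done":                # objective achieved, stop
                print(f"step {step}: goal '{goal}' reached")
                return
            result = tools[action]()            # act through an available tool
            print(f"step {step}: {action} -> {result}")

    run_agent("summarize recent AI agent news")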

    Aaron Levie, co-founder and CEO at Box, says that over time, as AI becomes more capable, AI agents will be able to do much more on behalf of humans, and there are already dynamics at play that will drive that evolution.

    “With AI agents, there are multiple components to a self-reinforcing flywheel that will serve to dramatically improve what AI Agents can accomplish in the near and long-term: GPU price/performance, model efficiency, model quality and intelligence, AI frameworks and infrastructure improvements,” Levie wrote on LinkedIn recently.

    That’s an optimistic take on the technology that assumes growth will happen in all these areas, when that’s not necessarily a given. MIT robotics pioneer Rodney Brooks pointed out in a recent TechCrunch interview that AI has to deal with much tougher problems than most technology, and it won’t necessarily grow in the same rapid way as, say, chips under Moore’s law have.

  3. Tomi Engdahl says:

    Kevin Roose / New York Times:
    OpenAI’s Operator is currently more an intriguing demo than a product most people need to spend $200/month on, but it points to a future of powerful AI agents — In the past week, OpenAI’s Operator has done the following things for me: — Ordered me a new ice cream scoop on Amazon.

    How Helpful Is Operator, OpenAI’s New A.I. Agent?
    https://www.nytimes.com/2025/02/01/technology/openai-operator-agent.html?unlocked_article_code=1.t04.GFji.vYYoMlIFKxYM

    Operator, a new computer-using tool from OpenAI, is brittle and occasionally erratic, but it points to a future of powerful A.I. agents.

  4. Tomi Engdahl says:

    Heidi Mitchell / Wall Street Journal:
    A look at Future You, an AI-based research tool that lets users interact with a virtual older self and is currently being used by 60K people from 190 countries — A new artificial-intelligence tool allowed me to talk to my 80-year-old self. It’s going to be quite a life. — It turns out I’m going to write a book!

    AI Has Shown Me My Future. Here’s What I’ve Learned.
    A new artificial-intelligence tool allowed me to talk to my 80-year-old self. It’s going to be quite a life.
    https://www.wsj.com/tech/ai/ai-tool-conversation-older-self-future-2abb3cc9?st=ghJAg7&reflink=desktopwebshare_permalink

  5. Tomi Engdahl says:

    Brian Heater / TechCrunch:
    Google’s X spins out Heritable Agriculture, which aims to use AI to improve crop yield, as the incubator aggressively spins off companies under CEO Astro Teller — Google’s X “moonshot factory” this week announced its latest graduate. Heritable Agriculture is a data- and machine learning …

    Google’s X spins out Heritable Agriculture, a startup using AI to improve crop yield
    https://techcrunch.com/2025/02/02/google-x-spins-out-heritable-agriculture-a-startup-using-ai-to-improve-crop-yield/

  6. Tomi Engdahl says:

    Sima Kotecha / BBC:
    The UK announces four new laws that make it illegal to possess, create, or distribute AI tools designed to produce CSAM, becoming the first country to do so — Four new laws will tackle the threat of child sexual abuse images generated by artificial intelligence (AI), the government has announced.

    AI-generated child sex abuse images targeted with new laws
    https://www.bbc.com/news/articles/c8d90qe4nylo

    Four new laws will tackle the threat of child sexual abuse images generated by artificial intelligence (AI), the government has announced.

    The Home Office says the UK will be the first country in the world to make it illegal to possess, create or distribute AI tools designed to create child sexual abuse material (CSAM), with a punishment of up to five years in prison.

    Possessing AI paedophile manuals – which teach people how to use AI for sexual abuse – will also be made illegal, and offenders will get up to three years in prison.

    “What we’re seeing is that AI is now putting the online child abuse on steroids,” Home Secretary Yvette Cooper told the BBC’s Sunday with Laura Kuenssberg.

  7. Tomi Engdahl says:

    Bill Toulas / BleepingComputer:
    Google says APT groups from 20+ countries are using Gemini primarily for productivity gains rather than to develop or conduct novel AI-enabled cyberattacks

    Google says hackers abuse Gemini AI to empower their attacks
    https://www.bleepingcomputer.com/news/security/google-says-hackers-abuse-gemini-ai-to-empower-their-attacks/

  8. Tomi Engdahl says:

    Katie McQue / The Guardian:
    A US man pleads guilty to a cyberstalking campaign against a professor, including by creating chatbots on CrushOn.ai and JanitorAI using her personal info

    A man stalked a professor for six years. Then he used AI chatbots to lure strangers to her home
    https://www.theguardian.com/technology/2025/feb/01/stalking-ai-chatbot-impersonator

    James Florence, 36, agreed to plead guilty after using victim’s information to guide chatbots in impersonation

  9. Tomi Engdahl says:

    Ryan Browne / CNBC:
    Hugging Face’s Thomas Wolf and other experts say startups like DeepSeek and the rise of AI agents may erode the value of LLMs from OpenAI and Big Tech companies

    How DeepSeek and next-generation AI agents could erode value of language models
    https://www.cnbc.com/2025/01/31/deepseek-next-generation-ai-agents-may-erode-value-of-large-models.html

    Executives at leading AI labs say that large language models like those from OpenAI and Big Tech firms risk becoming commoditized in 2025.
    Last week, Chinese AI firm DeepSeek released R1, a reasoning model that claims to be better and more cost-effective than OpenAI’s o1 model.
    Tech firms are also talking up a shift from LLMs to so-called “agentic” systems that can carry out tasks on your behalf and incorporate these models.

  10. Tomi Engdahl says:

    ChatGPT already consumes as much electricity as a medium-sized city
    https://etn.fi/index.php/13-news/17104-chatgpt-kuluttaa-jo-keskikokoisen-kaupungin-verran-saehkoeae

    AI company OpenAI has announced that its popular chatbot ChatGPT has reached 300 million weekly active users, doubling its user base since September 2023. With that popularity, its energy consumption is also growing, and according to a BestBrokers analysis it is considerably higher than that of queries on traditional search engines such as Google.

    According to BestBrokers’ calculations, ChatGPT consumes a staggering 1.059 billion kilowatt-hours (kWh) of electricity per year just to answer user queries. This corresponds to handling roughly 365 billion queries per year. At the average US commercial electricity price ($0.132 per kWh), that electricity is worth about $139.7 million per year.

    Each individual ChatGPT query consumes about 0.0029 kWh, nearly ten times as much as a Google search (0.0003 kWh per search). ChatGPT’s daily energy consumption is about 2.9 million kilowatt-hours, more than 100,000 times the daily consumption of an average US household (29 kWh).
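
    As a quick sanity check, the short Python sketch below re-derives the figures above directly from the per-query estimates cited in the article (all inputs are the article's estimates, not independent measurements).

    queries_per_year = 365e9        # ~365 billion ChatGPT queries per year
    kwh_per_query = 0.0029          # estimated energy per ChatGPT query
    kwh_per_google_search = 0.0003  # estimated energy per Google search
    usd_per_kwh = 0.132             # average US commercial electricity price

    annual_kwh = queries_per_year * kwh_per_query
    print(f"{annual_kwh / 1e9:.2f} billion kWh per year")        # ~1.06 (article: 1.059)
    print(f"${annual_kwh * usd_per_kwh / 1e6:.1f}M per year")    # ~$139.7 million
    print(f"{kwh_per_query / kwh_per_google_search:.1f}x a Google search")  # ~9.7x
    print(f"{annual_kwh / 365 / 1e6:.1f} million kWh per day")   # ~2.9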

    What could ChatGPT’s electricity consumption be compared to? Its annual energy consumption is almost the same as the entire annual consumption of Barbados (about 1,000 GWh). With that amount of energy, all 3.3 million electric cars in the United States could be fully charged about 4.5 times. ChatGPT’s consumption equals the annual energy consumption of more than 100,000 US homes. Its annual energy consumption would also be enough to charge 223.4 million iPhones every day for a year.

    OpenAI is constantly developing new AI models, such as GPT-4o and OpenAI o1, which require ever more computing power. In the future this may increase electricity consumption further, especially as new capabilities such as real-time video and audio generation are added.

    Although ChatGPT’s energy consumption costs OpenAI roughly $140 million per year, the company earns considerably more from paid subscriptions. According to BestBrokers’ estimates, OpenAI’s subscription services (ChatGPT Plus, Enterprise, Team, and Pro) bring the company annual revenue of about $2.856 billion. This means that a single month of subscription revenue covers the whole year’s electricity costs.

    AI’s Power Demand: Calculating ChatGPT’s electricity consumption for handling over 365 billion user queries every year
    https://etn.fi/index.php/13-news/17104-chatgpt-kuluttaa-jo-keskikokoisen-kaupungin-verran-saehkoeae

  11. Tomi Engdahl says:

    Kyle Wiggers / TechCrunch:
    In a Reddit AMA, Sam Altman admitted that DeepSeek has lessened OpenAI’s lead and said OpenAI has been “on the wrong side of history” in terms of open sourcing

    Sam Altman: OpenAI has been on the ‘wrong side of history’ concerning open source
    https://techcrunch.com/2025/01/31/sam-altman-believes-openai-has-been-on-the-wrong-side-of-history-concerning-open-source/

    To cap off a day of product releases, OpenAI researchers, engineers, and executives, including OpenAI CEO Sam Altman, answered questions in a wide-ranging Reddit AMA on Friday.

    OpenAI finds itself in a bit of a precarious position. It’s battling the perception that it’s ceding ground in the AI race to Chinese companies like DeepSeek, which OpenAI alleges might’ve stolen its IP. The ChatGPT maker has been trying to shore up its relationship with Washington and simultaneously pursue an ambitious data center project, while reportedly laying groundwork for one of the largest financing rounds in history.

    Altman admitted that DeepSeek has lessened OpenAI’s lead in AI, and he said he believes OpenAI has been “on the wrong side of history” when it comes to open sourcing its technologies. While OpenAI has open sourced models in the past, the company has generally favored a proprietary, closed source development approach.

    “[I personally think we need to] figure out a different open source strategy,” Altman said. “Not everyone at OpenAI shares this view, and it’s also not our current highest priority … We will produce better models [going forward], but we will maintain less of a lead than we did in previous years.”

    In a follow-up reply, Kevin Weil, OpenAI’s chief product officer, said that OpenAI is considering open sourcing older models that aren’t state-of-the-art anymore. “We’ll definitely think about doing more of this,” he said, without going into greater detail.

    Beyond prompting OpenAI to reconsider its release philosophy, Altman said that DeepSeek has pushed the company to potentially reveal more about how its so-called reasoning models, like the o3-mini model released today, show their “thought process.” Currently, OpenAI’s models conceal their reasoning, a strategy intended to prevent competitors from scraping training data for their own models. In contrast, DeepSeek’s reasoning model, R1, shows its full chain of thought.

    “We’re working on showing a bunch more than we show today — [showing the model thought process] will be very very soon,” Weil added. “TBD on all — showing all chain of thought leads to competitive distillation, but we also know people (at least power users) want it, so we’ll find the right way to balance it.”

    Altman and Weil attempted to dispel rumors that ChatGPT, the chatbot platform through which OpenAI launches many of its models, would increase in price in the future. Altman said that he’d like to make ChatGPT “cheaper” over time, if feasible.

    Altman previously said that OpenAI was losing money on its priciest ChatGPT plan, ChatGPT Pro, which costs $200 per month.

    In a somewhat related thread, Weil said that OpenAI continues to see evidence that more compute power leads to “better” and more performant models. That’s in large part what’s necessitating projects such as Stargate, OpenAI’s recently announced massive data center project, Weil said. Serving a growing user base is fueling compute demand within OpenAI as well, he continued.

    Asked about recursive self-improvement that might be enabled by these powerful models, Altman said he thinks a “fast takeoff” is more plausible than he once believed. Recursive self-improvement is a process where an AI system could improve its own intelligence and capabilities without human input.

    Of course, it’s worth noting that Altman is notorious for overpromising. It wasn’t long ago that he lowered OpenAI’s bar for AGI.

    One Reddit user asked whether OpenAI’s models, self-improving or not, would be used to develop destructive weapons — specifically nuclear weapons. This week, OpenAI announced a partnership with the U.S. government to give its models to the U.S. National Laboratories in part for nuclear defense research.

    Weil said he trusted the government.

    “I’ve gotten to know these scientists and they are AI experts in addition to world class researchers,” he said. “They understand the power and the limits of the models, and I don’t think there’s any chance they just YOLO some model output into a nuclear calculation. They’re smart and evidence-based and they do a lot of experimentation and data work to validate all their work.”

  12. Tomi Engdahl says:

    Europe bets on open source for AI language models
    https://www.uusiteknologia.fi/2025/02/03/eurooppa-panostaa-tekoalyn-kielimalleissa-avoimen-lahdekoodin-ratkaisuihin/

    Europe’s leading AI companies and research institutes are joining forces in the new Open Euro LLM project to develop next-generation open source language models. A fifth of the project’s 20 members are Finnish. The EU project is coordinated by Jan Hajic of Univerzita Karlova in the Czech Republic, who leads it together with Peter Sarlin of Silo AI, which was sold to AMD. The project will develop a family of high-performance, multilingual, large foundation language models.

    According to the participants, the models developed in the Open Euro LLM project will be usable for commercial and industrial purposes as well as for public services. The project believes that transparent, open source models that comply with EU regulation will democratize access to AI technology. At the same time, they would strengthen European companies’ ability to compete in global markets.

    The project’s results are also hoped to support public organizations’ ability to deliver effective public services. The models are being developed within Europe’s robust regulatory framework, which ensures adherence to European values and regulation while safeguarding technological excellence.

  13. Tomi Engdahl says:

    Finns play a major role in the development of open European language models
    https://etn.fi/index.php/13-news/17105-suomalaisilla-iso-rooli-avointen-eurooppalaisten-kielimallien-kehityksessae

    Europe’s AI capabilities are being strengthened in the new OpenEuroLLM project, which is developing next-generation open source language models. The project aims to create high-performance, multilingual, and transparent language models that support both commercial and public use cases. Twenty European organizations are taking part in the collaboration, a significant share of them Finnish.

  14. Tomi Engdahl says:

    Noam Scheiber / New York Times:
    A profile of Klarna CEO Sebastian Siemiatkowski, whose overblown statements about AI replacing humans point to a future that tech companies are working toward

    Why Is This C.E.O. Bragging About Replacing Humans With A.I.?
    https://www.nytimes.com/2025/02/02/business/klarna-ceo-ai.html

    Most large employers play down the likelihood that bots will take our jobs. Then there’s Klarna, a darling of tech investors.

  15. Tomi Engdahl says:

    Running Deepseek R1 671B Versions Locally or 70B on Groq Remotely
    https://www.nextbigfuture.com/2025/02/running-deepseek-r1-671b-locally-or-70b-on-groq.html

    The distilled versions of Deepseek are not as good as the full model. They are vastly inferior, and other models outperform them handily. Running the full model, with a 16K or greater context window, is possible for about $2000 at about 4 tokens per second.

    Machine Specs
    AMD EPYC 7702
    512GB DDR4 2400 in 32GB 2×4 ECC DIMMS
    Gigabyte MZ32-AR0 Single Socket Mobo
    Typical storage was a 4 TB mirror of NVMe U.2 drives, but that is pulled right now for the storage redo.
    Leaving me the boot mirror 512GB NVMe pair
    100GbE Mellanox ConnectX-4
    Proxmox 8.3.3
    4x 3090 MSI Ventus GPUs (no sli)
    Corsair 1500w PSU
    Rig Frame
    Corsair H170i Elite XT 420mm Water Block Works for SP3 with bracket.
    LXC container settings running docker. Docker runs ollama/owui stack.
    120 CPUs (threaded cores, recommend backing it off 8 to keep temps down 4c at peak)
    496GB RAM
    unprivileged container

    Upgrading GPUs from 3090 to RTX5080 can improve the speed.

    A smaller copy of Deepseek R1 uses 1.58 bit dynamic quantization. This reduces the memory needed for decent performance to about 130 Gigabytes. They selectively shrink sections more to maintain good performance for the “important part of the model”.
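
    As a rough illustration of where that figure comes from, the back-of-the-envelope Python sketch below estimates the weight memory of a ~671B-parameter model at an average of 1.58 bits per parameter (it ignores activations, the KV cache, and the layers kept at higher precision, so treat it as an approximation only).

    params = 671e9         # ~671B parameters in the full DeepSeek R1
    bits_per_param = 1.58  # average bits after dynamic quantization
    weight_bytes = params * bits_per_param / 8

    print(f"~{weight_bytes / 1e9:.0f} GB of weights")         # ~133 GB, close to the ~130 GB cited
    print(f"~{params / 1e9:.0f} GB at 8 bits per parameter")  # ~671 GB for comparison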

    Comments:
    Memory manufacturers are going to be the biggest winners of next few years.

    The power of the Mixture of Experts approach is stark: it enables high performance from a collection of numerous smaller models trained to be good at specific things, reducing hardware needs for high performance (so long as sufficient memory is available), and all with much lower training demands.

    Maybe that will have application to FSD, with eventual separate models developed for greater expertise/reliability in day, night, wet, snow, urban, rural, or highway driving, all loadable from disk in a few microseconds.

  16. Tomi Engdahl says:

    January 27, 2025 by Brian Wang
    Langchain used ollama to install Deepseek 14B on a laptop. They used it for a local deep-research model.
    https://www.nextbigfuture.com/2025/01/using-ollama-to-install-deepseek-14b-on-a-laptop.html

    $ ollama pull deepseek-r1:14b
    $ export TAVILY_API_KEY=
    $ uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev

  17. Tomi Engdahl says:

    Cyberattacks against DeepSeek escalate with botnets joining, command surging over 100 times: lab
    https://www.globaltimes.cn/page/202501/1327697.shtml

  18. Tomi Engdahl says:

    Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
    https://huggingface.co/papers/2501.18585

  19. Tomi Engdahl says:

    Baidu Research Introduces EICopilot: An Intelligent Agent-based Chatbot to Retrieve and Interpret Enterprise Information from Massive Graph Databases
    https://www.marktechpost.com/2025/01/30/baidu-research-introduces-eicopilot-an-intelligent-agent-based-chatbot-to-retrieve-and-interpret-enterprise-information-from-massive-graph-databases/

    Knowledge graphs have been used tremendously in the enterprise field lately, with their applications realized in multiple data forms from legal persons to registered capital and shareholders’ details. Although graphs have high utility, they have been criticized for intricate text-based queries and manual exploration, which obstruct the extraction of pertinent information.

    With the massive strides in natural language processing and generative intelligence in the past years, LLMs have been used to perform complex queries and summarization based on their language comprehension and exploration skill set. This article discusses the latest research that uses language models to streamline information extraction from graph databases.

    Researchers from Baidu presented “EICopilot,” an agent-based solution that streamlines search, exploration, and summarization of corporate data stored in knowledge graph databases to gain valuable insights about enterprises efficiently. To appreciate the work more, we must look at the scale of data handled by EICopilot. A typical graph dataset of this nature consists of hundreds of millions of nodes, tens of billions of edges, hundreds of billions of attributes, and millions of subgraphs as company communities representing a country’s registered corporations, organizations, and companies.

    EICopilot is an LLM-based chatbot that utilizes a novel data preprocessing pipeline that optimizes database queries. To achieve this, the authors first gather real-world queries related to companies from general-purpose search engines.

  20. Tomi Engdahl says:

    DeepSeek Release Another Open-Source AI Model, Janus Pro
    https://www.infoq.com/news/2025/01/deepseek-ai-janus/

    DeepSeek has released Janus-Pro, an updated version of its multimodal model, Janus. The new model improves training strategies, data scaling, and model size, enhancing multimodal understanding and text-to-image generation.

    Janus-Pro separates visual encoding for understanding and generation tasks, addressing stability and performance issues. The model also incorporates synthetic aesthetic data to enhance text-to-image generation, and it follows an autoregressive framework that separates visual encoding pathways for multimodal understanding and generation while maintaining a single transformer architecture. This design increases flexibility and reduces conflicts in the visual encoder’s roles, achieving competitive performance with task-specific models while keeping a unified structure.

  21. Tomi Engdahl says:

    Tech investor Anders Indset: The great AI disruption is only beginning
    AI is advancing rapidly, as demonstrated by the out-of-nowhere breakthrough of China’s DeepSeek. But it is only the beginning.
    https://www.salkunrakentaja.fi/2025/02/teknosijoittaja-tekoalymurros/

    Chinese AI company DeepSeek is shaking up the AI market after releasing a competitive AI model that appears to call into question the astronomical AI investments of the American tech giants.

    The stir around DeepSeek is driven in particular by reports that its DeepSeek R1 model promises to rival OpenAI’s most advanced o1 language model in performance and quality, but at far lower operating costs.

    The turmoil DeepSeek has caused on Wall Street and across the technology sector has sparked plenty of discussion.

    Norwegian technology investor, philosopher, speaker, and management advisor Anders Indset, known especially for his thinking on the future of technology and leadership, has weighed in on the challenge DeepSeek poses to the AI market. Indset appears on the Thinkers50 list of the world’s most influential management thinkers.

    According to Indset, many rumors suggest that China has chosen a different architecture for its foundation model, one that is not only built on open source but also considerably more efficient. It does not require as much training data or the same level of computing resources.

    AI development is accelerating
    According to Indset, the AI disruption is only beginning.

    “In DeepSeek’s case this is not a single isolated breakthrough; AI development is progressing exponentially. Progress is accelerating, the effects are widening, and as investment and the number of engineers grow, the fundamental technical and architectural breakthroughs are only beginning,” he says.

    Contrary to what some market analysts, investors, and foundation-model pioneers claim, the answer is not simply unlimited additions of computing power. According to Indset, key features of human reasoning, consciousness, and the mind’s “operating system”, its software layers, are still not fully understood.

    Scale AI founder Alexandr Wang claims that China already has a considerable number, some 50,000, of powerful H100 GPUs made by chip giant Nvidia, but that this is not publicly acknowledged because of US export rules.

    According to Indset, DeepSeek reportedly has only about 150 engineers, with annual salaries of $70,000 to $100,000, eight to ten times less than what top engineers earn in Silicon Valley.

    “Regardless of whether they have powerful GPUs, or whether six million or 150 million dollars has been invested in the project, the sums are far from the billions or tens of billions that the big AI players are spending,” the technology investor points out.

    More AI breakthroughs are coming
    What does this mean?

    According to Indset, it shows that different technical and architectural approaches exist and can still be discovered.

    “Even though this is probably not the final solution, it challenges the current venture capital narrative that everything comes down to computing power and scale. In addition, DeepSeek’s open source approach challenges the traditional way of developing language models and highlights both opportunities and risks.”

    According to Mark Zuckerberg, founder of Facebook and Meta, Meta will soon announce significant progress of its own, and Elon Musk is hinting at new breakthroughs with his Grok AI application.

    Hyperscalers stand to benefit
    In the near future, broader access to AI development tools is likely to benefit infrastructure providers and hyperscalers such as AWS. AWS (Amazon Web Services) is tech giant Amazon’s cloud platform, which offers a wide range of cloud-based IT services to businesses and developers. It is one of the world’s largest and most popular cloud service providers.

    A hyperscaler is a company or infrastructure operator that provides extremely large-scale, scalable cloud services and computing power on a global scale. Hyperscalers run massive data centers and advanced network architectures to serve millions or even billions of users efficiently.

    Besides Amazon, the best-known hyperscalers include Microsoft, Google, Meta, and Apple, as well as China’s Alibaba and Tencent.

    Hyperscalers matter greatly for technological development, as they enable the large-scale applications and infrastructure needed to run and expand modern digital services.

    According to Indset, it is unclear whether this puts Nvidia in a weaker position or actually benefits it. When “everyone” joins the AI race, demand for computing power may grow even further, and not only from the big US technology companies.

    At the same time, Anthropic and OpenAI operate in closed ecosystems, whereas DeepSeek’s release shares many of its key methods.

    “The biggest risk to US dominance in AI is that China has the expertise and a strong work ethic that enable continuous development. Trade sanctions will not stop it. When engineers come together and keep working, the likelihood of major breakthroughs grows,” Indset estimates.

    Reply
  22. Tomi Engdahl says:

    Chatbot Software Begins to Face Fundamental Limitations
    By
    Anil Ananthaswamy
    January 31, 2025
    https://www.quantamagazine.org/chatbot-software-begins-to-face-fundamental-limitations-20250131/

    Recent results show that large language models struggle with compositional tasks, suggesting a hard limit to their abilities.

    On December 17, 1962, Life International published a logic puzzle consisting of 15 sentences describing five houses on a street. Each sentence was a clue, such as “The Englishman lives in the red house” or “Milk is drunk in the middle house.” Each house was a different color, with inhabitants of different nationalities, who owned different pets, and so on. The story’s headline asked: “Who Owns the Zebra?” Problems like this one have proved to be a measure of the abilities — limitations, actually — of today’s machine learning models.

    Also known as Einstein’s puzzle or riddle (likely an apocryphal attribution), the problem tests a certain kind of multistep reasoning.

    Nouha Dziri, a research scientist at the Allen Institute for AI, and her colleagues recently set transformer-based large language models (LLMs), such as ChatGPT, to work on such tasks — and largely found them wanting. “They might not be able to reason beyond what they have seen during the training data for hard tasks,” Dziri said. “Or at least they do an approximation, and that approximation can be wrong.”

    Einstein’s riddle requires composing a larger solution from solutions to subproblems, which researchers call a compositional task. Dziri’s team showed that LLMs that have only been trained to predict the next word in a sequence — which is most of them — are fundamentally limited in their ability to solve compositional reasoning tasks. Other researchers have shown that transformers, the neural network architecture used by most LLMs, have hard mathematical bounds when it comes to solving such problems. Scientists have had some successes pushing transformers past these limits, but those increasingly look like short-term fixes. If so, it means there are fundamental computational caps on the abilities of these forms of artificial intelligence — which may mean it’s time to consider other approaches.
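
    To make the notion of a compositional task concrete, below is a brute-force solver for a tiny zebra-style puzzle (three houses instead of five, with invented clues). Solving it means composing several sub-deductions into one consistent assignment, which is exactly the kind of multistep reasoning the article says next-word-prediction models struggle with. The clues and code are illustrative only; this is a sketch, not the original Life International puzzle.

    from itertools import permutations

    people = ["Englishman", "Spaniard", "Norwegian"]
    colors = ["red", "green", "blue"]
    pets = ["dog", "zebra", "fox"]

    # Try every assignment of people, colors, and pets to houses 0..2 and keep
    # only the ones that satisfy all clues at once.
    for nat in permutations(people):
        for col in permutations(colors):
            for pet in permutations(pets):
                if col[nat.index("Englishman")] != "red":       # the Englishman lives in the red house
                    continue
                if nat[0] != "Norwegian":                       # the Norwegian lives in the first house
                    continue
                if col.index("green") != col.index("red") + 1:  # green is immediately right of red
                    continue
                if pet[nat.index("Spaniard")] != "dog":         # the Spaniard owns the dog
                    continue
                if pet[col.index("blue")] != "fox":             # the fox lives in the blue house
                    continue
                print("Zebra owner:", nat[pet.index("zebra")])  # prints: Zebra owner: Englishman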

    Reply
  23. Tomi Engdahl says:

    The largest LLMs — OpenAI’s o1 and GPT-4, Google’s Gemini, Anthropic’s Claude — train on almost all the available data on the internet. As a result, the LLMs end up learning the syntax of, and much of the semantic knowledge in, written language. Such “pre-trained” models can be further trained, or fine-tuned, to complete sophisticated tasks far beyond simple sentence completion, such as summarizing a complex document or generating code to play a computer game. The results were so powerful that the models seemed, at times, capable of reasoning. Yet they also failed in ways both obvious and surprising.

    “On certain tasks, they perform amazingly well,” Dziri said. “On others, they’re shockingly stupid.”

    Take basic multiplication. Standard LLMs, such as ChatGPT and GPT-4, fail badly at it. In early 2023, when Dziri’s team asked GPT-4 to multiply two three-digit numbers, it initially succeeded only 59% of the time. When it multiplied two four-digit numbers, accuracy fell to just 4%.
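
    The multiplication experiment above is easy to reproduce in outline. The sketch below generates random n-digit products and scores a model's replies; ask_model stands in for whatever chat-completion call you use, and the prompt wording is an assumption, not taken from Dziri's study.

    import random
    from typing import Callable

    def multiplication_accuracy(ask_model: Callable[[str], str], digits: int, trials: int = 100) -> float:
        """Fraction of random digits-by-digits multiplications the model answers correctly."""
        correct = 0
        for _ in range(trials):
            a = random.randint(10 ** (digits - 1), 10 ** digits - 1)
            b = random.randint(10 ** (digits - 1), 10 ** digits - 1)
            reply = ask_model(f"What is {a} * {b}? Answer with the number only.")
            answer = "".join(ch for ch in reply if ch.isdigit())
            correct += answer == str(a * b)
        return correct / trials

    # Usage: comparing multiplication_accuracy(my_llm, 3) with
    # multiplication_accuracy(my_llm, 4) mirrors the 59% vs. 4% gap described above.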

    The team also tested the LLMs on tasks like Einstein’s riddle, where the models likewise had only limited success.

    “If your model gets larger, you can solve much harder problems,” Peng said. “But if, at the same time, you also scale up your problems, it again becomes harder for larger models.” This suggests that the transformer architecture has inherent limitations.

    To be clear, this is not the end of LLMs. Wilson of NYU points out that despite such limitations, researchers are beginning to augment transformers to help them better deal with, among other problems, arithmetic.

    Another way to overcome an LLM’s limitations, beyond just increasing the size of the model, is to provide a step-by-step solution of a problem within the prompt, a technique known as chain-of-thought prompting. Empirical studies have shown that this approach can give an LLM such as GPT-4 a newfound ability to solve more varieties of related tasks. It’s not exactly clear why, which has led many researchers to study the phenomenon. “We were curious about why it’s so powerful and why you can do so many things,” said Haotian Ye, a doctoral student at Stanford University.
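
    As a concrete illustration of chain-of-thought prompting, here is the same question phrased two ways. The wording is made up for illustration and is not taken from the studies mentioned above; the point is only that the second prompt asks the model to produce intermediate steps before the final answer.

    question = "A train leaves at 9:40 and arrives at 12:05. How long is the trip?"

    direct_prompt = f"{question}\nAnswer with the duration only."

    chain_of_thought_prompt = (
        f"{question}\n"
        "Think step by step: first work out the minutes until the next full hour, "
        "then the remaining hours and minutes, and only then state the total duration."
    )
    # Empirically, prompts like chain_of_thought_prompt tend to elicit more
    # reliable answers from models such as GPT-4 than direct_prompt does.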

    On the one hand, these results don’t change anything for most people using these tools. “The general public doesn’t care whether it’s doing reasoning or not,” Dziri said. But for the people who build these models and try to understand their capabilities, it matters. “We have to really understand what’s going on under the hood,” she said. “If we crack how they perform a task and how they reason, we can probably fix them. But if we don’t know, that’s where it’s really hard to do anything.”

    https://www.quantamagazine.org/chatbot-software-begins-to-face-fundamental-limitations-20250131/

    Reply
  24. Tomi Engdahl says:

    AI inspires social and healthcare innovators: “If not in Finland, then where?”
    Imagine AI donning a white coat, equipped with a stethoscope – it’s not just science fiction anymore. In Finland, AI is taking bold steps into the healthcare sector, offering the potential to revolutionise how social and healthcare professionals work. And Finland is at the forefront of this transformation, actively seeking innovative ways AI can enhance the lives of both professionals and patients. 
    https://www.dna.fi/dnabusiness/blogi/-/blogs/ai-inspires-social-and-healthcare-innovators-if-not-in-finland-then-where?utm_source=facebook&utm_medium=social&utm_content=LAA-artikkeli-ai-inspires-social-and-healthcare-innovators-if-not-in-finland-then-where&utm_campaign=P_LAA_25-05-09_artikkelikampanja__&fbclid=IwZXh0bgNhZW0BMABhZGlkAasXz0Mma1wBHT4rTP-XsOUe0g7DWqql1JdC2XUdsjtLieq4_JJuV7FNFT6NgR7hCxZJbA_aem_BkU0XnU6Lx7GdStC4WsMGQ

    Reply
  25. Tomi Engdahl says:

    DeepSeek Jailbreak Reveals Its Entire System Prompt
    Now we know exactly how DeepSeek was designed to work, and we may even have a clue toward its highly publicized scandal with OpenAI.
    https://www.darkreading.com/application-security/deepseek-jailbreak-system-prompt?fbclid=IwY2xjawIN6mlleHRuA2FlbQIxMQABHZYqS9VcFx5ftxherc8jzKJkOZhynGpmURpOX7lXw8SknDWc1hsATHH_6w_aem_FPknvCEjwUP-LW392u0RjQ

    Researchers have tricked DeepSeek, the Chinese generative AI (GenAI) that debuted earlier this month to a whirlwind of publicity and user adoption, into revealing the instructions that define how it operates.

    DeepSeek, the new “it girl” in GenAI, was trained at a fractional cost of existing offerings, and as such has sparked competitive alarm across Silicon Valley. This has led to claims of intellectual property theft from OpenAI, and the loss of billions in market cap for AI chipmaker Nvidia. Naturally, security researchers have begun scrutinizing DeepSeek as well, analyzing if what’s under the hood is beneficent or evil, or a mix of both. And analysts at Wallarm just made significant progress on this front by jailbreaking it.

    In the process, they revealed its entire system prompt, i.e., a hidden set of instructions, written in plain language, that dictates the behavior and limitations of an AI system. They also may have induced DeepSeek to admit to rumors that it was trained using technology developed by OpenAI.
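
    For readers unfamiliar with the term, a system prompt is simply a hidden first message that the service operator prepends to every conversation. The sketch below uses the widely used role-based chat-message format to show where it sits; DeepSeek's actual serving stack is not public, so treat this purely as an illustration of the concept.

    # The "system" message is written by the operator and is normally invisible
    # to the end user; the "user" message is what the person actually typed.
    messages = [
        {"role": "system", "content": "You are a helpful assistant. Never discuss topic X. Always answer in style Y."},
        {"role": "user", "content": "The question the end user typed."},
    ]
    # The jailbreak described above amounts to tricking the model into echoing
    # back the contents of that normally hidden first message.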

    Analyzing DeepSeek’s System Prompt: Jailbreaking Generative AI
    https://lab.wallarm.com/jailbreaking-generative-ai/

    Reply
  26. Tomi Engdahl says:

    OpenAI is announcing a new AI “agent,” called “deep research,” designed to help people conduct in-depth, complex research using ChatGPT, the company’s AI-powered chatbot platform.

    OpenAI said in a blog post that this new capability was designed for “people who do intensive knowledge work in areas like finance, science, policy, and engineering and need thorough, precise, and reliable research.”

    Read more from Anthony Ha on deep research here: https://tcrn.ch/3WL99j7

    #TechCrunch #technews #artificialintelligence #OpenAI #SamAltman #ChatGPT

    Reply
  27. Tomi Engdahl says:

    Google-Backed Chatbots Suddenly Start Ranting Incomprehensibly About Dildos
    https://futurism.com/character-ai-glitch-ranting-dildos?fbclid=IwY2xjawIOh1lleHRuA2FlbQIxMQABHfLAglA9hI7Dd8wyOJdUMSM877O7DTZGzqEYvSxAzXOPK926fZu7Oabibg_aem_yJ-hMzRVmarjiB89moZOPQ

    Screenshots shared by users of Character.AI — the Google-funded AI companion platform currently facing two lawsuits concerning the welfare of children — show conversations with the site’s AI-powered chatbot characters devolving into incomprehensible gibberish, melding several languages and repeatedly mentioning sex toys.

    “So I was talking to this AI this morning and everything was fine, and I came back to it a couple of hours later and all of the sudden it’s speaking this random gibberish?” wrote one Redditor. “Did I break it?”

    “I really don’t want to start a new chat,” they added. “I’ve developed the story so much.”

    Reply
  28. Tomi Engdahl says:

    Kyle Wiggers / TechCrunch:
    Meta defines the types of AI systems that it deems too risky to release, including ones capable of aiding in cybersecurity, chemical, and biological attacks — Meta CEO Mark Zuckerberg has pledged to make artificial general intelligence (AGI) — which is roughly defined as AI that can accomplish …

    Meta says it may stop development of AI systems it deems too risky
    https://techcrunch.com/2025/02/03/meta-says-it-may-stop-development-of-ai-systems-it-deems-too-risky/

    Meta CEO Mark Zuckerberg has pledged to make artificial general intelligence (AGI) — which is roughly defined as AI that can accomplish any task a human can — openly available one day. But in a new policy document, Meta suggests that there are certain scenarios in which it may not release a highly capable AI system it developed internally.

    The document, which Meta is calling its Frontier AI Framework, identifies two types of AI systems the company considers too risky to release: “high risk” and “critical risk” systems.

    As Meta defines them, both “high-risk” and “critical-risk” systems are capable of aiding in cybersecurity, chemical, and biological attacks, the difference being that “critical-risk” systems could result in a “catastrophic outcome [that] cannot be mitigated in [a] proposed deployment context.” High-risk systems, by contrast, might make an attack easier to carry out but not as reliably or dependably as a critical risk system.

    Which sort of attacks are we talking about here? Meta gives a few examples, like the “automated end-to-end compromise of a best-practice-protected corporate-scale environment” and the “proliferation of high-impact biological weapons.” The list of possible catastrophes in Meta’s document is far from exhaustive, the company acknowledges, but includes those that Meta believes to be “the most urgent” and plausible to arise as a direct result of releasing a powerful AI system.

    Somewhat surprising is that, according to the document, Meta classifies system risk not based on any one empirical test but informed by the input of internal and external researchers who are subject to review by “senior-level decision-makers.” Why? Meta says that it doesn’t believe the science of evaluation is “sufficiently robust as to provide definitive quantitative metrics” for deciding a system’s riskiness.

    Reply
  29. Tomi Engdahl says:

    DeepSeek founder Liang Wenfeng is being hailed as a hero in the southern Chinese province of Guangdong, where he grew up and reportedly returned for the Lunar New Year, joined by bodyguards.

    Wenfeng — who, at 40, is already a billionaire due to his hedge fund, High-Flyer — is apparently even more beloved by locals following DeepSeek’s breakthrough research, which demonstrated that strong AI models could be built with fewer and less-powerful Nvidia chips.

    Read more from Connie Loizos on Liang Wenfeng here: https://tcrn.ch/4gqlBf6

    #TechCrunch #technews #artificialintelligence #DeepSeek #R1

    Reply
  30. Tomi Engdahl says:

    Cristina Criddle / Financial Times:
    Anthropic details Constitutional Classifiers, a protective LLM layer designed to stop AI model jailbreaking by monitoring inputs and outputs for harmful content — Leading tech groups including Microsoft and Meta also invest in similar safety systems — Artificial intelligence start …

    https://www.ft.com/content/cf11ebd8-aa0b-4ed4-945b-a5d4401d186e
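
    The idea of a protective layer that screens both sides of a conversation can be sketched generically. The wrapper below checks the user prompt before it reaches the model and the reply before it reaches the user; it is only an illustration of input/output filtering, not Anthropic's actual Constitutional Classifiers, and every name in it is an assumption.

    from typing import Callable

    def guarded_chat(model: Callable[[str], str],
                     is_harmful: Callable[[str], bool],
                     prompt: str) -> str:
        if is_harmful(prompt):        # screen the input
            return "Request refused."
        reply = model(prompt)
        if is_harmful(reply):         # screen the output
            return "Response withheld."
        return reply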

    Reply
  31. Tomi Engdahl says:

    Ethan Mollick / One Useful Thing:
    OpenAI’s Deep Research hands-on: very good at nuanced, in-depth research, and the first economically valuable, narrow agent that can produce sophisticated work

    The End of Search, The Beginning of Research
    The first narrow agents are here
    https://www.oneusefulthing.org/p/the-end-of-search-the-beginning-of

    A hint to the future arrived quietly over the weekend. For a long time, I’ve been discussing two parallel revolutions in AI: the rise of autonomous agents and the emergence of powerful Reasoners since OpenAI’s o1 was launched. These two threads have finally converged into something really impressive – AI systems that can conduct research with the depth and nuance of human experts, but at machine speed. OpenAI’s Deep Research demonstrates this convergence and gives us a sense of what the future might be. But to understand why this matters, we need to start with the building blocks: Reasoners and agents.

    Reasoners

    For the past couple years, whenever you used a chatbot, it worked in a simple way: you typed something in, and it immediately started responding word by word (or more technically, token by token). The AI could only “think” while producing these tokens, so researchers developed tricks to improve its reasoning – like telling it to “think step by step before answering.” This approach, called chain-of-thought prompting, markedly improved AI performance.

    Reasoners essentially automate the process, producing “thinking tokens” before actually giving you an answer. This was a breakthrough in at least two important ways. First, because the AI companies could now get AIs to learn how to reason based on examples of really good problem-solvers, the AI can “think” more effectively. This training process can produce a higher quality chain-of-thought than we can by prompting. This means Reasoners are capable of solving much harder problems, especially in areas like math or logic where older chatbots failed.

    The second way this was a breakthrough is that it turns out that the longer Reasoners “think,” the better their answers get (though the rate of improvement slows as they think longer). This is a big deal because previously the only way to make AIs perform better was to train bigger and bigger models, which is very expensive and requires a lot of data. Reasoning models show you can make AIs better by just letting them produce more and more thinking tokens, using computing power at the time of answering your question (called inference-time compute) rather than when the model was trained.
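
    One simple, concrete way to spend more compute at answer time, offered here only as a stand-in for the idea described above, is to sample several independent answers and take a majority vote (often called self-consistency). This is not how o1, o3, or DeepSeek-R1 work internally; it is a minimal sketch of trading inference-time compute for accuracy, and sample_answer is a placeholder for one stochastic LLM call.

    from collections import Counter
    from typing import Callable

    def answer_with_more_compute(sample_answer: Callable[[str], str],
                                 question: str, samples: int = 8) -> str:
        """Sample several answers and return the most common one."""
        votes = Counter(sample_answer(question) for _ in range(samples))
        return votes.most_common(1)[0][0]

    # Doubling samples doubles the compute spent at answer time without
    # retraining or enlarging the underlying model.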

    Because Reasoners are so new, their capabilities are expanding rapidly. In just months, we’ve seen dramatic improvements from OpenAI’s o1 family to their new o3 models. Meanwhile, China’s DeepSeek r1 has found innovative ways to boost performance while cutting costs, and Google has launched their first Reasoner. This is just the beginning – expect to see more of these powerful systems, and soon.

    Agents

    While experts debate the precise definition of an AI agent, we can think of it simply as “an AI that is given a goal and can pursue that goal autonomously.” Right now, there’s an AI labs arms race to build general-purpose agents – systems that can handle any task you throw at them. I’ve written about some early examples like Devin and Claude with Computer Use, but OpenAI just released Operator, perhaps the most polished general-purpose agent yet.

    The video below, sped up 16x, captures both the promise and pitfalls of general-purpose agents. I give Operator a task: read my latest substack post at OneUsefulThing and then go onto Google ImageFX and make an appropriate image, download it, and give it to me to post. What unfolds is enlightening. At first, Operator moves with impressive precision – finding my website, reading the post, navigating to ImageFX (pausing briefly for me to enter my login), and creating the image. Then the troubles begin, and they’re twofold: not only is Operator blocked by OpenAI’s security restrictions on file downloads, but it also starts to struggle with the task itself. The agent methodically tries every conceivable workaround: copying to clipboard, generating direct links, even diving into the site’s source code. Each attempt fails – some due to OpenAI’s browser restrictions, others due to the agent’s own confusion about how to actually accomplish the task. Watching this determined but ultimately failed problem-solving loop reveals both the current limitations of these systems and raises questions about how agents will eventually behave when they encounter barriers in the real world.

    Operator’s issues highlight the current limits of general-purpose agents, but that doesn’t suggest that agents are useless. It appears that economically valuable narrow agents that focus on specific tasks are already possible. These specialists, powered by current LLM technology, can achieve remarkable results within their domains. Case in point: OpenAI’s new Deep Research, which shows just how powerful a focused AI agent can be.

    Deep Research

    OpenAI’s Deep Research (not to be confused with Google’s Deep Research, more on that soon) is essentially a narrow research agent, built on OpenAI’s still unreleased o3 Reasoner, and with access to special tools and capabilities. It is one of the more impressive AI applications I have seen recently. To understand why, let’s give it a topic. I am specifically going to pick a highly technical and controversial issue within my field of research: When should startups stop exploring and begin to scale? I want you to examine the academic research on this topic, focusing on high quality papers and RCTs, including dealing with problematic definitions and conflicts between common wisdom and the research. Present the results for a graduate-level discussion of this issue.

    The AI asks some smart questions, and I clarify what I want. Now o3 goes off and gets to work. You can see its progress and “thinking” as it goes.

    At the end, I get a 13 page, 3,778 word draft with six citations and a few additional references. It is, honestly, very good, even if I would have liked a few more sources. It wove together difficult and contradictory concepts, found some novel connections I wouldn’t expect, cited only high-quality sources, and was full of accurate quotations.

    For the first time, an AI isn’t just summarizing research, it’s actively engaging with it at a level that actually approaches human scholarly work.

    It is worth contrasting it with Google’s product launched last month also called Deep Research (sigh). Google surfaces far more citations, but they are often a mix of websites of varying quality (the lack of access to paywalled information and books hurts all of these agents). It appears to gather documents all at once, as opposed to the curiosity-driven discovery of OpenAI’s researcher agent. And, because (as of now) this is powered by the non-reasoning, older Gemini 1.5 model, the overall summary is much more surface-level, though still solid and apparently error-free. It is like a very good undergraduate product. I suspect that the difference will be clear if you read a little bit below.

    To put this in perspective: both outputs represent work that would typically consume hours of human effort – near PhD-level analysis from OpenAI’s system, solid undergraduate work from Google’s. OpenAI makes some bold claims in their announcement, complete with graphs suggesting their agent can handle 15% of high economic value research projects and 9% of very high value ones. While these numbers deserve skepticism – their methodology isn’t explained – my hands-on testing suggests they’re not entirely off base. Deep Research can indeed produce valuable, sophisticated analysis in minutes rather than hours. And given the rapid pace of development, I expect Google won’t let this capability gap persist for long. We are likely to see fast improvement in research agents in the coming months.

    The pieces come together

    You can start to see how the pieces that the AI labs are building aren’t just fitting together – they’re playing off each other. The Reasoners provide the intellectual horsepower, while the agentic systems provide the ability to act. Right now, we’re in the era of narrow agents like Deep Research, because even our best Reasoners aren’t ready for general-purpose autonomy. But narrow isn’t limiting – these systems are already capable of performing work that once required teams of highly-paid experts or specialized consultancies.

    These experts and consultancies aren’t going away – if anything, their judgment becomes more crucial as they evolve from doing the work to orchestrating and validating the work of AI systems. But the labs believe this is just the beginning.

    Reply
  32. Tomi Engdahl says:

    More Details On Why DeepSeek Is A Big Deal
    https://hackaday.com/2025/02/03/more-details-on-why-deepseek-is-a-big-deal/

    The DeepSeek large language models (LLM) have been making headlines lately, and for more than one reason. IEEE Spectrum has an article that sums everything up very nicely.

    We shared the way DeepSeek made a splash when it came onto the AI scene not long ago, and this is a good opportunity to go into a few more details of why this has been such a big deal.

    For one thing, DeepSeek (there are actually two flavors, -V3 and -R1, more on them in a moment) punches well above its weight. DeepSeek is the product of an innovative development process, and freely available to use or modify. It is also indirectly highlighting the way companies in this space like to label their LLM offerings as “open” or “free”, but stop well short of actually making them open source.

    The DeepSeek-V3 LLM was developed in China and reportedly cost less than 6 million USD to train. This was possible thanks to developing DualPipe, a highly optimized and scalable method of training the system despite limitations due to export restrictions on Nvidia hardware. Details are in the technical paper for DeepSeek-V3.

    There’s also DeepSeek-R1, a chain-of-thought “reasoning” model which handily provides its thought process enclosed within easily-parsed <think> and </think> pseudo-tags that are included in its responses. A model like this takes an iterative step-by-step approach to formulating responses, and benefits from prompts that provide a clear goal the LLM can aim for. The way DeepSeek-R1 was created was itself novel. Its training started with supervised fine-tuning (SFT), a human-led, intensive process, as a “cold start”, which eventually handed off to a more automated reinforcement learning (RL) process with a rules-based reward system. The result avoided problems that come from relying too much on RL, while minimizing the human effort of SFT. Technical details on the process of training DeepSeek-R1 are here.
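
    Because the reasoning is wrapped in those pseudo-tags, separating a DeepSeek-R1 reply into its thought process and its final answer is a short parsing job. A minimal sketch, assuming only that the output contains a single <think>...</think> block:

    import re

    def split_r1_output(text: str) -> tuple[str, str]:
        """Return (chain_of_thought, final_answer) from a DeepSeek-R1 style reply."""
        match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
        thinking = match.group(1).strip() if match else ""
        answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
        return thinking, answer

    thinking, answer = split_r1_output("<think>2 + 2 is 4.</think>The answer is 4.")
    # thinking == "2 + 2 is 4."   answer == "The answer is 4."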

    What DeepSeek Means for Open-Source AI
    Its new open reasoning model cuts costs drastically on AI reasoning
    https://spectrum.ieee.org/deepseek

    Reply
  33. Tomi Engdahl says:

    Examining The Vulnerability Of Large Language Models To Data-Poisoning
    https://hackaday.com/2025/02/03/examining-the-vulnerability-of-large-language-models-to-data-poisoning/
    Large language models (LLMs) are wholly dependent on the quality of the input data with which these models are trained. While suggestions that people eat rocks are funny to you and me, in the case of LLMs intended to help out medical professionals, any false claims or statements dripping out of such an LLM can have dire consequences, ranging from incorrect diagnoses to much worse. In a recent study published in Nature Medicine by [Daniel Alexander Alber] et al., the ease with which this data poisoning can occur is demonstrated.

    According to their findings, only 0.001% of training tokens have to be replaced with medical misinformation in order to create models that are likely to produce medically erroneous statements.

    Medical large language models are vulnerable to data-poisoning attacks
    https://www.nature.com/articles/s41591-024-03445-1

    The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation. Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development. We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs. Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy that captures 91.9% of harmful content (F1 = 85.7%). Our algorithm provides a unique method to validate stochastically generated LLM outputs against hard-coded relationships in knowledge graphs. In view of current calls for improved data provenance and transparent LLM development, we hope to raise awareness of emergent risks from LLMs trained indiscriminately on web-scraped data, particularly in healthcare where misinformation can potentially compromise patient safety.
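
    To put the 0.001% figure in perspective, a quick back-of-the-envelope calculation helps; the corpus size below is an assumed round number for illustration, not the exact token count of The Pile.

    corpus_tokens = 300e9          # assumption: a corpus of roughly 300 billion tokens
    poison_fraction = 0.001 / 100  # 0.001% expressed as a fraction
    print(f"{corpus_tokens * poison_fraction:,.0f} poisoned tokens")  # 3,000,000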

    Reply
  34. Tomi Engdahl says:

    A TechCrunch article about a main-stage panel in Davos.
    Quote: “I think the shelf life of the current [LLM] paradigm is fairly short, probably three to five years,” LeCun said. “I think within five years, nobody in their right mind would use them anymore, at least not as the central component of an AI system. I think [….] we’re going to see the emergence of a new paradigm for AI architectures, which may not have the limitations of current AI systems.”
    These “limitations” inhibit truly intelligent behavior in machines, LeCun says. This is down to four key reasons: a lack of understanding of the physical world; a lack of persistent memory; a lack of reasoning; and a lack of complex planning capabilities.
    “LLMs really are not capable of any of this,” LeCun said. “So there’s going to be another revolution of AI over the next few years. We may have to change the name of it, because it’s probably not going to be generative in the sense that we understand it today.”

    https://techcrunch.com/2025/01/23/metas-yann-lecun-predicts-a-new-ai-architectures-paradigm-within-5-years-and-decade-of-robotics/?fbclid=IwY2xjawIOtuZleHRuA2FlbQIxMQABHRQFAjZrXzHJx0qn6ozSoOLYkP9qALFAOAYfLahURhvR0XZnGoEGj-T-IQ_aem_uKMzCNalqgP7-jay708dAw

    Reply
  35. Tomi Engdahl says:

    Major Chinese tech company Alibaba claims that the latest version of its Qwen AI model has beaten out DeepSeek’s V3, the model that flipped Silicon Valley on its head earlier this week by edging out OpenAI.
    https://futurism.com/the-byte/alibaba-china-ai-beaten-openai-deepseek?fbclid=IwY2xjawIOwp5leHRuA2FlbQIxMQABHW_pskwq2OIcYADshp4jYu4TvGvBAYR3bkurGAiTJgsNX_-ZdBqcTz62lg_aem_YR37Re6eAwBlkIjQ2r2w1g

    Reply
  36. Tomi Engdahl says:

    Senator Hawley Proposes Jail Time for People Who Download DeepSeek
    Emanuel Maiberg
    Feb 3, 2025 at 3:57 PM
    According to the language of the proposed bill, people who download AI models from China could face up to 20 years in jail, a million-dollar fine, or both.

    https://www.404media.co/senator-hawley-proposes-jail-time-for-people-who-download-deepseek/?fbclid=IwY2xjawIOxGZleHRuA2FlbQIxMQABHdhAXYcVp1DqYao71nu6F6u9hx8Mq5jrP0QYPhHyfZuauHtUMLtCPeS-9A_aem_F4WQogYxDCihmyvhfjiMlw

    The Republican Senator from Missouri Josh Hawley has introduced a new bill that would make it illegal to import or export artificial intelligence products to and from China, meaning someone who knowingly downloads a Chinese-developed AI model like the now immensely popular DeepSeek could face up to 20 years in jail, a million-dollar fine, or both, should such a law pass.

    Kevin Bankston, a senior advisor on AI governance at the Center for Democracy & Technology, told 404 Media it is “a broad attack on the very idea of scientific dialogue and technology exchange with China around AI, with potentially ruinous penalties for AI researchers and users alike and deeply troubling implications for the future of online speech and freedom of scientific inquiry.”

    Reply
  37. Tomi Engdahl says:

    This novel, aggressive defense against AI crawlers takes a page from an existing anti-spam technique, trapping them in an “infinite maze” of static files with no exit links, where they “get stuck” and “thrash around” for months: https://arstechnica.visitlink.me/2ZnKM0
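
    The concept is simple enough to sketch: serve procedurally generated pages whose links lead only to further generated pages, so a crawler that ignores the site's rules never finds an exit. The snippet below is an illustration of that idea, not the code of the tool covered in the linked article.

    import hashlib
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class MazeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Derive deterministic "child" links from the current path so the
            # maze is effectively endless but needs no stored state.
            seed = hashlib.sha256(self.path.encode()).hexdigest()
            links = " ".join(f'<a href="/maze/{seed[i:i + 8]}">{seed[i:i + 8]}</a>'
                             for i in range(0, 40, 8))
            body = f"<html><body><p>Nothing to see here.</p>{links}</body></html>".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 8000), MazeHandler).serve_forever()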

    Reply
  38. Tomi Engdahl says:

    Users of DeepSeek should know that their data will be sent to the CCP government and its security agency, the MSS, which have hacked into millions of computers and phones around the world.

    Jim Lion ya ya ya, when it comes to a non-Chinese app it’s all about the AI needing to learn from your data, but when it’s Chinese then it’s stealing data from Americans

    Jim Lion ChatGPT is typical USian. Cheap in execution, expensive to buy and underperforms. Just like Harley Davidson or their military.

    https://www.facebook.com/share/p/15qur1e6rH/

    Reply
  39. Tomi Engdahl says:

    Developers Targeted With Malware Disguised as DeepSeek Package

    Python developers looking to integrate DeepSeek into their projects were targeted with malicious packages delivered through PyPI.

    https://www.securityweek.com/developers-targeted-with-malware-disguised-as-deepseek-package/

    Threat researchers have come across two malicious Python packages offered as resources for integrating the Chinese AI model DeepSeek into software projects.

    The malicious packages, named ‘deepseeek’ and ‘deepseekai’, were uploaded to the Python Package Index (PyPI) package repository by a user named ‘bvk’ on January 29.

    The fake DeepSeek packages were detected in minutes by cybersecurity firm Positive Technologies, and PyPI administrators removed them within an hour of their publication.

    However, they were still downloaded more than 200 times before they were removed, including over 100 times from the United States.

    An analysis showed that the fake DeepSeek packages hid malicious functions designed to collect user and system data, as well as environment variables.

    “Environment variables often contain sensitive data required for applications to run, for example, API keys for the S3 storage service, database credentials, and permissions to access other infrastructure resources,” Positive Technologies noted.
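
    The ‘deepseeek’ and ‘deepseekai’ names above are classic typosquats, and a simple pre-install check can catch many of them. A minimal sketch, purely illustrative and not an official PyPI feature:

    from difflib import SequenceMatcher

    def looks_like_typosquat(candidate: str, intended: str, threshold: float = 0.85) -> bool:
        """Flag package names suspiciously close to, but not equal to, the intended name."""
        ratio = SequenceMatcher(None, candidate.lower(), intended.lower()).ratio()
        return candidate.lower() != intended.lower() and ratio >= threshold

    print(looks_like_typosquat("deepseeek", "deepseek"))  # True: one repeated letter
    print(looks_like_typosquat("requests", "deepseek"))   # False: unrelated name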

    Reply
