AI is developing all the time. Here are some picks from several articles on what is expected to happen in and around AI in 2025. The texts are excerpts from the articles, edited and in some cases translated for clarity.
AI in 2025: Five Defining Themes
https://news.sap.com/2025/01/ai-in-2025-defining-themes/
Artificial intelligence (AI) is accelerating at an astonishing pace, quickly moving from emerging technologies to impacting how businesses run. From building AI agents to interacting with technology in ways that feel more like a natural conversation, AI technologies are poised to transform how we work.
But what exactly lies ahead?
1. Agentic AI: Goodbye Agent Washing, Welcome Multi-Agent Systems
AI agents are currently in their infancy. While many software vendors are releasing and labeling the first “AI agents” based on simple conversational document search, advanced AI agents that will be able to plan, reason, use tools, collaborate with humans and other agents, and iteratively reflect on progress until they achieve their objective are on the horizon. The year 2025 will see them rapidly evolve and act more autonomously. More specifically, 2025 will see AI agents deployed more readily “under the hood,” driving complex agentic workflows.
In short, AI will handle mundane, high-volume tasks while the value of human judgement, creativity, and quality outcomes will increase.
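To make the “plan, reason, use tools, reflect” loop above concrete, here is a minimal, illustrative Python sketch of an agent loop with a stubbed model call and a single calculator tool. The function names and the TOOL:name:input convention are made up for illustration; a real system would plug an LLM client and real tools into the same loop.

```python
# Minimal sketch of the plan/act/reflect loop an "agentic" system runs under the hood.
# The model call is stubbed out; all names and the TOOL protocol are illustrative only.

def call_model(prompt: str) -> str:
    """Stand-in for an LLM call; a real agent would call a hosted or local model."""
    # Toy policy: first ask for a calculation, then finish once an observation is present.
    return "FINISH" if "Observation:" in prompt else "TOOL:calculator:41+1"

def calculator(expression: str) -> str:
    return str(eval(expression, {"__builtins__": {}}))   # toy tool, trusted input only

TOOLS = {"calculator": calculator}

def run_agent(goal: str, max_steps: int = 5) -> str:
    transcript = f"Goal: {goal}"
    for _ in range(max_steps):
        decision = call_model(transcript)                 # plan / reason
        if decision == "FINISH":
            return transcript                             # reflect and stop
        _, tool_name, tool_input = decision.split(":", 2)
        observation = TOOLS[tool_name](tool_input)        # act with a tool
        transcript += f"\nAction: {decision}\nObservation: {observation}"
    return transcript

if __name__ == "__main__":
    print(run_agent("What is 41 + 1?"))
```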
2. Models: No Context, No Value
Large language models (LLMs) will continue to become a commodity for vanilla generative AI tasks, a trend that has already started. LLMs are drawing on an increasingly tapped pool of public data scraped from the internet. This will only worsen, and companies must learn to adapt their models to unique, content-rich data sources.
We will also see a greater variety of foundation models that fulfill different purposes. Take, for example, physics-informed neural networks (PINNs), which generate outcomes based on predictions grounded in physical reality or robotics. PINNs are set to gain more importance in the job market because they will enable autonomous robots to navigate and execute tasks in the real world.
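As a rough illustration of what “grounded in physical reality” means for a PINN, here is a minimal sketch, assuming PyTorch is installed: the training loss is the residual of a physical law (a toy ODE du/dt = -u with u(0) = 1) evaluated at random collocation points, rather than labeled data.

```python
# Minimal PINN sketch (assumes PyTorch): fit u(t) so that it satisfies du/dt = -u, u(0) = 1.
# The physics residual, not labeled data, drives the training loss.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    t = torch.rand(64, 1, requires_grad=True)             # collocation points in [0, 1]
    u = net(t)
    du_dt = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u), create_graph=True)[0]
    physics_residual = du_dt + u                           # enforce du/dt = -u
    boundary = net(torch.zeros(1, 1)) - 1.0                # enforce u(0) = 1
    loss = (physics_residual ** 2).mean() + (boundary ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(net(torch.tensor([[1.0]])))                          # should approach exp(-1) ~ 0.368
```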
Models will increasingly become more multimodal, meaning an AI system can process information from various input types.
3. Adoption: From Buzz to Business
While 2024 was all about introducing AI use cases and their value for organizations and individuals alike, 2025 will see the industry’s unprecedented adoption of AI specifically for businesses. More people will understand when and how to use AI, and the technology will mature to the point where it can deal with critical business issues such as managing multi-national complexities. Many companies will also gain practical experience working through issues like AI-specific legal and data privacy terms for the first time (much as they did when they started moving to the cloud 10 years ago), building the foundation for applying the technology to business processes.
4. User Experience: AI Is Becoming the New UI
AI’s next frontier is seamlessly unifying people, data, and processes to amplify business outcomes. In 2025, we will see increased adoption of AI across the workforce as people discover the benefits of humans plus AI.
This means disrupting the classical user experience from system-led interactions to intent-based, people-led conversations with AI acting in the background. AI copilots will become the new UI for engaging with a system, making software more accessible and easier for people to use. AI won’t be limited to one app; it might even replace them one day. With AI, the boundaries between frontend, backend, browser, and apps are blurring. This is like giving your AI “arms, legs, and eyes.”
5. Regulation: Innovate, Then Regulate
It’s fair to say that governments worldwide are struggling to keep pace with the rapid advancements in AI technology and to develop meaningful regulatory frameworks that set appropriate guardrails for AI without compromising innovation.
12 AI predictions for 2025
This year we’ve seen AI move from pilots into production use cases. In 2025, they’ll expand into fully-scaled, enterprise-wide deployments.
https://www.cio.com/article/3630070/12-ai-predictions-for-2025.html
1. Small language models and edge computing
Most of the attention this year and last has been on the big language models — specifically on ChatGPT in its various permutations, as well as competitors like Anthropic’s Claude and Meta’s Llama models. But for many business use cases, LLMs are overkill: too expensive and too slow for practical use.
“Looking ahead to 2025, I expect small language models, specifically custom models, to become a more common solution for many businesses,”
2. AI will approach human reasoning ability
In mid-September, OpenAI released a new series of models that, it claims, think through problems much like a person would. The company says they can achieve PhD-level performance on challenging benchmark tests in physics, chemistry, and biology. For example, the previous best model, GPT-4o, could only solve 13% of the problems on the International Mathematics Olympiad, while the new reasoning model solved 83%.
If AI can reason better, then it will make it possible for AI agents to understand our intent, translate that into a series of steps, and do things on our behalf, says Gartner analyst Arun Chandrasekaran. “Reasoning also helps us use AI as more of a decision support system,”
3. Massive growth in proven use cases
This year, we’ve seen some use cases proven to have ROI, says Monteiro. In 2025, those use cases will see massive adoption, especially if the AI technology is integrated into the software platforms that companies are already using, making it very simple to adopt.
“The fields of customer service, marketing, and customer development are going to see massive adoption,”
4. The evolution of agile development
The agile manifesto was released in 2001 and, since then, the development philosophy has steadily gained ground over the previous waterfall style of software development.
“For the last 15 years or so, it’s been the de-facto standard for how modern software development works,”
5. Increased regulation
At the end of September, California governor Gavin Newsom signed a law requiring gen AI developers to disclose the data they used to train their systems, which applies to developers who make gen AI systems publicly available to Californians. Developers must comply by the start of 2026.
There are also regulations about the use of deep fakes, facial recognition, and more. The most comprehensive law, the EU’s AI Act, which went into effect last summer, is also something that companies will have to comply with starting in mid-2026, so, again, 2025 is the year when they will need to get ready.
6. AI will become accessible and ubiquitous
With gen AI, people are still at the stage of trying to figure out what gen AI is, how it works, and how to use it.
“There’s going to be a lot less of that,” he says. But gen AI will become ubiquitous and seamlessly woven into workflows, the way the internet is today.
7. Agents will begin replacing services
Software has evolved from big, monolithic systems running on mainframes, to desktop apps, to distributed, service-based architectures, web applications, and mobile apps. Now, it will evolve again, says Malhotra. “Agents are the next phase,” he says. Agents can be more loosely coupled than services, making these architectures more flexible, resilient and smart. And that will bring with it a completely new stack of tools and development processes.
8. The rise of agentic assistants
In addition to agents replacing software components, we’ll also see the rise of agentic assistants, adds Malhotra. Take, for example, the task of keeping up with regulations.
Today, consultants get continuing education to stay abreast of new laws, or reach out to colleagues who are already experts in them. It takes time for the new knowledge to disseminate and be fully absorbed by employees.
“But an AI agent can be instantly updated to ensure that all our work is compliant with the new laws,” says Malhotra. “This isn’t science fiction.”
9. Multi-agent systems
Sure, AI agents are interesting. But things are going to get really interesting when agents start talking to each other, says Babak Hodjat, CTO of AI at Cognizant. It won’t happen overnight, of course, and companies will need to be careful that these agentic systems don’t go off the rails.
Companies such as Sailes and Salesforce are already developing multi-agent workflows.
10. Multi-modal AI
Humans and the companies we build are multi-modal. We read and write text, we speak and listen, we see and we draw. And we do all these things through time, so we understand that some things come before other things. Today’s AI models are, for the most part, fragmentary. One can create images, another can only handle text, and some recent ones can understand or produce video.
11. Multi-model routing
Not to be confused with multi-modal AI, multi-model routing is when companies use more than one LLM to power their gen AI applications. Different AI models are better at different things, and some are cheaper than others, or have lower latency. And then there’s the matter of having all your eggs in one basket.
“A number of CIOs I’ve spoken with recently are thinking about the old ERP days of vendor lock,” says Brett Barton, global AI practice leader at Unisys. “And it’s top of mind for many as they look at their application portfolio, specifically as it relates to cloud and AI capabilities.”
Diversifying away from using just a single model for all use cases means a company is less dependent on any one provider and can be more flexible as circumstances change.
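A minimal sketch of what such multi-model routing can look like in practice: a classifier (here a toy heuristic) picks a cheap model for simple prompts and a more capable, more expensive model for complex ones. The model names, prices, and handler functions below are placeholders, not any vendor’s API.

```python
# Illustrative multi-model router: cheaper/faster model for simple prompts, larger model
# for complex ones. All names, prices, and handlers are made up for the sketch.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelRoute:
    name: str
    cost_per_1k_tokens: float
    handler: Callable[[str], str]

def cheap_small_model(prompt: str) -> str:
    return f"[small-model answer to: {prompt[:40]}]"

def expensive_reasoning_model(prompt: str) -> str:
    return f"[large-model answer to: {prompt[:40]}]"

ROUTES = {
    "simple": ModelRoute("small-llm", 0.1, cheap_small_model),
    "complex": ModelRoute("reasoning-llm", 2.0, expensive_reasoning_model),
}

def classify(prompt: str) -> str:
    # Toy heuristic; production routers use classifiers, cost/latency budgets, and fallbacks.
    return "complex" if len(prompt) > 200 or "step by step" in prompt.lower() else "simple"

def route(prompt: str) -> str:
    r = ROUTES[classify(prompt)]
    print(f"routing to {r.name} (~${r.cost_per_1k_tokens}/1k tokens)")
    return r.handler(prompt)

if __name__ == "__main__":
    print(route("Summarize this sentence."))
    print(route("Walk me through, step by step, how to migrate our ERP data model."))
```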
12. Mass customization of enterprise software
Today, only the largest companies, with the deepest pockets, get to have custom software developed specifically for them. It’s just not economically feasible to build large systems for small use cases.
“Right now, people are all using the same version of Teams or Slack or what have you,” says Ernst & Young’s Malhotra. “Microsoft can’t make a custom version just for me.” But once AI begins to accelerate the speed of software development while reducing costs, it starts to become much more feasible.
9 IT resolutions for 2025
https://www.cio.com/article/3629833/9-it-resolutions-for-2025.html
1. Innovate
“We’re embracing innovation,”
2. Double down on harnessing the power of AI
Not surprisingly, getting more out of AI is top of mind for many CIOs.
“I am excited about the potential of generative AI, particularly in the security space,”
3. And ensure effective and secure AI rollouts
“AI is everywhere, and while its benefits are extensive, implementing it effectively across a corporation presents challenges. Balancing the rollout with proper training, adoption, and careful measurement of costs and benefits is essential, particularly while securing company assets in tandem,”
4. Focus on responsible AI
The possibilities of AI grow by the day — but so do the risks.
“My resolution is to mature in our execution of responsible AI,”
“AI is the new gold and in order to truly maximize its potential, we must first have the proper guardrails in place. Taking a human-first approach to AI will help ensure our state can maintain ethics while taking advantage of the new AI innovations.”
5. Deliver value from generative AI
As organizations move from experimenting and testing generative AI use cases, they’re looking for gen AI to deliver real business value.
“As we go into 2025, we’ll continue to see the evolution of gen AI. But it’s no longer about just standing it up. It’s more about optimizing and maximizing the value we’re getting out of gen AI,”
6. Empower global talent
Although harnessing AI is a top objective for Morgan Stanley’s Wetmur, she says she’s equally committed to harnessing the power of people.
7. Create a holistic learning culture
Wetmur has another talent-related objective: to create a learning culture — not just in her own department but across all divisions.
8. Deliver better digital experiences
Deltek’s Cilsick has her sights set on improving her company’s digital employee experience, believing that a better DEX will yield benefits in multiple ways.
Cilsick says she first wants to bring in new technologies and automation to “make things as easy as possible,” mirroring the digital experiences most workers have when using consumer technologies.
“It’s really about leveraging tech to make sure [employees] are more efficient and productive,”
“In 2025 my primary focus as CIO will be on transforming operational efficiency, maximizing business productivity, and enhancing employee experiences,”
9. Position the company for long-term success
Lieberman wants to look beyond 2025, saying another resolution for the year is “to develop a longer-term view of our technology roadmap so that we can strategically decide where to invest our resources.”
“My resolutions for 2025 reflect the evolving needs of our organization, the opportunities presented by AI and emerging technologies, and the necessity to balance innovation with operational efficiency,”
Lieberman aims to develop AI capabilities to automate routine tasks.
“Bots will handle common inquiries ranging from sales account summaries to HR benefits, reducing response times and freeing up resources for strategic initiatives,”
Not just hype — here are real-world use cases for AI agents
https://venturebeat.com/ai/not-just-hype-here-are-real-world-use-cases-for-ai-agents/
Just seven or eight months ago, when a customer called in to or emailed Baca Systems with a service question, a human agent handling the query would begin searching for similar cases in the system and analyzing technical documents.
This process would take roughly five to seven minutes; then the agent could offer the “first meaningful response” and finally begin troubleshooting.
But now, with AI agents powered by Salesforce, that time has been shortened to as few as five to 10 seconds.
Now, instead of having to sift through databases for previous customer calls and similar cases, human reps can ask the AI agent to find the relevant information. The AI runs in the background and allows humans to respond right away, Russo noted.
AI can serve as a sales development representative (SDR) to send out general inquiries and emails, have a back-and-forth dialogue, then pass the prospect to a member of the sales team, Russo explained.
But once the company implements Salesforce’s Agentforce, a customer needing to modify an order will be able to communicate their needs with AI in natural language, and the AI agent will automatically make adjustments. When more complex issues come up — such as a reconfiguration of an order or an all-out venue change — the AI agent will quickly push the matter up to a human rep.
Open Source in 2025: Strap In, Disruption Straight Ahead
Look for new tensions to arise in the New Year over licensing, the open source AI definition, security and compliance, and how to pay volunteer maintainers.
https://thenewstack.io/open-source-in-2025-strap-in-disruption-straight-ahead/
The trend of widely used open source software moving to more restrictive licensing isn’t new.
In addition to the demands of late-stage capitalism and impatient investors in companies built on open source tools, other outside factors are pressuring the open source world. There’s the promise/threat of generative AI, for instance. Or the shifting geopolitical landscape, which brings new security concerns and governance regulations.
What’s ahead for open source in 2025?
More Consolidation, More Licensing Changes
The Open Source AI Debate: Just Getting Started
Security and Compliance Concerns Will Rise
Paying Maintainers: More Cash, Creativity Needed
The most important cybersecurity and AI trends for 2025
https://www.uusiteknologia.fi/2024/11/20/kyberturvallisuuden-ja-tekoalyn-tarkeimmat-trendit-2025/
1. Cyber infrastructure will be centered on a single, unified security platform
2. Big data will give an edge against new entrants
3. AI’s integrated role in 2025 means building trust, governance engagement, and a new kind of leadership
4. Businesses will adopt secure enterprise browsers more widely
5. AI’s energy implications will be more widely recognized in 2025
6. Quantum realities will become clearer in 2025
7. Security and marketing leaders will work more closely together
Presentation: For 2025, ‘AI eats the world’.
https://www.ben-evans.com/presentations
Just like other technologies that have gone before, such as cloud and cybersecurity automation, right now AI lacks maturity.
https://www.securityweek.com/ai-implementing-the-right-technology-for-the-right-use-case/
If 2023 and 2024 were the years of exploration, hype and excitement around AI, 2025 (and 2026) will be the year(s) that organizations start to focus on specific use cases for the most productive implementations of AI and, more importantly, to understand how to implement guardrails and governance so that it is viewed as less of a risk by security teams and more of a benefit to the organization.
Businesses are developing applications that add Large Language Model (LLM) capabilities to provide superior functionality and advanced personalization
Employees are using third party GenAI tools for research and productivity purposes
Developers are leveraging AI-powered code assistants to code faster and meet challenging production deadlines
Companies are building their own LLMs for internal use cases and commercial purposes.
AI is still maturing
However, just like other technologies that have gone before, such as cloud and cybersecurity automation, AI currently lacks maturity. We very much see AI in this “peak of inflated expectations” phase and predict that it will dip into the “trough of disillusionment”, where organizations realize that it is not the silver bullet they thought it would be. In fact, there are already signs of cynicism as decision-makers are bombarded with marketing messages from vendors and struggle to discern what is a genuine use case and what is not relevant for their organization.
There is also regulation that will come into force, such as the EU AI Act, which is a comprehensive legal framework that sets out rules for the development and use of AI.
AI certainly won’t solve every problem, and it should be used like automation, as part of a collaborative mix of people, process and technology. You simply can’t replace human intuition with AI, and many new AI regulations stipulate that human oversight is maintained.
7 Splunk Predictions for 2025
https://www.splunk.com/en_us/form/future-predictions.html
AI: Projects must prove their worth to anxious boards or risk defunding, and LLMs will go small to reduce operating costs and environmental impact.
OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI
Three of the leading artificial intelligence companies are seeing diminishing returns from their costly efforts to develop newer models.
https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai
Sources: OpenAI, Google, and Anthropic are all seeing diminishing returns from costly efforts to build new AI models; a new Gemini model misses internal targets
It Costs So Much to Run ChatGPT That OpenAI Is Losing Money on $200 ChatGPT Pro Subscriptions
https://futurism.com/the-byte/openai-chatgpt-pro-subscription-losing-money?fbclid=IwY2xjawH8epVleHRuA2FlbQIxMQABHeggEpKe8ZQfjtPRC0f2pOI7A3z9LFtFon8lVG2VAbj178dkxSQbX_2CJQ_aem_N_ll3ETcuQ4OTRrShHqNGg
In a post on X-formerly-Twitter, CEO Sam Altman admitted an “insane” fact: that the company is “currently losing money” on ChatGPT Pro subscriptions, which run $200 per month and give users access to its suite of products including its o1 “reasoning” model.
“People use it much more than we expected,” the cofounder wrote, later adding in response to another user that he “personally chose the price and thought we would make some money.”
Though Altman didn’t explicitly say why OpenAI is losing money on these premium subscriptions, the issue almost certainly comes down to the enormous expense of running AI infrastructure: the massive and increasing amounts of electricity needed to run the facilities that power AI, not to mention the cost of building and maintaining those data centers. Nowadays, a single query on the company’s most advanced models can cost a staggering $1,000.
AI demands ever-faster networks
https://etn.fi/index.php/opinion/16974-tekoaely-edellyttaeae-yhae-nopeampia-verkkoja
A resilient digital infrastructure is critical to effectively harnessing telecommunications networks for AI innovations and cloud-based services. The increasing demand for data-rich applications related to AI requires a telecommunications network that can handle large amounts of data with low latency, writes Carl Hansson, Partner Solutions Manager at Orange Business.
AI’s Slowdown Is Everyone Else’s Opportunity
Businesses will benefit from some much-needed breathing space to figure out how to deliver that all-important return on investment.
https://www.bloomberg.com/opinion/articles/2024-11-20/ai-slowdown-is-everyone-else-s-opportunity
Here is what will happen in the chip market next year
https://etn.fi/index.php/13-news/16984-naein-sirumarkkinoilla-kaey-ensi-vuonna
The demand for high-performance computing (HPC) for artificial intelligence continues to grow strongly, with the market set to expand by more than 15 percent in 2025, IDC estimates in its recent Worldwide Semiconductor Technology Supply Chain Intelligence report.
IDC predicts eight significant trends for the chip market in 2025.
1. AI growth accelerates
2. Asia-Pacific IC design heats up
3. TSMC’s leadership position strengthens
4. The expansion of advanced processes accelerates
5. The mature process market recovers
6. 2nm technology breakthrough
7. Restructuring of the packaging and testing market
8. Advanced packaging technologies on the rise
2024: The year when MCUs became AI-enabled
https://www-edn-com.translate.goog/2024-the-year-when-mcus-became-ai-enabled/?fbclid=IwZXh0bgNhZW0CMTEAAR1_fEakArfPtgGZfjd-NiPd_MLBiuHyp9qfiszczOENPGPg38wzl9KOLrQ_aem_rLmf2vF2kjDIFGWzRVZWKw&_x_tr_sl=en&_x_tr_tl=fi&_x_tr_hl=fi&_x_tr_pto=wapp
The AI party in the MCU space started in 2024, and in 2025, it is very likely that there will be more advancements in MCUs using lightweight AI models.
Adoption of AI acceleration features is a big step in the development of microcontrollers. The inclusion of AI features in microcontrollers started in 2024, and it is very likely that in 2025, their features and tools will develop further.
AI Regulation Gets Serious in 2025 – Is Your Organization Ready?
While the challenges are significant, organizations have an opportunity to build scalable AI governance frameworks that ensure compliance while enabling responsible AI innovation.
https://www.securityweek.com/ai-regulation-gets-serious-in-2025-is-your-organization-ready/
Similar to the GDPR, the EU AI Act will take a phased approach to implementation. The first milestone arrives on February 2, 2025, when organizations operating in the EU must ensure that employees involved in AI use, deployment, or oversight possess adequate AI literacy. Thereafter from August 1 any new AI models based on GPAI standards must be fully compliant with the act. Also similar to GDPR is the threat of huge fines for non-compliance – EUR 35 million or 7 percent of worldwide annual turnover, whichever is higher.
While this requirement may appear manageable on the surface, many organizations are still in the early stages of defining and formalizing their AI usage policies.
Later phases of the EU AI Act, expected in late 2025 and into 2026, will introduce stricter requirements around prohibited and high-risk AI applications. For organizations, this will surface a significant governance challenge: maintaining visibility and control over AI assets.
Tracking the usage of standalone generative AI tools, such as ChatGPT or Claude, is relatively straightforward. However, the challenge intensifies when dealing with SaaS platforms that integrate AI functionalities on the backend. Analysts, including Gartner, refer to this as “embedded AI,” and its proliferation makes maintaining accurate AI asset inventories increasingly complex.
Where frameworks like the EU AI Act grow more complex is their focus on ‘high-risk’ use cases. Compliance will require organizations to move beyond merely identifying AI tools in use; they must also assess how these tools are used, what data is being shared, and what tasks the AI is performing. For instance, an employee using a generative AI tool to summarize sensitive internal documents introduces very different risks than someone using the same tool to draft marketing content.
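One way to picture the inventory work this implies is a record per AI asset that captures not just the tool but how it is used, what data it touches, and an internal risk tier. The sketch below is illustrative only; the field names and tiers are assumptions, not the AI Act’s own terminology.

```python
# Illustrative sketch of an internal AI asset inventory record that such a governance
# program might maintain; field names and risk tiers are assumptions for the example.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AIAssetRecord:
    tool: str                        # e.g. a standalone chatbot or an embedded SaaS feature
    embedded_in: Optional[str]       # parent SaaS platform, if this is "embedded AI"
    use_cases: List[str] = field(default_factory=list)
    data_shared: List[str] = field(default_factory=list)   # data categories, not actual data
    risk_tier: str = "minimal"       # e.g. minimal / limited / high, per internal assessment

inventory = [
    AIAssetRecord("generative chat assistant", None,
                  ["summarizing internal documents"], ["internal reports"], "high"),
    AIAssetRecord("marketing copy helper", "CRM suite",
                  ["drafting campaign text"], ["public product info"], "limited"),
]
for rec in inventory:
    print(rec.tool, "->", rec.risk_tier)
```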
For security and compliance leaders, the EU AI Act represents just one piece of a broader AI governance puzzle that will dominate 2025.
The next 12-18 months will require sustained focus and collaboration across security, compliance, and technology teams to stay ahead of these developments.
The Global Partnership on Artificial Intelligence (GPAI) is a multi-stakeholder initiative which aims to bridge the gap between theory and practice on AI by supporting cutting-edge research and applied activities on AI-related priorities.
https://gpai.ai/about/#:~:text=The%20Global%20Partnership%20on%20Artificial,activities%20on%20AI%2Drelated%20priorities.
297 Comments
Tomi Engdahl says:
https://hackernoon.com/a-basic-ai-prompt-helped-me-learn-rust
Tomi Engdahl says:
OpenAI Targets AGI with System That Thinks Like a Pro Engineer
https://www.theinformation.com/articles/openai-targets-agi-with-system-that-thinks-like-a-pro-engineer
Tomi Engdahl says:
Researchers found the reason: this is why neural networks learn so efficiently
https://etn.fi/index.php/13-news/17069-tutkijat-loeysivaet-syyn-taemaen-takia-neuroverkot-oppivat-niin-tehokkaasti
A recent University of Oxford study reveals why deep neural networks (DNNs), which form the foundation of modern AI, are so effective at learning from data. The study found that neural networks have a built-in “Occam’s razor” principle.
This means that neural networks favor the simplest solutions when presented with several alternatives that fit the training data. What is special about this principle is that it precisely compensates for the exponential growth in the number of complex solutions. The study was published in Nature Communications.
Neural networks can make accurate predictions on new, previously unseen data, even though they have millions or even billions more parameters than there are data points in the training set. The researchers assumed this requires some kind of built-in guidance that helps the networks choose the right patterns to focus on.
“We already knew that the effectiveness of neural networks rests on an inductive bias toward simplicity, a kind of Occam’s razor. But its exact nature had not yet been understood,” said theoretical physicist Professor Ard Louis of the Department of Physics at the University of Oxford, who led the study.
The built-in emphasis on simplicity helps neural networks find rules that generalize well, that is, produce accurate predictions both on the training data and on unseen data.
In addition, the researchers found that this Occam’s razor principle compensates in a unique way for the exponential growth in the number of complex functions as the system grows. In this way, neural networks avoid complex functions that fit the training data well but fail on new data.
Neural networks are not suited to everything
Neural networks work well when the data follows simple patterns. With more complex and less structured datasets, however, their performance deteriorates, and they can sometimes be no better than random guessing. Fortunately, real-world data is often fairly simple and structured, which suits the simplicity-biased learning principle of neural networks. This also helps them avoid overfitting, that is, adapting too closely to the training data.
Tomi Engdahl says:
Anthony Ha / TechCrunch:
DeepSeek’s iOS app is now #1 on the “Top Free Apps” chart in Apple’s App Store in the US, just ahead of ChatGPT — Since Chinese AI company DeepSeek released an open version of its reasoning model R1 at the beginning of this week, many in the tech industry have been making grand pronouncements …
DeepSeek gets Silicon Valley talking
Since Chinese AI company DeepSeek released an open version of its reasoning model R1 at the beginning of this week, many in the tech industry have been making grand pronouncements about what the company achieved, and what it means for the state of AI.
Venture capitalist Marc Andreessen, for example, posted that DeepSeek is “one of the most amazing and impressive breakthroughs I’ve ever seen.”
R1 seemingly matches or beats OpenAI’s o1 model on certain AI benchmarks. And the company claims one of its models only cost $5.6 million to train, compared to the hundreds of millions of dollars that leading American companies pay to train theirs.
It also seems to have achieved that in the face of U.S. sanctions that prohibit the sale of advanced chips to Chinese companies. The MIT Technology Review writes that the company’s success illustrates how sanctions are “driving startups like DeepSeek to innovate in ways that prioritize efficiency, resource-pooling, and collaboration.” (On the other hand, the Wall Street Journal reports that DeepSeek’s Liang Wenfeng recently told China’s premier that American export restrictions still pose a bottleneck.)
Curai CEO Neal Khosla offered a simpler explanation, claiming that the company is a “ccp state psyop” that’s “faking the cost was low to justify setting price low and hoping everyone switches to it [to] damage AI competitiveness in the us.” (A Community Note has been attached to his post pointing out that Khosla offers no evidence for this, and that his father Vinod is an OpenAI investor.)
https://techcrunch.com/2025/01/26/deepseek-gets-silicon-valley-talking/
Tomi Engdahl says:
Jeffrey Emanuel / YouTubeTranscriptOptimizer:
A bear case for Nvidia: competition from hardware startups, inference-heavy “reasoning” models, DeepSeek’s training and inference efficiency breakthroughs, more — As someone who spent ~10 years working as a generalist investment analyst at various long/short hedge funds …
https://youtubetranscriptoptimizer.com/blog/05_the_short_case_for_nvda
Tomi Engdahl says:
Steven Sinofsky / @stevesi:
DeepSeek’s use of commodity, disconnected hardware and open-source design is enough of a shot at AI hyper scaling that it could be “the way things will go” — DeepSeek was certain to happen. The only unknown was who was going to do it. The choices were a startup or someone outside …
DeepSeek Has Been Inevitable and Here’s Why (History tells us)
https://x.com/stevesi/status/1883746880536072375?mx=2
TL;DR for this article: DeepSeek was certain to happen. The only unknown was who was going to do it. The choices were a startup or someone outside the current center of leadership and innovation in AI, which is mostly in the US clustered around trillion dollar companies. It turned out to be a group in China, which for many (me too) is unfortunate. But again, it absolutely was going to happen. The next question is will the US makers see this with clarity.
There’s more in The Short Case for Nvidia Stock
which is very good but focuses on picking stocks, which isn’t my thing. Strategy and execution are more me so here’s that perspective.
The current trajectory of AI if you read the news in the US is one of MASSIVE CapEx piled on top of even more MASSIVE CapEx. It is a race between Google, Meta, OpenAI/Microsoft, xAI, and to a lesser extent a few other super well-funded startups like Perplexity and Anthropic. All of these together are taking the same approach which I will call “scale up”. Scale up is what you do when you have access to vast resources as all of these companies do.
The history of computing is one of innovation followed by scale up, which is then broken by a model that “scales out”—when a bigger and faster approach is replaced by a smaller and more numerous approach. Mainframe->Mini->Micro->Mobile, Big iron->Distributed computing->Internet, Cray->HPC->Intel/CISC->ARM/RISC, OS/360->VMS->Unix->Windows NT->Linux, and on and on. You can see this at these macro levels or you can see it at the micro level when it comes to subsystems from networking to storage to memory.
The past 5 years of AI have been bigger models, more data, more compute, and so on. Why? Because, I would argue the innovation was driven by the cloud hyperscale companies and they were destined to take the approach of doing more of what they already did. They viewed data for training and huge models as their way of winning and their unique architectural approach. The fact that other startups took a similar approach is just Silicon Valley at work—the people move and optimize for different things at a micro scale without considering the larger picture. They look to do what they couldn’t do at their previous efforts or what the previous efforts might have been overlooking.
Tomi Engdahl says:
Caiwei Chen / MIT Technology Review:
Rather than weakening China’s AI capabilities, US sanctions appear to be driving startups like DeepSeek to innovate by prioritizing efficiency and collaboration — The AI community is abuzz over DeepSeek R1, a new open-source reasoning model. — The model was developed by the Chinese AI startup DeepSeek …
https://www.technologyreview.com/2025/01/24/1110526/china-deepseek-top-ai-despite-sanctions/
Tomi Engdahl says:
Zeyi Yang / Wired:
DeepSeek, which started as a deep-learning research branch of Chinese quant hedge fund High-Flyer, is now giving US AI giants a run for their money
How Chinese AI Startup DeepSeek Made a Model that Rivals OpenAI
When Chinese quant hedge fund founder Liang Wenfeng went into AI research, he took 10,000 Nvidia chips and assembled a team of young, ambitious talent. Two years later, DeepSeek exploded on the scene.
https://www.wired.com/story/deepseek-china-model-ai/
Tomi Engdahl says:
Bloomberg:
Some Japanese stocks drop amid DeepSeek anxiety; Advantest, an Nvidia supplier of testing equipment, drops 7%+; datacenter cable company Fujikura drops 9%+
Japan’s Chip Shares Sink as DeepSeek Triggers Competition Fear
https://www.bloomberg.com/news/articles/2025-01-27/japan-s-chip-shares-sink-as-deepseek-triggers-competition-fear
Tomi Engdahl says:
Hall of Impossible Dreams:
A look at the growing prevalence of LLM-written posts that have been backdated and attributed to human authors on Physics Forums, which was founded in 2001 — What Internet Will Look Like in the Future — cripes does anybody remember Google People — Does anybody remember PhysicsForums?
What Internet Will Look Like in the Future
https://hallofdreams.org/posts/physicsforums/
Does anybody remember PhysicsForums?
It was never exactly the center of the Internet, but back when it was founded in 2001, the Internet didn’t really have a center the way it does today. PhysicsForums was one forum among thousands, founded by an enthusiastic teenager named Greg Bernhardt, existing in the ‘hard science’ niche alongside the likes of Bad Astronomy, mostly focused on giving hints for physics homework to struggling students without outright doing the physics homework. It had fairly steady growth until 2012, before petering out throughout the 2010s and 2020s in lieu of more centralized sites like StackExchange, and by 2025, only a small community was left. But, unlike so many other fora from back in the early days, it went from 2003 to 2025 without ever changing its URLs, erasing its old posts, or going down altogether. Thanks to this consistency, PhysicsForums remains quite valuable as a time capsule, and can give us a glimpse at how people thought and what they said two decades ago.
There’s also a social contract: when we create an account in an online community, we do it with the expectation that people we are going to interact with are primarily people. Oh, there will be shills, and bots, and advertisers, but the agreement between the users and the community provider is that they are going to try to defend us from that, and that in exchange we will provide our engagement and content. This is why the recent experiments from Meta with AI generated users are both ridiculous and sickening. When you might be interacting with something masquerading as a human, providing at best, tepid garbage, the value of human interaction via the internet is lost.
Beyond that, the idea of populating existing accounts with LLM-generated content is destructive. Like paving over an arboretum to make room for a generic strip mall. Internet archaeology is already a difficult and fraught business. It’s so difficult to find lost content, servers that have gone down, websites that are just gone… and now, apparently a lot of backdated data that is AI generated. This is not to say websites shouldn’t evolve and stay current, but this is different, this is a re-writing of history, and rewrites history for no clear gain.
It probably feels odd to see us write thousands of words fighting for the integrity of a community neither of us is part of, a tiny speck on the Internet trying desperately to survive, an enclave of a different era that is trying to hold on at all costs. But we are sympathetic. Running a website, especially a forum, is expensive. Server costs go up. Databases stop working and now you need to pay an expert or spend hours of unpaid time working on it. Bots flood in. DDOS attacks happen. Another wave of crypto-scams shows up. Staying alive on the internet costs money, and money comes through users and ads. You need those clicks like a man in the desert needs water, and every week it gets more competitive.
One must transform to survive. That axiom is a truth on the internet. If you don’t, you rapidly find yourself buried on the eighth page of Google results, with no users and no money to keep the servers up. But when communities compromise their morals and the core of their identity to stay afloat, and destroy the very bedrock of their commitment to their users and to some degree to the broader idea of the Internet, we have to wonder… was it worth it?
Tomi Engdahl says:
John Gruber / Daring Fireball:
Siri with Apple Intelligence is a massive regression from the old Siri, which recognized its limitations and provided a list of search links to answer a query
Siri Is Super Dumb and Getting Dumber
https://daringfireball.net/2025/01/siri_is_super_dumb_and_getting_dumber
Writing about the current state of Apple Intelligence yesterday, I mentioned how utterly stupid and laughably wrong Siri is when asked the simple question, “Who won Super Bowl 13?”, and mentioned that that particular example came from a friend. That friend was Paul Kafasis, and he took it and pursued it thoroughly, asking Siri “Who won Super Bowl __?” for every number from 1 through 60.
Other answer engines handle the same questions with aplomb. I haven’t run a comprehensive test from Super Bowls 1 through 60 because I’m lazy, but a spot-check of a few random numbers in that range indicates that every other ask-a-question-get-an-answer agent I personally use gets them all correct. I tried ChatGPT, Kagi, DuckDuckGo, and Google. Those four all even fare well on the arguably trick questions regarding the winners of Super Bowls 59 and 60, which haven’t yet been played. E.g., asked the winner of Super Bowl 59, Kagi’s “Quick Answer” starts: “Super Bowl 59 is scheduled to take place on February 9, 2025. As of now, the game has not yet occurred, so there is no winner to report.”
Old Siri — which is to say pre-Apple-Intelligence Siri — does OK on this same question. On my Mac running MacOS 15.1.1, where ChatGPT integration is not yet available, Siri declined to answer the question itself and provided a list of links, search-engine-style, and the top link was to this two-page PDF listing the complete history of North Dakota’s Class A boys’ and girls’ champions, but only through 2019. Not great, but good enough.
New Siri — powered by Apple Intelligence™ with ChatGPT integration enabled — gets the answer completely but plausibly wrong, which is the worst way to get it wrong. It’s also inconsistently wrong — I tried the same question four times, and got a different answer, all of them wrong, each time. It’s a complete failure.
https://daringfireball.net/linked/2025/01/22/ios-18-3-macos-153-apple-intelligence-default-onboarding
Tomi Engdahl says:
Kenrick Cai / Reuters:
A look at Google’s plans to shape public perception and policies on AI, including building out educational programs, ahead of a global wave of AI regulation
Google pushes global agenda to educate workers, lawmakers on AI
https://www.reuters.com/technology/artificial-intelligence/google-pushes-global-agenda-educate-workers-lawmakers-ai-2025-01-25/
Tomi Engdahl says:
Bloomberg:
DeepSeek’s iOS app is now #1 on the App Store’s Top Free Apps chart in the US, ahead of ChatGPT, stirring doubts in Silicon Valley about the US’ AI lead — App’s lower-cost model upends premise for AI spending boom — Stocks of chip gear makers ASML and Advantest plunge
https://www.bloomberg.com/news/articles/2025-01-27/china-s-deepseek-tops-iphone-downloads-and-drives-asia-stocks
Tomi Engdahl says:
Matt Marshall / VentureBeat:
How DeepSeek outpaced OpenAI at 3% of the cost: open-source approach, pure reinforcement learning, not supervised fine-tuning, and building on DeepSeek-R1-Zero — DeepSeek R1’s Monday release has sent shockwaves through the AI community, disrupting assumptions about what’s required to achieve cutting-edge AI performance.
DeepSeek R1’s bold bet on reinforcement learning: How it outpaced OpenAI at 3% of the cost
https://venturebeat.com/ai/deepseek-r1s-bold-bet-on-reinforcement-learning-how-it-outpaced-openai-at-3-of-the-cost/
DeepSeek R1’s Monday release has sent shockwaves through the AI community, disrupting assumptions about what’s required to achieve cutting-edge AI performance. Matching OpenAI’s o1 at just 3%-5% of the cost, this open-source model has not only captivated developers but also challenges enterprises to rethink their AI strategies.
The model has rocketed to the top-trending model being downloaded on HuggingFace (109,000 times, as of this writing) – as developers rush to try it out and seek to understand what it means for their AI development. Users are commenting that DeepSeek’s accompanying search feature (which you can find at DeepSeek’s site) is now superior to competitors like OpenAI and Perplexity, and is only rivaled by Google’s Gemini Deep Research.
The implications for enterprise AI strategies are profound: With reduced costs and open access, enterprises now have an alternative to costly proprietary models like OpenAI’s. DeepSeek’s release could democratize access to cutting-edge AI capabilities, enabling smaller organizations to compete effectively in the AI arms race.
DeepSeek’s breakthrough: Moving to pure reinforcement learning
In November, DeepSeek made headlines with its announcement that it had achieved performance surpassing OpenAI’s o1, but at the time it only offered a limited R1-lite-preview model. With Monday’s full release of R1 and the accompanying technical paper, the company revealed a surprising innovation: a deliberate departure from the conventional supervised fine-tuning (SFT) process widely used in training large language models (LLMs).
SFT, a standard step in AI development, involves training models on curated datasets to teach step-by-step reasoning, often referred to as chain-of-thought (CoT). It is considered essential for improving reasoning capabilities. However, DeepSeek challenged this assumption by skipping SFT entirely, opting instead to rely on reinforcement learning (RL) to train the model.
This bold move forced DeepSeek-R1 to develop independent reasoning abilities, avoiding the brittleness often introduced by prescriptive datasets. While some flaws emerged – leading the team to reintroduce a limited amount of SFT during the final stages of building the model – the results confirmed the fundamental breakthrough: reinforcement learning alone could drive substantial performance gains.
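The R1 paper describes largely rule-based rewards for this RL stage, rewarding correct final answers and a consistent output format. A minimal, illustrative sketch of that kind of reward function is below; it is not DeepSeek’s code, and the tag names and weights are assumptions.

```python
# Illustrative sketch of the kind of rule-based reward an R1-style RL setup might use:
# score a completion for (a) a correct final answer and (b) keeping reasoning inside tags.
# Not DeepSeek's code; the tag names and weights are assumptions for the example.
import re

def reward(completion: str, reference_answer: str) -> float:
    format_ok = bool(re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.S))
    match = re.search(r"<answer>(.*?)</answer>", completion, re.S)
    answer = match.group(1).strip() if match else ""
    accuracy = 1.0 if answer == reference_answer.strip() else 0.0
    return accuracy + (0.1 if format_ok else 0.0)          # accuracy dominates, format is a bonus

print(reward("<think>2+2 is 4</think> <answer>4</answer>", "4"))   # 1.1
print(reward("The answer is 4", "4"))                               # 0.0
```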
The company got much of the way there using open source – a conventional and unsurprising approach
To train its models, High-Flyer Quant secured over 10,000 Nvidia GPUs before U.S. export restrictions, and reportedly expanded to 50,000 GPUs through alternative supply routes, despite trade barriers. This pales compared to leading AI labs like OpenAI, Google, and Anthropic, which operate with more than 500,000 GPUs each.
DeepSeek’s ability to achieve competitive results with limited resources highlights how ingenuity and resourcefulness can challenge the high-cost paradigm of training state-of-the-art LLMs.
Despite speculation, DeepSeek’s full budget is unknown
DeepSeek reportedly trained its base model — called V3 — on a $5.58 million budget over two months, according to Nvidia engineer Jim Fan. While the company hasn’t divulged the exact training data it used (side note: critics say this means DeepSeek isn’t truly open-source), modern techniques make training on web and open datasets increasingly accessible. Estimating the total cost of training DeepSeek-R1 is challenging. While running 50,000 GPUs suggests significant expenditures (potentially hundreds of millions of dollars), precise figures remain speculative.
What’s clear, though, is that DeepSeek has been very innovative from the get-go. Last year, reports emerged about some initial innovations it was making, around things like Mixture of Experts and Multi-Head Latent Attention.
[Update: Here is a very detailed report just published about DeepSeek’s various infrastructure innovations by Jeffrey Emanuel, a former quant investor and now entrepreneur. It’s long but very good. See the “Theoretical Threat” section about three other innovations worth mentioning: (1) mixed-precision training, which allowed DeepSeek to use 8-bit floating numbers throughout the training, instead of 32-bit — allowing DeepSeek to dramatically reduce memory requirements per GPU, translating into needing fewer GPUs; (2) multi-token predicting during inference; and (3) advances in GPU communication efficiency through their DualPipe algorithm, resulting in higher GPU utilization.]
The journey to DeepSeek-R1’s final iteration began with an intermediate model, DeepSeek-R1-Zero, which was trained using pure reinforcement learning. By relying solely on RL, DeepSeek incentivized this model to think independently, rewarding both correct answers and the logical processes used to arrive at them.
This approach led to an unexpected phenomenon: The model began allocating additional processing time to more complex problems, demonstrating an ability to prioritize tasks based on their difficulty. DeepSeek’s researchers described this as an “aha moment,” where the model itself identified and articulated novel solutions to challenging problems (see screenshot below). This milestone underscored the power of reinforcement learning to unlock advanced reasoning capabilities without relying on traditional training methods like SFT.
The researchers conclude: “It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies.”
More than RL
However, it’s true that the model needed more than just RL. The paper goes on to talk about how despite the RL creating unexpected and powerful reasoning behaviors, this intermediate model DeepSeek-R1-Zero did face some challenges, including poor readability, and language mixing (starting in Chinese and switching over to English, for example). So only then did the team decide to create a new model, which would become the final DeepSeek-R1 model. This model, again based on the V3 base model, was first injected with limited SFT – focused on a “small amount of long CoT data” or what was called cold-start data, to fix some of the challenges. After that, it was put through the same reinforcement learning process of R1-Zero. The paper then talks about how R1 went through some final rounds of fine-tuning.
The ramifications
One question is why there has been so much surprise at the release. It’s not like open source models are new. Open source models have a huge logic and momentum behind them. Their free cost and malleability are why we reported recently that these models are going to win in the enterprise.
Meta’s open-weights model Llama 3, for example, exploded in popularity last year, as it was fine-tuned by developers wanting their own custom models. Similarly, now DeepSeek-R1 is already being used to distill its reasoning into an array of other, much smaller models – the difference being that DeepSeek offers industry-leading performance. This includes running tiny versions of the model on mobile phones, for example.
DeepSeek-R1 not only performs better than the leading open source alternative, Llama 3; it also shows the entire chain of thought behind its answers transparently. Meta’s Llama hasn’t been instructed to do this by default; it takes aggressive prompting to get Llama to do this.
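For readers who want to see that visible chain of thought locally, a minimal sketch, assuming the Hugging Face transformers library and the publicly listed DeepSeek-R1-Distill-Qwen-1.5B checkpoint, looks like this; the distilled model emits its reasoning between <think> tags before the final answer.

```python
# Minimal sketch (assumes the transformers library and enough RAM/GPU): load one of the
# small distilled R1 checkpoints and print its reply, including the <think>...</think>
# reasoning it emits by default. The model ID is as listed on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "How many prime numbers are there between 10 and 30?"
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}], add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```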
The transparency has also provided a PR black eye to OpenAI, which has so far hidden its chains of thought from users, citing competitive reasons and a desire not to confuse users when a model gets something wrong. Transparency allows developers to pinpoint and address errors in a model’s reasoning, streamlining customizations to meet enterprise requirements more effectively.
For enterprise decision-makers, DeepSeek’s success underscores a broader shift in the AI landscape: leaner, more efficient development practices are increasingly viable.
To be sure, no massive lead
While DeepSeek’s innovation is groundbreaking, by no means has it established a commanding market lead. Because it published its research, other model companies will learn from it, and adapt. Meta and Mistral, the French open source model company, may be a beat behind, but it will probably only be a few months before they catch up. As Meta’s lead researcher Yann Lecun put it: “The idea is that everyone profits from everyone else’s ideas. No one ‘outpaces’ anyone and no country ‘loses’ to another. No one has a monopoly on good ideas. Everyone’s learning from everyone else.” So it’s execution that matters.
Ultimately, it’s the consumers, startups and other users who will win the most, because DeepSeek’s offerings will continue to drive the price of using these models near zero (again aside from cost of running models at inference). This rapid commoditization could pose challenges – indeed, massive pain – for leading AI providers that have invested heavily in proprietary infrastructure. As many commentators have put it, including Chamath Palihapitiya, an investor and former executive at Meta, this could mean that years of OpEx and CapEx by OpenAI and others will be wasted.
There is substantial commentary about whether it is ethical to use the DeepSeek-R1 model because of the biases instilled in it by Chinese laws, for example that it shouldn’t answer questions about the Chinese government’s brutal crackdown at Tiananmen Square.
Moreover, they point to different, but analogous biases that are held by models from OpenAI and other companies. Meta’s Llama has emerged as a popular open model despite its data sets not being made public, and despite hidden biases, and lawsuits being filed against it as a result.
Questions abound around the ROI of big investments by OpenAI
This all raises big questions about the investment plans pursued by OpenAI, Microsoft and others. OpenAI’s $500 billion Stargate project reflects its commitment to building massive data centers to power its advanced models. Backed by partners like Oracle and Softbank, this strategy is premised on the belief that achieving artificial general intelligence (AGI) requires unprecedented compute resources. However, DeepSeek’s demonstration of a high-performing model at a fraction of the cost challenges the sustainability of this approach, raising doubts about OpenAI’s ability to deliver returns on such a monumental investment.
Entrepreneur and commentator Arnaud Bertrand captured this dynamic, contrasting China’s frugal, decentralized innovation with the U.S. reliance on centralized, resource-intensive infrastructure: “It’s about the world realizing that China has caught up — and in some areas overtaken — the U.S. in tech and innovation, despite efforts to prevent just that.” Indeed, yesterday another Chinese company, ByteDance, announced Doubao-1.5-pro, which includes a “Deep Thinking” mode that surpasses OpenAI’s o1 on the AIME benchmark.
Want to dive deeper into how DeepSeek-R1 is reshaping AI development? Check out our in-depth discussion on YouTube, where I explore this breakthrough with ML developer Sam Witteveen. Together, we break down the technical details, implications for enterprises, and what this means for the future of AI:
Deepseek R1: How China’s open source AI model beats OpenAI at 3% of the cost
https://www.youtube.com/watch?v=bJzj5lTiqe0
Tomi Engdahl says:
To get started with DeepSeek R1, you’ll need a computer with the right hardware specifications. The hardware requirements depend on the size of the model you want to run: a 7B model requires at least 8 GB of RAM, and a 13B model needs 16 GB of RAM.
DeepSeek R1 AI Without Cloud Costs: How To Install And Run On Your PC
DeepSeek R1 is an open-source AI model offering advanced capabilities at low cost. Here’s a guide covering installation, hardware needs, and customisation options.
https://www.timesnownews.com/technology-science/deepseek-r1-ai-without-cloud-costs-how-to-install-and-run-on-your-pc-article-117602023
Hardware requirements?
https://huggingface.co/deepseek-ai/DeepSeek-R1/discussions/19
I am running the Q8_0 GGUF with llama.cpp on my 256 GB workstation. It does not actually require this much RAM since it is an MoE model, if you keep the context window modest. The KV cache consumes more RAM than the model itself. For example with context length of 32092 tokens it takes around 220 GB RAM.
So, are you able to run the original R1 671B model with the Q8_0 GGUF in 256 GB of VRAM? If yes, which GPU config are you using?
Running it at about 5-8 t/s on a dual EPYC CPU with 24 x 16GB of DDR5 RAM (384GB). Running the IQ4_XS version with llama.cpp. No GPU.
Total system cost was a bit over $4000, with the CPUs bought on eBay as engineering samples.
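As a back-of-the-envelope illustration of why the KV cache dominates at long context, the naive formula for standard multi-head attention is below. The parameter values are made up, and DeepSeek-R1’s Multi-Head Latent Attention compresses the cache, so this only shows the shape of the calculation, not the quoted 220 GB figure.

```python
# Back-of-the-envelope sketch: for a standard multi-head-attention transformer, the KV cache
# grows linearly with context length. Parameter values below are hypothetical; DeepSeek-R1
# uses Multi-Head Latent Attention, so its real footprint differs from this naive estimate.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    # The leading 2 accounts for storing both the K and the V tensors per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value

gib = kv_cache_bytes(n_layers=60, n_kv_heads=64, head_dim=128, context_len=32768) / 2**30
print(f"naive KV cache estimate: {gib:.0f} GiB")   # ~60 GiB for these made-up settings
```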
Tomi Engdahl says:
https://venturebeat.com/ai/deepseek-r1s-bold-bet-on-reinforcement-learning-how-it-outpaced-openai-at-3-of-the-cost/
Tomi Engdahl says:
Financial Times:
US tech stocks fell sharply pre-market over DeepSeek concerns, with NVDA down 8%+ and META and MSFT down 3%; ASML fell 9%+; Japanese chip companies also dropped — Start-up’s model raises questions about need for huge western hardware investment — Tech stocks fell sharply on Monday …
Tech stocks tumble as China’s DeepSeek sows doubts about AI spending
Start-up’s model raises questions about need for huge western hardware investment
https://www.ft.com/content/e670a4ea-05ad-4419-b72a-7727e8a6d471?accessToken=zwAGLK3ztQyIkdPmcKTqBa1EGdO3Kncn6KbUcQ.MEUCIDagCvJsEsR78pR_a_Pw82wONrDwyqTv0dMavNACVszcAiEA8mTL7nCfkLmWxUmQgzK75KwVqmsDRq87PdslKhsmKDk&sharetype=gift&token=ec2fdcf8-5c05-443a-a961-22e60072b64e
Tomi Engdahl says:
The Information:
Sources: Meta set up four war rooms to analyze DeepSeek’s tech, two focusing on how High-Flyer cut training costs and one on what data High-Flyer may have used — Artificial intelligence researchers at Meta Platforms have been in panic mode. In recent days, leaders of some of the company’s …
Meta Scrambles After Chinese AI Equals Its Own, Upending Silicon Valley
https://www.theinformation.com/articles/meta-scrambles-after-chinese-ai-equals-its-own-upending-silicon-valley
Tomi Engdahl says:
Stephen Morris / Financial Times:
At Davos, AI leaders clashed over safety concerns and the $100B Stargate project; Demis Hassabis and Anthropic co-founder Dario Amodei reiterated stark warnings
https://www.ft.com/content/174c2759-c5b8-42ed-adc2-8d5f659f5982
Tomi Engdahl says:
Nvidia plunges and a sharp opening decline is forecast for the U.S. – DeepSeek raised concerns about the competitiveness of the semiconductor giants
Semiconductor companies’ share prices are being tested in Europe. This will very likely be reflected in the United States as well, where futures point to steep declines in the key stock indices.
https://www.kauppalehti.fi/uutiset/kl/d2d923a6-10fb-4063-8133-1b5f6a2079c6?fbclid=IwZXh0bgNhZW0CMTEAAR3vplzUqP4qlBLvgq25Zm7ofDwbgBwwmFDLBsMmYxoQE29bnZTuvRiMUXU_aem_zKWTmT6KTswXKckb2rE3kQ
Tomi Engdahl says:
Investors have started to get nervous about the AI application unveiled last week by the Chinese startup DeepSeek, which appears to be competitive with OpenAI’s and Meta’s applications.
“(The application) immediately raised the concern that it could upend the business model of the AI industry,” an analyst at the bank Jefferies wrote in a client note, according to the news agency Bloomberg.
https://www.kauppalehti.fi/uutiset/kl/d2d923a6-10fb-4063-8133-1b5f6a2079c6?fbclid=IwZXh0bgNhZW0CMTEAAR3vplzUqP4qlBLvgq25Zm7ofDwbgBwwmFDLBsMmYxoQE29bnZTuvRiMUXU_aem_zKWTmT6KTswXKckb2rE3kQ
Tomi Engdahl says:
Over the weekend, DeepSeek rose to become the most popular free phone app in the United States.
“According to the New York Times, developing DeepSeek required about six million dollars’ worth of computing. Training OpenAI’s most advanced models is estimated to cost tens if not hundreds of millions.” (HS)
If that estimate is even close to accurate, China is causing significant disruption to the tech bubble. DeepSeek has moreover been released as open source, so anyone can copy it and build their own version of it. Arguably China is not even trying to make money on this release, but purely to throw a wrench into the works of the heavily AI-driven U.S. economy. Clever move.
The HS article is unfortunately behind a paywall.
https://www.hs.fi/visio/art-2000010991243.html
Tomi Engdahl says:
How to recognize a bot
https://etn.fi/index.php/13-news/17075-naein-tunnistat-botin
Nearly half of internet traffic is machine-generated, and malicious bots make up almost a third of all traffic. Social media bots are especially common, and as many as 65 percent of these bots are malicious, according to a recent study.
Experts from the AI development tool AIPRM have shared tips for recognizing a social media bot. AIPRM founder Christoph C. Cemper also warns about scams that bots may attempt.
Bots can be recognized by a number of telltale signs. The following characteristics help you judge whether an account is automated rather than human:
Bots often use generic or random usernames, low-quality images or stock photos taken from the internet, and their profile information is incomplete.
Bot communication is often clumsy, with frequent grammatical errors, awkward translations, and oddly structured sentences.
Bots often reply to messages instantly and post content at unusual times of day. In addition, messages sent by bots show no typing indicator, because bots do not type but send their messages directly.
Bots often publish a lot of content in a short time, which differs from human behavior. Bot accounts’ engagement rates can also be abnormal, because their followers are often other bots or inactive accounts.
Malicious bots often share similar or identical content across different accounts and platforms. This repetitiveness stands out from natural human content creation.
If you suspect you have encountered a bot, you can use bot detection tools available online.
As technology advances, bots raise more and more questions about privacy and security. Malicious bots can collect large amounts of data, and it is often unknown how that data is stored or used. Bots are also a significant vehicle for spreading misinformation and harmful content.
Your Cheat Code for AI
https://www.aiprm.com/
Tomi Engdahl says:
DeepSeek’s ‘Sputnik moment’ prompts investors to sell big AI players
https://www.reuters.com/technology/chinas-deepseek-sets-off-ai-market-rout-2025-01-27/?utm_medium=Social&utm_source=Facebook&fbclid=IwZXh0bgNhZW0CMTEAAR2Y6hipRkDC9vwmOvoSKyIeJIV_225UJ_KXotFoS57CUkL4JYAlECwbuak_aem_xs3jE0Uq9Ave_NKF79NHUw
Tech stocks sink; Nvidia drops sharply in premarket
China’s DeepSeek AI assistant surges in popularity
Nervous investors seek safe-havens, dollar falls
LONDON/SINGAPORE, Jan 27 (Reuters) – Investors hammered technology stocks on Monday, sending the likes of Nvidia (NVDA.O) and Oracle (ORCL.N) plummeting, as the emergence of a low-cost Chinese artificial intelligence model cast doubts on Western companies’ dominance in this sector.
Startup DeepSeek last week launched a free assistant it says uses less data at a fraction of the cost of incumbent players’ models, possibly marking a turning point in the level of investment needed for AI.
Futures on the Nasdaq 100 slid almost 4%, suggesting the index could see its biggest daily slide since September 2022 later on Monday, if those losses are sustained.
Those on the S&P 500 dropped 2%. Shares in AI chipmaker Nvidia fell more than 11%, rival Oracle dropped 8.5% and AI data analytics company Palantir (PLTR.O) lost 6.5% in pre-market trading.
DeepSeek, which by Monday had overtaken U.S. rival ChatGPT in terms of downloads on the Apple Store, offers the prospect of a viable, cheaper AI alternative which has raised questions about the sustainability of the level of spending and investment on AI by Western companies, including Apple (AAPL.O) and Microsoft (MSFT.O).
From Tokyo to Amsterdam, shares in AI players tumbled.
“We still don’t know the details and nothing has been 100% confirmed in regards to the claims, but if there truly has been a breakthrough in the cost to train models from $100 million+ to this alleged $6 million number this is actually very positive for productivity and AI end users as cost is obviously much lower meaning lower cost of access,” Jon Withaar, a senior portfolio manager at Pictet Asset Management, said.
The hype around AI has powered a huge inflow of capital into the equity markets in the last 18 months in particular, as investors have bought into the technology, inflating company valuations and sending stock markets to record highs.
Little is known about the small Hangzhou startup behind DeepSeek. Its researchers wrote in a paper last month that the DeepSeek-V3 model, launched on Jan. 10, used Nvidia’s H800 chips for training, spending less than $6 million – the figure referenced by Pictet’s Withaar.
“Deepseek R1 is one of the most amazing and impressive breakthroughs I’ve ever seen — and as open source, a profound gift to the world,” venture capitalist Marc Andreessen said in a separate post.
In Europe, ASML (ASML.AS), which counts Taiwan’s TSMC (2330.TW), Intel (INTC.O) and Samsung (005930.KS) as its customers, dropped almost 7.5%, while Siemens Energy (ENR1n.DE) lost nearly 18%. In Japan, startup investor SoftBank Group (9984.T) slid more than 8%. Last week it announced a $19 billion commitment to fund Stargate, a data-centre joint venture with OpenAI.
Masahiro Ichikawa, chief market strategist at Sumitomo Mitsui DS Asset Management said: “The idea that the most cutting-edge technologies in America, like Nvidia and ChatGPT, are the most superior globally, there’s concern that this perspective might start to change.”
“I think it might be a bit premature,” Ichikawa said.
Tomi Engdahl says:
Chinese AI startup DeepSeek overtakes ChatGPT on Apple App Store
https://www.reuters.com/technology/artificial-intelligence/chinese-ai-startup-deepseek-overtakes-chatgpt-apple-app-store-2025-01-27/
BEIJING, Jan 27 (Reuters) – Chinese startup DeepSeek’s AI Assistant on Monday overtook rival ChatGPT to become the top-rated free application available on Apple’s App Store in the United States.
Powered by the DeepSeek-V3 model, which its creators say “tops the leaderboard among open-source models and rivals the most advanced closed-source models globally”, the artificial intelligence application has surged in popularity among U.S. users since it was released on Jan. 10, according to app data research firm Sensor Tower.
AI models from ChatGPT to DeepSeek require advanced chips to power their training. The Biden administration has since 2021 widened the scope of bans designed to stop these chips from being exported to China and used to train Chinese firms’ AI models.
However, DeepSeek researchers wrote in a paper last month that the DeepSeek-V3 used Nvidia’s H800 chips for training, spending less than $6 million.
Although this detail has since been disputed, the claim that the chips used were less powerful than the most advanced Nvidia products Washington has sought to keep out of China, as well as the relatively cheap training costs, has prompted U.S. tech executives to question the effectiveness of tech export controls.
Since then, dozens of Chinese tech companies large and small have released their own AI models, but DeepSeek is the first to be praised by the U.S. tech industry as matching or even surpassing the performance of cutting-edge U.S. models.
Tomi Engdahl says:
https://www.ft.com/content/c82933fe-be28-463b-8336-d71a2ff5bbbf?fbclid=IwZXh0bgNhZW0CMTEAAR0DjtMP5t2xXaxX0GTK7ppJ_uj0dNRoQktkxjzZw5UU9CTqGYy13vJ_Hxk_aem_WH3-WdKRsqXn-hT9y5Yqng
For anyone wanting to train an LLM on analyst responses to DeepSeek, the Temu of ChatGPTs, this post is a one-stop shop. We’ve grabbed all relevant sellside emails in our inbox and copy-pasted them with minimal intervention.
Tomi Engdahl says:
A surprise advancement from a Chinese artificial intelligence company, DeepSeek, threatened the aura of invincibility surrounding America’s technology industry.
A shocking Chinese AI advancement called DeepSeek is sending US stocks plunging
https://edition.cnn.com/2025/01/27/tech/deepseek-stocks-ai-china/index.html?Date=20250127&Profile=CNN+International&utm_content=1737986429&utm_medium=social&utm_source=facebook&fbclid=IwZXh0bgNhZW0CMTEAAR0wxpcyT24NXe8iG7X-7XIE_3FqD_DsSXu65-OiWtWmCQfRnkkjmj9G6v4_aem_S9RARpTImEUSrDpUwWH0Fg
US stocks dropped sharply Monday morning after a surprise advancement from a Chinese artificial intelligence company, DeepSeek, threatened the aura of invincibility surrounding America’s technology industry.
Tomi Engdahl says:
DeepSeek, a one-year-old startup, revealed a stunning capability: It presented a ChatGPT-like AI model called R1, which has all the familiar abilities, operating at a fraction of the cost of OpenAI’s, Google’s or Meta’s popular AI models.
Tomi Engdahl says:
Chinese AI came out of nowhere and caught everyone off guard – It displaced ChatGPT and could cause significant market tremors
The Chinese AI model has generated a lot of buzz. There are two reasons for that.
https://www.talouselama.fi/uutiset/te/b5e2a0de-3a01-45ba-8763-84341a8ba975?utm_term=Autofeed&utm_medium=Social&utm_source=Facebook&fbclid=IwZXh0bgNhZW0CMTEAAR1WXo-nKEGUfUALyLU0rVWsPKw5ArRFGvomUZHzLSdzzqZFPg9BWslnTxo_aem_e8B6YKf_2CO3VoFJzPyB0Q#Echobox=1737967488
The app from the Chinese AI startup DeepSeek rose to become the most downloaded app in Apple’s App Store in the United States.
This was reported by the news agency Reuters.
Tomi Engdahl says:
According to the Wall Street Journal, DeepSeek’s AI models ranked higher on a leaderboard maintained by UC Berkeley than, for example, Grok, used by Elon Musk’s X. Google’s Gemini model tops that leaderboard.
Tomi Engdahl says:
The release of a less capital-intensive artificial intelligence model from China’s DeepSeek sent a chill through the U.S. stock market Monday, initiating a massive selloff and hitting billionaires where it hurts—their fortunes.
Tomi Engdahl says:
The DeepSeek sell-off: What major analysts are saying about Nvidia, possible AI bubble popping
Published Mon, Jan 27 2025, 6:53 AM EST; updated 8:47 AM EST
John Melloy
@johnmelloy
https://www.cnbc.com/2025/01/27/deepseek-sell-off-nvidia-analysts-react.html?__source=pro%7Corganic%7Csocial%7Cfacebook%7Cq12025&tpcc=pro%7Corganic%7Csocial%7Cfacebook%7Cq12025&fbclid=IwZXh0bgNhZW0CMTEAAR3ySp6EliYiG_5CbI1KFO99aS1t2L4lt2YYEwki8Udiv3QpGe9UFHtsYeQ_aem_qJ6ZhT76EAlHXFM-oj9trw#no_universal_links
Tomi Engdahl says:
What is DeepSeek, the Chinese AI startup that shook the tech world?
https://edition.cnn.com/2025/01/27/tech/deepseek-ai-explainer/index.html?Date=20250127&Profile=CNN,CNN+International&utm_content=1737992391&utm_medium=social&utm_source=facebook&fbclid=IwZXh0bgNhZW0CMTEAAR1kJjmymRLSuufVQpsugnIhMokmIBxgFfR6jLkwS7fAZ-W9us9N-Y4ClMk_aem_M5HCneYIuJyxLB73HI6Fmw
A surprisingly efficient and powerful Chinese AI model has taken the technology industry by storm. It’s called DeepSeek R1, and it’s rattling nerves on Wall Street.
The new AI model was developed by DeepSeek, a startup that was born just a year ago and has somehow managed a breakthrough that famed tech investor Marc Andreessen has called “AI’s Sputnik moment”: R1 can nearly match the capabilities of its far more famous rivals, including OpenAI’s GPT-4, Meta’s Llama and Google’s Gemini — but at a fraction of the cost.
The company said it had spent just $5.6 million powering its base AI model, compared with the hundreds of millions, if not billions of dollars US companies spend on their AI technologies. That’s even more shocking when considering that the United States has worked for years to restrict the supply of high-power AI chips to China, citing national security concerns. That means DeepSeek was supposedly able to achieve its low-cost model on relatively under-powered AI chips.
The DeepSeek app has surged on the app store charts, surpassing ChatGPT Monday, and it has been downloaded nearly 2 million times.
Why is DeepSeek such a big deal?
AI is a power-hungry and cost-intensive technology — so much so that America’s most powerful tech leaders are buying up nuclear power companies to provide the necessary electricity for their AI models.
Meta last week said it would spend upward of $65 billion this year on AI development. Sam Altman, CEO of OpenAI, last year said the AI industry would need trillions of dollars in investment to support the development of high-in-demand chips needed to power the electricity-hungry data centers that run the sector’s complex models.
So the notion that similar capabilities as America’s most powerful AI models can be achieved for such a small fraction of the cost — and on less capable chips — represents a sea change in the industry’s understanding of how much investment is needed in AI. The technology has many skeptics and opponents, but its advocates promise a bright future: AI will advance the global economy into a new era, they argue, making work more efficient and opening up new capabilities across multiple industries that will pave the way for new research and developments.
DeepSeek may show that turning off access to a key technology doesn’t necessarily mean the United States will win. That’s an important message to President Donald Trump as he pursues his isolationist “America First” policy.
Wall Street was alarmed by the development. US stocks were set for a steep selloff Monday morning. Nvidia (NVDA), the leading supplier of AI chips, whose stock more than doubled in each of the past two years, fell 12% in premarket trading. Meta (META) and Alphabet (GOOGL), Google’s parent company, were also down sharply, as were Marvell, Broadcom, Palantir, Oracle and many other tech giants.
“The DeepSeek model rollout is leading investors to question the lead that US companies have and how much is being spent and whether that spending will lead to profits (or overspending),” said Keith Lerner, analyst at Truist. “Ultimately, our view, is the required spend for data and such in AI will be significant, and US companies remain leaders.”
Although the cost-saving achievement may be significant, the R1 model is a ChatGPT competitor — a consumer-focused large-language model. It hasn’t yet proven it can handle some of the massively ambitious AI capabilities for industries that — for now — still require tremendous infrastructure investments.
“Thanks to its rich talent and capital base, the US remains the most promising ‘home turf’ from which we expect to see the emergence of the first self-improving AI,” said Giuseppe Sette, president of AI market research firm Reflexivity.
Tomi Engdahl says:
Viral AI company DeepSeek releases new image model family
https://techcrunch.com/2025/01/27/viral-ai-company-deepseek-releases-new-image-model-family/?fbclid=IwY2xjawIEysFleHRuA2FlbQIxMQABHYb_Lw_jE6s-0RAFxAByLPbOrcpxg9r0vqtIYNy07qnj6hW9U8iAd8aQxg_aem_8E4cM9o2BP6NeqhlmsBymQ
DeepSeek, the viral AI company, has released a new set of multimodal AI models that it claims can outperform OpenAI’s DALL-E 3.
The models, which are available for download from the AI dev platform Hugging Face, are part of a new model family that DeepSeek is calling Janus-Pro. They range in size from 1 billion to 7 billion parameters. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.
Janus-Pro, which DeepSeek describes as a “novel autoregressive framework,” can both analyze and create new images. According to the company, on two AI evaluation benchmarks, GenEval and DPG-Bench, the largest Janus-Pro model, Janus-Pro-7B, beats DALL-E 3 as well as models such as PixArt-alpha, Emu3-Gen, and Stability AI‘s Stable Diffusion XL.
Granted, some of those models are on the older side, and most Janus-Pro models can only analyze small images with a resolution of up to 384 x 384. But Janus-Pro’s performance is impressive, considering the models’ compact sizes.
Tomi Engdahl says:
Technology company Nvidia’s stock is plunging – behind it is the Chinese AI company DeepSeek
DeepSeek has said it spent only $5.6 million on developing its model.
https://www.is.fi/taloussanomat/art-2000010993363.html
The anticipated success of the Chinese AI company DeepSeek was visible on the stock markets on Monday. Technology company Nvidia’s stock was down by as much as 17 percent on Monday. In New York, the technology-heavy Nasdaq index was in turn more than three percent lower at the start of trading.
Tomi Engdahl says:
Kyle Wiggers / TechCrunch:
Meta says Meta AI will now use account info from across Meta’s apps to give personalized suggestions; users can also ask Meta AI to “remember” specific things
Meta AI can now use your Facebook and Instagram data to personalize its responses
https://techcrunch.com/2025/01/27/meta-ai-can-now-use-your-facebook-and-instagram-data-to-personalize-its-responses/
Meta says that it is rolling out improvements to Meta AI, its cross-platform chatbot, including the ability to have the bot “remember” details from conversations.
In a post on Meta’s official blog, the company said that, in chats with Meta AI on Facebook, Messenger, and WhatsApp for iOS and Android in the U.S. and Canada, users can now tell Meta AI to remember certain things about them, like that they love to travel and learn new languages.
The memory feature, similar to the memory features for OpenAI’s ChatGPT and Google’s Gemini, lets Meta AI pick up on “important details” based on context, according to Meta. For example, if a user mentioned in a previous chat that they’re vegan and asks Meta AI for breakfast ideas, the chatbot will consistently factor in that dietary preference.
https://about.fb.com/news/2025/01/building-toward-a-smarter-more-personalized-assistant/
Tomi Engdahl says:
Kyle Wiggers / TechCrunch:
DeepSeek debuts a family of multimodal, MIT-licensed open-source models including Janus-Pro-7B, which it claims beats OpenAI’s DALL-E 3 in GenEval and DPG-Bench
Viral AI company DeepSeek releases new image model family
https://techcrunch.com/2025/01/27/viral-ai-company-deepseek-releases-new-image-model-family/
DeepSeek, the viral AI company, has released a new set of multimodal AI models that it claims can outperform OpenAI’s DALL-E 3.
The models, which are available for download from the AI dev platform Hugging Face, are part of a new model family that DeepSeek is calling Janus-Pro. They range in size from 1 billion to 7 billion parameters. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.
Janus-Pro is under an MIT license, meaning it can be used commercially without restriction.
Janus-Pro, which DeepSeek describes as a “novel autoregressive framework,” can both analyze and create new images. According to the company, on two AI evaluation benchmarks, GenEval and DPG-Bench, the largest Janus-Pro model, Janus-Pro-7B, beats DALL-E 3 as well as models such as PixArt-alpha, Emu3-Gen, and Stability AI‘s Stable Diffusion XL.
Granted, some of those models are on the older side, and most Janus-Pro models can only analyze small images with a resolution of up to 384 x 384. But Janus-Pro’s performance is impressive, considering the models’ compact sizes.
“Janus-Pro surpasses previous unified model and matches or exceeds the performance of task-specific models,” DeepSeek writes in a post on Hugging Face. “The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.”
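For anyone who wants to experiment locally, a minimal sketch for fetching the released weights from Hugging Face follows (pip install huggingface_hub). The repo id matches DeepSeek’s published model card, but treat it as an assumption and check the hub; actual image generation and analysis go through DeepSeek’s own Janus codebase (https://github.com/deepseek-ai/Janus) rather than a standard transformers pipeline, and is not shown here.

# Minimal sketch: download the Janus-Pro-7B weights from Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="deepseek-ai/Janus-Pro-7B",  # the smaller Janus-Pro-1B variant also exists
)
print("Model files downloaded to:", local_dir)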
Tomi Engdahl says:
Matt Marshall / VentureBeat:
How DeepSeek outpaced OpenAI at a fraction of the cost: open source, pure reinforcement learning, no supervised fine-tuning, and building on DeepSeek-R1-Zero — DeepSeek-R1′s release last Monday has sent shockwaves through the AI community, disrupting assumptions about what’s required to achieve cutting-edge AI performance.
https://venturebeat.com/ai/deepseek-r1s-bold-bet-on-reinforcement-learning-how-it-outpaced-openai-at-3-of-the-cost/
Tomi Engdahl says:
Dan Primack / Axios:
DeepSeek could be an extinction-level event for venture capital firms that went all-in on foundational model companies; investors say they are not panicking
DeepSeek resets the board
https://www.axios.com/2025/01/27/deepseek-ai-china-venture-capital
Davos consensus last week was that the U.S. had a giant lead in the AI race, with the only real question being if there would be enough general contractors to build all of the needed data centers.
Ummm … maybe not.
Driving the news: China’s DeepSeek appears to have built AI models that rival OpenAI, while allegedly using much less money, chips, and energy.
It’s an open-source project hatched by a hedge fund, which at least for now seems aimed at developers instead of at enterprises or consumers. But that product focus could expand, particularly given that DeepSeek yesterday topped Apple’s App Store.
Why it matters: This could be an extinction-level event for venture capital firms that went all-in on foundational model companies. Particularly if those companies haven’t yet productized with wide distribution.
The quantums of capital are just so much more than anything VC has ever before disbursed, based on what might be a suddenly-stale thesis.
If nanotech and web3 were venture industry grenades, this could be a nuclear bomb.
Investors I spoke to over the weekend aren’t panicking, but they’re clearly concerned. Particularly that they could be taken so off-guard. Don’t be surprised if some deals in process get paused.
Yes, but: There’s still a ton we don’t know about DeepSeek, including if it really spent as little money as it claims. And obviously there could be national security impediments for U.S. companies or consumers, given what we’ve seen with TikTok.
The bottom line: The game has changed.
Tomi Engdahl says:
The anatomy of a bubble bursting
https://www.axios.com/2025/01/27/bubble-bursting-ai-nvidia-deepseek
Why it matters: That can cost investors $1 trillion or more in a single day, as happened Monday with the global AI rout.
It can also challenge the fundamental assumptions behind an entire economy, like the nascent Trump administration’s push to invest hundreds of billions of dollars in American AI supremacy.
Zoom out: In the 1950s, the Soviets beat the U.S. into space. In 2025, China appears to have potentially beaten the U.S. to building a better AI mousetrap.
Last week, the small Chinese upstart DeepSeek announced a new reasoning model, R1, that appears to outperform the best America has to offer, including OpenAI’s ChatGPT, Anthropic’s Claude and Meta’s Llama.
The problem? Those companies spent billions of dollars building their models, fueling growth for companies like Nvidia, whose chips are the gold standard in that training process.
DeepSeek spent a mere $6 million, figured out how to do it faster and more efficiently with cheaper hardware, and then released the whole thing as a free, open-source platform.
The big picture: President Trump’s economic vision relies on massive growth, fueled by the AI boom that his closest advisers have sold as the country’s future.
The biggest economic announcement of his first week in office was Stargate, a five-year plan to spend $500 billion on AI infrastructure. (Complicating matters, Trump ally Elon Musk immediately cast doubt on whether anyone actually had the money to fund the project.)
But if China can do AI better and faster at one one-thousandth of the cost, it casts a shadow on the rationale for spending that kind of money and leaves the country playing catch-up.
Tomi Engdahl says:
Ben Thompson / Stratechery:
An in-depth look at DeepSeek: DeepSeekMoE and DeepSeekMLA, cheap V3 training, the US chip ban, “distillation” from other models, impact on Nvidia, AGI, and more — It’s Monday, January 27. Why haven’t you written about DeepSeek yet? — I did! I wrote about R1 last Tuesday.
It’s Monday, January 27. Why haven’t you written about DeepSeek yet?
https://stratechery.com/2025/deepseek-faq/
DeepSeekMoE, as implemented in V2, introduced important innovations on this concept, including differentiating between more finely-grained specialized experts, and shared experts with more generalized capabilities. Critically, DeepSeekMoE also introduced new approaches to load-balancing and routing during training; traditionally MoE increased communications overhead in training in exchange for efficient inference, but DeepSeek’s approach made training more efficient as well.
DeepSeekMLA was an even bigger breakthrough. One of the biggest limitations on inference is the sheer amount of memory required: you both need to load the model into memory and also load the entire context window. Context windows are particularly expensive in terms of memory, as every token requires both a key and corresponding value; DeepSeekMLA, or multi-head latent attention, makes it possible to compress the key-value store, dramatically decreasing memory usage during inference.
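To make the memory point concrete, here is a rough sketch of the standard multi-head attention KV-cache size, which is the term DeepSeekMLA attacks by storing a compressed latent instead of full keys and values. The layer and head dimensions below are assumed round numbers for illustration, not DeepSeek’s actual configuration.

# Rough sketch: KV-cache memory for standard multi-head attention.
# 2x covers keys and values; one entry per layer, KV head, token and channel.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Assumed example: 60 layers, 32 KV heads of dimension 128, FP16 cache.
for ctx in (4_096, 32_768, 131_072):
    gb = kv_cache_bytes(60, 32, 128, ctx) / 1e9
    print(f"context {ctx:>7}: ~{gb:.1f} GB of KV cache")

Even with these made-up dimensions the cache grows linearly with the context window and quickly dwarfs everything else, which is why compressing it matters so much for inference cost.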
I’m not sure I understood any of that.
The key implications of these breakthroughs — and the part you need to understand — only became apparent with V3, which added a new approach to load balancing (further reducing communications overhead) and multi-token prediction in training (further densifying each training step, again reducing overhead): V3 was shockingly cheap to train. DeepSeek claimed the model training took 2,788 thousand H800 GPU hours, which, at a cost of $2/GPU hour, comes out to a mere $5.576 million.
So no, you can’t replicate DeepSeek the company for $5.576 million.
I still don’t believe that number.
Actually, the burden of proof is on the doubters, at least once you understand the V3 architecture. Remember that bit about DeepSeekMoE: V3 has 671 billion parameters, but only 37 billion parameters in the active expert are computed per token; this equates to 333.3 billion FLOPs of compute per token. Here I should mention another DeepSeek innovation: while parameters were stored with BF16 or FP32 precision, they were reduced to FP8 precision for calculations; 2048 H800 GPUs have a capacity of 3.97 exoflops, i.e. 3.97 billion billion FLOPS. The training set, meanwhile, consisted of 14.8 trillion tokens; once you do all of the math it becomes apparent that 2.8 million H800 hours is sufficient for training V3. Again, this was just the final run, not the total cost, but it’s a plausible number.
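A quick back-of-envelope check of those figures can be done in a few lines. The per-token FLOPs, token count, cluster capacity, and $2/GPU-hour rate all come from the paragraph above; the utilization factor is an assumption I am adding (real training runs achieve well below peak FLOPS), so treat this as a sanity check rather than DeepSeek’s own accounting.

# Back-of-envelope check of the quoted V3 training numbers (a sketch).
flops_per_token = 333.3e9        # from the text: ~333.3 billion FLOPs per token
tokens = 14.8e12                 # 14.8 trillion training tokens
cluster_flops = 3.97e18          # 2048 H800s at FP8, per the text
gpus = 2048
utilization = 0.25               # assumed model FLOPs utilization (my assumption)

total_flops = flops_per_token * tokens
wall_clock_hours = total_flops / (cluster_flops * utilization) / 3600
gpu_hours = wall_clock_hours * gpus
cost = gpu_hours * 2.0           # $2 per H800 GPU-hour

print(f"GPU-hours: {gpu_hours/1e6:.2f} M (DeepSeek reports 2.788 M)")
print(f"Cost at $2/GPU-hour: ${cost/1e6:.2f} M (reported $5.576 M)")

With a utilization in the 25–30% range the arithmetic lands in the same ballpark as the reported 2.788 million GPU-hours and $5.576 million, which is the point: the final-run figure is plausible.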
Scale AI CEO Alexandr Wang said they have 50,000 H100s.
So was this a violation of the chip ban?
Nope. H100s were prohibited by the chip ban, but not H800s. Everyone assumed that training leading edge models required more interchip memory bandwidth, but that is exactly what DeepSeek optimized both their model structure and infrastructure around.
Again, just to emphasize this point, all of the decisions DeepSeek made in the design of this model only make sense if you are constrained to the H800; if DeepSeek had access to H100s, they probably would have used a larger training cluster with much fewer optimizations specifically focused on overcoming the lack of bandwidth.
So V3 is a leading edge model?
It’s definitely competitive with OpenAI’s 4o and Anthropic’s Sonnet-3.5, and appears to be better than Llama’s biggest model. What does seem likely is that DeepSeek was able to distill those models to give V3 high quality tokens to train on.
What is distillation?
Distillation is a means of extracting understanding from another model; you can send inputs to the teacher model and record the outputs, and use that to train the student model. This is how you get models like GPT-4 Turbo from GPT-4. Distillation is easier for a company to do on its own models, because they have full access, but you can still do distillation in a somewhat more unwieldy way via API, or even, if you get creative, via chat clients.
Distillation obviously violates the terms of service of various models, but the only way to stop it is to actually cut off access, via IP banning, rate limiting, etc. It’s assumed to be widespread in terms of model training, and is why there are an ever-increasing number of models converging on GPT-4o quality. This doesn’t mean that we know for a fact that DeepSeek distilled 4o or Claude, but frankly, it would be odd if they didn’t.
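As a minimal sketch of the idea described above: query a teacher model, record its outputs, and turn the prompt/response pairs into a fine-tuning dataset for the student. The query_teacher function below is a hypothetical stand-in for whatever API or chat client is used, and the JSONL format is just one common choice.

# Minimal sketch of distillation-style data collection (hypothetical teacher).
import json

def query_teacher(prompt: str) -> str:
    # Hypothetical: replace with a real call to the teacher model's API.
    return f"(teacher answer to: {prompt})"

prompts = [
    "Explain mixture-of-experts routing in one paragraph.",
    "Summarize what a KV cache stores during inference.",
]

with open("distillation_data.jsonl", "w") as f:
    for p in prompts:
        record = {"prompt": p, "completion": query_teacher(p)}
        f.write(json.dumps(record) + "\n")
# The resulting file can then be used to fine-tune the student model.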
Indeed, this is probably the core economic factor undergirding the slow divorce of Microsoft and OpenAI. Microsoft is interested in providing inference to its customers, but much less enthused about funding $100 billion data centers to train leading edge models that are likely to be commoditized long before that $100 billion is depreciated.
Is this why all of the Big Tech stock prices are down?
In the long run, model commoditization and cheaper inference — which DeepSeek has also demonstrated — is great for Big Tech. A world where Microsoft gets to provide inference to its customers for a fraction of the cost means that Microsoft has to spend less on data centers and GPUs, or, just as likely, sees dramatically higher usage given that inference is so much cheaper. Another big winner is Amazon: AWS has by-and-large failed to make their own quality model, but that doesn’t matter if there are very high quality open source models that they can serve at far lower costs than expected.
Apple is also a big winner. Dramatically decreased memory requirements for inference make edge inference much more viable, and Apple has the best hardware for exactly that. Apple Silicon uses unified memory, which means that the CPU, GPU, and NPU (neural processing unit) have access to a shared pool of memory; this means that Apple’s high-end hardware actually has the best consumer chip for inference (Nvidia gaming GPUs max out at 32GB of VRAM, while Apple’s chips go up to 128 GB of RAM).
Meta, meanwhile, is the biggest winner of all. I already laid out last fall how every aspect of Meta’s business benefits from AI; a big barrier to realizing that vision is the cost of inference, which means that dramatically cheaper inference — and dramatically cheaper training, given the need for Meta to stay on the cutting edge — makes that vision much more achievable.
Google, meanwhile, is probably in worse shape: a world of decreased hardware requirements lessens the relative advantage they have from TPUs. More importantly, a world of zero-cost inference increases the viability and likelihood of products that displace search; granted, Google gets lower costs as well, but any change from the status quo is probably a net negative.
How did DeepSeek make R1?
DeepSeek actually made two models: R1 and R1-Zero. I actually think that R1-Zero is the bigger deal;
Here again it seems plausible that DeepSeek benefited from distillation, particularly in terms of training R1. That, though, is itself an important takeaway: we have a situation where AI models are teaching AI models, and where AI models are teaching themselves. We are watching the assembly of an AI takeoff scenario in realtime.
So are we close to AGI?
It definitely seems like it. This also explains why Softbank (and whatever investors Masayoshi Son brings together) would provide the funding for OpenAI that Microsoft will not: the belief that we are reaching a takeoff point where there will in fact be real returns towards being first.
But isn’t R1 now in the lead?
I don’t think so; this has been overstated. R1 is competitive with o1, although there do seem to be some holes in its capability that point towards some amount of distillation from o1-Pro. OpenAI, meanwhile, has demonstrated o3, a far more powerful reasoning model. DeepSeek is absolutely the leader in efficiency, but that is different than being the leader overall.
So why is everyone freaking out?
I think there are multiple factors. First, there is the shock that China has caught up to the leading U.S. labs, despite the widespread assumption that China isn’t as good at software as the U.S.. This is probably the biggest thing I missed in my surprise over the reaction. The reality is that China has an extremely proficient software industry generally, and a very good track record in AI model building specifically.
Second is the low training cost for V3, and DeepSeek’s low inference costs. This part was a big surprise for me as well, to be sure, but the numbers are plausible. This, by extension, probably has everyone nervous about Nvidia, which obviously has a big impact on the market.
Third is the fact that DeepSeek pulled this off despite the chip ban. Again, though, while there are big loopholes in the chip ban, it seems likely to me that DeepSeek accomplished this with legal chips.
I own Nvidia! Am I screwed?
There are real challenges this news presents to the Nvidia story. Nvidia has two big moats:
CUDA is the language of choice for anyone programming these models, and CUDA only works on Nvidia chips.
Nvidia has a massive lead in terms of its ability to combine multiple chips together into one large virtual GPU.
These two moats work together. I noted above that if DeepSeek had access to H100s they probably would have used a larger cluster to train their model, simply because that would have been the easier option; the fact they didn’t, and were bandwidth constrained, drove a lot of their decisions in terms of both model architecture and their training infrastructure.
That noted, there are three factors still in Nvidia’s favor. First, how capable might DeepSeek’s approach be if applied to H100s, or upcoming GB100s? Just because they found a more efficient way to use compute doesn’t mean that more compute wouldn’t be useful. Second, lower inference costs should, in the long run, drive greater usage. Microsoft CEO Satya Nadella, in a late night tweet almost assuredly directed at the market, said exactly that:
Third, reasoning models like R1 and o1 derive their superior performance from using more compute. To the extent that increasing the power and capabilities of AI depend on more compute is the extent that Nvidia stands to benefit!
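A toy illustration of that point: one simple form of test-time scaling is best-of-N sampling, where the model spends N times the inference compute per answer and a verifier keeps the best candidate. The generate and score functions here are hypothetical stand-ins for a real model and reward model, included only to show the shape of the technique.

# Toy sketch of best-of-N sampling, one simple form of test-time scaling.
import random

def generate(prompt: str) -> str:
    # Hypothetical: one sampled answer from a model.
    return f"candidate answer {random.randint(0, 999)}"

def score(answer: str) -> float:
    # Hypothetical: a verifier or reward model; random here for illustration.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)   # n times the compute of a single answer

print(best_of_n("What is 17 * 24?"))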
Still, it’s not all rosy. At a minimum DeepSeek’s efficiency and broad availability cast significant doubt on the most optimistic Nvidia growth story, at least in the near term. The payoffs from both model and infrastructure optimization also suggest there are significant gains to be had from exploring alternative approaches to inference in particular. For example, it might be much more plausible to run inference on a standalone AMD GPU, completely sidestepping AMD’s inferior chip-to-chip communications capability. Reasoning models also increase the payoff for inference-only chips that are even more specialized than Nvidia’s GPUs.
In short, Nvidia isn’t going anywhere; the Nvidia stock, however, is suddenly facing a lot more uncertainty that hasn’t been priced in. And that, by extension, is going to drag everyone down.
Wait, why is China open-sourcing their model?
Well DeepSeek is, to be clear; CEO Liang Wenfeng said in a must-read interview that open source is key to attracting talent:
In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team — our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat.
Open source, publishing papers, in fact, do not cost us anything. For technical talent, having others follow your innovation gives a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one, and contributing to it earns us respect. There is also a cultural attraction for a company to do this.
The interviewer asked if this would change:
DeepSeek, right now, has a kind of idealistic aura reminiscent of the early days of OpenAI, and it’s open source. Will you change to closed source later on? Both OpenAI and Mistral moved from open-source to closed-source.
We will not change to closed source. We believe having a strong technical ecosystem first is more important.
This actually makes sense beyond idealism. If models are commodities — and they are certainly looking that way — then long-term differentiation comes from having a superior cost structure; that is exactly what DeepSeek has delivered, which itself is resonant of how China has come to dominate other industries. This is also contrary to how most U.S. companies think about differentiation, which is through having differentiated products that can sustain larger margins.
So is OpenAI screwed?
Not necessarily. ChatGPT made OpenAI the accidental consumer tech company, which is to say a product company; there is a route to building a sustainable consumer business on commoditizable models through some combination of subscriptions and advertisements. And, of course, there is the bet on winning the race to AI take-off.
Anthropic, on the other hand, is probably the biggest loser of the weekend. DeepSeek made it to number one in the App Store, simply highlighting how Claude, in contrast, hasn’t gotten any traction outside of San Francisco. The API business is doing better, but API businesses in general are the most susceptible to the commoditization trends that seem inevitable (and do note that OpenAI and Anthropic’s inference costs look a lot higher than DeepSeek because they were capturing a lot of margin; that’s going away).
So this is all pretty depressing, then?
Actually, no. I think that DeepSeek has provided a massive gift to nearly everyone. The biggest winners are consumers and businesses who can anticipate a future of effectively-free AI products and services. Jevons Paradox will rule the day in the long run, and everyone who uses AI will be the biggest winners.
Another set of winners are the big consumer tech companies. A world of free AI is a world where product and distribution matters most, and those companies already won that game; The End of the Beginning was right.
China is also a big winner, in ways that I suspect will only become apparent over time. Not only does the country have access to DeepSeek, but I suspect that DeepSeek’s relative success to America’s leading AI labs will result in a further unleashing of Chinese innovation as they realize they can compete.
That leaves America, and a choice we have to make. We could, for very logical reasons, double down on defensive measures, like massively expanding the chip ban and imposing a permission-based regulatory regime on chips and semiconductor equipment that mirrors the E.U.’s approach to tech; alternatively, we could realize that we have real competition, and actually give ourselves permission to compete. Stop wringing our hands, stop campaigning for regulations — indeed, go the other way, and cut out all of the cruft in our companies that has nothing to do with winning. If we choose to compete we can still win, and, if we do, we will have a Chinese company to thank.
Tomi Engdahl says:
Hayden Field / CNBC:
DeepSeek says it’s temporarily limiting new user registrations “due to large-scale malicious attacks” on its services — DeepSeek on Monday said it would temporarily limit user registrations “due to large-scale malicious attacks” on its services, though existing users will be able to log in as usual.
DeepSeek hit with large-scale cyberattack, says it’s limiting registrations
https://www.cnbc.com/2025/01/27/deepseek-hit-with-large-scale-cyberattack-says-its-limiting-registrations.html
Tomi Engdahl says:
Nvidia calls China’s DeepSeek R1 model ‘an excellent AI advancement’
https://www.cnbc.com/2025/01/27/nvidia-calls-chinas-deepseek-r1-model-an-excellent-ai-advancement.html
Nvidia called DeepSeek’s R1 model “an excellent AI advancement,” despite the Chinese startup’s emergence causing the chip maker’s stock price to plunge 17% on Monday.
The comments come after DeepSeek last week released R1, which is an open-source reasoning model that reportedly outperformed the best models from U.S. companies such as OpenAI’s.
Nvidia’s statement indicates that it sees DeepSeek’s breakthrough as creating more work for the American chip maker’s graphics processing units, or GPUs.
“DeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling,” an Nvidia spokesperson told CNBC on Monday. “DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant.”
The comments come after DeepSeek last week released R1, which is an open-source reasoning model that reportedly outperformed the best models from U.S. companies such as OpenAI. R1′s self-reported training cost was less than $6 million, which is a fraction of the billions that Silicon Valley companies are spending to build their artificial-intelligence models.
Tomi Engdahl says:
Steven Sinofsky / @stevesi:
DeepSeek’s use of commodity, disconnected hardware, and open-source design is enough of a shot at AI hyper scaling that it could be “the way things will go”
https://x.com/stevesi/status/1883746880536072375?mx=2
Tomi Engdahl says:
The Information:
Sources: Meta set up four war rooms to analyze High-Flyer’s DeepSeek, including two for how High-Flyer cut training costs and one on what data it may have used
Meta Scrambles After Chinese AI Equals Its Own, Upending Silicon Valley
https://www.theinformation.com/articles/meta-scrambles-after-chinese-ai-equals-its-own-upending-silicon-valley
Tomi Engdahl says:
Interesting take:
“The valuation of the tech giants is built on shifting sands. You can destroy any multi billion dollar social media app by changing regulations (eg TikTok) or even changing moderation rules (eg Twitter). You can destroy e-commerce giant valuations by fiat and sequestering their CEO (Alibaba) or by changing mobile gaming rules ( Tencent).
You can destroy over-spending and over-hyped AI businesses in America with a small open sourced low cost Chinese startup that replicated this for perhaps 5% of the cost to train. And then open-sourced it and gave it to the world for free.
Sanctions on the export of US tech merely create better Chinese competitor innovation because necessity is the mother of invention. That’s trillions of dollars at risk for arrogant technocrats. Typical Chinese innovation with constraints copying the west but doing it much better. ”
op: https://www.linkedin.com/posts/herman-singh-b669357_the-valuation-of-the-tech-giants-is-built-activity-7289683133031972864-pT5E?
Tomi Engdahl says:
DeepSeek Coder
https://deepseekcoder.github.io
DeepSeek Coder comprises a series of code language models trained from scratch on both 87% code and 13% natural language in English and Chinese, with each model pre-trained on 2T tokens. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on a repo-level code corpus by employing a window size of 16K and an extra fill-in-the-blank task, resulting in foundational models (DeepSeek-Coder-Base). We further fine-tune the base model with 2B tokens of instruction data to get instruction-tuned models, namely DeepSeek-Coder-Instruct.
Pretrained on 2 Trillion tokens over more than 80 programming languages.
Various model sizes (1.3B, 5.7B, 6.7B and 33B) to support different requirements.
A window size of 16K, supporting project-level code completion and infilling.
State-of-the-Art performance among open code models.
Open source and free for research and commercial use.
Performance
We evaluate DeepSeek Coder on various coding-related benchmarks. The result shows that DeepSeek-Coder-Base-33B significantly outperforms existing open-source code LLMs. Compared with CodeLLama-34B, it leads by 7.9%, 9.3%, 10.8% and 5.9% respectively on HumanEval Python, HumanEval Multilingual, MBPP and DS-1000. Surprisingly, our DeepSeek-Coder-Base-7B reaches the performance of CodeLlama-34B. And the DeepSeek-Coder-Instruct-33B model after instruction tuning outperforms GPT-3.5-turbo on HumanEval and achieves comparable results with GPT-3.5-turbo on MBPP.
How to Use DeepSeek Coder
https://chat.deepseek.com/sign_in
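Besides the hosted chat, the instruct checkpoints can be run locally with Hugging Face transformers. The sketch below roughly mirrors the usage shown in the project’s README; the 6.7B model id is the published checkpoint, and the generation settings are illustrative rather than recommended values, so adjust to the 1.3B or 33B variants as your hardware allows.

# Minimal sketch: run DeepSeek-Coder-Instruct locally with transformers
# (pip install transformers torch).
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
)

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))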
Tomi Engdahl says:
Asana AI teammates don’t just answer questions — they take work off your hands by automating tasks and crafting workflows.
Tomi Engdahl says:
https://github.com/deepseek-ai/DeepSeek-Coder
Tomi Engdahl says:
Is DeepSeek Coder good?
I just spent 30 hours coding with DeepSeek V3, and it might be the best AI coding assistant I’ve ever used. From cleaning up 1000-line files to building APIs and even creating a chess game where LLMs play against each other, DeepSeek V3 blew me away. In this video, I’ll break down: How DeepSeek V3 compares to Claude.
DeepSeek V3 A 20-Year Developer’s Honest Review After 30 Hours of Coding
https://www.youtube.com/watch?v=cvH6xjpT1PA