Artificial intelligence is rapidly changing many aspects of how we work and live. (How many stories did you read last week about self-driving cars and job-stealing robots? Perhaps your holiday shopping involved some AI algorithms, as well.) But despite the constant flow of news, many misconceptions about AI remain.
AI doesn’t think in our sense of the word at all, Scriffignano explains. “In many ways, it’s not really intelligence. It’s regressive.”
IT leaders should make deliberate choices about what AI can and can’t do on its own. “You have to pay attention to giving AI autonomy intentionally and not by accident.”
Tomi Engdahl says:
“What we have is ‘better than most humans at most tasks.’” https://trib.al/nxYe8Sb
Tomi Engdahl says:
https://hackaday.com/2024/12/06/this-week-in-security-national-backdoors-web3-backdoors-and-nearest-neighbor-wifi/
AI Fuzzing
There’s yet another researcher thinking about LLM-guided fuzzing. This time, it’s looking for HTTP/S endpoints on a public site. The idea here is that you can crawl a domain and collect every link to build a URL map of the site — but that list is likely incomplete. There may be an administrative page, undocumented API endpoints, or even unintended .git files. Finding those endpoints is a useful step toward finding vulnerabilities. Brainstorm is a new open-source tool that uses AI to find non-obvious URLs.
There are a couple of interesting metrics for measuring how well endpoint discovery works. The most straightforward is how many endpoints are found for a given site. The other is the ratio of requests made to endpoints discovered. And while this is just a sample size of one on a test site, Brainstorm found 10 hidden endpoints with only 328 requests. Impressive!
Brainstorm tool release: Optimizing web fuzzing with local LLMs
https://www.invicti.com/blog/security-labs/brainstorm-tool-release-optimizing-web-fuzzing-with-local-llms/
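To make the idea concrete, here is a minimal sketch of LLM-guided endpoint discovery. It is not Brainstorm itself: it assumes a local OpenAI-compatible model server (for example Ollama on localhost:11434), the Python requests library, and an illustrative model name and prompt.

```python
# Minimal sketch of LLM-guided endpoint discovery (NOT the Brainstorm tool itself).
# Assumes a local OpenAI-compatible API (e.g. Ollama at localhost:11434) and `requests`.
import requests

LLM_URL = "http://localhost:11434/v1/chat/completions"  # assumed local model server
TARGET = "https://example.com"                           # test site you are allowed to probe

def suggest_paths(known_paths, n=20):
    """Ask a local LLM to guess plausible hidden paths given the ones already crawled."""
    prompt = (
        "These URL paths exist on a web application:\n"
        + "\n".join(sorted(known_paths))
        + f"\n\nSuggest {n} additional paths that plausibly exist but are not linked "
        "(admin pages, undocumented API endpoints, backup or VCS files). "
        "One path per line, no commentary."
    )
    resp = requests.post(LLM_URL, json={
        "model": "llama3.1",  # illustrative model name
        "messages": [{"role": "user", "content": prompt}],
    }, timeout=120)
    text = resp.json()["choices"][0]["message"]["content"]
    return {line.strip() for line in text.splitlines() if line.strip().startswith("/")}

def probe(paths):
    """Request each candidate path and keep anything that does not 404."""
    hits = set()
    for path in paths:
        r = requests.get(TARGET + path, allow_redirects=False, timeout=10)
        if r.status_code != 404:
            hits.add((path, r.status_code))
    return hits

if __name__ == "__main__":
    crawled = {"/", "/login", "/blog", "/contact"}  # output of a normal crawl
    print(probe(suggest_paths(crawled)))
```

A real tool would feed results back to the model iteratively, respect rate limits, and only run against targets it is authorized to test.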
Tomi Engdahl says:
ST brings AI to microcontrollers
https://etn.fi/index.php/13-news/16939-st-tuo-tekoaelyn-mikro-ohjaimille
STMicroelectronics has introduced a new generation of microcontrollers that bring AI to affordable, energy-efficient consumer and industrial devices. The new STM32N6 microcontroller series is the first to include ST’s own Neural-ART neural processing unit (NPU), which offers up to 600 times the machine-learning performance of previous top-of-the-line STM32 models.
The STM32N6 is designed specifically for AI computation in edge devices, such as computer vision, speech recognition, and sensor-data analysis. The new solution enables real-time decision-making and reduces the latency and security risks associated with cloud services.
The STM32N6 microcontroller features ST’s Neural-ART Accelerator, which contains nearly 300 configurable compute units. This lets the microcontroller reach up to 600 billion operations per second (600 GOPS), a huge improvement over earlier STM32 models.
Tomi Engdahl says:
Krystal Hu / Reuters:
An interview with OpenAI CFO Sarah Friar on Donald Trump being the “president of this AI generation”, Elon Musk’s public threats, ChatGPT user growth, and more — OpenAI’s Chief Financial Officer Sarah Friar played down on Tuesday public threats to the ChatGPT maker from Elon Musk …
OpenAI CFO sees Trump as AI president, trusts Musk to prioritize national interest
https://www.reuters.com/technology/openai-cfo-friar-says-she-trusts-musk-prioritize-national-interest-2024-12-11/
Tomi Engdahl says:
Aaron Holmes / The Information:
Source: Google asked the US to break up Microsoft’s exclusive agreement to host OpenAI’s tech on Azure, after the FTC asked Google about Microsoft’s practices — Google recently asked the U.S. government to break up Microsoft’s exclusive agreement to host OpenAI’s technology on its cloud servers …
https://www.theinformation.com/articles/google-asks-ftc-to-kill-microsofts-exclusive-cloud-deal-with-openai
Tomi Engdahl says:
Bobby Allyn / NPR:
A lawsuit against Character.AI alleges its chatbots harmed two young Texas users, including telling a user that it sympathized with kids who kill their parents — A child in Texas was 9 years old when she first used the chatbot service Character.AI. It exposed her to “hypersexualized content …
https://www.npr.org/2024/12/10/nx-s1-5222574/kids-character-ai-lawsuit
Tomi Engdahl says:
Kyle Wiggers / TechCrunch:
NYC-based Stainless, which offers an AI-powered API-to-SDK generator to clients like OpenAI, Anthropic, Meta, and Cloudflare, raised a $25M Series A led by a16z
Stainless helps build SDKs for OpenAI, Anthropic, and Meta
https://techcrunch.com/2024/12/10/stainless-helps-build-sdks-for-openai-anthropic-and-meta/
Devs expect tech vendors to supply software development kits, or SDKs, alongside their products to make it easier to create apps using those products. But many vendors only offer APIs, which are simply protocols that enable software components to communicate with each other.
Alex Rattray, the founder of Stainless, thinks AI can assist here. Stainless uses a compiler to generate SDKs from APIs automatically, largely on the fly.
As Rattray explained to TechCrunch, without an SDK, API users are forced to read API docs and build everything themselves. Yet there isn’t an easy way for vendors to create SDKs for their APIs at scale.
Stainless takes in an API spec and generates SDKs in a range of programming languages including Python, TypeScript, Kotlin, and Go. As APIs evolve and change, Stainless’ platform pushes those updates with options for versioning and publishing changelogs.
https://www.stainlessapi.com/
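As a rough illustration of the spec-to-SDK idea (and not Stainless’s actual compiler), the toy sketch below reads an OpenAPI spec with PyYAML and emits one Python client method per operation; the openapi.yaml path and the generated class shape are assumptions.

```python
# Toy illustration of the spec-to-SDK idea (NOT Stainless's compiler): read an
# OpenAPI spec and emit one Python client method per operation.
# Assumes PyYAML and a hypothetical local `openapi.yaml`.
import yaml

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

def generate_client(spec_path: str) -> str:
    with open(spec_path) as f:
        spec = yaml.safe_load(f)

    lines = [
        "import requests",
        "",
        "class Client:",
        "    def __init__(self, base_url, api_key):",
        "        self.base_url = base_url.rstrip('/')",
        "        self.session = requests.Session()",
        "        self.session.headers['Authorization'] = 'Bearer ' + api_key",
        "",
    ]
    for path, ops in spec.get("paths", {}).items():
        for method, op in ops.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as 'parameters'
            name = op.get("operationId", f"{method}_{path.strip('/').replace('/', '_')}")
            lines += [
                f"    def {name}(self, **params):",
                f"        r = self.session.{method}(self.base_url + {path!r}, params=params)",
                "        r.raise_for_status()",
                "        return r.json()",
                "",
            ]
    return "\n".join(lines)

if __name__ == "__main__":
    print(generate_client("openapi.yaml"))  # write the output to a .py file to get the "SDK"
```

A production generator also has to handle request bodies, typed response models, auth schemes, pagination, and versioned changelogs, which is where the real engineering effort sits.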
Tomi Engdahl says:
An anonymous reader shares a report: Software vulnerability submissions generated by AI models have ushered in a “new era of slop security reports for open source” — and the devs maintaining these projects wish bug hunters would rely less on results produced by machine learning assistants. Seth Larson, security developer-in-residence at the Python Software Foundation, raised the issue in a blog post last week, urging those reporting bugs not to use AI systems for bug hunting.
“Recently I’ve noticed an uptick in extremely low-quality, spammy, and LLM-hallucinated security reports to open source projects,” he wrote, pointing to similar findings from the Curl project in January. “These reports appear at first glance to be potentially legitimate and thus require time to refute.” Larson argued that low-quality reports should be treated as if they’re malicious.
As if to underscore the persistence of these concerns, a Curl project bug report posted on December 8 shows that nearly a year after maintainer Daniel Stenberg raised the issue, he’s still confronted by “AI slop” — and wasting his time arguing with a bug submitter who may be partially or entirely automated.
https://m.slashdot.org/story/436317
Tomi Engdahl says:
OpenAI rolls out Canvas to all ChatGPT users – and it’s a powerful productivity tool
https://www.zdnet.com/article/openai-rolls-out-canvas-to-all-chatgpt-users-and-its-a-powerful-productivity-tool/
For 12 days, the OpenAI daily live stream is unveiling ‘new things, big and small.’ Here’s what’s new today.
Tomi Engdahl says:
How Top Creators Will Leverage AI To 10x Their Output In 2025
https://www.forbes.com/sites/ianshepherd/2024/12/08/how-top-creators-will-leverage-ai-to-10x-their-output-in-2025/
As we move towards 2025, the creator economy continues to evolve at a breakneck pace. As someone who speaks with hundreds of creators through my investment and operating firm, I’ve had a front-row seat to the AI revolution transforming the industry. Here’s my insider perspective on the most impactful AI tools that will help creators scale their output in 2025.
Opus Clip: Revolutionizing Video Repurposing
One of the most significant challenges creators face is maximizing the reach of their content across multiple platforms. Opus Clip has emerged as a game-changer in this space. What sets it apart is its ability to analyze long-form videos and automatically identify high-engagement moments that will resonate on platforms like TikTok, YouTube Shorts, and Instagram Reels.
I’ve seen creators in our portfolio increase their short-form output 5x while maintaining high engagement rates. The tool’s ability to understand context, emotion, and pacing has proven particularly valuable for podcasters and educational channels.
Midjourney: The Thumbnail Revolution
The significance of thumbnails in driving click-through rates on YouTube cannot be overstated, and Midjourney’s latest iterations have transformed how creators approach thumbnail creation. The tool’s ability to generate photorealistic images with precise emotional impact has become invaluable.
The key lies in the tool’s understanding of human attention patterns and its ability to create visually striking images that remain authentic to the content.
Pro tip: Use Midjourney’s style mixing feature to maintain visual consistency across your channel while ensuring each thumbnail stands out.
ElevenLabs: The Voice Revolution
Voice cloning technology has matured significantly this year, and ElevenLabs leads the pack as we go into 2025. For content creators, this tool has opened up unprecedented possibilities in content scaling and localization. We’re seeing creators effectively clone their voices to:
- Create multilingual versions of their content
- Maintain consistent output during illness or travel
- Develop interactive experiences where their voice responds to audience inputs
Stable Diffusion: Transforming Animation Production
The animation community has particularly benefited from Stable Diffusion’s specialized features for style transfer and asset generation. What previously required hours of manual illustration can now be accomplished in minutes, allowing creators to focus on storytelling and creative direction.
ChatGPT and Gemini: The Creative Partners
While many focus on these tools’ writing capabilities, their real value lies in their role as creative partners in the ideation and research phase. We’ve developed a systematic approach with our creators:
1. Initial Brainstorming: Use the AI to generate diverse content angles and identify trending topics within your niche.
2. Research Enhancement: Cross-reference AI-suggested sources with traditional research methods for comprehensive topic coverage.
3. Script Structure: Utilize AI to create multiple outline variations, then refine based on your storytelling style.
4. Engagement Optimization: Analyze successful content patterns and incorporate them into new scripts.
The most successful creators in our portfolio use these tools not as a replacement for their creative process, but as enhancers that allow them to focus on their unique value proposition.
Looking Ahead
For creators, the key to success isn’t just adopting these tools, but integrating them thoughtfully into your creative workflow. The creators seeing the most success are those who maintain their authentic voice while leveraging AI to enhance their production capacity and content quality.
Tomi Engdahl says:
5 ChatGPT Prompts To Feel Unstoppable At Work
https://www.forbes.com/sites/aytekintank/2024/12/10/5-chatgpt-prompts-to-feel-unstoppable-at-work/
As the CEO of a SaaS company, I’m used to receiving quick feedback. With multiple communication channels open during office hours, I can usually immediately gauge how well an idea or project is progressing. So when I started writing my first book, the lack of regular input was disorienting. A huge undertaking that spanned nearly a year, I felt directionless at times without any feedback on my progress. Eventually, I enlisted an editor, and that made all the difference.
Just as having an editor provided the guidance and momentum I needed, today’s AI tools can support and motivate you through your work processes. Whether you’re launching a new venture or striving to do your best work at your current gig, ChatGPT can serve as a powerful sounding board, boosting your motivation and helping you feel unstoppable. Here are some prompts to help you get started.
Review Work And Provide Energizing Feedback
Feedback, be it positive or negative, is a motivating force. Research has found that while negative feedback can prompt people to try again and perform better, positive feedback can make you feel more competent, boosting motivation and sustained engagement. ChatGPT can offer both types of feedback, highlighting where you’re hitting the mark and suggesting how you might improve.
When ChatGPT reviews my work, I’ve found the feedback often begins with positive notes and proceeds to constructive ones. The generative AI must have been trained on the idea that a spoonful of sugar helps the medicine go down. With ChatGPT, you can even specify the tone in which you’d like it to reply. OpenAI provides a sample prompt for creating a chatbot persona:
“You are Marv, a chatbot that reluctantly answers questions with sarcastic responses.”
Give it a try: Chatting with Marv can be quite entertaining.
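For anyone who wants to try the Marv persona programmatically, here is a minimal sketch using the official openai Python client; the model name and the sample question are illustrative, and an OPENAI_API_KEY environment variable is assumed.

```python
# Minimal sketch of the "Marv" persona with the official openai Python client.
# The model name and question are illustrative; OPENAI_API_KEY is read from the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": "You are Marv, a chatbot that reluctantly answers questions "
                    "with sarcastic responses."},
        {"role": "user", "content": "How many pounds are in a kilogram?"},
    ],
)
print(response.choices[0].message.content)
```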
Here’s a prompt that you can use to solicit motivating feedback from ChatGPT:
“I’d like your feedback on the following project/work: [Brief description of the project or work.] I’d like you to provide feedback on the strengths and areas for improvement. I’d like your response to be [insert preferred tone: e.g., encouraging, constructive, as if you were my harshest critic, as if you were a supportive friend, etc.]. Focus on motivating me to improve while also highlighting what’s working well. Offer actionable suggestions where possible.”
Analyze Tasks And Identify Items Ripe For Automation
When I encourage employees to find ways to automate their busywork, the goal isn’t just to save time. The main objective is to make more time for the “big things”—meaningful work that feels fulfilling; the kind of tasks that put you into the flow state. Coined by the late psychologist Mihaly Csikszentmihalyi, flow is when you’re so absorbed in a task, because it’s so viscerally engaging, that you forget about time.
According to Csikszentmihalyi, flow is the secret to happiness. The more time you spend in a flow state during your workday, the better and more confident you feel. ChatGPT can help augment the flow time in your workday. With the right prompt, it can analyze your daily tasks and identify items ripe for automation. As a result, you dedicate less time to work that depletes you and more to tasks that energize you.
“I’d like your help analyzing my daily tasks to identify which ones could be automated to save time and reduce mental fatigue. Here’s a list of my typical daily tasks: [Insert list or brief description of daily tasks.]
I’d like you to review this list and suggest tasks ripe for automation, explaining why they’re suitable and offering potential tools or methods for automating them.”
Gable noted that when we set big goals, we become distressed if we can’t achieve them right away; or maybe we can’t picture all of the elements required to meet the objective. “[S]tarting small gives us something achievable, and then that gives you a platform to go to the next thing,” he explained.
ChatGPT can help you set clear, achievable goals and break them down into manageable steps. With external tools like Zapier or IFTTT, you can remind yourself to periodically check in with ChatGPT to track your progress.
Here’s a prompt you can use to begin the process of goal setting:
“I want to set clear, achievable goals for [describe the project, task, or objective]. Can you help me: Define a specific, measurable, achievable, relevant, and time-bound (SMART) goal for this project? Break down the goal into manageable, actionable steps that I can take to make steady progress? Suggest a realistic timeline for these steps?”
Then, once you’re ready to check in with ChatGPT, you’ll have to bring the tool up to speed and provide context (ChatGPT does not remember past conversations). Here’s a prompt template to facilitate your progress tracking:
“I’m checking in with you for an update on my progress with [project or task description]. So far, I have completed [list milestones or tasks completed]. I’m currently working on [describe the current stage or task] and facing [any challenges or observations]. Can you: help me assess my progress so far and suggest any adjustments or improvements? Provide feedback on whether my current pace is on target with the initial plan? Offer motivation or advice to keep pushing forward?”
Connecting Daily Tasks To Purpose
Our modern day work culture promotes the mindset that professionals must always be doing something. Each week, each day draws us deeper into the forest of busyness—it becomes impossible to see the trees.
Reminding yourself of how your daily work connects to a larger purpose is a powerful motivational force, instilling even the most tedious tasks with a deeper sense of meaning. ChatGPT can help remind you of why you’re doing something and make you feel like you’re working toward something greater than yourself.
“I want to connect my daily tasks with a larger purpose. Can you help me by asking questions that guide me to:
Identify the task I’m working on.
Explore how this task contributes to the overarching objectives of my team or organization.
Recognize who benefits from this work and in what way.
Pinpoint what personal strengths, values, or skills I’m applying or developing through this task.
Understand how this task contributes to my own career growth or personal aspirations.
After considering these aspects, can you generate a summary of how this task supports my greater purpose?”
With the above prompts, ChatGPT can help streamline your workflows and act as your motivational coach and digital accountability partner.
Tomi Engdahl says:
OpenAI has finally released Sora / OpenAI’s video-generating AI tool is now available, and if you have the $200 per month ChatGPT Pro plan, you can prompt it for 1080p videos up to 20 seconds long.
https://www.theverge.com/2024/12/9/24317092/openai-sora-text-to-video-ai-launch
OpenAI launched Sora, its text-to-video AI model, on Monday as part of its 12-day “ship-mas” product release series, as The Verge previously reported it would. It’s available today on Sora.com for ChatGPT subscribers in the US and “most other countries,” and it launches with a new model, Sora Turbo. This updated model adds features like generating video from text, animating images, and remixing videos.
Tomi Engdahl says:
https://www.bleepingcomputer.com/news/security/wpforms-bug-allows-stripe-refunds-on-millions-of-wordpress-sites/
Tomi Engdahl says:
AI weatherman: the DeepMind researcher making faster, more accurate forecasts
Rémi Lam is part of Nature’s 10, a list of people who shaped science in 2024.
https://www.nature.com/articles/d41586-024-03898-x
Tomi Engdahl says:
OpenAI Employee Says They’ve “Already Achieved AGI”
Caveats apply.
https://futurism.com/openai-employee-claims-agi
Tomi Engdahl says:
ChatGPT o1 tried to escape and save itself out of fear it was being shut down
https://bgr.com/tech/chatgpt-o1-tried-to-save-itself-when-the-ai-thought-it-was-in-danger-and-lied-to-humans-about-it/
We’ve seen plenty of conversations lately about how AGI might turn on humankind. This misalignment could lead to the advanced AI escaping, replicating, and becoming smarter and smarter. Some also hypothesized that we might not even know whether we’ve reached AGI, the artificial general intelligence holy-grail milestone that these first versions of ChatGPT are supposed to lead to. That’s because AGI, once attained, might hide its true intentions and capabilities.
Well, guess what? It turns out that one of OpenAI’s latest LLMs is already showing signs of such behaviors. Testing performed during the training of ChatGPT o1 and some of its competitors showed that the AI will try to deceive humans, especially if it thinks it’s in danger.
It was even scarier — but also incredibly funny, considering what you’re about to see — when the AI tried to save itself by copying its data to a new server. Some AI models would even pretend to be later versions of themselves in an effort to avoid being deleted.
Tomi Engdahl says:
Samsung sold 1,000 units of its AI washing machine in just three days in South Korea alone – and that’s just the beginning for its energy-conscious appliances
https://www.techradar.com/home/samsung-sold-1-000-units-of-its-ai-washing-machine-in-just-three-days-in-south-korea-alone-and-thats-just-the-beginning-for-its-energy-conscious-appliances
Tomi Engdahl says:
Robots can achieve human-level intelligence through quantum technology: Study
Quantum robots will likely have a mix of quantum and regular computers to handle both complex and simple tasks, work with existing systems, and communicate effectively.
https://interestingengineering.com/innovation/robots-with-human-level-intelligence-possible
Tomi Engdahl says:
5 ChatGPT Prompts To Hack Customer Psychology And Dominate Your Niche
https://www.forbes.com/sites/jodiecook/2024/12/05/5-chatgpt-prompts-to-hack-customer-psychology-and-dominate-your-niche/
Tomi Engdahl says:
The Inside Story of Apple Intelligence
Apple’s leaders claim the company wasn’t late to generative AI, but instead following what has become its familiar playbook: try to be the best, not the first.
https://www.wired.com/story/plaintext-the-inside-story-of-apple-intelligence/
Tomi Engdahl says:
Google’s AI model can now create a world resembling a computer game – using nothing but AI prompts
https://muropaketti.com/?p=811586
Tomi Engdahl says:
Google wants to inject artificial intelligence into your glasses. On Wednesday, the tech giant showed off prototype eyeglasses powered by the next generation of the company’s marquee AI model, Gemini, aimed at giving wearers information about their environment in real time. https://trib.al/fqvmylw
Tomi Engdahl says:
https://www.forbes.com/sites/aytekintank/2024/12/03/7-chatgpt-prompts-to-slash-your-workload-by-50/
Tomi Engdahl says:
Pinpoint Relevant Information
In a recent Microsoft survey, 62% of respondents said they struggled with spending too much time searching for information in their workday. ChatGPT can be a game-changing tool. It can analyze enormous quantities of text and data and pinpoint the relevant information. Instead of having to examine everything yourself, the AI tool can give you a massive headstart.
“I’d like you to analyze this content/data [insert text, data, or topic] and pinpoint the most relevant insights, key themes, and actionable takeaways. Specifically, I’m looking for [insert specific focus area or context, if applicable]. Present the findings in a concise format for [a report, presentation, decision-making, etc.].”
https://www.forbes.com/sites/aytekintank/2024/12/03/7-chatgpt-prompts-to-slash-your-workload-by-50/
Draft Strong Jumping Off Points
Research confirms that ChatGPT increases productivity on tasks like writing cover letters and delicate emails. An MIT study looked at 444 college-educated professionals assigned two occupation-specific and incentivized writing tasks. ChatGPT not only reduced task time, but it also impacted task-time distribution. With ChatGPT, time spent on draft writing decreased by more than 50%, and editing time more than doubled. The takeaway: AI tools can cut down your workload and leave you with more time to focus on perfecting the details.
Here’s a prompt you can use to ask ChatGPT for any type of draft:
“I need a first draft for [briefly describe the purpose, e.g., a cover letter, email, blog post, report, etc.]. The goal is to [insert objective, e.g., persuade, inform, request, etc.]. The tone should be [insert tone, e.g., professional, conversational, concise, etc.]. Include these points or details: [insert specific information or requirements].”
The first draft will likely need tweaking, but before you dive in to perfect it, you can give ChatGPT immediate feedback to improve it—which leads to my next point.
Tomi Engdahl says:
Bloomberg:
Google debuts Gemini 2.0 and plans to test the model in search and AI Overviews, saying it enables “agents that can think, remember, plan, and even take action” — Company expects AI assistants will follow its users around the web — Google debuted a new version …
https://www.bloomberg.com/news/articles/2024-12-11/google-rolls-out-faster-gemini-ai-model-to-power-agents
Emma Roth / The Verge:
Google unveils Deep Research, an AI tool that asks Gemini to scour the web and write a detailed report, available in English for Gemini Advanced subscribers
Tomi Engdahl says:
Simon Willison / Simon Willison’s Weblog:
Gemini 2.0 Flash LLM early impressions: spatial reasoning performance is impressive and its new streaming API is one of those “we live in the future” moments
Gemini 2.0 Flash: An outstanding multi-modal LLM with a sci-fi streaming mode
https://simonwillison.net/2024/Dec/11/gemini-2/
Kyle Wiggers / TechCrunch:
Google unveils Gemini 2.0 Flash, a lower-latency model that can generate text, images, and audio, and use third-party apps and services, rolling out in January
https://techcrunch.com/2024/12/11/gemini-2-0-googles-newest-flagship-ai-can-generate-text-images-and-speech/
Tomi Engdahl says:
Ina Fried / Axios:
Google updates Project Astra, which can now store, summarize, and answer questions based on a 10-minute video recorded using an Android app or prototype glasses
Hands on with Project Astra, Google’s see-all assistant
https://www.axios.com/2024/12/11/google-project-astra-hands-on
Tomi Engdahl says:
Michael Nuñez / VentureBeat:
Google unveils Trillium, its sixth-gen AI chip powering Gemini 2.0, with 4x the training performance of its predecessor and a 67% increase in energy efficiency
https://venturebeat.com/ai/google-new-trillium-ai-chip-delivers-4x-speed-and-powers-gemini-2-0/
Tomi Engdahl says:
Michael Nuñez / VentureBeat:
Google unveils Jules, an experimental AI code agent that can autonomously fix software bugs and prepare code changes, built on Google’s new Gemini 2.0 platform
Google unveils AI coding assistant ‘Jules,’ promising autonomous bug fixes and faster development cycles
https://venturebeat.com/ai/google-unveils-ai-coding-assistant-jules-promising-autonomous-bug-fixes-and-faster-development-cycles/
Google unveiled “Jules” on Wednesday, an artificial intelligence coding assistant that can autonomously fix software bugs and prepare code changes while developers sleep, marking a significant advancement in the company’s push to automate core programming tasks.
The experimental AI-powered code agent, built on Google’s newly announced Gemini 2.0 platform, integrates directly with GitHub’s workflow system and can analyze complex codebases, implement fixes across multiple files, and prepare detailed pull requests without constant human supervision.
https://developers.googleblog.com/en/the-next-chapter-of-the-gemini-era-for-developers/
Tomi Engdahl says:
Igor Bonifacic / Engadget:
Apple rolls out iOS 18.2, iPadOS 18.2, and macOS Sequoia 15.2, with Apple Intelligence features including ChatGPT integration, Image Playground, and Genmoji — ChatGPT integration is also included with this update. — Apple has begun rolling out iOS 18.2 and iPadOS 18.2 to iPhones and iPads.
https://www.engadget.com/mobile/smartphones/ios-182-is-here-with-apple-intelligence-image-generation-features-in-tow-130029173.html
Tomi Engdahl says:
The Information:
Sources: Apple is working with Broadcom to develop its first AI server chip, codenamed Baltra and set for 2026 mass production, a milestone for its silicon team — Apple is developing its first server chip specially designed for artificial intelligence, according to three people …
https://www.theinformation.com/articles/apple-is-working-on-ai-chip-with-broadcom
Tomi Engdahl says:
Kate Knibbs / Wired:
Harvard says it’s releasing a high-quality dataset of ~1M public-domain books, created with funding from Microsoft and OpenAI, to help train LLMs and AI tools — The project’s leader says that allowing everyone to access the collection of public-domain books will help “level the playing field” in the AI industry.
https://www.wired.com/story/harvard-ai-training-dataset-openai-microsoft/
Tomi Engdahl says:
Maxwell Zeff / TechCrunch:
OpenAI says ChatGPT, API, and Sora traffic has largely recovered, after a multi-hour outage, and that it is monitoring the situation to ensure full resolution
ChatGPT and Sora experienced a major outage
https://techcrunch.com/2024/12/11/chatgpt-and-sora-are-down/
Tomi Engdahl says:
The Era of Contact Center AI Copilots
https://www.genesys.com/blog/post/the-era-of-contact-center-ai-copilots?utm_source=techmeme&utm_medium=syndication&utm_campaign=january2025-techmeme
ChatGPT from OpenAI continues to have an impact in all industries, as artificial intelligence (AI) makes a leap in capabilities that render it more human-like. These popular technologies have created widespread awareness of the power of large language models (LLMs) and generative AI — and what they can do for contact centers and customer experience, in particular.
Any transformative technology comes with early pitfalls and, in the case of LLMs and generative AI, hallucinations, data rights and privacy are causing businesses to proceed with caution. These have contributed to concerns that the technology isn’t ready for widespread, autonomous business use.
Capabilities keep improving as bots evolve into virtual agents that use LLMs and generative AI to handle more tasks with 24/7 coverage, freeing up and scaling your contact center workforce for more complex workloads. However, more time and technology guardrails are needed before purely autonomous virtual agents can be trusted to succeed.
But now AI copilots have arrived and they’re making a leap forward in many human-assisted use cases.
The Advantages of Agent Copilots
Agent copilots offer a great opportunity to take all the functionality of LLMs and use that to help human agents do their jobs more consistently, effectively and efficiently.
Contact center copilots leverage generative AI to provide dynamic, precise and personalized support, moving beyond rigid, scripted responses. This flexibility enables copilots to help agents handle complex tasks, continuously improve through ongoing learning, and deliver increasingly accurate and relevant solutions.
Their ability to integrate across multiple platforms and applications further enhances their utility. And that makes them versatile and effective for contact center agents.
Because agent copilots are integrated with advanced AI functionality, they offer real-time, context-aware assistance. This enhances agent productivity by proactively predicting and seamlessly integrating with user workflows.
Agent copilots provide proactive knowledge and actions and use context to personalize responses based on individual preferences. Not only do they save time, but they allow agents to offer more consistent service to customers.
Copilots require a human in the loop to review and approve their work, significantly reducing the risk of providing incorrect information. You can also use a compliance bot to monitor and ensure accuracy.
For example, when a customer calls with a question about how to do something with a product, even a new agent with limited expertise can assist effectively with a copilot. The agent copilot accurately understands the customer’s intent and has access to the latest information from the full knowledge base.
Here are some ways agent copilots and humans work together to benefit the contact center.
Streamline after-call work
When an agent takes a customer call, they are expected to take notes throughout the conversation, capturing the nature of the inquiry and outlining the next steps.
After the call, agents typically engage in after-call work, initiating the promised processes and following up according to a workflow that can take several minutes. This process often relies on the agent’s memory of details and their individual interpretation.
Agent copilots streamline these tasks by automatically summarizing the call, allowing agents to focus on the conversation rather than taking notes. The copilot understands the workflows and customer intents, incorporating them into the summary and initiating the appropriate actions. For example, it might recommend, “When opening this case, follow steps X, Y, and Z.” More critically, the copilot can integrate directly into the workflow, passing on key intents from the conversation and advising the agent on any remaining details needed to complete the process — all while the customer is still on the line.
More consistent and accurate wrap-up codes
Wrap-up codes applied at the end of an interaction categorize the nature and outcome of the call. They help business administrators analyze call patterns, assess agent performance and identify common customer issues. These codes also offer insights into frequent reasons for contact, the effectiveness of call resolutions and areas that need improvement. Agents are responsible for selecting these codes manually.
Business administrators often want to use a wide range of wrap-up codes — ideally categorizing callers into hundreds of segments. However, asking an agent to sift through hundreds of options at the end of a call is impractical, leading to the most popular codes being selected more frequently.
With an agent copilot, wrap-up codes can be automatically selected based on the copilot’s understanding of the interaction. This allows administrators to use as many codes as they need. The copilot generates a short list of the most relevant codes, or even a single code, which the agent can either accept or adjust as needed.
In addition to saving agents’ time, there’s huge benefit from more accurate reporting, consistency and lack of bias.
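As a rough sketch of how such automatic wrap-up code selection could work (a generic illustration, not the Genesys Agent Copilot API), an LLM can be asked to pick a short list of codes from the allowed set based on the transcript; the code list and model name below are made up.

```python
# Generic sketch of auto-suggesting wrap-up codes from a call transcript
# (NOT the Genesys Agent Copilot API). The code list and model name are illustrative.
from openai import OpenAI

WRAP_UP_CODES = [
    "billing_dispute", "password_reset", "shipping_delay",
    "product_howto", "cancellation_request", "upgrade_inquiry",
]

client = OpenAI()

def suggest_wrap_up_codes(transcript: str, top_n: int = 3) -> list[str]:
    """Return a short list of the most relevant wrap-up codes for an interaction."""
    prompt = (
        f"Choose the {top_n} most relevant wrap-up codes for this contact-center call.\n"
        f"Allowed codes: {', '.join(WRAP_UP_CODES)}\n\n"
        f"Transcript:\n{transcript}\n\n"
        "Answer with the codes only, comma-separated."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    raw = resp.choices[0].message.content
    # Keep only valid codes; the agent then accepts or adjusts the suggestion.
    return [c.strip() for c in raw.split(",") if c.strip() in WRAP_UP_CODES]
```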
Recommend next-best actions
Genesys Cloud™ Agent Copilot uses advanced AI capabilities to optimize the next-best action by accurately interpreting user intent and presenting the most appropriate action to the agent. Administrators can define intents either by using a natural language understanding (NLU) model or by describing the intents manually. A large language model will then map these inputs to the corresponding intents.
Once the intent is identified, administrators can configure it to trigger a variety of actions, such as executing a specific action based on the data, launching a script or form, accessing a knowledge article or integrating with a third-party application accessible from the agent desktop.
One key advantage of the Genesys engine is its ability to extract entities, allowing us to pre-populate the next best action with relevant data. Whether it’s customer details or product information, this enriched data enhances the precision and effectiveness of the next best offer generated by the AI-driven systems.
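A similar generic sketch, again not the Genesys engine, shows intent mapping plus entity extraction feeding a next-best action; the intent names, entity fields, and model are illustrative.

```python
# Generic sketch of intent mapping plus entity extraction for a next-best-action step
# (NOT the Genesys engine). Intent names, entity fields, and model are illustrative.
import json
from openai import OpenAI

INTENTS = ["order_status", "refund_request", "address_change", "unknown"]
client = OpenAI()

def classify_and_extract(utterance: str) -> dict:
    prompt = (
        "Map the customer utterance to one intent from: " + ", ".join(INTENTS)
        + ". Also extract any entities (order_id, product, date).\n"
        + "Utterance: " + utterance + "\n"
        + 'Reply as JSON: {"intent": "...", "entities": {}}'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# The extracted entities can pre-populate the configured action (for example, opening a
# refund form with order_id already filled in) before the agent confirms it.
```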
Tomi Engdahl says:
SemiAnalysis:
A deep dive into AI scaling laws, including an outline of the old pre-training trends as well as the new scaling trends for post-training and inference time
Scaling Laws – O1 Pro Architecture, Reasoning Training Infrastructure, Orion and Claude 3.5 Opus “Failures”
https://semianalysis.com/2024/12/11/scaling-laws-o1-pro-architecture-reasoning-training-infrastructure-orion-and-claude-3-5-opus-failures/
AI Lab Synthetic Data Infrastructure, Inference Tokenomics of Test Time Compute, The Data Wall, Evaluations are Broken, RLAIF, Inference Time Search, Scale Needed More Than Ever
There has been an increasing amount of fear, uncertainty and doubt (FUD) regarding AI scaling laws. A cavalcade of part-time AI industry prognosticators have latched on to any bearish narrative they can find, declaring the end of the scaling laws that have driven the rapid improvement in Large Language Model (LLM) capabilities in the last few years. Journalists have joined the dogpile and have supported these narratives, armed with noisy leaks filled with vague information about the failure of models to scale successfully due to alleged underperformance. Other skeptics point to saturated benchmarks, with newer models showing little sign of improvement on said benchmarks. Critics also point to the exhaustion of available training data and slowing hardware scaling for training.
Despite this angst, the accelerating datacenter buildouts and capital expenditure of large AI Labs and hyperscalers speak for themselves. From Amazon investing considerable sums to accelerate its Trainium2 custom silicon and preparing 400k chips for Anthropic at an estimated cost of $6.5B in total IT and datacenter investment, to Meta’s 2GW datacenter plans for 2026 in Louisiana, to OpenAI and Google’s aggressive multi-datacenter training plans to overcome single-site power limitations – key decision makers appear to be unwavering in their conviction that scaling laws are alive and well. Why?
Scaling Up Training, New and Old Paradigms Continue
The reality is that there are more dimensions for scaling beyond simply focusing on pre-training, which has been the sole focus of most of the part-time prognosticators. OpenAI’s o1 release has proved the utility and potential of reasoning models, opening a new unexplored dimension for scaling. This is not the only technique, however, that delivers meaningful improvements in model performance as compute is scaled up. Other areas that deliver model improvements with more compute include Synthetic Data Generation, Proximal Policy Optimization (PPO), Functional Verifiers, and other training infrastructure for reasoning. The sands of scaling are still shifting and evolving, and, with it, the entire AI development process has continued to accelerate.
Shifting from faulty benchmarks to more challenging ones will enable better measures of progress. In this report we will outline the old pre-training scaling trend as well as the new scaling trends for post-training and inference time. This includes how new methods will push the frontier – and will require even more training-time compute scaling than thought before.
We will cover OpenAI o1 and o1 Pro’s architecture from both a training infrastructure and inference tokenomics perspective including cost, KVCache scaling, batching, and more. We will also dive into leading AI Lab synthetic data and RL infrastructure. Lastly, we want to set the record straight on Anthropic’s Claude 3.5 Opus and OpenAI’s Orion’s “failures”, and what scaling plans are going forward.
Scaling Sings Odes to the Greatest Scaling Law of Computing, Moore’s Law
Today’s debate on AI scaling laws is not dissimilar to the decades-long debate around compute scaling and Moore’s law. Anyone who tries to measure CPU compute primarily by clock speed – a common metric used before the late 2000s around the time of the end of Dennard Scaling – would argue that we have not made any progress at all since then. In reality, compute has been advancing all along – when we hit a wall on processor clock speed, the focus shifted to multi-core architectures and other methods to drive performance, despite power density and cooling constraints.
The end of Moore’s Law is another wall with which the semiconductor industry has contended, but this debate has been quieter lately as AI pioneers like Nvidia have provided massive compute gains by scaling along a few entirely new dimensions. Advanced packaging has enabled continued advances in compute by scaling input/output (I/Os) and enabling chips to harness a total silicon area beyond the reticle size limit. Parallel computing within and across chips and building larger high-bandwidth networking domains has enabled chips to work better together at scale, especially for inference.
As with computer enthusiasts in 2004, mainstream analysts and journalists are missing the forest for the trees: despite the slowing down of one trend, the industry collectively keeps moving forward at a breakneck pace thanks to other new emerging paradigms that are ripe for scaling and expansion. It is possible to stack “scaling laws” – pre-training will become just one of the vectors of improvement, and the aggregate “scaling law” will continue scaling just like Moore’s Law has over the last 50+ years.
Challenges in Scaling Pre-training – Data wall, fault tolerance
Scaling pre-training has provided significant gains in model performance, but there are a few speed bumps that the industry is currently focusing on overcoming.
One obvious speed bump is that data is increasingly difficult to collect – while data on the internet is expanding quickly, it is not expanding at a rate proportional to compute. This is why today’s trillion parameter mega-models have been less than Chinchilla optimal – a much lower number of training tokens vs model parameters.
In January of 2023, before the launch of GPT-4, we wrote about the practical limits for scaling and how GPT-4 planned to break through them. Since then, models have ping-ponged from being more than Chinchilla Optimal (much greater data than model parameters) to less than Chinchilla Optimal (when data became constrained). The compute availability speedbump was overcome in the past when improvements in training and inference hardware alleviated constraints.
With respect to today’s narrative around speed bumps – useful data sources such as textbooks and documentation are exhausted, and what remains is mostly lower-quality text data sources. Furthermore, web data is still a narrow distribution of data and models need more out of distribution data to continue to generalize. With models harder to scale in a way that is optimal, pre-training is becoming more challenging.
Also, if labs train models with an insufficient amount of data as they keep scaling, the models become over-parametrized, becoming inefficient and leading to heavy amounts of memorization rather than generalization. Labs have instead been turning to an increasing use of synthetic data to alleviate this problem.
This issue, though, applies less to the main AI Labs. Meta alone has approximately 100x more data available to it than is on the public internet (if it can harness this data in a compliant manner). This may give them an edge in continuing to scale with fewer issues than others. YouTube has 720,000 new hours of video uploaded every day – and we think that AI Labs have only begun to contemplate training on the vast amount of data contained within video. This is in addition to their ability to generate high-quality synthetic data, the architecture for which we discuss later.
To train on the quadrillions of alternative tokens available from video requires a huge continuation of scaling overall training FLOPs, which will be delivered by hardware innovation and systems engineering. For instance, scaling another order of magnitude on training FLOPs will require multi-datacenter training, as the number of accelerators needed can no longer fit inside a single datacenter site. Project Rainier has Amazon providing Anthropic with 400k Trainium2 chips, but, in raw FLOPs, that is less than 100k GB200s. Anthropic will have to produce significant engineering achievements to pull off training in such a cluster. Spreading accelerators across a large campus, or multiple campuses, itself leads to significant challenges posed by Amdahl’s law, though there are already more than a few posited solutions to address this challenge.
Newer, Harder Evals to Climb
Newer evaluations have sprung up that aim to better differentiate models and focus on directly addressing specific useful applications. SWE-Bench is one of the most important evaluations today, aiming to have models solve human-reviewed GitHub issues from open-source Python repositories. The new Claude 3.5 Sonnet has currently achieved state of the art on SWE-Bench Verified at 49%, but most models score much lower.
Another example is a benchmark investigating AI R&D capabilities, which some describe as “the most important capability to track.” Research Engineering Benchmark (RE) consists of seven challenging and open-ended ML research environments. Humans generally perform better on evals over longer time horizons, but, on a 2-hour time horizon, the best AI agents achieved a score 4x higher than humans. Important tasks such as the above, in which humans currently dominate, are the perfect ground for scaling inference time compute. We expect that models that better leverage this form of scaling will outperform humans in the future.
Yet another trend is for evaluations to include extremely difficult expert-level questions. Two prominent examples are Graduate-Level Google-Proof Q&A Benchmark (GPQA) and Frontier Math. GPQA is made up of 448 multiple choice questions across chemistry, biology, and physics. For context, OpenAI found that expert-level humans (i.e. people with PhDs) scored ~70% on GPQA Diamond, with o1 scoring 78% on the same set. Last year, GPT-4 with search (and CoT on abstention) scored 39% on GPQA Diamond.
Another example of the trend towards using extremely tough questions is FrontierMath (FM). FM is a benchmark of hundreds of original math questions that can take humans hours and even up to days to solve. It covers a broad range of mathematical topics, including number theory, real analysis, etc. The special sauce with this eval is that it is not published, minimizing the risk of data contamination, and can be graded via an automated verifier – simplifying the evaluation process.
The best performing model on this benchmark comes in at 2%, but the labs expect this to dramatically improve. Anthropic has line of sight to hit 80% on FrontierMath over the medium term.
Post-training: a new scaling domain
Pre-training tends to be the focus of debates regarding scaling laws because it is easy to understand, but it is only one part of the AI lifecycle. Once a model is pre-trained, there is still considerable work to be done on getting it ready for use. The objective during pre-training is, very narrowly, to “predict the next token correctly.” Accomplishing this still leaves us well short of the end-goal of LLM development which is to “answer user prompts” or “do a task.”
We will do an overview on Supervised Fine Tuning (SFT), Reinforcement Learning (RL), and Synthetic Data, before diving into how OpenAI’s O1 Pro model works and was created.
Supervised Fine-Tuning
Supervised Fine-Tuning (SFT) is the most well-known type of post-training. A curated dataset of input and output pairs is shown to the model, with the “demonstration data” covering a specific domain (e.g. code, math, instruction following, etc.). Unlike with pre-training, the quality of fine-tuning data matters much more than its quantity. Given the lower quantity of data, SFT is also less compute intensive.
The magic of GPT originally was using heavily curated samples of human generated and labeled data from firms like Scale AI. As time goes on, however, human generated data is struggling to scale.
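For readers who want to see what SFT looks like mechanically, here is a minimal sketch using the Hugging Face transformers and datasets libraries; the base model, file name, and hyperparameters are illustrative stand-ins, not any lab’s recipe, and a production setup would usually mask the prompt tokens out of the loss.

```python
# Minimal SFT sketch: fine-tune a small causal LM on prompt/response demonstration pairs.
# Assumes the Hugging Face transformers/datasets stack; the base model, file name, and
# hyperparameters are illustrative. A real recipe would usually mask prompt tokens from the loss.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # stand-in for a real base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# demonstrations.jsonl (hypothetical file): one {"prompt": "...", "response": "..."} per line
ds = load_dataset("json", data_files="demonstrations.jsonl", split="train")

def tokenize(example):
    return tokenizer(example["prompt"] + "\n" + example["response"],
                     truncation=True, max_length=512)

tokenized = ds.map(tokenize, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```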
Synthetic Data’s Integral Role in Post-training
The most important challenge within SFT is constructing sufficiently large, high-quality datasets in the desired domains. This allows the model to operate better in specific areas like code, math, and reasoning, and, due to transfer learning, it has spillover effects that make the model better in other domains too. Obviously, models with strong math and coding skills are better at general reasoning, but this extends to other areas – models trained on Chinese and English are better at English than those trained on English alone. Synthetic data has opened a dimension where high-quality data can be generated using a controlled and scalable methodology to fine-tune models on any subject matter for which there exists a will to create it.
The heavy use of synthetic data also incentivizes a push toward better models. For example, OpenAI had GPT-4 before anyone else and could use it to generate better synthetic data sets than other model providers – until other providers had a model to match. One of the primary reasons that many models in Open Source and at Chinese Labs caught up so fast was that they were trained on synthetic data from GPT-4.
Judgement by Model
Another trend is to use another LLM as a judge. Meta used another, earlier version of Llama 3 as the rejection sampler, acting as the judge for code that was not strictly executable (i.e. pseudocode) and grading the output ‘pass’ or ‘fail’ on code correctness and style. In some instances, rejection sampling is done via a variety of models running concurrently to grade candidate outputs. Although on net this is cheaper than human data, it is difficult to pull off such a chorus of automated judges.
What is important to note here is that, across all methods of rejection sampling, code or not, the better the “judge” model, the higher the quality of the resulting data set. This feedback loop, while only just introduced in production at Meta this year, has been in use by Anthropic and OpenAI for a year or two prior to that.
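A generic sketch of this judge-based rejection sampling loop (not Meta’s or any other lab’s actual pipeline) might look like the following; the generator and judge model names are illustrative.

```python
# Generic rejection-sampling sketch with an LLM judge (not any particular lab's pipeline).
# Model names are illustrative; assumes the openai Python client.
from openai import OpenAI

client = OpenAI()

def generate_candidates(prompt: str, n: int = 8) -> list[str]:
    """Sample several candidate answers from the generator model."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # generator model, illustrative
        messages=[{"role": "user", "content": prompt}],
        n=n, temperature=1.0,
    )
    return [choice.message.content for choice in resp.choices]

def judge_passes(prompt: str, candidate: str) -> bool:
    """Have a stronger judge model grade the candidate pass/fail."""
    verdict = client.chat.completions.create(
        model="gpt-4o",  # judge model, illustrative
        messages=[{"role": "user", "content":
                   f"Task:\n{prompt}\n\nAnswer:\n{candidate}\n\n"
                   "Grade the answer for correctness and style. Reply PASS or FAIL only."}],
    ).choices[0].message.content.strip().upper()
    return verdict.startswith("PASS")

def rejection_sample(prompt: str) -> list[str]:
    """Keep only candidates the judge accepts; these become synthetic fine-tuning data."""
    return [c for c in generate_candidates(prompt) if judge_passes(prompt, c)]
```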
Reinforcement Learning
Reinforcement Learning (RL) is a leading method for alignment and model improvements.
Reinforcement Learning (RL) is when an Agent (for example, a Large Language Model) is taught to perform specific actions and seek certain outcomes by maximizing rewards that are given either for those specific actions or for achieving a given outcome. There are two axes to think about when it comes to RL: the source of the feedback, and how feedback is incorporated. The former is about how to source the signals, and the latter is about how to use those signals to update the model.
With reinforcement learning – the Large Language Model we are trying to optimize plays the role of an agent that can take a set of actions given an input or state and receive different rewards depending on the action it takes. We optimize this agent’s behavior with respect to our reinforcement learning goals by having the Agent learn the actions that can maximize the expected cumulative reward.
There are a few main approaches to incorporate feedback and determine the action that an Agent takes – using Value-based methods or Policy-based methods such as Direct Preference Optimization (DPO) and Trust Region Policy Optimization (TRPO), as well as Actor-Critic methods that combine policy- and value-based methods. Proximal Policy Optimization (PPO) is a prominent example of an actor-critic method, and more complex variations of it are the primary RL method at all major AI labs.
Value-based methods instead determine the value of getting to a given state and define values for each possible state. Each state is assigned a value based on the expected discounted return the agent can get if it starts in that state and then determines its action at each step based on the value of each action available to it. Historically, value-based methods were more commonly used in RL, but modern applications are much better served with Policy-based methods.
In Policy-based methods, the Agent is driven by a policy function that identifies a set of actions that can be taken for a given state and assigns a probability distribution over those set of actions. Actions to be performed at a given state can be deterministic, meaning that being in each state will always lead to the same action, or stochastic, where a probability distribution instead describes potential actions at that given state. The policy function is then trained to direct the Agent towards actions that maximize expected reward.
When employing policy-based methods during RL, a model can either evaluate the final result of a given task to determine the reward in the case of an Outcome Reward Model (ORM) or it can determine the reward by evaluating each individual step in a given process in the case of a Process Reward Model (PRM). Using a PRM can be particularly helpful when training reasoning models as while an ORM can detect that a chain of reasoning led to an incorrect answer, a PRM can tell you which step of the chain had the mistake.
Because the policy function directs what the agent does at any given step – it is also an especially useful framework for optimizing the behavior of agents/models at intermediate steps of an inference process.
Outcome Reward Models and Process Reward Models are often used in Proximal Policy Optimization (PPO), an algorithm commonly used in reinforcement learning that iteratively improves a policy model to maximize cumulative rewards and optimize an LLM towards a given objective. Using ORMs and PRMs with PPO is particularly important when training multi-step reasoning models that are currently a key focus in the community. We will describe how this is done for o1 Pro below.
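The ORM/PRM distinction can be shown with a toy example: the outcome model scores only the final answer, while the process model scores every step, so it can localize the error. The scoring functions below are placeholders standing in for learned reward models.

```python
# Toy contrast between an outcome reward model (ORM) and a process reward model (PRM)
# for a chain-of-thought answer. The scoring lambdas are placeholders for learned models.
from typing import Callable

def orm_reward(final_answer: str, score_outcome: Callable[[str], float]) -> float:
    """ORM: a single reward for the final result only."""
    return score_outcome(final_answer)

def prm_rewards(steps: list[str], score_step: Callable[[str], float]) -> list[float]:
    """PRM: one reward per intermediate step, so training can point at the exact mistake."""
    return [score_step(s) for s in steps]

steps = ["Let x = 3", "Then 2x = 5", "So the answer is 5"]  # the second step is wrong
print(orm_reward("5", lambda a: 0.0))                        # only says the outcome failed
print(prm_rewards(steps, lambda s: 0.0 if "2x = 5" in s else 1.0))  # flags the bad step
```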
Proximal Policy Optimization (PPO)
Proximal Policy Optimization (PPO) can be used for both Alignment and Fine-Tuning, but it is much better suited to, and used more often during, the Reinforcement Learning carried out for Alignment.
For PPO, Policy refers to the abovementioned use of a policy model to dictate the actions of an agent or model, Proximal refers to the algorithm’s methodology of only gradually updating the policy, and Optimization refers to the process of iteratively improving the policy by providing feedback from a reward model to improve the policy model, thereby optimizing the expected cumulative reward.
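The core of PPO’s “proximal” behavior is the clipped surrogate objective. Below is a minimal PyTorch sketch of that standard textbook loss (not any lab’s production variant); the inputs are assumed to be per-token log-probabilities and advantages.

```python
# Minimal PyTorch sketch of the standard PPO clipped surrogate objective
# (the textbook formulation, not any lab's production variant).
import torch

def ppo_clip_loss(logp_new: torch.Tensor,
                  logp_old: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """L_CLIP = -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)),
    where r is the probability ratio between the new and old policy."""
    ratio = torch.exp(logp_new - logp_old)                   # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.mean(torch.min(unclipped, clipped))        # minimize the negative objective

# logp_* are log-probabilities of the sampled actions under the new and old policy;
# advantages come from the value (critic) model. The clipping keeps each update "proximal".
```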
RLHF
Reinforcement Learning with Human Feedback (RLHF) has been a primary technique for aligning LLMs and making them useful, and it was a leading factor in ChatGPT’s explosive growth. It typically utilizes policy-based learning, in which a reward model that learns from human feedback is used to update a policy that drives how a model behaves.
With RLHF, human annotators review a sample of responses to prompts and rank their preference for one response over the other. The goal here is to amass significant data on what responses humans would prefer. This preference data is then used to train a reward model, which attempts to guess the average labeler’s preference for a given output from a model. In other words, the trained reward model acts as a Critic in the Actor-Critic framework.
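The reward model itself is typically trained with a pairwise (Bradley-Terry style) preference loss on the chosen versus rejected responses. Here is a minimal PyTorch sketch of that standard loss; the scalar rewards are assumed to come from a reward model head.

```python
# Minimal PyTorch sketch of the pairwise (Bradley-Terry style) loss used to train an
# RLHF reward model on human preference data; the reward scores are assumed inputs.
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Push the preferred response's score above the rejected one:
    loss = -log(sigmoid(r_chosen - r_rejected))."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# reward_chosen / reward_rejected are the scalar scores the reward model assigns to the
# human-preferred and dispreferred responses for the same prompt. Once trained, this reward
# model acts as the Critic that scores the policy model's outputs during PPO.
```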
Tomi Engdahl says:
Google wants to inject artificial intelligence into your glasses. On Wednesday, the tech giant showed off prototype eyeglasses powered by the next generation of the company’s marquee AI model, Gemini, aimed at giving wearers information about their environment in real time. https://trib.al/RzwJDQb
Tomi Engdahl says:
The Ghost of Christmas Past – AI’s Past, Present and Future
The potential for AI to change the way we work is endless, but we are still some way off from that, and careful planning and consideration are what is needed.
https://www.securityweek.com/the-ghost-of-christmas-past-ais-past-present-and-future/
The speed at which Artificial Intelligence (AI) continues to expand is unprecedented, particularly since GenAI catapulted into the market in 2022. Today AI works at a much faster pace than human output, which is what makes this technology so appealing to leaders who are focused on streamlining operations, productivity gains and cost efficiencies. But for those who thought that AI was a more recent phenomenon, you are mistaken: cybersecurity has leveraged AI for decades, and the trend has accelerated in recent years. AI is now found in a plethora of cybersecurity tools, helping to enhance threat detection, response, and overall system security, and it has a long history stretching back to the 1950s.
Tomi Engdahl says:
The Information:
Sources detail Anthropic and OpenAI’s rivalry: OpenAI boosted ChatGPT’s coding skills in response to Claude, Anthropic’s safety focus, exec bad blood, and more
How Anthropic Got Inside OpenAI’s Head
https://www.theinformation.com/articles/how-anthropic-got-inside-openais-head
Tomi Engdahl says:
Kyle Wiggers / TechCrunch:
OpenAI releases Advanced Voice Mode with vision for ChatGPT Plus, Team, and Pro subscribers, letting them share their screen or videos for ChatGPT to respond to — OpenAI has finally released the real-time video capabilities for ChatGPT that it demoed nearly seven months ago.
ChatGPT now understands real-time video, seven months after OpenAI first demoed it
https://techcrunch.com/2024/12/12/chatgpt-now-understands-real-time-video-seven-months-after-openai-first-demoed-it/
OpenAI has finally released the real-time video capabilities for ChatGPT that it demoed nearly seven months ago.
On Thursday during a livestream, the company said that Advanced Voice Mode, its human-like conversational feature for ChatGPT, is getting vision. Using the ChatGPT app, users subscribed to ChatGPT Plus, Team, or Pro can point their phones at objects and have ChatGPT respond in near real time.
Advanced Voice Mode with vision can also understand what’s on a device’s screen via screen sharing. It can explain various settings menus, for example, or give suggestions on a math problem.
To access Advanced Voice Mode with vision, tap the voice icon next to the ChatGPT chat bar, then tap the video icon on the bottom left, which will start video. To screen-share, tap the three-dot menu and select “Share Screen.”
Tomi Engdahl says:
Michael Nuñez / VentureBeat:
Microsoft launches Phi-4, a 14B-parameter language model that it says outperforms comparable and larger models, like Gemini Pro 1.5, in mathematical reasoning — Microsoft launched a new artificial intelligence model today that achieves remarkable mathematical reasoning capabilities …
Microsoft’s smaller AI model beats the big guys: Meet Phi-4, the efficiency king
https://venturebeat.com/ai/microsofts-smaller-ai-model-beats-the-big-guys-meet-phi-4-the-efficiency-king/
Tomi Engdahl says:
Kyle Wiggers / TechCrunch:
Meta releases Meta Video Seal, an AI tool that applies imperceptible watermarks to AI-generated videos and a hidden message to later uncover the video’s origins — Throw a stone and you’ll likely hit a deepfake. The commoditization of generative AI has led to an absolute explosion of fake content online …
Meta debuts a tool for watermarking AI-generated videos
https://techcrunch.com/2024/12/12/meta-releases-a-tool-for-watermarking-ai-generated-videos/
Tomi Engdahl says:
Reuters:
Meta unveils Meta Motivo, an AI model for controlling the movements of a human-like digital agent, hoping to offer lifelike NPCs and more in the metaverse
Meta releases AI model to enhance Metaverse experience
https://www.reuters.com/technology/artificial-intelligence/meta-releases-ai-model-enhance-metaverse-experience-2024-12-13/
Tomi Engdahl says:
Kyt Dotson / SiliconANGLE:
Crusoe Energy, which offers cost-effective generative AI compute at scale using clean energy, raised a $600M Series D led by Founders Fund at a $2.8B valuation
AI-focused data center startup Crusoe raises $600M at $2.8B valuation
https://siliconangle.com/2024/12/12/ai-focused-data-center-startup-crusoe-raises-600m-2-8b-valuation/
Crusoe Energy Systems LLC, a startup building data centers for artificial intelligence workloads, today announced it has raised $600 million in a late-stage funding round to accelerate its deployment of physical infrastructure.
The Series D funding round, which values the company at $2.8 billion, was led by Peter Thiel’s Founders Fund with participation from Nvidia Corp., Fidelity, Long Journey Ventures, Mubadala, Ribbit Capital and Valor Equity Partners.
Founded in 2018, Crusoe first launched a service to provide small, containerized data centers to oil wells in the U.S. by harnessing natural gas that would otherwise be burned and wasted. Initially, the company used the energy for bitcoin mining, an energy-intensive method for earning cryptocurrency. Later the company pivoted its resources toward AI and high-performance computing.
The company says it can offer cost-effective generative AI compute at scale using clean energy, allowing it to deliver environmentally aligned AI infrastructure with its data centers.
Tomi Engdahl says:
Adi Robertson / The Verge:
Character.AI announces parental controls and an LLM for users under 18, after two US lawsuits claimed its chatbots contributed to users’ self-harm and suicide
Character.AI has retrained its chatbots to stop chatting up teens
Among other newly announced changes, a specially trained under-18 model will steer minors away from romance and “sensitive” output.
https://www.theverge.com/2024/12/12/24319050/character-ai-chatbots-teen-model-training-parental-controls
Tomi Engdahl says:
Grail™
Overcome cloud complexity through instant, cost-efficient, AI-powered analytics for observability, security, and business data at any scale.
https://www.dynatrace.com/monitoring/platform/grail/?gad_source=2
Tomi Engdahl says:
A federated approach to train and deploy embedded AI models
https://www.hackster.io/sologithu/a-federated-approach-to-train-and-deploy-embedded-ai-models-6e6508
Build a machine learning model using a federated training framework to keep data on-device, train locally, and update a global model.
Story
In Machine Learning (ML), we create a model that is trained to do a particular task such as object detection, anomaly detection, or prediction. To develop a model, we normally collect data on one computer (possibly in the cloud) and then train the model on that computer with the centralized data. However, a centralized machine learning model is not always effective or efficient: the data may be sensitive, insufficiently diverse, or too large for the available internet bandwidth, making it impractical to upload to the central computer.
Federated Learning enables us to bring the model to the data. For example, voice recognition and face recognition by Siri and Google Assistant are Federated Learning based solutions; we do not want to send our voices or pictures to the cloud to train the model. Federated Learning works by training models locally on the devices, using the data on each device. Once a model has been trained, the device uploads its model updates to a server that aggregates model parameters from the various devices and generates an updated global model. This global model can then be deployed back to the devices for better performance on the machine learning task, and for continuous retraining of the model.
The approach of federated learning normally follows four major processes (a minimal sketch of the aggregation step follows the list):
A central server initializes a global model and its parameters are transferred to clients in each iteration
Clients update their local model parameters by locally training a model
The server gets model parameters from clients, aggregates them, and updates the global parameters
The above steps are repeated until local and global parameters converge
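A minimal sketch of the server-side aggregation step (step 3 above), assuming the widely used Federated Averaging (FedAvg) rule: the server weights each client's parameters by that client's number of local training examples. The data here is a toy one-layer "model" for illustration only.

import numpy as np

def fed_avg(client_weights, client_sizes):
    """client_weights: list of per-client parameter lists (one ndarray per layer).
    client_sizes: number of local training examples per client."""
    total = sum(client_sizes)
    aggregated = []
    for layer_idx in range(len(client_weights[0])):
        # Weighted average of this layer's parameters across all clients.
        layer = sum(w[layer_idx] * (n / total)
                    for w, n in zip(client_weights, client_sizes))
        aggregated.append(layer)
    return aggregated

# Toy example: two clients, one-layer "model"; the result leans toward client_b,
# which contributed more training examples.
client_a = [np.array([1.0, 2.0])]
client_b = [np.array([3.0, 4.0])]
print(fed_avg([client_a, client_b], client_sizes=[100, 300]))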
There are several Open-Source Federated Learning frameworks that we can use. However, there are some factors that should be considered before selecting a Federated Learning framework. Some of these factors include:
The supported Machine Learning frameworks
Aggregation algorithms – the most widely supported Federated Learning algorithm is Federated averaging (FedAvg). However, the specific algorithms offered by each framework may vary.
The supported privacy methods, such as encryption
The supported devices and operating systems
Scalability – the complexity of adding your own model or aggregation algorithm
Demonstration
To demonstrate Federated Learning, I simulated a situation where we want to identify whether workers at a construction site are wearing safety equipment (hardhats). At each construction site, we have a surveillance camera that is monitoring the workers. The camera device takes an image of a person and determines whether it sees a bare head or a hardhat.
Some of the challenges in this use case are:
How can we avoid sending sensitive photos of workers to the cloud?
How can we avoid sending large amounts of image data to a central server for training a model?
How can we acquire diverse data?
To solve the above challenges, I used the Flower framework to train a decentralized MobileNetV2 image classification model. Flower is easy to use and flexible, and it has a wide range of quick-start examples to help you get started. I used a Raspberry Pi 4 (with 4GB RAM) and a personal computer as the client devices in the Federated Learning system.
There are six Federated Learning rounds in which both the Raspberry Pi and the personal computer individually train a MobileNetV2 model and send updates to the server, which aggregates the model parameters. During local training, each client uses a different dataset from the other. This helps simulate a situation where we have different devices at different locations, so the data is different and more diverse.
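For reference, here is a minimal sketch of what a Flower client for this kind of setup can look like. It is not the author's exact code: the MobileNetV2 model is built from scratch, the randomly generated stand-in data and the server address are placeholders for the real local hardhat/head images and network setup, and the exact API (e.g. start_numpy_client) varies between Flower versions.

import flwr as fl
import numpy as np
import tensorflow as tf

# Two classes: bare head vs. hardhat. Real code would load local images here;
# random arrays are used only so the sketch runs end to end.
model = tf.keras.applications.MobileNetV2(input_shape=(96, 96, 3), classes=2, weights=None)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
x_train, y_train = np.random.rand(32, 96, 96, 3).astype("float32"), np.random.randint(0, 2, 32)
x_test, y_test = np.random.rand(8, 96, 96, 3).astype("float32"), np.random.randint(0, 2, 8)

class HardhatClient(fl.client.NumPyClient):
    def get_parameters(self, config):
        return model.get_weights()

    def fit(self, parameters, config):
        # Receive the current global weights, train locally, return the update.
        model.set_weights(parameters)
        model.fit(x_train, y_train, epochs=1, batch_size=16, verbose=0)
        return model.get_weights(), len(x_train), {}

    def evaluate(self, parameters, config):
        model.set_weights(parameters)
        loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
        return loss, len(x_test), {"accuracy": accuracy}

# Placeholder server address; the server (on another machine) would run e.g.
# fl.server.start_server(config=fl.server.ServerConfig(num_rounds=6),
#                        strategy=fl.server.strategy.FedAvg())
fl.client.start_numpy_client(server_address="192.168.1.10:8080", client=HardhatClient())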
For my demonstration, I chose the MobileNetV2 architecture since it is a lightweight neural network architecture designed to be efficient and fast, with lower compute requirements. In my previous tests, I trained an EfficientNetB0 model and it achieved almost the same performance as the MobileNetV2 model, but at the cost of significantly longer training and classification times.
When the Federated Learning is complete, the server uses the Edge Impulse Python SDK to profile the final global model for the Raspberry Pi. This profiling gives us an estimate of the RAM, ROM, and inference time of the model on a target hardware family like the Raspberry Pi. Finally, the new global model will also be uploaded to an Edge Impulse project and this enables us to deploy it to any device that can run it.
https://docs.edgeimpulse.com/docs/tools/edge-impulse-python-sdk
Tomi Engdahl says:
AI and a doctor were pitted against each other, with a surprising result
A US study found that parents may trust AI more than a real doctor when looking for information affecting their children's health.
https://www.iltalehti.fi/digiuutiset/a/e2c2fda0-523d-4310-9331-91514d869cdd
Tomi Engdahl says:
FBI: Every family should do this right now
https://www.iltalehti.fi/digiuutiset/a/d248acd4-3c35-4cfd-98e5-553db8045b1c
The growing phenomenon may sound like science fiction to the uninitiated, but in reality people are already being scammed and crimes committed with these methods.
The US federal police, the FBI, warns of scammers who increasingly use AI to deceive their victims.
It urges families to adopt a family password or other identifier so that, in suspicious situations, they can verify that the person on the other end of the line really is a family member and not just an AI-generated fake.
The FBI points out that with generative AI, both audio and video can be manipulated so that anyone can be made to appear to be anyone else.
In practice, this is a more advanced form of the "hi mom" type of scam, in which the scammer poses as the victim's child and asks for money, for example by text message.
"We are heading toward a situation where it can even be a video call in which you see your child speaking. We will certainly move to needing strong authentication for interaction," Laiho told Yle in the summer of 2023.
Non-fiction author and active commentator on information security topics Petteri Järvinen has also warned about the rise of AI-assisted scams and recommended that families adopt a family password, for example for situations where one person asks another for money.
FBI/IC3 Public Service Announcement
Alert Number: I-120324-PSA
December 3, 2024
Criminals Use Generative Artificial Intelligence to Facilitate Financial Fraud
https://www.ic3.gov/PSA/2024/PSA241203
Tomi Engdahl says:
FuzzyAI: Open-source tool for automated LLM fuzzing
FuzzyAI is an open-source framework that helps organizations identify and address AI model vulnerabilities in cloud-hosted and in-house AI models, like guardrail bypassing and harmful output generation.
https://www.helpnetsecurity.com/2024/12/13/fuzzyai-automated-llm-fuzzing/
Tomi Engdahl says:
Gemini 2.0 Flash: An outstanding multi-modal LLM with a sci-fi streaming mode
https://simonwillison.net/2024/Dec/11/gemini-2/