3 AI misconceptions IT leaders must dispel

https://enterprisersproject.com/article/2017/12/3-ai-misconceptions-it-leaders-must-dispel?sc_cid=7016000000127ECAAY

 Artificial intelligence is rapidly changing many aspects of how we work and live. (How many stories did you read last week about self-driving cars and job-stealing robots? Perhaps your holiday shopping involved some AI algorithms, as well.) But despite the constant flow of news, many misconceptions about AI remain.

AI doesn’t think in our sense of the word at all, Scriffignano explains. “In many ways, it’s not really intelligence. It’s regressive.” 

IT leaders should make deliberate choices about what AI can and can’t do on its own. “You have to pay attention to giving AI autonomy intentionally and not by accident,” Scriffignano says.

6,254 Comments

  1. Tomi Engdahl says:

    OpenAI Introduces CriticGPT: A New Artificial Intelligence AI Model based on GPT-4 to Catch Errors in ChatGPT’s Code Output
    https://www.marktechpost.com/2024/06/28/openai-introduces-criticgpt-a-new-artificial-intelligence-ai-model-based-on-gpt-4-to-catch-errors-in-chatgpts-code-output/

    In the rapidly advancing field of Artificial Intelligence (AI), it is crucial to assess the outputs of models accurately. State-of-the-art AI systems, such as those built on the GPT-4 architecture, are trained via Reinforcement Learning with Human Feedback (RLHF). Because it is typically quicker and simpler for humans to evaluate AI-generated outputs than it is to create perfect examples, this approach uses human judgments to direct the training process. However, even specialists find it difficult to assess the accuracy and quality of these outputs consistently as AI models get more complex.

    To overcome this, OpenAI researchers have introduced CriticGPT, a tool that helps human trainers spot errors in ChatGPT’s responses. CriticGPT’s primary purpose is to produce thorough critiques that draw attention to mistakes, especially in code outputs. The model was created to overcome the inherent limitations of human review in RLHF: it offers a scalable supervision mechanism that improves the precision and dependability of AI systems.
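
    To make the idea concrete, here is a minimal sketch of layering a "critic" pass on top of a code-generating model, using the OpenAI Python SDK. The task string is invented, and "gpt-4o" is a public model standing in for the critic, since CriticGPT itself has not been released as an API.

    # Minimal sketch: generate code with one call, then have a second
    # "critic" call point out bugs. Assumes the openai package (>= 1.0)
    # and an OPENAI_API_KEY environment variable.
    from openai import OpenAI

    client = OpenAI()

    def generate_code(task: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Write Python code to {task}."}],
        )
        return resp.choices[0].message.content

    def critique(task: str, code: str) -> str:
        # "gpt-4o" is a stand-in here: the real CriticGPT is a GPT-4
        # variant fine-tuned specifically for producing critiques.
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": (
                f"Task: {task}\n\nCandidate code:\n{code}\n\n"
                "List concrete bugs or unhandled edge cases in this code."
            )}],
        )
        return resp.choices[0].message.content

    task = "parse ISO-8601 dates from a log file"
    print(critique(task, generate_code(task)))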

    Reply
  2. Tomi Engdahl says:

    TIME:
    Sam Altman and Arianna Huffington announce Thrive AI Health, a new startup backed by OpenAI and Thrive Global to build a “hyper-personalized AI health coach” — A staggering 129 million Americans have at least one major chronic disease—and 90% of our $4.1 trillion …

    AI-Driven Behavior Change Could Transform Health Care
    https://time.com/6994739/ai-behavior-change-health-care/

    A staggering 129 million Americans have at least one major chronic disease—and 90% of our $4.1 trillion in annual health care spending goes toward treating these physical and mental-health conditions. That financial and personal toll is only projected to grow.

    We know this is unsustainable. But there are solutions, because health outcomes are shaped by more than just medical care or genes. Behavior change can be a miracle drug, both for preventing disease and for optimizing the treatment of disease.

    Yes, behavior change is hard. But through hyper-personalization, it’s also something that AI is uniquely positioned to solve.

    AI is already greatly accelerating scientific progress in medicine—offering breakthroughs in drug development and diagnosis, and speeding up research on diseases like cancer. In fact, OpenAI is partnering with Color Health on an AI copilot to assist doctors in cancer screening and in creating treatment plans after a doctor has made a diagnosis.

    But humans are more than medical profiles. Every aspect of our health is deeply influenced by the five foundational daily behaviors of sleep, food, movement, stress management, and social connection. And AI, by using the power of hyper-personalization, can significantly improve these behaviors.

    These are the ideas behind Thrive AI Health, the company the OpenAI Startup Fund and Thrive Global are jointly funding to build a customized, hyper-personalized AI health coach that will be available as a mobile app and also within Thrive Global’s enterprise products. It will be trained on the best peer-reviewed science as well as Thrive’s behavior change methodology—including Microsteps, which are tiny daily acts that cumulatively lead to healthier habits.

    Reply
  3. Tomi Engdahl says:

    Bloomberg:
    IDC: ~3% of PCs shipped in 2024 will meet Microsoft’s AI PC processing-power threshold; a source says some big app makers rebuffed a push for on-device AI — App makers like Adobe and Salesforce haven’t yet signed on — Only a small share of PCs sold this year will be AI-optimized

    Qualcomm, Microsoft Lean on AI Hype to Spur PC Market Revival
    https://www.bloomberg.com/news/articles/2024-07-08/qualcomm-microsoft-lean-on-ai-hype-to-spur-pc-market-revival

    App makers like Adobe and Salesforce haven’t yet signed on
    Only a small share of PCs sold this year will be AI-optimized

    Reply
  4. Tomi Engdahl says:

    Kyle Wiggers / TechCrunch:
    Quora’s Poe launches Previews, letting users create web apps like data visualizations, games, and animations using more than one LLM like Llama 3 and GPT-4o

    Quora’s Poe now lets users create and share web apps
    https://techcrunch.com/2024/07/08/quoras-poe-now-lets-users-create-and-share-web-apps/

    Poe, Quora’s subscription-based, cross-platform aggregator for AI-powered chatbots like Anthropic’s Claude and OpenAI’s GPT-4o, has launched a feature called Previews that lets users create interactive apps directly in chats with chatbots.

    Previews allows Poe users to build data visualizations, games and even drum machines by typing things like “Analyze the information in this report and turn it into a digestible and interactive presentation to help me understand it.” The apps can be created using more than one chatbot (say, Meta’s Llama 3 and GPT-4o) and draw on info from uploaded files including videos, and can be shared with anyone via a link.

    Previews are a lot like Anthropic’s recently introduced Artifacts, dedicated workspaces where users can edit and add to AI-generated content like code and documents. But Artifacts is limited to Anthropic’s models, whereas Previews supports HTML output — with CSS and JavaScript functionality at the moment (and more to come in the future, Quora pledges) — from any chatbot.

    Reply
  5. Tomi Engdahl says:

    Maxwell Zeff / TechCrunch:
    Anthropic now lets developers use Claude 3.5 Sonnet to generate, test, and evaluate their prompts, and adds new features, like generating automatic test cases — Prompt engineering became a hot job last year in the AI industry, but it seems Anthropic is now developing tools to at least partially automate it.

    Anthropic’s Claude adds a prompt playground to quickly improve your AI apps
    https://techcrunch.com/2024/07/09/anthropics-claude-adds-a-prompt-playground-to-quickly-improve-your-ai-apps/

    Prompt engineering became a hot job last year in the AI industry, but it seems Anthropic is now developing tools to at least partially automate it.

    Anthropic released several new features on Tuesday to help developers create more useful applications with the startup’s language model, Claude, according to a company blog post. Developers can now use Claude 3.5 Sonnet to generate, test and evaluate prompts, using prompt engineering techniques to create better inputs and improve Claude’s answers for specialized tasks.

    Language models are pretty forgiving when you ask them to perform some tasks, but sometimes small changes to the wording of a prompt can lead to big improvements in the results. Normally you’d have to figure out that wording yourself, or hire a prompt engineer to do it, but this new feature offers quick feedback that could make finding improvements easier.

    The features are housed within Anthropic Console under a new Evaluate tab. Console is the startup’s test kitchen for developers, created to attract businesses looking to build products with Claude. One of the features, unveiled in May, is Anthropic’s built-in prompt generator; this takes a short description of a task and constructs a much longer, fleshed out prompt, utilizing Anthropic’s own prompt engineering techniques. While Anthropic’s tools may not replace prompt engineers altogether, the company said it would help new users, and save time for experienced prompt engineers.

    Within Evaluate, developers can test how effective their AI application’s prompts are in a range of scenarios. Developers can upload real-world examples to a test suite or ask Claude to generate an array of AI-generated test cases. Developers can then compare how effective various prompts are side-by-side, and rate sample answers on a five-point scale.

    In an example from Anthropic’s blog post, a developer identified that their application was giving answers that were too short across several test cases. The developer was able to tweak a line in their prompt to make the answers longer, and apply it simultaneously to all their test cases. That could save developers lots of time and effort, especially ones with little or no prompt engineering experience.

    Anthropic CEO and co-founder Dario Amodei has said prompt engineering is one of the most important things for widespread enterprise adoption of generative AI.
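
    The Evaluate tab is a GUI, but the loop it automates is easy to approximate directly against the API. Below is a minimal sketch using the Anthropic Python SDK; the prompt template and test cases are invented for illustration, and this is not the Console feature itself.

    # Sketch of a prompt test loop. Assumes the anthropic package and an
    # ANTHROPIC_API_KEY environment variable.
    import anthropic

    client = anthropic.Anthropic()

    PROMPT = "Summarize this support ticket in one sentence:\n\n{ticket}"

    # Stand-ins for the "real-world examples" a developer might upload.
    test_cases = [
        "My June invoice was charged twice; please refund one of the charges.",
        "The app crashes on startup since yesterday's update (iPhone 13, iOS 17).",
    ]

    for ticket in test_cases:
        reply = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=200,
            messages=[{"role": "user", "content": PROMPT.format(ticket=ticket)}],
        )
        # Compare outputs side by side, tweak PROMPT, and rerun the suite.
        print(ticket[:40], "->", reply.content[0].text)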

    Reply
  6. Tomi Engdahl says:

    The Information:
    Despite OpenAI’s move to shut out developers in China from its tech, they can still access its models via Azure China, Microsoft’s joint venture with 21Vianet
    https://www.theinformation.com/articles/openais-china-ban-doesnt-apply-to-microsofts-azure-china

    Reply
  7. Tomi Engdahl says:

    Rachel Metz / Bloomberg:
    Captions, which uses AI to let people create and edit videos, raised $60M led by Index Ventures at a $500M valuation, taking its total funding to $100M

    Index, Jared Leto Back AI Video Startup Valued at $500 Million
    https://www.bloomberg.com/news/articles/2024-07-09/ai-video-startup-is-valued-at-500-million-in-new-funding-round

    Kleiner, Andreessen, Sequoia also took part in the $60 million financing for Captions.

    Reply
  8. Tomi Engdahl says:

    Stephanie Palazzolo / The Information:
    Volley, which makes AI-based games played via voice commands on Amazon Alexa, Fire TV, and Roku TV, raised a $55M Series C, bringing its total funding to $75M+
    https://www.theinformation.com/articles/startup-volley-raises-55-million-to-create-voice-powered-ai-games

    Reply
  9. Tomi Engdahl says:

    Spain sentences 15 schoolchildren over AI-generated naked images
    https://www.theguardian.com/world/article/2024/jul/09/spain-sentences-15-school-children-over-ai-generated-naked-images

    Teenagers each given a year’s probation after creating and spreading faked images of female classmates in south-west Spain

    A court in south-west Spain has sentenced 15 schoolchildren to a year’s probation for creating and spreading AI-generated images of their female peers in a case that prompted a debate on the harmful and abusive uses of deepfake technology.

    Police began investigating the matter last year after parents in the Extremaduran town of Almendralejo reported that faked naked pictures of their daughters were being circulated on WhatsApp groups.

    The mother of one of the victims said the dissemination of the pictures on WhatsApp had been going on since July.

    “Many girls were completely terrified and had tremendous anxiety attacks because they were suffering this in silence,” she told Reuters at the time. “They felt bad and were afraid to tell and be blamed for it.”

    On Tuesday, a youth court in the city of Badajoz said it had convicted the minors of 20 counts of creating child abuse images and 20 counts of offences against their victims’ moral integrity.

    Each of the defendants was handed a year’s probation and ordered to attend classes on gender and equality awareness, and on the “responsible use of technology”.

    “The sentence notes that it has been proved that the minors used artificial intelligence applications to obtain manipulated images of [other minors] by taking girls’ original faces from their social media profiles and superimposing those images on the bodies of naked female bodies,” the court said in a statement. “The manipulated photos were then shared on two WhatsApp groups.”

    Police identified several teenagers aged between 13 and 15 as being responsible for generating and sharing the images.

    Under Spanish law minors under 14 cannot be charged but their cases are sent to child protection services, which can force them to take part in rehabilitation courses.

    In an interview with the Guardian five months ago, the mother of one of the victims recalled her shock and disbelief when her daughter showed her one of the images.

    “It’s a shock when you see it,” said the woman from Almendralejo. “The image is completely realistic … If I didn’t know my daughter’s body, I would have thought that image was real.”

    Spanish prosecutor to probe AI-generated images of naked minors
    https://www.theguardian.com/world/article/2024/jul/09/spain-sentences-15-school-children-over-ai-generated-naked-images

    MADRID, Sept 25 (Reuters) – A Spanish prosecutor’s office said on Monday it would probe whether AI-generated images of naked teenaged girls, allegedly created and shared by their peers in southwestern Spain, constituted a crime.
    The rise in use by children of such technologies has sparked widespread concern among parents worldwide. The U.S. Federal Bureau of Investigation warned in June that criminals were increasingly using artificial intelligence to create sexually explicit images to intimidate and extort victims.

    Reply
  10. Tomi Engdahl says:

    Can AI be Meaningfully Regulated, or is Regulation a Deceitful Fudge?
    https://www.securityweek.com/can-ai-be-meaningfully-regulated-or-is-regulation-a-deceitful-fudge/

    Few people understand AI: how to use it, how to control it, or where it is heading. Yet politicians wish to regulate it.

    Reply
  11. Tomi Engdahl says:

    “Fake it till you make it may work in Silicon Valley, but for the rest of us …”

    As long as people are worried that it’s a bubble, it’s not a bubble. Bubbles exist in euphoria.

    EXPERT WARNS THAT AI INDUSTRY DUE FOR HUGE COLLAPSE
    https://futurism.com/the-byte/expert-warns-ai-industry-collapse?fbclid=IwZXh0bgNhZW0CMTEAAR1PWxyn_IDANpVxGfcD-suG18nGawqNfv–5SeHKrCicB7yrX98WV6hsPI_aem_nH7wEzWIoTiFDbpVihqgeg

    Reply
  12. Tomi Engdahl says:

    Synthesized fats emit less than palm: https://ie.social/N3A2W

    ‘Eat fossil fuels’: Bill Gates-backed company makes butter out of thin air
    Savor uses chemistry to add hydrogen and oxygen molecules to CO2 and make fats like butter and milk.
    https://interestingengineering.com/innovation/butter-from-co2-us?utm_source=facebook&utm_medium=article_image

    Reply
  13. Tomi Engdahl says:

    AMD buys Finnish AI company
    https://www.uusiteknologia.fi/2024/07/10/amd-ostaa-suomalaisen-tekoaly-yhtion/

    The American chip company AMD is buying Finland’s Silo.ai for $665 million, a good six hundred million euros. With the deal, AMD aims to compete in AI compute silicon with Nvidia, which has been on a meteoric rise.

    Chipmaker AMD and Finland’s Silo AI announced today that they have signed an acquisition agreement worth roughly $665 million. The deal, worth a good six hundred million euros, will be paid in cash and is expected to close in the second half of 2024 once the necessary approvals are in place.

    With the Silo AI purchase, AMD is speeding up the development and deployment of AI models and software solutions built on its chips. The acquisition of the Finnish company is the second major step in the company’s strategy of delivering open-standards-based AI solutions for its silicon. In Finland, AMD-based systems are used, for example, in CSC’s LUMI supercomputer in Kajaani.

    “Silo AI has been a pioneer in scaling large language model training on LUMI, Europe’s fastest supercomputer, which has over 12,000 AMD Instinct MI250X GPUs,” says Pekka Manninen, Director of Science and Technology at CSC – IT Center for Science.

    Silo AI’s customers include Allianz, Philips, Rolls-Royce and Unilever, as well as Nokia from Finland.

    AMD invests in open large language models
    https://etn.fi/index.php/13-news/16396-amd-panostaa-avoimiin-suuriin-kielimalleihin

    Published 11 July 2024

    Processor giant AMD is buying the Finnish AI company Silo AI for $665 million, about €614 million. According to US reports, Silo AI’s developers will continue to work on open large language models.

    AMD discusses this in a Financial Times article. The question of closed versus open models has otherwise gone largely unnoticed in coverage of the acquisition; US commentary mostly weighs how AMD is using the deal to challenge Nvidia in AI graphics hardware.

    From a Finnish perspective the deal is very interesting, and not only because of the price. Silo AI, more precisely its generative-AI unit SiloGen, has developed the Poro and Viking models together with academic researchers. Poro 34B is the more capable of the two, with 34.2 billion parameters. In tests, Poro has proven to be the most capable open LLM.

    Poro is open, licensed under the Apache 2.0 license, so it can be used for both academic and commercial purposes. It was trained on a dataset of a trillion tokens; a token can correspond to a word, a character, or in some cases an entire sentence or phrase.

    Like the later-announced Viking 34B, Poro was trained on the LUMI supercomputer in Kajaani, on 512 of AMD’s MI250X graphics processors to be exact.
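
    For readers unfamiliar with the term, the quickest way to get a feel for tokens is to run a tokenizer. A small sketch using the tiktoken package (OpenAI’s tokenizer; Poro and Viking use their own vocabularies, so this is only an analogy):

    # Quick tokenization demo, assuming the tiktoken package is installed.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("LUMI is Europe's fastest supercomputer.")
    # Common words usually map to a single token; rarer words are split up.
    print(len(ids), "tokens:", [enc.decode([i]) for i in ids])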

    Reply
  14. Tomi Engdahl says:

    https://etn.fi/index.php/13-news/16395-samsung-panostaa-tekoaelyyn-myoes-taivuteltavissa

    Samsung unveiled a batch of new products in Paris today. In the brightest spotlight were the sixth-generation Galaxy Z Fold 6 and Galaxy Z Flip 6. AI is becoming an ever more integrated part of these genuinely premium devices as well, and many of the implementations are quite successful.

    Reply
  15. Tomi Engdahl says:

    Michael Hennessey / Bloomberg:
    Paris-based Bioptimus releases H-optimus-0, an open source AI model trained on hundreds of millions of images to aid research and diagnoses of diseases

    France’s Bioptimus Releases AI Model for Disease Diagnosis
    https://www.bloomberg.com/news/articles/2024-07-10/french-startup-bioptimus-releases-ai-model-for-disease-diagnosis

    The model is trained on hundreds of millions of images
    Firm is among those racing to use AI for medical advances

    The French startup Bioptimus is releasing an artificial intelligence model trained on hundreds of millions of images that, it said, will aid in the research and diagnoses of diseases.

    The model, called H-optimus-0, is capable of performing complex tasks including identifying cancerous cells and detecting genetic abnormalities in tumors, the Paris-based company said in a statement. Bioptimus described the system as the largest model

    Reply
  16. Tomi Engdahl says:

    Mark Bergen / Bloomberg:
    Helsing, which makes AI tools for defense, raised a €450M Series C led by General Catalyst, a source says at a €4.95B valuation, taking its funding to €769M

    Defense Startup Helsing Raises at €5 Billion Valuation to Expand Along NATO’s Eastern Flank
    https://www.bloomberg.com/news/articles/2024-07-11/defense-startup-helsing-nets-5-billion-valuation-plans-eastern-flank-expansion

    Company formed in 2021 has now raised nearly €770 million
    Helsing will open entity in Estonia as part of Baltic push

    Helsing, a startup developing artificial intelligence software for defense, has raised €450 million ($487 million) in venture capital funding that it plans to use to expand its presence in European nations bordering Russia.

    Formed in 2021, Helsing makes software designed to boost weapons capabilities, such as drones and jet fighters, and improve battlefield decisions. The company, which plans to announce its financing on Thursday, says it’s been active in Ukraine since 2022 and is in talks with countries on NATO’s eastern flank — Baltic and eastern European nations that face growing aggression from Russia.

    Reply
  17. Tomi Engdahl says:

    Rachel Metz / Bloomberg:
    A look at the rise in use of the sparkles emoji to market AI products as magic, as some say the imagery distracts from real-world issues the AI industry faces — Have you noticed artificial intelligence is looking a lot more sparkly lately? But first… Do you believe in magic?

    Has Artificial Intelligence Co-opted the Sparkle Emoji?
    https://www.bloomberg.com/news/newsletters/2024-07-10/openai-google-adobe-and-more-have-embraced-the-sparkle-emoji-for-ai?accessToken=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzb3VyY2UiOiJTdWJzY3JpYmVyR2lmdGVkQXJ0aWNsZSIsImlhdCI6MTcyMDYyNTQxNSwiZXhwIjoxNzIxMjMwMjE1LCJhcnRpY2xlSWQiOiJTR0VNVEFEV1gyUFMwMCIsImJjb25uZWN0SWQiOiIyMjNDRDM2NDg0QzY0OTc3QjY5ODE0Rjc1MTYxNDRGNyJ9.j8SHCP8Ndarutj-5K8zArPi0-Zik_EVoF-7IkwLOpOg

    Have you noticed artificial intelligence is looking a lot more sparkly lately? But first…

    Do you believe in magic?

    Over the past year, versions of the sparkles emoji have popped up all over the AI landscape, particularly when it comes to marketing AI products to consumers. Alphabet Inc.’s Google uses a blue version of it to denote content produced by its Gemini chatbot. OpenAI uses slightly different sparkles to differentiate between the AI models that power ChatGPT. Microsoft Corp.’s LinkedIn has its own variety of sparkle adorning suggested questions to ask a chatbot on the social network. And Adobe Inc.’s take on the icon beckons users to generate AI images with its Firefly software.

    It makes sense that companies would want to make it clear, in many cases, when a service is using generative AI. The technology is still new to many of us, not everyone wants to use it, it may not always do what you expect, and it can make mistakes.

    In the past, we’ve unthinkingly accepted the icons associated with different software features and technologies. Think a paper clip for attaching an image to an email or a cloud for cloud computing. But opinions are divided about whether the sparkles emoji is the right one for this tectonic technological shift.

    In some cases, these sparkles are, well, sparking anger. “I straight up REFUSE to let AI take the sparkle emoji from me,” one person recently posted on X. Other commentators, YouTubers and outlets have also noticed and sometimes protested the trend.

    The icons strike Luke Stark, an assistant professor at Western University in Ontario, Canada, who studies the impacts of AI and other technologies, as a way for companies to bring up magical imagery, which “ties these products to the unreality and wonder produced by science fiction stories.”

    References to magic are nothing new in technology marketing. Apple Inc., for instance, has been talking about its “magical” products for years, including, most recently, its new AI platform, Apple Intelligence. (On the hardware side, the company also currently sells a $99 keyboard that is literally called Magic Keyboard).

    But for Stark, the imagery distracts from the real-world issues facing the AI industry. For example, he notes the controversial uses of human labor and troves of creative content that are necessary for making the magic happen.

    Kristy Tillman, who has worked as a design director for major tech companies, also thinks the sparkles’ connotation of magic — which AI is not — isn’t a good thing. But she believes that regardless of the imagery, some people will still have “charged feelings” about it, since the emoji has become a proxy for peoples’ emotions about the spread of AI itself.

    AI sparkles have infiltrated corporate America, appearing in services like Zoom, Spotify and the Washington Post. “People just have lots of opinions, rightfully so, about a technology they didn’t really ask for,” Tillman said.

    But while she thinks there’s probably a better way to communicate that a product uses AI, Tillman said companies’ takes on the sparkles emoji are helpful because they’re creating a de facto standard that everyone can recognize and understand.

    This is much better than if different companies came up with their own symbols for AI, she said, and speaks to the challenges designers face when deploying software for millions of people across cultures, devices, and languages.

    “I bet the first person who used it had no idea they were setting a standard,” she said.

    https://emojipedia.org/sparkles
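
    For reference, the emoji in question is a single Unicode character, U+2728 SPARKLES, which a Python one-liner can confirm:

    import unicodedata
    print("\u2728", unicodedata.name("\u2728"))  # -> ✨ SPARKLES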

    Reply
  18. Tomi Engdahl says:

    Shubham Sharma / VentureBeat:
    AWS launches App Studio, which is a generative AI-powered service letting enterprise users create internal apps using text prompts, in public preview

    AWS App Studio turns text into enterprise apps in minutes
    https://venturebeat.com/ai/aws-app-studio-turns-text-into-enterprise-apps-in-minutes/

    Today, Amazon Web Services (AWS) kicked off its annual summit in New York and announced a bunch of new product developments, including an App Studio aimed at democratizing how enterprise-grade applications are created.

    Available under a limited preview, the App Studio is a generative AI-driven service that enables enterprise users to build scalable internal applications by simply describing what they need in natural language. The offering works on the command provided, uses the directed data sources and produces the required application in a matter of minutes while taking care of all critical coding-related aspects, right from testing and deployment to operation and maintenance.

    “AWS App Studio opens up application development to an entirely new set of builders, helping them create enterprise-grade applications in minutes,” Dilip Kumar, vice president of applications at AWS, said in a statement. He said the technology is “a force multiplier” for technical employees at the largest enterprises and fastest-growing startups.

    The company claims App Studio is better than not only traditional development but also most low-code development tools, which do the job but fail to produce fully secure apps that comply with enterprise privacy and security policies. Multiple enterprises have already signed up for it and are using it to build applications capable of handling internal processes and workflows.

    As such, it competes with other no-code and low-code enterprise app creation platforms such as Creatio’s Quantum and Salesforce Platform.

    The problem facing enterprises today

    Today, enterprise operations – no matter the sector – depend on data-heavy internal processes and workflows (imagine inventory tracking).

    Small teams handle these processes through spreadsheets and documents, but it becomes a problem of scale in no time. As the company grows, users find it difficult and time-consuming to maintain the docs.

    Tailor-made applications that connect to systems of record can solve this, although building, deploying, running, and maintaining them has been a resource-intensive task in itself.

    If a team hires expert developers to build the app from scratch, it can take several days to bring the product to life. On the other hand, if they use low-code tools for the task, the results may not be fully up to the mark, with security and scalability gaps.

    New apps in minutes, no dev experience required

    With the new App Studio, AWS is trying to solve this problem by providing enterprises with a generative AI-powered application development interface.

    As the company describes, all a user has to do is simply describe the app they need, what they want to do and the data sources they want it to integrate with (from AWS to third-party sources like Zendesk).

    In a matter of minutes, the underlying coding models (details of which remain undisclosed) will process these inputs and build the desired professional-grade application, complete with the required user interface and workflows for testing and deployment.

    For instance, when a user selects specific data sources using a drop-down menu and writes “create an app for tracking inventory across stores,” the App Studio will generate an outline to verify the user’s intent and build an application with a multi-page UI, a data model and business logic.

    The offering also comes with a conversational AI assistant which provides detailed guidance on modifying different elements of the generated application using a point-and-click interface. This way, users can tweak the application generated, depending on their respective needs.

    Secure app customization, sharing and controls

    Once the app is finalized, the developer can hit the “generate data” option to see how the application will handle information in real time and move to deploy it to end users. This will make the app’s custom link available to downstream users, allowing them to access it using their existing enterprise authentication tools and role-based access controls.

    AWS notes that the applications deployed via App Studio are not only secure and scalable but also transparent enough to give IT teams a clear picture of how the app is being used as well as options to control user and data access and set guardrails to maintain compliance with internal policies.

    “Using natural language, any user with some technical experience can simply describe the application they want to build, and App Studio takes care of the development process, delivering an application that employees can start using immediately. It has never been easier for technical professionals to build custom applications tailored to the unique needs of their business, ushering in a new world of productivity for businesses of all sizes,” Kumar added in the statement.

    Available in preview

    While the offering is interesting, especially considering how important application development is today, it is important to note that not every AWS customer can use it right away.

    As of now, the company is providing the App Studio to customers in the U.S. West (Oregon) region under a limited preview.

    Reply
  19. Tomi Engdahl says:

    Evan Gorelick / Bloomberg:
    OpenAI and Los Alamos National Laboratory announce a partnership to evaluate how multimodal AI models can be used safely by scientists in laboratory settings

    OpenAI Partners With Los Alamos to Test AI’s Value for Lab Work
    https://www.bloomberg.com/news/articles/2024-07-10/openai-partners-with-los-alamos-to-test-ai-s-value-for-lab-work

    Study will look at risks, rewards of using AI for research
    AI startup has made several recent health and biotech deals

    Reply
  20. Tomi Engdahl says:

    How often does ChatGPT answer programming questions incorrectly?
    Answer: 52 percent of the time.
    https://www.govtech.com/question-of-the-day/how-often-does-chatgpt-answer-programming-questions-incorrectly

    If you’re turning to ChatGPT to help you with computer programming, you may want to be extra careful to double-check its answers. A new study has found that 52 percent of the popular chatbot’s answers to computer programming questions contain inaccurate information.

    The study was conducted by Purdue University and presented at the Computer-Human Interaction Conference in Hawaii this month. The researchers looked at 517 programming questions on Stack Overflow before feeding them to ChatGPT.

    In addition to over half of the bot’s answers containing incorrect information, they found that 77 percent of the answers were verbose. What’s more, the programmers who participated in the study did not always catch the bot’s inaccuracies, overlooking “the misinformation in the ChatGPT answers 39 percent of the time.” However, ChatGPT is still a favorite among programmers, with the study participants preferring “ChatGPT answers 35 percent of the time due to their comprehensiveness and well-articulated language style.”
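
    Combining those two numbers is worth spelling out: if 52% of answers contain misinformation and readers miss it 39% of the time, then roughly one answer in five is both wrong and accepted as correct. A rough multiplication, assuming the study’s conditions generalize:

    incorrect = 0.52   # share of ChatGPT answers containing incorrect info
    overlooked = 0.39  # share of incorrect answers whose errors went unnoticed
    print(f"{incorrect * overlooked:.0%}")  # -> 20% wrong and unnoticed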

    Analysis of ChatGPT answers to 517 programming questions finds 52% of ChatGPT answers contain incorrect information. Users were unaware there was an error in 39% of cases of incorrect answers.
    https://www.reddit.com/r/science/comments/1cwhx0a/analysis_of_chatgpt_answers_to_517_programming/

    This is pretty consistent with the use I’ve gotten out of it. It works better on well-known issues. It is useless on harder, less well-known questions.

    The more niche the question, the more gibberish they churn out.

    One of the biggest problems I’ve found was contextualization across multiple answers. Like giving me valid example code throughout a few answers that wouldn’t work together because some parameters weren’t compatible with each other even though syntax was fine.

    Look at the translation industry if you want to know what will end up happening here. “AI” will handle the easy part and professionals will be paid the same rates to handle the hard parts, even though that rate was set with the assumption that the time needed for the complex things would be balanced out by the comparative speed on easy things.

    As an experienced programmer I find LLMs (mostly chatgpt and GitHub copilot) useful but that’s because I know enough to recognize bad output. I’ve seen colleagues, especially less experienced ones, get sent on wild goose chases by chatgpt hallucinations.

    This is part of why I’m concerned that these things might eventually start taking jobs from junior developers, while still requiring the seniors. But with no juniors there’ll eventually be no seniors…

    Exactly this. I had real trouble explaining a problem to it once. A human would have gotten it. But each iteration I tried a different angle or adding more information. The response deteriorated continuously. In the end it would have been faster to just brute force and debug.

    It’s not just programming. I ask it a variety of questions about all sorts of topics, and I constantly notice blatant errors in at least half of the responses.

    These AI chatbots are a wonderful invention, but they are COMPLETELY unreliable. The fact that the corporations using them put in a tiny disclaimer saying it’s “experimental” and to double-check the answers really underplays the seriousness of the situation.

    Since they are only correct some of the time, these chatbots cannot be fully trusted at any time, which renders them completely useless.

    I haven’t seen much improvement in this area in the last few years. They have gotten more elaborate at providing lifelike responses, and the writing quality has improved substantially, but accuracy still sucks.

    I hate that the AI is often shoved in my face. I don’t want crappy AI answers at the top of my browser, or god forbid it takes up my entire page because I just wanted to scroll to the top and search for something else.

    One trains a machine to produce plausible-sounding text, then one wonders when the machine bullshits (in the technical sense).

    I’m a lawyer and I’ve asked ChatGPT a variety of legal questions to see how accurate it is. Every single answer was wrong or missing vital information.

    I’m not a lawyer but I can tell you legal questions are a pretty poor application of LLMs. Most have limited access to training on legal matters and are probably just pulling random armchair lawyer bs off forums and news articles. They aren’t really designed to give factual information about specific fields.

    This is something I see intimately with programming-related questions on every AI out there.

    One of the big problems I see is that I get outdated information, or results in the wrong version of a language or for a different platform. I also get a whole lot of “I don’t know how to do that, but this sounds similar.”

    The other problem is that the more complicated your ask, the more likely there are errors.

    Simple one-liners they get right, as long as the APIs or functions haven’t changed.

    For more complicated tasks, say ones that include error handling or secure practices, be extremely leery and skeptical of the responses, because most of them are going to be wrong for your use case.

    GitHub and those sources are a similar mess of quality; just like the information on the internet generally, most of what is there is horrendous coding.

    This is why I fear this wave of early adoption and later advertising-based monetization.

    These tools generate a whole lot of wrong answers. Additionally, they are extremely lossy and wasteful of hardware resources, and we’re still a long way away from any real human-like intelligence.

    For some tasks they can be lean and give better results, but those are very specific tasks with few variables compared to these 100-billion-entry models.

    That’s about ChatGPT-3.5 (and only talks about “containing incorrect information,” not being incorrect as such or not useful). ChatGPT-4 or 4o are much, much better.

    I agree with this. And you can literally train them. I have been training a bot to help me – feeding it public work documentation to start, then correcting its answers as it continues. It keeps improving!

    Plus tool use. Give GPT the option to run web searches, to extract information from documentation documents etc. and the response quality increases dramatically.

    Keep in mind it won’t remember anything past its context window (unless it’s GPT-4o, and even then only what the bot chooses to remember). So within the context window, it will keep learning, but once it leaves that window, it’s as if you never told it.
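
    A minimal sketch of what “leaving the context window” means mechanically: the client resends the conversation on every turn, and once old turns are trimmed to fit the limit, the model has no trace of them. (Real clients count tokens rather than characters; the numbers here are placeholders.)

    MAX_CHARS = 2000  # stand-in for the model's real token limit

    history = []  # chat turns as {"role": ..., "content": ...} dicts

    def add_turn(role: str, content: str) -> None:
        history.append({"role": role, "content": content})
        # Drop the oldest turns until the conversation fits the window again.
        while sum(len(m["content"]) for m in history) > MAX_CHARS:
            history.pop(0)  # whatever is dropped here is gone for good:
                            # only what remains in history reaches the model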

    I found it to be great for getting started on new tech or solving common issues. Once you’re a little bit into it, it just consistently gets something wrong every time. Even when you tell it what the error is, often the fix will introduce something else.

    Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions
    https://dl.acm.org/doi/pdf/10.1145/3613904.3642596

    Reply
  21. Tomi Engdahl says:

    https://news.ycombinator.com/item?id=40465787

    Study finds that 52% of ChatGPT answers to programming questions are wrong (futurism.com)

    If 52% of responses have a flaw somewhere, then 48% of responses are flawless.

    That is amazing and important.

    The headline should be “LLM gives flawless responses to 48% of coding questions.”

    There are articles every day about how AI is replacing programmers, coding is dead, etc., including from the Nvidia CEO this week. This kind of thing shows we are not quite there yet. There are lots of folks on Twitter etc. who rave about how genAI built a full app for them, but in my experience that comes with a huge amount of human trial and error, and the understanding to know what needs to be tweaked.

    > This kind of thing shows we are not quite there yet

    I think you need to consider the time it took to go from 100% wrong, to 90, to 80, and so on. My guess is that interval is shrinking from milestone to milestone. This causes me to suspect that folks starting SWE careers in 2024 will likely no longer be SWEs by 2030.

    That’s why I tell my grandkids they should consider plumbing and HVAC trades instead of college. My bet is that within 10 years nearly every vocation that requires a college degree will be made partially or completely obsolete by AI.

    I tell my grandkids that vocational school is a perfectly decent and honorable way to get into a trade that pays better than retail.

    I also tell them that a good university is a perfectly decent and honorable way to begin a life of the mind. It’s not the only way, and a life of the mind isn’t the life everyone wants.

    I also tell them that the purpose of a university education is mostly not about training for a job.

    I’ve never been that fond of SO, but I find ChatGPT very useful. SO tends to be for simpler, one-time questions. My interactions with GPT are conversations working towards a solution.

    LLMs totally have a beginner problem. It’s much like the problem where a beginner knows they need to look something up but can’t figure out the right keywords to search for.

    Also chatgpt has never called me an idiot for asking a stupid question, having not read the question properly, and making the assumption it was the same as an existing question after skimming it. I wouldn’t ask SO a question these days, the response is more likely to be toxic than helpful.

    LLMs have been trained on these answers, and can generate “it depends” too. Sometimes they’re even too patronising and non-committal.

    Chat interface has an advantage of having user-specific context and follow up questions, so it can filter and refine answers for the user.

    With StackOverflow search it’s up to the user to judge whether the answer they’ve found applies to their situation.

    Getting some results with the help of infinitely-patient GPT may motivate people to learn more, as opposed to losing motivation from getting stuck, having trouble finding right answers without knowing the right terminology, and/or being told off by StackOverflow people that’s a homework question.

    People who want to grow, can also use GPT to ask for more explanations, and use it as a tutor. It’s much better at recalling general advice.

    And not everyone may want to grow into a professional developer. GPT is useful to lots of people who are not programmers, and just need to solve programming-adjacent problems, e.g. write a macro to automate a repetitive task, or customize a website.

    The same could be said of wrong Stack Overflow answers or random Google results. Clearly they’ll become critical of the results if the code simply doesn’t compile, the same way our generation sharpened our skills by filtering the good from the bad in Google results.

    I’ve been saying from the start that this is not a tool for beginners and learners. My students use it constantly and I keep telling them when they go to chat GPT for answers, it’s like they are going to a senior for help — they know a lot but they are often wrong in subtle and important ways.

    That’s why classes are taught by professors and not undergrads. Professors are at least supposed to know what they don’t know.

    When students think of ChatGPT as their drunk frat bro they see doing keg stands at the Friday basement party rather than as an expert they use it differently.

    “Additionally, this work has used the free version of ChatGPT (GPT-3.5)”

    This is a critical detail. GPT-4 is much better than 3.5 for programming, in my experience.

    Yeah I really don’t understand why research is still being published that uses GPT3.5 rather than GPT4 or both models. ~500 programming questions is maybe a few bucks on the API?

    Because 99% of users and probably 95% of programmers are using the free version while almost no one is using the paid version.

    I use ChatGPT for coding constantly and the 52% error rate seems about right to me. I manually approve every single line of code that ChatGPT generates for me. If I copy-paste 120 lines of code that ChatGPT has generated for me directly into my app, that is because I have gone over all 120 lines with a fine-toothed comb, and probably iterated 3-4 times already. I constantly ask ChatGPT to think about the same question, but this time with an additional caveat.

    I find ChatGPT more useful from a software architecture point of view and from a trivial code point of view, and least useful at the mid-range stuff.

    It can write you a great regex (make sure you double-check it) and it can explain a lot of high-level concepts in insightful ways, but it has no theory of mind — so it never responds with “It doesn’t make sense to ask me that question — what are you really trying to achieve here?”, which is the kind of thing an actually intelligent software engineer might say from time to time.
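
    “Double-check the regex” is easy to operationalize: wrap whatever the model hands back in a few unit tests before trusting it. A sketch, with the kind of ISO-date pattern a chatbot might plausibly return (the pattern is an invented example):

    import re

    # A plausible LLM-suggested pattern for ISO dates (YYYY-MM-DD).
    ISO_DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

    assert ISO_DATE.match("2024-07-11")
    assert not ISO_DATE.match("2024-13-01")  # month out of range
    assert not ISO_DATE.match("2024-7-11")   # missing zero padding
    # And the subtle flaw a human reviewer still has to catch: the
    # pattern happily accepts impossible dates like February 30th.
    assert ISO_DATE.match("2024-02-30")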

    ChatGPT isn’t the best coding LLM. Claude Opus is.

    Also, since you can always tell empirically whether a coding response works, mistakes are much more easily spotted than in other forms of LLM output.

    Debugging with AI is more important than prompting. It requires an understanding of the intent which allows the human to prompt the model in a way that allows it to recognize its oversights.

    Most code errors from LLMs can be fixed by them. The problem is an incomplete understanding of the objective which makes them commit to incorrect paths.

    Being able to run code is a huge milestone. I hope the GPT5 generation can do this and thus only deliver working code. That will be a quantum leap.
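
    That loop can be prototyped today by executing the model’s output and feeding any traceback back into the next prompt. A bare-bones sketch (running untrusted generated code is risky; a real system would sandbox it):

    import subprocess
    import sys
    import tempfile

    def try_run(code: str) -> str | None:
        """Run generated code; return None on success, else the stderr text."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=10
        )
        return None if result.returncode == 0 else result.stderr

    error = try_run("print(undefined_name)")
    if error:
        # Append this to the next prompt so the model can repair its code.
        print("Feed back to the model:", error.splitlines()[-1])
        # -> NameError: name 'undefined_name' is not defined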

    Can someone email the author and explain what an LLM is?

    People asking for ‘right’ answers, don’t really get it. I’m sorry if that sounds abrasive, but these people give LLMs a bad name due to their own ignorance/malice.

    I remember having some Amazon programmer trash LLMs for ‘not being 100% accurate’. It was really an iD10t error. LLMs aren’t used for 100% accuracy. If you are doing that, you don’t understand the technology.

    There is a learning curve with LLMs, and it seems a few people still don’t get it.

    The real problem is that it’s not marketed that way. WE may understand that but most people, heck even in my experience a large percentage of tech people, don’t. They think there is some kind of true intelligence (it’s literally in the name) behind it. Just like I also understand that the top results on Google are not always the best.. but my parents don’t.

    > LLMs aren’t used for 100% accuracy.

    I think you’re wrong about that. They shouldn’t be, but they clearly are.

    That article links to the actual paper, the abstract of which is itself quite readable: https://dl.acm.org/doi/pdf/10.1145/3613904.3642596

    > Q&A platforms have been crucial for the online help-seeking behavior of programmers. However, the recent popularity of ChatGPT is altering this trend. Despite this popularity, no comprehensive study has been conducted to evaluate the characteristics of ChatGPT’s answers to programming questions. To bridge the gap, we conducted the first in-depth analysis of ChatGPT answers to 517 programming questions on Stack Overflow and examined the correctness, consistency, comprehensiveness, and conciseness of ChatGPT answers. Furthermore, we conducted a large-scale linguistic analysis, as well as a user study, to understand the characteristics of ChatGPT answers from linguistic and human aspects. Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose. Nonetheless, our user study participants still preferred ChatGPT answers 35% of the time due to their comprehensiveness and well-articulated language style. However, they also overlooked the misinformation in the ChatGPT answers 39% of the time. This implies the need to counter misinformation in ChatGPT answers to programming questions and raise awareness of the risks associated with seemingly correct answers.

    Related presentation video on the CHI 2024 conference page:

    https://programs.sigchi.org/chi/2024/program/content/146667

    Reply
  22. Tomi Engdahl says:

    ChatGPT is wrong a lot when it answers programming questions, study says
    Programmers in the study would also often overlook the misinformation
    https://qz.com/chatgpt-answers-wrong-programming-openai-study-1851500242

    Artificial intelligence chatbots like OpenAI’s ChatGPT are being sold as revolutionary tools that can help workers become more efficient at their jobs, perhaps replacing those people entirely in the future. But a stunning new study has found ChatGPT answers computer programming questions incorrectly 52% of the time.

    “Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose,” the new study explained. “Nonetheless, our user study participants still preferred ChatGPT answers 35% of the time due to their comprehensiveness and well-articulated language style.”

    Disturbingly, programmers in the study didn’t always catch the mistakes being produced by the AI chatbot.

    Reply
  23. Tomi Engdahl says:

    Scientists find ChatGPT is inaccurate when answering computer programming questions
    https://techxplore.com/news/2024-05-scientists-chatgpt-inaccurate.html

    Alarmingly, the team found that user study participants preferred the answers given by ChatGPT 35% of the time. The researchers also found that the same users reading the answers given by ChatGPT quite often did not catch the mistakes that were made—they overlooked wrong answers 39% of the time.

    Reply
  24. Tomi Engdahl says:

    OpenAI’s CriticGPT Catches Errors in Code Generated by ChatGPT
    https://www.infoq.com/news/2024/07/openai-criticgpt/

    OpenAI recently published a paper about CriticGPT, a version of GPT-4 fine-tuned to critique code generated by ChatGPT. When compared with human evaluators, CriticGPT catches more bugs and produces better critiques. OpenAI plans to use CriticGPT to improve future versions of their models.

    When originally developing ChatGPT, OpenAI used human “AI trainers” to rate the outputs of the model, creating a dataset that was used to fine-tune it using reinforcement learning from human feedback (RLHF). However, as AI models improve, and can now perform some tasks at the same level as human experts, it can be difficult for human judges to evaluate their output. CriticGPT is part of OpenAI’s effort on scalable oversight, which is intended to help solve this problem. OpenAI decided first to focus on helping ChatGPT improve its code-generating abilities. The researchers used CriticGPT to generate critiques of code; they also paid qualified human coders to do the same. In evaluations, AI trainers preferred CriticGPT’s critiques 80% of the time, showing that CriticGPT could be a good source for RLHF training data.

    Reply
  25. Tomi Engdahl says:

    Rachel Metz / Bloomberg:
    OpenAI creates five levels to track its progress toward AGI: Chatbots, Reasoners, Agents, Innovators, and Organizations, and says it’s nearly at level two — The company believes its technology is approaching the second level of five on the path to artificial general intelligence

    OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving
    https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai

    The company believes its technology is approaching the second level of five on the path to artificial general intelligence

    Reply
  26. Tomi Engdahl says:

    “Experts are growing increasingly concerned over early signs that the frenzy surrounding AI could collapse in on itself — a bubble which, if it bursts, could end in disaster.”

    EXPERT WARNS THAT AI INDUSTRY DUE FOR HUGE COLLAPSE
    https://futurism.com/the-byte/expert-warns-ai-industry-collapse?fbclid=IwZXh0bgNhZW0CMTEAAR1emzxc7vaWgytaLNHFWSbLQkhLhZFixezFSwSpGB58ytVpm_iRHojrf-s_aem_ftmCOGErPvxifzEh9i-KKQ

    Reply
  27. Tomi Engdahl says:

    “AI still remains, I would argue, completely unproven,” he told Somerset Webb, as quoted by Fortune. “And fake it till you make it may work in Silicon Valley, but for the rest of us, I think once bitten twice shy may be more appropriate for AI.”

    And it’s not just Ferguson who has voiced concerns.

    “This is precisely what happened with the Internet in 1999, autonomous driving in 2017 and now generative AI in 2024,” tech stock analyst Richard Windsor wrote in a March research note.

    Even tech leaders in the industry are warning things could end badly. Former Stability AI CEO Emad Mostaque warned bankers last summer that “I think this will be the biggest bubble of all time.”

    “I call it the ‘dot AI’ bubble, and it hasn’t even started yet,” he added.

    Ferguson noted some glaring shortcomings with the technology, including “hallucinations,” a term used to denote lies dreamed up by large language models like OpenAI’s GPT-4. It’s a problem that has persisted to this day, with some experts arguing that it’s an intrinsic quality of the tech, meaning that it may never be solved.

    “If AI cannot be trusted,” Ferguson told Somerset Webb, “then AI is effectively, in my mind, useless.”

    Then there’s the sheer amount of electricity needed to train and maintain these AI models, making them too “energy hungry.” Last week, for instance, news emerged that Google’s emissions rose by almost 50 percent in five years, a trend driven by the company’s substantial investments in the AI space, and one that puts it well off track from its own climate targets.

    https://futurism.com/the-byte/expert-warns-ai-industry-collapse?fbclid=IwZXh0bgNhZW0CMTEAAR1emzxc7vaWgytaLNHFWSbLQkhLhZFixezFSwSpGB58ytVpm_iRHojrf-s_aem_ftmCOGErPvxifzEh9i-KKQ

    Reply
  28. Tomi Engdahl says:

    Nothing will ever compare to the late 90s. Everyone wants the next big thing, but these are just improvements to the foundational technology of being connected via the WWW. Cars didn’t have windshields when they first came out. Don’t be fooled: there will never be an opportunity like the late 90s in that realm ever again. Blockchain, NFTs, LLM AI, and next self-aware AI: these will all be tech trends that people rush to try to get rich off of, and lose everything.

    Reply
  29. Tomi Engdahl says:

    Current AIs only have the IQ level of a cat, asserts Google DeepMind CEO

    https://www.tomshardware.com/tech-industry/artificial-intelligence/current-ais-only-have-the-iq-level-of-a-cat-asserts-google-deepmind-ceo?utm_source=facebook.com&utm_campaign=socialflow&utm_content=tomsguide&utm_medium=social&fbclid=IwZXh0bgNhZW0CMTEAAR3urZYdQSvi-ANRgU9Oe8lQrnQx_bCresSwOV-jQK7llsLXGGRVX_-FYGM_aem_TFWhyFCjLofpeI1FsXBCTg

    The CEO of Google DeepMind has compared the IQ levels of contemporary artificial intelligence (AI) agents to domestic cats. “We’re still not even at cat intelligence yet, as a general system,” remarked Hassabis, answering a question about DeepMind’s progress in artificial general intelligence (AGI). However, research is progressing fast, with some huge cash and compute investments propelling it forward. Some expect it to eclipse human intelligence in the next half-decade.

    Demis Hassabis, the co-founder and CEO of Google DeepMind, made the artificial intelligence vs. cat IQ comparison in a public discussion with Tony Blair, one of Britain’s ex-Prime Ministers. The talk was part of the Future of Britain Conference 2024, organized by the Institute for Global Change.

    Hassabis highlights that his work is not focused on AI but on AGI.

    On the potential of AI to shape our lives, Hassabis boldly reckons that it will be as big as the Industrial Revolution or the harnessing of fire or electricity. In the future, and more specifically, the DeepMind CEO thinks one of the most exciting ways AI will become a leading light will be accelerating scientific discovery in energy, materials science, health care, climate, and mathematics – and it is already doing this. Interestingly, he said we were all talking about ‘big data’ in the noughties, and AI systems are the answer.

    Hassabis took the opportunity to plug a DeepMind project called Project Astra. This removes AI from the restrictions of being a mere Chatbot like ChatGPT or Google Gemini, with much more awareness of a user’s situation, environment, preferences, history, and so on. In this way, Project Astra aims to deliver a ‘universal AI agent’ that is helpful in everyday life.

    Meanwhile, the biggest hurdles remaining for outfits like DeepMind and human-level AGI achievements include planning, memory, tool use, and smart questioning. The DeepMind CEO knows that big breakthroughs and compute scaling are still needed for AGIs to achieve human IQ levels.

    Reply
  30. Tomi Engdahl says:

    The company’s “most sophisticated image-generating model to date.”

    THE NEW STABLE DIFFUSION IS PRODUCING HORRIFIC MANGLED HUMAN BODIES
    https://futurism.com/the-byte/new-stable-diffusion-is-mangled?fbclid=IwZXh0bgNhZW0CMTEAAR3LPK-5FrqIYnFJ07HO0ELOitXeeOefK3uqmvA4FQQedtw7EHkCMqDL6K0_aem_oT36ZhokaguEVswktyL8Tg

    After a chaotic few months, embattled AI startup Stability AI released the latest version of its text-to-image AI model, Stable Diffusion 3 Medium. According to Stability, the new AI is its “most sophisticated image-generating model to date.” So why is it consistently generating freakish body horror monstrosities?

    As Ars Technica reports, disappointed Stable Diffusion users have taken to Reddit to complain that the new model often refuses to generate a picture of a human that isn’t a horrifyingly mangled, AI-generated mess of incoherent limbs.

    “I haven’t been able to generate a single decent image at all outside of the example prompts,” one irritated Redditor wrote.

    Reply
  31. Tomi Engdahl says:

    ‘Nvidia is slowly becoming the IBM of the AI era’ says Jim Keller, perhaps forgetting how short-lived IBM’s PC monopoly was
    By Jacob Fox published 9 July 2024
    Will the ever-inflating AI bubble burst?
    https://www.pcgamer.com/hardware/graphics-cards/nvidia-is-slowly-becoming-the-ibm-of-the-ai-era-says-jim-keller-perhaps-forgetting-how-short-lived-ibms-pc-monopoly-was/

    Reply
  32. Tomi Engdahl says:

    OpenAI failed to report a major data breach in 2023
    https://www.csoonline.com/article/2514383/openai-failed-to-report-a-major-data-breach-in-2023.html

    A hacker infiltrated OpenAI’s internal messaging system, gaining access to employee discussions regarding the company’s latest AI advancements, according to a New York Times report.

    Reply
  33. Tomi Engdahl says:

    Coders’ Copilot code-copying copyright claims crumble against GitHub, Microsoft
    A few devs versus the powerful forces of Redmond – who did you think was going to win?
    https://www.theregister.com/2024/07/08/github_copilot_dmca/

    Claims by developers that GitHub Copilot was unlawfully copying their code have largely been dismissed, leaving the engineers for now with just two allegations remaining in their lawsuit against the code warehouse.

    The class-action suit against GitHub, Microsoft, and OpenAI was filed in America in November 2022. The plaintiffs claimed the Copilot coding assistant was trained on open source software hosted on GitHub and would therefore suggest snippets from those public projects to other programmers without regard for their licenses – for example, without providing appropriate credit to the source – thus violating the original creators’ intellectual property rights.

    Microsoft owns GitHub and uses OpenAI’s generative machine-learning technology to power Copilot, which auto-completes source code for engineers as they type out comments, function definitions, and other prompts.

    Reply
  34. Tomi Engdahl says:

    AI could bring billions into the state’s coffers, but so far its adoption has been slow
    Finnish companies have not adopted AI with the expected enthusiasm. In small and medium-sized companies especially, use of AI has remained limited.
    https://yle.fi/a/74-20093883

    Reply
  35. Tomi Engdahl says:

    Amazon plans to release its own AI that will make even ChatGPT pale in comparison
    8 July 2024, 11:19
    Until now, Amazon has hardly been seen riding the crest of the AI boom. That may change if an AI assistant built on its huge new model really breaks through.
    https://www.mikrobitti.fi/uutiset/amazon-aikoo-julkaista-oman-tekoalynsa-jonka-rinnalla-jopa-chatgpt-kalpenee/b33420dc-d134-427e-8112-7041f5c2fd4f

    Known primarily as an online retailer, Amazon is trying to expand into AI as well. The company is developing its own AI assistant, intended to challenge OpenAI’s ChatGPT, among others.

    Business Insider and ITPro report that the assistant’s project name is Metis, presumably after the Greek goddess of knowledge and wisdom. According to the publications’ sources, Metis is based on Amazon’s own Olympus AI model.

    ChatGPT could be facing some serious competition: Amazon is reportedly working on a new AI tool, ‘Metis’, to challenge the chatbot’s dominance
    By Ross Kelly published July 3, 2024
    Amazon could be preparing to mount a serious challenge on ChatGPT’s dominance with the launch of a new chatbot service
    https://www.itpro.com/technology/artificial-intelligence/chatgpt-could-be-facing-some-serious-competition-amazon-is-reportedly-working-on-a-new-ai-tool-metis-to-challenge-the-chatbots-dominance

    Reply
  36. Tomi Engdahl says:

    Monocle: Open-source LLM for binary analysis search
    Monocle is open-source tooling backed by a large language model (LLM) for performing natural-language searches against compiled target binaries.
    https://www.helpnetsecurity.com/2024/07/08/monocle-open-source-llm-binary-analysis-search/
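
    To make the idea concrete, here is a minimal Python sketch of how an LLM-backed natural-language search over a binary can work. This is not Monocle’s actual code or API: get_decompiled_functions() and ask_llm() are hypothetical placeholders standing in for a decompiler backend and an LLM client.

        # Hypothetical sketch, not Monocle’s real code or API:
        # get_decompiled_functions() and ask_llm() are placeholders for a
        # decompiler backend and an LLM client, respectively.
        def search_binary(binary_path, query, get_decompiled_functions, ask_llm):
            """Rank decompiled functions by relevance to a natural-language query."""
            scored = []
            for name, pseudocode in get_decompiled_functions(binary_path):
                prompt = (
                    f"On a scale of 0-10, how well does this function match the "
                    f"description '{query}'? Reply with the number only.\n\n"
                    f"{pseudocode}"
                )
                score = int(ask_llm(prompt).strip())  # model replies with a bare score
                scored.append((score, name))
            return sorted(scored, reverse=True)  # best matches first

    Called with a query like “authentication check”, the highest-scoring function names come back first, which is essentially the search experience this kind of tool advertises.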

    Reply
  37. Tomi Engdahl says:

    The Risk of ‘Good Enough’ in Large Language Models
    https://www.forbes.com/sites/forbesbooksauthors/2024/07/08/the-risk-of-good-enough-in-large-language-models/

    Quick: what was the first car ever invented?

    If you said the Ford Model T, I’m sorry to say you are incorrect. Many automobiles built from spare wagon and bicycle parts were sputtering about even before Mr. Ford thought up his first model.

    Everything about the car’s technology was up for debate during this time. Would it use electricity or gas for power? Three wheels or four? It wasn’t until the Ford company rolled out its Model T that the market found an affordable, reliable car that was “good enough” for the masses.

    As large language models (LLMs) gain more recognition, the term “good enough” has also been used to describe these revolutionary new technologies. How that merely adequate nature will shape our future remains to be seen.

    AIs and Technological Adequacy
    LLM technologies, increasingly used in various aspects of life, are challenging our notions of technological adequacy. Advocates argue that as long as LLMs meet functional benchmarks, they are sufficient to write our legal briefs, provide us with poetry, or perform whatever else we may need.

    But the crucial question is: Are these models “good enough” when they mimic intricate human traits like language understanding or consciousness?

    Critics point out that LLMs lack genuine human cognition, emotion, and ethical reasoning despite their capabilities. These aren’t just philosophical debates; they have significant implications for integrating AI into our social and ethical frameworks.

    Particularly concerning is a phenomenon known as “hallucination,” or the tendency of these models to generate convincing but entirely fabricated information.

    Hallucinations highlight a critical flaw. While LLM capabilities are impressive, their lack of real understanding makes them unsuitable for our most important tasks. Good enough doesn’t cut it when a doctor needs a treatment plan.

    How could LLMs that are deemed “good enough” impact our future?

    This comparison is essential as we chart AI’s path. Accepting “good enough” may democratize AI access and speed up adoption, mirroring the Model T’s impact. But it carries risks: rushed market entries could compound overlooked societal impacts, ethical issues, and neglected nuances in human-AI interactions.

    Accepting “good enough” could stifle innovation. If we set the bar at mere adequacy, we lose the drive for excellence and complex problem-solving. This would lead to complacency in tackling intricate challenges that push technological boundaries. We must look beyond LLMs’ appeal and consider what these technologies can do and what they should strive to achieve. Our choices will shape not only future technology but also society.

    In our discussion about LLMs and other AI systems’ adequacy, “good enough” should not be an endpoint. It’s where we begin. Our real goal is excellence—creating effective technologies, enriching the human experience, and upholding the highest ethical standards.

    Reply
  38. Tomi Engdahl says:

    How AI is bringing back the dead
    The dangers of digital immortality
    https://iai.tv/articles/the-dangers-of-digital-immortality-auid-2881

    Reply
  39. Tomi Engdahl says:

    ChatGPT just (accidentally) shared all of its secret rules – here’s what we learned
    By Eric Hal Schwartz published July 4, 2024
    Saying ‘hi’ revealed OpenAI’s instructions until the company shut it down, but you can still find them
    https://www.techradar.com/computing/artificial-intelligence/chatgpt-just-accidentally-shared-all-of-its-secret-rules-heres-what-we-learned
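
    The “secret rules” here are a system prompt: instructions the service injects ahead of the user’s messages. Mechanically it is the same thing developers do themselves through the API. A minimal sketch follows, where the model name and instruction text are placeholders, not OpenAI’s actual hidden prompt.

        # Minimal sketch of how a system prompt frames a chat (openai-python v1).
        # The instruction text below is a placeholder, not OpenAI’s hidden prompt.
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are ChatGPT. (House rules go here.)"},
                {"role": "user", "content": "hi"},
            ],
        )
        print(response.choices[0].message.content)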

    Reply
  40. Tomi Engdahl says:

    How Good Is ChatGPT at Coding, Really? Study finds that while AI can be great, it also struggles due to training limitations
    https://spectrum.ieee.org/chatgpt-for-coding

    Programmers have spent decades writing code for AI models, and now, in a full circle moment, AI is being used to write code. But how does an AI code generator compare to a human programmer?

    A study published in the June issue of IEEE Transactions on Software Engineering evaluated the code produced by OpenAI’s ChatGPT in terms of functionality, complexity, and security. The results show that ChatGPT has an extremely broad range of success when it comes to producing functional code, with success rates anywhere from as poor as 0.66 percent to as good as 89 percent depending on the difficulty of the task, the programming language, and a number of other factors.

    While in some cases the AI generator could produce better code than humans, the analysis also reveals some security concerns with AI-generated code.

    Yutian Tang is a lecturer at the University of Glasgow who was involved in the study. He notes that AI-based code generation could provide some advantages in terms of enhancing productivity and automating software development tasks—but it’s important to understand the strengths and limitations of these models.

    “By conducting a comprehensive analysis, we can uncover potential issues and limitations that arise in the ChatGPT-based code generation… [and] improve generation techniques,” Tang explains.

    To explore these limitations in more detail, his team sought to test GPT-3.5’s ability to address 728 coding problems from the LeetCode testing platform in five programming languages: C, C++, Java, JavaScript, and Python.

    Overall, ChatGPT was fairly good at solving problems in the different coding languages—but especially when attempting to solve coding problems that existed on LeetCode before 2021. For instance, it was able to produce functional code for easy, medium, and hard problems with success rates of about 89, 71, and 40 percent, respectively.

    “However, when it comes to the algorithm problems after 2021, ChatGPT’s ability to generate functionally correct code is affected. It sometimes fails to understand the meaning of questions, even for easy level problems,” Tang notes.

    For example, ChatGPT’s ability to produce functional code for “easy” coding problems dropped from 89 percent to 52 percent after 2021. And its ability to generate functional code for “hard” problems dropped from 40 percent to 0.66 percent after this time as well.

    “A reasonable hypothesis for why ChatGPT can do better with algorithm problems before 2021 is that these problems are frequently seen in the training dataset,” Tang says.

    “ChatGPT may generate incorrect code because it does not understand the meaning of algorithm problems.”
    —YUTIAN TANG, UNIVERSITY OF GLASGOW

    Interestingly, ChatGPT is able to generate code with smaller runtime and memory overheads than at least 50 percent of human solutions to the same LeetCode problems.

    The researchers also explored ChatGPT’s ability to fix its own coding errors after receiving feedback from LeetCode. They randomly selected 50 coding scenarios where ChatGPT initially generated incorrect code, either because it didn’t understand the content or the problem at hand.

    While ChatGPT was good at fixing compiling errors, it generally was not good at correcting its own mistakes.

    The researchers also found that ChatGPT-generated code did have a fair number of vulnerabilities, such as a missing null test, but many of these were easily fixable. Their results also show that generated C code was the most complex, followed by C++ and Python, the last of which has complexity similar to human-written code.
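
    To illustrate the “missing null test” class of flaw, here is a short hypothetical Python sketch (my example, not one from the study) of the kind of guard AI-generated code tends to omit, and how easily it is fixed:

        # Hypothetical example of a missing null test (not taken from the study).
        class Node:
            def __init__(self, val, next=None):
                self.val, self.next = val, next

        def list_length(head):
            # AI-style version: crashes with AttributeError when head is None.
            count, node = 1, head
            while node.next is not None:
                count += 1
                node = node.next
            return count

        def list_length_fixed(head):
            # Fixed version: test for the null (None) case before dereferencing.
            count, node = 0, head
            while node is not None:
                count += 1
                node = node.next
            return count

        assert list_length_fixed(None) == 0
        assert list_length_fixed(Node(1, Node(2))) == 2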

    Tang says that, based on these results, it’s important for developers using ChatGPT to provide additional information to help ChatGPT better understand problems or avoid vulnerabilities.

    “For example, when encountering more complex programming problems, developers can provide relevant knowledge as much as possible, and tell ChatGPT in the prompt which potential vulnerabilities to be aware of,” Tang says.
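
    A defensive prompt along those lines might look like the following sketch (illustrative only; the wording is mine, not a prompt from the study):

        # Illustrative prompt only; the wording is not taken from the study.
        prompt = (
            "Write a Python function that parses a user-supplied date string. "
            "Validate the input before use, handle None and empty strings "
            "explicitly, and raise ValueError on malformed input. "
            "Pitfalls to avoid: missing null tests and unhandled exceptions."
        )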

    Reply
  41. Tomi Engdahl says:

    APPLE’S NEW AD SHOWING MACHINES CRUSHING HUMAN CREATIVITY IS A BIT ON THE NOSE
    https://futurism.com/the-byte/new-apple-ad-crushing-dreams

    Reply
