OpenAI Enhances Its Transcription and Voice AI Models

Posted on March 21, 2025

OpenAI is bringing new transcription and voice AI models to its API that the company claims improve upon its previous releases.

OpenAI’s Vision for AI-Powered Agents

For OpenAI, the models fit into its broader “agentic” vision: building automated systems that can independently accomplish tasks on behalf of users. The definition of “agent” might be in dispute, but OpenAI Head of Product Olivier Godement described one interpretation as a chatbot that can talk with a business’s customers.

“We’re going to see more and more agents pop up in the coming months,” Godement said. “And so the general theme is helping customers and developers leverage agents that are useful, available, and accurate.”

OpenAI claims that its new text-to-speech model, “gpt-4o-mini-tts,” not only delivers more nuanced and realistic-sounding speech but is also more “steerable” than its previous-generation speech-synthesizing models. Developers can instruct gpt-4o-mini-tts on how to say things in natural language, for example, “speak like a mad scientist” or “use a serene voice, like a mindfulness teacher.”

“In different contexts, you don’t just want a flat, monotonous voice,” Harris said. “If you’re in a customer support experience and you want the voice to be apologetic because it’s made a mistake, you can actually have the voice have that emotion in it. Our big belief, here, is that developers and users want to really control not just what is spoken, but how things are spoken.”

New Speech-to-Text Models: GPT-4o-Transcribe & GPT-4o-Mini-Transcribe

As for OpenAI’s new speech-to-text models, “gpt-4o-transcribe” and “gpt-4o-mini-transcribe,” they effectively replace the company’s long-in-the-tooth Whisper transcription model.
Trained on “diverse, high-quality audio datasets,” the new models can better capture accented and varied speech, OpenAI claims, even in chaotic environments.

They’re also less likely to hallucinate, Harris added. Whisper notoriously tended to fabricate words, and even entire passages, in conversations, introducing everything from racial commentary to imagined medical treatments into transcripts.

“These models are much improved versus Whisper on that front,” Harris said. “Making sure the models are accurate is completely essential to getting a reliable voice experience, and accurate [in this context] means that the models are hearing the words precisely and aren’t filling in details that they didn’t hear.”

Your mileage may vary depending on the language being transcribed, however. According to OpenAI’s internal benchmarks, gpt-4o-transcribe, the more accurate of the two transcription models, has a “word error rate” approaching 30% (out of 120%) for Indic and Dravidian languages including Tamil, Telugu, Malayalam, and Kannada. That means roughly three out of every 10 words from the model will differ from a human transcription in those languages.

In a break from tradition, OpenAI doesn’t plan to make its new transcription models openly available. The company historically released new versions of Whisper for commercial use under an MIT license.

Harris said that gpt-4o-transcribe and gpt-4o-mini-transcribe are “much bigger than Whisper” and for that reason not good candidates for an open release.

“They’re not the kind of model that you can just run locally on your laptop, like Whisper,” he continued. “We want to make sure that if we’re releasing things in open source, we’re doing it thoughtfully, and we have a model that’s really honed for that specific need. And we think that end-user devices are one of the most interesting cases for open-source models.”
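
To make the word error rate figure above concrete, here is a minimal sketch of how the metric is commonly computed: the word-level edit distance between a human reference transcript and the model's output, divided by the number of words in the reference. The function name, the whitespace tokenization, and the sample sentences are illustrative assumptions, not OpenAI's benchmark methodology. Because extra inserted words count as errors, the rate can exceed 100%, which is why a reporting scale can run past that mark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch only: tokenization is a plain whitespace split,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: three of ten reference words are substituted.
reference = "the quick brown fox jumps over the lazy sleeping dog"
hypothesis = "the quick brown box jumps over a lazy sleeping log"
print(f"{word_error_rate(reference, hypothesis):.0%}")  # prints 30%

# Insertions alone can push the rate above 100%.
print(f"{word_error_rate('hello', 'hello there my friend'):.0%}")  # prints 300%
```

A rate near 30% therefore corresponds to about three word-level edits for every ten words of the human transcript, which matches the article's "three out of every 10 words" reading when the errors are mostly substitutions.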