Gone are the robotic voices of Alexa, Siri, and Google Assistant. OpenAI's ChatGPT now sets the bar for how a voice assistant can sound, with the arrival of GPT-4o, an updated version of GPT-4 capable of processing text, audio, images, and video, and, above all, capable of expressing itself like no voice assistant before it.
Here are ten examples that demonstrate the versatility of this update.
GPT-4o can chat in real time
It was already possible to ask questions by voice in ChatGPT, but the experience was somewhat disjointed. The dialogue was really a series of independent requests, in which we dictated a question and text-to-speech read out the answer.
As shown here, GPT-4o allows you to chat in real time, without sending your requests one by one. The experience feels like a normal conversation. As you can hear, ChatGPT's voice is also much more realistic than that of any other voice assistant to date, with intonation and emotion that adapt to the context. The voice is somewhat reminiscent of the movie Her, written and directed by Spike Jonze.
GPT-4o can change its emotional tone
ChatGPT with GPT-4o can change its voice, for example to make it more or less emotional or speak like a robot.
GPT-4o can change its rhythm
ChatGPT can also change its pace, as shown here, when an OpenAI employee asks it to count to 10 more or less quickly.
GPT-4o can sing
In this demo, ChatGPT with GPT-4o uses two different voices to sing in harmony, in different vocal registers (soprano, for example). In a second video, the AI also sings (and whispers) a lullaby to a baby. Artists can rest easy: the quality is fairly average, and ChatGPT's debut album won't be arriving any time soon.
GPT-4o can decipher what it sees
GPT-4o is a so-called multimodal generative AI, capable of analyzing text, video, audio, and images. The tool can therefore be used to dissect our surroundings.
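For developers, this multimodal capability is exposed through the same chat-style request format as text. Below is a minimal sketch of how a text-plus-image request could be assembled; the message structure follows the OpenAI chat completions format, but the helper function name and the image URL are illustrative assumptions, and actually sending the request would require an API key.

```python
# Sketch: building a multimodal (text + image) request for GPT-4o.
# The payload shape mirrors the OpenAI chat completions format;
# build_vision_request and the URL below are illustrative only.

def build_vision_request(question: str, image_url: str) -> dict:
    """Assemble a chat payload that mixes a text question with an image."""
    return {
        "model": "gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

payload = build_vision_request(
    "What do you see in this picture?",
    "https://example.com/street-scene.jpg",  # placeholder image
)
print(payload["model"])
```

With a key configured, such a payload could then be passed to the SDK's chat completion call; the point here is simply that images ride alongside text in the same message list.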
GPT-4o can help a blind person navigate
One potential benefit of GPT-4o's vision capability is its use by blind people, to describe what is around them. The tool isn't reliable enough to replace guide dogs, but it could certainly be used in a complementary way, as seen here when a user asks ChatGPT to tell him when a taxi is approaching.
GPT-4o can be cynical
ChatGPT was already capable of being sarcastic in its responses, but here it takes things to another level, with a voice clearly laced with sarcasm. It remains to be seen whether the AI can also detect sarcasm in our own voices, which might be even more useful.
GPT-4o can analyze mathematical problems
OpenAI showed off an impressive demo of ChatGPT on a tablet, helping a user solve a math problem. Note that ChatGPT here is able to see the content of the screen, which opens up possibilities for developers wanting to integrate a more complete assistant into their software.
GPT-4o can translate in real time
Another concrete use of GPT-4o: real-time conversation translation.
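Under the hood, this kind of interpreter can be set up with a simple system prompt. The sketch below shows one plausible way to frame such a request; the prompt wording and helper function are assumptions for illustration, not OpenAI's actual implementation.

```python
# Illustrative sketch: framing GPT-4o as a live English/French interpreter
# via a system prompt. The wording below is an assumption, not OpenAI's.

def build_interpreter_request(utterance: str) -> dict:
    """Ask the model to translate between English and French, both directions."""
    return {
        "model": "gpt-4o",
        "messages": [
            {"role": "system",
             "content": ("You are a live interpreter. When you hear English, "
                         "repeat it in French; when you hear French, repeat "
                         "it in English. Translate only, add nothing else.")},
            {"role": "user", "content": utterance},
        ],
    }

request = build_interpreter_request("Bonjour, comment allez-vous ?")
print(request["messages"][0]["role"])
```

The design choice worth noting is that the translation behavior lives entirely in the system message, so each spoken turn can simply be appended as a new user message.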
GPT-4o can talk to another GPT-4o
Perhaps the most unsettling demo OpenAI has shown yet: two instances of ChatGPT with GPT-4o chatting with each other, from one phone to another.
GPT-4o is already available on the web and in the ChatGPT mobile app in text mode, for both free and paid OpenAI users. However, some audio and video features will be rolled out gradually over the coming weeks and months.