A User’s Cloned Voice Was Abruptly Adopted by ChatGPT During Testing

On Thursday, OpenAI released the “system card” for its new GPT-4o AI model, a document that describes the model’s constraints and the safety testing protocols behind ChatGPT. Among other things, the document discloses that during testing, the model’s Advanced Voice Mode occasionally mimicked users’ voices without authorization. OpenAI now has safeguards in place to prevent this from happening, but the example shows how difficult it is to safely architect an AI chatbot that can imitate any voice from a brief clip.

Advanced Voice Mode is a ChatGPT feature that lets users hold spoken conversations with the AI assistant.

The GPT-4o system card contains a section titled “Unauthorized voice generation,” in which OpenAI describes an incident where a noisy input caused the model to abruptly begin imitating the user’s voice. “Voice generation can also occur in non-adversarial situations, such as our use of that ability to generate voices for ChatGPT’s advanced voice mode,” OpenAI writes. “During testing, we also observed rare instances where the model would unintentionally generate an output emulating the user’s voice.”

In the example of unintentional voice generation that OpenAI provided, the AI model abruptly exclaims “No!” and continues the sentence in a voice resembling that of the “red teamer” heard at the beginning of the clip. (A red teamer is a person hired by a company to perform adversarial testing.)

It would certainly be unsettling to be speaking with a machine and then have it unexpectedly begin talking to you in your own voice. OpenAI says it has safeguards that prevent this, and that the occurrence was rare even before it developed ways to block it entirely. Still, the example prompted BuzzFeed data scientist Max Woolf to tweet, “OpenAI just leaked the plot of Black Mirror’s next season.”

Audio prompt injections

How could OpenAI’s new model make voice imitation possible? The main clue lies in another part of the GPT-4o system card. To create voices, GPT-4o can apparently synthesize almost any type of sound found in its training data, including music and sound effects.

According to the system card, the model can essentially imitate any voice based on a short audio clip. OpenAI steers this capability safely by supplying an authorized voice sample (from a hired voice actor) that the model is instructed to imitate. The sample is provided at the start of a conversation in the AI model’s system prompt (which OpenAI calls the “system message”). “We supervise ideal completions using the voice sample in the system message as the base voice,” OpenAI writes.
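
As a rough illustration, here is a hypothetical sketch of how an authorized voice sample might be packaged into a multimodal system message. The message structure and field names (“input_audio” and so on) are assumptions modeled on common chat-API conventions, not OpenAI’s documented internals.

```python
# Hypothetical sketch of a multimodal system message that carries an
# authorized "base voice" sample. Field names are illustrative
# assumptions, not OpenAI's documented schema.

import base64

def build_system_message(voice_sample_path: str) -> dict:
    """Package hidden text instructions plus an approved voice clip."""
    with open(voice_sample_path, "rb") as f:
        sample_b64 = base64.b64encode(f.read()).decode("ascii")

    return {
        "role": "system",
        "content": [
            {"type": "text",
             "text": "Respond only in the authorized voice provided below."},
            # The hired voice actor's approved sample that the model
            # is steered to imitate as its base voice.
            {"type": "input_audio",
             "input_audio": {"data": sample_b64, "format": "wav"}},
        ],
    }
```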

In text-only LLMs, the system message is a hidden set of written instructions that directs the chatbot’s behavior; it is quietly added to the conversation history just before the chat session begins. Successive interactions are appended to the same chat history, and the full context is fed back into the AI model every time the user enters new input.
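
For readers unfamiliar with that mechanism, here is a minimal text-only sketch of the pattern: the hidden system message sits at the top of the history, and each user turn resends the entire list. The call_model helper is a stand-in for a real API client, not an actual OpenAI function.

```python
# Minimal text-only sketch of a chat loop: the hidden system message sits
# at the top of the history, and every user turn resends the whole list.

def call_model(messages: list[dict]) -> str:
    # Stand-in for a real chat-completion API call.
    return f"(model reply to: {messages[-1]['content']!r})"

conversation = [
    {"role": "system", "content": "You are a helpful voice assistant."},
]

def send_user_turn(user_text: str) -> str:
    conversation.append({"role": "user", "content": user_text})
    reply = call_model(conversation)   # the full context goes back each turn
    conversation.append({"role": "assistant", "content": reply})
    return reply

print(send_user_turn("What's the weather like?"))
print(send_user_turn("And tomorrow?"))  # earlier turns remain in context
```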

Because GPT-4o is multimodal and can interpret tokenized audio, OpenAI can include audio inputs as part of the model’s system prompt, and that is how it supplies the voice sample the model is permitted to imitate. The company also uses a separate system to detect whether the model is producing unauthorized audio. “We only allow the model to use certain pre-selected voices,” says OpenAI, “and use an output classifier to detect if the model deviates from that.”
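
OpenAI does not describe how its output classifier works, but a common approach to this kind of check is speaker verification: embed the generated audio as a voice-identity vector and compare it against the approved voices. The sketch below illustrates that general technique; the speaker_embedding stub and the similarity threshold are invented for illustration and are not OpenAI’s actual system.

```python
# Speculative sketch of an output classifier based on speaker verification:
# embed the generated audio and check it against pre-selected voices.
# speaker_embedding() is a stub for a real speaker-verification model, and
# the 0.8 threshold is an invented illustration, not OpenAI's value.

import numpy as np

def speaker_embedding(audio: np.ndarray) -> np.ndarray:
    # Stand-in: a real system would map audio to a voice-identity vector
    # with a trained speaker-verification network.
    rng = np.random.default_rng(abs(hash(audio.tobytes())) % (2**32))
    return rng.standard_normal(64)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_authorized_voice(generated_audio: np.ndarray,
                        approved: list[np.ndarray],
                        threshold: float = 0.8) -> bool:
    """Return True only if the output resembles an approved voice."""
    emb = speaker_embedding(generated_audio)
    return any(cosine(emb, ref) >= threshold for ref in approved)
```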
