Meta has published five new artificial intelligence (AI) research models, including one that can detect AI-generated speech within longer audio clips and others that can generate text and images.
Meta’s Fundamental AI Research (FAIR) team made the models publicly available on Tuesday, June 18, the company said in a news release.
The announcement from Meta stated, “We hope that by making this research publicly available, we can inspire iterations and ultimately help advance AI in a responsible way.”
According to the announcement, one of the new models, Chameleon, belongs to a family of mixed-modal models that can understand and generate both text and images. These models can produce a combination of text and images from input that itself mixes text and images. Meta suggested in the release that this capability could be used to generate captions for images or to build new scenes from text prompts and images.
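As a rough illustration of what "mixed-modal" means in practice, the sketch below interleaves text and image tokens into a single sequence that one autoregressive model consumes and continues. Every name here is a toy stand-in invented for illustration; none of it is Meta's released code or API.

```python
# Hypothetical sketch of mixed-modal (early-fusion) generation, in the spirit of
# Chameleon. All names and the toy "model" are assumptions for illustration only.

def tokenize_text(text: str) -> list[int]:
    # Toy stand-in for a real text tokenizer.
    return [ord(c) for c in text]

def tokenize_image(pixels: list[int]) -> list[int]:
    # Toy stand-in for an image encoder that maps pixels to discrete tokens
    # (real systems use something like a learned codebook); the offset keeps
    # image tokens from colliding with text tokens.
    return [p % 256 + 10_000 for p in pixels]

def toy_model(prompt_tokens: list[int]) -> list[int]:
    # Stand-in for the autoregressive model: here it just echoes part of the
    # prompt, but a real mixed-modal model would continue the sequence with
    # new text and/or image tokens.
    return prompt_tokens[:5]

def generate_mixed(text_prompt: str, image_pixels: list[int]) -> list[int]:
    # The key idea: both modalities are interleaved into ONE token stream,
    # and the model generates from that combined sequence.
    prompt = tokenize_text(text_prompt) + tokenize_image(image_pixels)
    return toy_model(prompt)

print(generate_mixed("Describe this photo:", [12, 200, 37]))
```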
Pretrained models for code completion were also released on Tuesday. According to the press release, these models were trained using Meta’s new multi-token prediction approach, which trains large language models (LLMs) to predict several upcoming words at once rather than one at a time, as earlier models did.
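To make the contrast concrete, the sketch below adds several output heads on top of a shared hidden state so that tokens at offsets +1 through +4 are predicted in parallel, rather than only the single next token. This is a minimal sketch of the general idea under assumed shapes and names, not Meta's published architecture or training code.

```python
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """Toy multi-token prediction head.

    A standard language model has one output head predicting only token t+1;
    here several independent heads share the same trunk representation and
    predict tokens t+1, t+2, ..., t+n_future in parallel.
    """

    def __init__(self, hidden_dim: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, vocab_size) for _ in range(n_future)]
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim)
        # returns logits: (n_future, batch, seq_len, vocab_size)
        return torch.stack([head(hidden) for head in self.heads], dim=0)

# Usage sketch: the loss covers all future offsets instead of only t+1.
batch, seq_len, hidden_dim, vocab = 2, 16, 64, 1000
hidden = torch.randn(batch, seq_len, hidden_dim)        # pretend trunk output
targets = torch.randint(0, vocab, (4, batch, seq_len))  # tokens at offsets +1..+4

head = MultiTokenHead(hidden_dim, vocab, n_future=4)
logits = head(hidden)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
print(loss.item())
```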
The third new model, JASCO, offers more control over AI music generation. According to the announcement, it can accept several kinds of input, such as chords or beats, to condition the music it generates, rather than relying on text prompts alone. This allows both symbolic and audio conditioning signals to be folded into a single text-to-music generation model.
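The sketch below shows what invoking such a model might look like, with a text prompt alongside symbolic (chord) and audio (drum stem) conditions. The interface is entirely hypothetical rather than JASCO's actual API; it only illustrates the idea of combining several condition types with text.

```python
from dataclasses import dataclass, field

@dataclass
class MusicConditions:
    # Hypothetical bundle of conditioning signals; all names are assumptions.
    text: str                                                        # free-form description
    chords: list[tuple[float, str]] = field(default_factory=list)    # (time_sec, chord symbol)
    drum_audio: list[float] = field(default_factory=list)            # raw drum-stem samples

def generate_music(conditions: MusicConditions) -> list[float]:
    # Stub: a real model would fuse all of these conditions and synthesize audio.
    print(f"text: {conditions.text!r}, chords: {len(conditions.chords)}, "
          f"drum samples: {len(conditions.drum_audio)}")
    return [0.0] * 16000  # placeholder: one second of silence at 16 kHz

audio = generate_music(MusicConditions(
    text="laid-back lo-fi beat",
    chords=[(0.0, "Am7"), (2.0, "Dm7"), (4.0, "G7")],
    drum_audio=[0.0] * 16000,
))
```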
Another new model, AudioSeal, uses an audio watermarking technique that allows localized detection of AI-generated speech, meaning it can pinpoint AI-generated segments within a longer audio clip, according to the press release. The company also says the model detects AI-generated speech up to 485 times faster than earlier approaches.
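Localized detection can be pictured as scoring the audio frame by frame and grouping high-scoring frames into flagged time segments. The sketch below assumes a hypothetical per-frame watermark score as input and shows only that grouping step; it is not AudioSeal's actual implementation.

```python
def locate_watermarked_segments(frame_scores, frame_duration_s=0.02, threshold=0.5):
    """Group consecutive frames whose watermark score exceeds `threshold`
    into (start_s, end_s) segments.

    `frame_scores` is assumed to come from a per-frame watermark detector
    (hypothetical here); each value is the probability that the frame is
    AI-generated/watermarked.
    """
    segments = []
    start = None
    for i, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            start = i                      # segment begins
        elif score < threshold and start is not None:
            segments.append((start * frame_duration_s, i * frame_duration_s))
            start = None                   # segment ends
    if start is not None:                  # close a segment that runs to the end
        segments.append((start * frame_duration_s, len(frame_scores) * frame_duration_s))
    return segments

# Example: frames 3-6 look watermarked, so one segment around 0.06-0.14 s is reported.
scores = [0.1, 0.2, 0.1, 0.9, 0.95, 0.8, 0.85, 0.2, 0.1]
print(locate_watermarked_segments(scores))
```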
The fifth new AI research model that Meta’s FAIR team unveiled on Tuesday aims to broaden the geographic and cultural diversity of text-to-image generation systems. To support this, the company has released geographic disparities evaluation code and annotations for improving evaluations of text-to-image models.
Capital spending on AI and the metaverse development division Reality Labs is expected to reach $35 billion to $40 billion by the end of 2024, according to the company’s April earnings report, an increase of $5 billion over its initial projections.
During the company’s quarterly earnings call on April 24, Meta CEO Mark Zuckerberg stated, “We’re building a number of different AI services, from our AI assistant to augmented reality apps and glasses, to APIs [application programming interfaces] that help creators engage their communities and that fans can interact with, to business AIs that we think every business on our platform will use.”