Meta unveils five AI models for multi-modal processing, music generation, and more

By writer
8 months Ago

Meta has unveiled five major new AI models and research efforts, including multi-modal systems that can process both text and images, next-gen language models, music generation, AI speech detection, and work to improve diversity in AI systems.

The releases come from Meta’s Fundamental AI Research (FAIR) team, which has focused on advancing AI through open research and collaboration for over a decade. As AI rapidly innovates, Meta believes working with the global community is crucial.

“By publicly sharing this research, we hope to inspire iterations and ultimately help advance AI in a responsible way,” said Meta.

Chameleon: Multi-modal text and image processing

Among the releases are key components of Meta’s ‘Chameleon’ models under a research license. Chameleon is a family of multi-modal models that can understand and generate both text and images simultaneously, unlike most large language models, which are typically unimodal.

“Just as humans can process the words and images simultaneously, Chameleon can process and deliver both image and text at the same time,” explained Meta. “Chameleon can take any combination of text and images as input and also output any combination of text and images.”

Potential use cases are virtually limitless, from generating creative captions to prompting new scenes with text and images.

Multi-token prediction for faster language model training

Meta has also released pretrained models for code completion that use ‘multi-token prediction’ under a non-commercial research license. Traditional language model training is inefficient because it predicts just the next word. Multi-token models can predict multiple future words simultaneously to train faster.

“While [the one-word] approach is simple and scalable, it’s also inefficient. It requires several orders of magnitude more text than what children need to learn the same degree of language fluency,” said Meta.
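
To make the difference concrete, here is a minimal sketch of how the training targets differ between next-word and multi-token prediction. It is illustrative only and not Meta’s implementation: the function name `multi_token_targets`, the toy token list, and the choice of four future tokens are all assumptions made for this example.

```python
from typing import List, Tuple

def multi_token_targets(tokens: List[str], n_future: int) -> List[Tuple[List[str], List[str]]]:
    """Pair each context with the next n_future tokens (hypothetical helper)."""
    examples = []
    # Stop early enough that every context has a full set of n_future targets.
    for i in range(len(tokens) - n_future):
        context = tokens[: i + 1]
        targets = tokens[i + 1 : i + 1 + n_future]
        examples.append((context, targets))
    return examples

# Toy code-completion sequence (assumed tokenization, for illustration only).
tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]

single = multi_token_targets(tokens, n_future=1)  # classic next-word targets
multi = multi_token_targets(tokens, n_future=4)   # four future tokens per context

print(single[0])  # (['def'], ['add'])
print(multi[0])   # (['def'], ['add', '(', 'a', ','])
```

With `n_future=1` each context supplies one word to learn from; with `n_future=4` the same context supplies four, which is the intuition behind extracting more training signal from the same text.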

JASCO: Enhanced text-to-music model

On the creative side, Meta’s JASCO allows generating music clips from text while affording more control by accepting inputs like chords and beats.

“While existing text-to-music models like MusicGen rely mainly on text inputs for music generation, our new model, JASCO, is capable of accepting various inputs, such as chords or beat, to improve control over generated music outputs,” explained Meta.

AudioSeal: Detecting AI-generated speech

Meta claims AudioSeal is the first audio watermarking system designed to detect AI-generated speech. It can pinpoint the specific segments generated by AI within larger audio clips up to 485x faster than previous methods.

“AudioSeal is being released under a commercial license. It’s just one of several lines of responsible research we have shared to help prevent the misuse of generative AI tools,” said Meta.
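
As a rough illustration of what pinpointing segments inside a longer clip can look like, the sketch below assumes a detector has already produced one watermark score per audio frame and groups consecutive high-scoring frames into segments. It does not use AudioSeal’s actual API: the function `flagged_segments`, the 0.5 threshold, and the score values are hypothetical.

```python
from typing import List, Tuple

def flagged_segments(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where score > threshold."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i                      # a flagged run begins here
        elif score <= threshold and start is not None:
            segments.append((start, i))    # the run ended on the previous frame
            start = None
    if start is not None:                  # the clip ended while still inside a run
        segments.append((start, len(scores)))
    return segments

# Made-up per-frame scores for a short clip (illustration only).
scores = [0.1, 0.2, 0.9, 0.95, 0.8, 0.1, 0.05, 0.7, 0.9]
segments = flagged_segments(scores)
print(segments)  # [(2, 5), (7, 9)]
```

Frame indices can be converted to timestamps by multiplying by the frame duration of whatever detector produced the scores.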

Improving text-to-image diversity

Another important release aims to improve the diversity of text-to-image models, which can often exhibit geographical and cultural biases.

Meta developed automatic indicators to evaluate potential geographical disparities and conducted a large study with more than 65,000 annotations to understand how people around the world perceive geographic representation.

“This enables more diversity and better representation in AI-generated images,” said Meta. The relevant code and annotations have been released to help improve diversity across generative models.

By publicly sharing these groundbreaking models, Meta says it hopes to foster collaboration and drive innovation within the AI community.

(Photograph by Dima Solomin)


