For more than a decade, Meta's Fundamental AI Research (FAIR) team has focused on advancing the state of the art in AI through open research. As the field continues to innovate rapidly, Meta considers collaboration with the global AI community more important than ever.
Meta announced the release of five new FAIR research models to the global community. These include image-to-text and text-to-music generation models, a multi-token prediction model, and a technique for detecting AI-generated speech. By sharing this research publicly, Meta aims to inspire further iterations and advance AI responsibly.
The Chameleon model family, which can understand and generate both images and text simultaneously, is among the key components being released under a research-only license. Unlike most large language models, which produce unimodal output, Chameleon can accept and generate any combination of text and images. This capability opens up possibilities such as generating creative captions for images or composing new scenes from mixed prompts of text and images.
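To make the mixed-modality idea concrete, here is a minimal sketch of the early-fusion approach behind models like Chameleon, in which images are quantized into discrete tokens and interleaved with text into a single stream. The tokenizer names, marker strings, and codebook ids below are illustrative assumptions, not Chameleon's actual vocabulary.

```python
TEXT, IMG = "txt", "img"

def interleave(segments):
    """Flatten mixed text/image segments into one token stream for the model."""
    stream = []
    for kind, tokens in segments:
        if kind == IMG:
            stream.append("<img>")    # hypothetical modality-boundary markers
            stream.extend(tokens)
            stream.append("</img>")
        else:
            stream.extend(tokens)
    return stream

prompt = [
    (TEXT, ["caption", "this", ":"]),
    (IMG, [1017, 42, 389]),           # stand-in discrete image-codebook ids
    (TEXT, ["in", "one", "line"]),
]
print(interleave(prompt))
# ['caption', 'this', ':', '<img>', 1017, 42, 389, '</img>', 'in', 'one', 'line']
```

Because inputs and outputs share one token space, the same sequence model can emit text tokens, image tokens, or both in a single response.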
A new approach called multi-token prediction has been proposed to train large language models (LLMs) more efficiently by predicting several future words at once instead of one at a time. The method aims to make LLMs faster and more sample-efficient than the traditional next-token objective, which requires far larger amounts of text to reach comparable capability. Pretrained code-completion models trained with this approach are being released under a non-commercial, research-only license.
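The difference between the two objectives can be sketched in a few lines. This is a toy illustration of what each position in a sequence supervises, not FAIR's training code; the function names are made up for clarity.

```python
def single_token_targets(tokens, t):
    """Traditional objective: position t supervises only token t+1."""
    return tokens[t + 1 : t + 2]

def multi_token_targets(tokens, t, n=4):
    """Multi-token objective: position t supervises the next n tokens at once,
    typically via n output heads sharing one model trunk."""
    return tokens[t + 1 : t + 1 + n]

seq = ["the", "cat", "sat", "on", "the", "mat", "."]
print(single_token_targets(seq, 1))  # ['sat']
print(multi_token_targets(seq, 1))   # ['sat', 'on', 'the', 'mat']
```

Each training step thus yields a richer supervision signal per position; at inference time the extra heads can be dropped, or reused to speed up decoding.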
JASCO is another model introduced by Meta that offers enhanced control over AI music generation by accepting inputs such as chords or beats alongside text prompts. These additional conditioning signals give users finer control over the generated music while maintaining quality comparable to existing baselines.
Meta also unveiled AudioSeal, an audio watermarking technique designed for localized detection of AI-generated speech within longer audio snippets. AudioSeal's localized detection approach enhances speed and efficiency compared to traditional methods, making it suitable for large-scale real-time applications. It is being released under a commercial license as part of efforts to prevent misuse of generative AI tools.
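The localized-detection idea can be illustrated with a toy example: instead of issuing one verdict for an entire file, a per-frame detector scores short windows and flags the exact spans that carry a mark. The sentinel-based "detector" below is a stand-in for AudioSeal's learned detector, and the frame size and threshold are illustrative assumptions.

```python
def frame_scores(samples, frame=4):
    """Hypothetical per-frame detector: a frame 'carries the mark' here if it
    contains the sentinel value 9 (a stand-in for a real learned detector)."""
    scores = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        scores.append(1.0 if 9 in chunk else 0.0)
    return scores

def flagged_spans(scores, frame=4, threshold=0.5):
    """Convert per-frame scores into (start, end) sample ranges above threshold."""
    spans, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i * frame          # span opens at this frame
        elif s < threshold and start is not None:
            spans.append((start, i * frame))
            start = None
    if start is not None:              # close a span that runs to the end
        spans.append((start, len(scores) * frame))
    return spans

audio = [0, 0, 0, 0, 9, 9, 9, 9, 0, 0, 0, 0]
print(flagged_spans(frame_scores(audio)))  # [(4, 8)]
```

Scoring frames independently is what makes this approach cheap enough to run at scale and in real time: each window is a small, parallelizable check rather than a pass over the whole file.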
To address geographical disparities in text-to-image generation systems, Meta developed automatic indicators and conducted a large-scale annotation study, gathering more than 65,000 annotations to understand how perceptions of geographic representation vary across regions. The geographic disparities evaluation code and annotations are being released today to help improve diversity in generative models across the community.