NVIDIA has unveiled a new platform named Cosmos, designed to enhance the development of physical AI systems such as autonomous vehicles and robots. The platform includes advanced generative world foundation models, tokenizers, guardrails, and an accelerated video processing pipeline optimized for NVIDIA data center GPUs.
The company notes that developing physical AI models is often expensive and requires extensive real-world data and testing. With Cosmos world foundation models (WFMs), developers can generate large amounts of photorealistic, physics-based synthetic data to train and evaluate their existing models. Developers can also build custom models by fine-tuning these WFMs.
These models are accessible under an open model license to support the robotics and autonomous vehicle community. Developers can preview the initial models on the NVIDIA API catalog or download them from the NVIDIA NGC catalog or Hugging Face.
Notable companies in the robotics and automotive sectors, including 1X, Agile Robots, Agility, Figure AI, Foretellix, Fourier, Galbot, Hillbot, IntBot, Neura Robotics, Skild AI, Virtual Incision, Waabi, XPENG, and Uber, are among the first adopters of Cosmos.
Jensen Huang, founder and CEO of NVIDIA, said: “The ChatGPT moment for robotics is coming. Like large language models, world foundation models are fundamental to advancing robot and AV development, yet not all developers have the expertise and resources to train their own.” He added that Cosmos was created “to democratize physical AI and put general robotics in reach of every developer.”
Developers can customize Cosmos WFMs with datasets tailored to their specific applications. The models are built to generate physics-based videos from inputs such as text or sensor data, and they target high-quality simulation of settings such as warehouses and a range of driving conditions.
During his CES keynote, Huang showed how Cosmos can be used for video search and understanding and for photoreal synthetic data generation. Other uses include developing physical AI models, for example through reinforcement learning, and “multiverse” simulation, which explores many possible outcomes of a given scenario.
Cosmos also includes an accelerated data processing pipeline, powered by NVIDIA NeMo Curator, that significantly reduces processing time compared with traditional CPU-only pipelines. The NVIDIA Cosmos Tokenizer converts images and videos into tokens and offers higher compression than other leading tokenizers.
Industry leaders across physical AI sectors are already integrating Cosmos technologies into their projects. For instance:
- 1X launched its World Model Challenge dataset using the Cosmos Tokenizer.
- XPENG plans to use Cosmos for humanoid robot development.
- Hillbot and Skild AI are using Cosmos to accelerate the development of general-purpose robots.
Pras Velagapudi from Agility remarked: “Data scarcity and variability are key challenges...Cosmos’ text-, image- and video-to-world capabilities allow us to generate...scenarios...without needing as much expensive real-world data capture.”
In transportation:
- Waabi is evaluating Cosmos in AV software development.
- Wayve is evaluating Cosmos to search for edge-case driving scenarios.
- Foretellix will use Cosmos alongside NVIDIA Omniverse Sensor RTX APIs for high-fidelity scenario generation.
- Uber is partnering with NVIDIA to accelerate autonomous mobility, combining its rich driving datasets with Cosmos features.
Dara Khosrowshahi, CEO of Uber, said: “Generative AI will power the future of mobility...By working with NVIDIA we can help supercharge...autonomous driving solutions.”
NVIDIA says Cosmos was developed in line with its trustworthy AI principles, which prioritize privacy. To reduce the risk of misinformation, content generated with Cosmos carries invisible watermarks that help identify it as AI-generated.
Cosmos WFMs are available now under an open model license on platforms such as Hugging Face, and they are expected to become available soon as NVIDIA NIM microservices. Developers can also deploy Cosmos on NVIDIA DGX Cloud, with enterprise support available through NVIDIA's software platform.