NVIDIA has introduced a new service, NVIDIA AI Foundry, along with NVIDIA NIM™ inference microservices to enhance generative AI capabilities for enterprises globally. The service allows organizations to build custom "supermodels" using the Llama 3.1 collection of models and NVIDIA's technology.
The AI Foundry enables enterprises and nations to create domain-specific models by training them with proprietary and synthetic data generated from Llama 3.1 405B and the NVIDIA Nemotron™ Reward model. Powered by the NVIDIA DGX™ Cloud AI platform, this service offers scalable compute resources co-engineered with leading public clouds.
Jensen Huang, founder and CEO of NVIDIA, stated, “Meta’s openly available Llama 3.1 models mark a pivotal moment for the adoption of generative AI within the world’s enterprises. Llama 3.1 opens the floodgates for every enterprise and industry to build state-of-the-art generative AI applications."
Mark Zuckerberg, founder and CEO of Meta, commented on the development: “The new Llama 3.1 models are a super-important step for open source AI.”
NVIDIA NIM inference microservices for Llama 3.1 models are now available for download from ai.nvidia.com. These microservices provide up to 2.5x higher throughput than running inference without NIM.
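NIM microservices expose an OpenAI-compatible chat-completions API. As a hedged illustration, the sketch below builds such a request payload; the endpoint URL and model identifier are assumptions for a hypothetical local deployment, and the actual HTTP call is shown only in a comment since it requires a running NIM container.

```python
import json

# Hypothetical endpoint of a locally deployed NIM container; adjust to your setup.
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt, model="meta/llama-3.1-8b-instruct", max_tokens=256):
    """Build an OpenAI-style chat-completions payload for a Llama 3.1 NIM."""
    return {
        "model": model,  # assumed model id; check your deployment's model list
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }

payload = build_chat_request("Summarize NVIDIA AI Foundry in one sentence.")
print(json.dumps(payload, indent=2))

# To send it against a running NIM (not executed here):
# req = urllib.request.Request(NIM_URL, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# response = urllib.request.urlopen(req)
```

Because the payload follows the OpenAI chat schema, existing OpenAI client libraries can also be pointed at a NIM endpoint by overriding the base URL.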
Accenture is among the first to adopt this service, using it to build custom Llama 3.1 models through its Accenture AI Refinery™ framework for internal use and client applications.
Julie Sweet, chair and CEO of Accenture, remarked, “Accenture has been working with NVIDIA NIM inference microservices for our internal AI applications... we can help clients quickly create and deploy custom Llama 3.1 models.”
The comprehensive service provided by NVIDIA AI Foundry includes software, infrastructure, expertise from the NVIDIA ecosystem, and support from global system integrator partners.
The NVIDIA Nemotron-4 340B Reward model helps generate synthetic data to improve model accuracy when creating custom supermodels. Enterprises can also use their own training data for domain-adaptive pretraining (DAPT) with NVIDIA NeMo.
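The reward-model-in-the-loop pattern described above can be sketched generically: sample several candidate responses per prompt, score each with a reward model, and keep only responses that clear a quality threshold as synthetic training data. Everything below is a toy illustration, not NVIDIA's pipeline; `generate` and `score` are stand-in callables for a generator like Llama 3.1 405B and a reward model like Nemotron-4 340B.

```python
def filter_synthetic_data(prompts, generate, score, n_candidates=4, threshold=0.7):
    """Keep only the best candidate response per prompt, and only if its
    reward score clears the threshold."""
    dataset = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n_candidates)]
        best = max(candidates, key=score)  # highest-reward candidate
        if score(best) >= threshold:
            dataset.append({"prompt": prompt, "response": best})
    return dataset

# Toy stand-ins for the generator and the reward model:
fake_generate = lambda p: p.upper()
fake_score = lambda r: 0.9 if len(r) > 3 else 0.1

print(filter_synthetic_data(["hello", "hi"], fake_generate, fake_score))
# → [{'prompt': 'hello', 'response': 'HELLO'}]  ("hi" is filtered out)
```

The key design point is that the reward model acts as an automated quality gate, so only high-scoring synthetic examples ever reach the fine-tuning set.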
NVIDIA has collaborated with Meta to offer a distillation recipe for smaller custom Llama 3.1 models suitable for various infrastructures like workstations and laptops.
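Distillation generally trains a smaller student model to match a larger teacher's output distribution. The details of the NVIDIA/Meta recipe are not given here, so the following is only a minimal sketch of the standard temperature-scaled soft-target loss that distillation methods typically build on, in pure Python over toy logits.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution, softened by temperature."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.
    A higher temperature exposes more of the teacher's 'dark knowledge'
    about relative probabilities of non-top tokens."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s))

# Identical logits give zero loss; diverging logits give a positive loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))        # → 0.0
print(distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0.0)  # → True
```

In a real training loop this term would be minimized by gradient descent over the student's parameters, usually mixed with an ordinary cross-entropy loss on ground-truth labels.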
Industry leaders such as Aramco, AT&T, and Uber are among the first to access these new services across sectors including healthcare, energy, financial services, retail, transportation, and telecommunications.
The Llama 3.1 collection comprises multilingual large language models (LLMs) in sizes ranging from 8B to 405B parameters, trained on more than 16,000 NVIDIA H100 Tensor Core GPUs.
New NeMo Retriever RAG Microservices further enhance response accuracy in production environments when combined with NIM inference microservices for Llama 3.1.
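Retrieval-augmented generation, the pattern behind NeMo Retriever, grounds a model's answer by prepending retrieved passages to the prompt. The sketch below is a generic toy version of that flow, with naive keyword-overlap retrieval standing in for NeMo Retriever's embedding-based search; the function names and prompt format are illustrative assumptions.

```python
def retrieve(query, documents, k=2):
    """Rank documents by naive keyword overlap with the query (toy retriever)."""
    terms = set(query.lower().split())
    ranked = sorted(documents, key=lambda d: -len(terms & set(d.lower().split())))
    return ranked[:k]

def build_rag_prompt(query, documents):
    """Assemble a prompt that grounds the model in the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "NIM microservices package optimized inference engines.",
    "Llama 3.1 models range from 8B to 405B parameters.",
    "RAG grounds model answers in retrieved documents.",
]
print(build_rag_prompt("What sizes do Llama 3.1 models come in?", docs))
```

The assembled prompt would then be sent to a Llama 3.1 NIM endpoint; keeping retrieval and generation as separate microservices is what lets each be scaled and updated independently in production.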
Hundreds of partners in the enterprise ecosystem can now integrate these microservices into their solutions, supporting more than five million developers and numerous startups in the NVIDIA community.
Production support is available through NVIDIA AI Enterprise, while members of the NVIDIA Developer Program will soon have free access to NIM microservices for research purposes.