NVIDIA Gives Away Cosmos 3 to Lock In the Robot Compute Race
NVIDIA handed the robotics industry a frontier-grade brain on June 1 and charged nothing for it. At its GTC session in Taipei, chief executive Jensen Huang unveiled NVIDIA Cosmos 3, an open omnimodel (a single model that takes in and generates text, images, video, ambient sound and physical actions at once) built to teach robots, cars and cameras how the physical world behaves. Two sizes went live the same day, free to download on Hugging Face, with training scripts and datasets posted to GitHub.
The free price tag is the strategy, not a concession. A model that becomes the default starting point for every robotics team also decides which chips that team buys to train and run it. That is the part of this launch worth slowing down for.
What NVIDIA Loaded Onto Hugging Face
Cosmos 3 is pitched as the world’s first fully open omnimodel for physical AI, a category NVIDIA uses to separate robots and self-driving systems from the text-and-image models most people already know. The company says it can cut physical AI training and evaluation cycles from months to days, the kind of claim that matters when a single robotics validation run used to eat a quarter of engineering time.
The model was trained on what NVIDIA calls one of the largest multimodal physical AI datasets assembled, billions of samples spanning text, image, video, sound and action trajectories. That last category is the unusual one. Most foundation models stop at pixels and words; Cosmos 3 also learned from the numerical record of how machines move.
NVIDIA split the release into three tiers aimed at different stages of a project, two shipping now and one on the way.
| Variant | Built for | Strength | Availability |
|---|---|---|---|
| Cosmos 3 Super | Post-training robotics and autonomous-vehicle models | Highest physics accuracy | Available on Hugging Face |
| Cosmos 3 Nano | Real-time video and action reasoning | Responses in fractions of a second | Available on Hugging Face |
| Cosmos 3 Edge | Smaller on-device gadgets | Local real-time inference | Coming soon |
You can read the full breakdown in NVIDIA’s Cosmos 3 launch announcement, which also lists the build.nvidia.com hosting and NVIDIA NIM (NVIDIA Inference Microservices, prepackaged containers for running models in production) deployment paths.
Why a Frontier Model Is Suddenly Free
Open weights do not mean free compute. Every one of those billions of training samples was processed on NVIDIA accelerators, and every developer who fine-tunes Cosmos 3 for a warehouse robot or a delivery van will reach for the same hardware. The model is the giveaway. The silicon underneath is where the money sits.
This is the CUDA playbook applied to robots. NVIDIA spent two decades making its software the default layer for AI research, which made its chips the default purchase. Releasing a frontier physical AI model into the open does the same job one rung higher: it sets the industry standard for how machines learn to move, and that standard runs best, and often only well, on NVIDIA’s stack. The NIM microservices and the build.nvidia.com hosting are the toll booths bolted onto the free road.
“The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models.”
That was Huang at the Taipei keynote. Read commercially, the sentence is a forecast of demand. If physical AI is about to explode, the company that owns the training substrate for it collects on every robot, vehicle and smart camera that gets built. Rivals understand the stakes, which is why startups such as Axelera AI keep raising money to break the dependence, as detailed in coverage of Axelera AI’s push to challenge NVIDIA with local AI chips. A free model that nudges developers toward NVIDIA edge hardware makes that fight harder.
The Coalition Reads Like a Customer Roster
Alongside the model, NVIDIA launched the Cosmos Coalition, a group it frames as open collaboration on world models. The framing is generous; the membership is strategic. Each name is a present or prospective buyer of NVIDIA compute, and pulling them into a shared open ecosystem keeps their roadmaps tethered to the same platform.
- World-model and generative labs: Black Forest Labs, Runway, Generalist and LTX, video and image specialists now contributing to a shared physical AI stack
- Robotics builders: Agile Robots and Skild AI, using Cosmos 3 to generate action-conditioned data for humanoid and general-purpose robot policies
- Industrial heavyweights: Samsung Electronics, LG Electronics and Doosan Robotics, training factory and automation systems on the platform
- Autonomous driving: Li Auto, applying the model to self-driving development
- Vision AI operators: Centific, Fogsphere, Linker Vision, Milestone Systems and Yuan, running camera-stream analysis at city scale
Linker Vision offers the cleanest example of the pull. The company is using Cosmos 3 to scan thousands of live camera feeds at once and identify the root cause of incidents across a smart-city deployment, a workload that, once standardized on this model, is awkward to migrate anywhere else.
A Model That Looks Before It Moves
The technical claim under the marketing is that Cosmos 3 reasons about a scene before it generates anything. NVIDIA credits a mixture-of-transformers design, an architecture that splits the work between two specialized components rather than forcing one network to do everything.
Reasoning Block Meets Generation Block
One transformer interprets what is happening in a scene: how objects interact, where they sit relative to each other, how they are likely to move. A second, the expert generation block, turns that understanding into physically grounded output. The sequence matters. By analyzing spatial and temporal relationships first, the model effectively previews the next state of the world before it commits to a video frame or a motion path, which is meant to cut the kind of physics errors that make a robot knock over the cup it was reaching for.
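The ordering described above can be illustrated with a toy sketch. Nothing here reflects NVIDIA's actual architecture or API: the names `SceneState`, `reason_next_state` and `generate_path` are hypothetical, and simple constant-velocity extrapolation stands in for the reasoning transformer. The only point is the control flow, in which the next state of the world is previewed first and the output is then generated against that preview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneState:
    """Hypothetical 1-D scene: a cup sliding along a table, plus a gripper."""
    cup_pos: float      # metres
    cup_vel: float      # metres per second
    gripper_pos: float  # metres


def reason_next_state(state: SceneState, dt: float) -> SceneState:
    """Stage 1 (stand-in for the reasoning block): preview where things will be."""
    return SceneState(
        cup_pos=state.cup_pos + state.cup_vel * dt,
        cup_vel=state.cup_vel,
        gripper_pos=state.gripper_pos,
    )


def generate_path(state: SceneState, predicted: SceneState, steps: int) -> list[float]:
    """Stage 2 (stand-in for the generation block): emit gripper waypoints
    aimed at the predicted cup position rather than the current one."""
    start, goal = state.gripper_pos, predicted.cup_pos
    return [start + (goal - start) * i / steps for i in range(1, steps + 1)]


now = SceneState(cup_pos=0.5, cup_vel=0.25, gripper_pos=0.0)
preview = reason_next_state(now, dt=2.0)  # cup expected at 1.0 m
path = generate_path(now, preview, steps=4)
print(path)  # [0.25, 0.5, 0.75, 1.0]
```

A generator that skipped the first stage would aim at 0.5 m, where the cup no longer is, which is the kind of physics error the two-block split is meant to reduce.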
Three Ways Developers Plug It In
NVIDIA exposes the model through three deployment modes, each matched to a different job:
- Vision language model: multimodal reasoning that can also produce native action data, including joint angles, gripper positions and trajectory points that tell a robot exactly how to move
- World simulation model: a video foundation model that generates physically plausible sequences so teams can test environments and manufacture synthetic training data safely
- Action core: a base to fine-tune for a specific robot body, camera layout, workspace or task
That action-data output is the piece text-and-image models cannot match. Producing the actual numbers a motor needs is what separates a model that describes the world from one that can be wired into a machine. The deeper technical walkthrough sits in NVIDIA’s engineering blog on how Cosmos 3 thinks before it acts.
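To make "native action data" concrete, here is a minimal sketch of how such output might be held on the robot side. The article names joint angles, gripper positions and trajectory points; the `ActionStep` structure, its field names, the units and the limit check are illustrative assumptions, not the actual Cosmos 3 output format.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionStep:
    """One hypothetical time step of action output (field names are assumptions)."""
    joint_angles: tuple[float, ...]       # radians, one per joint
    gripper_opening: float                # 0.0 = closed, 1.0 = fully open
    waypoint: tuple[float, float, float]  # end-effector target (x, y, z) in metres


def within_limits(step: ActionStep, limit_rad: float = math.pi) -> bool:
    """Basic sanity check before any numbers reach a motor controller."""
    joints_ok = all(abs(a) <= limit_rad for a in step.joint_angles)
    gripper_ok = 0.0 <= step.gripper_opening <= 1.0
    return joints_ok and gripper_ok


# A made-up three-step reach-and-grasp trajectory for a three-joint arm.
trajectory = [
    ActionStep((0.0, 0.5, -0.5), 1.0, (0.30, 0.00, 0.20)),
    ActionStep((0.1, 0.6, -0.4), 0.5, (0.32, 0.00, 0.15)),
    ActionStep((0.2, 0.7, -0.3), 0.0, (0.34, 0.00, 0.10)),
]

safe = all(within_limits(s) for s in trajectory)
print(f"{len(trajectory)} steps, all within limits: {safe}")
```

The contrast with a text-and-image model is visible in the types: every field is a number a controller can consume directly, with no parsing of a written description in between.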
Topping the Physical-AI Leaderboards
NVIDIA is not shy about the scoreboard. Cosmos 3 variants took first place across a cluster of open-weight physical AI benchmarks, a result the company leans on to argue the open model is also the best model, not a stripped-down freebie.
- 7 leaderboard wins: Physics-IQ, PAI-Bench, R-Bench, RoboLab, RoboArena, VANTAGE-Bench and TAR
- 3 model sizes spanning cloud, real-time and edge deployment, two available now and Edge still to come
- Billions of multimodal samples in the training set, including action trajectories
- Months-to-days reduction in training and evaluation cycles, per NVIDIA’s own figure
Benchmark dominance does real work here. A developer choosing a starting model weighs both the price and the score, and when the free option also tops the charts, the case for building anywhere else weakens fast. For NVIDIA, every team that adopts Cosmos 3 on benchmark strength is a team it has quietly routed onto its hardware. The same dynamic showed up in the company’s latest results, where booming AI demand drove the numbers covered in this look at how chip stocks rallied on NVIDIA’s earnings beat.
The win for developers is genuine. So is the dependency that comes attached. If physical AI scales the way Huang predicts, the bill for that free model arrives later, denominated in GPUs, and it lands on everyone who built on the open standard NVIDIA just made the obvious choice.
Frequently Asked Questions
Is NVIDIA Cosmos 3 actually free to use?
Yes. The Cosmos 3 Super and Nano models are open and free to download from Hugging Face, with training scripts and datasets posted to GitHub. The cost is indirect: the model is built to be trained and run on NVIDIA accelerators, so the spending happens on the hardware and cloud services around the free weights.
What does omnimodel mean?
An omnimodel natively understands and generates several types of data at once. Cosmos 3 handles text, images, video, ambient sound and physical actions in a single system, rather than bolting separate models together for each.
Where can developers download Cosmos 3?
The open models are available on Hugging Face, with customization resources on GitHub and hosted access through build.nvidia.com. NVIDIA also offers the model as NIM microservices for production deployment.
What is the difference between Super, Nano and Edge?
Cosmos 3 Super targets the highest physics accuracy for robotics and autonomous-vehicle post-training. Nano prioritizes speed, returning video and action reasoning in fractions of a second. Edge, coming soon, is the compact version for real-time inference on smaller devices.
Which companies are already using Cosmos 3?
NVIDIA lists Samsung Electronics, LG Electronics, Doosan Robotics, Agile Robots, Skild AI, Li Auto and several vision AI firms including Linker Vision and Milestone Systems among early adopters. All are Cosmos Coalition members, as are generative labs such as Black Forest Labs and Runway.
What makes Cosmos 3 different from a chatbot model?
It generates native action data, such as joint angles, gripper positions and trajectory points, that can be wired directly into a robot. Text-and-image models describe the world; Cosmos 3 outputs the numbers a machine needs to move through it.
