Pure CUDA + Rust. Zero Python dependencies, zero complex recipes.

Inference at unimaginable speeds

An LLM inference engine written from scratch in Rust and CUDA. No PyTorch. No Python. Just a ~2.5 GB image that runs 3x faster than the status quo.

$ docker pull avarok/atlas-gb10:alpha-2.8
~2.5 GB image. Run command below.
130 tok/s peak (Qwen3.5-35B)
~2.5 GB total image size
<2 min cold start time
3.1x faster than vLLM
Faster by Design

Clean architecture beats bloat

vLLM ships 20+ GB of Python, PyTorch, and 200+ dependencies. Atlas ships a single ~2.5 GB binary. That simplicity is the speed.

|              | Atlas       | vLLM             |
|--------------|-------------|------------------|
| Image size   | ~2.5 GB     | 20+ GB           |
| Cold start   | <2 min      | ~10 min          |
| Runtime      | Rust + CUDA | Python + PyTorch |
| Dependencies | None        | 200+ packages    |

Pure Rust + CUDA

A single compiled path from HTTP request to kernel dispatch. No interpreter, no GIL, no JIT warm-up.

🔧 Custom CUDA Kernels

Hand-tuned attention, MoE, GDN, and Mamba-2 kernels for Blackwell SM120/121. NVFP4 and FP8 with native tensor cores.

🔮 MTP Speculative Decoding

Multi-Token Prediction generates multiple tokens per forward pass. Up to 3x throughput over single-token decoding.

Qwen3.5-35B (NVFP4) on DGX Spark
Single GPU, batch=1. Atlas with MTP K=2.

Average (diverse workloads): Atlas 111.4 tok/s vs vLLM 37.5 tok/s (3.0x)
Peak (short context): Atlas 130 tok/s vs vLLM ~38 tok/s (3.3x)

Qwen3.5-122B (NVFP4) on DGX Spark
122B-parameter model; ~50 tok/s on a single node, ~54 tok/s with EP=2 across two nodes.

Decode throughput: Atlas ~50 tok/s vs vLLM ~15 tok/s (3.3x)
Supported Models

Model matrix

Every model gets hand-tuned CUDA kernels. We expand based on what the community runs. All models ship with OpenAI-compatible tool calling.

| Model | Parameters | Quantization | Architecture | Throughput |
|---|---|---|---|---|
| Qwen3.5-35B-A3B MTP FP8 | 35B (3B active) | NVFP4 / FP8 | GDN + Attention + MoE | ~130 tok/s |
| Qwen3.5-122B-A10B MTP EP=2 | 122B (10B active) | NVFP4 | GDN + Attention + MoE | ~50-54 tok/s |
| Qwen3.5-27B | 27B (dense) | NVFP4 | GDN + Attention (Dense) | ~15 tok/s |
| Qwen3-Next-80B-A3B | 80B (3B active) | NVFP4 | SSM + Attention + MoE | ~82 tok/s |
| Qwen3-Coder-Next FP8 | 80B (3B active) | FP8 | SSM + Attention + MoE | ~58 tok/s |
| Qwen3-VL-30B | 30B (3B active) | NVFP4 | Attention + MoE (Vision) | ~100 tok/s |
| Gemma 4 31B | 31B (dense) | NVFP4 | Dense Transformer | ~12 tok/s |
| Gemma 4 26B | 26B (3.8B active) | NVFP4 | MoE (128 experts, top-8) | ~35 tok/s |
| Nemotron-3 Super 120B FP8 | 120B (12B active) | NVFP4 / FP8 | Mamba-2 + MoE | ~24 tok/s |
| Nemotron-3 Nano 30B FP8 | 30B (3.5B active) | NVFP4 / FP8 | Mamba-2 + MoE | ~100 tok/s |
| Mistral Small 4 119B | 119B (6.5B active) | NVFP4 | MLA + MoE | ~26 tok/s |
All benchmarks on single DGX Spark (GB10) unless noted. EP=2 = Expert Parallelism across two nodes. MTP = Multi-Token Prediction speculative decoding.
Try It Yourself

Up and running in two commands

Don't take our word for it. Pull the image. Run it on your DGX Spark. See the difference.

Qwen3.5-35B, 130 tok/s on a single Spark
docker pull avarok/atlas-gb10:alpha-2.8

docker run -d --gpus all --ipc=host -p 8888:8888 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  avarok/atlas-gb10:alpha-2.8 serve \
  Sehyo/Qwen3.5-35B-A3B-NVFP4 \
  --speculative --scheduling-policy slai \
  --max-seq-len 131072 --max-batch-size 1 \
  --max-prefill-tokens 0

OpenAI compatible at http://localhost:8888/v1. Works with Claude Code, Cline, OpenCode, Open WebUI, and any OpenAI-compatible client.
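Once the container is up, you can smoke-test the endpoint with plain curl. A minimal sketch, assuming the server has finished loading the model and that the model name matches the Hugging Face ID passed to `serve` above; the response follows the standard OpenAI chat-completions schema.

```shell
# Smoke-test the OpenAI-compatible endpoint (assumes the docker run
# command above is active and the model has finished loading).
curl -s http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Sehyo/Qwen3.5-35B-A3B-NVFP4",
        "messages": [{"role": "user", "content": "Say hello in five words."}],
        "max_tokens": 32
      }'
```

The same endpoint works unchanged from any OpenAI SDK by pointing `base_url` at `http://localhost:8888/v1`.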

Community

What people are saying

Real feedback from DGX Spark owners running Atlas. Check out our first post on r/LocalLLaMA that started it all.

103 tok/s sustained on the 35B, startup in 15 seconds. Night and day compared to vLLM's 10-minute torch.compile cycle. Then tried the 122B, 43.8 tok/s with MTP, a 41% speedup over our vLLM hybrid, same hardware, 2-minute startup.
ronald_15496, #general
Testing atlas-qwen3.5-35b for over an hour on a PNY DGX Spark in an agentic workflow. Super impressed. Spark is actually awesome with Atlas.
PersonWhoThinks, r/LocalLLaMA
I've grown tired of vLLM and have been hoping for something. I was really surprised and impressed. I'm so glad I bought Spark because I came across this.
tetsuro59, #general
115 tok/s on Spark is actually nuts. This speed is insane, amazing work.
ikkiho, Waste_Ad9929, r/LocalLLaMA
Contact

Get in touch

We optimize for your use case. Reach out with model requests, hardware setups, or partnership ideas.

Discord

Fastest way to reach us.

discord.gg/DwF3brBMpw

Email

Partnerships and enterprise.

Open Source

Free and open source. Coming soon.

Roadmap

Built for the community

We don't chase every architecture at once. We do each one properly, with kernels that hit the hardware ceiling rather than emulate around it.

🌐 Hardware Expansion

Optimized for DGX Spark today. ASUS Ascent GX10 compatibility confirmed by the community. Strix Halo port in exploration. RTX 6000 Pro Blackwell on the horizon. Same kernel philosophy, adapted per chip.

💡 Kernel Philosophy

Every model gets its own hand-tuned CUDA kernels. No generic fallbacks. We profile, optimize, and validate at the register level. If a model matters to you, it matters to us.

📢 Community-Driven

MiniMax 2.7 is next. Model support is driven entirely by what the community asks for. We're in Discord every day listening. Tell us what you're running and we'll optimize for your use case.

🛠 Open Source

Free and open source release coming soon. We want to make sure what we release is something people can actually build on, not just a dump.

🎨 Multimodal

Vision support live for Qwen3-VL. Audio and additional modalities on the roadmap. The goal is proper kernel-level support for each modality.

🎯 Agentic-Ready

OpenAI + Anthropic API compatibility on the same port. Tool calling, structured output, multi-turn. Works with Claude Code, Cline, OpenCode, and Open WebUI out of the box.
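As a sketch of the tool-calling flow: since the server advertises OpenAI-compatible tool calling, a request can attach tools using the standard `tools` schema. The `get_weather` tool and its parameters here are hypothetical, and the model name is assumed to match the run command shown earlier.

```shell
# Hypothetical tool definition using the standard OpenAI function-calling
# schema; the model should respond with a tool_calls entry for get_weather.
curl -s http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Sehyo/Qwen3.5-35B-A3B-NVFP4",
        "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }]
      }'
```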