NVIDIA DGX Spark is the most significant shift in enterprise AI hardware since the introduction of the GPU for deep learning. One petaflop of AI compute. 128GB of unified memory. Desktop form factor. Standard power. It puts capabilities that previously required a data centre rack into a system that sits next to your monitor.
This is the definitive guide to DGX Spark — full technical specifications, real-world use cases, cost analysis, cloud comparison, and how it fits into the broader NVIDIA ecosystem. If you're evaluating DGX Spark for your enterprise, this is the resource you need.
What Is NVIDIA DGX Spark?
NVIDIA DGX Spark is a compact AI supercomputer built on the Grace Blackwell architecture. It combines NVIDIA's Grace CPU and Blackwell GPU into a single unified system with shared memory, delivering up to 1 petaflop (1,000 teraflops) of AI performance at FP4 precision in a form factor roughly the size of a Mac mini.
First shown at CES 2025 as Project DIGITS and launched under the DGX Spark name at GTC 2025, DGX Spark represents NVIDIA's push to bring enterprise-grade AI compute out of the data centre and onto the desktop. It's not a consumer GPU. It's not a workstation graphics card. It's a purpose-built AI system designed for running, fine-tuning, and serving large language models locally.
The "Spark" name positions it as the entry point in NVIDIA's DGX family — above consumer hardware but designed for departmental deployment, edge AI, and sovereign compute scenarios where data cannot leave the premises.
What Are the Full Technical Specifications of DGX Spark?
Here are the key specifications that matter for enterprise AI workloads:
- Architecture: NVIDIA Grace Blackwell — ARM-based Grace CPU + Blackwell GPU on a single module
- AI Performance: Up to 1 petaflop (1,000 TFLOPS) of AI compute at FP4 precision
- Memory: 128GB unified memory shared between CPU and GPU via NVLink-C2C, with up to 273 GB/s memory bandwidth
- Storage: Up to 4TB NVMe SSD
- Networking: ConnectX-7 for high-speed networking, supporting multiple DGX Spark units in cluster configurations
- Form Factor: Compact desktop — approximately the size of a Mac mini
- Power: Standard wall power — no specialised electrical infrastructure required
- Cooling: Air-cooled — no liquid cooling, no raised floors, no data centre HVAC
- Operating System: NVIDIA DGX OS (Ubuntu-based Linux) with the full NVIDIA AI Enterprise software stack
- Software Stack: Pre-installed with NeMo, NIM, CUDA, TensorRT, Triton Inference Server, RAPIDS
The unified memory architecture is the specification that matters most. Traditional systems separate CPU and GPU memory, forcing data transfers across PCIe that create bottlenecks for large models. Grace Blackwell's NVLink-C2C interconnect gives the CPU and GPU coherent access to a single 128GB pool — meaning a 70-billion-parameter model doesn't need to be partitioned or optimised for data movement. It simply fits.
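The memory argument can be made concrete with back-of-envelope arithmetic. The sketch below assumes weights dominate the footprint and that single-stream decode speed is bound by memory bandwidth (one full pass over the weights per generated token) — standard rough approximations, not official NVIDIA figures:

```python
# Rough sizing model for a unified-memory system like DGX Spark.
# Assumptions: model weights dominate memory use, and decode throughput
# is memory-bandwidth-bound (one pass over the weights per token).

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_gb(params_b: float, precision: str) -> float:
    """Approximate weight footprint in GB for a params_b-billion-param model."""
    return params_b * BYTES_PER_PARAM[precision]

def fits(params_b: float, precision: str,
         memory_gb: float = 128, kv_headroom_gb: float = 16) -> bool:
    """True if weights plus KV-cache headroom fit in unified memory."""
    return weight_gb(params_b, precision) + kv_headroom_gb <= memory_gb

def decode_tokens_per_s(params_b: float, precision: str,
                        bandwidth_gb_s: float = 273) -> float:
    """Bandwidth-bound upper limit on single-stream decode speed."""
    return bandwidth_gb_s / weight_gb(params_b, precision)

if __name__ == "__main__":
    print(fits(70, "fp16"))   # 140GB of weights alone: does not fit
    print(fits(70, "fp8"))    # ~70GB of weights: fits with headroom
    print(round(decode_tokens_per_s(70, "fp8"), 1))
```

The last line also explains why the 273 GB/s bandwidth figure matters as much as capacity: for large models, single-stream decode speed scales with bandwidth, not compute.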
What Models Can Run on DGX Spark?
The 128GB unified memory envelope determines what's possible. Here's the practical breakdown:
Full precision (FP16/BF16) inference:
- Models up to roughly 60B parameters — FP16 weights take about two bytes per parameter, leaving headroom for KV cache and context
- Llama 3.1 70B — ~140GB of weights at FP16, so in practice it runs at FP8 (~70GB) with comfortable headroom
- Mistral Large (123B) and Mixtral 8x22B (141B total parameters) — too large for FP16; both need quantisation (below)
- Most enterprise models under 60B — with generous context windows
Quantised inference (INT4/INT8):
- Llama 3.1 405B at INT4 — ~200GB of weights even at 4-bit, so the largest open model spans two DGX Spark units clustered over ConnectX-7
- Any model under 200B parameters at INT4 precision
- Multiple smaller models running simultaneously for multi-agent architectures
Fine-tuning:
- Full fine-tuning of models up to approximately 13B parameters
- LoRA/QLoRA fine-tuning of 70B+ parameter models — the dominant enterprise fine-tuning approach
- Adapter training for domain-specific customisation of any model that fits in memory
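The arithmetic behind LoRA's dominance is that the adapters are tiny relative to the base model: each targeted weight matrix gains two low-rank factors rather than a full copy of optimiser state. A rough count, using illustrative layer dimensions (not exact Llama 3.1 shapes — real attention projections vary in size):

```python
# Why LoRA fine-tuning fits where full fine-tuning doesn't: count the
# trainable adapter parameters for a 70B-class model. Shapes below are
# illustrative assumptions, not exact Llama 3.1 dimensions.

def lora_params(layers: int, hidden: int, rank: int,
                targets_per_layer: int = 4) -> int:
    """Adapter params when each targeted (hidden x hidden) projection
    gains two low-rank factors: A (rank x hidden) and B (hidden x rank)."""
    return layers * targets_per_layer * 2 * rank * hidden

base = 70e9
adapter = lora_params(layers=80, hidden=8192, rank=16)
print(f"{adapter / 1e6:.0f}M trainable params")   # ~84M
print(f"{adapter / base:.4%} of the base model")
```

Training ~84M parameters instead of 70B means gradients and optimiser state fit alongside the (quantised, frozen) base weights — which is exactly why QLoRA on 70B-class models works inside 128GB.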
For enterprise deployments, this means a single DGX Spark can run a production-grade 70B language model serving multiple concurrent users, a domain-specific fine-tuned model for clinical decision support or financial analysis, or a multi-model pipeline combining a retrieval model with a generation model for RAG applications.
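The retrieval-plus-generation pattern mentioned above can be sketched end to end. This toy uses a bag-of-words stand-in for the embedding model purely for illustration; a real pipeline on DGX Spark would use a GPU embedding model and an LLM for the final generation step:

```python
# Toy sketch of a RAG pipeline: embed documents, retrieve the closest
# ones for a query, and assemble a grounded prompt for the generator.
# The bag-of-words "embedding" is a stand-in, not a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = ["warfarin interacts with aspirin",
        "paracetamol dosing in adults",
        "aspirin and ibuprofen interaction"]
context = retrieve("aspirin interactions", docs)
prompt = "Answer using only this context:\n" + "\n".join(context)
```

On a single unit, the retrieval model and the generation model share the same 128GB pool — which is what makes this multi-model pattern practical without a second machine.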
Who Is DGX Spark Designed For?
DGX Spark serves a specific enterprise segment that previously had no good option: workloads too sensitive for the cloud, yet too demanding for consumer GPUs.
Healthcare organisations. Hospitals, pathology labs, genomics companies, and pharmaceutical firms that need to run AI on patient data without it ever leaving the facility. A DGX Spark in a hospital's server room enables clinical AI — diagnostic assistance, drug interaction checking, radiology analysis, clinical document summarisation — with complete data sovereignty.
Financial institutions. Banks, insurance companies, and investment firms that need on-premise fraud detection, risk modelling, and customer analytics. Regulatory requirements under Bank Negara Malaysia's RMiT framework and similar ASEAN regulations make local compute a compliance necessity, not a preference.
Government and defence. Agencies requiring air-gapped AI for classified operations, intelligence analysis, document processing, and cybersecurity. DGX Spark enables these capabilities in physically isolated environments.
Legal firms. Law firms deploying AI-powered document review, contract analysis, and legal research on privileged client data that absolutely cannot touch a third-party cloud.
Research teams. Data scientists and ML engineers who need to iterate on model development, run experiments on proprietary datasets, and fine-tune models without uploading sensitive data to cloud environments. DGX Spark gives a single researcher more AI compute than entire departments had five years ago.
Edge and remote deployments. Operations in locations with limited or no internet connectivity — mining sites, offshore platforms, remote military installations, field hospitals — where AI must run completely locally.
How Much Does DGX Spark Cost?
NVIDIA has positioned DGX Spark at around USD 3,000–4,000 for the base configuration. Enterprise-configured systems with expanded storage, enhanced support packages, and NVIDIA AI Enterprise licensing reach higher price points.
The critical comparison isn't sticker price versus cloud hourly rates — it's total cost of ownership over the system's useful life:
Cloud GPU comparison (3-year TCO):
- AWS p5 family (per H100, 80GB): ~$4.50/hour → $39,420/year → $118,260 over 3 years
- Google Cloud a3-highgpu-1g (1x H100): ~$3.80/hour → $33,288/year → $99,864 over 3 years
- Azure ND H100 v5: ~$4.00/hour → $35,040/year → $105,120 over 3 years
- DGX Spark (enterprise configured): ~$5,000–$10,000 upfront + ~$2,000/year support → $11,000–$16,000 over 3 years
For consistent workloads — inference serving, daily fine-tuning runs, production model deployment — DGX Spark delivers 6–10x lower TCO than cloud GPU instances over three years. The breakeven typically occurs within 6–12 months.
Cloud retains its cost advantage for sporadic workloads (less than 10–15% utilisation), burst training runs, and experimental work where you don't yet know your compute requirements.
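The breakeven claim is easy to reproduce. The sketch below uses the article's illustrative figures; utilisation is the fraction of each ~730-hour month the hardware is actually busy, and real prices vary:

```python
# Breakeven between an on-prem unit and a cloud GPU instance.
# Figures are the article's illustrative numbers, not current quotes.

def breakeven_months(upfront: float, support_per_year: float,
                     cloud_per_hour: float, utilisation: float) -> float:
    """Months until cumulative cloud spend exceeds on-prem cost.
    utilisation = fraction of each 730-hour month actually used."""
    cloud_monthly = cloud_per_hour * 730 * utilisation
    onprem_monthly = support_per_year / 12
    if cloud_monthly <= onprem_monthly:
        return float("inf")   # cloud never catches up
    return upfront / (cloud_monthly - onprem_monthly)

# Enterprise-configured unit vs a ~$4/hr H100 instance:
print(round(breakeven_months(10_000, 2_000, 4.00, 0.5), 1))   # ~7.7 months
print(round(breakeven_months(10_000, 2_000, 4.00, 0.10), 1))  # ~80 months
```

The two calls show both sides of the argument: at 50% utilisation the hardware pays for itself within the 6–12 month window; at 10% utilisation cloud stays cheaper for years.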
How Does DGX Spark Compare to Cloud GPU Instances?
Beyond cost, the comparison involves several dimensions that matter differently depending on your use case:
Data sovereignty. DGX Spark: data never leaves your premises. Cloud: data transits to and is processed in a third-party data centre, potentially in another jurisdiction. For regulated industries, this alone decides the question.
Latency. DGX Spark: no network round-trip — response time is bounded by model compute alone. Cloud: adds 50–200ms minimum depending on region, plus network variability. For real-time applications — clinical decision support, live fraud detection, voice agents — local inference is materially faster.
Availability. DGX Spark: available whenever the power is on. No GPU capacity shortages, no spot instance preemptions, no regional outages. Cloud: subject to capacity constraints (H100 instances remain scarce in many regions), provider outages, and network connectivity.
Scalability. Cloud wins here. Need 100 GPUs for a training run? Cloud delivers elastic scale that on-premise cannot match without massive capital investment. DGX Spark scales modestly — you can cluster multiple units via ConnectX-7, and NVIDIA's DGX SuperPOD provides rack-scale expansion — but elastic, on-demand scaling is cloud's structural advantage.
Operational complexity. DGX Spark requires local administration: OS updates, hardware monitoring, physical security. Cloud abstracts this away. The trade-off is control versus convenience. For organisations with IT teams (or managed service partners like NovaGenAI), this operational overhead is manageable and often preferred.
How Does DGX Spark Fit into the Broader NVIDIA Ecosystem?
DGX Spark isn't an isolated product. It's the entry point in a coherent ecosystem designed to scale from desktop to data centre to cloud:
DGX Spark → DGX Station → DGX SuperPOD. This is the on-premise scaling path. DGX Spark for departmental AI and edge deployment. DGX Station for workgroup-scale compute. DGX SuperPOD for enterprise-scale training and inference clusters. The software stack is identical across all three — models developed on Spark deploy to SuperPOD without modification.
DGX Cloud. NVIDIA's cloud-hosted DGX infrastructure, available through partnerships with Google Cloud, Microsoft Azure, and Oracle Cloud. DGX Cloud gives enterprises burst compute capacity without building their own data centre. A typical hybrid architecture uses DGX Spark for sensitive data and inference, with DGX Cloud for large-scale training runs on non-sensitive data.
NVIDIA AI Enterprise. The software platform that runs across all DGX hardware and cloud instances. It includes:
- NeMo: Framework for training, fine-tuning, and customising large language models and multimodal AI
- NIM (NVIDIA Inference Microservices): Pre-optimised, containerised inference endpoints for production deployment
- CUDA: The foundational GPU computing platform — every AI framework runs on CUDA
- TensorRT: Inference optimisation engine delivering order-of-magnitude speedups over CPU inference through kernel fusion, precision calibration, and memory optimisation
- Triton Inference Server: Production model serving with dynamic batching, multi-model support, and GPU/CPU scheduling
- RAPIDS: GPU-accelerated data science — pandas, scikit-learn, and graph analytics at GPU speed for data preprocessing and feature engineering
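As a concrete touchpoint for the stack: NIM containers expose an OpenAI-compatible REST API, so applications talk to a local endpoint the same way they would to a hosted service. A minimal client sketch — the URL, port, and model name below are assumptions for illustration, not guaranteed defaults:

```python
# Minimal client for a locally hosted NIM endpoint. NIM serves an
# OpenAI-compatible REST API; the endpoint URL and model name used
# here are illustrative assumptions.
import json
import urllib.request

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(url: str, payload: dict) -> dict:
    """POST the payload to the endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("meta/llama-3.1-70b-instruct",
                             "Summarise this discharge note ...")
# chat("http://localhost:8000/v1/chat/completions", payload)  # with NIM running
```

Because the API shape matches OpenAI's, applications written against a cloud endpoint can typically be repointed at the local NIM service by changing a base URL — which is what makes hybrid architectures practical.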
This ecosystem coherence is DGX Spark's strategic advantage. You're not buying a box — you're entering an ecosystem where every piece of software, every optimisation, and every model format works seamlessly from your desk to the data centre to the cloud.
How Does NovaGenAI Deploy and Manage DGX Spark?
NovaGenAI is not a hardware reseller. We deploy DGX Spark as part of complete, production-grade AI systems. Here's what that means in practice:
Pre-deployment. We assess your workload requirements, data sensitivity classification, regulatory obligations, and existing infrastructure. We size the deployment correctly — DGX Spark for departmental AI, multiple units for higher throughput, or DGX SuperPOD for enterprise-scale requirements. We architect the complete system, not just the hardware.
Model development. We build custom AI models fine-tuned on your proprietary data, running entirely within your infrastructure. These aren't off-the-shelf models with a prompt template — they're purpose-built systems trained on your domain data: your clinical records, your financial transaction patterns, your operational documents. The models understand your business because they were built on your data.
Stack optimisation. We deploy and tune the full NVIDIA AI stack: NeMo for model management, NIM for optimised inference, TensorRT for maximum throughput, Triton for production serving, RAPIDS for data pipelines. The difference between a default installation and an optimised deployment can be 3–5x in inference performance.
Integration. DGX Spark connects to your existing systems: RAG pipelines pulling from your document management systems, API endpoints for your applications, SSO integration with your identity provider, audit logging to your SIEM. AI isn't useful in isolation — it must be woven into your operational workflow.
Ongoing management. Continuous monitoring, performance optimisation, model updates, security patching, and compliance reporting. We deploy and we stay. Your DGX Spark infrastructure is managed, monitored, and maintained as a production system — because that's what it is.
We also architect hybrid deployments where DGX Spark handles sensitive data on-premise while cloud infrastructure (Google Cloud, AWS, Azure) provides burst compute for training and non-sensitive workloads. The right architecture matches your regulatory reality, not a vendor's preference.
What Are the Limitations of DGX Spark?
No technology is a silver bullet. Understanding DGX Spark's limitations is essential for making the right deployment decision:
- Training ceiling. 128GB unified memory limits full training to models around 13B parameters. For training larger models from scratch, you need DGX SuperPOD or cloud compute. However, the vast majority of enterprise AI involves fine-tuning, not training from scratch — and LoRA fine-tuning of 70B+ models works well within the memory envelope.
- Single-GPU throughput. For high-concurrency production serving (thousands of simultaneous users), a single DGX Spark will hit throughput limits. Multiple units can be clustered, or the inference layer can be scaled with additional hardware.
- No elastic scaling. If your workload is highly variable — massive training runs one week, minimal inference the next — cloud provides elasticity that on-premise cannot match without over-provisioning.
- Operational responsibility. You (or your managed service partner) are responsible for hardware health, software updates, and physical security. This is a feature for sovereignty — but it's also a responsibility.
What Does the Future of DGX Spark Look Like?
NVIDIA's product cadence suggests DGX Spark will follow the same rapid improvement trajectory as the data centre DGX line. The Grace Blackwell architecture is NVIDIA's current generation — the next generation (Rubin, expected 2026–2027) will likely bring significantly more memory and compute to the same form factor.
Three trends will accelerate DGX Spark adoption:
Model efficiency gains. Techniques like speculative decoding, sparse attention, and improved quantisation mean that models which require 128GB today will require 64GB tomorrow. The effective capability of DGX Spark is increasing even without hardware upgrades.
Regulatory expansion. Every major economy is tightening data sovereignty requirements. ASEAN's AI governance frameworks, Australia's Privacy Act reform, and sector-specific regulations in healthcare and finance all push more workloads toward on-premise deployment.
Enterprise AI maturity. As organisations move from AI experimentation to production deployment, the predictable economics and sovereignty guarantees of on-premise hardware become increasingly attractive. DGX Spark is positioned exactly at this inflection point.
The bottom line: DGX Spark puts genuine enterprise AI capability on your desk, under your control, with economics that beat cloud for consistent workloads. For organisations in regulated industries — or any enterprise that takes data sovereignty seriously — it's the most important piece of AI hardware released this decade.

