Distributed vLLM — Multi-Node LLM Serving on DGX Spark and Jetson Thor

A guide to running vLLM across multiple nodes, including DGX Spark and Jetson Thor systems, for distributed large language model serving.

Full tutorial: Distributed vLLM

What You’ll Learn

  • Setting up a Ray cluster across DGX Spark and Jetson Thor nodes
  • Configuring NCCL environment variables for multi-node communication
  • Serving large models like Nemotron Super 120B across distributed GPU resources
  • Tuning network configuration for performance
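The steps above can be sketched as a command sequence. This is a minimal, hypothetical outline of the usual vLLM-on-Ray multi-node flow, not the tutorial's exact commands: the head-node IP, network interface name, model ID, and parallelism sizes are placeholders you would replace for your own cluster.

```shell
# Hypothetical sketch; IPs, interface names, and the model ID are
# placeholders, not values from the tutorial.

# On the head node (e.g. the DGX Spark), start the Ray cluster:
ray start --head --port=6379

# On each worker node (e.g. a Jetson Thor), join the cluster:
ray start --address=HEAD_NODE_IP:6379

# On every node, point NCCL (and Gloo) at the interface that carries
# cluster traffic so collectives bind to the right NIC:
export NCCL_SOCKET_IFNAME=eth0   # substitute your interface
export GLOO_SOCKET_IFNAME=eth0

# From the head node, serve the model with Ray as the distributed
# executor; tensor x pipeline parallel sizes should equal the total
# GPU count across the cluster:
vllm serve MODEL_ID \
  --tensor-parallel-size 2 \
  --pipeline-parallel-size 2 \
  --distributed-executor-backend ray
```

With this layout, tensor parallelism typically splits a layer across the GPUs within a node, while pipeline parallelism spans the nodes, keeping the heaviest collectives off the slower inter-node link.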