Distributed vLLM — Multi-Node LLM Serving on DGX Spark and Jetson Thor
A guide to running vLLM across multiple nodes, including DGX Spark and Jetson Thor systems, for distributed large language model serving.
Full tutorial: Distributed vLLM
What You’ll Learn
- Setting up a Ray cluster across DGX Spark and Jetson Thor nodes
- Configuring NCCL environment variables for multi-node communication
- Serving large models like Nemotron Super 120B across distributed GPU resources
- Tuning network configuration and performance
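As a rough sketch of the workflow the tutorial covers: vLLM's multi-node serving rides on a Ray cluster, with NCCL handling GPU-to-GPU communication between nodes. The node addresses, interface name, and model identifier below are placeholders for illustration, not values taken from the tutorial.

```shell
# On the head node (e.g. the DGX Spark): start the Ray cluster.
ray start --head --port=6379

# On each worker node (e.g. a Jetson Thor): join the cluster.
# HEAD_IP is a placeholder for the head node's address on the cluster network.
ray start --address=HEAD_IP:6379

# On every node, point NCCL at the correct network interface before serving.
# The interface name here is a placeholder; check `ip addr` on your hardware.
export NCCL_SOCKET_IFNAME=enP2p1s0
export NCCL_DEBUG=INFO   # verbose NCCL logging while validating the setup

# Launch vLLM from the head node. The tensor- and pipeline-parallel sizes
# must multiply to the total GPU count across the cluster; the model name
# is illustrative.
vllm serve nvidia/Nemotron-Super-120B \
  --tensor-parallel-size 2 \
  --pipeline-parallel-size 2
```

With this layout, Ray schedules vLLM's workers across all joined nodes, and requests sent to the head node's OpenAI-compatible endpoint are served by the whole cluster. See the full tutorial for the exact values used on DGX Spark and Jetson Thor.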
