Caroline Bishop
Oct 16, 2025 01:14
NVIDIA’s Jetson AGX Thor achieves a 7x performance increase in generative AI, optimizing edge computing through continuous software advancements and support for cutting-edge AI models.
NVIDIA has unveiled significant advancements in its Jetson AGX Thor platform, which now delivers a 7x increase in generative AI performance since its launch in August 2025. This enhancement underscores NVIDIA’s commitment to continuous optimization across its software ecosystem, according to NVIDIA’s blog.
Enhanced Performance Through Software Updates
Initially launched with a 5x boost over previous models, the Jetson AGX Thor has seen its capabilities further expanded through regular software updates. These updates let developers realize substantial performance improvements with AI models such as Llama and DeepSeek. NVIDIA’s approach includes supporting leading models soon after their release, so developers can experiment with the latest AI technologies quickly.
Advanced AI Techniques and Support
The Jetson Thor platform accommodates major quantization formats, including the new NVFP4 format from NVIDIA’s Blackwell GPU architecture. This helps optimize inference, a crucial component of edge computing. New techniques such as speculative decoding are now supported, significantly accelerating generative AI workloads at the edge. Speculative decoding, in particular, has been shown to boost output tokens per second by 7x in benchmarks with the Llama 3.3 70B model.
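To make the idea behind low-bit formats concrete, the sketch below shows simple symmetric 4-bit integer quantization of a block of weights. This is an illustration of the general principle only, not the actual NVFP4 format (NVFP4 is a 4-bit floating-point type with per-block scaling); the function names and sample values are invented for the example.

```python
# Illustrative 4-bit block quantization (not the real NVFP4 format):
# store each weight as a 4-bit integer plus one shared scale per block.

def quantize_block(weights, bits=4):
    """Symmetric integer quantization of one block of weights."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]    # integers in [-qmax, qmax]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [v * scale for v in q]

block = [0.12, -0.93, 0.48, 0.05, -0.31, 0.77, -0.66, 0.20]
q, scale = quantize_block(block)
approx = dequantize_block(q, scale)
# Each weight now occupies 4 bits instead of 32, at the cost of a
# rounding error of at most scale / 2 per value.
```

The memory saving is what matters at the edge: an 8x smaller weight footprint versus FP32 (2x versus FP8) means less DRAM traffic per token, which is typically the bottleneck for large-model inference.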
Continuous Optimization and Benchmarks
Recent updates, such as the vLLM container release, have further enhanced Jetson Thor’s performance. The platform now delivers up to 3.5x greater performance on the same model and quantization than it did at launch, as evidenced by benchmarks showing increased output tokens per second on models like Llama 3.3 70B and DeepSeek R1 70B.
Day 0 Support and Future Prospects
Developers can take advantage of day 0 support for new models on Jetson Thor, exemplified by the early support for gpt-oss on frameworks such as llama.cpp and Ollama. This ensures that developers can run the latest generative AI models at the edge without delay. NVIDIA also provides week 0 support for numerous NVIDIA Nemotron models, further enhancing the platform’s versatility.
Optimizing AI Performance
To fully exploit Jetson Thor’s potential, NVIDIA recommends employing techniques such as quantization and speculative decoding. Quantization, which reduces the numerical precision of a model’s data, allows for a smaller memory footprint and faster memory access, crucial for edge applications. Speculative decoding enhances performance by using a draft-verification approach, significantly reducing latency.
By combining these techniques with NVIDIA’s vLLM and EAGLE-3 support, developers can achieve substantial performance improvements for large language models on the Jetson Thor platform. This makes it a compelling choice for deploying advanced AI applications at the edge.
Image source: Shutterstock
Source: https://blockchain.news/news/nvidia-jetson-agx-thor-enhances-edge-ai-models-7x-performance-boost