Register your interest
In the Furiosa AI RNGD
Furiosa AI RNGD is a data centre AI inference accelerator built around a novel Tensor Contraction Processor (TCP) architecture. By using tensor contraction rather than traditional matrix multiplication as its core primitive, RNGD achieves outstanding efficiency and performance, especially for large language model (LLM) inference.
Built on TSMC’s 5 nm process, Furiosa AI RNGD supports multiple precision modes:
- 512 TFLOPS FP8
- 512 TOPS INT8
- 1024 TOPS INT4

It comes with 48 GB of HBM3 memory offering up to 1.5 TB/s of memory bandwidth, plus a further 256 MB of on-chip SRAM to minimise data movement.
Physically, the Furiosa AI RNGD is deployed as a full-height, dual-slot PCIe Gen5 x16 accelerator card. It features passive cooling and operates at a rated TDP of 180 W—about half that of equivalent GPU accelerators.
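As a rough sketch of what these headline figures imply, the ratio of peak compute to memory bandwidth gives the arithmetic intensity a workload needs to stay compute-bound rather than bandwidth-bound (a standard roofline-style estimate, using only the numbers quoted above):

```python
# Back-of-the-envelope roofline estimate from the quoted specs.
peak_fp8_flops = 512e12   # 512 TFLOPS peak FP8 compute
hbm_bandwidth = 1.5e12    # 1.5 TB/s HBM3 memory bandwidth

# FLOPs that must be performed per byte fetched from HBM
# for the chip to remain compute-bound:
intensity = peak_fp8_flops / hbm_bandwidth
print(f"{intensity:.0f} FLOPs per byte")  # ~341 FLOPs/byte
```

Workloads below this intensity are limited by HBM bandwidth, which is why the large on-chip SRAM matters: data reused from SRAM does not count against the HBM budget.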
Key Benefits & Innovations
- Efficiency Leader: Delivers up to 2.25× better performance per watt than legacy GPUs, dramatically reducing total cost of ownership.
- Built for Transformers: TCP architecture accelerates tensor-based operations (e.g., Q·K^T in attention mechanisms) natively—removing reshaping overhead and maximising on-chip data reuse.
- Real-World Integration: Successfully adopted by LG AI Research for its Exaone LLM deployment, where RNGD showed higher throughput and energy savings.
- Programmable & Flexible: Supports virtualization (SR-IOV), secure boot, ECC-protected memory, and multi-instance deployment for AI cloud, on-prem, and hybrid use cases.
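To illustrate the kind of operation the TCP architecture targets, the attention-score step Q·K^T can be written as a single tensor contraction over the head dimension, with no explicit transpose or reshape of K. This NumPy sketch uses arbitrary illustrative shapes, not anything RNGD-specific:

```python
import numpy as np

# Hypothetical shapes for illustration only.
batch, heads, seq, d = 2, 8, 128, 64
rng = np.random.default_rng(0)
Q = rng.standard_normal((batch, heads, seq, d)).astype(np.float32)
K = rng.standard_normal((batch, heads, seq, d)).astype(np.float32)

# Attention scores expressed as one contraction over d:
# equivalent to Q @ K^T, but stated directly as a tensor contraction.
scores = np.einsum("bhqd,bhkd->bhqk", Q, K)

assert scores.shape == (batch, heads, seq, seq)
```

On conventional hardware this contraction is typically lowered to batched matrix multiplies plus transposes; the claim above is that TCP executes contractions like this natively.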
Technical Specifications
| Specification | Value |
| --- | --- |
| Architecture | Tensor Contraction Processor (TCP) |
| Process Node | TSMC 5 nm |
| Compute (FP8 / INT8) | 512 TFLOPS / 512 TOPS |
| Compute (INT4) | 1024 TOPS |
| Memory | 48 GB HBM3 (1.5 TB/s BW); 256 MB SRAM |
| Form Factor | PCIe Gen5 x16, dual-slot, full-height |
| TDP | 180 W (air-cooled) |
| Features | SR-IOV, ECC, Secure Boot, Virtualization |
Towards Sustainable, Scalable Inference
Furiosa AI RNGD signals a shift toward more energy-efficient, high-density AI inference architectures. Its TCP approach aligns hardware compute with real tensor workloads, allowing service providers to maximise performance while reducing capex and opex.