Silicon built for real AI performance
RealPerf.ai designs inference and training accelerators that deliver up to 3.8× the tokens per watt of a leading GPU, so you can serve larger models at a fraction of the power and cost.
- Drop-in PCIe & OAM
- PyTorch & JAX ready
- Sampling Q3
[Dashboard preview: throughput, efficiency, HBM used, and die temperature, with a chart of tokens/sec and power draw over the last 12 hours]
Powering AI clouds and research labs worldwide
Architecture
Engineered end to end for AI compute
From the dataflow core to the compiler, every layer of RealPerf silicon is co-designed to move data less and compute more.
Dataflow compute cores
1,024 tensor cores built on a 3nm process feed a spatial dataflow fabric that keeps utilization above 90% on real workloads.
192GB HBM3e
Massive on-package memory with 5.3 TB/s of bandwidth keeps 100B+ parameter models resident at 8-bit precision without offloading.
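As a rough check on what fits in 192GB, weight memory is approximately parameter count times bytes per parameter. The sketch below is illustrative only and is not a RealPerf tool: the helper names `weight_footprint_gb` and `fits_in_hbm` and the precision table are assumptions, decimal gigabytes are used, and KV cache, activations, and runtime overhead are ignored.

```python
# Back-of-envelope weight footprint (hypothetical helpers, not a vendor API).
HBM_CAPACITY_GB = 192  # per-accelerator HBM3e capacity quoted above

# Bytes per parameter for common weight formats (assumed set).
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "fp8/int8": 1.0, "int4": 0.5}


def weight_footprint_gb(num_params: float, bytes_per_param: float) -> float:
    """Return weight memory in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits_in_hbm(num_params: float, bytes_per_param: float,
                capacity_gb: float = HBM_CAPACITY_GB) -> bool:
    """True if the weights alone fit in the given HBM capacity."""
    return weight_footprint_gb(num_params, bytes_per_param) <= capacity_gb


if __name__ == "__main__":
    for name, bpp in BYTES_PER_PARAM.items():
        gb = weight_footprint_gb(100e9, bpp)
        print(f"100B params @ {name}: {gb:.0f} GB, fits: {fits_in_hbm(100e9, bpp)}")
```

Under these assumptions a 100B-parameter model needs about 100 GB of weights at 8-bit precision and fits on one accelerator, while 16-bit weights need about 200 GB and would not.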
Best-in-class efficiency
Up to 3.8 tokens per joule — delivering more inference per rack while cutting datacenter power and cooling costs.
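To put a tokens-per-joule figure in operational terms, it can be converted to energy per million tokens and to throughput at a given power draw. This is a minimal sketch: the function names are hypothetical, and pairing 3.8 tokens per joule with a 700 W power draw is an assumption made purely for illustration, since the figure above is an "up to" number without a stated workload or product.

```python
# Unit conversions for an efficiency figure (illustrative, not measured data).
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def kwh_per_million_tokens(tokens_per_joule: float) -> float:
    """Energy in kWh needed to generate one million tokens."""
    if tokens_per_joule <= 0:
        raise ValueError("tokens_per_joule must be positive")
    joules = 1e6 / tokens_per_joule
    return joules / JOULES_PER_KWH


def tokens_per_second(tokens_per_joule: float, power_watts: float) -> float:
    """Throughput implied by an efficiency figure at a steady power draw.

    One watt is one joule per second, so tokens/J times W gives tokens/s.
    """
    return tokens_per_joule * power_watts


if __name__ == "__main__":
    print(f"{kwh_per_million_tokens(3.8):.4f} kWh per million tokens")
    print(f"{tokens_per_second(3.8, 700):.0f} tokens/s at an assumed 700 W")
```

At 3.8 tokens per joule, one million tokens costs roughly 0.073 kWh of accelerator energy, before cooling and host overhead.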
Scale-out fabric
900 GB/s chip-to-chip links connect up to 64 accelerators into a single coherent pod with near-linear scaling.
On-die SRAM
256MB of distributed on-chip SRAM slashes memory round-trips, keeping attention and KV cache close to compute.
Open software stack
Native PyTorch and JAX support with an MLIR-based compiler — bring existing models and run them unmodified.
2.1 PFLOPS
Peak FP8 per accelerator
3.8×
Tokens/watt vs. leading GPU
5.3 TB/s
HBM3e memory bandwidth
64
Chips per coherent pod
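The peak compute and memory bandwidth figures above can be combined in a simple roofline estimate, which shows how many operations a kernel must perform per byte of HBM traffic before it stops being bandwidth-bound. This is a sketch using only the two headline numbers: the function names are hypothetical, and the model deliberately ignores on-die SRAM reuse, which can raise the effective intensity of real kernels.

```python
# Simple roofline model from the headline specs (illustrative only).
PEAK_FP8_FLOPS = 2.1e15             # 2.1 PFLOPS peak FP8 per accelerator
HBM_BANDWIDTH_BYTES_PER_S = 5.3e12  # 5.3 TB/s HBM3e bandwidth


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Arithmetic intensity (FLOPs per byte) where compute and memory balance."""
    return peak_flops / bandwidth


def attainable_flops(intensity: float,
                     peak_flops: float = PEAK_FP8_FLOPS,
                     bandwidth: float = HBM_BANDWIDTH_BYTES_PER_S) -> float:
    """Upper bound on throughput for a kernel with the given FLOPs per byte."""
    return min(peak_flops, intensity * bandwidth)


if __name__ == "__main__":
    ridge = ridge_point(PEAK_FP8_FLOPS, HBM_BANDWIDTH_BYTES_PER_S)
    print(f"ridge point: {ridge:.0f} FLOPs/byte")
    for intensity in (1, 100, 1000):
        bound = attainable_flops(intensity)
        print(f"{intensity:>5} FLOPs/byte -> {bound / 1e15:.3f} PFLOPS bound")
```

In this model, kernels below roughly 396 FLOPs per byte of HBM traffic are limited by memory bandwidth rather than peak compute.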
The lineup
One architecture, from edge to rack
Every RealPerf accelerator shares the same dataflow core and software stack — scale from a single card to a full pod without rewriting a line of code.
RP-1 Edge
Low-power inference for edge servers and workstations.
- 48GB HBM3e
- 75W TDP
- Single-slot PCIe
- PyTorch & ONNX
RP-8 Server
The flagship accelerator for training and high-throughput inference.
- 192GB HBM3e @ 5.3 TB/s
- 700W, liquid- or air-cooled
- OAM & PCIe Gen5
- 900 GB/s scale-out links
- Full compiler toolchain
RP-64 Pod
A turnkey rack of 64 accelerators as one coherent system.
- 12TB pooled HBM
- Coherent fabric mesh
- Rack-scale liquid cooling
- White-glove deployment
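The pod figures follow from multiplying the per-accelerator specs by 64. The sketch below shows that arithmetic; the function names are hypothetical, and the aggregate FP8 number is a theoretical product of peak figures, not a published or measured pod result.

```python
# Pod-level totals derived from per-accelerator specs (simple multiplication).
CHIPS_PER_POD = 64
HBM_PER_CHIP_GB = 192
PEAK_FP8_PFLOPS_PER_CHIP = 2.1


def pooled_hbm_gb(chips: int, hbm_per_chip_gb: int) -> int:
    """Total HBM across the pod, in GB."""
    return chips * hbm_per_chip_gb


def aggregate_peak_pflops(chips: int, pflops_per_chip: float) -> float:
    """Theoretical sum of per-chip peak FP8 throughput, in PFLOPS."""
    return chips * pflops_per_chip


if __name__ == "__main__":
    total_gb = pooled_hbm_gb(CHIPS_PER_POD, HBM_PER_CHIP_GB)
    print(f"pooled HBM: {total_gb} GB ({total_gb / 1024:.0f} TB at 1024 GB per TB)")
    peak = aggregate_peak_pflops(CHIPS_PER_POD, PEAK_FP8_PFLOPS_PER_CHIP)
    print(f"theoretical peak FP8: {peak:.1f} PFLOPS")
```

Sixty-four accelerators at 192GB each give 12,288 GB, which matches the 12TB pooled figure when a terabyte is counted as 1,024 GB.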
Put real performance in your racks
Reserve early access to RealPerf accelerators and cut the power, cost, and footprint of running AI at scale. Sampling begins Q3.