Every row is a model deployment. Every column is a GPU cluster configuration. The number inside is the upper bound of what one cluster of that shape can earn per day serving that model on UOMI.
How to read. Each cell is an upper bound on daily revenue, in dollars, for one cluster of that GPU configuration serving that model on UOMI. Background color encodes the value on a log scale: pale gold = low, deep red = high, green = exceptional. The ★ marks each row's best configuration.
| Model | A 2×4090 | B 4×4090 | C 2×5090 | D 4×5090 | E 2×L40s | F 4×L40s | G 1×Pro6K | H 2×Pro6K |
|---|---|---|---|---|---|---|---|---|
| Qwen3.6 27B | up to $49.7 | up to $74.5 | up to $74.5 | up to $149★ | up to $49.7 | up to $74.5 | up to $74.5 | up to $149★ |
| Gemma 4 31B | up to $15.2 | up to $22.8 | up to $25.8 | up to $36.9★ | up to $13.6 | up to $22.8 | up to $25.8 | up to $45.5 |
| GLM 5 | — | — | — | — | — | — | — | up to $39.7★ |
| MiniMax M2.5 | — | — | — | — | — | up to $21.4 | — | up to $37.5★ |
| Gemma 4 31B | up to $17.6 | up to $29.3 | up to $29.3 | up to $44★ | up to $17.6 | up to $29.3 | up to $29.3 | up to $44 |
| Gemma 4 26B A4B | up to $34★ | up to $34 | up to $34 | up to $34 | up to $34 | up to $34 | up to $34 | up to $34 |
| DeepSeek V4 Flash | — | — | — | — | — | up to $10.7 | — | up to $21.3★ |
| Qwen3.5-9B | up to $10.5 | up to $21★ | up to $21 | up to $21 | up to $10.5 | up to $21 | up to $21 | up to $21 |
| Qwen3.6 35B A3B | up to $90.3 | up to $136 | up to $136 | up to $136 | up to $67.8 | up to $136 | up to $136 | up to $271★ |
| Qwen3.5-35B-A3B | up to $79 | up to $119 | up to $119 | up to $237★ | up to $79 | up to $119 | up to $119 | up to $237 |
| MiniMax M2.5 | — | — | — | — | — | up to $21.3 | — | up to $35.5★ |
| Llama 3.3 70B | up to $2.36 | up to $13.7 | up to $3.97 | up to $21.6 | up to $8.39 | up to $13.7 | up to $15.1 | up to $25.2★ |
| DeepSeek V4 Flash | — | — | — | — | — | up to $8.86 | — | up to $17.7★ |
| Gemma 4 26B A4B | up to $48 | up to $48 | up to $96★ | up to $96 | up to $48 | up to $48 | up to $96 | up to $96 |
| gpt-oss-120b | up to $4.75 | up to $9.5 | up to $6.33 | up to $9.5 | up to $9.5 | up to $19★ | up to $9.5 | up to $19 |
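For readers who want to work with the table programmatically, here is a minimal Python sketch that parses one row and reports which configuration holds the highest figure. The function names `parse_row` and `best_configs` are hypothetical and not part of any UOMI tooling, and the sketch assumes cells look like the ones above: either a dollar figure prefixed with "up to" or a dash with no figure.

```python
import re

# Column labels copied from the table header.
CONFIGS = ["A 2×4090", "B 4×4090", "C 2×5090", "D 4×5090",
           "E 2×L40s", "F 4×L40s", "G 1×Pro6K", "H 2×Pro6K"]

# Matches "up to$49.7" as well as "up to $49.7"; a trailing ★ is ignored.
CELL = re.compile(r"up to\s*\$([0-9.]+)")


def parse_row(line):
    """Return (model, {config: dollars_per_day}) for one table row."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    model, values = cells[0], cells[1:]
    earnings = {}
    for config, cell in zip(CONFIGS, values):
        match = CELL.search(cell)
        if match:  # cells holding only a dash carry no figure and are skipped
            earnings[config] = float(match.group(1))
    return model, earnings


def best_configs(earnings):
    """Return every configuration tied for the highest daily figure."""
    top = max(earnings.values())
    return [config for config, value in earnings.items() if value == top]


row = ("| Llama 3.3 70B | up to$2.36 | up to$13.7 | up to$3.97 | up to$21.6 "
       "| up to$8.39 | up to$13.7 | up to$15.1 | up to$25.2★ |")
model, earnings = parse_row(row)
print(model, best_configs(earnings))
```

Returning a list rather than a single name keeps ties visible, since several rows have more than one configuration at the same maximum.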
Why 2× RTX Pro 6000 dominates. With 96 GB on a single GPU, a 1T-parameter MoE that needs 28 clusters per instance on 2× 4090 needs only 1, and the interconnect penalty stops compounding. The larger the model, the more large-memory GPUs pay off.
Caveats. Throughput estimates are batched aggregates at FP8, with sublinear scaling for multi-cluster sharding: interconnect_factor = 1/√(1 + K·(N−1)), where N is the number of clusters per instance and K = 3.0. That value of K is calibrated for public-internet latency (~50 ms RTT, ~500 Mbps effective), as found in cross-region operator distribution. A provider running entirely within one cloud region with private peering would see closer to K = 0.5. Single-cluster numbers (N = 1) are unaffected and are the most reliable.
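To make the scaling penalty concrete, the following Python sketch evaluates the interconnect factor from the formula above for a few cluster counts, at both K = 3.0 and K = 0.5. The function name `interconnect_factor` and its parameter names are illustrative, and the sketch computes only the factor itself; how the factor is folded into the throughput and revenue estimates is not specified here.

```python
import math


def interconnect_factor(n_clusters, k=3.0):
    """Evaluate 1 / sqrt(1 + k * (n_clusters - 1)).

    n_clusters: clusters sharing one model instance (N in the formula).
    k: latency constant; 3.0 is the cross-region value quoted above,
       0.5 the value suggested for a single region with private peering.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    return 1.0 / math.sqrt(1.0 + k * (n_clusters - 1))


# Compare the two latency regimes for a few cluster counts.
for n in (1, 2, 4, 28):
    cross_region = interconnect_factor(n, k=3.0)
    same_region = interconnect_factor(n, k=0.5)
    print(f"N={n:>2}  K=3.0: {cross_region:.3f}  K=0.5: {same_region:.3f}")
```

With N = 1 the factor is exactly 1 for any K, which is why single-cluster figures carry no penalty. With K = 3.0 and N = 2 the factor is 1/√4 = 0.5.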
80% of every dollar earned lands directly in the wallet of the GPU that served the request. The other 20% buys back $UOMI on the open market.
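The 80/20 split can be sketched in a few lines of Python. This is an illustration of the arithmetic only: the name `split_revenue` is hypothetical, and the example amount is arbitrary, since the text does not say whether the table figures are quoted before or after the split.

```python
from decimal import Decimal

# Share paid to the GPU that served the request, per the split stated above.
PROVIDER_SHARE = Decimal("0.80")


def split_revenue(amount):
    """Split an earned amount into (provider payout, $UOMI buyback)."""
    amount = Decimal(str(amount))
    provider = amount * PROVIDER_SHARE
    buyback = amount - provider  # the remainder, so the two parts always sum to amount
    return provider, buyback


provider, buyback = split_revenue("25.20")
print(f"provider: ${provider:.2f}  buyback: ${buyback:.2f}")
```

`Decimal` is used instead of floats so that the two parts add back to the original amount exactly.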