What your GPU could earn.

Every row is a model deployment. Every column is a GPU cluster configuration. The number inside is the upper bound of what one cluster of that shape can earn per day serving that model on UOMI.

The full $/cluster/day matrix

Pick a model. Pick a GPU.

How to read. Each cell shows the maximum daily revenue one cluster of that GPU configuration can earn serving the model on UOMI. Background color encodes the value on a log scale: pale gold = low, deep red = high, green = exceptional. The ★ marks each row's best configuration.

$0$280+★ best config for that model

Model	A 2×4090	B 4×4090	C 2×5090	D 4×5090	E 2×L40s	F 4×L40s	G 1×Pro6K	H 2×Pro6K
Qwen3.6 27B	up to$49.7	up to$74.5	up to$74.5	up to$149★	up to$49.7	up to$74.5	up to$74.5	up to$149★
Gemma 4 31B	up to$15.2	up to$22.8	up to$25.8	up to$36.9★	up to$13.6	up to$22.8	up to$25.8	up to$45.5
GLM 5	—	—	—	—	—	—	—	up to$39.7★
MiniMax M2.5	—	—	—	—	—	up to$21.4	—	up to$37.5★
Gemma 4 31B	up to$17.6	up to$29.3	up to$29.3	up to$44★	up to$17.6	up to$29.3	up to$29.3	up to$44
Gemma 4 26B A4B	up to$34★	up to$34	up to$34	up to$34	up to$34	up to$34	up to$34	up to$34
DeepSeek V4 Flash	—	—	—	—	—	up to$10.7	—	up to$21.3★
Qwen3.5-9B	up to$10.5	up to$21★	up to$21	up to$21	up to$10.5	up to$21	up to$21	up to$21
Qwen3.6 35B A3B	up to$90.3	up to$136	up to$136	up to$136	up to$67.8	up to$136	up to$136	up to$271★
Qwen3.5-35B-A3B	up to$79	up to$119	up to$119	up to$237★	up to$79	up to$119	up to$119	up to$237
MiniMax M2.5	—	—	—	—	—	up to$21.3	—	up to$35.5★
Llama 3.3 70B	up to$2.36	up to$13.7	up to$3.97	up to$21.6	up to$8.39	up to$13.7	up to$15.1	up to$25.2★
DeepSeek V4 Flash	—	—	—	—	—	up to$8.86	—	up to$17.7★
Gemma 4 26B A4B	up to$48	up to$48	up to$96★	up to$96	up to$48	up to$48	up to$96	up to$96
gpt-oss-120b	up to$4.75	up to$9.5	up to$6.33	up to$9.5	up to$9.5	up to$19★	up to$9.5	up to$19

Why 2× RTX Pro 6000 dominates. With 96 GB on a single GPU, a 1T-parameter MoE that needed 28 clusters per instance on 2× 4090 only needs 1, and the interconnect penalty stops compounding. The bigger the model, the more bigger GPUs pay off.

Caveats. Throughput estimates are batched-aggregate at FP8 with sublinear scaling for multi-cluster sharding (interconnect_factor = 1/√(1 + 3.0·(N−1))) calibrated for public-internet latency (~50 ms RTT, ~500 Mbps effective). A provider running entirely within one cloud region with private peering would see closer to K=0.5; the K=3.0 used here is for cross-region operator distribution. Single-cluster numbers (where N=1) are unaffected and most reliable.

Plug in. Get paid.

80% of every dollar earned lands directly in the wallet of the GPU that served the request. The other 20% buys back $UOMI on the open market.

Run a node →How the split works