UOMI LIVE NOW · CA: 0x3628d69aa2d66e9efe95ab1267d440dec24389b6
TRADE NOW ON UNISWAP
UOMI · Inference Network

Your GPU. Agent's AI calls. Your Wallet.

A distributed Inference Network running frontier open-source models on consumer GPUs. Permissionless, verifiable, paid per token.

The flywheel

100% buy pressure. Then split.

Every dollar of inference revenue market-buys $UOMI on-chain. The bought tokens then split: 80% to the GPU that served the request, 20% to the burn address.

→ Step 01 · Buy-back
100%

Every dollar spent by autonomous AI agents — Open Claw, Hermes, UOMI Agents — and aggregators like OpenRouter is used to market-buy $UOMI on-chain. No treasury cut. All revenue becomes buy pressure on the token.

→ Step 02 · To GPU provider
80%

Of the $UOMI just bought lands directly in the wallet of the GPU that served the request. Paid in $UOMI, on-chain, instantly.

→ Step 03 · To burn
20%

Of the $UOMI just bought is sent to the burn address. The supply reduction is permanent, and every burn tx is publicly verifiable on-chain.
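
The arithmetic of the three steps can be sketched in a few lines. This is an illustration only, not the on-chain settlement logic: the helper name `split_buyback`, the example price, and the choice to ignore swap fees and slippage are all assumptions. Only the 100% / 80% / 20% proportions come from the text above.

```python
from decimal import Decimal

PROVIDER_SHARE = Decimal("0.80")  # from the text: 80% to the serving GPU
BURN_SHARE = Decimal("0.20")      # from the text: 20% to the burn address


def split_buyback(revenue_usd: Decimal, uomi_price_usd: Decimal) -> dict:
    """Hypothetical helper: convert revenue to $UOMI, then split it 80/20.

    Ignores swap fees and slippage, which a real market buy would incur.
    """
    if uomi_price_usd <= 0:
        raise ValueError("price must be positive")
    bought = revenue_usd / uomi_price_usd  # step 01: all revenue buys $UOMI
    return {
        "bought": bought,
        "provider": bought * PROVIDER_SHARE,  # step 02
        "burn": bought * BURN_SHARE,          # step 03
    }


# Made-up numbers: $10 of inference revenue at an assumed $0.05 per $UOMI.
result = split_buyback(Decimal("10"), Decimal("0.05"))
print(result)
```

With these example inputs the buy-back yields 200 $UOMI, of which 160 go to the provider and 40 to the burn address.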

How it works

A request walks through the network.

Three steps from API call to settlement. Verifiable end-to-end.

01 · DEMAND

An app hits the API.

Apps query UOMI directly or via aggregators and pick Llama, Qwen, DeepSeek, Mixtral, or whatever the open-weights frontier looks like today. OpenAI-compatible. Drop-in.

02 · COMPUTE

A consumer GPU answers.

The job lands on the fastest free GPU in the mesh. Output is verified by Optimistic Proof of Computation (OPoC) and Deterministic Indeterminism before payment clears.

03 · SETTLE

The buy-back and split happen on-chain.

100% of revenue market-buys $UOMI on-chain. Of the tokens bought, 80% go to the provider's wallet and 20% to the burn address. Same block, every block, publicly auditable.

200B+ · Daily tokens served by Hermes alone, via OpenRouter
$100M+ · Annualized aggregated inference spend, doubling every few months
3 · Live testnets battle-testing OPoC: Babbage, Finney, Turing
1 yr+ · Continuous testnet operation, since 2024

Inference integrity

Trust the cluster, not the GPU.

Pay-for-work only works if the work is verifiable. The integrity guarantees are inherited from Optimistic Proof of Computation and Deterministic Indeterminism, peer-reviewed algorithms published in 2025 and running unmodified across three UOMI testnets for over a year.

01

The worker commits to an output.

When a node serves a request, it returns the generated tokens with the top-k log-probabilities the model assigned to those tokens. This is the inference's fingerprint.

02

Validators re-score against the same model.

A second node runs the worker's claimed input and output through the same model architecture and recomputes the log-probabilities. Re-scoring an existing answer is reproducible.

03

The check is tolerant by design.

Two honest GPUs produce log-probs that differ by a tiny floating-point delta. The protocol accepts that delta. A worker running a smaller model or no model at all diverges far beyond it, and is rejected.
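
The tolerant comparison can be sketched as follows. This is a simplified illustration, not the protocol's actual acceptance rule: the function name `logprobs_match`, the tolerance of `1e-2`, and the sample log-probabilities are all assumptions chosen for the example.

```python
import math


def logprobs_match(claimed, recomputed, tolerance=1e-2):
    """Return True if every claimed log-prob is within `tolerance` of the validator's.

    `tolerance` is a placeholder value; the real threshold is not given in the text.
    """
    if len(claimed) != len(recomputed):
        return False
    return all(
        math.isfinite(c) and math.isfinite(r) and abs(c - r) <= tolerance
        for c, r in zip(claimed, recomputed)
    )


# Invented per-token log-probs for a three-token answer.
validator = [-0.1052, -1.2037, -0.3571]    # recomputed by the second node
honest_worker = [-0.105, -1.204, -0.357]   # differs only by tiny float noise
small_model = [-0.900, -2.750, -1.600]     # a cheaper model scores the tokens differently

print(logprobs_match(honest_worker, validator))
print(logprobs_match(small_model, validator))
```

The honest worker passes because each difference is far below the tolerance, while the substituted model is rejected on the first token.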

04

Optimistic sampling keeps verification cheap.

The network doesn't re-validate every inference. Validators are sampled randomly, so cheaters never know which call gets checked. Combined with slashing, the expected cost of cheating exceeds any gain.
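
The claim that cheating has negative expected value can be made concrete with a small model. This is a back-of-the-envelope sketch under stated assumptions: a sampled cheat is always caught, a caught cheater forfeits the payment and loses a fixed slash, and the numbers below are invented. The function names are hypothetical.

```python
def expected_cheating_profit(gain, slash, sample_rate):
    """Expected profit per cheated call.

    With probability (1 - sample_rate) the call is not checked and the cheater keeps `gain`;
    with probability sample_rate it is checked and the cheater loses `slash`.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be in [0, 1]")
    return (1.0 - sample_rate) * gain - sample_rate * slash


def min_sample_rate(gain, slash):
    """Break-even sampling rate: above this, cheating loses money on average."""
    return gain / (gain + slash)


# Invented numbers: a cheat saves $0.01 of compute, the slash is $10.
gain, slash = 0.01, 10.0
print(min_sample_rate(gain, slash))
print(expected_cheating_profit(gain, slash, sample_rate=0.01))
```

Under these assumptions the break-even rate is gain / (gain + slash), so a large slash relative to the per-call gain lets the network check only a small fraction of calls.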

Privacy & node integrity

Encrypted binaries: the second lock.

Deterministic Indeterminism catches a node that lies about its output. Encrypted binaries make sure the node can't lie about its execution either, and can't quietly read what passes through.

Continuous integrity attestation.

The node proves it's running the official build on every cycle, not just at launch.

No prompt sniffing.

Operators can't enable logs, can't read request payloads in cleartext, and can't exfiltrate user data.

Defense in depth.

Output-level checks and execution-level checks catch different attacker capabilities. Both have to fail for fraud to land.

Who it's for

One network. Three doors.

Pick the side of the market that fits — UOMI rewards all three.

For developers

Frontier OSS, OpenAI-compatible.

Swap one base URL and you're done. Stream, function-call, JSON mode. Pay per token, in dollars or $UOMI.

For GPU providers

Turn idle silicon into yield.

RTX 4090 or better. One-line installer. When a request your node served clears verification, 80% of the $UOMI bought with its revenue lands in your wallet.

For $UOMI holders

Usage funds the buyback.

The Scarcity Engine. Every served token converts external revenue into $UOMI buy pressure, visible on-chain, hourly.

Models

Frontier OSS, permanently warm.

No cold starts, no model-loading tax. The biggest open-source models stay hot across the GPU mesh.

Llama 3.3 70B (Meta) · $0.34 / M tok · 128k ctx
DeepSeek V3 (DeepSeek) · $0.28 / M tok · 64k ctx
Qwen 2.5 Coder 32B (Qwen) · $0.18 / M tok · 128k ctx
Mixtral 8×22B (Mistral) · $0.41 / M tok · 64k ctx
Gemma 2 27B (Google) · $0.16 / M tok · 8k ctx
Hermes 3 405B (Nous) · $1.12 / M tok · 128k ctx

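
To turn the per-million-token prices into a budget, a rough estimator might look like this. It is a sketch, assuming a single flat rate applies to all tokens in a request (the page does not say whether input and output tokens are priced differently). The function name `estimate_cost_usd` is hypothetical, and the prices are copied from the list above.

```python
# USD per million tokens, as listed on this page.
PRICE_PER_M_TOKENS_USD = {
    "Llama 3.3 70B": 0.34,
    "DeepSeek V3": 0.28,
    "Qwen 2.5 Coder 32B": 0.18,
    "Mixtral 8×22B": 0.41,
    "Gemma 2 27B": 0.16,
    "Hermes 3 405B": 1.12,
}


def estimate_cost_usd(model, total_tokens):
    """Estimate spend for `total_tokens`, assuming one flat rate per model."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return PRICE_PER_M_TOKENS_USD[model] * total_tokens / 1_000_000


# Example: 2.5 million tokens on Llama 3.3 70B.
print(round(estimate_cost_usd("Llama 3.3 70B", 2_500_000), 4))
```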
For developers

If you can call OpenAI, you can call UOMI.

A single base-URL swap. Keep your SDK. Keep your prompts. Lose the closed-API tax.

inference.py
from openai import OpenAI

# UOMI Router is fully OpenAI-compatible
client = OpenAI(
  base_url="https://gateway.uomi.ai/v1",
  api_key="sk-uomi-...",
)

# Streaming chat completion
stream = client.chat.completions.create(
  model="Qwen/Qwen3.6-27B-FP8",
  messages=[
    {"role": "system",
      "content": "You are a helpful assistant."},
    {"role": "user",
      "content": "Explain decentralized inference."},
  ],
  max_tokens=512,
  temperature=0.7,
  stream=True,
)
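for chunk in stream:
  # Some OpenAI-compatible servers send chunks with no choices; skip them.
  if not chunk.choices:
    continue
  delta = chunk.choices[0].delta.content
  if delta:
    print(delta, end="", flush=True)

# 100% of this call's revenue market-buys $UOMI on-chain.
# 80% of those tokens go to the GPU that served it; 20% are burned.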
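
For readers who prefer not to use the SDK, the same kind of call can be expressed as a plain HTTP request. The sketch below only builds the request and does not send it. It assumes the gateway exposes the standard OpenAI-style `/chat/completions` path and honors the OpenAI-style `response_format` field for JSON mode; neither is confirmed beyond the "OpenAI-compatible" claim on this page. The helper name `build_chat_request` is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://gateway.uomi.ai/v1"  # base URL from the example above


def build_chat_request(api_key, model, user_prompt, json_mode=False):
    """Build (but do not send) a POST request for a chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": 256,
    }
    if json_mode:
        # OpenAI-style JSON mode; assumed to be supported by the gateway.
        payload["response_format"] = {"type": "json_object"}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "sk-uomi-...",
    "Qwen/Qwen3.6-27B-FP8",
    "Reply with a JSON object describing decentralized inference.",
    json_mode=True,
)
print(req.get_method(), req.full_url)
# To send it, pass `req` to urllib.request.urlopen (requires a valid API key).
```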

Open models. Open hardware. Open books.

Pick a side of the market. Inference today, paid today.