Your best model from last month?
You probably can't reproduce it.

And when someone asks what changed… you don't have an answer.

One command. A complete record of what actually happened. Works with any training script. No code changes, no framework required.

works with PyTorch, JAX, TF — no wrappers, no decorators
The reality

Most ML teams can't explain their own model.

Not because they're careless — the tools that were supposed to track this are heavier than the work itself. Then someone asks what changed.

01

Last month's best model. Nobody can reproduce it.

02

Two engineers changed the config. Neither remembers what.

03

A checkpoint in S3. No one knows what code produced it.

How it works

One command. No instrumentation.

Prefix any training script with roar run. A runtime observer records what actually happened — not what a logger remembered to log. If it ran, roar saw it.

  • env & deps
  • data & S3 objects
  • config & args
  • git SHA & diff
  • GPU / CUDA state
  • model artifacts
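As an illustration of what such a record can contain, here is a minimal Python sketch of environment capture. The function and field names are hypothetical, not roar's actual schema, and roar's observer captures far more (data and S3 objects, full git diffs, GPU/CUDA state) at runtime, without helper code:

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone

def capture_snapshot(argv):
    """Collect a minimal record of the current run. Illustrative only."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],       # interpreter version
        "platform": platform.platform(),        # OS / arch
        "argv": list(argv),                     # config & args as invoked
    }
    try:
        # git SHA of the working tree, if we're inside a repo
        record["git_sha"] = subprocess.check_output(
            ["git", "rev-parse", "HEAD"],
            text=True, stderr=subprocess.DEVNULL,
        ).strip()
    except (subprocess.CalledProcessError, OSError):
        record["git_sha"] = None
    return record

print(json.dumps(capture_snapshot(["python", "train.py", "--lr", "3e-4"]), indent=2))
```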
install
% uv pip install roar-cli
% roar init
# wrap any script — no code changes
% roar run python train.py
% roar run torchrun --nproc_per_node 8 train.py
Capture. Store. Control.

Capture what ran. Store the record. Control what runs next.

Use any piece on its own. Together they give you a complete record of how your models are actually built — and what gets to run next.

roar

If it ran, roar saw it.

A CLI that observes your training at runtime — data, code, environment, artifacts. No instrumentation. No declarations.

Read about roar →
GLaaS

Every model has a recipe.

A content-addressable registry of every run and artifact. Resolve any hash back to the code, data, and environment that made it.

Read about GLaaS →
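Content addressing is the core idea behind a registry like this: hash the canonical form of a record, store it under that hash, and the hash alone is enough to get the recipe back. A toy sketch (not GLaaS's actual format or API):

```python
import hashlib
import json

def put(store, record):
    """Store a run record under the SHA-256 of its canonical JSON form."""
    payload = json.dumps(record, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()
    store[digest] = payload
    return digest

def resolve(store, digest):
    """Resolve a hash back to the record that produced it."""
    return json.loads(store[digest])

store = {}
run = {"code": "a1b2c3", "data": "s3://bucket/train.parquet", "env": "py3.11"}
h = put(store, run)
assert resolve(store, h) == run   # same hash, same recipe, every time
```

Because the address is derived from the content, identical records always hash to the same key, and any tampering changes the address.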
TReqs

Approve before compute.

Training requests as pull requests. Nothing runs until someone — or a policy — says go.

Why TReqs →
What you get

Outcomes, not dashboards.

Questions you couldn't answer before, now answered on demand.

complete record

Every run captured automatically — a queryable record of what actually happened, not a convention or a log file.

run it again exactly

Re-run any past run with the same env, data, and config. Not "close enough."

what changed

Diff any two runs: code, data, hyperparams, hardware.
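A run diff boils down to comparing two captured records field by field. A minimal sketch, with illustrative field names:

```python
def diff_runs(a, b):
    """Return the fields whose values differ between two run records."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

run_a = {"git_sha": "a1b2c3", "lr": 3e-4, "data": "v1", "gpu": "A100"}
run_b = {"git_sha": "d4e5f6", "lr": 3e-4, "data": "v2", "gpu": "A100"}
print(diff_runs(run_a, run_b))
# {'data': ('v1', 'v2'), 'git_sha': ('a1b2c3', 'd4e5f6')}
```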

artifact trace

Every checkpoint points back to the exact run that produced it.

cost attribution

Know which experiment cost what. Know who spent what.

approve before compute

Nothing runs until someone — or a policy — says go. Runs are reviewed with context before they start.

TReqs

Decide what gets to run — and why — before the money moves.

Right now, anyone — or any agent — can burn $1,000 on a run. And you find out after.

TReqs turns that into a request before it runs.

  • file a request: config + data + compute budget
  • team or policy reviews it before it starts
  • roar captures what actually ran
  • the request resolves with a real artifact

Without TReqs: runs happen. You investigate later.
With TReqs: runs are reviewed with context before they start.
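In spirit, a training request is just data plus a gate: nothing runs until a reviewer or policy flips its status. A hypothetical sketch (the class and the budget policy are illustrative, not TReqs's API):

```python
from dataclasses import dataclass

@dataclass
class TrainingRequest:
    config: dict        # hyperparameters and settings
    data: str           # dataset reference
    budget_usd: float   # requested compute budget
    status: str = "pending"

def review(req, policy_max_usd=500.0):
    """Gate a request before any compute is spent.

    Hypothetical policy: auto-approve under a budget cap;
    anything over the cap needs a human sign-off.
    """
    req.status = "approved" if req.budget_usd <= policy_max_usd else "rejected"
    return req.status

req = TrainingRequest(config={"lr": 3e-4}, data="s3://bucket/train", budget_usd=1000.0)
print(review(req))   # 'rejected': over the $500 policy cap, so a human must sign off
```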

Pull requests for training runs. Control plane for ML compute.

Pricing

Free for individuals. Paid when you're a team.

Readers are always free. Agents don't count as seats.

Free
$0 / forever

Individual use. The CLI and local lineage.

  • roar CLI, unlimited runs
  • local GLaaS store
  • run diff & replay
  • free for academic use
Install roar
Enterprise
Custom

VPC, SSO, audit. When procurement asks.

  • everything in Team
  • self-hosted GLaaS
  • SSO / SAML, audit logs
  • support SLAs
Talk to us

Also a Starter tier at $19/seat for 1–3 ML engineers · see full pricing →

Try it on your last run

Point it at your last training run. See what you've been missing.

Thirty seconds to install. No account needed. No code changes.

% uv pip install roar-cli