Your best model from last month?
You probably can't reproduce it.
And when someone asks what changed… you don't have an answer.
One command. A complete record of what actually happened. Works with any training script — no code changes, no framework.
Most ML teams can't explain their own model.
Not because they're careless — the tools that were supposed to track this are heavier than the work itself. Then someone asks what changed.
Last month's best model. Nobody can reproduce it.
Two engineers changed the config. Neither remembers what.
A checkpoint in S3. No one knows what code produced it.
One command. No instrumentation.
Prefix any training script with roar run. A runtime observer records what actually happened — not what a logger remembered to log. If it ran, roar saw it.
- env & deps
- data & S3 objects
- config & args
- git SHA & diff
- GPU / CUDA state
- model artifacts
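Concretely, that means wrapping the command you already run. The only command the product defines here is roar run; the training script and its flags below are placeholders for whatever you already type:

```shell
# Before: however you launch training today (illustrative script/flags)
python train.py --config configs/base.yaml

# After: same command, prefixed. No code changes, no framework.
roar run python train.py --config configs/base.yaml
```

Everything after roar run is your command, untouched — the observer sits around it, not inside it.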
Capture what ran. Store the record. Control what runs next.
Use any piece on its own. Together they give you a complete record of how your models are actually built — and what gets to run next.
If it ran, roar saw it.
A CLI that observes your training at runtime — data, code, environment, artifacts. No instrumentation. No declarations.
Read about roar →
Every model has a recipe.
A content-addressable registry of every run and artifact. Resolve any hash back to the code, data, and environment that made it.
Read about GLaaS →
Approve before compute.
Training requests as pull requests. Nothing runs until someone — or a policy — says go.
Why TReqs →
Outcomes, not dashboards.
Questions you couldn't answer before, now answered on demand.
complete record
Every run captured automatically — a queryable record of what actually happened, not a convention or a log file.
run it again exactly
Re-run any past run with the same env, data, and config. Not "close enough."
what changed
Diff any two runs: code, data, hyperparams, hardware.
artifact trace
Every checkpoint points back to the exact run that produced it.
cost attribution
Know which experiment cost what. Know who spent what.
approve before compute
Nothing runs until someone — or a policy — says go. Runs are reviewed with context before they start.
Decide what gets to run — and why — before the money moves.
Right now, anyone — or any agent — can burn $1,000 on a run. And you find out after.
TReqs turns that into a request before it runs.
- file a request: config + data + compute budget
- team or policy reviews it before it starts
- roar captures what actually ran
- the request resolves with a real artifact
- Without TReqs: runs happen. You investigate later.
- With TReqs: runs are reviewed with context before they start.
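A request bundles the pieces above — config, data, compute budget — into one reviewable document. The sketch below is illustrative only: every field name is an assumption, not TReqs's actual schema:

```yaml
# Hypothetical training request (all field names are illustrative)
request:
  config: configs/base.yaml        # what to run
  data: s3://team-bucket/dataset   # what to run it on
  budget:
    gpus: 8
    max_cost_usd: 1000             # hard ceiling before anyone approves
  reviewers: [ml-platform]         # who — or what policy — says go
```

Once approved, roar captures what actually ran, and the request resolves with the real artifact it produced.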
Pull requests for training runs. Control plane for ML compute.
Free for individuals. Paid when you're a team.
Readers are always free. Agents don't count as seats.
Individual use. The CLI and local lineage.
- roar CLI, unlimited runs
- local GLaaS store
- run diff & replay
- free for academic use
Shared lineage, unlimited readers, TReqs coordination.
- everything in Free
- hosted & shared GLaaS
- full TReqs coordination
- free reader seats
- API & cost attribution
VPC, SSO, audit. When procurement asks.
- everything in Team
- self-hosted GLaaS
- SSO / SAML, audit logs
- support SLAs
Also a Starter tier at $19/seat for 1–3 ML engineers · see full pricing →
Point it at your last training run. See what you've been missing.
Thirty seconds to install. No account needed. No code changes.