These aren't edge cases.
They're last quarter.
Every scenario below has happened to a team shipping AI fast — before they had lineage, before they had a review layer, sometimes before they knew they needed either.
Best result in weeks. The engineer had local tweaks that never got committed — a dataset filter, a different LR schedule, something. The model is in S3. The context that made it work is gone. Nobody can reproduce it, and nobody can explain why it was better.
roar refuses to run on a dirty repo, failing fast before the compute starts. And when the run is clean, every input is recorded: dataset hash, git SHA, args, environment. Use the model's hash to get the full recipe back.
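A minimal sketch of that guardrail in Python. This is an illustration, not roar's implementation: the function names and manifest fields are hypothetical, and the only assumptions are `git status --porcelain` (which prints one line per modified or untracked file) and SHA-256 content hashing.

```python
import hashlib
import os
import subprocess
import sys

def repo_is_dirty(repo="."):
    # `git status --porcelain` emits one line per changed/untracked file;
    # empty output means the working tree matches the last commit.
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return bool(out.strip())

def content_hash(path, chunk=1 << 20):
    # Hash the file's bytes, not its name, so renames don't change identity.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def build_manifest(git_sha, dataset_paths, args):
    # Everything needed to replay the run, keyed by content.
    # (Hypothetical schema, for illustration only.)
    return {
        "git_sha": git_sha,
        "datasets": {p: content_hash(p) for p in dataset_paths},
        "args": args,
        "python": sys.version.split()[0],
        "env": {k: v for k, v in os.environ.items() if k.startswith("CUDA")},
    }
```

A launcher built on these pieces would call `repo_is_dirty()` first and refuse to start if it returns `True`, then write the manifest alongside the model artifact.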
You need to know what data trained the model you're about to deploy. You know the bucket. You don't know which version of the dataset, whether the preprocessing filter was applied, whether there is test contamination. The filenames have changed three times. The engineer who ran it remembers "roughly what it was."
roar hashes inputs at runtime: not filenames, the content. Run roar inputs on any model artifact and get back every input, every version, every dataset that fed it. Membership queries work too: if you need to know whether a specific file was in training, whether PII reached the model, or whether you have a compliance exposure, you can check.
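A membership query reduces to a set lookup once inputs are content-addressed. A sketch under that assumption (the function names are illustrative, not roar's API): hash the suspect file and ask whether the hash appears in the run's recorded input set.

```python
import hashlib

def sha256_bytes(data):
    return hashlib.sha256(data).hexdigest()

def membership(recorded_hashes, candidate_files):
    # For each candidate, report whether its content hash appears in the
    # run's recorded input set: a yes/no answer per file, no guessing.
    return {
        name: sha256_bytes(data) in recorded_hashes
        for name, data in candidate_files.items()
    }
```

Because the lookup is by content, a renamed or re-uploaded copy of the same file still matches, and a near-duplicate with one changed row does not.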
v1 had a quirk. v2 fixed it and outperformed everything. Now v3 is worse — not by much, but consistently. The runs were three weeks apart, different cluster, possibly a different engineer on one of them. You know something changed. Finding it means reconstructing both runs from memory, Slack messages, and whatever made it into git.
roar diff compares the full DAG of any two runs: data version, preprocessing steps, hyperparams, code, environment. The change between v2 and v3 is a diff, not a reconstruction, and so is whatever made v2 better than v1. You can see exactly what you did right and do it again deliberately.
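The core of such a diff can be sketched in a few lines, assuming each run is recorded as a nested manifest (the manifests and field names below are hypothetical): flatten both into dotted keys, then report every key whose value changed, appeared, or disappeared.

```python
def flatten(d, prefix=""):
    # Flatten a nested run manifest into dotted keys:
    # {"data": {"version": "v7"}} -> {"data.version": "v7"}
    out = {}
    for k, v in d.items():
        key = f"{prefix}{k}"
        if isinstance(v, dict):
            out.update(flatten(v, key + "."))
        else:
            out[key] = v
    return out

def diff_runs(a, b):
    # Every node of the two run records that differs, as (old, new) pairs.
    fa, fb = flatten(a), flatten(b)
    return {
        k: (fa.get(k), fb.get(k))
        for k in sorted(fa.keys() | fb.keys())
        if fa.get(k) != fb.get(k)
    }
```

Three weeks, a different cluster, and a different engineer collapse into one small dictionary of what actually changed.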
She trained the model everything else is built on. She knows the dataset provenance, the preprocessing decisions that aren't in the README, the hyperparams that were tried and quietly abandoned. She's leaving Friday. The handoff doc she wrote is four bullet points.
Every run she launched with roar is queryable — dataset hash, code version, full environment, args, the W&B links roar scraped along the way. The team doesn't inherit a checkpoint. They inherit the complete record of how everything was made, and they can reproduce any of it from scratch.
An agent decided a model needed fine-tuning and launched the run. Or a new engineer, moving fast, kicked off four jobs to try different configs. Nobody approved it. Nobody knew the budget. You find out from the invoice. Two of the runs were near-duplicates. One was on the wrong dataset version.
TReqs turns "launch a run" into "file a training request." Dataset, compute, budget, rationale — filed before anything starts. A teammate or a policy reviews it. Configurable thresholds mean prototypes don't wait, while expensive runs don't slip through. When the job runs, roar captures everything. The training request resolves with a real artifact, a real cost, a real record. You were never investigating — you were approving.
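The threshold logic can be sketched as a small routing function. Everything here is an assumption for illustration (the request fields and the dollar threshold are hypothetical, not TReqs' schema): cheap prototypes clear instantly, anything over the line waits for sign-off.

```python
from dataclasses import dataclass

@dataclass
class TrainingRequest:
    dataset: str        # which dataset version the run will read
    gpu_hours: float    # requested compute
    budget_usd: float   # estimated cost
    rationale: str      # why this run needs to happen

def review_route(req, auto_approve_under_usd=50.0):
    # Below the configurable threshold, prototypes don't wait.
    # Above it, the request needs a teammate or policy to approve
    # before any compute starts.
    if req.budget_usd <= auto_approve_under_usd:
        return "auto-approved"
    return "needs-review"
```

The same shape extends naturally to per-dataset rules (e.g. anything touching production data always needs review), which is the kind of policy the configurable thresholds describe.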
Thirty seconds. No code changes. No account required.
Point roar at your next training run. Everything that matters gets recorded automatically — whether you're on one node or twenty.