Prismatic Labs / Vetch

Stop runaway inference.

Vetch detects stalled agents, RAG bloat, retry storms, and unattributed spend — then warns, kills, reroutes, or throttles wasteful inference before it burns budget, latency, energy, and carbon.

pip install vetch

The problem

Old cloud waste was idle. AI waste is active.

Provider dashboards show total spend by model. They cannot attribute cost by feature, customer, or workflow — and they cannot stop the next occurrence. Vetch can.

Pattern | What it looks like | Status
Stalled agent loop | Agent iterating without meaningful output progress | ✓ Implemented
RAG bloat | Retrieval context overwhelming the prompt with low-signal content | ✓ Implemented
Prompt cache misses | Repeated prompt structures that could be cached but aren’t | ✓ Implemented
Unattributed spend | Inference cost not tied to a feature, customer, or workflow | ⚠ Partial
Retry storm | Burst of repeated failed or near-identical calls | 🔜 Planned
Zombie inference | Active calls past expected session completion | 🔜 Planned
Premium model overuse | Expensive model used for tasks a cheaper one handles | 🔜 Planned
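The stalled-agent pattern above comes down to one question: are successive outputs still making progress? A minimal heuristic sketch (illustrative only, not Vetch's actual detector) flags a loop when the last few outputs are nearly identical:

```python
from difflib import SequenceMatcher

def is_stalled(outputs, window=3, threshold=0.9):
    """Heuristic stall check: True when the last `window` outputs
    are nearly identical (pairwise similarity above `threshold`)."""
    if len(outputs) < window:
        return False
    recent = outputs[-window:]
    return all(SequenceMatcher(None, a, b).ratio() >= threshold
               for a, b in zip(recent, recent[1:]))

# An agent re-emitting the same plan every turn is flagged.
print(is_stalled(["draft plan", "draft plan", "draft plan"]))  # True
```

Real detectors would also weigh tool-call results and token counts, but the shape of the decision is the same: stop paying for iterations that add no signal.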

Get started

One import. Vetch instruments all LLM calls automatically.

Detect and stop stalls

import vetch

vetch.instrument(region="us-east-1", tags={"service": "chat-api"})
vetch.set_stall_action("kill")   # or "warn" or "reroute"

# Your existing agent loop — unchanged.
# Vetch raises StallDetected before the next wasted call.
try:
    response = client.chat.completions.create(...)
except vetch.StallDetected:
    session.clear_stall()        # human-in-the-loop, then resume

Attribute cost, energy, and carbon

with vetch.wrap(tags={"feature": "rag-search", "customer": "acme"}) as ctx:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    )

print(f"Cost:    ${ctx.event['estimated_cost_usd']:.5f}")
print(f"Energy:  {ctx.event['estimated_energy_wh']:.4f} Wh")
print(f"Carbon:  {ctx.event['estimated_carbon_g']:.4f} gCO2e")
print(f"Quality: {ctx.event['signal_quality']}")   # "live" | "estimated"

Vetch does not read prompts or completions — only model, tokens, latency, region, tags, and session context.
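A metadata-only event is easy to picture as a record with no content fields at all. A hypothetical sketch of that shape (field names beyond those documented above are illustrative, not Vetch's schema):

```python
from dataclasses import dataclass, field

@dataclass
class InferenceEvent:
    # Metadata only -- never the prompt or completion text.
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    region: str
    tags: dict = field(default_factory=dict)
    signal_quality: str = "estimated"   # "live" | "estimated"

evt = InferenceEvent("gpt-4o", 812, 240, 1450.0, "us-east-1",
                     {"feature": "rag-search"})
assert "prompt" not in vars(evt)   # no content field exists to leak
```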

Reroute to a cheaper model automatically

vetch.set_stall_action("reroute", fallback_model="gpt-4o-mini")
# On STALL-001, Vetch silently substitutes gpt-4o-mini for the stalled call.
# Your code sees a normal response.

Attribution

Know exactly which feature, customer, and workflow is driving your LLM bill.

Tag every call with any keys you define. Vetch accumulates cost, energy, and carbon per tag combination and per session, turning “our LLM bill went up” into “the RAG search feature for enterprise customers is the culprit.”
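Conceptually, per-tag attribution is aggregation keyed on the full tag combination. A minimal sketch of the idea (not Vetch's internal data model):

```python
from collections import defaultdict

# Accumulate cost per tag combination (sketch only).
totals = defaultdict(float)

def record(tags: dict, cost_usd: float):
    key = tuple(sorted(tags.items()))   # stable key for the tag combination
    totals[key] += cost_usd

record({"feature": "rag-search", "customer": "acme"}, 0.0123)
record({"feature": "rag-search", "customer": "acme"}, 0.0081)
record({"feature": "chat", "customer": "acme"}, 0.0040)

# Rank combinations by spend, largest first.
for combo, cost in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(dict(combo), f"${cost:.4f}")
```

The same accumulator pattern extends to energy and carbon: one running total per metric per tag combination.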

Built-in tag keys

feature · customer · team · workflow · environment

Session scope

Parent/child hierarchy, distributed propagation via W3C-compatible HTTP headers.
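W3C-compatible propagation means the session identity travels in a traceparent-style header. A hypothetical sketch of the mechanics (the header format is the W3C Trace Context shape; Vetch's actual header names are not shown here):

```python
import secrets

def make_traceparent(trace_id=None, span_id=None):
    """Build a W3C traceparent value: version-traceid-spanid-flags."""
    trace_id = trace_id or secrets.token_hex(16)   # 32 hex chars
    span_id = span_id or secrets.token_hex(8)      # 16 hex chars
    return f"00-{trace_id}-{span_id}-01"

def parse_traceparent(header):
    version, trace_id, span_id, flags = header.split("-")
    return {"trace_id": trace_id, "span_id": span_id}

# A downstream service parses the inbound header and reuses the trace id,
# so parent and child sessions share one attribution scope.
parent = make_traceparent()
ctx = parse_traceparent(parent)
child = make_traceparent(trace_id=ctx["trace_id"])
```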

Required tags

vetch.require_tags(["customer"]) — flag untagged calls before they become unattributed spend.

Observability

Works with your existing observability stack.

Datadog · Grafana · any OTLP-compatible sink

Fail-open

If Vetch encounters an error, your LLM call proceeds normally. Observability never blocks inference.

Fail-loud

Every log includes a signal_quality field. Stale or estimated data is clearly flagged.

Privacy-first

Zero access to prompts or completions. Only model, tokens, latency, region, and tags are observed.

Get started

Know exactly where your inference spend is going — in 7 days.

A structured adoption path. By day 7 you’ll have attributed spend by feature, customer, and workflow — and you’ll know which waste patterns are active and what they’re costing you.

Day 1 — Instrument

One import. Every LLM call across all providers tracked automatically — no other code changes.

import vetch
vetch.instrument(
    region="us-east-1",
    tags={"service": "chat-api"}
)

Days 1–7 — Tag and observe

Attribute spend to features, customers, and workflows. Run in warn-only mode — advisories fire without any intervention. See which patterns are active.

Day 7 — Audit and act

Run vetch audit for advisory events and a session token summary. Promote confident advisories to kill or reroute. Full attribution reports (spend by feature, customer, model) are in active development.

vetch audit

Enterprise and compliance

CSRD and SEC Scope 3 reporting

Every inference call produces per-call energy (Wh) and carbon (gCO2e) with documented uncertainty bounds — structured for EU CSRD and US SEC Scope 3 disclosure. Run vetch methodology for full provenance.
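The underlying arithmetic is the standard energy-to-emissions conversion: estimated watt-hours times the regional grid's carbon intensity. A sketch with illustrative coefficients (NOT Vetch's calibrated values; run vetch methodology for the real ones):

```python
# Illustrative coefficients -- assumptions for this sketch, not Vetch's values.
WH_PER_1K_TOKENS = 0.3          # assumed energy per 1,000 output tokens
GRID_G_CO2_PER_KWH = 380.0      # assumed regional grid carbon intensity

def estimate_footprint(output_tokens: int):
    """Energy (Wh) and carbon (gCO2e) for one call, given token count."""
    energy_wh = output_tokens / 1000 * WH_PER_1K_TOKENS
    carbon_g = energy_wh / 1000 * GRID_G_CO2_PER_KWH
    return energy_wh, carbon_g

wh, g = estimate_footprint(2000)
print(f"{wh:.4f} Wh, {g:.4f} gCO2e")   # 0.6000 Wh, 0.2280 gCO2e
```

Uncertainty bounds come from the spread in both coefficients, which is why each figure ships with a signal_quality flag.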

FinOps attribution

Attribute inference spend by feature, customer, team, workflow, and environment. Move from “the LLM bill went up” to “the RAG search feature for enterprise customers is the culprit” — in a week.

Zero prompt capture

Vetch observes metadata only — model, tokens, latency, region, and tags. No prompt or completion content ever leaves your execution environment. Air-gapped operation supported.

Ready to stop runaway inference?

Two lines of code. No prompt access. Fail-open. Start the 7-day audit or talk to us about your production setup.

Prismatic Labs

Open-source AI infrastructure tools.