Prismatic Labs
Cloud & Research Roadmap
Prismatic Labs is building open-source tooling to detect, attribute, and control inference waste in production AI systems. This page describes where we’re headed technically, the infrastructure that would accelerate it, and the principles we apply to compute resources.
Technical direction
Four near-term priorities for Vetch and the Prismatic Labs research programme.
Priority 1
Multi-cloud inference coverage
Vetch currently instruments OpenAI, Anthropic, Vertex AI, and Azure OpenAI. The next step is parity across managed inference APIs, including Amazon Bedrock, with consistent waste detection, attribution, and OTLP export across providers; a sketch of the kind of provider-agnostic call record this implies follows below.
Needs: API access to Bedrock, Vertex AI Gemini endpoints, and Azure OpenAI at scale for cross-provider benchmarking.
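A minimal sketch of what a normalised call record could look like, so that waste detection and OTLP export operate on one shape regardless of backend. The field names and the adapter below are illustrative assumptions, not Vetch's published schema; the OpenAI usage keys (`prompt_tokens`, `completion_tokens`) follow the chat completions API.

```python
# Illustrative only: not Vetch's real schema. One way to normalise per-call
# usage across providers so downstream detection logic is provider-agnostic.
from dataclasses import dataclass


@dataclass
class InferenceCallRecord:
    provider: str            # e.g. "openai", "anthropic", "bedrock"
    model: str               # provider-specific model identifier
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: float          # derived from provider pricing tables
    retry_count: int = 0     # raw material for retry-storm detection


def from_openai_usage(model: str, usage: dict, latency_ms: float,
                      cost_usd: float) -> InferenceCallRecord:
    """Map an OpenAI-style `usage` payload into the common record.

    Each provider gets its own small adapter like this one; only the
    adapters know provider-specific key names.
    """
    return InferenceCallRecord(
        provider="openai",
        model=model,
        input_tokens=usage["prompt_tokens"],
        output_tokens=usage["completion_tokens"],
        latency_ms=latency_ms,
        cost_usd=cost_usd,
    )
```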
Priority 2
GPU energy model calibration
Current per-call energy estimates rely on token-count heuristics and provider power profiles. To improve accuracy, particularly for local models running on Ollama, vLLM, and llama.cpp, Vetch needs calibration against real GPU power measurements under controlled inference workloads; a minimal measurement sketch follows below.
Needs: GPU compute (NVIDIA A100/H100 class) for controlled inference benchmarking runs.
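A minimal power-sampling sketch using NVML via the pynvml bindings (the nvidia-ml-py package). It integrates instantaneous board power over a fixed-rate sampling loop to estimate energy per run; real calibration would also control for idle draw, warm-up, and sampling error. The function and parameter names are illustrative.

```python
# Sample GPU board power while a workload runs; return an energy estimate.
import threading
import time

import pynvml


def measure_energy_joules(run_inference, device_index: int = 0,
                          interval_s: float = 0.1) -> float:
    """Sample board power while `run_inference` executes; return joules."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        samples: list[float] = []
        stop = threading.Event()

        def sampler() -> None:
            while not stop.is_set():
                # nvmlDeviceGetPowerUsage reports milliwatts
                samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
                time.sleep(interval_s)

        thread = threading.Thread(target=sampler)
        thread.start()
        start = time.monotonic()
        run_inference()  # the controlled inference workload under test
        elapsed = time.monotonic() - start
        stop.set()
        thread.join()
        # Mean power (W) * wall-clock duration (s) = energy (J)
        mean_watts = sum(samples) / max(len(samples), 1)
        return mean_watts * elapsed
    finally:
        pynvml.nvmlShutdown()
```

Comparing estimates like this against the token-count heuristic across model sizes and batch shapes is what turns the heuristic into a calibrated model.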
Priority 3
Public benchmarks and hosted dashboards
Inference waste benchmark results, regional carbon comparisons, and the Vetch inference calculator should be publicly accessible with reproducible methodology. This means hosting lightweight static dashboards and benchmark artefacts at stable URLs, with documented data provenance (an example manifest is sketched below).
Needs: Storage and hosting for static assets, benchmark data, and public demo endpoints.
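A hypothetical provenance manifest: the field names, model identifiers, and URL below are placeholders, not a published Vetch format. The intent is simply that every hosted artefact ships with enough metadata to be reproduced independently.

```python
# Write an illustrative provenance manifest alongside a benchmark artefact.
import json

manifest = {
    "benchmark": "inference-waste-baseline",        # illustrative name
    "run_date": "2025-01-15",
    "models": ["gpt-4o-mini", "claude-3-5-haiku"],  # versions pinned per run
    "token_budget": 500_000,                        # budget ceiling for the run
    "grid_region": "eu-west-1",                     # for regional carbon figures
    "methodology_url": "https://example.org/methodology",  # placeholder URL
    "license": "Apache-2.0",
}

with open("manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```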
Priority 4
Observability pipeline automation
Vetch emits OTLP-compatible events today. The roadmap includes pre-built Grafana dashboard templates, CI/CD benchmark automation for regression testing of waste detection accuracy, and structured compliance exports aligned with EU AI Act and CSRD reporting requirements; a minimal OTLP emission sketch follows below.
Needs: Grafana Cloud or self-hosted Grafana, OTLP pipeline infrastructure, CI compute for benchmark runs.
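A minimal sketch of OTLP emission using the OpenTelemetry Python SDK (opentelemetry-sdk plus the gRPC OTLP exporter). The `gen_ai.*` attribute names follow OpenTelemetry's generative-AI semantic conventions as a stand-in, and the `vetch.*` attributes are hypothetical; Vetch's actual event schema is not reproduced here.

```python
# Emit one LLM-call span to a local OTLP collector (e.g. for Grafana).
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("vetch-sketch")

with tracer.start_as_current_span("llm.call") as span:
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.usage.input_tokens", 812)
    span.set_attribute("gen_ai.usage.output_tokens", 164)
    # Hypothetical waste attributes for dashboarding; not a published schema.
    span.set_attribute("vetch.cost_usd", 0.0021)
    span.set_attribute("vetch.energy_wh", 0.4)
```

Spans in this shape are what pre-built Grafana templates and CI regression checks would query against.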
Responsible compute
We use cloud resources to measure and reduce waste — not create more of it.
Inference efficiency tooling should be held to the same standard it promotes. These are our operating principles for cloud and compute resources.
Budget-bounded tests
All benchmark runs are scoped to defined cost and token budgets. No open-ended experiments. Every run has a budget ceiling and a clear stopping condition.
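An illustrative budget guard, not Vetch code: the class and exception names are assumptions, but the principle is the one above, where crossing either ceiling is a hard stop.

```python
# Wrap every benchmark call in a budget ceiling with a hard stop.
class BudgetExceeded(Exception):
    pass


class RunBudget:
    def __init__(self, max_cost_usd: float, max_tokens: int):
        self.max_cost_usd = max_cost_usd
        self.max_tokens = max_tokens
        self.cost_usd = 0.0
        self.tokens = 0

    def charge(self, cost_usd: float, tokens: int) -> None:
        """Record one call's spend; raise once either ceiling is crossed."""
        self.cost_usd += cost_usd
        self.tokens += tokens
        if self.cost_usd > self.max_cost_usd or self.tokens > self.max_tokens:
            raise BudgetExceeded(
                f"stopping: ${self.cost_usd:.2f} / {self.tokens} tokens used"
            )
```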
Open-source outputs
All benchmark methodology, scripts, and results are published under Apache 2.0. The point is public evidence of inference efficiency, not proprietary advantage.
Reproducible artefacts
Results are published with the environment, model versions, and data sources needed to reproduce them. Uncertainty is documented, not hidden.
No unnecessary training runs
Vetch is an inference observability tool. The roadmap does not include model training. Compute is used for calibration, benchmarking, and tooling, not training runs.
Measurement-first
Compute decisions follow measurement. We benchmark before scaling, track energy and cost per run, and publish what we find — including the cases where estimates were wrong.
Developer tooling focus
The output of cloud compute is tooling that helps other AI teams use cloud resources more efficiently. Infrastructure investment here has a multiplier effect on the wider ecosystem.
Ecosystem value
Why inference efficiency tooling matters for cloud AI infrastructure.
Better AI resource utilisation
Stalled agent loops, RAG bloat, and retry storms waste compute on every iteration. Tools that detect and stop these patterns reduce unnecessary load on cloud inference infrastructure — freeing capacity and lowering the cost per useful output.
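An illustrative detection heuristic, not Vetch's actual algorithm: one signal that retry storms and stuck loops share is the same (tool, arguments) signature repeating within a short window. The function names and thresholds below are assumptions for the sketch.

```python
# Flag an agent loop as stalled when one call signature dominates a window.
from collections import deque


def make_stall_detector(window: int = 6, max_repeats: int = 3):
    """Return a predicate that flags repeated identical calls in a window."""
    recent: deque[tuple[str, str]] = deque(maxlen=window)

    def is_stalled(tool: str, args: str) -> bool:
        recent.append((tool, args))
        # True once the same signature has recurred max_repeats times recently.
        return recent.count((tool, args)) >= max_repeats

    return is_stalled
```

A guard like this sits inside the agent loop and aborts early, rather than letting the model retry the same failing call indefinitely.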
Observability around AI workloads
AI teams lack per-call visibility into cost, energy, and carbon. Vetch fills a gap that provider dashboards leave open: attributing spend and resource use to the features, customers, and workflows that drive it. This makes AI workloads legible in the same way APM tools made web services legible.
Public education and open tooling
Public benchmark dashboards, reproducible methodology, and open-source tooling create educational artefacts that raise the quality of inference efficiency work across the ecosystem — not just at Prismatic Labs.
Get in touch
For cloud and infrastructure partnership enquiries, startup programme reviews, or research collaborations, contact Prismatic Labs directly.
