
Observability for Any LLM Provider

Anosys is vendor-agnostic. While we offer native integrations for OpenAI and Anthropic, you can connect any LLM provider to the Anosys Platform using OpenTelemetry (OTLP) or our REST API. If your model can be called from code, it can be observed with Anosys.


Supported LLM Providers

Anosys works with every major LLM vendor and model hosting platform:

| Provider | Models | Integration Method |
| --- | --- | --- |
| Google Gemini | Gemini 2.5 Pro, Gemini 2.5 Flash | OTLP, REST API |
| Meta Llama | Llama 4, Llama 3.3, Code Llama | OTLP, REST API |
| Mistral AI | Mistral Large, Mistral Medium, Codestral | OTLP, REST API |
| Cohere | Command R+, Embed, Rerank | OTLP, REST API |
| AWS Bedrock | Claude, Llama, Titan, Mistral via Bedrock | OTLP, REST API |
| Azure OpenAI | GPT-4o, GPT-4.1, GPT-5 via Azure | OTLP, REST API |
| Google Vertex AI | Gemini, PaLM, custom models | OTLP, REST API |
| Hugging Face | Inference API, Inference Endpoints | OTLP, REST API |
| Fireworks AI | Llama, Mixtral, custom fine-tunes | OTLP, REST API |
| Together AI | Open-source models at scale | OTLP, REST API |
| Groq | Llama, Mixtral on LPU hardware | OTLP, REST API |
| Replicate | Open-source models on demand | OTLP, REST API |
| Ollama | Local models (Llama, Mistral, Phi) | OTLP, REST API |
| vLLM / TGI | Self-hosted inference servers | OTLP, REST API |
| Custom / Private | Fine-tuned models, proprietary endpoints | OTLP, REST API |

Don't see your provider listed? It doesn't matter — if you can call it from code, you can observe it with Anosys.


How to Integrate Any LLM

There are three approaches to adding observability for any model provider:

Option 1 — OpenTelemetry (OTLP)

If your application is already instrumented with OpenTelemetry, or you're using a framework that supports it (LangChain, LlamaIndex, Haystack, CrewAI, AutoGen, etc.), point your OTLP exporter at your Anosys endpoint and data flows automatically.

Configure your environment:

export OTEL_SERVICE_NAME="my-llm-app"
export OTEL_TRACES_EXPORTER="otlp"
export OTEL_METRICS_EXPORTER="otlp"
export OTEL_LOGS_EXPORTER="otlp"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_ENDPOINT="YOUR_ANOSYS_OTLP_ENDPOINT"

Replace YOUR_ANOSYS_OTLP_ENDPOINT with the OTLP endpoint URL from your Agentic AI pixel in the Anosys Console.

This works with any OTEL-compatible library, including:

  • Python: opentelemetry-sdk, opentelemetry-instrumentation-*
  • JavaScript/TypeScript: @opentelemetry/sdk-node
  • Go: go.opentelemetry.io/otel
  • Java: io.opentelemetry

For a full OTLP/HTTP setup example in Python, see the OpenTelemetry integration guide.

Option 2 — REST API

Wrap your LLM calls with a simple HTTP POST to the Anosys ingestion endpoint. This works from any language without any SDK dependency.

Python example — instrumenting a Google Gemini call:

import os
import time
import requests
import google.generativeai as genai

# Configure Gemini
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-2.5-flash")

# Anosys ingestion endpoint
ANOSYS_URL = "https://api.anosys.ai/ingestion/YOUR_UNIQUE_PATH"

# Make the LLM call and measure it
prompt = "Explain quantum computing in simple terms"
start_ts = time.time()

response = model.generate_content(prompt)

duration_ms = (time.time() - start_ts) * 1000

# Ship to Anosys
requests.post(ANOSYS_URL, json={
    "event_type": "llm_call",
    "s1": prompt,                        # prompt text
    "s2": response.text,                 # response text
    "s3": "gemini-2.5-flash",            # model name
    "s4": "google",                      # provider
    "n1": duration_ms,                   # latency
    "n2": response.usage_metadata.prompt_token_count,    # input tokens
    "n3": response.usage_metadata.candidates_token_count # output tokens
})

cURL example:

curl -X POST "https://api.anosys.ai/ingestion/YOUR_UNIQUE_PATH" \
  -H "Content-Type: application/json" \
  -d '{
    "event_type": "llm_call",
    "s1": "Explain quantum computing",
    "s2": "Quantum computing uses qubits...",
    "s3": "llama-4-maverick",
    "s4": "meta",
    "n1": 342.5,
    "n2": 12,
    "n3": 156
  }'

Option 3 — Python Decorator

For the lightest instrumentation, use the Anosys decorator to auto-capture any function that wraps an LLM call:

import os
from anosys_logger import anosys_logger
from mistralai import Mistral

mistral_client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

@anosys_logger(source="mistral-chat")
def ask_mistral(prompt):
    # The decorator records this call automatically
    response = mistral_client.chat.complete(
        model="mistral-large-latest",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

answer = ask_mistral("What are the benefits of RAG?")

The decorator automatically captures the function name, arguments, return value, and execution time.
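To make that behavior concrete, here is a minimal sketch of what such a decorator might do. It is a hypothetical stand-in, not the real anosys_logger implementation, and the ingestion URL is a placeholder.

```python
import functools
import time
import requests

ANOSYS_URL = "https://api.anosys.ai/ingestion/YOUR_UNIQUE_PATH"  # placeholder

def simple_llm_logger(source):
    """Illustrative decorator: captures function name, arguments,
    return value, and execution time, then ships an llm_call event."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = func(*args, **kwargs)
            duration_ms = (time.time() - start) * 1000
            requests.post(ANOSYS_URL, json={
                "event_type": "llm_call",
                "s1": repr(args) + repr(kwargs),  # function arguments
                "s2": str(result),                # return value
                "s3": func.__name__,              # function name
                "s4": source,                     # source label
                "n1": duration_ms,                # execution time
            })
            return result
        return wrapper
    return decorator
```

Because logging happens in the wrapper, the decorated function's own code stays untouched, which is what makes this the lightest-weight option.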


What to Capture

Regardless of the integration method, we recommend sending these fields for maximum observability:

| Field | Type | Description |
| --- | --- | --- |
| event_type | String | Event category (e.g. llm_call, agent_step, embedding) |
| s1 / prompt | String | The user's input prompt |
| s2 / response | String | The model's response text |
| s3 / model | String | Model name and version |
| s4 / provider | String | Vendor name (e.g. google, meta, mistral) |
| n1 / duration | Number | End-to-end latency in milliseconds |
| n2 / input_tokens | Number | Input token count |
| n3 / output_tokens | Number | Output token count |
| n4 / cost | Number | Estimated cost per request (optional) |
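As a sketch, these fields can be assembled in one helper so every call site sends a consistent payload. The function below is hypothetical (not part of an Anosys SDK), and the per-1K-token rates are illustrative inputs, not real pricing.

```python
def build_llm_event(prompt, response_text, model, provider,
                    duration_ms, input_tokens, output_tokens,
                    cost_per_1k_in=None, cost_per_1k_out=None):
    """Assemble the recommended observability fields for one LLM call."""
    event = {
        "event_type": "llm_call",
        "s1": prompt,
        "s2": response_text,
        "s3": model,
        "s4": provider,
        "n1": duration_ms,
        "n2": input_tokens,
        "n3": output_tokens,
    }
    # n4 / cost is optional: include it only when pricing is known
    if cost_per_1k_in is not None and cost_per_1k_out is not None:
        event["n4"] = (input_tokens / 1000) * cost_per_1k_in \
                    + (output_tokens / 1000) * cost_per_1k_out
    return event
```

The resulting dictionary can be passed directly as the json argument of the requests.post call shown in the REST API example above.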

Agentic Framework Support

Many popular agentic frameworks already emit OpenTelemetry traces. Configure their OTLP exporter to point at Anosys and you get observability for free:

| Framework | Language | OTEL Support |
| --- | --- | --- |
| LangChain / LangGraph | Python, JS | Built-in via callbacks |
| LlamaIndex | Python | Built-in instrumentation |
| CrewAI | Python | OTEL-compatible |
| AutoGen | Python | OTEL-compatible |
| Haystack | Python | OTEL-compatible |
| Semantic Kernel | C#, Python | OTEL-compatible |
| Vercel AI SDK | TypeScript | OTEL-compatible |
| Mastra | TypeScript | OTEL-compatible |

For frameworks without built-in OTEL support, use the REST API or Python decorator approach.


What You'll See in Anosys

Once data is flowing from any LLM provider, the Anosys Platform surfaces:

  • Request traces — every LLM call with prompt, response, model, and timing metadata
  • Cross-model comparison — side-by-side latency, cost, and quality metrics across providers and models
  • Token usage trends — track consumption over time by model, provider, or project
  • Latency analysis — identify slow calls, p50/p95/p99 breakdowns, and time-to-first-token
  • Error tracking — rate limits, timeouts, and API errors with automatic classification
  • Cost dashboards — per-request and aggregate cost estimates based on token usage and model pricing
  • Anomaly detection — ML-powered baselines that alert on latency spikes, cost overruns, or quality degradation without manual threshold configuration
  • Root cause analysis — causal graphs that connect failures to upstream triggers across providers, models, and agent steps
  • Alerts — context-aware notifications via Slack, email, PagerDuty, or webhooks for errors, cost overruns, and performance regressions
  • Automated metric generation — Anosys automatically generates key metrics from your traces and logs so you get dashboards in minutes, not days
  • Custom dashboards — build your own views or start with auto-generated dashboards for model health, provider comparison, and cost attribution
  • Custom pipelines — enrich, route, and transform your LLM telemetry with automated remediation workflows
  • Labeling — tag and annotate calls by provider, model, project, or team for segmentation and drill-down analysis
  • Natural language interface — ask questions about your LLM data in plain English and get answers backed by your telemetry

Next Steps