
Observability for Any LLM Provider

Anosys is vendor-agnostic. While we offer native integrations for OpenAI and Anthropic, you can connect any LLM provider to the Anosys Platform using OpenTelemetry (OTLP) or our REST API. If your model can be called from code, it can be observed with Anosys.


Supported LLM Providers

Anosys works with every major LLM vendor and model hosting platform:

| Provider | Models | Integration Method |
| --- | --- | --- |
| Google Gemini | Gemini 2.5 Pro, Gemini 2.5 Flash | OTLP, REST API |
| Meta Llama | Llama 4, Llama 3.3, Code Llama | OTLP, REST API |
| Mistral AI | Mistral Large, Mistral Medium, Codestral | OTLP, REST API |
| Cohere | Command R+, Embed, Rerank | OTLP, REST API |
| AWS Bedrock | Claude, Llama, Titan, Mistral via Bedrock | OTLP, REST API |
| Azure OpenAI | GPT-4o, GPT-4.1, GPT-5 via Azure | OTLP, REST API |
| Google Vertex AI | Gemini, PaLM, custom models | OTLP, REST API |
| Hugging Face | Inference API, Inference Endpoints | OTLP, REST API |
| Fireworks AI | Llama, Mixtral, custom fine-tunes | OTLP, REST API |
| Together AI | Open-source models at scale | OTLP, REST API |
| Groq | Llama, Mixtral on LPU hardware | OTLP, REST API |
| Replicate | Open-source models on demand | OTLP, REST API |
| Ollama | Local models (Llama, Mistral, Phi) | OTLP, REST API |
| vLLM / TGI | Self-hosted inference servers | OTLP, REST API |
| Custom / Private | Fine-tuned models, proprietary endpoints | OTLP, REST API |

Don't see your provider listed? It doesn't matter — if you can call it from code, you can observe it with Anosys.


How to Integrate Any LLM

There are three approaches to adding observability for any model provider:

Option 1 — OpenTelemetry (OTLP)

If your application is already instrumented with OpenTelemetry, or you're using a framework that supports it (LangChain, LlamaIndex, Haystack, CrewAI, AutoGen, etc.), point your OTLP exporter at your Anosys endpoint and data flows automatically.

Configure your environment:

export OTEL_SERVICE_NAME="my-llm-app"
export OTEL_TRACES_EXPORTER="otlp"
export OTEL_METRICS_EXPORTER="otlp"
export OTEL_LOGS_EXPORTER="otlp"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_ENDPOINT="YOUR_ANOSYS_OTLP_ENDPOINT"

Replace YOUR_ANOSYS_OTLP_ENDPOINT with the OTLP endpoint URL from your Agentic AI pixel in the Anosys Console.

This works with any OTEL-compatible library, including:

  • Python: opentelemetry-sdk, opentelemetry-instrumentation-*
  • JavaScript/TypeScript: @opentelemetry/sdk-node
  • Go: go.opentelemetry.io/otel
  • Java: io.opentelemetry

For a full OTLP/HTTP setup example in Python, see the OpenTelemetry integration guide.

Option 2 — REST API

Wrap your LLM calls with a simple HTTP POST to the Anosys ingestion endpoint. This works from any language without any SDK dependency.

Python example — instrumenting a Google Gemini call:

import os
import time
import requests
import google.generativeai as genai

# Configure Gemini
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-2.5-flash")

# Anosys ingestion endpoint
ANOSYS_URL = "https://api.anosys.ai/ingestion/YOUR_UNIQUE_PATH"

# Make the LLM call and measure it
prompt = "Explain quantum computing in simple terms"
start_ts = time.time()

response = model.generate_content(prompt)

duration_ms = (time.time() - start_ts) * 1000

# Ship to Anosys
requests.post(ANOSYS_URL, json={
    "event_type": "llm_call",
    "s1": prompt,                        # prompt text
    "s2": response.text,                 # response text
    "s3": "gemini-2.5-flash",            # model name
    "s4": "google",                      # provider
    "n1": duration_ms,                   # latency
    "n2": response.usage_metadata.prompt_token_count,    # input tokens
    "n3": response.usage_metadata.candidates_token_count # output tokens
})

cURL example:

curl -X POST "https://api.anosys.ai/ingestion/YOUR_UNIQUE_PATH" \
  -H "Content-Type: application/json" \
  -d '{
    "event_type": "llm_call",
    "s1": "Explain quantum computing",
    "s2": "Quantum computing uses qubits...",
    "s3": "llama-4-maverick",
    "s4": "meta",
    "n1": 342.5,
    "n2": 12,
    "n3": 156
  }'

Option 3 — Python Decorator

For the lightest instrumentation, use the Anosys decorator to auto-capture any function that wraps an LLM call:

import os
from anosys_logger import anosys_logger
from mistralai import Mistral

mistral_client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

@anosys_logger(source="mistral-chat")
def ask_mistral(prompt):
    # The decorator records this call automatically
    response = mistral_client.chat.complete(
        model="mistral-large-latest",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

answer = ask_mistral("What are the benefits of RAG?")

The decorator automatically captures the function name, arguments, return value, and execution time.
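To make that behavior concrete, here is a minimal sketch of what such a decorator might do. It is a hypothetical stand-in, not the real anosys_logger implementation, and the ingestion URL is a placeholder.

```python
import functools
import time
import requests

ANOSYS_URL = "https://api.anosys.ai/ingestion/YOUR_UNIQUE_PATH"  # placeholder

def simple_llm_logger(source):
    """Illustrative decorator: captures function name, arguments,
    return value, and execution time, then ships an llm_call event."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = func(*args, **kwargs)
            duration_ms = (time.time() - start) * 1000
            requests.post(ANOSYS_URL, json={
                "event_type": "llm_call",
                "s1": repr(args) + repr(kwargs),  # function arguments
                "s2": str(result),                # return value
                "s3": func.__name__,              # function name
                "s4": source,                     # source label
                "n1": duration_ms,                # execution time
            })
            return result
        return wrapper
    return decorator
```

Because logging happens in the wrapper, the decorated function's own code stays untouched, which is what makes this the lightest-weight option.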


What to Capture

Regardless of the integration method, we recommend sending these fields for maximum observability:

| Field | Type | Description |
| --- | --- | --- |
| event_type | String | Event category (e.g. llm_call, agent_step, embedding) |
| s1 / prompt | String | The user's input prompt |
| s2 / response | String | The model's response text |
| s3 / model | String | Model name and version |
| s4 / provider | String | Vendor name (e.g. google, meta, mistral) |
| n1 / duration | Number | End-to-end latency in milliseconds |
| n2 / input_tokens | Number | Input token count |
| n3 / output_tokens | Number | Output token count |
| n4 / cost | Number | Estimated cost per request (optional) |
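As a sketch, these fields can be assembled in one helper so every call site sends a consistent payload. The function below is hypothetical (not part of an Anosys SDK), and the per-1K-token rates are illustrative inputs, not real pricing.

```python
def build_llm_event(prompt, response_text, model, provider,
                    duration_ms, input_tokens, output_tokens,
                    cost_per_1k_in=None, cost_per_1k_out=None):
    """Assemble the recommended observability fields for one LLM call."""
    event = {
        "event_type": "llm_call",
        "s1": prompt,
        "s2": response_text,
        "s3": model,
        "s4": provider,
        "n1": duration_ms,
        "n2": input_tokens,
        "n3": output_tokens,
    }
    # n4 / cost is optional: include it only when pricing is known
    if cost_per_1k_in is not None and cost_per_1k_out is not None:
        event["n4"] = (input_tokens / 1000) * cost_per_1k_in \
                    + (output_tokens / 1000) * cost_per_1k_out
    return event
```

The resulting dictionary can be passed directly as the json argument of the requests.post call shown in the REST API example above.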

Agentic Framework Support

Many popular agentic frameworks already emit OpenTelemetry traces. Configure their OTLP exporter to point at Anosys and you get observability for free:

| Framework | Language | OTEL Support |
| --- | --- | --- |
| LangChain / LangGraph | Python, JS | Built-in via callbacks |
| LlamaIndex | Python | Built-in instrumentation |
| CrewAI | Python | OTEL-compatible |
| AutoGen | Python | OTEL-compatible |
| Haystack | Python | OTEL-compatible |
| Semantic Kernel | C#, Python | OTEL-compatible |
| Vercel AI SDK | TypeScript | OTEL-compatible |
| Mastra | TypeScript | OTEL-compatible |

For frameworks without built-in OTEL support, use the REST API or Python decorator approach.


What You'll See in Anosys

Once data is flowing from any LLM provider, the Anosys Platform surfaces:

  • Request traces — every LLM call with prompt, response, model, and timing metadata
  • Cross-model comparison — side-by-side latency, cost, and quality metrics across providers and models
  • Token usage trends — track consumption over time by model, provider, or project
  • Latency analysis — identify slow calls, p50/p95/p99 breakdowns, and time-to-first-token
  • Error tracking — rate limits, timeouts, and API errors with automatic classification
  • Cost dashboards — per-request and aggregate cost estimates based on token usage and model pricing
  • Anomaly detection — ML-powered baselines that alert on latency spikes, cost overruns, or quality degradation without manual threshold configuration
  • Root cause analysis — causal graphs that connect failures to upstream triggers across providers, models, and agent steps
  • Alerts — context-aware notifications via Slack, email, PagerDuty, or webhooks for errors, cost overruns, and performance regressions
  • Automated metric generation — Anosys automatically generates key metrics from your traces and logs so you get dashboards in minutes, not days
  • Custom dashboards — build your own views or start with auto-generated dashboards for model health, provider comparison, and cost attribution
  • Custom pipelines — enrich, route, and transform your LLM telemetry with automated remediation workflows
  • Labeling — tag and annotate calls by provider, model, project, or team for segmentation and drill-down analysis
  • Natural language interface — ask questions about your LLM data in plain English and get answers backed by your telemetry

Next Steps