Research Scientist
Data & AI · Full-time · Los Angeles · Remote
About AnoSys

AnoSys is an AI observability company building the monitoring and analytics platform for the agentic AI era. Our platform gives engineering teams end-to-end visibility into AI agents, LLM pipelines, and ML-powered applications — from user interaction through model inference to downstream business outcomes. We help organizations detect silent regressions, trace non-deterministic behavior, and turn massive telemetry streams into actionable insights, all in real time. Backed by leading investors and built by a team of infrastructure and AI veterans, AnoSys is defining the category of AI-native observability.

Who We're Looking For

We're a small, high-impact team, and every person here shapes the product, the culture, and the trajectory of the company. We look for intellectually curious individuals who combine critical thinking with meticulous attention to detail — people who can identify problems early, reason through ambiguity, and solve challenges independently.

If you thrive in fast-paced, high-ownership environments — where your work directly shapes a category-defining product — we'd love to hear from you.

About the Role

AI observability is a fundamentally new discipline. Unlike traditional monitoring — where expected behavior can be defined by static thresholds and deterministic rules — observing AI systems requires reasoning about stochastic outputs, emergent behaviors, multi-step agent trajectories, and latent quality regressions that only surface under specific input distributions. The research challenges here are deep and largely unsolved.

As a Research Scientist at AnoSys, you will work at the frontier of this problem space. You will design, prototype, and productionize novel algorithms for anomaly detection in non-deterministic systems, causal inference across multi-agent workflows, automated evaluation of LLM quality and safety, and intelligent root-cause analysis that operates across heterogeneous telemetry signals.

This is not a pure research role. You will be expected to take ideas from concept through experimentation to production deployment — building systems that run against real-world telemetry streams at scale. You will collaborate closely with backend engineers to integrate your models into the platform and with product designers to surface insights in ways that are immediately actionable for customers.

What You'll Do
  • Research and develop novel approaches to anomaly detection, distribution drift monitoring, and root-cause analysis specifically designed for non-deterministic AI systems and agentic workflows
  • Design and implement evaluation frameworks for LLM quality, safety, factual accuracy, and performance consistency across diverse production workloads and prompt distributions
  • Build causal inference and attribution models that connect upstream agent behavior (tool calls, reasoning chains, retrieval steps) to downstream business outcomes and user experience metrics
  • Develop statistical methods for detecting silent model degradation — performance regressions that do not trigger hard failures but subtly erode output quality over time
  • Analyze large-scale, heterogeneous telemetry datasets (traces, logs, metrics, embeddings) to uncover patterns, validate hypotheses, and inform both product direction and customer insights
  • Design and run rigorous experiments with proper statistical methodology — including A/B tests, offline evaluations, and backtesting frameworks for detection algorithms
  • Publish findings and contribute to the broader AI observability and ML monitoring research community through papers, blog posts, and open-source contributions
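To give a flavor of the problem space above: one of the duties is detecting silent degradation, where no request fails but the distribution of output quality drifts. The sketch below is purely illustrative (not AnoSys's actual method); all names are hypothetical, and it assumes per-request quality scores are already available. It flags a recent window whose score distribution has moved away from a baseline, using a two-sample Kolmogorov–Smirnov statistic.

```python
# Illustrative sketch only: flag silent quality regressions by comparing
# the distribution of recent per-request quality scores to a baseline.
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of two samples."""
    sa, sb = sorted(a), sorted(b)
    na, nb = len(sa), len(sb)
    # Evaluate both empirical CDFs at every observed value.
    return max(
        abs(bisect_right(sa, x) / na - bisect_right(sb, x) / nb)
        for x in sorted(set(sa) | set(sb))
    )


def drifted(baseline: Sequence[float], window: Sequence[float],
            threshold: float = 0.3) -> bool:
    """True if the window's score distribution moved away from baseline.

    `threshold` is a placeholder; in practice it would be calibrated,
    e.g. via a permutation test against a false-positive budget.
    """
    return ks_statistic(baseline, window) > threshold
```

Note that a mean-based alert would miss many such regressions (e.g. a fattening lower tail with a stable mean), which is why distribution-level tests like this are a natural starting point for the role's research questions.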

What We're Looking For
  • MS or PhD in Computer Science, Machine Learning, Statistics, Applied Mathematics, or a related quantitative field
  • Strong publication record or demonstrated research output in one or more of: anomaly detection, time-series analysis, causal inference, LLM evaluation, statistical testing, or Bayesian methods
  • Proficiency in Python and modern ML frameworks (PyTorch, JAX, or TensorFlow) with demonstrated experience taking models from research prototypes to production-grade systems
  • Deep familiarity with statistical methods, experiment design, hypothesis testing, and rigorous evaluation methodology — you know when a result is meaningful and when it isn't
  • Experience with large-scale data processing frameworks (Spark, BigQuery, Pandas at scale) and comfort operating in cloud-native environments
  • Intellectual curiosity paired with strong independent judgment — you can frame ambiguous problems, design research plans, and execute against them with minimal supervision
  • Excellent written communication skills — you can distill complex technical work into clear, accessible narratives for both technical and non-technical audiences

Nice to Have
  • Experience with observability, monitoring, or AIOps — particularly in production ML systems or LLM-powered applications
  • Familiarity with OpenTelemetry, distributed tracing, or telemetry data modeling
  • Background in information retrieval, NLP, or embedding-based search — particularly in the context of RAG systems or semantic evaluation
  • Experience with Bayesian optimization, active learning, or online learning methods

Apply for This Role

Prefer to attach your resume? Email us at hiring@anosys.ai.

Can't find the role you are looking for?

Follow us on social media for new job opportunities, company news, and the latest AI observability insights.