Senior Engineer, AI Evaluation & Reliability (Agentic AI)
Company: Anomali
Location: Redwood City
Posted on: February 15, 2026
|
|
|
Job Description:
Job Description Job Description Company: Anomali is
headquartered in Silicon Valley and is the Leading AI-Powered
Security Operations Platform that is modernizing security
operations. At the center of it is an omnipresent, intelligent, and
multilingual Anomali Copilot that automates important tasks and
empowers your team to deliver the requisite risk insights to
management and the board in seconds. The Anomali Copilot navigates
a proprietary cloud-native security data lake that consolidates
legacy attempts at visibility and provides first-in-market speed,
scale, and performance while reducing the cost of security
analytics. Anomali combines ETL, SIEM, XDR, SOAR, and the largest
repository of global intelligence in one efficient platform.
Protect and drive your business with better productivity and talent
retention. Do more with less. Be Different. Be the Anomali. Learn
more at http://www.anomali.com. Job Description We're looking for a
Senior Engineer, AI Evaluation & Reliability to lead the design and
execution of evaluation, quality assurance, and release gating for
our agentic AI features. You'll develop the pipelines, datasets,
and dashboards that measure and improve agent performance across
real-world SOC workflows ensuring every release is safe, reliable,
efficient, and production-ready. You will guarantee that our
agentic AI features operate at full production scale, ingesting and
active on millions of SOC alerts per day, with measurable impact on
analyst productivity and risk mitigation. This role partners
closely with the Product team to deliver operational excellence and
trust in every AI-drive capability. Key Responsibilities : o Define
quality metrics : Translate SOC use cases into measurable KPI's
(e.g., precision/recall, MTTR, false-positive rate, step success,
latency/cost budgets). o Build continuous evaluations: Develop
offine/online evaluation pipelines, regression suites, and A/B or
canary test; integrate them into CI/CD for release gating. o Curate
and manage datasets: Maintain gold-standard datasets and red-team
scenarios; establish data governance and drift monitoring
practices. o Ensure safety, reliability, and explainability:
Partner with Platform and Security Research to encode guardrails,
policy enforcement, and runtime safety checks. o Expand adversarial
test coverage (prompt injection, data exfiltration, abuse
scenarios). o Ensure explainability and auditability of agent
decisions, maintaining traceability and compliance of AI-driven
workflows. o Production reliability & observability: Monitor and
maintain reliability of agentic AI features post-release define and
uphold SLIs/SLOs, establish alerting and rollback strategies, and
conduct incident post-mortems. o Design and implement
infrastructure to scale evaluation and production pipelines for
real-time SOC workflows across cloud environments. o Drive agentic
system engineering: Experiment with multi-agent systems, tool-using
language models, retrieval-augmented workflows, and prompt
orchestration. o Manage model and prompt lifecycle track version,
rollout strategies, and fallbacks; measure impact through
statistically sound experiments. o Collaborate cross-functionally:
Work with Product, UX and Engineering to prioritize high-leverage
improvements, resolve regressions quickly, and advance overall
system reliability. Qualifications Required Skills and Experience o
5 years building evaluation or testing infrastructure for ML/LLM
systems or large-scale distributes systems. o Proven ability to
translate product requirements into measurable metrics and test
plans. o Strong Python skills (or similar language) and experience
with modern data tooling. o Hands-on experience running A/B tests,
canaries, or experiment frameworks. o Experience defining and
maintaining operational reliability metrics (SLIs/SLOs) for
AI-driven systems. o Familiarity with large-scale distributed or
streaming systems serving AI/agent workflows (millions of events or
alerts/day). o Excellent communication skills able to clearly
convey technical results and trade-offs to engineer, PMs, and
analysts. o This position is not eligible for employment visa
sponsorship. The successful candidate must not now, or in the
future, require visa sponsorship to work in the US Preferred
Qualifications o Experience evaluating or deploying agentic or
tool-using AI systems (multi-agent orchestration,
retrieval-augmented reasoning, prompt lifecycle management). o
Familiarity with LLM evaluation frameworks (e.g., model-graded
evals, pairwise/rubric scoring, preference learning). o Exposure to
AI safety testing, including prompt injection, data exfiltration,
abuse taxonomies, and resilience validation. o Understanding of
explainability and compliance requirements for autonomous
workflows, ensuring traceability and auditability of AI behavior. o
Background in security operations, incident response, or enterprise
automation; comfortable interpreting logs, alerts, and playbooks. o
Startup experience delivering high-impact systems in fast-faced,
evolving environments. Equal Opportunities Monitoring We are an
Equal Opportunity Employer. It is our policy to ensure that all
eligible persons have equal opportunity for employment and
advancement on the basis of their ability, qualifications, and
aptitude. All qualified applicants will receive consideration for
employment without regard to race, color, religion, sex, sexual
orientation, gender identity, national origin, age, pregnancy,
genetic information, disability, status at a protected veteran, or
any other protected category under applicable federal, state, and
local laws. If you are interested in applying for employment with
Anomali and need special assistance or accommodation to apply for a
posted position, contact our Recruiting team at
recruiting@anomali.com . Compensation Transparency $140,000 -
$200,000 USD Please note that the annual base salary range is a
guideline and, for candidates who receive an offer, the base pay
will vary based on factors such as work location, as well as,
knowledge, skills and experience of the candidate. In addition to
base pay, this position is eligible for benefits, and may be el
igible for equity. We may use artificial intelligence (AI) tools to
support parts of the hiring process, such as reviewing
applications, analyzing resumes, or assessing responses. These
tools assist our recruitment team but do not replace human
judgment. Final hiring decisions are ultimately made by humans. If
you would like more information about how your data is processed,
please contact us.
Keywords: Anomali, Fremont , Senior Engineer, AI Evaluation & Reliability (Agentic AI), IT / Software / Systems , Redwood City, California