Opportunity Description
AI Benchmark Engineer (Data Analysis) - 75045
Role Overview: Design and develop high‑quality multi‑agent benchmark tasks that evaluate the analytical reasoning, coordination, and execution capabilities of advanced AI systems.
Build realistic benchmark tasks requiring AI agents to analyze large, messy, multi‑source datasets; decompose work across specialist sub‑agents; and reach specific, verifiable conclusions.
Day‑to‑day Responsibilities:
- Design and author multi‑agent benchmark tasks centered on complex data analysis workflows.
- Create realistic datasets, either synthetic or curated in a real‑world style, across domains such as finance, operations, security, and market analysis.
- Build tasks that require cross‑referencing, anomaly detection, contradiction detection, and statistical computation across multiple sources.
- Develop decomposition guides that split analytical work across specialist sub‑agents.
- Write prec...