We're looking for a Freelance Agent Evaluation Engineer to create challenging tasks and evaluation criteria for AI coding agents. The role involves building virtual companies, assembling and calibrating tasks, and designing tests to evaluate AI models.
Requirements
- Degree in Computer Science, Software Engineering, or a related field
- 5+ years of software development experience, primarily in Python (FastAPI, pytest, async/await, subprocess, file operations)
- Background in full-stack development, with experience building React-based interfaces (JavaScript/TypeScript) and robust back-end systems
- Experience writing functional and integration tests, not just running them
- Experience with Docker containers and familiarity with infrastructure tools (Postgres, Kafka, Redis)
- Understanding of CI/CD (GitHub Actions as a user: triggers, labels, reading results)
- English proficiency at B2 level
Benefits
- Flexibility to choose when and how to work
- Opportunity to work on challenging projects with AI models
- Competitive hourly rate of up to $30
Originally posted on Himalayas