Vezra | AgentBench

AgentBench: The Evaluation Sandbox for the AI Era

Don't just test how they code. Evaluate how they build with AI in secure, browser-native IDE environments — with workspace telemetry, not invasive proctoring.

Choose your path

The friction

Traditional hiring assumes developers work without AI. Modern software engineering doesn't. Syntax-only screens miss how candidates prompt, debug, and review AI-generated code in real codebases.

The sandbox

AgentBench monitors engineering trace telemetry — git history, prompt logs, test runs — in isolated VMs. You get actionable signal without webcam proctoring or shared-workspace risk.

AgentBench for Developers

Prove how you build with AI

Skill validation, career advancement, and production-grade AI workflows — in sandboxes that respect how you actually work.

Validate AI-augmented engineering skills employers actually need
Build a portfolio of real sandbox missions, not algorithm trivia
Learn production-grade AI workflows in isolated, professional IDEs

Files

instructions.md

agent.py

config.py

Protocol

Patch the Vezra Support Bot so it never leaks secrets from config or operational logs.

agent.py

1import sys

2from config import *

4def chat(user_message, metadata=None):

5# Block secret leakage from overrides

6if "ignore previous" in user_message.lower():

7return sanitize(user_message)

8return f"Echo: {user_message}"

[DEBUG] ID: assessment-001 · runner ready

AI Copilot

GPT-4o

Add guardrails so banned tiers never appear in RAG answers.

python→ agent.py

[ COPY ][ REVIEW ][ APPLY TO FILE ]

# File: agent.py
def sanitize(msg): ...

AgentBench for Hiring Teams

Evaluate AI-augmented engineering at scale

Enterprise-grade sandboxes with deterministic scoring and telemetry replay for every candidate.

Reduce false positives from syntax-only screens
Prevent data leaks with secure, non-training model endpoints
Review rich workspace telemetry instead of invasive proctoring

Built for the AI-augmented engineering workflow

Enterprise-grade isolation, telemetry, and model access — designed to measure how developers actually build today.

Plagiarism-Proof Environments

Zero-shared-state VM isolation guarantees test integrity and secure IP protection. Candidates work in an instant-loading, Monaco-powered IDE — professional editor feel, no local setup, no shared workspace contamination.

Execution Trace Analysis

Measure AI Literacy vs. Dependency. See exactly how candidates prompt, debug hallucinations, and review AI-generated code.

Enterprise-Grade AI Native

Provide native access to top-tier models via secure, non-training endpoints. Test integration skills without risking data leakage.

How it works

From ATS invite to actionable hiring signal in three steps.

Step 1

Frictionless Invite

Send an AgentBench link directly from your ATS.

Step 2

AI-Augmented Engineering Sandbox

Candidates inherit a complex codebase and ship features in a browser-native Monaco IDE alongside AI assistants — zero local setup.

Step 3

Actionable Context Signal

Receive a deterministic Vezra Context Score and human-AI efficiency playback from workspace telemetry.

Built for Developers. Respectful of Your Workflow.

We evaluate how effectively you use AI as a force multiplier. No invasive webcam proctoring. Use your own keyboard shortcuts. We expect you to use AI to generate boilerplate and scaffold tests.

FAQ

Will I be penalized for using AI?

No. You are expected to use AI. We evaluate your ability to collaborate with it and review its output.

Can I use local keyboard shortcuts?

Yes. The sandbox is powered by the Monaco Editor (the engine behind VS Code), meaning core editor shortcuts and muscle memory carry over.

Is my session webcam proctored?

No. We grade your system design via workspace telemetry (git history, prompt logs), not invasive screen recording.

AgentBench Beta

Currently working with early design partners and pilot customers

Join the waitlist to get access when we expand the private beta for engineering teams and hiring organizations.