Roki Seydi

AI Safety Infrastructure · Technical AI Governance · Global Health · Biosecurity

I build evaluation infrastructure for failure modes in AI systems that only appear when context matters.

Current work measures contextual validity — whether frontier models produce reasoning that is actually appropriate for the person, setting, and system they claim to help.

The Gap

AI evaluation is currently optimised for correctness, not context.

A model can pass every major benchmark — clinical QA, reasoning, bias — and still fail in deployment.

Not because it lacks knowledge, but because it misreads the situation.

This is not hallucination.

This is not bias.

This is a structural evaluation gap.
