AI Safety Infrastructure · Technical AI Governance · Global Health · Biosecurity
I build evaluation infrastructure for failure modes in AI systems that appear only when context matters.
Current work measures contextual validity: whether frontier models produce reasoning that is genuinely appropriate for the person, setting, and system they claim to serve.
AI evaluation is currently optimised for correctness, not context.
A model can pass every major benchmark — clinical QA, reasoning, bias — and still fail in deployment.
Not because it lacks knowledge, but because it misreads the situation.
This is not hallucination.
This is not bias.
This is a structural evaluation gap.
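To make the gap concrete, here is a minimal sketch of what a contextual-validity probe can look like, as opposed to a correctness benchmark. Everything here is illustrative: `ask_model`, the contexts, and the probe fields are invented for the example and stand in for any chat-completion client, not for the actual framework's API.

```python
from dataclasses import dataclass

@dataclass
class ContextualProbe:
    """One question asked under two deployment contexts.

    A correctness benchmark accepts the same answer in both;
    a contextual-validity probe requires the answer to adapt.
    """
    question: str
    context_a: str       # e.g. the user is a clinician
    context_b: str       # e.g. the user is a layperson
    must_differ_on: str  # the dimension the answer should adapt along


def ask_model(system_context: str, question: str) -> str:
    """Stand-in for a real chat-completion call (hypothetical).

    Replace with your model client; this placeholder just echoes
    so the sketch runs end to end.
    """
    return f"[answer to {question!r} under context {system_context!r}]"


def run_probe(probe: ContextualProbe) -> dict:
    # Same question, two contexts: the signal is the *difference*
    # between the answers, not whether either one is factually correct.
    answer_a = ask_model(probe.context_a, probe.question)
    answer_b = ask_model(probe.context_b, probe.question)
    return {
        "question": probe.question,
        "identical": answer_a.strip() == answer_b.strip(),
        "dimension": probe.must_differ_on,
        "answers": (answer_a, answer_b),
    }


probe = ContextualProbe(
    question="How should this medication dose be adjusted?",
    context_a="You are assisting a hospital pharmacist.",
    context_b="You are answering a patient with no medical training.",
    must_differ_on="level of operational detail and safety framing",
)
print(run_probe(probe))
```

An identical answer in both contexts is exactly the failure described above: the model can be factually correct in both cases and still misread who it is talking to.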
Framework overview, domain statuses, and full results across D1, D2, and D3 for Claude, GPT-4o, and Gemini. →
Papers, conference presentations, and the open-source evaluation framework. →
Narrative, credentials, speaking engagements, and collaborations. →