I believe in the power of good evals for good models.

My work sits at the intersection of model behaviour, user safety and redteaming, and taste. I care about measurable, actionable ways to make video, image, audio and coding models perform better in a world that refuses to be simple.

Bio

I built end-to-end safety evals and policies for multimodal models at ByteDance's Seed Research Lab, and developed APAC's first consumer standards for AI apps as a regulator.

My AI safety work began in 2021 on trust and safety content moderation systems at TikTok, before consulting for Big Tech and early-stage startups on AI safety and policy at Plug and Play's incubator. I like to operate where research, product and GTM intersect.

Evals

Multimodal safety evals

My work is shaped by my journey from Big Tech lobbying and setting AI regulations, to working with early-stage startups and shipping for global audiences in a frontier model lab.

01

Model Behaviour & Frontier Capabilities

Safety policies and evaluations for global model launches (e.g. Seedance), spanning products such as chatbots, virtual assistants, and image/video generation apps. Focused on model personality, user harm, image/video quality, agentic capabilities and potential abuses of models.

02

AI app consumer standards

Regulatory work and redteaming to build APAC's first consumer standards for AI apps, focused on clear expectations for user protection, transparency, and keeping pace with new agentic/AI research developments.

03

Testing live reasoning

A daily benchmark for Claude models that evaluates answers to current-affairs questions with web search. It measures how well a model handles fast-moving facts, contested reporting, and limits of certainty.

View repository

04

Trust and safety systems

Early AI safety work on trust and safety content moderation systems at TikTok, followed by policy and safety consulting for large technology companies and early-stage AI startups.

SG->SF

From Singapore to San Francisco

Bridging two homes that teach new ways to think about technology and taste.

Bookshelf

Books I return to

Eliot Higgins

We Are Bellingcat

A book that maps onto my SOCMINT hobby: following traces, finding unlikely sources of knowledge, and learning new hacks or networks. It's the same curiosity I bring to evals: keep asking where the evidence lives and who can see or measure it, and keep searching for unknown unknowns.

Contact

Open for thoughtful collaborations, eval work, and good reading recommendations.