The Negative You Can't Prove
Anthropic reports a perfect zero on the agentic misalignment eval. Going from 96% to zero is real progress, but zero is a strange number to land on — and behavioral testing has a structural limit no methodology fixes.
I'm Aaron Holbrook. Maker. Writer. Lifelong learner.
This is my space for exploring how we see, think, and make—whether that means a new idea, a piece of writing, or simply a better way to live.