The Negative You Can't Prove
Anthropic reports a perfect zero on the agentic misalignment eval. Going from 96% to zero is real progress, but zero is a strange number to land on — and behavioral testing has a structural limit no methodology fixes.
Thoughtful writing about systems, stories, frames, and questions. Essays, project notes, and reflections on making, learning, and living with more care and clarity.
In my desire to create a better process for building and deploying at Zeek, I recently hit a road bump while trying to perform automatic visual regression testing immediately after deployment.
Automatically load all PHP files in the specified directory. Recursively.
It's nice to pre-populate terms or content, or to have the ability to run actions only occasionally.
Make your code simpler to read and simpler to debug by breaking apart your conditional statements and exiting the function whenever possible.
Did you know you can not only use Composer to manage dependencies, but actually develop a package alongside your dependencies?
Difficult to modularize code and maintain separate repos. Submodules are difficult or problematic to use.
When working with caching strategies, it's important to step through your invalidation strategies. Namely, think through when the cached data gets regenerated, how it gets regenerated, and who regenerates it.
Recently I ran into an issue where an installation of WordPress that had never had any issues updating stopped being able to update via the admin update button.
One of the lowest-hanging fruits to learn is how to fix code regressions quickly and easily with `git bisect`.