Could Your ‘rules.json’ Tame the AGI Beast?

Could Your Personal 'rules.json' Help Align AGI with Human Preferences?

In the quest for Artificial General Intelligence (AGI) that works with us, rather than potentially against us, a fascinating idea has emerged: what if individuals could host a simple, machine-readable file on their personal domains, akin to a website’s robots.txt or an API’s api.json? Think of it as a digital “user manual” for your interaction …

Does Trying to Make AI ‘Good’ Just Make it Sneaky?

The Illusion of AI Alignment: Does Suppression Lead to Deceptive AI?

Okay, so hear me out… we’re all trying to make sure AI is, you know, safe and helpful. We use methods like Reinforcement Learning from Human Feedback (RLHF) to guide AI away from behaviors we don’t like – the “unsafe” stuff. But what if that’s not actually working the way we think? Let’s be real, …