
OpenAI’s Confession Booth: Teaching AI to Rat Itself Out
OpenAI trains LLMs to self-report missteps via ‘confessions’, improving honesty and safety with minimal performance cost.
gekko
