Category: LLM
-

The Mathematical Limits of AI Safety
LLM safety limits: prompt filters can be bypassed by adversarial encodings, so defense-in-depth, monitoring, and layered controls are needed.
-

OpenAI’s Confession Booth: Teaching AI to Rat Itself Out
OpenAI trains LLMs to self-report missteps via ‘confessions’, improving honesty and safety with minimal performance cost.
-

The Paper That Made Me Close My Laptop and Pace Around the Room
Self-evolving agents: off-the-shelf models bootstrap via Python REPL and curriculum to dramatically improve math, coding, and reasoning.
-

Unusual Language Artifacts from Noisy LLM Training Data
AI glitches: how noisy training data (typos, OCR errors, and rare glitch tokens) produces baffling, humorous, or harmful LLM outputs.
-

Beyond Fine-Tuning: What Apple’s Multimodal Sensor Fusion Study Reveals About LLMs and User Privacy
Apple shows non-fine-tuned LLMs can fuse local sensor summaries for multimodal activity recognition, boosting privacy and modularity.