Patent Abstract

Regression-Firewalled Alignment Training: Micro-Canaries, Safe-Merge, Reward-Hacking Controls, and Drift Barriers

Patent application 202641024338, part of the Manish KL patent portfolio, covers alignment training safety and related technical systems.

Application number 202641024338; date 2026-03-01; topic: Alignment Training Safety.

Overview

A regression firewall for alignment post-training evaluates each candidate update against capability canaries and, depending on the measured regression risk, accepts, constrains, partially merges, or rejects the update.
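The four-way decision can be illustrated with a minimal Python sketch. It assumes each capability canary yields one scalar score for the current model and one for the candidate, treats the largest per-canary drop as the regression risk, and applies a hypothetical tiered policy. The names `CanaryScore`, `Decision`, and `firewall_decision` and the parameters `soft_tol`, `hard_tol`, and `max_hard_fraction` are illustrative assumptions, not taken from the application.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Decision(Enum):
    ACCEPT = "accept"
    CONSTRAIN = "constrain"
    PARTIAL_MERGE = "partial_merge"
    REJECT = "reject"


@dataclass(frozen=True)
class CanaryScore:
    """Score of one capability canary before and after a candidate update."""
    name: str
    baseline: float   # score of the current model
    candidate: float  # score with the candidate update applied

    @property
    def drop(self) -> float:
        # Positive value means the candidate regressed on this canary.
        return self.baseline - self.candidate


def firewall_decision(
    scores: Sequence[CanaryScore],
    soft_tol: float = 0.01,
    hard_tol: float = 0.05,
    max_hard_fraction: float = 0.25,
) -> Decision:
    """Hypothetical policy mapping canary regressions to a firewall decision."""
    if not scores:
        raise ValueError("need at least one canary score")
    drops = [s.drop for s in scores]
    worst = max(drops)
    if worst <= soft_tol:
        return Decision.ACCEPT
    if worst <= hard_tol:
        # Only the soft threshold is breached: keep the update but constrain it.
        return Decision.CONSTRAIN
    # Fraction of canaries whose drop exceeds the hard threshold.
    hard_fraction = sum(d > hard_tol for d in drops) / len(drops)
    if hard_fraction <= max_hard_fraction:
        # Hard breaches are localized: try merging only the safe parts.
        return Decision.PARTIAL_MERGE
    return Decision.REJECT
```

Here the single worst canary sets the tier; a system could instead weight canaries or aggregate their scores, which the abstract does not specify.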

Technical Abstract

Micro-canaries are refreshed dynamically within token and runtime budgets, and hard and soft capability thresholds are enforced on their results. Safe-Merge logic partitions each update delta into mergeable components, such as layer groups or low-rank bases, so that parts of an update can be merged independently. Reward-hacking detection and drift-triggered certification provide additional control signals.
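A layer-group variant of Safe-Merge can be sketched in Python under stated assumptions: parameters are stored as plain lists of floats, layer groups come from a hypothetical `layers.<index>.<name>` naming scheme, and `regression` is a caller-supplied function returning the worst canary drop for a given set of weights. Each group is merged in full if its regression stays within the soft threshold, merged at reduced scale if it stays within the hard threshold, and skipped otherwise. All names, thresholds, and the greedy ordering are illustrative assumptions; low-rank-basis partitioning is not shown.

```python
from typing import Callable, Dict, List, Tuple

# Parameters keyed by name; each value is a flat list of floats (toy stand-in for tensors).
Params = Dict[str, List[float]]


def layer_group(name: str, group_size: int) -> int:
    """Map a name like 'layers.5.w' to its layer group (hypothetical naming scheme)."""
    layer_index = int(name.split(".")[1])
    return layer_index // group_size


def partition_delta(delta: Params, group_size: int) -> Dict[int, Params]:
    """Split an update delta into layer-group components."""
    groups: Dict[int, Params] = {}
    for name, values in delta.items():
        groups.setdefault(layer_group(name, group_size), {})[name] = values
    return groups


def apply_delta(params: Params, delta: Params, scale: float = 1.0) -> Params:
    """Return a copy of params with scale * delta added element-wise."""
    out = {name: list(values) for name, values in params.items()}
    for name, d in delta.items():
        out[name] = [p + scale * x for p, x in zip(out[name], d)]
    return out


def safe_merge(
    base: Params,
    delta: Params,
    regression: Callable[[Params], float],
    group_size: int = 2,
    soft_tol: float = 0.01,
    hard_tol: float = 0.05,
    constrained_scale: float = 0.5,
) -> Tuple[Params, Dict[int, str]]:
    """Greedily merge layer groups, constraining or skipping groups that regress."""
    merged = {name: list(values) for name, values in base.items()}
    report: Dict[int, str] = {}
    for gid, gdelta in sorted(partition_delta(delta, group_size).items()):
        # Measure regression with this group's full delta on top of what is merged so far.
        trial = apply_delta(merged, gdelta)
        risk = regression(trial)
        if risk <= soft_tol:
            merged = trial
            report[gid] = "merged"
        elif risk <= hard_tol:
            # Soft threshold breached: merge a down-scaled version of the group.
            merged = apply_delta(merged, gdelta, constrained_scale)
            report[gid] = "constrained"
        else:
            # Hard threshold breached: leave this group out entirely.
            report[gid] = "rejected"
    return merged, report
```

Evaluating groups one at a time lets the firewall keep the safe parts of an update when only some layers regress, which is what a partial merge needs; in this sketch the down-scaled merge is not re-evaluated, so a fuller system would re-check the merged model against the canaries.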
