IRQ-to-userspace latency, EEVDF wakeups, cgroup-aware latency budgets, softirq interference, and why controllable microseconds become seconds in tool-using AI control loops.
Latency figures in this note are order-of-magnitude estimates; collect perf/bpftrace data before making empirical claims.

Agentic AI workloads are not simply long-running compute jobs. They are latency-amplified control loops that repeatedly wait on network responses, local tools, file metadata, vector index pages, subprocesses, and small I/O completions. Each tool response traverses an IRQ-to-userspace path: device completion, MSI-X delivery, Local APIC dispatch, IDT entry, Linux IRQ handling, softirq or threaded IRQ execution, wakeup, CPU selection, and finally userspace resumption. This note argues that the highest-leverage kernel change is not blindly raising agent processes to real-time priority. It is a budgeted agent latency class that coordinates scheduler wakeup placement, IRQ shielding, cpuidle guardrails, small-I/O attribution, and cgroup-enforced fairness. The goal is not to eliminate external tool latency; it is to compress the controllable local tail that occurs after a tool reply reaches the machine.
Agentic systems expose a pathological kernel profile: they are bursty enough to trigger fairness, migration, idle-state, and interrupt effects, but frequent enough that every wakeup penalty compounds.
A classic server workload can tolerate isolated 50–200 µs delays if throughput remains high. An agent loop cannot: it serializes decisions. A planner waits for a tool, parses a result, schedules another tool, touches another index page, writes another trace, and repeats. The critical resource is not average throughput; it is p99/p999 step completion latency. A 200 µs avoidable tail, repeated across thousands of serialized wakeups in a long run, quietly adds up to seconds.
Real agentic loops often wait on external systems: a search API, a database, a browser worker, a local vector store, or a code-execution sandbox. The total step time is therefore not purely a kernel quantity.
The kernel cannot make a remote service respond faster. But it can reduce the controllable local tail: the time between “a response has arrived” and “the agent control loop is running again.” That is precisely where IRQ delivery, softirq processing, scheduler placement, page faults, and syscall wakeups enter the critical path.
When a tool reply arrives over a NIC, or a small read completes from NVMe, the agent does not immediately run. The completion walks a layered control path.
Device completion (NIC / NVMe)
↓
PCIe MSI-X delivery ~100–500 ns
↓
Local APIC accepts vector ~100–300 ns
↓
IDTR + IDT[vector] lookup ~tens of ns
↓
x86 interrupt entry + state save ~200 ns–1 µs
↓
Linux IRQ dispatch / irq_desc ~1–5 µs
↓
Driver top-half ~1–10 µs
↓
softirq / NAPI / threaded IRQ ~5–50+ µs
↓
sk_buff/block completion/accounting ~2–30 µs
↓
wake_up_process() / try_to_wake_up() ~1–20 µs
↓
select_task_rq_fair() + enqueue_task_fair() ~5–100+ µs
↓
context switch + userspace resumes ~2–20 µs
The ranges above are order-of-magnitude annotations for reasoning. Actual values depend on hardware, interrupt moderation, PREEMPT_RT, CPU frequency, cache state, NUMA placement, and load.
Intel and AMD x86-64 systems share the essential interrupt model: the CPU receives a vector, indexes the Interrupt Descriptor Table, saves interrupted state, switches stacks if needed, and enters a kernel stub.
IDTR → IDT base + limit
Vector → index into IDT
IDT[n] → gate descriptor / handler
RIP → interrupted instruction pointer
CS → code segment selector
RFLAGS → IF, priority, flags
RSP/SS → user stack state
TSS → RSP0 / IST kernel stack
IRR → interrupt request register
ISR → in-service register
TPR → task priority threshold
EOI → end-of-interrupt signal
LVT → local vector table
MSI/MSI-X → PCIe message interrupt

The scheduler path that matters starts when a completion wakes an agent task. The first useful patch target is not the final EEVDF picker; it is CPU selection and wakeup placement.
IRQ / socket / io_uring completion
↓
try_to_wake_up() kernel/sched/core.c
↓
select_task_rq_fair() kernel/sched/fair.c
↓
enqueue_task_fair() kernel/sched/fair.c
↓
check_preempt_wakeup_fair() kernel/sched/fair.c
↓
pick_next_task_fair()
↓
pick_eevdf()
| Patch point | Why it matters | Agent-aware behavior |
|---|---|---|
| select_task_rq_fair() | Chooses the CPU for the woken task. | Prefer previous/warm CPU; avoid IRQ-heavy CPUs and deep-idle cores. |
| enqueue_task_fair() | Places the runnable entity into the CFS/EEVDF tree. | Use latency-sensitive placement without replacing EEVDF. |
| check_preempt_wakeup_fair() | Decides whether the wakee should preempt the current task. | Allow short agent bursts to preempt throughput-heavy tasks within budget. |
| newidle_balance() | Moves tasks when CPUs go idle. | Avoid migrating hot agent control threads across NUMA/cache domains. |
chrt and renice Are Useful but Insufficient

Using chrt -f -p 99 can prove that wakeup latency matters, but it is not a production architecture.
# Good experimental ladder
renice -n -10 -p $PID
sudo chrt -r -p 20 $TID
sudo taskset -cp 8 $TID
# Then move NIC/NVMe IRQs away from CPU 8.
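Tuning like this proves the point; a first-class hint makes it durable. The fragments below are conceptual: the flag value is unallocated upstream, and the task_struct-resident sched_flags field tested by the helper is an assumption of the sketch, not existing kernel API.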
/* include/uapi/linux/sched/types.h */
#define SCHED_FLAG_AGENT_LATENCY 0x10000000ULL
/* include/linux/sched.h */
static inline bool task_agent_latency(struct task_struct *p)
{
return p->sched_flags & SCHED_FLAG_AGENT_LATENCY;
}
The proposed agent latency class should not replace existing Linux controls. It should sit between ordinary fair scheduling and hard real-time policy. SCHED_DEADLINE is appropriate when a task has a known runtime and period. Agent loops are different: they are bursty, event-driven, and often blocked on tools. Likewise, cgroup v2 controls such as cpu.weight and cpu.max manage fairness and quota, but they do not express “wake this task quickly when its tool reply arrives.”
| Mechanism | What it controls | Why it is not enough alone | How agent latency class differs |
|---|---|---|---|
| nice/renice | Fair-scheduler weight | Improves share, not deterministic wakeup or IRQ avoidance. | Targets wakeup placement and local tail latency. |
| SCHED_FIFO/RR | Strict RT priority | Can starve the system and still ignores IRQ/page-fault paths. | Budgeted, bounded, and fallback-safe. |
| SCHED_DEADLINE | Runtime/deadline/period | Requires known periodic structure; agents are irregular and event-driven. | Uses short-burst hints tied to wakeups and completions. |
| cpu.weight | Proportional CPU share | Does not tell the kernel which wakeups are latency-amplified. | Adds semantic latency intent. |
| cpu.max | Quota / hard cap | Can limit abuse but does not prioritize the critical wakeup path. | Combines quota with low-latency privilege. |
A deployable design should expose the latency privilege through cgroup v2 so operators can budget it per tenant or per agent pool:
# conceptual cgroup v2 interface
/sys/fs/cgroup/agents/agent.latency.enable = 1
/sys/fs/cgroup/agents/agent.latency.max_us = 200
/sys/fs/cgroup/agents/agent.latency.burst_us = 5000
/sys/fs/cgroup/agents/agent.latency.refill_us = 1000
/sys/fs/cgroup/agents/agent.latency.irq_shield = 1
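On the userspace side, the opt-in could ride the existing sched_setattr() syscall. The sketch below is hypothetical: it reuses the SCHED_FLAG_AGENT_LATENCY value defined earlier, which no upstream kernel accepts (a stock kernel returns EINVAL for unknown sched_flags, which makes the feature easy to detect), and it keeps the thread in the fair class rather than promoting it to real time.

/*
 * Conceptual opt-in from an agent runtime, assuming the hypothetical
 * SCHED_FLAG_AGENT_LATENCY flag above. A stock kernel rejects unknown
 * sched_flags with EINVAL, so failure here just means "unpatched kernel".
 */
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>          /* SCHED_OTHER */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define SCHED_FLAG_AGENT_LATENCY 0x10000000ULL  /* hypothetical value */

/* Mirrors the sched_attr layout documented in sched_setattr(2). */
struct agent_sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;
    uint64_t sched_deadline;
    uint64_t sched_period;
};

static int enable_agent_latency(pid_t tid)
{
    struct agent_sched_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size         = sizeof(attr);
    attr.sched_policy = SCHED_OTHER;               /* stay in the fair class */
    attr.sched_flags  = SCHED_FLAG_AGENT_LATENCY;  /* request low-latency wakeups */

    if (syscall(SYS_sched_setattr, tid, &attr, 0) != 0)
        return -errno;
    return 0;
}

Scoping the flag per thread keeps the privilege narrow: only the control-loop thread is marked, while worker pools and subprocesses stay under ordinary cpu.weight accounting.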
/* kernel/sched/fair.c: conceptual sketch */
static int select_task_rq_fair(struct task_struct *p, int prev_cpu,
int wake_flags)
{
if (task_agent_latency(p)) {
int cpu = agent_select_warm_quiet_cpu(p, prev_cpu);
if (cpu >= 0)
return cpu;
}
return select_task_rq_fair_default(p, prev_cpu, wake_flags);
}
static int agent_select_warm_quiet_cpu(struct task_struct *p, int prev_cpu)
{
if (cpu_online(prev_cpu) &&
!cpu_irq_hot(prev_cpu) &&
!cpu_deep_idle(prev_cpu) &&
task_fits_cpu(p, prev_cpu))
return prev_cpu;
return find_low_irq_idle_cpu(task_numa_node(p));
}
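The placement sketch is deliberately narrow: it biases the woken control thread toward a warm, IRQ-quiet CPU and leaves pick_eevdf() and the fairness accounting untouched. The privilege also has to be bounded. The token-bucket budget below expresses that: tokens are spent when the task exercises low-latency placement or wakeup preemption, refilled according to the agent.latency.burst_us / refill_us knobs, and a task that runs dry falls back to ordinary fair-class behavior.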
struct agent_latency_budget {
u64 tokens_ns;
u64 max_tokens_ns;
u64 refill_rate_ns;
u64 last_refill_ns;
};
static bool agent_budget_allow(struct task_struct *p, u64 cost_ns)
{
refill_agent_budget(p);
if (p->agent_budget.tokens_ns < cost_ns)
return false;
p->agent_budget.tokens_ns -= cost_ns;
return true;
}
/* conceptual per-cpu counter updated by IRQ entry/exit */
DEFINE_PER_CPU(u64, irq_runtime_window_ns);
bool cpu_irq_hot(int cpu)
{
return per_cpu(irq_runtime_window_ns, cpu) > sysctl_agent_irq_hot_ns;
}
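Memory locality is the other half of keeping the control loop warm. The conceptual madvise() hints below let the runtime distinguish a reused hot set (index and context pages worth keeping resident) from one-pass repository or document scans that should not pollute the page cache. The advice values 90 and 91 are placeholders, not allocated MADV_* numbers.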
#define MADV_AGENT_HOTSET 90 /* protect reused index/context pages */
#define MADV_AGENT_SCAN 91 /* repo/document scan: avoid cache pollution */
madvise(index_addr, index_len, MADV_AGENT_HOTSET);
madvise(scan_addr, scan_len, MADV_AGENT_SCAN);
The first serious experiment should measure, not patch. The goal is to attribute each slow step to IRQ time, softirq time, scheduler wakeup, page faults, or block I/O.
bpftrace -e '
tracepoint:sched:sched_wakeup {
@wakeup[args->pid] = nsecs;
}
tracepoint:sched:sched_switch /@wakeup[args->next_pid]/ {
@lat_us = hist((nsecs - @wakeup[args->next_pid]) / 1000);
delete(@wakeup[args->next_pid]);
}'
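The histogram isolates the scheduling-delay component of the local tail: the time from sched_wakeup firing inside try_to_wake_up() to the task actually running at the next sched_switch.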
For many agent runtimes, the important userspace gates are epoll_wait() and io_uring_enter(). These are where the agent blocks waiting for a tool reply, socket completion, or file I/O completion. Measuring the time spent inside these syscalls ties the kernel path directly to agent-visible latency.
bpftrace -e '
tracepoint:syscalls:sys_enter_epoll_wait,
tracepoint:syscalls:sys_enter_epoll_pwait,
tracepoint:syscalls:sys_enter_epoll_pwait2
/comm == "agent"/
{
@epoll_start[tid] = nsecs;
}
tracepoint:syscalls:sys_exit_epoll_wait,
tracepoint:syscalls:sys_exit_epoll_pwait,
tracepoint:syscalls:sys_exit_epoll_pwait2
/@epoll_start[tid]/
{
@epoll_wait_us = hist((nsecs - @epoll_start[tid]) / 1000);
delete(@epoll_start[tid]);
}'
bpftrace -e '
tracepoint:syscalls:sys_enter_io_uring_enter
/comm == "agent"/
{
@uring_start[tid] = nsecs;
}
tracepoint:syscalls:sys_exit_io_uring_enter
/@uring_start[tid]/
{
@uring_enter_us = hist((nsecs - @uring_start[tid]) / 1000);
delete(@uring_start[tid]);
}'
Useful supporting breakdowns come from the IRQ, softirq, page-fault, and block tracepoints:
bpftrace -e '
tracepoint:irq:irq_handler_entry { @irq_start[args->irq] = nsecs; }
tracepoint:irq:irq_handler_exit /@irq_start[args->irq]/ {
@irq_us[args->name] = hist((nsecs - @irq_start[args->irq]) / 1000);
delete(@irq_start[args->irq]);
}'
bpftrace -e '
tracepoint:irq:softirq_entry { @soft[args->vec] = nsecs; }
tracepoint:irq:softirq_exit /@soft[args->vec]/ {
@softirq_us[args->vec] = hist((nsecs - @soft[args->vec]) / 1000);
delete(@soft[args->vec]);
}'
bpftrace -e '
tracepoint:exceptions:page_fault_user {
@faults[comm] = count();
}
tracepoint:sched:sched_switch /comm == "agent"/ {
@switches = count();
}'
bpftrace -e '
tracepoint:block:block_rq_issue { @rq[args->sector] = nsecs; }
tracepoint:block:block_rq_complete /@rq[args->sector]/ {
@bio_us = hist((nsecs - @rq[args->sector]) / 1000);
delete(@rq[args->sector]);
}'
The expected signals in the table below are representative targets for an experimental ladder, not measurements; replace them with data collected on your own box.
A clean evaluation separates remote tool service time from local kernel overhead. For a remote API, record the server-side response timestamp or gateway timestamp. For local tools, record the completion timestamp at the server process. Then compare it with the client agent’s epoll/io_uring return timestamp. The difference is the controllable local tail.
| Experiment | Command/Mechanism | Measure | Expected signal |
|---|---|---|---|
| Baseline | Default scheduler, irqbalance on | p99 step latency, wakeup histogram | High tail variance |
| Renice | renice -10 | Step latency | Small improvement |
| RT moderate | chrt -r 20 | Wakeup histogram | Wakes improve, but IRQ tails remain |
| CPU pinning | taskset | Cache misses, migrations | Lower migration/cold-cache cost |
| IRQ shielding | Move NIC/NVMe IRQs away | softirq time on agent CPU | Large p99 improvement |
| Agent-aware kernel | Patch sketches above | p99/p999, fairness, thermals | Low latency without RT starvation |
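To compute the controllable local tail directly, a sketch along the following lines works, assuming the tool or gateway stamps a wall-clock completion time into its reply and clocks are reasonably synchronized (NTP/PTP); the helper names are illustrative, not an existing API.

/*
 * Conceptual: controllable local tail for one agent step.
 * t_server_done_ns is a CLOCK_REALTIME timestamp (nanoseconds) the
 * tool or gateway embeds in its reply.
 */
#include <stdint.h>
#include <time.h>

static inline uint64_t now_realtime_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Call immediately after epoll_wait()/io_uring_enter() returns the reply. */
static inline uint64_t local_tail_ns(uint64_t t_server_done_ns)
{
    uint64_t now = now_realtime_ns();

    return now > t_server_done_ns ? now - t_server_done_ns : 0;
}

Histogrammed per step, this number excludes remote service time entirely; it is the quantity each rung of the ladder above is trying to compress.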
The kernel subsystem modification most likely to make agentic performance “rocket” is a coordinated latency path: agent-aware scheduler wakeup placement, IRQ shielding, and budgeted latency privilege. Real-time priority is a useful proof-of-problem, but not the final mechanism.
The deeper insight is that agentic systems need an attributable kernel. Every agent step should be traceable across IRQs, softirqs, scheduler wakeups, page faults, and block I/O. Once the kernel can attribute delay to a step, it can optimize the correct path instead of treating the workload as ordinary batch compute.