Writings

Technical essays, written with enough depth to be useful.

Long-form writing across AI infrastructure, memory systems, local agent runtimes, and accelerator architecture. This section is where project ideas and patent-adjacent concepts get room to breathe as essays rather than just landing-page summaries.

Current essays

52 essays · updated April 2026
Vera Rubin · Cooling · AI Facilities

April 2026

Why Vera Rubin Changes Everything About Cooling: 45°C Supply Temperature, Fan-Free Trays, and the Infrastructure Behind Them

A systems essay on how Vera Rubin NVL72 changes AI data center cooling with 45°C supply temperatures, fan-free trays, hose-free design, and liquid-cooled busbars.

Blackwell · Cooling Stack · AI Infrastructure

April 2026

The Cooling Stack Is the New Critical Path: How Blackwell GB300 NVL72 Racks Manage 142 kW

A systems essay on the five-layer direct liquid cooling architecture of Blackwell GB300 NVL72 racks and the suppliers behind each thermal layer.

vLLM · GPU Thermals · Scheduler Internals

April 2026

vLLM Internals: Where to Actually Cut Batch Size When the GPU Is Melting

A systems essay on where thermal control actually belongs inside vLLM, from scheduler decisions and swap behavior to tensor-parallel batch cutting.

HBM Thermals · Observability · GPU Infra

April 2026

Why HBM Thermal Throttling Is Silent: Reading the Tea Leaves in nvidia-smi

A technical guide to detecting silent HBM thermal throttling on H100 and H200 clusters when standard GPU temperature dashboards look deceptively healthy.

HBM Thermals · KV Cache · Inference Systems

April 2026

Thermal Debt Is a Memory Problem — How Hot Dies Throttle Your KV Prefetch

A systems essay on HBM thermal telemetry, KV fetch stalls, and why hot memory dies turn thermal debt into an inference scheduling problem.

Photonics · Memory Fabrics · MoE Serving

April 2026

CPO Isn’t About Power. It’s About Making Memory Disaggregation Schedulable

A systems essay on why co-packaged optics matters: it makes disaggregated memory and expert movement schedulable under tight latency budgets.

LLM Inference · KV Cache · HBM Thermals

April 2026

HBM Throttling-Safe KV Admission and ReuseNet for LLM Inference

A technical essay on thermal-safe KV admission, HBM backpressure, reuse prediction, and production-serving policy design for H100 and H200 inference clusters.

Photonics · AI Networking · Market Map

April 2026

The Photonics Stack: Who Builds What for AI Networking — Part 1

A technical primer mapping the AI optical networking stack across fiber, lasers, transceivers, DSPs, switches, and test infrastructure through the companies building each layer.

Transformer Inference · KV Cache · Memory Policy

April 2026

The Attention Sink Problem: Why Transformer Inference Wastes More Memory Than You Think

A systems essay on attention sink tokens, structural KV cache waste, and why long-context serving needs memory-policy-aware treatment of hot-but-low-utility tokens.

AI Clusters · Thermals · Reliability

April 2026

Thermal Debt in AI Clusters: The Silent Degradation Loop Nobody Is Measuring

A systems essay on how dense GPU racks accumulate thermal debt, why point-in-time observability misses it, and what thermally aware control planes should measure.

Developer Tools · Docs-to-Video · Local-First

April 2026

TechDemoForge — A Local-First Engine for Turning Technical Docs into Demo Videos

A detailed product and architecture essay on TechDemoForge, its workflow, repo structure, and why local-first technical demo generation is useful.

Inference Systems · Disaggregation · KV Cache

April 2026

Prefill-Decode Disaggregation: Why the Next Big Inference Architecture Splits the Job in Two

A systems essay on why prefill and decode should be split across different hardware pools and why the real engineering challenge becomes memory orchestration.

Inference Systems · Memory Policy · Speculative Decoding

April 2026

Speculative Decoding Is a Memory Problem

A systems essay on draft/verify KV pressure, rollback fragmentation, and why speculation pays off only when memory policy is designed for it.

Long Context · Sparse Serving · Memory Systems

April 2026

The Next Frontier in Long-Context Inference: Memory-Orchestrated Sparse Serving

A systems essay on sparse attention serving, hierarchical KV residency, predictive prefetch, and why long-context wins increasingly come from memory policy.

Photonics · Scale-Up · AI Infrastructure

April 2026

Scale-Out Was Yesterday. Scale-Up Optics Is the Next Battle

A systems-first essay on why the next optical contest in AI infrastructure is shifting inward toward dense rack-scale and scale-up fabrics.

Photonics · AI Networking · Materials

April 2026

InP vs Silicon Photonics vs VCSEL: The Materials Stack Behind AI Networking

A deeper systems-first essay on why AI networking will be built from a layered materials stack rather than a single optical winner.

AI Infrastructure · Interconnect · Power Density

April 2026

The Real AI Bottleneck Is Moving From Compute to Interconnect Power Density

A deeper systems essay on why the next constraint is the power, heat, and topology cost of moving bits across clusters.

Photonics · AI Clusters · Systems Design

April 2026

Photonics Is No Longer a Component Story — It Is Becoming the Operating System of AI Clusters

A systems essay on how optics is moving into the scheduler, topology planner, reliability model, and power logic of next-generation AI infrastructure.

Optics · AI Infrastructure · Interconnect

April 2026

CPO, LPO, DSP, and VCSEL: What Actually Matters for AI Infrastructure

A practical hardware-focused companion essay on optical architectures, power budgets, serviceability, and the real tradeoffs hyperscalers optimize for.

AI Runtime · Memory Scheduling · Inference Systems

April 2026

The Memory Scheduler Is the New Critical Path in AI Inference

A systems essay on explicit data movement, KV cache management, tiered memory placement, DMA orchestration, and why scheduler quality now directly determines inference efficiency.

AI Hardware · Memory Fabrics · Systems Architecture

April 2026

Why Cache Coherency Is the Wrong Default for AI Machines

A long-form systems argument for selective coherency, explicit tensor movement, CXL.mem over universal coherence, and schedule-first design in large-model infrastructure.

HBM · KV Cache · AI Inference

April 2026

When VRAM Stops Being a Weight Warehouse

A systems primer on HBM as a bounded working set, KV cache dominance, PagedAttention, weight offload, and why scheduled weight streaming is the next step in inference architecture.

Virtual Memory · Storage Systems · Systems Architecture

April 2026

When 128 TB Stops Feeling Infinite: Address Space, mmap, and the Quiet Limits of Modern Systems

A systems essay on 48-bit virtual addressing, mmap-heavy designs, storage density, 5-level paging, and why explicit data orchestration becomes the long-term answer.

AI Compilers · Memory Policy · MCOS

April 2026

Memory Intent IR: Why AI Compilers Must Emit Memory Plans

A systems essay on compiler-emitted memory intent, object semantics, workload phases, reuse confidence, and why hardware orchestration needs structured plans instead of blind guesses.

MCOS · Fabric Control · AI Memory Systems

April 2026

MCOS-HFC: A Hardware Fabric Controller for Memory-Centric AI Systems

A technical essay on explicit memory intent, residency maps, regret-aware eviction, recomputation-vs-transfer arbitration, atomic doorbells, and DPU embodiments for AI memory fabrics.

MCOS · AI Memory Fabrics · Hardware Control

April 2026

MCOS Must Live in Hardware: From JBOD to Intelligent AI Memory Fabrics

A systems essay on why software-only memory orchestration hits a ceiling and why the real future is hardware-resident movement control near the fabric, the tiers, and the accelerators.

MCOS · AI Systems · Memory Control

April 2026

MCOS: A Memory-Centric Operating System for the Future of AI Systems

A technical essay on memory placement, movement, residency, reuse, admission, and eviction as first-class scheduling decisions rather than passive implementation details.

RDMA · GPU Data Paths · Storage Fabrics

April 2026

Why "Disk → RDMA → GPU" Is Still Fragmented Today

A technical essay on why local direct-storage acceleration and network direct-memory acceleration still do not add up to one universal GPU-native end-to-end storage fabric.

On-Chip Memory · GPU Systems · HBM→SRAM

April 2026

From SSD to GPU to SRAM: Why the Last Bottleneck Is Now On-Chip

A technical essay on why eliminating host-side bounce buffers shifts the real bottleneck inward, toward deterministic HBM↔SRAM orchestration inside the accelerator.

AI Systems · Data Movement · Zero-Copy

April 2026

Bounce Buffers: The Hidden Tax on Modern AI Systems

A technical essay on hidden staging buffers, GPUDirect-era dataflow, and why eliminating unnecessary copies matters across storage, network, memory, and accelerator paths.

RDMA · AI Infrastructure · Disaggregated Inference

April 2026

RDMA in the Age of AI: Zero-Copy, KV Cache Transfer, and the New Glue Layer

A detailed technical essay on RDMA, zero-copy realities, and why "RDMA exists" still does not mean true end-to-end zero-copy in disaggregated inference.

Disaggregated Inference · KV Cache Transfer · AI Infrastructure

April 2026

Seam Orchestrator: Workload-Aware KV Routing in Disaggregated Inference

A technical essay on policy above transport for KV movement, workload-aware admissibility, swappable glue layers, Scenarios E and F, and experiment-backed results.

AI Infrastructure · Reliability · Control Planes

April 2026

AI Cluster Reliability Beyond Fault-Tolerant Parallelism

A technical essay on gray failures, checkpoint economics, cooling-compute seams, and seam-aware control planes for modern AI cluster reliability.

AI Infrastructure · Reliability · Cluster Operations

April 2026

The Next AI Cluster Failure Won’t Look Like a GPU Failure

A technical essay on seam failures across facilities, fabrics, heterogeneous inference pools, checkpoint economics, and why component dashboards miss the most expensive cluster incidents.

Edge AI · Compiler Runtime · Power Control

April 2026

Adaptive Compiler–Runtime Power Contract for Energy-Optimal Edge Inference

A technical explainer on authenticated power contracts, alternative execution plans, safe switching boundaries, and runtime enforcement for edge inference systems.

Inference Architecture · DMA Scheduling · On-Chip Buffers

April 2026

Deterministic Memory-Orchestrated Inference Using DMA and Bounded On-Chip Buffers

A technical deep dive into compiler-scheduled DMA, explicit fences, bounded SRAM, and why deterministic buffer legality changes the inference memory system.

Edge Inference · Energy Scheduling · ARM Systems

April 2026

SLA-Constrained Energy-Aware Inference Scheduling on ARM Edge Systems

A technical essay on latency-aware policy selection across model variants, DMA strategy, memory residency, and accelerator performance-state control on edge systems.

AI Infrastructure · Weight Residency · Runtime Control

April 2026

Predictive Weight Orchestration: Runtime Control for Multi-Tier Weight Residency

A technical essay on HBM pressure, predictive multi-tier placement, precision state transitions, MoE router-history signals, and bandwidth-aware runtime scheduling.

Agent Systems · Control Plane · Developer Tooling

April 2026

Mobile Agent Control: A Vendor-Neutral Control Plane for Terminal-Native Coding Agents

A systems essay on Android-first operations, FastAPI supervision, runtime adapters, WebSocket telemetry, and safe control of local coding agents across machines.

Developer Tooling · Web Performance · Telemetry

March 2026

Introducing ChromeLens: Systems-Grade Web Performance Telemetry

An introduction to ChromeLens, deterministic CDP tracing, interactive flow profiling, and the hydration penalty behind complex modern web applications.

Cache Hierarchy · AI Infrastructure · Memory Systems

April 2026

What Bigger L2 Actually Buys You

A detailed technical essay on larger L2 caches in AI systems, miss-rate reduction, average access time, bandwidth relief, energy tradeoffs, and where bigger L2 stops being enough.

AI Infrastructure · Memory Hierarchy · Chip Architecture

April 2026

Why AI Needs a New Memory Hierarchy, Not Just Bigger Caches

A detailed technical essay on AI-native residency fabrics, class-aware memory, weights, KV cache, experts, and why bigger generic caches are not the full answer.

Computer Architecture · LLM Decode · Memory Hierarchy

April 2026

SRMIC-X1: Rethinking the Memory Hierarchy for LLM Decode

A technical essay on residency-first decode acceleration, distributed on-package SRAM, HRM/HBM crossover behavior, and why autoregressive inference is fundamentally memory-bound.

GPU Observability · NVIDIA H100/H200 · Telemetry

April 2026

How to Measure GPU Underutilization on NVIDIA H100 and H200

A practical systems essay on rolling-window low-utilization metrics, sampled idle behavior, power telemetry, and what point-in-time GPU utilization misses.

Portfolio Tools · Options Analysis · Local Software

April 2026

Schwab Portfolio Tools and the Case for Local, Practical Portfolio Infrastructure

A technical essay on privacy-first local portfolio tooling, Schwab CSV analysis, options workflows, IV crush scenarios, and practical risk reporting without platform theater.

LLM Inference · Memory Hierarchy · Controller Design

April 2026

vOrchestrate and the Case for Controller-Centric Memory Policy in LLM Inference

A technical essay on dynamic multi-tier weight residency orchestration across HBM, DRAM, and NVMe, with scoring, guardrails, state transitions, and simulation-first evaluation.

Computational Biology · Explainability · Agent Systems

April 2026

MHC Atlas OS and the Case for Explainable Structure-Guided Prioritization

A technical essay on runtime-agnostic, policy-governed, structure-guided experimental prioritization using AlphaFold-derived data, explainable scoring, and decision memory.

Long Context · Memory Policy · AI Infrastructure

April 2026

Long-Context Inference Needs Better Memory Policy, Not Just More Memory

A technical essay on predictive context region orchestration, region-level attention and residency control, speculative promotion, reversible demotion, and coherence-aware long-context inference.

AI Systems · Memory Hierarchy · Bandwidth

April 2026

The Real Tax in AI Systems Is Moving Bytes

A technical essay on bandwidth amplification, repeated refill across the hierarchy, and why better AI machines need stronger movement discipline, not just more compute and memory.

AI Infrastructure · SRAM Residency · Patent-linked

April 4, 2026

Hardware-Enforced On-Chip Memory Residency for Neural Network Inference Accelerators

A deep technical essay on the bandwidth cost of repeatedly reloading hot weights during autoregressive inference, and why a wired on-chip residency primitive changes the machine rather than merely nudging the policy.

Local AI · Middleware · Open Source

April 2026

gemma4-wdc: A middleware layer that stops local agents from doing the same work twice

A laptop-first systems write-up on shared execution units, bounded admission windows, and why local multi-agent workflows waste more backend work than most people realize.

HBM · LLM Inference · Memory Policy

April 2, 2026

HBM Fragmentation Guard: Confidence-Gated Residency Control for AI Accelerators

A systems essay on why LRU is the wrong default for HBM residency under LLM serving pressure, and how confidence gating, thrash budgets, and safe-window compaction change allocator behavior.