Writings
Long-form writing across AI infrastructure, memory systems, local agent runtimes, and accelerator architecture. This section is where project ideas and patent-adjacent concepts get room to breathe as essays rather than just landing-page summaries.
April 2026
A systems essay on how Vera Rubin NVL72 changes AI data center cooling with 45°C supply temperatures, fan-free trays, hose-free design, and liquid-cooled busbars.
April 2026
A systems essay on the five-layer direct liquid cooling architecture of Blackwell GB300 NVL72 racks and the suppliers behind each thermal layer.
April 2026
A systems essay on where thermal control actually belongs inside vLLM, from scheduler decisions and swap behavior to tensor-parallel batch cutting.
April 2026
A technical guide to detecting silent HBM thermal throttling on H100 and H200 clusters when standard GPU temperature dashboards look deceptively healthy.
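To make the failure mode concrete, here is a minimal sketch (not the essay's detector) that polls NVML through the nvidia-ml-py bindings and flags the telltale signature: SM clocks sagging under a hardware thermal slowdown while the core GPU temperature still reads cool. The 0.9 clock ratio and 80 °C threshold are illustrative assumptions.

```python
# Illustrative sketch, not the essay's detector: flag likely memory-side
# throttling when SM clocks sag under a hardware thermal slowdown while
# the core GPU temperature still looks healthy. Requires nvidia-ml-py.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
max_sm = pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_SM)

for _ in range(60):  # one-minute sampling window (assumed, not prescribed)
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    sm = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
    reasons = pynvml.nvmlDeviceGetCurrentClocksThrottleReasons(handle)
    hw_thermal = bool(reasons & pynvml.nvmlClocksThrottleReasonHwThermalSlowdown)
    if hw_thermal and sm < 0.9 * max_sm and temp < 80:
        print(f"suspect memory-side throttle: SM {sm} MHz at only {temp} C")
    time.sleep(1.0)

pynvml.nvmlShutdown()
```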
April 2026
A systems essay on HBM thermal telemetry, KV fetch stalls, and why hot memory dies turn thermal debt into an inference scheduling problem.
April 2026
A systems essay on why co-packaged optics matters: it makes disaggregated memory and expert movement schedulable under tight latency budgets.
April 2026
A technical essay on thermal-safe KV admission, HBM backpressure, reuse prediction, and production-serving policy design for H100 and H200 inference clusters.
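For flavor, a toy admission gate that couples HBM headroom, thermal headroom, and predicted reuse. The field names and thresholds are my illustration, not the essay's policy.

```python
# Toy thermal-safe KV admission gate; names and thresholds are
# illustrative assumptions, not the essay's production policy.
from dataclasses import dataclass

@dataclass
class GpuState:
    hbm_free_frac: float       # fraction of HBM currently free
    thermal_headroom_c: float  # degrees C below the throttle trip point

def admit_kv(state: GpuState, predicted_reuse: float) -> bool:
    """Admit new KV blocks when memory and thermal backpressure are low,
    or when predicted reuse is high enough to spend scarce headroom."""
    if state.hbm_free_frac > 0.15 and state.thermal_headroom_c > 5.0:
        return True
    # Under pressure, only sequences we expect to reuse heavily get in.
    return predicted_reuse > 0.8 and state.thermal_headroom_c > 2.0

print(admit_kv(GpuState(0.10, 3.0), predicted_reuse=0.9))  # True: reuse earns it
```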
April 2026
A technical primer mapping the AI optical networking stack across fiber, lasers, transceivers, DSPs, switches, and test infrastructure through the companies building each layer.
April 2026
A systems essay on attention sink tokens, structural KV cache waste, and why long-context serving needs memory-policy-aware treatment of hot-but-low-utility tokens.
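A minimal sketch of one such treatment, assuming a paged KV layout: pin the handful of sink tokens at the head of the sequence, since they are cheap to keep and catastrophic to drop, and let policy act on everything else.

```python
# Sketch of sink-aware residency under an assumed paged-KV layout:
# the first few tokens act as attention sinks, so their blocks stay
# pinned and only the rest are eligible for demotion or eviction.
SINK_TOKENS = 4  # illustrative; the right count is model-dependent

def evictable_blocks(block_ids: list[int], tokens_per_block: int) -> list[int]:
    pinned = -(-SINK_TOKENS // tokens_per_block)  # ceil: blocks holding sinks
    return block_ids[pinned:]

# 8 blocks of 16 tokens: block 0 holds the sinks, blocks 1-7 are evictable.
print(evictable_blocks(list(range(8)), tokens_per_block=16))
```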
April 2026
A systems essay on how dense GPU racks accumulate thermal debt, why point-in-time observability misses it, and what thermally aware control planes should measure.
April 2026
A detailed product and architecture essay on TechDemoForge, its workflow, repo structure, and why local-first technical demo generation is useful.
April 2026
A systems essay on why prefill and decode should be split across different hardware pools and why the real engineering challenge becomes memory orchestration.
April 2026
A systems essay on draft/verify KV pressure, rollback fragmentation, and why speculation pays off only when memory policy is designed for it.
April 2026
A systems essay on sparse attention serving, hierarchical KV residency, predictive prefetch, and why long-context wins increasingly come from memory policy.
April 2026
A systems-first essay on why the next optical contest in AI infrastructure is shifting inward toward dense rack-scale and scale-up fabrics.
April 2026
A deeper systems-first essay on why AI networking will be built from a layered materials stack rather than a single optical winner.
April 2026
A deeper systems essay on why the next constraint is the power, heat, and topology cost of moving bits across clusters.
April 2026
A systems essay on how optics is moving into the scheduler, topology planner, reliability model, and power logic of next-generation AI infrastructure.
April 2026
A practical hardware-focused companion essay on optical architectures, power budgets, serviceability, and the real tradeoffs hyperscalers optimize for.
April 2026
A systems essay on explicit data movement, KV cache management, tiered memory placement, DMA orchestration, and why scheduler quality now directly determines inference efficiency.
April 2026
A long-form systems argument for selective coherency, explicit tensor movement, CXL.mem over universal coherence, and schedule-first design in large-model infrastructure.
April 2026
A systems primer on HBM as a bounded working set, KV cache dominance, PagedAttention, weight offload, and why scheduled weight streaming is the next step in inference architecture.
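The arithmetic that makes KV dominance obvious fits in a few lines; the numbers below are round, illustrative parameters for a 70B-class model with grouped-query attention, not figures from the essay.

```python
# Back-of-envelope KV sizing with illustrative 70B-class parameters.
layers, kv_heads, head_dim, dtype_bytes = 80, 8, 128, 2  # fp16, GQA
per_token = 2 * layers * kv_heads * head_dim * dtype_bytes  # K and V
print(per_token)  # 327,680 bytes: roughly 320 KiB of KV per token

batch, seq_len = 64, 8192
print(per_token * batch * seq_len / 2**30)  # 160 GiB: twice an H100's HBM
```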
April 2026
A systems essay on 48-bit virtual addressing, mmap-heavy designs, storage density, 5-level paging, and why explicit data orchestration becomes the long-term answer.
April 2026
A systems essay on compiler-emitted memory intent, object semantics, workload phases, reuse confidence, and why hardware orchestration needs structured plans instead of blind guesses.
April 2026
A technical essay on explicit memory intent, residency maps, regret-aware eviction, recomputation-vs-transfer arbitration, atomic doorbells, and DPU embodiments for AI memory fabrics.
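To show the shape of recomputation-vs-transfer arbitration, a toy cost model; the link speed, FLOP counts, and function name are illustrative assumptions, not the essay's embodiment.

```python
# Toy recomputation-vs-transfer arbitration; the cost model and all
# parameters are illustrative assumptions, not the essay's design.
def cheapest_restore(bytes_needed: int,
                     link_gbps: float,
                     recompute_flops: float,
                     effective_tflops: float) -> str:
    """Pick the cheaper way to restore an evicted KV region: pull it
    back over the fabric, or recompute it from the original prompt."""
    transfer_ms = bytes_needed * 8 / (link_gbps * 1e9) * 1e3
    recompute_ms = recompute_flops / (effective_tflops * 1e12) * 1e3
    return "transfer" if transfer_ms <= recompute_ms else "recompute"

# 2 GiB of KV over a 100 Gb/s link (~172 ms) vs ~50 TFLOP of prefill
# recompute at 400 TFLOP/s (~125 ms): here, recompute wins.
print(cheapest_restore(2 * 2**30, 100, 50e12, 400))
```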
April 2026
A systems essay on why software-only memory orchestration hits a ceiling and why the real future is hardware-resident movement control near the fabric, the tiers, and the accelerators.
April 2026
A technical essay on memory placement, movement, residency, reuse, admission, and eviction as first-class scheduling decisions rather than passive implementation details.
April 2026
A technical essay on why local direct-storage acceleration and network direct-memory acceleration still do not add up to one universal GPU-native end-to-end storage fabric.
April 2026
A technical essay on why eliminating host-side bounce buffers shifts the real bottleneck inward, toward deterministic HBM↔SRAM orchestration inside the accelerator.
April 2026
A technical essay on hidden staging buffers, GPUDirect-era dataflow, and why eliminating unnecessary copies matters across storage, network, memory, and accelerator paths.
April 2026
A detailed technical essay on RDMA, zero-copy realities, and why "RDMA exists" still does not mean true end-to-end zero-copy in disaggregated inference.
April 2026
A technical essay on policy above transport for KV movement, workload-aware admissibility, swappable glue layers, Scenarios E and F, and experiment-backed results.
April 2026
A technical essay on gray failures, checkpoint economics, cooling-compute seams, and seam-aware control planes for modern AI cluster reliability.
April 2026
A technical essay on seam failures across facilities, fabrics, heterogeneous inference pools, checkpoint economics, and why component dashboards miss the most expensive cluster incidents.
April 2026
A technical explainer on authenticated power contracts, alternative execution plans, safe switching boundaries, and runtime enforcement for edge inference systems.
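A compact sketch of the enforcement idea, assuming an HMAC-signed contract and a device-provisioned key; both the scheme and the field names are my illustration, not the essay's exact protocol.

```python
# Illustrative authenticated power-contract check; the HMAC scheme,
# key handling, and field names are assumptions, not the essay's protocol.
import hmac, hashlib, json

SECRET = b"device-provisioned-key"  # assumed shared at enrollment

def sign_contract(contract: dict) -> str:
    body = json.dumps(contract, sort_keys=True).encode()
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def runtime_allows(contract: dict, sig: str, requested_watts: float) -> bool:
    """Runtime enforcement: the plan must fit the authenticated cap."""
    if not hmac.compare_digest(sign_contract(contract), sig):
        return False  # tampered or unsigned contract
    return requested_watts <= contract["max_watts"]

c = {"max_watts": 15.0, "expires": "2026-05-01"}
print(runtime_allows(c, sign_contract(c), 12.0))  # True
```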
April 2026
A technical deep dive into compiler-scheduled DMA, explicit fences, bounded SRAM, and why deterministic buffer legality changes the inference memory system.
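The scheduling shape is easy to show even without real hardware. In this sketch the DMA is simulated with plain copies; the point is that the next transfer is issued before computing on the current tile, and that buffer legality is established by the schedule rather than discovered by a cache.

```python
# Double-buffered "DMA" schedule, simulated with plain copies.
# Two bounded SRAM buffers; transfers overlap compute by construction.
def run_pipeline(hbm_tiles):
    sram = [None, None]               # bounded on-chip buffers
    sram[0] = list(hbm_tiles[0])      # issue + fence for tile 0
    out = []
    for i in range(len(hbm_tiles)):
        cur, nxt = i % 2, (i + 1) % 2
        if i + 1 < len(hbm_tiles):
            sram[nxt] = list(hbm_tiles[i + 1])  # prefetch tile i+1, overlapped
        # Fence point: sram[cur] is legal to read here by construction.
        out.append(sum(sram[cur]))              # "compute" on the resident tile
    return out

print(run_pipeline([[1, 2], [3, 4], [5, 6]]))   # [3, 7, 11]
```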
April 2026
A technical essay on latency-aware policy selection across model variants, DMA strategy, memory residency, and accelerator performance-state control on edge systems.
April 2026
A technical essay on HBM pressure, predictive multi-tier placement, precision state transitions, MoE router-history signals, and bandwidth-aware runtime scheduling.
April 2026
A systems essay on Android-first operations, FastAPI supervision, runtime adapters, WebSocket telemetry, and safe control of local coding agents across machines.
March 2026
An introduction to ChromeLens, deterministic CDP tracing, interactive flow profiling, and the hydration penalty behind complex modern web applications.
April 2026
A detailed technical essay on larger L2 caches in AI systems, miss-rate reduction, average access time, bandwidth relief, energy tradeoffs, and where bigger L2 stops being enough.
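The core of the miss-rate argument is the textbook average-memory-access-time model; the numbers below are illustrative, not measurements from the essay.

```python
# Textbook AMAT model: hit time plus miss rate times miss penalty.
# The parameter values are illustrative, not measured.
def amat(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    return hit_ns + miss_rate * miss_penalty_ns

small_l2 = amat(hit_ns=4.0, miss_rate=0.30, miss_penalty_ns=200.0)  # 64.0 ns
big_l2 = amat(hit_ns=6.0, miss_rate=0.12, miss_penalty_ns=200.0)    # 30.0 ns
print(small_l2, big_l2)  # bigger L2: slower hits, far fewer trips to DRAM
```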
April 2026
A detailed technical essay on AI-native residency fabrics, class-aware memory, weights, KV cache, experts, and why bigger generic caches are not the full answer.
April 2026
A technical essay on residency-first decode acceleration, distributed on-package SRAM, SRAM/HBM crossover behavior, and why autoregressive inference is fundamentally memory-bound.
April 2026
A practical systems essay on rolling-window low-utilization metrics, sampled idle behavior, power telemetry, and what point-in-time GPU utilization misses.
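A minimal version of the rolling-window metric; the window length and idle threshold are illustrative assumptions.

```python
# Rolling-window low-utilization detector; window and threshold are
# illustrative assumptions, not the essay's recommended values.
from collections import deque

class RollingIdleDetector:
    def __init__(self, window: int = 300, idle_util: float = 10.0):
        self.samples = deque(maxlen=window)  # e.g. one sample per second
        self.idle_util = idle_util

    def add(self, gpu_util_pct: float) -> None:
        self.samples.append(gpu_util_pct)

    def idle_fraction(self) -> float:
        """Fraction of the window below the idle threshold -- visible to
        a rolling metric, invisible to a point-in-time gauge."""
        if not self.samples:
            return 0.0
        return sum(1 for u in self.samples if u < self.idle_util) / len(self.samples)

d = RollingIdleDetector(window=10)
for u in [95, 2, 1, 0, 97, 3, 2, 96, 1, 0]:
    d.add(u)
print(d.idle_fraction())  # 0.7: mostly idle despite busy-looking spot samples
```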
April 2026
A technical essay on privacy-first local portfolio tooling, Schwab CSV analysis, options workflows, IV crush scenarios, and practical risk reporting without platform theater.
April 2026
A technical essay on dynamic multi-tier weight residency orchestration across HBM, DRAM, and NVMe, with scoring, guardrails, state transitions, and simulation-first evaluation.
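To make the scoring idea concrete, a toy score-and-place pass; the weights, thresholds, and signals are illustrative stand-ins, not the essay's model.

```python
# Toy multi-tier placement: score each weight shard on access rate,
# recency, and hit density, then map score bands to HBM / DRAM / NVMe.
# All weights and thresholds here are illustrative assumptions.
def residency_score(hits_per_s: float, size_mb: float, idle_s: float) -> float:
    recency = 1.0 / (1.0 + idle_s)          # decays as the shard sits unused
    density = hits_per_s / max(size_mb, 1)  # favors small, hot shards
    return hits_per_s * recency + 10.0 * density

def place(score: float) -> str:
    if score >= 50.0:
        return "HBM"
    if score >= 1.0:
        return "DRAM"
    return "NVMe"

for name, hps, mb, idle in [("hot_expert", 80, 512, 0.1),
                            ("warm_mlp", 10, 1024, 5.0),
                            ("cold_head", 0.2, 256, 300.0)]:
    s = residency_score(hps, mb, idle)
    print(f"{name}: score {s:.1f} -> {place(s)}")
```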
April 2026
A technical essay on runtime-agnostic, policy-governed, structure-guided experimental prioritization using AlphaFold-derived data, explainable scoring, and decision memory.
April 2026
A technical essay on predictive context region orchestration, region-level attention and residency control, speculative promotion, reversible demotion, and coherence-aware long-context inference.
April 2026
A technical essay on bandwidth amplification, repeated refill across the hierarchy, and why better AI machines need stronger movement discipline, not just more compute and memory.
April 4, 2026
A deep technical essay on the bandwidth cost of repeatedly reloading hot weights during autoregressive inference, and why a wired on-chip residency primitive changes the machine rather than merely nudging the policy.
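The premise reduces to one line of arithmetic: at batch 1, every generated token re-reads every live weight byte, so decode throughput is capped by HBM bandwidth. Round numbers below (fp16 weights, H100-class bandwidth), my illustration rather than the essay's figures.

```python
# Decode-bandwidth ceiling: each token re-reads every live weight byte.
params, bytes_per_param = 70e9, 2           # 70B model in fp16
bytes_per_token = params * bytes_per_param  # 140 GB pulled per token
hbm_bw = 3.35e12                            # H100 SXM, ~3.35 TB/s
print(hbm_bw / bytes_per_token)             # ~24 tokens/s ceiling at batch 1
```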
April 2026
A laptop-first systems write-up on shared execution units, bounded admission windows, and why local multi-agent workflows waste more backend work than most people realize.
April 2, 2026
A systems essay on why LRU is the wrong default for HBM residency under LLM serving pressure, and how confidence gating, thrash budgets, and safe-window compaction change allocator behavior.
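A toy version of the gating logic; the interface and thresholds are my illustration, not the allocator's actual API.

```python
# Toy confidence-gated eviction with a per-second thrash budget.
# Interface and thresholds are illustrative, not the real allocator's.
import time

class GatedEvictor:
    def __init__(self, thrash_budget_per_s: int = 4, reuse_gate: float = 0.7):
        self.thrash_budget_per_s = thrash_budget_per_s
        self.reuse_gate = reuse_gate
        self._recent: list[float] = []  # timestamps of recent evictions

    def may_evict(self, reuse_confidence: float) -> bool:
        """Unlike LRU, refuse to evict blocks we confidently expect back,
        and refuse to churn faster than the thrash budget allows."""
        now = time.monotonic()
        self._recent = [t for t in self._recent if now - t < 1.0]
        if reuse_confidence >= self.reuse_gate:
            return False  # confident reuse: eviction would just thrash
        if len(self._recent) >= self.thrash_budget_per_s:
            return False  # out of thrash budget this second
        self._recent.append(now)
        return True

ev = GatedEvictor()
print(ev.may_evict(0.9), ev.may_evict(0.1))  # False True
```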