The real problem is not agent intelligence. It is agent operations.
There is now a growing class of coding systems that are no longer just browser tabs. They run through CLIs, require machine-local authentication, inspect real repositories, talk to local MCP servers, and often depend on shell-visible execution environments. That makes them powerful, but it also creates a management problem: how do you safely operate those runtimes across multiple machines without collapsing into ad hoc SSH sessions, arbitrary shell access, and inconsistent local setups?
Mobile Agent Control answers that with a clean systems split. The phone is not a shell. The phone is a control plane. The machine-side supervisor owns execution, local state, health visibility, and policy enforcement. Runtimes remain swappable. The result is a design that feels closer to infrastructure orchestration than to a thin remote-terminal wrapper.
1. Mobile-first operating model
The Android app is not treated as an afterthought. It is the primary operator console, with cross-machine running-agent visibility, machine cards, warning counts, last heartbeat, worker usage, and offline state.
2. Supervisor-owned execution
The FastAPI service is the real control core. It persists state, owns live coordination, mediates auth, emits websocket events, and decides what can and cannot be launched.
3. Runtime abstraction without pretending runtimes are identical
Gemini, Codex, Hermes, and future runtimes sit behind one contract, but their auth checks, CLI discovery, WSL handling, command surfaces, and launch semantics remain adapter-specific.
A layered control-plane architecture instead of a remote shell hack
| Layer | Responsibility | Why it matters |
|---|---|---|
| Android operator console | Launch agents, inspect machine health, track running agents, view logs and status, issue prompts or restart/stop actions. | Turns local agent execution into something operable from anywhere, without granting the phone arbitrary shell power. |
| Responsive web /admin console | Provides an alternate operator interface optimized for desktop and mobile browsers. | Useful for debugging, demos, and mixed-device workflows. |
| FastAPI supervisor | Owns auth, audit, lifecycle management, state transitions, queueing, websocket emissions, and machine-local execution orchestration. | This is the trust boundary and enforcement plane. |
| CLI runtime executor | Launches only approved launch profiles and delegates runtime behavior to adapters. | Prevents the architecture from devolving into “send arbitrary commands over the network.” |
| Adapter layer | Implements runtime-specific capabilities for Gemini CLI, Codex CLI, Hermes via WSL, and future integrations. | Preserves extensibility while keeping the top-level resource model stable. |
| Persistence + eventing | SQLite stores recent state while in-memory services coordinate live scheduling and websocket streaming. | Provides operational continuity across restarts without giving up simple runtime responsiveness. |
Safe launch profiles are the key design choice
The most important security-adjacent decision here is that the phone never sends arbitrary shell commands. Instead, the supervisor loads approved launch templates from `backend/config/launch_profiles.json` and uses those profiles to control what actually executes. That gives the system a constrained action vocabulary instead of unlimited remote command injection.
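The allowlist idea can be sketched in a few lines. The profile schema below is invented for illustration (the real shape of `backend/config/launch_profiles.json` is defined by the repository), but it shows the core property: clients name a profile and pass only whitelisted arguments, never a raw command line.

```python
import json

# Hypothetical profile shape; the repository defines the real schema.
PROFILES_JSON = """
{
  "gemini-default": {
    "runtime": "gemini",
    "command": ["gemini"],
    "allowed_args": ["--model"]
  },
  "codex-review": {
    "runtime": "codex",
    "command": ["codex", "exec"],
    "allowed_args": []
  }
}
"""

PROFILES = json.loads(PROFILES_JSON)

def build_launch_command(profile_id: str, extra_args: dict[str, str]) -> list[str]:
    """Resolve a launch request against the approved profile allowlist.

    The client may only name a profile and supply whitelisted arguments;
    it can never send a raw command line to the machine.
    """
    profile = PROFILES.get(profile_id)
    if profile is None:
        raise PermissionError(f"unknown launch profile: {profile_id}")
    cmd = list(profile["command"])
    for key, value in extra_args.items():
        if key not in profile["allowed_args"]:
            raise PermissionError(f"argument not allowed for {profile_id}: {key}")
        cmd += [key, value]
    return cmd
```

Anything outside the profile vocabulary fails closed, which is what keeps the supervisor from degenerating into a remote shell.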
The network and auth posture is equally conservative:

- Bearer-token auth at the application layer
- Tailscale-only private connectivity for pre-release deployment
- No public supervisor exposure
- No hardcoded LAN-only assumptions in product design
Persistent enough to survive restarts, live enough to operate in real time
The state strategy is pragmatic. Recent agents, tasks, audit entries, and supervisor snapshots persist in SQLite, while live scheduling, process coordination, and websocket state updates remain in memory. That split is technically sensible for an MVP because it avoids the complexity of a full distributed state machine while still recovering useful history after restart.
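The durable/ephemeral split can be sketched as follows. Table names and fields here are assumptions for illustration, not the supervisor's actual persistence layer.

```python
import sqlite3

class SupervisorState:
    """Durable history in SQLite, live coordination in memory (a sketch)."""

    def __init__(self, db_path: str = ":memory:"):
        # Durable: audit history survives supervisor restarts.
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS audit (ts REAL, actor TEXT, action TEXT)"
        )
        # Ephemeral: live process handles and websocket sessions are
        # rebuilt from scratch after a restart, never persisted.
        self.live_agents: dict[str, object] = {}

    def record_audit(self, ts: float, actor: str, action: str) -> None:
        self.db.execute("INSERT INTO audit VALUES (?, ?, ?)", (ts, actor, action))
        self.db.commit()

    def recent_audit(self, limit: int = 50) -> list[tuple]:
        cur = self.db.execute(
            "SELECT ts, actor, action FROM audit ORDER BY ts DESC LIMIT ?",
            (limit,),
        )
        return cur.fetchall()
```

After a crash, the supervisor reopens the database and regains its audit trail and recent snapshots; only the transient scheduling state must be rebuilt.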
SQLite-backed state + in-memory coordination

Vendor-neutral at the control plane, runtime-specific at the execution edge
One of the strongest technical traits of the project is that it does not confuse neutrality with uniformity. The control model is shared: machines, agents, tasks, logs, launch profiles, audit history. But the runtime edge remains intentionally adapter-aware. That is exactly the right design for this category.
GeminiCliAdapter
The flagship default path. Surfaces CLI version/auth status, workspace-scoped slash commands from real .gemini/commands directories, and MCP visibility from Gemini settings.
CodexCliAdapter
A parallel real adapter proving that the architecture is not pinned to one vendor. Same control plane, different runtime contract.
HermesCliAdapter
A WSL-backed adapter for Windows hosts. This is notable because it acknowledges the messy reality of heterogeneous local environments.
CopilotCliAdapter
Registered placeholder for future activation. It signals intended expansion without faking completeness today.
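The "one contract, adapter-specific behavior" pattern could look roughly like this. Method names and command shapes are hypothetical, not the repository's actual interface.

```python
from abc import ABC, abstractmethod

class RuntimeAdapter(ABC):
    """Illustrative shared contract the supervisor programs against."""

    runtime_id: str

    @abstractmethod
    def check_auth(self) -> bool:
        """Runtime-specific credential/CLI discovery check."""

    @abstractmethod
    def build_command(self, prompt: str) -> list[str]:
        """Runtime-specific launch semantics behind a shared signature."""

class GeminiCliAdapter(RuntimeAdapter):
    runtime_id = "gemini"

    def check_auth(self) -> bool:
        return True  # e.g. would inspect local Gemini CLI auth state

    def build_command(self, prompt: str) -> list[str]:
        return ["gemini", "-p", prompt]

class HermesCliAdapter(RuntimeAdapter):
    runtime_id = "hermes"

    def check_auth(self) -> bool:
        return True

    def build_command(self, prompt: str) -> list[str]:
        # WSL-backed on Windows hosts: same contract, different launch path.
        return ["wsl", "-e", "hermes", prompt]

ADAPTERS = {a.runtime_id: a for a in (GeminiCliAdapter(), HermesCliAdapter())}
```

The supervisor's resource model only ever sees `RuntimeAdapter`; the WSL weirdness stays inside `HermesCliAdapter`, which is exactly the containment the article describes.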
Streaming observability is built into the product surface
The websocket event model gives the system live machine and agent awareness. Event payloads can include `machine`, `machine_health`, `agent`, `agent_status`, `job`, `log`, `audit`, and general `message` events. On top of that, the UI layers surface warning counts, heartbeat freshness, last log time, elapsed time, and heuristic stuck detection.
This matters because many agent systems look usable in a demo but fail operationally the moment long-running or partially stuck jobs appear. The repo explicitly models those failure-adjacent states instead of pretending every agent is always either perfectly healthy or fully dead.
A practical supervisor API, not just a demo backend
The route design shows that the repository is thinking in resource-model terms rather than in one-off button handlers. There are endpoints for machine identity and health, runtime adapter introspection, workspaces, agents, events, metrics, logs, audit history, tasks, and websocket streaming. That gives the system room to evolve toward a cloud-brokered control plane later without throwing away the client abstraction.
It is quietly building an agent operations layer
Many projects in this area focus on model prompts, sandbox tricks, or front-end wrappers. This one focuses on something more durable: the operational substrate for terminal-native agents. That includes lifecycle management, constrained launch semantics, runtime heterogeneity, workspace discovery, health telemetry, audit trails, and the migration path from direct machine connectivity to a cloud-routed architecture.
The repo is disciplined about scope
Worker scaling and pause/resume semantics are explicitly deferred. That is actually a good sign. The design concentrates first on getting the core control-plane abstractions right, rather than pretending to solve full distributed agent scheduling in v1.
The path from tailnet-connected MVP to cloud control plane is already visible
The current deployment model uses Tailscale private connectivity between the Android client and each supervisor. But the repository is clear that this is temporary. The longer-term direction is a logical supervisor API behind a cloud control plane, with machine supervisors remaining as local orchestration agents and the mobile client authenticating against a cloud API rather than each machine directly.
- Private Tailscale addresses, app-level bearer tokens, direct registration per machine.
- The client targets logical resources instead of machine URLs.
- Commands route to local supervisors while resource semantics stay stable.
- Mobile UX preserved, security posture improved, multi-machine fleet management simplified.
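The routing step in that migration can be sketched as a broker-side lookup. The registry shapes and tailnet addresses below are assumptions for illustration only.

```python
# Hypothetical broker-side registries; in the MVP the phone holds the
# machine URLs itself, in the cloud model only the broker does.
SUPERVISORS = {
    "machine-a": "http://100.64.0.10:8000",  # tailnet address today
    "machine-b": "http://100.64.0.11:8000",
}

AGENT_HOMES = {"agent-1": "machine-a", "agent-2": "machine-b"}

def route(agent_id: str, action: str) -> str:
    """Resolve a logical agent reference to a machine-local supervisor call.

    The mobile client only ever names the logical agent id; the broker
    owns the machine mapping, so machines can move, renumber, or hide
    behind the cloud without any client change.
    """
    machine = AGENT_HOMES[agent_id]
    return f"{SUPERVISORS[machine]}/agents/{agent_id}/{action}"
```

Because the resource semantics (`agents/{id}/{action}`) stay stable across both deployment models, the client abstraction survives the migration intact.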
This repo matters because it treats coding agents like infrastructure, not toys
Mobile Agent Control is best understood as an operations architecture for terminal-native agents. Its contribution is not “yet another agent.” Its contribution is the control plane that makes real local runtimes usable across devices and machines. The design choices around supervisor ownership, launch-profile safety, runtime adapters, websocket observability, and cloud migration all point in the same direction: agent systems are becoming operational systems, and operational systems need disciplined control architecture.
If this project keeps expanding, the most compelling future directions will likely be richer fleet policy, stronger authn/authz layering, command governance, multi-tenant routing, and eventually deeper scheduling semantics for longer-running supervised tasks.