The real problem is not agent intelligence. It is agent operations.
There is now a growing class of coding systems that are no longer just browser tabs. They run through CLIs, require machine-local authentication, inspect real repositories, talk to local MCP servers, and often depend on shell-visible execution environments. That makes them powerful, but it also creates a management problem: how do you safely operate those runtimes across multiple machines without collapsing into ad hoc SSH sessions, arbitrary shell access, and inconsistent local setups?
Mobile Agent Control answers that with a clean systems split. The phone is not a shell. The phone is a control plane. The machine-side supervisor owns execution, local state, health visibility, and policy enforcement. Runtimes remain swappable. The result is a design that feels closer to infrastructure orchestration than to a thin remote-terminal wrapper.
1. Mobile-first operating model
The Android app is not treated as an afterthought. It is the primary operator console, with cross-machine running-agent visibility, machine cards, warning counts, last heartbeat, worker usage, and offline state.
2. Supervisor-owned execution
The FastAPI service is the real control core. It persists state, owns live coordination, mediates auth, emits websocket events, and decides what can and cannot be launched.
3. Runtime abstraction without pretending runtimes are identical
Gemini, Codex, Hermes, and future runtimes sit behind one contract, but their auth checks, CLI discovery, WSL handling, command surfaces, and launch semantics remain adapter-specific.
A layered control-plane architecture instead of a remote shell hack
| Layer | Responsibility | Why it matters |
|---|---|---|
| Android operator console | Launch agents, inspect machine health, track running agents, view logs and status, issue prompts or restart/stop actions. | Turns local agent execution into something operable from anywhere, without granting the phone arbitrary shell power. |
| Responsive web /admin console | Provides an alternate operator interface optimized for desktop and mobile browsers. | Useful for debugging, demos, and mixed-device workflows. |
| FastAPI supervisor | Owns auth, audit, lifecycle management, state transitions, queueing, websocket emissions, and machine-local execution orchestration. | This is the trust boundary and enforcement plane. |
| CLI runtime executor | Launches only approved launch profiles and delegates runtime behavior to adapters. | Prevents the architecture from devolving into “send arbitrary commands over the network.” |
| Adapter layer | Implements runtime-specific capabilities for Gemini CLI, Codex CLI, Hermes via WSL, and future integrations. | Preserves extensibility while keeping the top-level resource model stable. |
| Persistence + eventing | SQLite stores recent state while in-memory services coordinate live scheduling and websocket streaming. | Provides operational continuity across restarts without giving up simple runtime responsiveness. |
Safe launch profiles are the key design choice
The most important security-adjacent decision here is that the phone never sends arbitrary shell commands. Instead, the supervisor loads approved launch templates from `backend/config/launch_profiles.json` and uses those profiles to control what actually executes. That gives the system a constrained action vocabulary instead of unlimited remote command injection.
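The allowlist idea can be sketched in a few lines. The profile schema below is invented for illustration (the real shape of `backend/config/launch_profiles.json` is defined by the repository), but it shows the core property: clients name a profile and pass only whitelisted arguments, never a raw command line.

```python
import json

# Hypothetical profile shape; the repository defines the real schema.
PROFILES_JSON = """
{
  "gemini-default": {
    "runtime": "gemini",
    "command": ["gemini"],
    "allowed_args": ["--model"]
  },
  "codex-review": {
    "runtime": "codex",
    "command": ["codex", "exec"],
    "allowed_args": []
  }
}
"""

PROFILES = json.loads(PROFILES_JSON)

def build_launch_command(profile_id: str, extra_args: dict[str, str]) -> list[str]:
    """Resolve a launch request against the approved profile allowlist.

    The client may only name a profile and supply whitelisted arguments;
    it can never send a raw command line to the machine.
    """
    profile = PROFILES.get(profile_id)
    if profile is None:
        raise PermissionError(f"unknown launch profile: {profile_id}")
    cmd = list(profile["command"])
    for key, value in extra_args.items():
        if key not in profile["allowed_args"]:
            raise PermissionError(f"argument not allowed for {profile_id}: {key}")
        cmd += [key, value]
    return cmd
```

Anything outside the profile vocabulary fails closed, which is what keeps the supervisor from degenerating into a remote shell.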
The network and auth posture is equally conservative:

- Bearer-token auth at the application layer
- Tailscale-only private connectivity for pre-release deployment
- No public supervisor exposure
- No hardcoded LAN-only assumptions in product design
Persistent enough to survive restarts, live enough to operate in real time
The state strategy is pragmatic. Recent agents, tasks, audit entries, and supervisor snapshots persist in SQLite, while live scheduling, process coordination, and websocket state updates remain in memory. That split is technically sensible for an MVP because it avoids the complexity of a full distributed state machine while still recovering useful history after restart.
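The durable/ephemeral split can be sketched as follows. Table names and fields here are assumptions for illustration, not the supervisor's actual persistence layer.

```python
import sqlite3

class SupervisorState:
    """Durable history in SQLite, live coordination in memory (a sketch)."""

    def __init__(self, db_path: str = ":memory:"):
        # Durable: audit history survives supervisor restarts.
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS audit (ts REAL, actor TEXT, action TEXT)"
        )
        # Ephemeral: live process handles and websocket sessions are
        # rebuilt from scratch after a restart, never persisted.
        self.live_agents: dict[str, object] = {}

    def record_audit(self, ts: float, actor: str, action: str) -> None:
        self.db.execute("INSERT INTO audit VALUES (?, ?, ?)", (ts, actor, action))
        self.db.commit()

    def recent_audit(self, limit: int = 50) -> list[tuple]:
        cur = self.db.execute(
            "SELECT ts, actor, action FROM audit ORDER BY ts DESC LIMIT ?",
            (limit,),
        )
        return cur.fetchall()
```

After a crash, the supervisor reopens the database and regains its audit trail and recent snapshots; only the transient scheduling state must be rebuilt.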
SQLite-backed state + in-memory coordination

Vendor-neutral at the control plane, runtime-specific at the execution edge
One of the strongest technical traits of the project is that it does not confuse neutrality with uniformity. The control model is shared: machines, agents, tasks, logs, launch profiles, audit history. But the runtime edge remains intentionally adapter-aware. That is exactly the right design for this category.
GeminiCliAdapter
The flagship default path. Surfaces CLI version/auth status, workspace-scoped slash commands from real .gemini/commands directories, and MCP visibility from Gemini settings.
CodexCliAdapter
A parallel real adapter proving that the architecture is not pinned to one vendor. Same control plane, different runtime contract.
HermesCliAdapter
A WSL-backed adapter for Windows hosts. This is notable because it acknowledges the messy reality of heterogeneous local environments.
CopilotCliAdapter
Registered placeholder for future activation. It signals intended expansion without faking completeness today.
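The "one contract, adapter-specific behavior" pattern could look roughly like this. Method names and command shapes are hypothetical, not the repository's actual interface.

```python
from abc import ABC, abstractmethod

class RuntimeAdapter(ABC):
    """Illustrative shared contract the supervisor programs against."""

    runtime_id: str

    @abstractmethod
    def check_auth(self) -> bool:
        """Runtime-specific credential/CLI discovery check."""

    @abstractmethod
    def build_command(self, prompt: str) -> list[str]:
        """Runtime-specific launch semantics behind a shared signature."""

class GeminiCliAdapter(RuntimeAdapter):
    runtime_id = "gemini"

    def check_auth(self) -> bool:
        return True  # e.g. would inspect local Gemini CLI auth state

    def build_command(self, prompt: str) -> list[str]:
        return ["gemini", "-p", prompt]

class HermesCliAdapter(RuntimeAdapter):
    runtime_id = "hermes"

    def check_auth(self) -> bool:
        return True

    def build_command(self, prompt: str) -> list[str]:
        # WSL-backed on Windows hosts: same contract, different launch path.
        return ["wsl", "-e", "hermes", prompt]

ADAPTERS = {a.runtime_id: a for a in (GeminiCliAdapter(), HermesCliAdapter())}
```

The supervisor's resource model only ever sees `RuntimeAdapter`; the WSL weirdness stays inside `HermesCliAdapter`, which is exactly the containment the article describes.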
Streaming observability is built into the product surface
The websocket event model gives the system live machine and agent awareness. Event payloads can include `machine`, `machine_health`, `agent`, `agent_status`, `job`, `log`, `audit`, and general `message` events. On top of that, the UI layers surface warning counts, heartbeat freshness, last log time, elapsed time, and heuristic stuck detection.
This matters because many agent systems look usable in a demo but fail operationally the moment long-running or partially stuck jobs appear. The repo explicitly models those failure-adjacent states instead of pretending every agent is always either perfectly healthy or fully dead.
A practical supervisor API, not just a demo backend
The route design shows that the repository is thinking in resource-model terms rather than in one-off button handlers. There are endpoints for machine identity and health, runtime adapter introspection, workspaces, agents, events, metrics, logs, audit history, tasks, and websocket streaming. That gives the system room to evolve toward a cloud-brokered control plane later without throwing away the client abstraction.
It is quietly building an agent operations layer
Many projects in this area focus on model prompts, sandbox tricks, or front-end wrappers. This one focuses on something more durable: the operational substrate for terminal-native agents. That includes lifecycle management, constrained launch semantics, runtime heterogeneity, workspace discovery, health telemetry, audit trails, and the migration path from direct machine connectivity to a cloud-routed architecture.
The repo is disciplined about scope
Worker scaling and pause/resume semantics are explicitly deferred. That is actually a good sign. The design concentrates first on getting the core control-plane abstractions right, rather than pretending to solve full distributed agent scheduling in v1.
The path from tailnet-connected MVP to cloud control plane is already visible
The current deployment model uses Tailscale private connectivity between the Android client and each supervisor. But the repository is clear that this is temporary. The longer-term direction is a logical supervisor API behind a cloud control plane, with machine supervisors remaining as local orchestration agents and the mobile client authenticating against a cloud API rather than each machine directly.
- Private Tailscale addresses, app-level bearer tokens, direct registration per machine.
- The client targets logical resources instead of machine URLs.
- Commands route to local supervisors while resource semantics stay stable.
- Mobile UX preserved, security posture improved, multi-machine fleet management simplified.
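The routing step in that migration can be sketched as a broker-side lookup. The registry shapes and tailnet addresses below are assumptions for illustration only.

```python
# Hypothetical broker-side registries; in the MVP the phone holds the
# machine URLs itself, in the cloud model only the broker does.
SUPERVISORS = {
    "machine-a": "http://100.64.0.10:8000",  # tailnet address today
    "machine-b": "http://100.64.0.11:8000",
}

AGENT_HOMES = {"agent-1": "machine-a", "agent-2": "machine-b"}

def route(agent_id: str, action: str) -> str:
    """Resolve a logical agent reference to a machine-local supervisor call.

    The mobile client only ever names the logical agent id; the broker
    owns the machine mapping, so machines can move, renumber, or hide
    behind the cloud without any client change.
    """
    machine = AGENT_HOMES[agent_id]
    return f"{SUPERVISORS[machine]}/agents/{agent_id}/{action}"
```

Because the resource semantics (`agents/{id}/{action}`) stay stable across both deployment models, the client abstraction survives the migration intact.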
This repo matters because it treats coding agents like infrastructure, not toys
Mobile Agent Control is best understood as an operations architecture for terminal-native agents. Its contribution is not “yet another agent.” Its contribution is the control plane that makes real local runtimes usable across devices and machines. The design choices around supervisor ownership, launch-profile safety, runtime adapters, websocket observability, and cloud migration all point in the same direction: agent systems are becoming operational systems, and operational systems need disciplined control architecture.
If this project keeps expanding, the most compelling future directions will likely be richer fleet policy, stronger authn/authz layering, command governance, multi-tenant routing, and eventually deeper scheduling semantics for longer-running supervised tasks.