Native ESPHome

Built-in ESPHome device runtime inside Tater for VoicePE, Sat1, and future native devices, with satellites, live entities, logs, stats, and the full voice pipeline on the main app port.

Native device runtime | Bundled | Settings -> ESPHome | Voice device runtime

Why it matters

What native ESPHome unlocks

Tater now owns the full ESPHome voice experience directly: discovery, room-aware voice sessions, live device state, and on-device playback all run inside the main app instead of a downloadable core.

Feature set

What makes the built-in ESPHome stack feel like a real voice system

  • Built into Tater itself, always on, and served from the main app port rather than a separate external voice service.
  • Settings -> ESPHome now owns Satellites, Settings, and Stats so operators can manage discovery, pairing, rooms, logs, live entities, and voice metrics in one place.
  • Shared speech backends live in Settings -> Models: Faster Whisper, Vosk, or Wyoming for STT; Wyoming, Kokoro, Pocket TTS, or Piper for TTS; and Home Assistant announcement TTS for announcement delivery.
  • Runtime model files auto-download into agent_lab/models/stt and agent_lab/models/tts so rebuilds do not require hand-seeding models.
  • Live entity views expose sensors plus writable controls such as switches, buttons, numbers, selects, lights, and RGB color when the device supports it.
  • Per-device logs, stats, room awareness, and direct playback make Tater Voice hardware feel local to the room instead of remote to the browser.
Operator controls

How operators use it in Tater

ESPHome is configured through Settings -> ESPHome for Satellites, Settings, and Stats, while shared STT/TTS model choices live in Settings -> Models. The runtime is built into the main Tater app rather than a separate downloadable core.

Voice experience

How native ESPHome makes Tater feel like a real device assistant.

These notes focus on the built-in ESPHome runtime, the live voice pipeline, shared speech backends, and the operator tools now living directly inside Tater.

Built in | One app | Main port

Native ESPHome runtime

ESPHome is no longer a shop core. It is part of the main Tater app and starts with Tater.

  • The old external voice runtime has been folded into Tater's built-in ESPHome runtime so the voice stack no longer depends on a separate downloadable core or its own HTTP listener.
  • That keeps the device lifecycle simpler: discovery, session handling, playback URLs, and operator screens now all live inside the same main application.
  • This built-in design also leaves room for future ESPHome device types beyond the current voice-pipeline hardware.
Models tab | STT | TTS

Voice pipeline and shared models

The live voice loop uses shared STT/TTS choices from the Models tab while keeping ESPHome-specific controls in one native screen.

  • STT can use Faster Whisper, Vosk, or Wyoming depending on the install and hardware, while TTS can use Wyoming, Kokoro, Pocket TTS, or Piper.
  • Announcement flows can still bridge to Home Assistant API TTS when that is the right delivery path, but device-local voice replies stay inside Tater's built-in runtime.
  • Because models auto-download into agent_lab/models, first-run setup is much smoother on fresh installs and bind-mounted Docker deployments.
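
To confirm that the auto-download actually populated the model folders, an operator could list what is on disk. The sketch below is only an illustration: the two relative paths come from the text above, while the base directory (the current directory by default) and the helper name list_model_files are assumptions, not part of Tater.

```python
from pathlib import Path

# Relative model roots named in the text above.
MODEL_ROOTS = ("agent_lab/models/stt", "agent_lab/models/tts")


def list_model_files(base_dir="."):
    """Return {root: sorted relative file paths} for each model root.

    base_dir is an assumption: point it at wherever the Tater checkout
    or bind-mounted volume lives on your machine.
    """
    found = {}
    for root in MODEL_ROOTS:
        root_path = Path(base_dir) / root
        if root_path.is_dir():
            found[root] = sorted(
                p.relative_to(root_path).as_posix()
                for p in root_path.rglob("*")
                if p.is_file()
            )
        else:
            # A missing folder is reported as empty rather than raising.
            found[root] = []
    return found


if __name__ == "__main__":
    for root, files in list_model_files().items():
        print(f"{root}: {len(files)} file(s)")
```

An empty list for a root simply means nothing has been downloaded there yet, or that base_dir points at the wrong place.
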
Satellites | Stats | Live logs

Runtime observability

The ESPHome screen now separates devices, settings, and stats so tuning is based on real behavior instead of guesswork.

  • Satellites shows discovered devices, saved room assignments, live entity state, device facts, and an ESPHome-style live log console.
  • Stats surfaces wake behavior, no-op rates, false wakes, backend latency, fallback usage, and per-device voice summaries for tuning.
  • Writable entity controls are available inline for things like switches, lights, numbers, buttons, and select options.
Experimental | Partial STT | Early-start TTS

Experimental voice features

Optional experimental toggles let operators test more aggressive voice behavior on hardware that can support it.

  • Experimental Partial STT can keep partial transcript state during live capture so the system gets earlier visibility into what the user is saying.
  • Experimental Early-Start TTS can begin speaking long replies sooner by preparing smaller response chunks before the whole answer is finished.
  • Experimental Live Tool Progress Speech lets Tater speak Hydra tool-progress lines during the thinking phase instead of waiting until the final answer.
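
The general idea behind starting speech early can be sketched as sentence-level chunking of a streamed reply. This is not Tater's implementation, which the text does not describe; the function name chunk_stream, the min_chars threshold, and the punctuation-based split rule are all assumptions chosen for illustration.

```python
import re

# A sentence boundary: whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_stream(fragments, min_chars=20):
    """Yield speakable chunks while text fragments are still arriving.

    A chunk is released at the first sentence boundary that leaves at
    least min_chars of text, so very short sentences are merged with
    the following one instead of being spoken on their own.
    """
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        while True:
            cut = next(
                (m for m in _SENTENCE_END.finditer(buffer)
                 if m.start() >= min_chars),
                None,
            )
            if cut is None:
                break
            yield buffer[:cut.start()].strip()
            buffer = buffer[cut.end():]
    # Flush whatever is left once the reply is complete.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded chunk could be handed to a TTS backend as soon as it appears, so playback of the first sentence can overlap with generation of the rest. A larger min_chars trades a later start for fewer, more natural-sounding chunks.
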
Built-in APIs

HTTP endpoints exposed by this runtime.

GET /api/settings/esphome/runtime

Load the native ESPHome runtime view used by Settings -> ESPHome.

Returns the current Satellites, Settings, and Stats payload so the WebUI can render discovery state, device cards, voice metrics, and runtime controls.
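
A minimal sketch of calling this endpoint from a script, using only the Python standard library. The host and port in BASE_URL are placeholders (the text only says the runtime is served on the main app port), and treating the response as JSON is an assumption.

```python
import json
import urllib.request

BASE_URL = "http://tater.local:8000"  # placeholder host and port, not from the docs
RUNTIME_PATH = "/api/settings/esphome/runtime"


def build_runtime_request(base_url=BASE_URL):
    """Build (but do not send) the GET request for the runtime view."""
    return urllib.request.Request(
        base_url.rstrip("/") + RUNTIME_PATH,
        headers={"Accept": "application/json"},
        method="GET",
    )


def fetch_runtime(base_url=BASE_URL, timeout=10):
    """Send the request and decode the body, assuming it is JSON."""
    request = build_runtime_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

If the install requires authentication, the relevant header or cookie would need to be added to the request; the text does not say which scheme applies.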

POST /api/settings/esphome/runtime/action

Run a native ESPHome runtime action from the WebUI.

Handles refresh, connect/disconnect, save/forget satellite actions, live log lifecycle, and direct entity-control actions from the ESPHome settings screen.
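
The request body for this endpoint is not documented here, so the following is only a sketch under stated assumptions: a JSON object with an "action" key plus free-form parameters. The action name "refresh", the parameter names, and the BASE_URL placeholder are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://tater.local:8000"  # placeholder host and port, not from the docs
ACTION_PATH = "/api/settings/esphome/runtime/action"


def build_action_request(action, base_url=BASE_URL, **params):
    """Build (but do not send) a JSON POST for one runtime action.

    The {"action": ..., **params} body shape is an assumption.
    """
    body = json.dumps({"action": action, **params}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + ACTION_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with urllib.request.urlopen(build_action_request("refresh")). Inspecting the calls the WebUI makes from the ESPHome settings screen is the reliable way to learn the real action names and fields.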

GET /tater-ha/v1/voice/native/status

Inspect current voice-pipeline runtime state and backend availability.

Returns selected speech backends, effective fallback state, model roots, discovery state, selector sessions, and availability of local STT/TTS backends.

POST /tater-ha/v1/voice/esphome/entities

Fetch live ESPHome entity rows for one connected satellite.

Returns the live entity snapshot so verbas and operators can inspect sensors, buttons, numbers, switches, lights, and other exposed device entities.

POST /tater-ha/v1/voice/esphome/entities/command

Command a writable ESPHome entity on one satellite.

Supports button, number, switch, select, text, and light-control actions so device-local flows can act directly on the speaking device.
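
A hedged sketch of building a command for this endpoint. Only the path and the list of supported entity kinds come from the text; the payload keys ("satellite", "kind", "entity", "value"), the example identifiers, and the BASE_URL placeholder are assumptions.

```python
import json
import urllib.request

BASE_URL = "http://tater.local:8000"  # placeholder host and port, not from the docs
COMMAND_PATH = "/tater-ha/v1/voice/esphome/entities/command"

# Entity kinds the text lists as commandable.
WRITABLE_KINDS = {"button", "number", "switch", "select", "text", "light"}


def build_entity_command(satellite, kind, entity, value=None, base_url=BASE_URL):
    """Build (but do not send) a JSON POST that commands one entity.

    All payload key names here are hypothetical.
    """
    if kind not in WRITABLE_KINDS:
        raise ValueError(f"unsupported entity kind: {kind!r}")
    payload = {"satellite": satellite, "kind": kind, "entity": entity}
    if value is not None:
        # Buttons take no value; switches, numbers, and selects do.
        payload["value"] = value
    return urllib.request.Request(
        base_url.rstrip("/") + COMMAND_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Rejecting unknown kinds on the client side keeps read-only entities such as sensors from being sent as commands by mistake. The entity names to use would come from the live snapshot returned by POST /tater-ha/v1/voice/esphome/entities.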

POST /tater-ha/v1/voice/esphome/play

Queue direct audio playback on a selected ESPHome satellite.

Used for device-local playback flows such as announcements, generated audio, and other responses that should play on the speaking satellite itself.