Handy

Free, open-source speech-to-text app that runs locally on your computer. No cloud dependency, complete privacy, simple push-to-talk functionality.

At a Glance:

Handy is an open-source, offline desktop speech-to-text tool that transcribes voice locally using Whisper or Parakeet models and pastes text into any application, available across macOS, Windows, and Linux.

Overview:

Handy is a free, cross-platform desktop application for privacy-focused speech transcription that works completely offline. It allows users to press a configurable keyboard shortcut, speak, and have transcribed text pasted directly into whatever application they are using—without sending audio to the cloud. Built as a Tauri app combining a React frontend with a Rust backend, Handy performs local transcription using either Whisper models with GPU acceleration or the CPU-optimized Parakeet V3 model. The tool is designed to be simple and extensible, positioning itself as a forkable foundation for speech-to-text tooling rather than a feature-packed product.

Key Decision Points:

Fully offline operation: All voice processing happens locally using Silero VAD and local Whisper or Parakeet models—no audio leaves the machine.
Cross-platform desktop support: Runs on macOS (Intel and Apple Silicon), Windows (x64), and Linux (x64). On Wayland, global shortcuts must be configured manually through the desktop environment or window manager.
Model flexibility with trade-offs: Users can choose between GPU-accelerated Whisper models and the CPU-optimized Parakeet V3, which offers automatic language detection and works on mid-range hardware.
Designed for extensibility over polish: The project explicitly prioritizes being a forkable, well-patterned codebase that others can build upon, rather than aiming to be the most feature-complete speech-to-text app.
Raycast integration available: Users on macOS can control Handy—start/stop recording, browse history, switch models—through a Raycast extension maintained by a community contributor.

Core Features:

Global keyboard shortcut transcription: Press a configurable shortcut to start and stop recording; speech is processed locally and the resulting text is pasted into the active text field.
Push-to-talk and toggle modes: Supports both press-and-hold recording and toggle on/off transcription with optional post-processing.
Multiple speech recognition backends: Offers Whisper models (Small/Medium/Turbo/Large) with GPU acceleration and the CPU-optimized Parakeet V3 model with automatic language detection.
Raycast integration: Provides remote control via Raycast for starting/stopping recording, browsing transcript history, managing the dictionary, and switching models and languages.
Debug mode and remote CLI control: Includes an advanced debug panel and supports CLI flags for controlling a running instance, enabling integration with system-level shortcuts and automation on Wayland.
Custom model support: Auto-discovers custom Whisper GGML models placed in the models directory, allowing use of fine-tuned or community models.
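
The custom model auto-discovery described above can be pictured as a directory scan. The following is a minimal sketch, not Handy's actual implementation (which lives in its Rust backend). The function name discover_models, the ".bin" suffix, and the use of the file stem as the model name are all assumptions made for illustration.

```python
from pathlib import Path


def discover_models(models_dir: Path, suffix: str = ".bin") -> list[str]:
    """Return the sorted names (file stems) of candidate model files.

    Hypothetical helper: the ".bin" default is an assumed extension for
    GGML model files, not something confirmed by the listing above.
    """
    if not models_dir.is_dir():
        return []  # A missing directory simply means no custom models.
    return sorted(
        path.stem
        for path in models_dir.iterdir()
        # Skip subdirectories and files with other extensions.
        if path.is_file() and path.suffix.lower() == suffix
    )
```

Under these assumptions, calling discover_models on a folder holding ggml-custom.bin and notes.txt would list only the first file, so a fine-tuned model could appear in a model picker without any extra registration step.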

Use Cases:

Users who need offline speech transcription: Works entirely on-device, making it suitable for environments with no internet access or where cloud transcription is not desired.
Developers wanting a forkable speech-to-text foundation: The codebase is intentionally structured as a well-patterned Tauri + Rust + React stack for others to build custom transcription solutions upon.
Desktop users who transcribe across multiple applications: Press a shortcut, speak, and have text pasted into any app—text editors, messaging clients, search fields, or form inputs.
Linux users comfortable with system-level configuration: Wayland requires manual shortcut setup through desktop environment settings or window manager config files, and X11 users need xdotool or dotool for reliable text input.
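
To make the Linux setup above concrete, here is a hedged sketch of how a helper script might choose a text-input tool for the current session. It is not taken from Handy's code. The function names and the preference table are hypothetical, and listing dotool for Wayland sessions is an assumption; only the tool names xdotool and dotool come from the text above. XDG_SESSION_TYPE is the standard environment variable that reports the session type on most Linux desktops.

```python
import os
import shutil
from typing import Callable, Optional

# Assumed preference order per session type (illustrative only).
PREFERRED_TOOLS = {
    "x11": ["xdotool", "dotool"],
    "wayland": ["dotool"],
}


def current_session_type() -> str:
    """Read the session type from the environment ("" if unset)."""
    return os.environ.get("XDG_SESSION_TYPE", "")


def pick_typing_tool(
    session_type: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first preferred tool found on PATH, or None."""
    for tool in PREFERRED_TOOLS.get(session_type.lower(), []):
        if which(tool) is not None:
            return tool
    return None
```

A setup script could call pick_typing_tool(current_session_type()) and print an install hint when the result is None. The lookup function is injectable so the logic can be exercised without the tools being installed.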

Open-Source Alternative Value:

Handy provides a fully open-source speech transcription application that runs entirely offline on macOS, Windows, and Linux. Built as a Tauri application with a Rust backend, the codebase is accessible and intentionally structured for forking, modification, and extension. The project uses local Whisper and Parakeet models for transcription with Silero-based voice activity detection, and its architecture combines React for the settings UI with Rust for system integration and ML inference. Handy also exposes CLI parameters for remote control, enabling integration with custom shortcuts and scripts across different desktop environments. The maintainers explicitly position the project as a foundation for others to build upon rather than a polished end-user product.
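
As an illustration of the script integration mentioned above, the sketch below wraps a call to a running instance from Python. It is written under stated assumptions: the binary name "handy" and the flag "--toggle-transcription" used in the note after the code are not confirmed by this listing and should be checked against the project's own documentation, and both helper functions are hypothetical.

```python
import subprocess
from typing import Callable, List


def build_remote_command(binary: str, flag: str) -> List[str]:
    """Build the argument list for a single remote-control flag."""
    if not flag.startswith("--"):
        raise ValueError(f"expected a long option such as --some-flag, got {flag!r}")
    return [binary, flag]


def send_remote_command(
    binary: str,
    flag: str,
    run: Callable = subprocess.run,
) -> int:
    """Invoke the binary with the flag and return its exit code.

    The runner is injectable so the wrapper can be tested without
    launching a real process.
    """
    completed = run(build_remote_command(binary, flag), check=False)
    return completed.returncode
```

A window manager keybinding or automation script could then call something like send_remote_command("handy", "--toggle-transcription"), with the flag name replaced by whatever the installed version actually accepts.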
