Overview:
Handy is a free, open-source, and extensible speech-to-text desktop application that operates entirely offline. It is designed for users who need simple, private transcription without sending audio data to the cloud. By pressing a configurable keyboard shortcut, users can speak and have their words automatically typed into any active text field. The application is cross-platform, supporting Windows, macOS, and Linux, and is built to be a simple, forkable tool for the community.
Core Features:
Local Transcription Models: Supports several Whisper models (Small/Medium/Turbo/Large) with GPU acceleration, as well as the CPU-optimized Parakeet V3 model, which detects the spoken language automatically.
Offline Operation: All audio processing and transcription happens on the user's computer, with no data sent to external servers.
Configurable Shortcuts: Users can set a global keyboard shortcut to start and stop recording, including a push-to-talk mode.
Voice Activity Detection (VAD): Silence is filtered out automatically using Silero VAD, which improves transcription accuracy.
Raycast Integration: On macOS, users can control Handy (start/stop recording, browse history, manage dictionary, switch models and languages) directly from Raycast.
Custom Whisper Model Support: Users can manually place custom Whisper GGML models into the models directory, which Handy will auto-discover and list as "Custom Models."
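The auto-discovery of custom models described above can be pictured as a simple directory scan. The following is a minimal sketch in Rust, not Handy's actual implementation: the function name discover_custom_models is hypothetical, and the rule that a custom model is any regular file ending in .bin is an assumption based on how Whisper GGML model files are commonly named.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of Handy's real code base): returns the
/// `.bin` files found directly inside `models_dir`, sorted by path.
fn discover_custom_models(models_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(models_dir)? {
        let path = entry?.path();
        // Assumption: custom GGML models are regular files with a `.bin`
        // extension (compared case-insensitively). Subdirectories are skipped.
        let is_bin = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map(|ext| ext.eq_ignore_ascii_case("bin"))
            .unwrap_or(false);
        if path.is_file() && is_bin {
            found.push(path);
        }
    }
    // Sort so the resulting list is stable across runs and platforms.
    found.sort();
    Ok(found)
}
```

A caller would pass the path of the models directory and show the returned file names under a "Custom Models" heading. The scan is deliberately non-recursive here, so only files placed directly in the directory are picked up; whether Handy behaves the same way is not stated in this overview.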
Use Cases:
Developers: Use as a foundation to build upon or create custom forks with modified features and models.
Users needing private dictation: Transcribe notes, emails, or documents into any application without relying on cloud services.
Users in restricted network environments: Manually install models for use behind a proxy or firewall where automatic downloads are blocked.
System administrators on Linux: Configure global keyboard shortcuts through their desktop environment (GNOME, KDE Plasma, Sway, Hyprland) to integrate Handy into established workflows.
Why It Matters:
Handy addresses a specific gap in the open-source ecosystem by providing a simple, single-purpose speech-to-text tool that is fully offline and privacy-focused. Its use of Tauri and Rust provides a performant cross-platform foundation. The project emphasizes forkability and extensibility, allowing developers to customize and contribute. Its support for multiple local models, including the CPU-optimized Parakeet V3, makes it accessible without high-end hardware, while the manual model installation feature ensures it works in constrained environments.




