Jhonatan-de-Souza/Offline-Media-Transcriber

Audio Transcriber

Convert any audio or video file to text—completely offline and private. Choose between CPU-optimized (recommended) or GPU-accelerated depending on your hardware.

(Screenshot: CPU-Based Transcriber main window)

Why Audio Transcriber?

Works offline — No cloud uploads, no account needed
Fast and accurate — Powered by Parakeet V3 (CPU) and Whisper (GPU) models
Free and open — No subscriptions or usage limits
Cross-platform — Windows, macOS, and Linux supported
Choose your tool — CPU-based is significantly faster for single files; GPU-based is better for batch processing


⚡ Quick Comparison

| Feature | CPU-Based | GPU-Based |
|---|---|---|
| Best for | Single files, laptops | Batch processing |
| Engine | Parakeet V3 (ONNX) | Whisper AI (PyTorch) |
| Speed | 13.8x real-time | 2.91x real-time |
| Setup | Simple | Requires CUDA setup |
| GPU needed | No | Yes (NVIDIA) |
| Model size | ~671MB | ~1GB |

Test Results (1:14 min, 1MB audio file):

  • CPU: 5.36 seconds (13.8x real-time, 0.07 RTF)
  • GPU: 25.45 seconds (2.91x real-time, 0.34 RTF)
  • CPU is 4.75x faster for single files

→ Recommended: Start with CPU-based for single transcriptions. Use GPU for batch processing 50+ files.
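The speed figures above follow directly from the benchmark arithmetic: the real-time factor (RTF) is processing time divided by audio duration, and the speed multiplier is its inverse. A quick sanity check in Python:

```python
# Reproduce the benchmark arithmetic from the test results above
# (1:14 of audio = 74 seconds).
AUDIO_DURATION = 74.0  # seconds

def rtf(elapsed: float, duration: float = AUDIO_DURATION) -> float:
    """Real-time factor: processing time per second of audio (lower is faster)."""
    return elapsed / duration

def speed(elapsed: float, duration: float = AUDIO_DURATION) -> float:
    """Speed multiplier: seconds of audio processed per second of wall time."""
    return duration / elapsed

print(f"CPU: {speed(5.36):.1f}x real-time, RTF {rtf(5.36):.2f}")    # ~13.8x, 0.07
print(f"GPU: {speed(25.45):.2f}x real-time, RTF {rtf(25.45):.2f}")  # ~2.91x, 0.34
print(f"CPU speedup over GPU: {25.45 / 5.36:.2f}x")                 # ~4.75x
```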


🚀 CPU-Based Transcriber (Recommended)

This is the easiest option. It works on any computer without requiring a GPU.

Installation

Step 1: Navigate to the CPU folder

cd "CPU Based Audio Transcriber"

Step 2: Create a virtual environment

macOS / Linux:

python3 -m venv venv
source venv/bin/activate

Windows (PowerShell):

python -m venv venv
venv\Scripts\Activate.ps1

Windows (Command Prompt):

python -m venv venv
venv\Scripts\activate.bat

Step 3: Install dependencies

pip install -r requirements.txt

Step 4 (Optional): Install FFmpeg for video support

macOS:

brew install ffmpeg

Ubuntu / Debian:

sudo apt-get install ffmpeg

Windows:

choco install ffmpeg

Quick Start

Launch the app:

python app.py

On first run:

  1. Download splash screen — App automatically downloads AI models from Hugging Face (~671MB)
  2. Progress indicator — Shows download status
  3. Main window — Ready to transcribe

Subsequent runs:

  1. No download splash — Models are already cached locally
  2. Loading screen — App loads the model into memory
  3. Main window — Ready to transcribe

Then:

  1. Click Browse and select an audio or video file
  2. Click Transcribe
  3. Watch the countdown timer
  4. Results appear automatically

Supported File Types

Audio: MP3, WAV, M4A, FLAC, OGG, AIFF, AU, and more
Video: MP4, MKV, AVI, MOV, WEBM, FLV, WMV, etc. (auto-converted)
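The auto-conversion step for video files is handled by video_converter.py; a plausible sketch is shelling out to FFmpeg to strip the video stream and emit mono 16 kHz WAV, the format speech models typically expect. The flags and function names below are assumptions for illustration, not the project's exact code:

```python
# Hypothetical sketch of the video-to-audio step. The real logic lives in
# video_converter.py; the ffmpeg flags here are an assumption.
import subprocess
from pathlib import Path

def build_ffmpeg_cmd(video: str, wav_out: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes
    16 kHz mono WAV."""
    return [
        "ffmpeg", "-y",      # overwrite output without prompting
        "-i", video,         # input video (MP4, MKV, MOV, ...)
        "-vn",               # drop the video stream
        "-ac", "1",          # mono
        "-ar", "16000",      # 16 kHz sample rate
        wav_out,
    ]

def convert(video: str) -> Path:
    """Convert a video file to WAV next to the original and return its path."""
    wav = Path(video).with_suffix(".wav")
    subprocess.run(build_ffmpeg_cmd(video, str(wav)), check=True)
    return wav
```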

How It Works

  • Smart time estimation — App measures your CPU performance on startup
  • Accurate countdown — Timer shows real remaining time, not guesses
  • Responsive UI — Never freezes; cancel anytime
  • No language selection — Model auto-detects what you're speaking
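The "never freezes" behavior is the classic worker-thread pattern: run the slow transcription call off the main thread and let Cancel set a flag the worker respects. A minimal sketch of the idea (illustrative only; app.py's actual implementation may differ):

```python
import queue
import threading

def transcribe_in_background(path: str, do_work, cancel: threading.Event,
                             results: queue.Queue) -> threading.Thread:
    """Run `do_work(path)` on a worker thread so the UI stays responsive.
    `do_work` is a placeholder for the real transcription call; if `cancel`
    is set before the work starts, no result is published."""
    def worker():
        if not cancel.is_set():
            results.put(do_work(path))
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t

# Usage: the UI thread keeps running; the Cancel button just sets the event.
cancel = threading.Event()
out: queue.Queue = queue.Queue()
t = transcribe_in_background("talk.mp3", lambda p: f"transcript of {p}", cancel, out)
t.join()
print(out.get())  # "transcript of talk.mp3"
```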

What Happens Behind the Scenes

1. Select file
  ↓
2. App measures CPU speed (5-second benchmark)
  ↓
3. Calculate: estimated_time = file_duration × your_cpu_speed
  ↓
4. Transcribe with accurate countdown
  ↓
5. Show results
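The estimation step above reduces to simple arithmetic once the startup benchmark has produced a per-machine real-time factor. A sketch of the idea (the real logic lives in performance_profiler.py; the names and toy workload here are illustrative):

```python
import time

def benchmark_rtf(work=lambda: sum(i * i for i in range(200_000))) -> float:
    """Toy stand-in for the startup benchmark: time a fixed workload.
    The real profiler times actual model inference on known audio; here we
    pretend the workload represents 1 second of audio, so the elapsed time
    IS the real-time factor."""
    start = time.perf_counter()
    work()
    return time.perf_counter() - start

def estimate_seconds(file_duration_s: float, rtf: float) -> float:
    """estimated_time = file_duration x measured RTF."""
    return file_duration_s * rtf

# Example with the numbers from the test results: 74 s of audio at RTF 0.07
print(f"{estimate_seconds(74, 0.07):.1f}s expected")  # ~5.2s
```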

Project Files

CPU Based Audio Transcriber/
├── app.py                      # Main interface (start here)
├── config.py                   # Model auto-download configuration
├── download_splash.py          # Download progress splash screen
├── model_manager.py            # Parakeet V3 model loader
├── transcription_service.py    # Transcription engine
├── performance_profiler.py     # CPU benchmarking
├── video_converter.py          # Video-to-audio conversion
├── audio_handler.py            # Audio file loading
├── requirements.txt            # Dependencies list
│
├── models/                     # ONNX model files (auto-downloaded on first run)
│   ├── encoder.int8.onnx
│   ├── decoder.int8.onnx
│   ├── joiner.int8.onnx
│   └── tokens.txt
│
└── assets/                     # UI resources

Note: Model files are automatically downloaded from Hugging Face on first run (~671MB). Subsequent runs use the cached models.
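Skipping the download on subsequent runs can be as simple as checking that every expected model file already exists on disk. A sketch of that check (config.py and model_manager.py may implement it differently):

```python
from pathlib import Path

# Filenames listed in the project tree above.
EXPECTED_FILES = ["encoder.int8.onnx", "decoder.int8.onnx",
                  "joiner.int8.onnx", "tokens.txt"]

def models_cached(model_dir: str) -> bool:
    """True if every expected model file is already on disk, i.e. the
    ~671MB Hugging Face download can be skipped."""
    d = Path(model_dir)
    return all((d / name).is_file() for name in EXPECTED_FILES)
```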

System Requirements

  • Python: 3.14+
  • RAM: 2GB minimum (4GB recommended)
  • Disk: ~700MB for models
  • OS: Windows, macOS (Intel/Apple Silicon), Linux

Dependencies

  • sherpa-onnx>=1.9.0 — ONNX runtime for Parakeet V3
  • soundfile>=0.12.1 — Audio file reading
  • customtkinter>=5.0.0 — Modern graphical interface
  • numpy>=1.21.0 — Audio processing

🎮 GPU-Based Transcriber

Use this if you have an NVIDIA GPU for batch processing.

(Screenshots: GPU-Based Transcriber, before and after transcription)

Installation

Step 1: Navigate to the GPU folder

cd "GPU Based Audio Transcriber"

Step 2: Create a virtual environment

macOS / Linux:

python3 -m venv venv
source venv/bin/activate

Windows (PowerShell):

python -m venv venv
venv\Scripts\Activate.ps1

Step 3: Install dependencies

pip install -r requirements.txt

Step 4: Install PyTorch with CUDA support

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126

Step 5: Verify GPU detection

python -c "import torch; print('GPU available:', torch.cuda.is_available())"

If True appears, your GPU is ready. If False, see GPU troubleshooting.

Quick Start

python main.py

On first run:

  1. Download splash screen — App automatically downloads Whisper AI model from Hugging Face (~1GB)
  2. Progress indicator — Shows download status
  3. Splash screen — Model loads into memory
  4. Main window — Two columns for single/batch transcription

Subsequent runs:

  1. No download — Models are cached, app loads model directly
  2. Splash screen — Model loads into memory
  3. Main window — Ready to transcribe

Choose Single File Transcription or Batch Processing:

Single File:

  1. Click Browse File
  2. Select language
  3. Click Transcribe

Batch Processing:

  1. Click Select Folder
  2. Choose output folder
  3. Click Start Batch

Features

  • 🌐 Multi-language — Whisper handles dozens of languages; select the language per file
  • 📁 Batch processing — Transcribe entire folders automatically
  • 💾 Auto-save — Results saved to text files
  • ⏱️ Real-time feedback — Monitor transcription progress
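The auto-save feature amounts to mirroring each input filename with a .txt extension in the chosen output folder, matching the batch example later in this README (meeting-notes.mp4 → meeting-notes.txt). A sketch of that naming (illustrative; main.py's exact scheme is an assumption):

```python
from pathlib import Path

def output_path(media_file: str, out_dir: str) -> Path:
    """Map an input media file to its transcript path:
    clips/meeting-notes.mp4 -> <out_dir>/meeting-notes.txt"""
    return Path(out_dir) / (Path(media_file).stem + ".txt")

print(output_path("clips/meeting-notes.mp4", "transcripts"))
```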

Dependencies

  • torch>=2.0.0 — PyTorch with CUDA support
  • openai-whisper>=20230314 — Whisper speech recognition
  • pydub>=0.25.0 — Audio handling
  • PyQt5>=5.0.0 — Professional interface

Note: AI models are automatically downloaded from Hugging Face on first run (~1GB). Subsequent runs use the cached models.


📖 Detailed Usage Guide

CPU Version: Full Workflow

Example: Transcribe a podcast episode

1. Start app: python app.py

2. App benchmarks your CPU (watch progress):
  "⏳ Starting CPU benchmark..."
  "✓ CPU benchmark complete (RTF: 1.2x)"

3. Click "Browse" → Select "podcast.mp4"

4. Click "Transcribe"
  Status: "⏱️ 15.2s remaining"
  (countdown updates every 0.5 seconds)

5. Wait for transcription
  Status: "✓ Ready"

6. Results appear in text box
  (Select all + copy, or save to file)

GPU Version: Batch Processing

Example: Transcribe 10 MP4 files

1. Start app: python main.py
  (Splash screen shows model loading progress)

2. Click "Select Folder with Files"
  → Choose folder with 10 MP4s

3. Shows "✓ 10 file(s) found"

4. Click "Start Batch"
  → Choose output folder

5. Watch real-time progress:
  "Processing 3/10: meeting-notes.mp4"
  "⏱️ Est. Time: 2:45"

6. Results saved:
  ✓ meeting-notes.txt
  ✓ Q1-summary.txt
  ...

📝 License

MIT — Free to use, modify, and distribute


❓ FAQ

Q: Can I use this without internet?
A: Yes! After you run the app once (models download), you can work completely offline.

Q: Is there a file size limit?
A: No. Very large files will just take longer to transcribe.

Q: Does it work on Mac with Apple Silicon?
A: Yes, the CPU version works on Apple Silicon (M1/M2/M3).

Q: Can I batch transcribe with CPU version?
A: Not in the UI, but you can run multiple instances.

Q: Can I improve transcription accuracy?
A: Use clear audio and minimize background noise for best results.


🤝 Contributing

Found a bug? Want to improve something?

  1. Check existing GitHub issues
  2. Create a new issue with clear description
  3. Submit a pull request with improvements
