mic-system

Web application for monitoring and recording audio from 2 INMP441 (I2S) microphones on an RPi Zero 2W.

Architecture

Hardware:  2x INMP441 (I2S, 24-bit) → Google AIY Voice HAT → RPi Zero 2W (aarch64, 416MB RAM)
Backend:   Python 3.13, Flask + Flask-SocketIO (eventlet), 4 worker processes
Frontend:  Vanilla JS + Canvas (waveform), WebSocket (Socket.IO)
Port:      5000 (HTTP)
Service:   systemd mic_system.service (user: pch, auto-restart)
Network:   WiFi 10.0.100.24

File structure

File                             LOC   Role
app.py                           ~115  Flask server + WebSocket: REST API (status, recordings CRUD), Socket.IO (audio_data stream, commands)
audio_capture.py                 ~700  Core of the system, AudioEngine: I2S capture (sounddevice), resampling, DSP pipeline, routing to UI/recorder
agc.py                           ~110  Stateful AGC: envelope follower, noise gate, speech gating, limiter
beamforming.py                   ~45   Delay-and-sum beamforming: 2 microphones, fractional delay (linear interp)
recorder.py                      ~160  Threaded WAV writer: non-blocking queue, int16 conversion
static/app.js                    ~530  Frontend: waveforms (Canvas), VU meters, controls, live monitor (WebAudio), recordings list
templates/index.html             ~190  UI: Polish labels, audio controls
static/style.css                 ~315  Dark theme, glow effects, responsive
scripts/setup_rpi.sh             -     Sets up the I2S overlay, ALSA, venv, systemd
scripts/deploy_from_windows.ps1  -     Deployment from Windows over SSH/SCP
scripts/diag_ws_record.py        -     Diagnostic Socket.IO client: auto-record + JSON stats
deploy/mic_system.service        -     systemd unit file

DSP Pipeline (audio_capture.py)

I2S 48kHz stereo (int32)
  → convert to float32 (>>8 for 24-bit MSB in 32-bit frame)
  → [optional] HPF 75Hz + notch 50Hz (hum removal)
  → resample to target rate (16k/22k/24k/32k via scipy polyphase)
  → split: mic1, mic2
  → [optional] mono_mix = (mic1+mic2)/2
  → [optional] beamforming:
      - GCC-PHAT for angle estimation (speech band 300-3400Hz, ProcessPoolExecutor)
      - delay-and-sum with auto-tracking (smoothing 0.88/0.12)
      - beam clarity enhancement (high-freq blend 0.22)
      - presence boost (0.20)
  → [optional] noise suppression (spectral subtraction, alpha varies by gate state)
  → [optional] speech gate (VAD-based, hold 850ms, attack 12ms, release 360ms)
  → [optional] AGC (per-channel: mic1, mic2, beam — AgcProcessor instances)
  → [optional] limiter (peak clipping at 0.97)
  → downsample waveform for UI (every Nth sample)
  → encode PCM16 base64 for live monitor
  → emit via Socket.IO every ~80ms
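
The stateless stages of the pipeline above (sample conversion, limiter, UI decimation, monitor encoding) can be sketched as follows. This is a minimal illustration with NumPy, not the code in audio_capture.py. All function names are hypothetical, and the little-endian byte order and the 32767 scale factor for PCM16 are assumptions.

```python
import base64

import numpy as np


def i2s_to_float32(frames):
    """Convert int32 I2S frames (24-bit sample in the upper bits) to float32 in [-1, 1)."""
    samples24 = np.asarray(frames, dtype=np.int32) >> 8  # arithmetic shift keeps the sign
    return (samples24 / float(1 << 23)).astype(np.float32)


def limit(x, ceiling=0.97):
    """Hard limiter: clip peaks at +/- ceiling (0.97 as in the pipeline description)."""
    return np.clip(x, -ceiling, ceiling)


def decimate_for_ui(x, step):
    """Keep every step-th sample for the waveform display (no anti-alias filter)."""
    return x[::step]


def encode_pcm16_b64(x):
    """Encode float samples as little-endian PCM16 (assumed), then base64 for JSON transport."""
    pcm = (np.clip(x, -1.0, 1.0) * 32767.0).astype("<i2")
    return base64.b64encode(pcm.tobytes()).decode("ascii")
```

Plain decimation is adequate for drawing a waveform but not for audio that will be played back, which is why the sketch keeps it separate from the polyphase resampling step.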

Operating modes

  1. Mic1 / Mic2: single mic, mono
  2. Mono mix: (L+R)/2
  3. Beamforming: delay-and-sum with auto-steering (GCC-PHAT in the speech band)
  4. HiFi test: raw 48kHz, no DSP, single mic

Recording

  • Sources: mic1, mic2, mono_mix, beam, compare_all (3 files at once), hifi_raw
  • Auto-stop after a given duration (seconds)
  • Format: WAV 16-bit, mono
  • Threaded writer (WavRecorder): does not block the audio callback
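
The non-blocking writer pattern described above could look roughly like this. It is a standard-library sketch, not the actual WavRecorder: the class name QueueWavWriter, the drop-on-full policy, and the queue size are assumptions made for illustration.

```python
import queue
import struct
import threading
import wave


class QueueWavWriter:
    """Hypothetical stand-in for WavRecorder: 16-bit mono WAV written on a worker thread."""

    def __init__(self, path, sample_rate, max_chunks=256):
        self._queue = queue.Queue(maxsize=max_chunks)
        self._wav = wave.open(path, "wb")
        self._wav.setnchannels(1)
        self._wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        self._wav.setframerate(sample_rate)
        self.dropped = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def push(self, samples):
        """Called from the audio callback: never blocks; a chunk is dropped if the queue is full."""
        try:
            self._queue.put_nowait(list(samples))
        except queue.Full:
            self.dropped += 1

    def _run(self):
        while True:
            chunk = self._queue.get()
            if chunk is None:  # sentinel pushed by close()
                break
            ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in chunk]
            self._wav.writeframes(struct.pack("<%dh" % len(ints), *ints))

    def close(self):
        """Flush the remaining chunks, stop the worker, and finalize the WAV header."""
        self._queue.put(None)
        self._thread.join()
        self._wav.close()
```

Counting dropped chunks rather than blocking keeps the audio callback's timing intact, at the cost of a gap in the file if the disk stalls.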

WebSocket Protocol

Server → Client: audio_data (every ~80ms)

{
  "mic1": [float samples...],
  "mic2": [float samples...],
  "beam": [float samples...],
  "mono_mix": [float samples...],
  "rms_mic1": float,
  "rms_mic2": float,
  "rms_beam": float,
  "rms_mono_mix": float,
  "recording": bool,
  "rec_duration": float,
  "speech_detected": bool,
  "speech_gate_open": bool,
  "beam_angle_deg": float,
  "hifi_mode": bool,
  "monitor_on": bool,
  "monitor_source": str,
  "monitor_chunk_b64": str (PCM16 base64),
  "monitor_sr": int
}
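
A client that wants to play or analyze the live monitor stream has to undo the base64/PCM16 encoding. The sketch below shows one way to do that in Python; the function name is hypothetical and little-endian sample order is an assumption, since the payload description only says "PCM16 base64".

```python
import base64
import struct


def decode_monitor_chunk(payload):
    """Return (sample_rate, samples) with samples as floats in [-1, 1)."""
    raw = base64.b64decode(payload["monitor_chunk_b64"])
    count = len(raw) // 2  # 2 bytes per PCM16 sample
    ints = struct.unpack("<%dh" % count, raw[:count * 2])
    return payload["monitor_sr"], [i / 32768.0 for i in ints]
```

The sample rate is returned alongside the samples because monitor_sr follows the configured target rate and may differ between sessions.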

Client → Server: client_message

  • {"type": "settings", ...} — update all audio settings
  • {"type": "record_start", "source": str, "duration_sec": float} — start recording
  • {"type": "record_stop"} — stop recording

Server → Client: server_ack

  • {"type": "settings_applied", "settings": {...}}
  • {"type": "record_started", "filenames": [...], ...}
  • {"type": "record_stopped", "status": {...}}
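
As an illustration of the message flow above, the following sketch builds client_message payloads and sends one with the python-socketio client. The helper names are hypothetical, the source list is taken from the Recording section, and python-socketio is assumed to be installed and protocol-compatible with the server's Flask-SocketIO version.

```python
# Source names as listed in the Recording section.
RECORD_SOURCES = ("mic1", "mic2", "mono_mix", "beam", "compare_all", "hifi_raw")


def record_start(source, duration_sec):
    """Build a record_start message, rejecting unknown sources early."""
    if source not in RECORD_SOURCES:
        raise ValueError("unknown source: %r" % (source,))
    return {"type": "record_start", "source": source, "duration_sec": float(duration_sec)}


def record_stop():
    """Build a record_stop message."""
    return {"type": "record_stop"}


def record_via_socketio(url, source, duration_sec):
    """Start a recording, wait for auto-stop, and return the server_ack payloads seen."""
    import socketio  # python-socketio client (assumed installed)

    sio = socketio.Client()
    acks = []
    sio.on("server_ack", acks.append)  # collect every acknowledgement
    sio.connect(url)
    sio.emit("client_message", record_start(source, duration_sec))
    sio.sleep(duration_sec + 1.0)  # leave time for the auto-stop
    sio.disconnect()
    return acks
```

For example, `record_via_socketio("http://127.0.0.1:5000", "compare_all", 10)` would be expected to return at least a record_started acknowledgement if the server behaves as described.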

REST API

  • GET / — main page (index.html)
  • GET /api/status — audio engine status + settings
  • GET /api/recordings — list recordings (JSON array)
  • GET /api/recordings/<filename> — download WAV
  • DELETE /api/recordings/<filename> — delete recording
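
The endpoints above can be exercised from a script with only the standard library. This is a sketch under assumptions: the helper names are hypothetical, and the exact shape of the JSON returned by /api/recordings is not specified here, so it is passed through unchanged.

```python
import json
import urllib.parse
import urllib.request


def recording_url(base_url, filename):
    """URL of a single recording, with the filename percent-encoded."""
    return "%s/api/recordings/%s" % (base_url.rstrip("/"), urllib.parse.quote(filename, safe=""))


def list_recordings(base_url):
    """GET /api/recordings and return the decoded JSON array."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/recordings", timeout=5) as resp:
        return json.load(resp)


def download_recording(base_url, filename, dest_path):
    """GET one WAV file and save it to dest_path."""
    with urllib.request.urlopen(recording_url(base_url, filename), timeout=30) as resp:
        with open(dest_path, "wb") as out:
            out.write(resp.read())


def delete_recording(base_url, filename):
    """DELETE one recording and return the HTTP status code."""
    req = urllib.request.Request(recording_url(base_url, filename), method="DELETE")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```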

Hardware parameters

  • Microphones: INMP441, 24-bit, I2S, arranged on a ⌀6cm circle (slots every 90°)
  • Mic spacing: ~0.0424m (adjacent slots, sin(π/4) × 0.06)
  • Hardware sample rate: 48000 Hz (fixed, Voice HAT)
  • Voice HAT overlay: googlevoicehat-soundcard
  • ALSA config: ~/.asoundrc → hw:0, S32_LE, 48kHz, 2ch
  • RPi Zero 2W: 4 cores @ 1GHz, ~51% CPU for the main process
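
The spacing figure follows from the chord length between two slots 90° apart on a 6cm circle, and it bounds the inter-mic delay the beamformer has to handle. The sketch below works through that arithmetic. The speed of sound (343 m/s), the helper names, and the broadside angle convention are assumptions, not values taken from the code.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 °C)
CIRCLE_DIAMETER_M = 0.06


def chord_spacing(diameter_m, slot_angle_rad):
    """Straight-line distance between two points on a circle, slot_angle_rad apart."""
    return diameter_m * math.sin(slot_angle_rad / 2.0)


def max_delay_samples(spacing_m, sample_rate, c=SPEED_OF_SOUND):
    """Largest possible inter-mic delay (sound arriving along the mic axis), in samples."""
    return spacing_m / c * sample_rate


def delay_to_angle_deg(delay_s, spacing_m, c=SPEED_OF_SOUND):
    """Far-field arrival angle measured from broadside (assumed convention)."""
    # Clamp because a noisy delay estimate can exceed the physical maximum.
    ratio = max(-1.0, min(1.0, c * delay_s / spacing_m))
    return math.degrees(math.asin(ratio))


spacing = chord_spacing(CIRCLE_DIAMETER_M, math.pi / 2)  # adjacent slots, 90° apart
```

With these assumptions the maximum delay is just under 6 samples at 48 kHz and just under 2 samples at 16 kHz, which is why a fractional (sub-sample) delay is needed for steering at the lower target rates.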

Development

# On the RPi
cd /home/pch/mic_system
source .venv/bin/activate
python app.py

# From Windows
.\scripts\deploy_from_windows.ps1 -Host 10.0.100.24 -User pch

# Diagnostics
python scripts/diag_ws_record.py --url http://127.0.0.1:5000 --duration 10 --source compare_all

Git

Repo:   https://git.mm.mk/suby/mic-system.git
Branch: master