# Transcription DevTools

Includes:
* __JiWER__ CLI NodeJS wrapper
* Benchmark tool to test multiple transcription engines
* TypeScript classes to evaluate the word error rate of files generated by the transcription

## Build

```sh
npm run build
```

## Benchmark

A benchmark of the available __transcribers__ can be run with:

```sh
npm run benchmark
```

```
┌────────────────────────┬───────────────────────┬───────────────────────┬──────────┬────────┬───────────────────────┐
│        (index)         │          WER          │          CER          │ duration │ model  │        engine         │
├────────────────────────┼───────────────────────┼───────────────────────┼──────────┼────────┼───────────────────────┤
│ 5yZGBYqojXe7nuhq1TuHvz │ '28.39506172839506%'  │  '9.62457337883959%'  │  '41s'   │ 'tiny' │   'openai-whisper'    │
│ x6qREJ2AkTU4e5YmvfivQN │ '29.75206611570248%'  │ '10.46195652173913%'  │  '15s'   │ 'tiny' │ 'whisper-ctranslate2' │
└────────────────────────┴───────────────────────┴───────────────────────┴──────────┴────────┴───────────────────────┘
```

The benchmark may be run with multiple built-in model sizes:

```sh
MODELS=tiny,small,large npm run benchmark
```

## Jiwer

> *JiWER is a python tool for computing the word-error-rate of ASR systems.*
> https://jitsi.github.io/jiwer/cli/

__JiWER__ serves as a reference implementation to calculate error rates between two text files:
- WER (Word Error Rate)
- CER (Character Error Rate)

### Usage

```typescript
const jiwerCLI = new JiwerClI('./reference.txt', './hypothesis.txt')

// WER as a percentage, e.g. 0.03 -> 3%
console.log(await jiwerCLI.wer())

// CER as a percentage, e.g. 0.01 -> 1%
console.log(await jiwerCLI.cer())

// Detailed comparison report
console.log(await jiwerCLI.alignment())
```

## Resources
- https://jitsi.github.io/jiwer/
- https://github.com/rapidfuzz/RapidFuzz
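
## WER and CER by hand

To make the two metrics reported above concrete, here is a minimal sketch of how they are conventionally defined: the edit distance (substitutions, deletions and insertions) between reference and hypothesis, divided by the length of the reference, counted in words for WER and in characters for CER. The function names `editDistance`, `wordErrorRate` and `characterErrorRate` are hypothetical and are not part of this package or of JiWER. The sketch applies no normalization beyond splitting on whitespace, so it may not match JiWER's output on every input.

```typescript
// Levenshtein distance between two token sequences.
// Substitutions, deletions and insertions each cost 1.
function editDistance<T> (reference: T[], hypothesis: T[]): number {
  // previous[j]: distance between the reference prefix processed so far
  // and the first j hypothesis tokens (initially the empty reference prefix)
  let previous: number[] = Array.from({ length: hypothesis.length + 1 }, (_, j) => j)

  for (let i = 1; i <= reference.length; i++) {
    const current: number[] = [ i ]

    for (let j = 1; j <= hypothesis.length; j++) {
      const substitution = previous[j - 1] + (reference[i - 1] === hypothesis[j - 1] ? 0 : 1)
      const deletion = previous[j] + 1
      const insertion = current[j - 1] + 1

      current[j] = Math.min(substitution, deletion, insertion)
    }

    previous = current
  }

  return previous[hypothesis.length]
}

// WER: word-level edit distance divided by the number of reference words.
// Assumes a non-empty reference (an empty one would divide by zero).
function wordErrorRate (reference: string, hypothesis: string): number {
  const words = (text: string) => text.trim().split(/\s+/).filter(word => word.length > 0)
  const referenceWords = words(reference)

  return editDistance(referenceWords, words(hypothesis)) / referenceWords.length
}

// CER: character-level edit distance divided by the number of reference characters.
// Assumes a non-empty reference.
function characterErrorRate (reference: string, hypothesis: string): number {
  const referenceChars = Array.from(reference)

  return editDistance(referenceChars, Array.from(hypothesis)) / referenceChars.length
}

// One substituted word out of four reference words
console.log(wordErrorRate('the quick brown fox', 'the quick brown dog')) // 0.25
```

Because the denominator is the reference length, a hypothesis with many inserted words can push either rate above 1 (that is, above 100%).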