chore: fiddling around some more
chore: add ctranslate2 and timestamped
chore: add performance markers
chore: refactor test
chore: change workflow name
chore: ensure Python3
chore(duration): convert to chai/mocha syntax
chore(transcription): add individual tests for other transcribers
chore(transcription): implement format tests for all implementations
Also compare the results of the other implementations to the reference implementation
chore(transcription): add more test cases with other languages, model sizes and a local model
chore(test): wip ctranslate 2 adapat
chore(transcription): wip transcript file and benchmark
chore(test): clean a bit
chore(test): clean a bit
chore(test): refactor timestamped spec
chore(test): update workflow
chore(test): fix glob expansion with sh
chore(test): extract some hw info
chore(test): fix async tests
chore(benchmark): add model info
feat(transcription): allow use of a local model in timestamped-whisper
feat(transcription): extract run and profiling info into their own value object
feat(transcription): extract run concept into its own class and run more benchmarks
chore(transcription): simplify run object (only a UUID is now needed) and add more benchmark scenarios
docs(transcription): create own package README
docs(transcription): add local model usage
docs(transcription): update README
fix(transcription): use fr video for better comparison
chore(transcription): make openai comparison pass
docs(timestamped): clea
chore(transcription): change transcribers' transcribe method signature
Introduce whisper builtin model.
fix(transcription): activate language detection
Forbid transcript creation without a language.
Add `languageDetection` flag to an engine and some assertions.
Fix an issue in `whisper-ctranslate2`:
https://github.com/Softcatala/whisper-ctranslate2/pull/93
chore(transcription): use PeerTube time helpers instead of custom ones
Update the existing time function to output an integer number of seconds and add a human-readable millisecond time formatter, along with some initial tests.
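For illustration only, a human-readable millisecond formatter of the kind described above could look like the sketch below. The name `formatMilliseconds` and the `1h 2m 3s 4ms` output format are assumptions made for this example; they are not taken from the PeerTube helpers.

```typescript
// Hypothetical helper: name and output format are assumptions,
// not PeerTube's actual time helper.
function formatMilliseconds (ms: number): string {
  if (!Number.isFinite(ms) || ms < 0) {
    throw new RangeError('ms must be a non-negative finite number')
  }

  // Work on whole milliseconds to avoid fractional remainders
  const total = Math.round(ms)

  const hours = Math.floor(total / 3_600_000)
  const minutes = Math.floor((total % 3_600_000) / 60_000)
  const seconds = Math.floor((total % 60_000) / 1000)
  const millis = total % 1000

  // Only print the units that are non-zero, but always print something
  const parts: string[] = []
  if (hours > 0) parts.push(`${hours}h`)
  if (minutes > 0) parts.push(`${minutes}m`)
  if (seconds > 0) parts.push(`${seconds}s`)
  if (millis > 0 || parts.length === 0) parts.push(`${millis}ms`)

  return parts.join(' ')
}
```

With this sketch, `formatMilliseconds(3_723_004)` yields `1h 2m 3s 4ms` and `formatMilliseconds(0)` yields `0ms`.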
chore(transcription): use PeerTube UUID helpers
chore(transcription): enable CER evaluation
Thanks to this recent fix in Jiwer <3
https://github.com/jitsi/jiwer/issues/873
chore(jiwer): create JiWer package
I'm not very happy with the TranscriptFileEvaluator constructor... suggestions?
chore(JiWer): add usage in README
docs(jiwer): update JiWer readme
chore(transcription): use FunMOOC video in fixtures
chore(transcription): add proper english video fixture
chore(transcription): use os tmp directory where relevant
chore(transcription): fix jiwer cli test reference.txt
chore(transcription): move benchmark out of tests
chore(transcription): remove transcription workflow
docs(transcription): add benchmark info
fix(transcription): use ms precision in other transcribers
chore(transcription): simplify most of the tests
chore(transcription): remove slashes when building path with join
chore(transcription): make fromPath method async
chore(transcription): assert path to model is a directory for CTranslate2 transcriber
chore(transcription): ctranslate2 assertion
chore(transcription): ctranslate2 assertion
chore(transcription): add preinstall script for Python dependencies
chore(transcription): add download and unzip utils functions
chore(transcription): add download and unzip utils functions
chore(transcription): download & unzip models fixtures
chore(transcription): zip
chore(transcription): raise download file test timeout
chore(transcription): simplify download file test
chore(transcription): add transcriptions test to CI
chore(transcription): raise test preconditions timeout
chore(transcription): run preinstall scripts before running ci
chore(transcription): create dedicated tmp folder for transcriber tests
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): use short video for local model test
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): setup verbosity based on NODE_ENV value
* Comments and videos can be automatically tagged using core rules or
watched word lists
* These tags can be used to automatically filter videos and comments
* Introduce a new video comment policy where comments must be approved
first
* Comments may have to be approved if the user auto-blocks them using
core rules or watched word lists
* Implement FEP-5624 to federate reply control policies
Breaking: YAML config `ip_view_expiration` is renamed to `view_expiration`
Breaking: Views are taken into account after 10 seconds instead of 30
seconds (can be changed in YAML config)
The purpose of this commit is to get closer to other video platforms,
some of which count views on play (Mux, Vimeo) while others use a very
low delay (Instagram, TikTok)
We also want to improve viewer identification: we no longer use the IP
but the `sessionId` generated by the web browser. Multiple viewers
behind a NAT can now be identified as independent viewers (this method
is also used by Vimeo and Mux)
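As a rough sketch of the behaviour described above (not the actual PeerTube implementation), a view can be counted once per `sessionId` and video after the minimum watch time, then ignored until the expiration window has passed. The class name, method names, in-memory maps and the one-hour expiration used in the usage note are all assumptions for illustration.

```typescript
// Hypothetical sketch: names and the in-memory store are assumptions,
// not PeerTube code.
interface ViewPolicy {
  minWatchSeconds: number // 10 per this commit, configurable in YAML
  expirationMs: number // stands in for `view_expiration`; the value is assumed
}

class SessionViewCounter {
  private readonly policy: ViewPolicy
  // Last time a view was counted for a given (sessionId, videoId) pair
  private readonly lastCounted = new Map<string, number>()
  // Total counted views per video
  private readonly views = new Map<string, number>()

  constructor (policy: ViewPolicy) {
    this.policy = policy
  }

  // Returns true when this report results in a newly counted view
  report (sessionId: string, videoId: string, watchedSeconds: number, nowMs: number): boolean {
    if (watchedSeconds < this.policy.minWatchSeconds) return false

    // Key on the browser session instead of the IP, so viewers behind
    // the same NAT are tracked independently
    const key = JSON.stringify([ sessionId, videoId ])
    const last = this.lastCounted.get(key)
    if (last !== undefined && nowMs - last < this.policy.expirationMs) return false

    this.lastCounted.set(key, nowMs)
    this.views.set(videoId, (this.views.get(videoId) ?? 0) + 1)
    return true
  }

  getViews (videoId: string): number {
    return this.views.get(videoId) ?? 0
  }
}
```

For example, with `new SessionViewCounter({ minWatchSeconds: 10, expirationMs: 3_600_000 })`, two different sessions that each watch 10 seconds of the same video produce two views, while a second report from the same session within the hour is ignored.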
Can lead to performance issues on both the Prometheus side and the
PeerTube side if many different URLs have been called on PeerTube
(Google indexing, for example)