chore: fiddling around some more
chore: add ctranslate2 and timestamped
chore: add performance markers
chore: refactor test
chore: change workflow name
chore: ensure Python3
chore(duration): convert to chai/mocha syntax
chore(transcription): add individual tests for other transcribers
chore(transcription): implement format tests for all implementations
Also compare the results of the other implementations to the reference implementation
chore(transcription): add more test cases with other languages, model sizes and a local model
chore(test): wip ctranslate 2 adapat
chore(transcription): wip transcript file and benchmark
chore(test): clean a bit
chore(test): clean a bit
chore(test): refactor timestamped spec
chore(test): update workflow
chore(test): fix glob expansion with sh
chore(test): extract some hw info
chore(test): fix async tests
chore(benchmark): add model info
feat(transcription): allow use of a local model in timestamped-whisper
feat(transcription): extract run and profiling info in own value object
feat(transcription): extract run concept in own class and run more benchmarks
chore(transcription): simplify run object (only a UUID is now needed) and add more benchmark scenarios
docs(transcription): create own package README
docs(transcription): add local model usage
docs(transcription): update README
fix(transcription): use fr video for better comparison
chore(transcription): make openai comparison pass
docs(timestamped): clea
chore(transcription): change transcribers transcribe method signature
Introduce whisper builtin model.
fix(transcription): activate language detection
Forbid transcript creation without a language.
Add `languageDetection` flag to an engine and some assertions.
Fix an issue in `whisper-ctranslate2`:
https://github.com/Softcatala/whisper-ctranslate2/pull/93
chore(transcription): use PeerTube time helpers instead of custom ones
Update existing time function to output an integer number of seconds and add a ms human-readable time formatter with hints of tests.
chore(transcription): use PeerTube UUID helpers
chore(transcription): enable CER evaluation
Thanks to this recent fix in Jiwer <3
https://github.com/jitsi/jiwer/issues/873
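For context, the character error rate (CER) is the character-level edit distance between a reference transcript and a hypothesis, divided by the reference length. The sketch below only illustrates that definition; it is not the JiWer implementation, and the function names `levenshtein` and `characterErrorRate` are hypothetical.

```typescript
// Illustrative sketch only: hypothetical helper names, not JiWer's code.

// Edit distance (insertions, deletions, substitutions) between two strings,
// computed row by row with the classic dynamic programming recurrence.
function levenshtein (a: string, b: string): number {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j)

  for (let i = 1; i <= a.length; i++) {
    const current = [ i ]

    for (let j = 1; j <= b.length; j++) {
      const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)
      current.push(Math.min(previous[j] + 1, current[j - 1] + 1, substitution))
    }

    previous = current
  }

  return previous[b.length]
}

// CER = edit distance / number of characters in the reference.
function characterErrorRate (reference: string, hypothesis: string): number {
  if (reference.length === 0) throw new Error('reference must not be empty')

  return levenshtein(reference, hypothesis) / reference.length
}
```

A CER of 0 means the hypothesis matches the reference exactly; values above 1 are possible when the hypothesis is much longer than the reference.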
chore(jiwer): create JiWer package
I'm not very happy with the TranscriptFileEvaluator constructor... suggestions?
chore(JiWer): add usage in README
docs(jiwer): update JiWer readme
chore(transcription): use FunMOOC video in fixtures
chore(transcription): add proper english video fixture
chore(transcription): use os tmp directory where relevant
chore(transcription): fix jiwer cli test reference.txt
chore(transcription): move benchmark out of tests
chore(transcription): remove transcription workflow
docs(transcription): add benchmark info
fix(transcription): use ms precision in other transcribers
chore(transcription): simplify most of the tests
chore(transcription): remove slashes when building path with join
chore(transcription): make fromPath method async
chore(transcription): assert path to model is a directory for CTranslate2 transcriber
chore(transcription): ctranslate2 assertion
chore(transcription): ctranslate2 assertion
chore(transcription): add preinstall script for Python dependencies
chore(transcription): add download and unzip utils functions
chore(transcription): add download and unzip utils functions
chore(transcription): download & unzip models fixtures
chore(transcription): zip
chore(transcription): raise download file test timeout
chore(transcription): simplify download file test
chore(transcription): add transcriptions test to CI
chore(transcription): raise test preconditions timeout
chore(transcription): run preinstall scripts before running ci
chore(transcription): create dedicated tmp folder for transcriber tests
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): use short video for local model test
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): raise timeout some more
chore(transcription): setup verbosity based on NODE_ENV value
* Split methods in multiple classes
* Add JSONLD tags in embed too
* Index embeds but use a canonical URL tag (targeting the watch page)
* Remote objects don't include a canonical URL tag anymore. Instead we
forbid indexation
* Canonical URLs now use the official short URL (/w/, /w/p, /a, /c
etc.)
Sorry for the very big commit that may lead to git log issues and merge
conflicts, but it's a major step forward:
* Server can be faster at startup because imports() are async and we can
easily lazy import big modules
* Angular doesn't seem to support ES import (with .js extension), so we
had to correctly organize peertube into a monorepo:
* Use yarn workspace feature
* Use typescript reference projects for dependencies
* Shared projects have been moved into "packages", each one is now a
node module (with a dedicated package.json/tsconfig.json)
* server/tools has been moved into apps/ and is now a dedicated app
bundled and published on NPM so users don't have to build peertube
cli tools manually
* server/tests have been moved into packages/ so we don't compile
them every time we want to run the server
* Use isolatedModules option:
* Had to move from const enum to const
(https://www.typescriptlang.org/docs/handbook/enums.html#objects-vs-enums)
* Had to explicitly specify "type" imports when used in decorators
* Prefer tsx (that uses esbuild under the hood) instead of ts-node to
load typescript files (tests with mocha or scripts):
* To reduce test complexity as esbuild doesn't support decorator
metadata, we only test server files that do not import server
models
* We still build tests files into js files for a faster CI
* Remove unmaintained peertube CLI import script
* Removed some barrels to speed up execution (fewer imports)
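As an illustration of the "const enum to const" change required by isolatedModules, here is a minimal sketch of the pattern described in the TypeScript handbook link above. The names `VideoState`, `VideoStateType` and `isPublished` are hypothetical and not taken from the codebase.

```typescript
// Hypothetical names, for illustration only.
// Before: const enum VideoState { PUBLISHED = 1, TO_TRANSCODE = 2 }
// After: a plain object narrowed with "as const", so each file can be
// transpiled on its own without inlining enum values from other files.
const VideoState = {
  PUBLISHED: 1,
  TO_TRANSCODE: 2
} as const

// Union of the object's values: 1 | 2
type VideoStateType = typeof VideoState[keyof typeof VideoState]

function isPublished (state: VideoStateType): boolean {
  return state === VideoState.PUBLISHED
}
```

Call sites keep the same `VideoState.PUBLISHED` syntax, but type annotations have to use the derived `VideoStateType` instead of the enum name.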
* Add "currentTime" and "event" body params to view endpoint
* Merge watching and view endpoints
* Introduce WatchAction AP activity
* Add tables to store viewer information of local videos
* Add endpoints to fetch video views/viewers stats of local videos
* Refactor views/viewers handlers
* Support "views" and "viewers" counters for both VOD and live videos
* Add support for saving video files to object storage
* Add support for custom url generation on s3 stored files
Uses two config keys to support URL generation that doesn't point
directly to the S3 (or S3-compatible) endpoint. Can be used to generate
URLs to any cache server or CDN.
* Upload files to s3 concurrently and delete originals afterwards
* Only publish after move to object storage is complete
* Use base url instead of url template
* Fix mistyped config field
* Add rudimentary way to download before transcode
* Implement Chocobozzz suggestions
https://github.com/Chocobozzz/PeerTube/pull/4290#issuecomment-891670478
The remarks in question:
Try to use objectStorage prefix instead of s3 prefix for your function/variables/config names
Prefer to use a tree for the config: s3.streaming_playlists_bucket -> object_storage.streaming_playlists.bucket
Use uppercase for config: S3.STREAMING_PLAYLISTS_BUCKETINFO.bucket -> OBJECT_STORAGE.STREAMING_PLAYLISTS.BUCKET (maybe BUCKET_NAME instead of BUCKET)
I suggest to rename moveJobsRunning to pendingMovingJobs (or better, create a dedicated videoJobInfo table with a pendingMove & videoId columns so we could also use this table to track pending transcoding jobs)
https://github.com/Chocobozzz/PeerTube/pull/4290/files#diff-3e26d41ca4bda1de8e1747af70ca2af642abcc1e9e0bfb94239ff2165acfbde5R19 uses a string instead of an integer
I think we should store the origin object storage URL in fileUrl, without base_url injection. Instead, inject the base_url at "runtime" so admins can easily change this configuration without running a script to update DB URLs
* Import correct function
* Support multipart upload
* Remove import of node 15.0 module stream/promises
* Extend maximum upload job length
Using the same value as for redundancy downloading seems logical
* Use dynamic part size for really large uploads
Also adds very small part size for local testing
* Fix decreasePendingMove query
* Resolve various PR comments
* Move to object storage after optimize
* Make upload size configurable and increase default
* Prune webtorrent files that are stored in object storage
* Move files after transcoding jobs
* Fix federation
* Add video path manager
* Support move to external storage job in client
* Fix live object storage tests
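To illustrate the "dynamic part size for really large uploads" item: S3 multipart uploads allow at most 10,000 parts, and every part except the last must be at least 5 MiB, so the part size has to grow with the file. The sketch below shows one way to compute it; it is an assumption about the approach rather than the code from this PR, and `computePartSize` and `preferredPartSize` are hypothetical names.

```typescript
// Illustrative sketch only: hypothetical names, not the code from this PR.
const MAX_PARTS = 10_000 // S3 limit on the number of parts per multipart upload
const MIN_PART_SIZE = 5 * 1024 * 1024 // S3 minimum part size (5 MiB), last part excepted

// Returns a part size in bytes that keeps the upload within MAX_PARTS parts,
// while never going below the configured preference or the S3 minimum.
function computePartSize (fileSize: number, preferredPartSize: number): number {
  const smallestAllowed = Math.ceil(fileSize / MAX_PARTS)

  return Math.max(MIN_PART_SIZE, preferredPartSize, smallestAllowed)
}
```

With a preferred part size of 8 MiB, a 100 MiB file keeps 8 MiB parts, while a 1 TiB file gets parts of roughly 105 MiB so that the part count stays within the limit.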
Co-authored-by: Chocobozzz <me@florianbigard.com>