Commit Graph

283 Commits (aee2eb8daf08ce5e0923fdbd096802f291a8c83a)

Author SHA1 Message Date
Raphaël Vinot d60a4e56db chg: Update indexes only when needed 2024-01-08 16:27:12 +01:00
Raphaël Vinot 89dbef8683 chg: Avoid discarding the index lock too soon 2023-11-21 16:50:15 +01:00
Raphaël Vinot 9031141b61 chg: Remove empty dirs when everything has been archived 2023-11-21 11:50:09 +01:00
Raphaël Vinot 6d61645d97 chg: remove index when all the captures are archived 2023-11-20 23:48:56 +01:00
Raphaël Vinot efe2124753 fix: Quit BG indexer when shutdown is requested. Improve exceptions handling in archiver 2023-11-20 11:45:41 +01:00
Raphaël Vinot 11a3b6b2f9 chg: Improve indexes cleanup 2023-11-18 03:20:49 +01:00
Raphaël Vinot ff27808320 fix: Path in index may be the full path (old format) 2023-11-18 02:47:43 +01:00
Raphaël Vinot cd11df7ac4 fix: Update index files to remove archived (or simply gone) captures 2023-11-18 02:39:21 +01:00
Raphaël Vinot 9a9c4464ed fix: Update index for recent captures on every archive 2023-11-17 15:47:12 +01:00
Raphaël Vinot ce76218657 fix: build backlog pickles in reverse order 2023-11-16 23:58:07 +01:00
Raphaël Vinot 096d7c6fb5 chg: clear old UUIDs found when archiving 2023-11-16 23:22:04 +01:00
Raphaël Vinot f209ef22f1 fix: skip root directory when scanning on s3fs 2023-11-16 22:55:44 +01:00
Raphaël Vinot 7791eff842 new: Store directories by day, refactor indexing 2023-11-16 16:54:21 +01:00
Raphaël Vinot 1c5c178d20 fix: s3fs support was broken. 2023-10-23 15:59:14 +02:00
Raphaël Vinot fcaeda8f7f new: Use S3FS in archiving script instead, remove python 3.12 support. Also remove standalone script for updating archived indexes. 2023-10-23 13:57:44 +02:00
Raphaël Vinot db9ca0ea2b fix: Properly match 0/1 as string 2023-10-20 15:55:50 +02:00
Raphaël Vinot a2ba5c551d fix: allow auto_report to be "True" without any setting. 2023-10-20 15:48:28 +02:00
Raphaël Vinot 0daff9ef77 chg: settings tweaks, logging 2023-10-11 15:02:11 +02:00
Raphaël Vinot b4599492f3 fix: Avoid exception killing website if non-responsive 3rd party module. 2023-10-11 14:57:53 +02:00
Raphaël Vinot 5ca7c5cb1d fix: Typo in last commit 2023-10-10 21:40:49 +02:00
Raphaël Vinot 3e4eb572a0 chg: auto-restart webservers after 1000 requests 2023-10-10 21:32:26 +02:00
Raphaël Vinot 2920f796fe fix: Speedup generating pickles in BG 2023-10-09 10:26:37 +02:00
Raphaël Vinot f2c9647a9e new: Don't attempt to initialize indexes if they're on a s3fs mount 2023-10-04 11:06:02 +02:00
Raphaël Vinot e3b85508f1 fix: Attempt to check if a directory is empty faster. 2023-10-02 16:16:22 +02:00
Raphaël Vinot f250cba632 chg: yet another attempt to improve checking archived captures 2023-10-02 15:50:46 +02:00
Raphaël Vinot 1220f5926d fix: reduce calls to stat on archived dirs, improve logging 2023-09-29 15:00:40 +02:00
Raphaël Vinot 3b5e45a1e7 fix: Properly stop when there is nothing to archive 2023-09-23 11:38:25 +02:00
Raphaël Vinot d4b9ca13af chg: Avoid setting the lock and quitting, cleanup 2023-09-19 14:26:35 +02:00
Raphaël Vinot 1fcd4cfa3f fix: Make sure we only get valid dirs out of the month directory 2023-09-19 12:51:34 +02:00
Raphaël Vinot 50406a921c fix: Attempt to avoid listing non-existing directories 2023-09-19 12:14:00 +02:00
Raphaël Vinot 29fb60e9b7 fix: Avoid error when capture is in a weird state 2023-09-18 15:02:33 +02:00
Raphaël Vinot 1533e33ede fix: remove locks from the archived directories 2023-09-18 10:38:05 +02:00
Raphaël Vinot 3fb9da8480 fix: Try to avoid exceptions and race conditions between archiving and build pickle 2023-09-18 01:31:28 +02:00
Raphaël Vinot 532b68dd07 fix: Avoid exception when attempting to move a capture 2023-09-18 00:33:59 +02:00
Raphaël Vinot 84175a944c fix: Update index of the right directory 2023-09-15 15:38:23 +02:00
Raphaël Vinot ae484cc14f chg: Update index right after archiving 2023-09-15 13:25:37 +02:00
Raphaël Vinot 8c707b364f fix: Missing f in f-string... 2023-09-15 11:42:33 +02:00
Raphaël Vinot 6c3f64e78a chg: Avoid waiting when many captures need to be archived 2023-09-15 11:30:04 +02:00
Raphaël Vinot 416ca7224e fix: Avoid race condition when re-enqueuing 2023-09-01 16:00:45 +02:00
Raphaël Vinot 2ec34e2b9b chg: Improve archiver 2023-08-20 16:21:33 +02:00
Raphaël Vinot d0d08b5882 fix: Avoid exception on empty dir in archiver. 2023-08-16 11:15:00 +02:00
Raphaël Vinot 447229ced3 chg: Compress HARs by default, update codebase accordingly 2023-08-11 13:16:59 +02:00
Raphaël Vinot 206e5957b5 new: Support for favicons fetching and display. Related https://github.com/Lookyloo/PlaywrightCapture/issues/45 2023-08-09 16:50:33 +02:00
Raphaël Vinot e256a7fe6b chg: Proper use of shutil.move, speedup initialization of CaptureCache 2023-08-08 12:41:21 +02:00
Raphaël Vinot fa1d353834 chg: Avoid a few more disk access whenever possible. 2023-08-07 13:13:57 +02:00
Raphaël Vinot 212ba9a0d9 chg: Reduce disk usage 2023-08-06 21:34:20 +02:00
Raphaël Vinot 27b10f2c05 chg: Speedup indexes update 2023-08-05 20:47:08 +02:00
Raphaël Vinot 14df52a623 chg: Many improvements in archiver 2023-08-05 13:36:56 +02:00
Raphaël Vinot e9dad5de61 chg: Attempt to reduce disk use 2023-08-04 15:03:58 +02:00
Raphaël Vinot c203aa91b9 chg: Avoid directory listing as much as possible in archiver, allow shutdown 2023-08-04 14:02:45 +02:00
Raphaël Vinot 5fca6b13ea chg: Show stacktrace when we cannot build the pickle 2023-08-04 13:15:39 +02:00
Raphaël Vinot 959b7ca96d fix: use glob with path instead of rglob (faster) 2023-08-04 13:15:03 +02:00
Raphaël Vinot 4be8186cc6 chg: Improve readability of the background indexer 2023-07-30 16:59:41 +02:00
Raphaël Vinot ea2ded9beb fix: properly handle missing title in cache 2023-07-27 15:21:06 +02:00
Raphaël Vinot ebfc2f00a5 fix: Exception when a formerly broken capture is re-processed and works 2023-07-27 14:56:39 +02:00
Raphaël Vinot 855485984f fix: gracefully handle empty lists in hset, and duplicate UUIDs 2023-07-26 22:16:00 +02:00
Raphaël Vinot fd9325bb0d chg: Improve logging, add lock on indexer. 2023-07-26 12:37:12 +02:00
Raphaël Vinot f60457a484 fix: Put the max captures counter at the right place... 2023-07-26 11:45:22 +02:00
Raphaël Vinot fc5850e147 chg: Avoid building old pickles forever 2023-07-26 11:38:40 +02:00
Raphaël Vinot a18f8f9675 chg: do not discard capture without HAR files. They are often just captures with an error file. 2023-07-25 20:29:30 +02:00
Raphaël Vinot ef3432cbed fix: Few more improvements on lockfile and broken captures. 2023-07-25 20:16:48 +02:00
Raphaël Vinot 484aec5ddd fix: Properly handle lock file. 2023-07-25 19:29:53 +02:00
Raphaël Vinot 345a2f3f45 fix: Import method from the right file 2023-07-25 17:16:59 +02:00
Raphaël Vinot 3c50474ce4 fix: check if a tree.pickle.gz exists in the background indexer 2023-07-25 17:13:28 +02:00
Raphaël Vinot 0c7b3d9106 fix: indexer getting stuck when we had more than one at a time 2023-07-25 17:08:00 +02:00
Raphaël Vinot 177474e874 new: Basic support for HHHash 2023-07-21 15:48:20 +02:00
Raphaël Vinot fec61d42ee fix: Re-submit captures cleaned up too early in lacus 2023-06-27 11:33:56 +02:00
Raphaël Vinot 582b5956e9 new: Store capture settings, use TypedDict whenever possible. 2023-05-15 16:08:19 +02:00
Raphaël Vinot 6a9bcc0050 new: Automatic reporting via API. Related to #678 2023-04-28 17:19:53 +02:00
Raphaël Vinot 4ceae60db7 chg: Avoid stopping the captures before they're done 2023-04-09 13:58:34 +02:00
Raphaël Vinot 2ceda75eab chg: Fairly big refactoring/cleanup to support LacusCore 1.4.0 2023-04-08 13:49:18 +02:00
Raphaël Vinot 9995371916 chg: Normalize logging on the config file settings 2023-04-05 16:23:46 +02:00
Raphaël Vinot e410b7631e fix: no decoding in archiver, catch exception when requesting hashes on broken capture 2023-03-16 14:47:24 +01:00
Raphaël Vinot 9497060028 fix: Cleanup prints, improve archiver. 2023-03-16 12:28:28 +01:00
Raphaël Vinot 96f1b2bd53 fix: Avoid exception if microsec is missing. 2023-03-12 19:25:16 +01:00
Raphaël Vinot 36d39f6076 new: Add PID in lock file, allows to check if the locking process is still there 2023-02-26 17:20:17 +01:00
Raphaël Vinot f0615fc54f chg: Bump deps, prepare for v1.17.0 release 2022-12-28 16:25:44 +01:00
Raphaël Vinot b7302d09b5 fix: pass the browser to the browser key. 2022-12-02 14:26:08 +01:00
Raphaël Vinot 00370291ac new: Logging config in file 2022-11-23 15:54:22 +01:00
Raphaël Vinot 3c1cbd6ece new: Very basic page to submit an existing capture via a HAR file 2022-11-19 01:32:17 +01:00
Raphaël Vinot 9677c4d120 new: Support lacus unreachable by caching locally + initialize lacus globally for consistency. 2022-11-01 18:10:25 +01:00
Raphaël Vinot a48c6e0bd6 new: SIGTERM handling (PyLacus and LacusCore) 2022-10-28 12:40:28 +02:00
Raphaël Vinot e3075060cd chg: Properly type the response from LacusCore/PyLacus 2022-10-26 14:25:23 +02:00
Raphaël Vinot 93c3ea8d39 fix: Catch exception when Lacus is unreachable 2022-09-29 15:42:05 +02:00
Raphaël Vinot a27683f090 fix: Match compressed HAR as valid for rebuild 2022-09-28 11:23:44 +02:00
Raphaël Vinot d500872943 fix: Wait for capture to be done before processing it 2022-09-27 15:44:17 +02:00
Raphaël Vinot 5cd8169735 chg: Avoid captures without url(s) or document 2022-09-27 11:33:36 +02:00
Raphaël Vinot f886b8676b fix: More exceptions catching for the new caching method 2022-09-26 20:55:16 +02:00
Raphaël Vinot edd8d786d3 chg: Do not try to build a tree if there are no HAR files 2022-09-26 15:59:04 +02:00
Raphaël Vinot 31261e84c2 fix: Better handling of half broken captures without HAR files 2022-09-26 14:58:30 +02:00
Raphaël Vinot 52b68fccdc fix: make mypy happy, simplify code 2022-09-23 21:45:50 +02:00
Raphaël Vinot 2ec55be573 fix: Properly unset async capture when the queue is empty 2022-09-23 21:40:56 +02:00
Raphaël Vinot 354841b005 chg: Improve status reporting when a capture is ongoing 2022-09-23 21:33:38 +02:00
Raphaël Vinot da33a7f5b3 chg: Avoid stacktrace when trying to generate broken capture 2022-09-23 14:46:19 +02:00
Raphaël Vinot 18b0b6e3cd chg: Improve logging for archiver. 2022-09-23 14:32:42 +02:00
Raphaël Vinot c7ca251e7a chg: make to_capture key a ranked set again 2022-09-23 14:25:01 +02:00
Raphaël Vinot 19799c19af chg: Re-enable start script 2022-09-23 13:13:09 +02:00
Raphaël Vinot da502ee3d6 chg: Implement support for LacusCore *or* PyLacus 2022-09-23 13:13:09 +02:00
Raphaël Vinot d38b612c37 chg: Bump lacuscore 2022-09-23 13:13:09 +02:00
Raphaël Vinot 9189888a0d chg: Properly handle missing HAR 2022-09-23 13:13:09 +02:00