262 Commits (2baa2cd73a0cd25bccba29ca6125ee7d7077a459)

Author SHA1 Message Date
Raphaël Vinot 2920f796fe fix: Speedup generating pickles in BG 2023-10-09 10:26:37 +02:00
Raphaël Vinot f2c9647a9e new: Don't attempt to initialize indexes if they're on a s3fs mount 2023-10-04 11:06:02 +02:00
Raphaël Vinot e3b85508f1 fix: Attempt to check if a directory is empty faster. 2023-10-02 16:16:22 +02:00
Raphaël Vinot f250cba632 chg: yet another attempt to improve checking archived captures 2023-10-02 15:50:46 +02:00
Raphaël Vinot 1220f5926d fix: reduce calls to stat on archived dirs, improve logging 2023-09-29 15:00:40 +02:00
Raphaël Vinot 3b5e45a1e7 fix: Properly stop when there is nothing to archive 2023-09-23 11:38:25 +02:00
Raphaël Vinot d4b9ca13af chg: Avoid setting the lock and quitting, cleanup 2023-09-19 14:26:35 +02:00
Raphaël Vinot 1fcd4cfa3f fix: Make sure we only get valid dirs out of the month directory 2023-09-19 12:51:34 +02:00
Raphaël Vinot 50406a921c fix: Attempt to avoid listing non-existing directories 2023-09-19 12:14:00 +02:00
Raphaël Vinot 29fb60e9b7 fix: Avoid error when capture is in a weird state 2023-09-18 15:02:33 +02:00
Raphaël Vinot 1533e33ede fix: remove locks from the archived directories 2023-09-18 10:38:05 +02:00
Raphaël Vinot 3fb9da8480 fix: Try to avoid exceptions and race conditions between archiving and build pickle 2023-09-18 01:31:28 +02:00
Raphaël Vinot 532b68dd07 fix: Avoid exception when attempting to move a capture 2023-09-18 00:33:59 +02:00
Raphaël Vinot 84175a944c fix: Update index of the right directory 2023-09-15 15:38:23 +02:00
Raphaël Vinot ae484cc14f chg: Update index right after archiving 2023-09-15 13:25:37 +02:00
Raphaël Vinot 8c707b364f fix: Missing f in f-string... 2023-09-15 11:42:33 +02:00
Raphaël Vinot 6c3f64e78a chg: Avoid waiting when many captures need to be archived 2023-09-15 11:30:04 +02:00
Raphaël Vinot 416ca7224e fix: Avoid race condition when re-enqueuing 2023-09-01 16:00:45 +02:00
Raphaël Vinot 2ec34e2b9b chg: Improve archiver 2023-08-20 16:21:33 +02:00
Raphaël Vinot d0d08b5882 fix: Avoid exception on empty dir in archiver. 2023-08-16 11:15:00 +02:00
Raphaël Vinot 447229ced3 chg: Compress HARs by default, update codebase accordingly 2023-08-11 13:16:59 +02:00
Raphaël Vinot 206e5957b5 new: Support for favicons fetching and display (Related https://github.com/Lookyloo/PlaywrightCapture/issues/45) 2023-08-09 16:50:33 +02:00
Raphaël Vinot e256a7fe6b chg: Proper use of shutil.move, speedup initialization of CaptureCache 2023-08-08 12:41:21 +02:00
Raphaël Vinot fa1d353834 chg: Avoid a few more disk access whenever possible. 2023-08-07 13:13:57 +02:00
Raphaël Vinot 212ba9a0d9 chg: Reduce disk usage 2023-08-06 21:34:20 +02:00
Raphaël Vinot 27b10f2c05 chg: Speedup indexes update 2023-08-05 20:47:08 +02:00
Raphaël Vinot 14df52a623 chg: Many improvements in archiver 2023-08-05 13:36:56 +02:00
Raphaël Vinot e9dad5de61 chg: Attempt to reduce disk use 2023-08-04 15:03:58 +02:00
Raphaël Vinot c203aa91b9 chg: Avoid directory listing as much as possible in archiver, allow shutdown 2023-08-04 14:02:45 +02:00
Raphaël Vinot 5fca6b13ea chg: Show stacktrace when we cannot build the pickle 2023-08-04 13:15:39 +02:00
Raphaël Vinot 959b7ca96d fix: use glob with path instead of rglob (faster) 2023-08-04 13:15:03 +02:00
Raphaël Vinot 4be8186cc6 chg: Improve readability of the background indexer 2023-07-30 16:59:41 +02:00
Raphaël Vinot ea2ded9beb fix: properly handle missing title in cache 2023-07-27 15:21:06 +02:00
Raphaël Vinot ebfc2f00a5 fix: Exception when a formerly broken capture is re-processed and works 2023-07-27 14:56:39 +02:00
Raphaël Vinot 855485984f fix: handle gracefully empty lists in hset, and duplicates UUIDs 2023-07-26 22:16:00 +02:00
Raphaël Vinot fd9325bb0d chg: Improve logging, add lock on indexer. 2023-07-26 12:37:12 +02:00
Raphaël Vinot f60457a484 fix: Put the max captures counter at the right place... 2023-07-26 11:45:22 +02:00
Raphaël Vinot fc5850e147 chg: Avoid building old pickles forever 2023-07-26 11:38:40 +02:00
Raphaël Vinot a18f8f9675 chg: do not discard capture without HAR files. They are often just captures with an error file. 2023-07-25 20:29:30 +02:00
Raphaël Vinot ef3432cbed fix: Few more improvements on lockfile and broken captures. 2023-07-25 20:16:48 +02:00
Raphaël Vinot 484aec5ddd fix: Properly handle lock file. 2023-07-25 19:29:53 +02:00
Raphaël Vinot 345a2f3f45 fix: Import method from the right file 2023-07-25 17:16:59 +02:00
Raphaël Vinot 3c50474ce4 fix: check if a tree.pickle.gz exists in the background indexer 2023-07-25 17:13:28 +02:00
Raphaël Vinot 0c7b3d9106 fix: indexer getting stuck when we had more than one at a time 2023-07-25 17:08:00 +02:00
Raphaël Vinot 177474e874 new: Basic support for HHHash 2023-07-21 15:48:20 +02:00
Raphaël Vinot fec61d42ee fix: Re-submit captures cleaned up too early in lacus 2023-06-27 11:33:56 +02:00
Raphaël Vinot 582b5956e9 new: Store capture settings, use TypedDict whenever possible. 2023-05-15 16:08:19 +02:00
Raphaël Vinot 6a9bcc0050 new: Automatic reporting via API (Related to #678) 2023-04-28 17:19:53 +02:00
Raphaël Vinot 4ceae60db7 chg: Avoid stopping the captures before they're done 2023-04-09 13:58:34 +02:00
Raphaël Vinot 2ceda75eab chg: Fairly big refactoring/cleanup to support LacusCore 1.4.0 2023-04-08 13:49:18 +02:00
Raphaël Vinot 9995371916 chg: Normalize logging on the config file settings 2023-04-05 16:23:46 +02:00
Raphaël Vinot e410b7631e fix: no decoding in archiver, catch exception when requesting hashes on broken capture 2023-03-16 14:47:24 +01:00
Raphaël Vinot 9497060028 fix: Cleanup prints, improve archiver. 2023-03-16 12:28:28 +01:00
Raphaël Vinot 96f1b2bd53 fix: Avoid exception if microsec is missing. 2023-03-12 19:25:16 +01:00
Raphaël Vinot 36d39f6076 new: Add PID in lock file, allows to check if the locking process is still there 2023-02-26 17:20:17 +01:00
Raphaël Vinot f0615fc54f chg: Bump deps, prepare for v1.17.0 release 2022-12-28 16:25:44 +01:00
Raphaël Vinot b7302d09b5 fix: pass the browser to the browser key. 2022-12-02 14:26:08 +01:00
Raphaël Vinot 00370291ac new: Logging config in file 2022-11-23 15:54:22 +01:00
Raphaël Vinot 3c1cbd6ece new: Very basic page to submit an existing capture via a HAR file 2022-11-19 01:32:17 +01:00
Raphaël Vinot 9677c4d120 new: Support lacus unreachable by caching locally + initialize lacus globally for consistency. 2022-11-01 18:10:25 +01:00
Raphaël Vinot a48c6e0bd6 new: SIGTERM handling (PyLacus and LacusCore) 2022-10-28 12:40:28 +02:00
Raphaël Vinot e3075060cd chg: Properly type the response from LacusCore/PyLacus 2022-10-26 14:25:23 +02:00
Raphaël Vinot 93c3ea8d39 fix: Catch exception when Lacus is unreachable 2022-09-29 15:42:05 +02:00
Raphaël Vinot a27683f090 fix: Match compressed HAR as valid for rebuild 2022-09-28 11:23:44 +02:00
Raphaël Vinot d500872943 fix: Wait for capture to be done before processing it 2022-09-27 15:44:17 +02:00
Raphaël Vinot 5cd8169735 chg: Avoid captures without url(s) or document 2022-09-27 11:33:36 +02:00
Raphaël Vinot f886b8676b fix: More exceptions catching for the new caching method 2022-09-26 20:55:16 +02:00
Raphaël Vinot edd8d786d3 chg: Do not try to build a tree if there are no HAR files 2022-09-26 15:59:04 +02:00
Raphaël Vinot 31261e84c2 fix: Better handling of half broken captures without HAR files 2022-09-26 14:58:30 +02:00
Raphaël Vinot 52b68fccdc fix: make mypy happy, simplify code 2022-09-23 21:45:50 +02:00
Raphaël Vinot 2ec55be573 fix: Properly unset async capture when the queue is empty 2022-09-23 21:40:56 +02:00
Raphaël Vinot 354841b005 chg: Improve status reporting when a capture is ongoing 2022-09-23 21:33:38 +02:00
Raphaël Vinot da33a7f5b3 chg: Avoid stacktrace when trying to generate broken capture 2022-09-23 14:46:19 +02:00
Raphaël Vinot 18b0b6e3cd chg: Improve logging for archiver. 2022-09-23 14:32:42 +02:00
Raphaël Vinot c7ca251e7a chg: make to_capture key a ranked set again 2022-09-23 14:25:01 +02:00
Raphaël Vinot 19799c19af chg: Re-enable start script 2022-09-23 13:13:09 +02:00
Raphaël Vinot da502ee3d6 chg: Implement support for LacusCore *or* PyLacus 2022-09-23 13:13:09 +02:00
Raphaël Vinot d38b612c37 chg: Bump lacuscore 2022-09-23 13:13:09 +02:00
Raphaël Vinot 9189888a0d chg: Properly handle missing HAR 2022-09-23 13:13:09 +02:00
Raphaël Vinot 623813167e chg: Add missing bits 2022-09-23 13:13:09 +02:00
Raphaël Vinot 318f554db3 chg: move to lacus, WiP 2022-09-23 13:13:09 +02:00
Raphaël Vinot 812c63b0f2 fix: error in UAs, typing 2022-09-05 18:58:45 +02:00
Raphaël Vinot c6464936fc chg: Bump to poetry v1.2, remove dep on setuptools 2022-08-31 16:33:13 +02:00
Raphaël Vinot f232eba662 chg: Improve UA rendering 2022-08-23 17:44:48 +02:00
Raphaël Vinot ebbe6e3ce9 new: Pick mobile devices on capture page 2022-08-22 17:34:00 +02:00
Raphaël Vinot 35789f549b fix: Exception on invalid capture 2022-08-20 23:33:32 +02:00
Raphaël Vinot d63ea473f5 new: Autoselect browser engine based on the UA 2022-08-19 14:26:22 +02:00
Raphaël Vinot 998ef12b06 new: Add support for playwright devices and browser name (API only) 2022-08-18 11:19:32 +02:00
Raphaël Vinot e89e9a20cb fix: Force BG processor to index all the recent captures 2022-08-12 01:08:28 +02:00
Raphaël Vinot be2e1ddc33 fix: properly handle listing configuration, clear None from queries before passing to redis 2022-08-10 18:53:14 +02:00
Raphaël Vinot 49f335405e fix: Avoid exceptions on invalid requests 2022-08-05 11:28:44 +02:00
Raphaël Vinot 4280a4e11f fix: Support for document on public instances. 2022-08-04 21:28:47 +02:00
Raphaël Vinot 94bae7c5e3 chg: Avoid exception on broken captures 2022-08-04 21:11:58 +02:00
Raphaël Vinot 4f72d64735 new: Upload a file instead of submitting a URL. 2022-08-04 16:58:07 +02:00
Raphaël Vinot 3170038db7 new: dropdown to pass DoNotTrack HTTP header. Improvements on the capture page. 2022-08-03 12:07:45 +02:00
Raphaël Vinot bcfaaec941 chg: Improve logging in archiver 2022-07-27 14:33:28 +02:00
Raphaël Vinot c9381873c7 chg: cleanup the file download feature 2022-07-19 17:54:45 +02:00
Arhamyss 47ecc7a4fa new: Download file 2022-07-19 10:07:36 -04:00
Raphaël Vinot 5f329e4d7b new: compress HAR files in archived captures. 2022-07-12 18:44:33 +02:00
Raphaël Vinot 6ba019ec83 chg: Improve somewhat the useragents available for capturing (Fix #416) 2022-06-09 18:58:17 +02:00