Commit Graph

237 Commits (a26e80b093fa97ce643e451a8dd2bc1513f0e179)

Author SHA1 Message Date
Raphaël Vinot 27b10f2c05 chg: Speedup indexes update 2023-08-05 20:47:08 +02:00
Raphaël Vinot 14df52a623 chg: Many improvements in archiver 2023-08-05 13:36:56 +02:00
Raphaël Vinot e9dad5de61 chg: Attempt to reduce disk use 2023-08-04 15:03:58 +02:00
Raphaël Vinot c203aa91b9 chg: Avoid directory listing as much as possible in archiver, allow shutdown 2023-08-04 14:02:45 +02:00
Raphaël Vinot 5fca6b13ea chg: Show stacktrace when we cannot build the pickle 2023-08-04 13:15:39 +02:00
Raphaël Vinot 959b7ca96d fix: use glob with path instead of rglob (faster) 2023-08-04 13:15:03 +02:00
Raphaël Vinot 4be8186cc6 chg: Improve readability of the background indexer 2023-07-30 16:59:41 +02:00
Raphaël Vinot ea2ded9beb fix: properly handle missing title in cache 2023-07-27 15:21:06 +02:00
Raphaël Vinot ebfc2f00a5 fix: Exception when a formerly broken capture is re-processed and works 2023-07-27 14:56:39 +02:00
Raphaël Vinot 855485984f fix: handle gracefully empty lists in hset, and duplicates UUIDs 2023-07-26 22:16:00 +02:00
Raphaël Vinot fd9325bb0d chg: Improve logging, add lock on indexer. 2023-07-26 12:37:12 +02:00
Raphaël Vinot f60457a484 fix: Put the max captures counter at the right place... 2023-07-26 11:45:22 +02:00
Raphaël Vinot fc5850e147 chg: Avoid building old pickles forever 2023-07-26 11:38:40 +02:00
Raphaël Vinot a18f8f9675 chg: do not discard capture without HAR files 2023-07-25 20:29:30 +02:00
    They are often just captures with an error file.
Raphaël Vinot ef3432cbed fix: Few more improvements on lockfile and broken captures. 2023-07-25 20:16:48 +02:00
Raphaël Vinot 484aec5ddd fix: Properly handle lock file. 2023-07-25 19:29:53 +02:00
Raphaël Vinot 345a2f3f45 fix: Import method from the right file 2023-07-25 17:16:59 +02:00
Raphaël Vinot 3c50474ce4 fix: check if a tree.pickle.gz exists in the background indexer 2023-07-25 17:13:28 +02:00
Raphaël Vinot 0c7b3d9106 fix: indexer getting stuck when we had more than one at a time 2023-07-25 17:08:00 +02:00
Raphaël Vinot 177474e874 new: Basic support for HHHash 2023-07-21 15:48:20 +02:00
Raphaël Vinot fec61d42ee fix: Re-submit captures cleaned up too early in lacus 2023-06-27 11:33:56 +02:00
Raphaël Vinot 582b5956e9 new: Store capture settings, use TypedDict whenever possible. 2023-05-15 16:08:19 +02:00
Raphaël Vinot 6a9bcc0050 new: Automatic reporting via API 2023-04-28 17:19:53 +02:00
    Related to #678
Raphaël Vinot 4ceae60db7 chg: Avoid stopping the captures before they're done 2023-04-09 13:58:34 +02:00
Raphaël Vinot 2ceda75eab chg: Fairly big refactoring/cleanup to support LacusCore 1.4.0 2023-04-08 13:49:18 +02:00
Raphaël Vinot 9995371916 chg: Normalize logging on the config file settings 2023-04-05 16:23:46 +02:00
Raphaël Vinot e410b7631e fix: no decoding in archiver, catch exception when requesting hashes on broken capture 2023-03-16 14:47:24 +01:00
Raphaël Vinot 9497060028 fix: Cleanup prints, improve archiver. 2023-03-16 12:28:28 +01:00
Raphaël Vinot 96f1b2bd53 fix: Avoid exception if microsec is missing. 2023-03-12 19:25:16 +01:00
Raphaël Vinot 36d39f6076 new: Add PID in lock file, allows to check if the locking process is still there 2023-02-26 17:20:17 +01:00
Raphaël Vinot f0615fc54f chg: Bump deps, prepare for v1.17.0 release 2022-12-28 16:25:44 +01:00
Raphaël Vinot b7302d09b5 fix: pass the browser to the browser key. 2022-12-02 14:26:08 +01:00
Raphaël Vinot 00370291ac new: Logging config in file 2022-11-23 15:54:22 +01:00
Raphaël Vinot 3c1cbd6ece new: Very basic page to submit an existing capture via a HAR file 2022-11-19 01:32:17 +01:00
Raphaël Vinot 9677c4d120 new: Support lacus unreachable by caching locally 2022-11-01 18:10:25 +01:00
    + initialize lacus globally for consistency.
Raphaël Vinot a48c6e0bd6 new: SIGTERM handling (PyLacus and LacusCore) 2022-10-28 12:40:28 +02:00
Raphaël Vinot e3075060cd chg: Properly type the response from LacusCore/PyLacus 2022-10-26 14:25:23 +02:00
Raphaël Vinot 93c3ea8d39 fix: Catch exception when Lacus is unreachable 2022-09-29 15:42:05 +02:00
Raphaël Vinot a27683f090 fix: Match compressed HAR as valid for rebuild 2022-09-28 11:23:44 +02:00
Raphaël Vinot d500872943 fix: Wait for capture to be done before processing it 2022-09-27 15:44:17 +02:00
Raphaël Vinot 5cd8169735 chg: Avoid captures without url(s) or document 2022-09-27 11:33:36 +02:00
Raphaël Vinot f886b8676b fix: More exceptions catching for the new caching method 2022-09-26 20:55:16 +02:00
Raphaël Vinot edd8d786d3 chg: Do not try to build a tree if there are no HAR files 2022-09-26 15:59:04 +02:00
Raphaël Vinot 31261e84c2 fix: Better handling of half broken captures without HAR files 2022-09-26 14:58:30 +02:00
Raphaël Vinot 52b68fccdc fix: make mypy happy, simplify code 2022-09-23 21:45:50 +02:00
Raphaël Vinot 2ec55be573 fix: Properly unset async capture when the queue is empty 2022-09-23 21:40:56 +02:00
Raphaël Vinot 354841b005 chg: Improve status reporting when a capture is ongoing 2022-09-23 21:33:38 +02:00
Raphaël Vinot da33a7f5b3 chg: Avoid stacktrace when trying to generate broken capture 2022-09-23 14:46:19 +02:00
Raphaël Vinot 18b0b6e3cd chg: Improve logging for archiver. 2022-09-23 14:32:42 +02:00
Raphaël Vinot c7ca251e7a chg: make to_capture key a ranked set again 2022-09-23 14:25:01 +02:00
Raphaël Vinot 19799c19af chg: Re-enable start script 2022-09-23 13:13:09 +02:00
Raphaël Vinot da502ee3d6 chg: Implement support for LacusCore *or* PyLacus 2022-09-23 13:13:09 +02:00
Raphaël Vinot d38b612c37 chg: Bump lacuscore 2022-09-23 13:13:09 +02:00
Raphaël Vinot 9189888a0d chg: Properly handle missing HAR 2022-09-23 13:13:09 +02:00
Raphaël Vinot 623813167e chg: Add missing bits 2022-09-23 13:13:09 +02:00
Raphaël Vinot 318f554db3 chg: move to lacus, WiP 2022-09-23 13:13:09 +02:00
Raphaël Vinot 812c63b0f2 fix: error in UAs, typing 2022-09-05 18:58:45 +02:00
Raphaël Vinot c6464936fc chg: Bump to poetry v1.2, remove dep on setuptools 2022-08-31 16:33:13 +02:00
Raphaël Vinot f232eba662 chg: Improve UA rendering 2022-08-23 17:44:48 +02:00
Raphaël Vinot ebbe6e3ce9 new: Pick mobile devices on capture page 2022-08-22 17:34:00 +02:00
Raphaël Vinot 35789f549b fix: Exception on invalid capture 2022-08-20 23:33:32 +02:00
Raphaël Vinot d63ea473f5 new: Autoselect browser engine based on the UA 2022-08-19 14:26:22 +02:00
Raphaël Vinot 998ef12b06 new: Add support for playwright devices and browser name (API only) 2022-08-18 11:19:32 +02:00
Raphaël Vinot e89e9a20cb fix: Force BG processor to index all the recent captures 2022-08-12 01:08:28 +02:00
Raphaël Vinot be2e1ddc33 fix: properly handle listing configuration, clear None from queries before passing to redis 2022-08-10 18:53:14 +02:00
Raphaël Vinot 49f335405e fix: Avoid exceptions on invalid requests 2022-08-05 11:28:44 +02:00
Raphaël Vinot 4280a4e11f fix: Support for document on public instances. 2022-08-04 21:28:47 +02:00
Raphaël Vinot 94bae7c5e3 chg: Avoid exception on broken captures 2022-08-04 21:11:58 +02:00
Raphaël Vinot 4f72d64735 new: Upload a file instead of submitting a URL. 2022-08-04 16:58:07 +02:00
Raphaël Vinot 3170038db7 new: dropdown to pass DoNotTrack HTTP header 2022-08-03 12:07:45 +02:00
    Improvements on the capture page.
Raphaël Vinot bcfaaec941 chg: Improve logging in archiver 2022-07-27 14:33:28 +02:00
Raphaël Vinot c9381873c7 chg: cleanup the file download feature 2022-07-19 17:54:45 +02:00
Arhamyss 47ecc7a4fa new: Download file 2022-07-19 10:07:36 -04:00
Raphaël Vinot 5f329e4d7b new: compress HAR files in archived captures. 2022-07-12 18:44:33 +02:00
Raphaël Vinot 6ba019ec83 chg: Improve somewhat the useragents available for capturing 2022-06-09 18:58:17 +02:00
    Fix #416
Raphaël Vinot 1817a3e13b chg: sunday cleanup 2022-05-23 00:15:52 +02:00
Raphaël Vinot d222ae04aa new: Keep capture even if we have a network error 2022-05-03 12:23:16 +02:00
Raphaël Vinot 463d1d2d1a new: autosubmit to FOX, bump deps 2022-05-02 13:04:55 +02:00
Raphaël Vinot ef1094a331 chg: Bump deps, fix cookie issue 2022-04-29 00:44:03 +02:00
    Fix #404
Raphaël Vinot 1679ccf90f chg: Improve capture, ignore ssl issues. 2022-04-26 13:49:24 +02:00
Raphaël Vinot 77fbf47e73 fix: capture cleanup 2022-04-26 10:25:11 +02:00
Raphaël Vinot 147bc65992 fix: Mypy, docker 2022-04-26 00:59:57 +02:00
Raphaël Vinot 41c7e87458 fix: docker, improve error catching 2022-04-26 00:33:50 +02:00
Raphaël Vinot 5af278f84d fix: issue in playwrightcapture module 2022-04-25 15:20:05 +02:00
Raphaël Vinot 4ad898a375 chg: Use packaged playwright capture module 2022-04-25 13:34:01 +02:00
Raphaël Vinot c93a6c307d chg: properly set cookies 2022-04-24 20:17:54 +03:00
Raphaël Vinot 680eb1b309 fix: better handling if capture fails. 2022-04-21 15:48:28 +03:00
Raphaël Vinot 8d159ffba0 new: Switch away from splash to use playwright 2022-04-21 14:55:07 +03:00
Raphaël Vinot 83fc0bd8f4 fix: shutil.move wants str (not Path) for python<3.9 2022-04-10 12:43:56 +02:00
Kimmo Linnavuo a80b6a31e4 Use shutil.move instead of path rename when moving discarded captures 2022-04-08 15:28:06 +03:00
Raphaël Vinot cf46dde1ed chg: Add basic pre-hook config 2022-03-31 11:30:53 +02:00
Raphaël Vinot ae9cb3e81c chg: Bump deps 2022-03-29 21:13:02 +02:00
Raphaël Vinot c9307b5159 chg: Improve start/stop for DBs 2021-12-02 14:39:32 +01:00
Raphaël Vinot a55fb5380a chg: Sync stop script with template 2021-11-26 14:16:22 -05:00
Raphaël Vinot d7c9892957 fix: Wait for DBs to be down before returning in stop script 2021-11-26 13:48:46 -05:00
Raphaël Vinot daca988f3f chg: better handling of broken indexes in archiver 2021-11-26 12:36:35 -05:00
Raphaël Vinot cef1088984 chg: programmatically shutdown DBs 2021-11-26 12:35:15 -05:00
Raphaël Vinot 58b50f2b24 new: Pass optional arbitrary HTTP headers to capture 2021-11-23 12:59:56 -08:00
Raphaël Vinot bfb1e6b181 fix: Use default_public for all capture, including if submitted via the API 2021-11-02 14:58:31 -07:00
Raphaël Vinot 1f998b457f chg: use template 2021-10-18 13:06:43 +02:00