Raphaël Vinot
|
d60a4e56db
|
chg: Update indexes only when needed
|
2024-01-08 16:27:12 +01:00 |
Raphaël Vinot
|
89dbef8683
|
chg: Avoid discarding the index lock too soon
|
2023-11-21 16:50:15 +01:00 |
Raphaël Vinot
|
9031141b61
|
chg: Remove empty dirs when everything has been archived
|
2023-11-21 11:50:09 +01:00 |
Raphaël Vinot
|
6d61645d97
|
chg: remove index when all the captures are archived
|
2023-11-20 23:48:56 +01:00 |
Raphaël Vinot
|
efe2124753
|
fix: Quit BG indexer when shutdown is requested. Improve exceptions handling in archiver
|
2023-11-20 11:45:41 +01:00 |
Raphaël Vinot
|
11a3b6b2f9
|
chg: Improve indexes cleanup
|
2023-11-18 03:20:49 +01:00 |
Raphaël Vinot
|
ff27808320
|
fix: Path in index may be the full path (old format)
|
2023-11-18 02:47:43 +01:00 |
Raphaël Vinot
|
cd11df7ac4
|
fix: Update index files to remove archived (or simply gone) captures
|
2023-11-18 02:39:21 +01:00 |
Raphaël Vinot
|
9a9c4464ed
|
fix: Update index for recent captures on every archive
|
2023-11-17 15:47:12 +01:00 |
Raphaël Vinot
|
ce76218657
|
fix: build backlog pickles in reverse order
|
2023-11-16 23:58:07 +01:00 |
Raphaël Vinot
|
096d7c6fb5
|
chg: clear old UUIDs found when archiving
|
2023-11-16 23:22:04 +01:00 |
Raphaël Vinot
|
f209ef22f1
|
fix: skip root directory when scanning on s3fs
|
2023-11-16 22:55:44 +01:00 |
Raphaël Vinot
|
7791eff842
|
new: Store directories by day, refactor indexing
|
2023-11-16 16:54:21 +01:00 |
Raphaël Vinot
|
1c5c178d20
|
fix: s3fs support was broken.
|
2023-10-23 15:59:14 +02:00 |
Raphaël Vinot
|
fcaeda8f7f
|
new: Use S3FS in archiving script instead, remove python 3.12 support
Also remove standalone script for updating archived indexes.
|
2023-10-23 13:57:44 +02:00 |
Raphaël Vinot
|
db9ca0ea2b
|
fix: Properly match 0/1 as string
|
2023-10-20 15:55:50 +02:00 |
Raphaël Vinot
|
a2ba5c551d
|
fix: allow auto_report to be "True" without any setting.
|
2023-10-20 15:48:28 +02:00 |
Raphaël Vinot
|
0daff9ef77
|
chg: settings tweaks, logging
|
2023-10-11 15:02:11 +02:00 |
Raphaël Vinot
|
b4599492f3
|
fix: Avoid exception killing website if non-responsive 3rd party module.
|
2023-10-11 14:57:53 +02:00 |
Raphaël Vinot
|
5ca7c5cb1d
|
fix: Typo in last commit
|
2023-10-10 21:40:49 +02:00 |
Raphaël Vinot
|
3e4eb572a0
|
chg: auto-restart webservers after 1000 requests
|
2023-10-10 21:32:26 +02:00 |
Raphaël Vinot
|
2920f796fe
|
fix: Speedup generating pickles in BG
|
2023-10-09 10:26:37 +02:00 |
Raphaël Vinot
|
f2c9647a9e
|
new: Don't attempt to initialize indexes if they're on a s3fs mount
|
2023-10-04 11:06:02 +02:00 |
Raphaël Vinot
|
e3b85508f1
|
fix: Attempt to check if a directory is empty faster.
|
2023-10-02 16:16:22 +02:00 |
Raphaël Vinot
|
f250cba632
|
chg: yet another attempt to improve checking archived captures
|
2023-10-02 15:50:46 +02:00 |
Raphaël Vinot
|
1220f5926d
|
fix: reduce calls to stat on archived dirs, improve logging
|
2023-09-29 15:00:40 +02:00 |
Raphaël Vinot
|
3b5e45a1e7
|
fix: Properly stop when there is nothing to archive
|
2023-09-23 11:38:25 +02:00 |
Raphaël Vinot
|
d4b9ca13af
|
chg: Avoid setting the lock and quitting, cleanup
|
2023-09-19 14:26:35 +02:00 |
Raphaël Vinot
|
1fcd4cfa3f
|
fix: Make sure we only get valid dirs out of the month directory
|
2023-09-19 12:51:34 +02:00 |
Raphaël Vinot
|
50406a921c
|
fix: Attempt to avoid listing non-existing directories
|
2023-09-19 12:14:00 +02:00 |
Raphaël Vinot
|
29fb60e9b7
|
fix: Avoid error when capture is in a weird state
|
2023-09-18 15:02:33 +02:00 |
Raphaël Vinot
|
1533e33ede
|
fix: remove locks from the archived directories
|
2023-09-18 10:38:05 +02:00 |
Raphaël Vinot
|
3fb9da8480
|
fix: Try to avoid exceptions and race conditions between archiving and build pickle
|
2023-09-18 01:31:28 +02:00 |
Raphaël Vinot
|
532b68dd07
|
fix: Avoid exception when attempting to move a capture
|
2023-09-18 00:33:59 +02:00 |
Raphaël Vinot
|
84175a944c
|
fix: Update index of the right directory
|
2023-09-15 15:38:23 +02:00 |
Raphaël Vinot
|
ae484cc14f
|
chg: Update index right after archiving
|
2023-09-15 13:25:37 +02:00 |
Raphaël Vinot
|
8c707b364f
|
fix: Missing f in f-string...
|
2023-09-15 11:42:33 +02:00 |
Raphaël Vinot
|
6c3f64e78a
|
chg: Avoid waiting when many captures need to be archived
|
2023-09-15 11:30:04 +02:00 |
Raphaël Vinot
|
416ca7224e
|
fix: Avoid race condition when re-enqueuing
|
2023-09-01 16:00:45 +02:00 |
Raphaël Vinot
|
2ec34e2b9b
|
chg: Improve archiver
|
2023-08-20 16:21:33 +02:00 |
Raphaël Vinot
|
d0d08b5882
|
fix: Avoid exception on empty dir in archiver.
|
2023-08-16 11:15:00 +02:00 |
Raphaël Vinot
|
447229ced3
|
chg: Compress HARs by default, update codebase accordingly
|
2023-08-11 13:16:59 +02:00 |
Raphaël Vinot
|
206e5957b5
|
new: Support for favicons fetching and display
Related https://github.com/Lookyloo/PlaywrightCapture/issues/45
|
2023-08-09 16:50:33 +02:00 |
Raphaël Vinot
|
e256a7fe6b
|
chg: Proper use of shutil.move, speedup initialization of CaptureCache
|
2023-08-08 12:41:21 +02:00 |
Raphaël Vinot
|
fa1d353834
|
chg: Avoid a few more disk access whenever possible.
|
2023-08-07 13:13:57 +02:00 |
Raphaël Vinot
|
212ba9a0d9
|
chg: Reduce disk usage
|
2023-08-06 21:34:20 +02:00 |
Raphaël Vinot
|
27b10f2c05
|
chg: Speedup indexes update
|
2023-08-05 20:47:08 +02:00 |
Raphaël Vinot
|
14df52a623
|
chg: Many improvements in archiver
|
2023-08-05 13:36:56 +02:00 |
Raphaël Vinot
|
e9dad5de61
|
chg: Attempt to reduce disk use
|
2023-08-04 15:03:58 +02:00 |
Raphaël Vinot
|
c203aa91b9
|
chg: Avoid directory listing as much as possible in archiver, allow shutdown
|
2023-08-04 14:02:45 +02:00 |
Raphaël Vinot
|
5fca6b13ea
|
chg: Show stacktrace when we cannot build the pickle
|
2023-08-04 13:15:39 +02:00 |
Raphaël Vinot
|
959b7ca96d
|
fix: use glob with path instead of rglob (faster)
|
2023-08-04 13:15:03 +02:00 |
Raphaël Vinot
|
4be8186cc6
|
chg: Improve readability of the background indexer
|
2023-07-30 16:59:41 +02:00 |
Raphaël Vinot
|
ea2ded9beb
|
fix: properly handle missing title in cache
|
2023-07-27 15:21:06 +02:00 |
Raphaël Vinot
|
ebfc2f00a5
|
fix: Exception when a formerly broken capture is re-processed and works
|
2023-07-27 14:56:39 +02:00 |
Raphaël Vinot
|
855485984f
|
fix: gracefully handle empty lists in hset, and duplicate UUIDs
|
2023-07-26 22:16:00 +02:00 |
Raphaël Vinot
|
fd9325bb0d
|
chg: Improve logging, add lock on indexer.
|
2023-07-26 12:37:12 +02:00 |
Raphaël Vinot
|
f60457a484
|
fix: Put the max captures counter at the right place...
|
2023-07-26 11:45:22 +02:00 |
Raphaël Vinot
|
fc5850e147
|
chg: Avoid building old pickles forever
|
2023-07-26 11:38:40 +02:00 |
Raphaël Vinot
|
a18f8f9675
|
chg: do not discard capture without HAR files
They are often just captures with an error file.
|
2023-07-25 20:29:30 +02:00 |
Raphaël Vinot
|
ef3432cbed
|
fix: Few more improvements on lockfile and broken captures.
|
2023-07-25 20:16:48 +02:00 |
Raphaël Vinot
|
484aec5ddd
|
fix: Properly handle lock file.
|
2023-07-25 19:29:53 +02:00 |
Raphaël Vinot
|
345a2f3f45
|
fix: Import method from the right file
|
2023-07-25 17:16:59 +02:00 |
Raphaël Vinot
|
3c50474ce4
|
fix: check if a tree.pickle.gz exists in the background indexer
|
2023-07-25 17:13:28 +02:00 |
Raphaël Vinot
|
0c7b3d9106
|
fix: indexer getting stuck when we had more than one at a time
|
2023-07-25 17:08:00 +02:00 |
Raphaël Vinot
|
177474e874
|
new: Basic support for HHHash
|
2023-07-21 15:48:20 +02:00 |
Raphaël Vinot
|
fec61d42ee
|
fix: Re-submit captures cleaned up too early in lacus
|
2023-06-27 11:33:56 +02:00 |
Raphaël Vinot
|
582b5956e9
|
new: Store capture settings, use TypedDict whenever possible.
|
2023-05-15 16:08:19 +02:00 |
Raphaël Vinot
|
6a9bcc0050
|
new: Automatic reporting via API
Related to #678
|
2023-04-28 17:19:53 +02:00 |
Raphaël Vinot
|
4ceae60db7
|
chg: Avoid stopping the captures before they're done
|
2023-04-09 13:58:34 +02:00 |
Raphaël Vinot
|
2ceda75eab
|
chg: Fairly big refactoring/cleanup to support LacusCore 1.4.0
|
2023-04-08 13:49:18 +02:00 |
Raphaël Vinot
|
9995371916
|
chg: Normalize logging on the config file settings
|
2023-04-05 16:23:46 +02:00 |
Raphaël Vinot
|
e410b7631e
|
fix: no decoding in archiver, catch exception when requesting hashes on broken capture
|
2023-03-16 14:47:24 +01:00 |
Raphaël Vinot
|
9497060028
|
fix: Cleanup prints, improve archiver.
|
2023-03-16 12:28:28 +01:00 |
Raphaël Vinot
|
96f1b2bd53
|
fix: Avoid exception if microsec is missing.
|
2023-03-12 19:25:16 +01:00 |
Raphaël Vinot
|
36d39f6076
|
new: Add PID in lock file, which allows checking if the locking process is still there
|
2023-02-26 17:20:17 +01:00 |
Raphaël Vinot
|
f0615fc54f
|
chg: Bump deps, prepare for v1.17.0 release
|
2022-12-28 16:25:44 +01:00 |
Raphaël Vinot
|
b7302d09b5
|
fix: pass the browser to the browser key.
|
2022-12-02 14:26:08 +01:00 |
Raphaël Vinot
|
00370291ac
|
new: Logging config in file
|
2022-11-23 15:54:22 +01:00 |
Raphaël Vinot
|
3c1cbd6ece
|
new: Very basic page to submit an existing capture via a HAR file
|
2022-11-19 01:32:17 +01:00 |
Raphaël Vinot
|
9677c4d120
|
new: Support lacus unreachable by caching locally
+ initialize lacus globally for consistency.
|
2022-11-01 18:10:25 +01:00 |
Raphaël Vinot
|
a48c6e0bd6
|
new: SIGTERM handling (PyLacus and LacusCore)
|
2022-10-28 12:40:28 +02:00 |
Raphaël Vinot
|
e3075060cd
|
chg: Properly type the response from LacusCore/PyLacus
|
2022-10-26 14:25:23 +02:00 |
Raphaël Vinot
|
93c3ea8d39
|
fix: Catch exception when Lacus is unreachable
|
2022-09-29 15:42:05 +02:00 |
Raphaël Vinot
|
a27683f090
|
fix: Match compressed HAR as valid for rebuild
|
2022-09-28 11:23:44 +02:00 |
Raphaël Vinot
|
d500872943
|
fix: Wait for capture to be done before processing it
|
2022-09-27 15:44:17 +02:00 |
Raphaël Vinot
|
5cd8169735
|
chg: Avoid captures without url(s) or document
|
2022-09-27 11:33:36 +02:00 |
Raphaël Vinot
|
f886b8676b
|
fix: More exception catching for the new caching method
|
2022-09-26 20:55:16 +02:00 |
Raphaël Vinot
|
edd8d786d3
|
chg: Do not try to build a tree if there are no HAR files
|
2022-09-26 15:59:04 +02:00 |
Raphaël Vinot
|
31261e84c2
|
fix: Better handling of half broken captures without HAR files
|
2022-09-26 14:58:30 +02:00 |
Raphaël Vinot
|
52b68fccdc
|
fix: make mypy happy, simplify code
|
2022-09-23 21:45:50 +02:00 |
Raphaël Vinot
|
2ec55be573
|
fix: Properly unset async capture when the queue is empty
|
2022-09-23 21:40:56 +02:00 |
Raphaël Vinot
|
354841b005
|
chg: Improve status reporting when a capture is ongoing
|
2022-09-23 21:33:38 +02:00 |
Raphaël Vinot
|
da33a7f5b3
|
chg: Avoid stacktrace when trying to generate broken capture
|
2022-09-23 14:46:19 +02:00 |
Raphaël Vinot
|
18b0b6e3cd
|
chg: Improve logging for archiver.
|
2022-09-23 14:32:42 +02:00 |
Raphaël Vinot
|
c7ca251e7a
|
chg: make to_capture key a ranked set again
|
2022-09-23 14:25:01 +02:00 |
Raphaël Vinot
|
19799c19af
|
chg: Re-enable start script
|
2022-09-23 13:13:09 +02:00 |
Raphaël Vinot
|
da502ee3d6
|
chg: Implement support for LacusCore *or* PyLacus
|
2022-09-23 13:13:09 +02:00 |
Raphaël Vinot
|
d38b612c37
|
chg: Bump lacuscore
|
2022-09-23 13:13:09 +02:00 |
Raphaël Vinot
|
9189888a0d
|
chg: Properly handle missing HAR
|
2022-09-23 13:13:09 +02:00 |