Raphaël Vinot
|
14df52a623
|
chg: Many improvements in archiver
|
2023-08-05 13:36:56 +02:00 |
Raphaël Vinot
|
e9dad5de61
|
chg: Attempt to reduce disk use
|
2023-08-04 15:03:58 +02:00 |
Raphaël Vinot
|
c203aa91b9
|
chg: Avoid directory listing as much as possible in archiver, allow shutdown
|
2023-08-04 14:02:45 +02:00 |
Raphaël Vinot
|
5fca6b13ea
|
chg: Show stacktrace when we cannot build the pickle
|
2023-08-04 13:15:39 +02:00 |
Raphaël Vinot
|
959b7ca96d
|
fix: use glob with path instead of rglob (faster)
|
2023-08-04 13:15:03 +02:00 |
Raphaël Vinot
|
4be8186cc6
|
chg: Improve readability of the background indexer
|
2023-07-30 16:59:41 +02:00 |
Raphaël Vinot
|
ea2ded9beb
|
fix: properly handle missing title in cache
|
2023-07-27 15:21:06 +02:00 |
Raphaël Vinot
|
ebfc2f00a5
|
fix: Exception when a formerly broken capture is re-processed and works
|
2023-07-27 14:56:39 +02:00 |
Raphaël Vinot
|
855485984f
|
fix: gracefully handle empty lists in hset, and duplicate UUIDs
|
2023-07-26 22:16:00 +02:00 |
Raphaël Vinot
|
fd9325bb0d
|
chg: Improve logging, add lock on indexer.
|
2023-07-26 12:37:12 +02:00 |
Raphaël Vinot
|
f60457a484
|
fix: Put the max captures counter at the right place...
|
2023-07-26 11:45:22 +02:00 |
Raphaël Vinot
|
fc5850e147
|
chg: Avoid building old pickles forever
|
2023-07-26 11:38:40 +02:00 |
Raphaël Vinot
|
a18f8f9675
|
chg: do not discard capture without HAR files
They are often just captures with an error file.
|
2023-07-25 20:29:30 +02:00 |
Raphaël Vinot
|
ef3432cbed
|
fix: Few more improvements on lockfile and broken captures.
|
2023-07-25 20:16:48 +02:00 |
Raphaël Vinot
|
484aec5ddd
|
fix: Properly handle lock file.
|
2023-07-25 19:29:53 +02:00 |
Raphaël Vinot
|
345a2f3f45
|
fix: Import method from the right file
|
2023-07-25 17:16:59 +02:00 |
Raphaël Vinot
|
3c50474ce4
|
fix: check if a tree.pickle.gz exists in the background indexer
|
2023-07-25 17:13:28 +02:00 |
Raphaël Vinot
|
0c7b3d9106
|
fix: indexer getting stuck when we had more than one at a time
|
2023-07-25 17:08:00 +02:00 |
Raphaël Vinot
|
177474e874
|
new: Basic support for HHHash
|
2023-07-21 15:48:20 +02:00 |
Raphaël Vinot
|
fec61d42ee
|
fix: Re-submit captures cleaned up too early in lacus
|
2023-06-27 11:33:56 +02:00 |
Raphaël Vinot
|
582b5956e9
|
new: Store capture settings, use TypedDict whenever possible.
|
2023-05-15 16:08:19 +02:00 |
Raphaël Vinot
|
6a9bcc0050
|
new: Automatic reporting via API
Related to #678
|
2023-04-28 17:19:53 +02:00 |
Raphaël Vinot
|
4ceae60db7
|
chg: Avoid stopping the captures before they're done
|
2023-04-09 13:58:34 +02:00 |
Raphaël Vinot
|
2ceda75eab
|
chg: Fairly big refactoring/cleanup to support LacusCore 1.4.0
|
2023-04-08 13:49:18 +02:00 |
Raphaël Vinot
|
9995371916
|
chg: Normalize logging on the config file settings
|
2023-04-05 16:23:46 +02:00 |
Raphaël Vinot
|
e410b7631e
|
fix: no decoding in archiver, catch exception when requesting hashes on broken capture
|
2023-03-16 14:47:24 +01:00 |
Raphaël Vinot
|
9497060028
|
fix: Cleanup prints, improve archiver.
|
2023-03-16 12:28:28 +01:00 |
Raphaël Vinot
|
96f1b2bd53
|
fix: Avoid exception if microsec is missing.
|
2023-03-12 19:25:16 +01:00 |
Raphaël Vinot
|
36d39f6076
|
new: Add PID in lock file, allows checking whether the locking process is still there
|
2023-02-26 17:20:17 +01:00 |
Raphaël Vinot
|
f0615fc54f
|
chg: Bump deps, prepare for v1.17.0 release
|
2022-12-28 16:25:44 +01:00 |
Raphaël Vinot
|
b7302d09b5
|
fix: pass the browser to the browser key.
|
2022-12-02 14:26:08 +01:00 |
Raphaël Vinot
|
00370291ac
|
new: Logging config in file
|
2022-11-23 15:54:22 +01:00 |
Raphaël Vinot
|
3c1cbd6ece
|
new: Very basic page to submit an existing capture via a HAR file
|
2022-11-19 01:32:17 +01:00 |
Raphaël Vinot
|
9677c4d120
|
new: Support lacus unreachable by caching locally
+ initialize lacus globally for consistency.
|
2022-11-01 18:10:25 +01:00 |
Raphaël Vinot
|
a48c6e0bd6
|
new: SIGTERM handling (PyLacus and LacusCore)
|
2022-10-28 12:40:28 +02:00 |
Raphaël Vinot
|
e3075060cd
|
chg: Properly type the response from LacusCore/PyLacus
|
2022-10-26 14:25:23 +02:00 |
Raphaël Vinot
|
93c3ea8d39
|
fix: Catch exception when Lacus is unreachable
|
2022-09-29 15:42:05 +02:00 |
Raphaël Vinot
|
a27683f090
|
fix: Match compressed HAR as valid for rebuild
|
2022-09-28 11:23:44 +02:00 |
Raphaël Vinot
|
d500872943
|
fix: Wait for capture to be done before processing it
|
2022-09-27 15:44:17 +02:00 |
Raphaël Vinot
|
5cd8169735
|
chg: Avoid captures without url(s) or document
|
2022-09-27 11:33:36 +02:00 |
Raphaël Vinot
|
f886b8676b
|
fix: More exception catching for the new caching method
|
2022-09-26 20:55:16 +02:00 |
Raphaël Vinot
|
edd8d786d3
|
chg: Do not try to build a tree if there are no HAR files
|
2022-09-26 15:59:04 +02:00 |
Raphaël Vinot
|
31261e84c2
|
fix: Better handling of half broken captures without HAR files
|
2022-09-26 14:58:30 +02:00 |
Raphaël Vinot
|
52b68fccdc
|
fix: make mypy happy, simplify code
|
2022-09-23 21:45:50 +02:00 |
Raphaël Vinot
|
2ec55be573
|
fix: Properly unset async capture when the queue is empty
|
2022-09-23 21:40:56 +02:00 |
Raphaël Vinot
|
354841b005
|
chg: Improve status reporting when a capture is ongoing
|
2022-09-23 21:33:38 +02:00 |
Raphaël Vinot
|
da33a7f5b3
|
chg: Avoid stacktrace when trying to generate broken capture
|
2022-09-23 14:46:19 +02:00 |
Raphaël Vinot
|
18b0b6e3cd
|
chg: Improve logging for archiver.
|
2022-09-23 14:32:42 +02:00 |
Raphaël Vinot
|
c7ca251e7a
|
chg: make to_capture key a ranked set again
|
2022-09-23 14:25:01 +02:00 |
Raphaël Vinot
|
19799c19af
|
chg: Re-enable start script
|
2022-09-23 13:13:09 +02:00 |
Raphaël Vinot
|
da502ee3d6
|
chg: Implement support for LacusCore *or* PyLacus
|
2022-09-23 13:13:09 +02:00 |
Raphaël Vinot
|
d38b612c37
|
chg: Bump lacuscore
|
2022-09-23 13:13:09 +02:00 |
Raphaël Vinot
|
9189888a0d
|
chg: Properly handle missing HAR
|
2022-09-23 13:13:09 +02:00 |
Raphaël Vinot
|
623813167e
|
chg: Add missing bits
|
2022-09-23 13:13:09 +02:00 |
Raphaël Vinot
|
318f554db3
|
chg: move to lacus, WiP
|
2022-09-23 13:13:09 +02:00 |
Raphaël Vinot
|
812c63b0f2
|
fix: error in UAs, typing
|
2022-09-05 18:58:45 +02:00 |
Raphaël Vinot
|
c6464936fc
|
chg: Bump to poetry v1.2, remove dep on setuptools
|
2022-08-31 16:33:13 +02:00 |
Raphaël Vinot
|
f232eba662
|
chg: Improve UA rendering
|
2022-08-23 17:44:48 +02:00 |
Raphaël Vinot
|
ebbe6e3ce9
|
new: Pick mobile devices on capture page
|
2022-08-22 17:34:00 +02:00 |
Raphaël Vinot
|
35789f549b
|
fix: Exception on invalid capture
|
2022-08-20 23:33:32 +02:00 |
Raphaël Vinot
|
d63ea473f5
|
new: Autoselect browser engine based on the UA
|
2022-08-19 14:26:22 +02:00 |
Raphaël Vinot
|
998ef12b06
|
new: Add support for playwright devices and browser name (API only)
|
2022-08-18 11:19:32 +02:00 |
Raphaël Vinot
|
e89e9a20cb
|
fix: Force BG processor to index all the recent captures
|
2022-08-12 01:08:28 +02:00 |
Raphaël Vinot
|
be2e1ddc33
|
fix: properly handle listing configuration, clear None from queries before passing to redis
|
2022-08-10 18:53:14 +02:00 |
Raphaël Vinot
|
49f335405e
|
fix: Avoid exceptions on invalid requests
|
2022-08-05 11:28:44 +02:00 |
Raphaël Vinot
|
4280a4e11f
|
fix: Support for document on public instances.
|
2022-08-04 21:28:47 +02:00 |
Raphaël Vinot
|
94bae7c5e3
|
chg: Avoid exception on broken captures
|
2022-08-04 21:11:58 +02:00 |
Raphaël Vinot
|
4f72d64735
|
new: Upload a file instead of submitting a URL.
|
2022-08-04 16:58:07 +02:00 |
Raphaël Vinot
|
3170038db7
|
new: dropdown to pass DoNotTrack HTTP header
Improvements on the capture page.
|
2022-08-03 12:07:45 +02:00 |
Raphaël Vinot
|
bcfaaec941
|
chg: Improve logging in archiver
|
2022-07-27 14:33:28 +02:00 |
Raphaël Vinot
|
c9381873c7
|
chg: cleanup the file download feature
|
2022-07-19 17:54:45 +02:00 |
Arhamyss
|
47ecc7a4fa
|
new: Download file
|
2022-07-19 10:07:36 -04:00 |
Raphaël Vinot
|
5f329e4d7b
|
new: compress HAR files in archived captures.
|
2022-07-12 18:44:33 +02:00 |
Raphaël Vinot
|
6ba019ec83
|
chg: Somewhat improve the user agents available for capturing
Fix #416
|
2022-06-09 18:58:17 +02:00 |
Raphaël Vinot
|
1817a3e13b
|
chg: sunday cleanup
|
2022-05-23 00:15:52 +02:00 |
Raphaël Vinot
|
d222ae04aa
|
new: Keep capture even if we have a network error
|
2022-05-03 12:23:16 +02:00 |
Raphaël Vinot
|
463d1d2d1a
|
new: autosubmit to FOX, bump deps
|
2022-05-02 13:04:55 +02:00 |
Raphaël Vinot
|
ef1094a331
|
chg: Bump deps, fix cookie issue
Fix #404
|
2022-04-29 00:44:03 +02:00 |
Raphaël Vinot
|
1679ccf90f
|
chg: Improve capture, ignore ssl issues.
|
2022-04-26 13:49:24 +02:00 |
Raphaël Vinot
|
77fbf47e73
|
fix: capture cleanup
|
2022-04-26 10:25:11 +02:00 |
Raphaël Vinot
|
147bc65992
|
fix: Mypy, docker
|
2022-04-26 00:59:57 +02:00 |
Raphaël Vinot
|
41c7e87458
|
fix: docker, improve error catching
|
2022-04-26 00:33:50 +02:00 |
Raphaël Vinot
|
5af278f84d
|
fix: issue in playwrightcapture module
|
2022-04-25 15:20:05 +02:00 |
Raphaël Vinot
|
4ad898a375
|
chg: Use packaged playwright capture module
|
2022-04-25 13:34:01 +02:00 |
Raphaël Vinot
|
c93a6c307d
|
chg: properly set cookies
|
2022-04-24 20:17:54 +03:00 |
Raphaël Vinot
|
680eb1b309
|
fix: better handling if capture fails.
|
2022-04-21 15:48:28 +03:00 |
Raphaël Vinot
|
8d159ffba0
|
new: Switch away from splash to use playwright
|
2022-04-21 14:55:07 +03:00 |
Raphaël Vinot
|
83fc0bd8f4
|
fix: shutil.move wants str (not Path) for python<3.9
|
2022-04-10 12:43:56 +02:00 |
Kimmo Linnavuo
|
a80b6a31e4
|
Use shutil.move instead of path rename when moving discarded captures
|
2022-04-08 15:28:06 +03:00 |
Raphaël Vinot
|
cf46dde1ed
|
chg: Add basic pre-hook config
|
2022-03-31 11:30:53 +02:00 |
Raphaël Vinot
|
ae9cb3e81c
|
chg: Bump deps
|
2022-03-29 21:13:02 +02:00 |
Raphaël Vinot
|
c9307b5159
|
chg: Improve start/stop for DBs
|
2021-12-02 14:39:32 +01:00 |
Raphaël Vinot
|
a55fb5380a
|
chg: Sync stop script with template
|
2021-11-26 14:16:22 -05:00 |
Raphaël Vinot
|
d7c9892957
|
fix: Wait for DBs to be down before returning in stop script
|
2021-11-26 13:48:46 -05:00 |
Raphaël Vinot
|
daca988f3f
|
chg: better handling of broken indexes in archiver
|
2021-11-26 12:36:35 -05:00 |
Raphaël Vinot
|
cef1088984
|
chg: programmatically shutdown DBs
|
2021-11-26 12:35:15 -05:00 |
Raphaël Vinot
|
58b50f2b24
|
new: Pass optional arbitrary HTTP headers to capture
|
2021-11-23 12:59:56 -08:00 |
Raphaël Vinot
|
bfb1e6b181
|
fix: Use default_public for all capture, including if submitted via the API
|
2021-11-02 14:58:31 -07:00 |
Raphaël Vinot
|
1f998b457f
|
chg: use template
|
2021-10-18 13:06:43 +02:00 |
Raphaël Vinot
|
6e9e3990c4
|
fix: Indexes not updated on tree rebuild, better handling of tree cache
|
2021-09-24 16:16:41 +02:00 |
Raphaël Vinot
|
48fc807e7d
|
new: Add monitoring for pickle cache status
|
2021-09-24 12:02:28 +02:00 |
Raphaël Vinot
|
32ee474be2
|
chg: Improve tree creation and cache
|
2021-09-22 17:09:04 +02:00 |
Raphaël Vinot
|
d1f673f3a7
|
chg: Cleanup passing listing key to and from bool in redis
|
2021-09-10 14:20:58 +02:00 |
Raphaël Vinot
|
9c7929569e
|
fix: The captures are visible on the index by default.
|
2021-09-08 20:43:56 +02:00 |
Raphaël Vinot
|
48b632aa1e
|
fix: Incorrect matching for listing key in capture (always false)
|
2021-09-08 10:53:31 +02:00 |
Raphaël Vinot
|
902c8f81b6
|
chg: Improve error message if the capture fails
Fix #257
|
2021-09-07 18:16:01 +02:00 |
Raphaël Vinot
|
dfbe40a52e
|
chg: reorder imports
|
2021-09-07 16:00:07 +02:00 |
Raphaël Vinot
|
c09adec333
|
chg: Improve logging.
|
2021-09-01 14:08:25 +02:00 |
Raphaël Vinot
|
797de9ddb3
|
fix: remove datefmt from logging.basicConfig, it was a bad idea.
|
2021-09-01 10:40:59 +02:00 |
Raphaël Vinot
|
2e5a5f3aff
|
fix: unlink indexes pointing to unknown directories
|
2021-08-30 14:45:44 +02:00 |
Raphaël Vinot
|
e56c70d1a1
|
chg: out of safety, do not remove a capture dir.
|
2021-08-30 12:54:17 +02:00 |
Raphaël Vinot
|
117500b777
|
chg: Make archiver an index generator
|
2021-08-30 12:48:13 +02:00 |
Raphaël Vinot
|
324736f62c
|
fix: Use proper exception on redis start
|
2021-08-27 18:08:34 +02:00 |
Raphaël Vinot
|
ae76cb77be
|
fix: Uncomment website start
|
2021-08-27 17:49:27 +02:00 |
Raphaël Vinot
|
8a51383d7a
|
chg: Move the process management methods to the proper class
|
2021-08-27 17:28:26 +02:00 |
Raphaël Vinot
|
85e43fc677
|
chg: Make the website start a normal start script
|
2021-08-27 16:45:16 +02:00 |
Raphaël Vinot
|
d41b7735dd
|
chg: Improve storage, support both modes.
|
2021-08-26 15:49:19 +02:00 |
Raphaël Vinot
|
407e78ae7f
|
chg: More cleanup, support clean shutdown of multiple async captures
|
2021-08-25 16:40:51 +02:00 |
Raphaël Vinot
|
bf700e7a7b
|
chg: Major refactoring, move capture code to external script.
|
2021-08-25 13:36:48 +02:00 |
Raphaël Vinot
|
c732e38395
|
chg: Add logging in BG processing
|
2021-08-24 18:44:00 +02:00 |
Raphaël Vinot
|
81390d5ea0
|
chg: cleanup in the main lookyloo class
|
2021-08-24 18:32:54 +02:00 |
Raphaël Vinot
|
8433cbcc1b
|
chg: Cleanup archiver, initialize index captures in start
|
2021-08-24 17:10:14 +02:00 |
Raphaël Vinot
|
ece30a33eb
|
chg: Fix typo in archiver
|
2021-08-23 16:56:17 +02:00 |
Raphaël Vinot
|
fb1685cedc
|
add: reset recent captures in archiving process
|
2021-08-23 16:19:50 +02:00 |
Raphaël Vinot
|
8f28335010
|
fix: properly match cut time
|
2021-08-23 15:51:06 +02:00 |
Raphaël Vinot
|
2c1971311a
|
chg: Make the cut-off date for archiving the 1st of the month
|
2021-08-23 15:36:59 +02:00 |
Raphaël Vinot
|
5c9b88a3ca
|
fix: Make sure all the archived UUIDs are removed
|
2021-08-23 15:29:21 +02:00 |
Raphaël Vinot
|
67e6571145
|
chg: Force init the archived indexes
|
2021-08-23 15:14:08 +02:00 |
Raphaël Vinot
|
53ceb9c329
|
chg: Cleanup when dir is moved, months on 2 digits
|
2021-08-23 14:53:19 +02:00 |
Raphaël Vinot
|
d359bc7521
|
chg: Better use of cache, sanity checks
|
2021-08-23 12:17:44 +02:00 |
Raphaël Vinot
|
58b837cb6c
|
new: Archiver, refactoring.
|
2021-08-20 17:46:22 +02:00 |
Raphaël Vinot
|
6be9b69d95
|
chg: Use connection pool whenever possible
|
2021-08-18 18:01:04 +02:00 |
Raphaël Vinot
|
59f2a510c0
|
fix: properly catch broken capture, bump deps
|
2021-07-14 11:34:10 +02:00 |
Raphaël Vinot
|
1117ab6371
|
chg: add stats, avoid building big trees twice, bump deps
|
2021-05-26 18:25:06 -07:00 |
Raphaël Vinot
|
335ab662cf
|
new: Auto trigger modules in the bg process
|
2021-05-19 15:12:35 -07:00 |
Raphaël Vinot
|
f865ec912a
|
fix: Move set/unset running to abstract
Avoid issues when a script fails unexpectedly.
|
2021-04-09 14:33:42 +02:00 |
Raphaël Vinot
|
7707d638cf
|
new: Use async capture for the UI.
Add a method to make sure splash is up before trying to capture.
|
2021-04-08 19:15:53 +02:00 |
Raphaël Vinot
|
4847fdb670
|
fix: Windows path in update
|
2021-04-06 17:43:45 +02:00 |
Raphaël Vinot
|
c38ec90bb1
|
fix: Make update script windows compatible
|
2021-04-06 17:27:59 +02:00 |
Raphaël Vinot
|
fa6b4701c0
|
chg: update the cache at the right place.
|
2021-03-20 21:54:46 +01:00 |
Raphaël Vinot
|
13d34421dc
|
chg: Improve BG indexer
|
2021-03-20 01:13:37 +01:00 |
Raphaël Vinot
|
648d4d5b5b
|
chg: Add background ingester to the start script
|
2021-03-18 01:00:27 +01:00 |
Raphaël Vinot
|
b3541e0e78
|
new: background indexer
|
2021-03-12 16:53:00 +01:00 |
Raphaël Vinot
|
6059cb5219
|
chg: Remove useless code
|
2021-03-12 16:49:04 +01:00 |
Raphaël Vinot
|
82d9cc7b2f
|
fix: Properly rebuild indexed captures
|
2021-03-07 13:25:27 +01:00 |
Raphaël Vinot
|
3ec8015e14
|
chg: Better messages if website does not start
|
2021-02-21 23:40:47 +01:00 |
Raphaël Vinot
|
6149df06eb
|
chg: Make the cache entries a dataclass
Fix #99
|
2021-01-14 17:12:23 +01:00 |
Raphaël Vinot
|
354f269218
|
new: Integrate categorization in indexing
|
2020-11-09 16:02:54 +01:00 |
Raphaël Vinot
|
ea052c7c12
|
fix: Rename scrape -> capture in async
|
2020-11-05 14:14:33 +01:00 |
Raphaël Vinot
|
8b1e3585ea
|
chg: Improve initial caching.
|
2020-10-29 23:25:20 +01:00 |