263 Commits (33c6da7256c3d43f65ea772f625c93c3118567f0)

Author SHA1 Message Date
Raphaël Vinot 6ba019ec83 chg: Improve somewhat the useragents available for capturing. Fix #416 2022-06-09 18:58:17 +02:00
Raphaël Vinot 1817a3e13b chg: sunday cleanup 2022-05-23 00:15:52 +02:00
Raphaël Vinot d222ae04aa new: Keep capture even if we have a network error 2022-05-03 12:23:16 +02:00
Raphaël Vinot 463d1d2d1a new: autosubmit to FOX, bump deps 2022-05-02 13:04:55 +02:00
Raphaël Vinot ef1094a331 chg: Bump deps, fix cookie issue. Fix #404 2022-04-29 00:44:03 +02:00
Raphaël Vinot 1679ccf90f chg: Improve capture, ignore ssl issues. 2022-04-26 13:49:24 +02:00
Raphaël Vinot 77fbf47e73 fix: capture cleanup 2022-04-26 10:25:11 +02:00
Raphaël Vinot 147bc65992 fix: Mypy, docker 2022-04-26 00:59:57 +02:00
Raphaël Vinot 41c7e87458 fix: docker, improve error catching 2022-04-26 00:33:50 +02:00
Raphaël Vinot 5af278f84d fix: issue in playwrightcapture module 2022-04-25 15:20:05 +02:00
Raphaël Vinot 4ad898a375 chg: Use packaged playwright capture module 2022-04-25 13:34:01 +02:00
Raphaël Vinot c93a6c307d chg: properly set cookies 2022-04-24 20:17:54 +03:00
Raphaël Vinot 680eb1b309 fix: better handling if capture fails. 2022-04-21 15:48:28 +03:00
Raphaël Vinot 8d159ffba0 new: Switch away from splash to use playwright 2022-04-21 14:55:07 +03:00
Raphaël Vinot 83fc0bd8f4 fix: shutil.move wants str (not Path) for python<3.9 2022-04-10 12:43:56 +02:00
Kimmo Linnavuo a80b6a31e4 Use shutil.move instead of path rename when moving discarded captures 2022-04-08 15:28:06 +03:00
Raphaël Vinot cf46dde1ed chg: Add basic pre-hook config 2022-03-31 11:30:53 +02:00
Raphaël Vinot ae9cb3e81c chg: Bump deps 2022-03-29 21:13:02 +02:00
Raphaël Vinot c9307b5159 chg: Improve start/stop for DBs 2021-12-02 14:39:32 +01:00
Raphaël Vinot a55fb5380a chg: Sync stop script with template 2021-11-26 14:16:22 -05:00
Raphaël Vinot d7c9892957 fix: Wait for DBs to be down before returning in stop script 2021-11-26 13:48:46 -05:00
Raphaël Vinot daca988f3f chg: better handling of broken indexes in archiver 2021-11-26 12:36:35 -05:00
Raphaël Vinot cef1088984 chg: programmatically shutdown DBs 2021-11-26 12:35:15 -05:00
Raphaël Vinot 58b50f2b24 new: Pass optional arbitrary HTTP headers to capture 2021-11-23 12:59:56 -08:00
Raphaël Vinot bfb1e6b181 fix: Use default_public for all capture, including if submitted via the API 2021-11-02 14:58:31 -07:00
Raphaël Vinot 1f998b457f chg: use template 2021-10-18 13:06:43 +02:00
Raphaël Vinot 6e9e3990c4 fix: Indexes not updated on tree rebuild, better handling of tree cache 2021-09-24 16:16:41 +02:00
Raphaël Vinot 48fc807e7d new: Add monitoring for pickle cache status 2021-09-24 12:02:28 +02:00
Raphaël Vinot 32ee474be2 chg: Improve tree creation and cache 2021-09-22 17:09:04 +02:00
Raphaël Vinot d1f673f3a7 chg: Cleanup passing listing key to and from bool in redis 2021-09-10 14:20:58 +02:00
Raphaël Vinot 9c7929569e fix: The captures are visible on the index by default. 2021-09-08 20:43:56 +02:00
Raphaël Vinot 48b632aa1e fix: Incorrect matching for listing key in capture (always false) 2021-09-08 10:53:31 +02:00
Raphaël Vinot 902c8f81b6 chg: Improve error message if the capture fails. Fix #257 2021-09-07 18:16:01 +02:00
Raphaël Vinot dfbe40a52e chg: reorder imports 2021-09-07 16:00:07 +02:00
Raphaël Vinot c09adec333 chg: Improve logging. 2021-09-01 14:08:25 +02:00
Raphaël Vinot 797de9ddb3 fix: remove datefmt from logging.basicConfig, it was a bad idea. 2021-09-01 10:40:59 +02:00
Raphaël Vinot 2e5a5f3aff fix: unlink indexes pointing to unknown directories 2021-08-30 14:45:44 +02:00
Raphaël Vinot e56c70d1a1 chg: out of safety, do not remove a capture dir. 2021-08-30 12:54:17 +02:00
Raphaël Vinot 117500b777 chg: Make archiver an index generator 2021-08-30 12:48:13 +02:00
Raphaël Vinot 324736f62c fix: Use proper exception on redis start 2021-08-27 18:08:34 +02:00
Raphaël Vinot ae76cb77be fix: Uncomment website start 2021-08-27 17:49:27 +02:00
Raphaël Vinot 8a51383d7a chg: Move the process management methods to the proper class 2021-08-27 17:28:26 +02:00
Raphaël Vinot 85e43fc677 chg: Make the website start a normal start script 2021-08-27 16:45:16 +02:00
Raphaël Vinot d41b7735dd chg: Improve storage, support both modes. 2021-08-26 15:49:19 +02:00
Raphaël Vinot 407e78ae7f chg: More cleanup, support clean shutdown of multiple async captures 2021-08-25 16:40:51 +02:00
Raphaël Vinot bf700e7a7b chg: Major refactoring, move capture code to external script. 2021-08-25 13:36:48 +02:00
Raphaël Vinot c732e38395 chg: Add logging in BG processing 2021-08-24 18:44:00 +02:00
Raphaël Vinot 81390d5ea0 chg: cleanup in the mail lookyloo class 2021-08-24 18:32:54 +02:00
Raphaël Vinot 8433cbcc1b chg: Cleanup archiver, initialize index captures in start 2021-08-24 17:10:14 +02:00
Raphaël Vinot ece30a33eb chg: Fix typo in archiver 2021-08-23 16:56:17 +02:00
Raphaël Vinot fb1685cedc add: reset recent captures in archiving process 2021-08-23 16:19:50 +02:00
Raphaël Vinot 8f28335010 fix: properly match cut time 2021-08-23 15:51:06 +02:00
Raphaël Vinot 2c1971311a chg: Make the cut-off date for archiving the 1st of the month 2021-08-23 15:36:59 +02:00
Raphaël Vinot 5c9b88a3ca fix: Make sure all the archived UUIDs are removed 2021-08-23 15:29:21 +02:00
Raphaël Vinot 67e6571145 chg: Force init the archived indexes 2021-08-23 15:14:08 +02:00
Raphaël Vinot 53ceb9c329 chg: Cleanup when dir is moved, digit months on 2 values 2021-08-23 14:53:19 +02:00
Raphaël Vinot d359bc7521 chg: Better use of cache, sanity checks 2021-08-23 12:17:44 +02:00
Raphaël Vinot 58b837cb6c new: Archiver, refactoring. 2021-08-20 17:46:22 +02:00
Raphaël Vinot 6be9b69d95 chg: Use connection pool whenever possible 2021-08-18 18:01:04 +02:00
Raphaël Vinot 59f2a510c0 fix: properly catch broken capture, bump deps 2021-07-14 11:34:10 +02:00
Raphaël Vinot 1117ab6371 chg: add stats, avoid building big trees twice, bump deps 2021-05-26 18:25:06 -07:00
Raphaël Vinot 335ab662cf new: Auto trigger modules in the bg process 2021-05-19 15:12:35 -07:00
Raphaël Vinot f865ec912a fix: Move set/unset running to abstract. Avoid issues when a script fails unexpectedly. 2021-04-09 14:33:42 +02:00
Raphaël Vinot 7707d638cf new: Use async capture for the UI. Add a method to make sure splash is up before trying to capture. 2021-04-08 19:15:53 +02:00
Raphaël Vinot 4847fdb670 fix: Windows path in update 2021-04-06 17:43:45 +02:00
Raphaël Vinot c38ec90bb1 fix: Make update script windows compatible 2021-04-06 17:27:59 +02:00
Raphaël Vinot fa6b4701c0 chg: update the cache at the right place. 2021-03-20 21:54:46 +01:00
Raphaël Vinot 13d34421dc chg: Improve BG indexer 2021-03-20 01:13:37 +01:00
Raphaël Vinot 648d4d5b5b chg: Add background ingester to the start script 2021-03-18 01:00:27 +01:00
Raphaël Vinot b3541e0e78 new: background indexer 2021-03-12 16:53:00 +01:00
Raphaël Vinot 6059cb5219 chg: Remove useless code 2021-03-12 16:49:04 +01:00
Raphaël Vinot 82d9cc7b2f fix: Properly rebuild indexed captures 2021-03-07 13:25:27 +01:00
Raphaël Vinot 3ec8015e14 chg: Better messages if website does not start 2021-02-21 23:40:47 +01:00
Raphaël Vinot 6149df06eb chg: Make the cache entries a dataclass. Fix #99 2021-01-14 17:12:23 +01:00
Raphaël Vinot 354f269218 new: Integrate categorization in indexing 2020-11-09 16:02:54 +01:00
Raphaël Vinot ea052c7c12 fix: Rename scrape -> capture in async 2020-11-05 14:14:33 +01:00
Raphaël Vinot 8b1e3585ea chg: Improve initial caching. 2020-10-29 23:25:20 +01:00
Raphaël Vinot 39f88e9121 new: API to query URLs 2020-10-27 00:02:18 +01:00
Raphaël Vinot c6c4da981c chg: Improve start/stop 2020-10-22 16:41:00 +02:00
Raphaël Vinot 733e030839 fix: make flake8 happy 2020-10-13 16:38:56 +02:00
Fafner [_KeyZee_] 81fb9db3cf Generating correct hashes 2020-10-13 16:18:01 +02:00
Fafner [_KeyZee_] 3d17a98799 Restart lookyloo after update 2020-10-13 15:05:36 +02:00
Raphaël Vinot 90a9ff9bb5 chg: Refactoring, add get_hashes 2020-10-09 18:05:25 +02:00
Raphaël Vinot 98eac69d1f new: Add self check in update script. 2020-10-03 21:32:30 +02:00
Raphaël Vinot c0ec0d7a50 chg: Bump minimal version of poetry, bump deps, fix pyproject 2020-10-03 21:19:43 +02:00
Raphaël Vinot 26cb2f1d53 chg: make 3rd party dl a python script 2020-09-28 13:57:21 +02:00
Raphaël Vinot d33698357c new: Update script. 2020-09-28 13:32:19 +02:00
Raphaël Vinot 8c97701ed7 fix: Force kill 3rdparty.sh 2020-09-22 16:33:55 +02:00
Raphaël Vinot 7a34095d9c new: Config option for Flask IP and Port, reorganize config loading 2020-09-21 16:41:30 +02:00
Raphaël Vinot 9f4c77d5d2 chg: Cleanups, allow to add context from resources page 2020-09-03 16:32:53 +02:00
Raphaël Vinot 1c5f4f5710 fix: Do not index private captures on public instance 2020-07-20 13:39:08 +02:00
Raphaël Vinot dab2c53269 chg: More reasonable rebuild cache 2020-07-08 18:28:07 +02:00
Raphaël Vinot 0c5501016c fix: Rebuild caches when tree doesn't exist 2020-07-08 15:52:26 +02:00
Raphaël Vinot 23419a31b9 fix: cleanup 2020-07-08 15:52:26 +02:00
Raphaël Vinot 29c78d3485 chg: Cleanup and improve index rendering 2020-07-08 15:51:45 +02:00
Raphaël Vinot 67b41ca8fb chg: Improve integration of cookies indexing 2020-07-08 15:51:01 +02:00
Raphaël Vinot 5ae7f0f7e4 new: Initial version of cookies indexing 2020-07-08 15:42:13 +02:00
Raphaël Vinot d18f5f4f88 fix: Docker, capture form, error message. 2020-07-08 02:25:15 +02:00
Raphaël Vinot 05de56022f chg: Use capture UUID as a reference everywhere 2020-06-29 12:01:31 +02:00
Raphaël Vinot ce200717ec chg: Update call to Lookyloo in async scrape 2020-03-31 16:57:16 +02:00
Raphaël Vinot ba8574ff83 chg: Do not use eventlet/gevent anymore. 2020-01-23 11:32:36 +01:00
Raphaël Vinot d34233ad5c chg: Use poetry instead of pipenv 2020-01-21 17:39:18 +01:00
Raphaël Vinot cfa300082f fix: docker-compose should now work. 2019-11-01 21:05:08 -07:00
Raphaël Vinot 1dc71c4f0b fix: Allow to disable scraping private IPs (async module). 2019-07-05 16:37:57 +02:00
Raphaël Vinot 306081281f fix: reload cache on start, bump dependencies 2019-04-16 16:04:58 +02:00
Raphaël Vinot d951d55367 fix: Missing import 2019-04-05 16:14:30 +02:00
Raphaël Vinot 12b8e4f949 chg: Improve async processing 2019-04-05 16:12:54 +02:00
Raphaël Vinot da3d1fe392 fix: Avoid loading the cache multiple times 2019-04-05 15:07:22 +02:00
Raphaël Vinot 9ef9fef655 chg: Bump configs to be in-line with prod 2019-04-05 14:13:07 +02:00
Raphaël Vinot 35f4292ab0 fix: Systemd service, add proper stop script 2019-04-05 14:01:36 +02:00
Raphaël Vinot 1d244ef456 chg: Refactor code organisation 2019-01-30 14:30:01 +01:00
Raphaël Vinot 6bc316ebcf new: Initial commit for client and async scraping 2019-01-29 18:37:13 +01:00
Raphaël Vinot bbb8c5343f chg: Cleanup, use pipfile 2019-01-23 15:13:29 +01:00