fix: Quit BG indexer when shutdown is requested. Improve exception handling in archiver

pull/834/head
Raphaël Vinot 2023-11-20 11:45:41 +01:00
parent 11a3b6b2f9
commit efe2124753
3 changed files with 80 additions and 3 deletions


@@ -65,6 +65,74 @@ pip install pylookyloo
For more details on `pylookyloo`, read the overview [docs](https://www.lookyloo.eu/docs/main/pylookyloo-overview.html), the [documentation](https://pylookyloo.readthedocs.io/en/latest/) of the module itself, or the code in this [GitHub repository](https://github.com/Lookyloo/PyLookyloo).

# Notes on using S3FS for storage

## Directory listing

TL;DR: it is slow.

If you have many captures (say, more than 1000/day) and store them in an S3 bucket mounted with s3fs-fuse,
doing a directory listing in bash (`ls`) will most probably lock I/O for every process
trying to access any file in the whole bucket. The same is true if you access the
filesystem through Python methods (`iterdir`, `scandir`, ...).

A workaround is to use the Python s3fs module, as it lists directories through the S3 API
instead of going through the mounted filesystem.
You can configure the s3fs credentials in `config/generic.json`, under the `s3fs` key.
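
For instance, a minimal sketch (the `Year/Month/Day` path is a placeholder) listing a day's captures through the S3 API, reusing the same configuration keys as the removal example further down:

```python
from lookyloo.default import get_config

import s3fs

s3fs_config = get_config('generic', 's3fs')
s3fs_client = s3fs.S3FileSystem(key=s3fs_config['config']['key'],
                                secret=s3fs_config['config']['secret'],
                                endpoint_url=s3fs_config['config']['endpoint_url'])

# The listing goes through the S3 API, not the s3fs-fuse mount,
# so it does not lock I/O for other processes using the bucket.
for entry in s3fs_client.ls(s3fs_config['config']['bucket_name'] + '/Year/Month/Day'):
    print(entry)
```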

## Versioning

By default, a MinIO bucket (the backend behind s3fs) has versioning enabled, which means it
keeps a copy of every version of every file you store. This becomes a problem if you have a lot of captures,
as the index files are updated on every change and the maximum number of versions per object is 10,000.
So by the time you have more than 10,000 captures in a directory, you'll get I/O errors when you try
to update the index file. And you absolutely do not care about that versioning in Lookyloo.

To check if versioning is enabled (it can be either enabled or suspended):

```bash
mc version info <alias_in_config>/<bucket>
```
The command below will suspend versioning:
```bash
mc version suspend <alias_in_config>/<bucket>
```
And if you're already stuck with an index that was updated 10,000 times and you cannot do anything about it:
```bash
mc rm --non-current --versions --recursive --force <alias_in_config>/<bucket>/path/to/index
```
Error message from bash (unhelpful):
```bash
$ (git::main) rm /path/to/lookyloo/archived_captures/Year/Month/Day/index
rm: cannot remove '/path/to/lookyloo/archived_captures/Year/Month/Day/index': Input/output error
```
The same removal attempted through the Python s3fs module:

```python
from lookyloo.default import get_config

import s3fs

# Credentials and endpoint come from the `s3fs` key in config/generic.json.
s3fs_config = get_config('generic', 's3fs')
s3fs_client = s3fs.S3FileSystem(key=s3fs_config['config']['key'],
                                secret=s3fs_config['config']['secret'],
                                endpoint_url=s3fs_config['config']['endpoint_url'])

s3fs_bucket = s3fs_config['config']['bucket_name']
# Delete the object directly through the S3 API, bypassing the s3fs-fuse mount.
s3fs_client.rm_file(s3fs_bucket + '/Year/Month/Day/index')
```
Error from Python (somewhat more helpful):
```
OSError: [Errno 5] An error occurred (MaxVersionsExceeded) when calling the DeleteObject operation: You've exceeded the limit on the number of versions you can create on this object
```
# Contributing to Lookyloo
To learn more about contributing to Lookyloo, see our [contributor guide](https://www.lookyloo.eu/docs/main/contributing.html).


@@ -324,9 +324,13 @@ class Archiver(AbstractManager):
 try:
     new_capture_path = self.__archive_single_capture(capture_path)
     capture_breakpoint -= 1
-except OSError as e:
-    self.logger.warning(f'Unable to archive capture: {e}')
-finally:
+except OSError:
+    self.logger.exception(f'Unable to archive capture {capture_path}')
     (capture_path / 'lock').unlink(missing_ok=True)
+except Exception:
+    self.logger.exception(f'Critical exception while archiving {capture_path}')
+    (capture_path / 'lock').unlink(missing_ok=True)
+else:
+    (new_capture_path / 'lock').unlink(missing_ok=True)

 if archiving_done:
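
Stripped of the diff markers, the new control flow behaves like this hypothetical, self-contained sketch (`archive_one` stands in for `Archiver.__archive_single_capture`): both `except` branches log the full traceback and remove the `lock` file from the original capture directory so the capture can be retried later, while the `else:` branch runs only on success, once the lock has moved along with the capture.

```python
import logging
from pathlib import Path

logger = logging.getLogger('archiver-sketch')

def archive_one(capture_path: Path) -> Path:
    # Hypothetical stand-in for Archiver.__archive_single_capture:
    # pretend the capture directory moved to an 'archive' sibling.
    return capture_path.parent / 'archive' / capture_path.name

def archive_with_cleanup(capture_path: Path) -> None:
    try:
        new_capture_path = archive_one(capture_path)
    except OSError:
        # Expected failure mode (e.g. an s3fs I/O error): log the traceback
        # and unlock the capture so a later pass can retry it.
        logger.exception(f'Unable to archive capture {capture_path}')
        (capture_path / 'lock').unlink(missing_ok=True)
    except Exception:
        logger.exception(f'Critical exception while archiving {capture_path}')
        (capture_path / 'lock').unlink(missing_ok=True)
    else:
        # Success: the capture (and its lock) now live at the new path.
        (new_capture_path / 'lock').unlink(missing_ok=True)
```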


@@ -45,7 +45,12 @@ class BackgroundIndexer(AbstractManager):
 archive_interval = timedelta(days=get_config('generic', 'archive'))
 cut_time = (datetime.now() - archive_interval)
 for month_dir in make_dirs_list(self.lookyloo.capture_dir):
+    __counter_shutdown = 0
     for capture_time, path in sorted(get_sorted_captures_from_disk(month_dir, cut_time=cut_time, keep_more_recent=True), reverse=True):
+        __counter_shutdown += 1
+        if __counter_shutdown % 10 and self.shutdown_requested():
+            self.logger.warning('Shutdown requested, breaking.')
+            return False
         if ((path / 'tree.pickle.gz').exists() or (path / 'tree.pickle').exists()):
             # We already have a pickle file
             self.logger.debug(f'{path} has a pickle.')
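
For illustration, the shutdown-polling pattern this hunk introduces, as a hypothetical self-contained sketch: the function and its arguments are stand-ins for the `AbstractManager` machinery, and it polls on every tenth capture, whereas the committed `__counter_shutdown % 10` condition is truthy on every count not divisible by ten.

```python
from typing import Callable, Iterable

def index_captures(paths: Iterable[str], shutdown_requested: Callable[[], bool]) -> bool:
    """Return True when every path was processed, False if interrupted."""
    counter = 0
    for path in paths:
        counter += 1
        # Check the shutdown flag periodically so a long indexing pass
        # cannot delay a requested shutdown until the whole loop finishes.
        if counter % 10 == 0 and shutdown_requested():
            return False
        ...  # build the pickle for this capture
    return True
```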