Commit Graph

4 Commits (92adea38d0b495e5419a22867fe13cdd96f8b291)

Author SHA1 Message Date
Mokaddem 3a4dcd691d Improved description of modules inside the scripts 2017-05-09 11:13:16 +02:00
Alexandre Dulaunoy 3b101ea8f5 (partially) Fix #91 using a simple Alarm (SIGNAL) when exec-timeout
Introducing a timer (in this case 5 seconds) to ensure that the
execution time of the tokenizer takes less than 5 seconds. This
is a simple and standard POSIX signal handler.

This approach fixes the specific issues we have currently
with some inputs where the tokenization takes too much time. This
fix should be improved and be more generic:

 - Introducing statistics of content which timeouts.
 - Keeping a list/queue to further process those files using a different
   tokenizer approach. Maybe a set of "dirty" processes to handle the edge cases
   and to not impact the overall processing and analysis.
 - Make the timer configurable per module (at least for this one).
2017-01-12 07:32:55 +00:00
Raphaël Vinot 12aca6b760 Add script to import from local directory, use local python from env 2016-02-04 15:22:51 +01:00
Raphaël Vinot abfe13436b Big refactoring, make the queues more flexible 2014-08-29 19:37:56 +02:00