Commit Graph

409 Commits (d3cf0730f8bf3de58d37060eff0dca3f69b4a0e1)

Author SHA1 Message Date
Amber Brown 4cd1c9f2ff
Delete the disused & unspecced identicon functionality (#4106) 2018-10-29 23:57:24 +11:00
Richard van der Hoff ef771cc4c2 Fix a number of flake8 errors
Broadly three things here:

* disable W504 which seems a bit whacko
* remove a bunch of `as e` expressions from exception handlers that don't use
  them
* use `r""` for strings which include backslashes

Also, we don't use pep8 any more, so we can get rid of the duplicate config
there.
2018-10-24 10:39:03 +01:00
Richard van der Hoff 5c445114d3
Correctly account for cpu usage by background threads (#4074)
Wrap calls to deferToThread() in a thing which uses a child logcontext to
attribute CPU usage to the right request.

While we're in the area, remove the logcontext_tracer stuff, which is never
used, and afaik doesn't work.

Fixes #4064
2018-10-23 13:12:32 +01:00
Erik Johnston f6a0a02a62 Fix bug where we raised StopIteration in a generator
This made python 3.7 unhappy
2018-10-17 16:10:52 +01:00
Richard van der Hoff 4c3e7eeec5
Merge pull request #3932 from matrix-org/erikj/auto_start_expiring_caches
Fix some instances of ExpiringCache not expiring cache items
2018-09-25 12:02:57 +01:00
Jérémy Farnaud 6cf261930a added "media-src: 'self'" to CSP for resources (#3578)
Synapse doesn’t allow for media resources to be played directly from
Chrome. It is a problem for users on other networks (e.g. IRC)
communicating with Matrix users through a gateway. The gateway sends
them the raw URL for the resource when a Matrix user uploads a video
and the video cannot be played directly in Chrome using that URL.

Chrome argues it is not authorized to play the video because of the
Content Security Policy. Chrome checks for the "media-src" policy which
is missing, and defauts to the "default-src" policy which is "none".

As Synapse already sends "object-src: 'self'" I thought it wouldn’t be
a problem to add "media-src: 'self'" to the CSP to fix this problem.
2018-09-25 11:55:02 +01:00
Erik Johnston 8601c24287 Fix some instances of ExpiringCache not expiring cache items
ExpiringCache required that `start()` be called before it would actually
start expiring entries. A number of places didn't do that.

This PR removes `start` from ExpiringCache, and automatically starts
backround reaping process on creation instead.
2018-09-21 14:19:46 +01:00
Amber Brown 02aa41809b
Port rest/ to Python 3 (#3823) 2018-09-12 20:41:31 +10:00
Amber Brown 324525f40c
Port over enough to get some sytests running on Python 3 (#3668) 2018-08-20 23:54:49 +10:00
Will Hunt c151b32b1d Add GET media/v1/config (#3184) 2018-08-16 14:23:38 +01:00
Amber Brown b37c472419
Rename async to async_helpers because `async` is a keyword on Python 3.7 (#3678) 2018-08-10 23:50:21 +10:00
Richard van der Hoff 018d75a148 Refactor code for turning HttpResponseException into SynapseError
This commit replaces SynapseError.from_http_response_exception with
HttpResponseException.to_synapse_error.

The new method actually returns a ProxiedRequestError, which allows us to pass
through additional metadata from the API call.
2018-08-01 16:02:46 +01:00
Amber Brown da7785147d
Python 3: Convert some unicode/bytes uses (#3569) 2018-08-02 00:54:06 +10:00
Richard van der Hoff 03751a6420 Fix some looping_call calls which were broken in #3604
It turns out that looping_call does check the deferred returned by its
callback, and (at least in the case of client_ips), we were relying on this,
and I broke it in #3604.

Update run_as_background_process to return the deferred, and make sure we
return it to clock.looping_call.
2018-07-26 11:48:08 +01:00
Richard van der Hoff 371da42ae4 Wrap a number of things that run in the background
This will reduce the number of "Starting db connection from sentinel context"
warnings, and will help with our metrics.
2018-07-25 09:41:12 +01:00
Krombel 4a27000548 check isort by travis 2018-07-16 13:57:33 +02:00
Krombel 32fd6910d0 Use parse_{int,str} and assert from http.servlet
parse_integer and parse_string can take a request and raise errors
in case we have wrong or missing params.
This PR tries to use them more to deduplicate some code and make it
better readable
2018-07-13 21:40:14 +02:00
Amber Brown 49af402019 run isort 2018-07-09 16:09:20 +10:00
Amber Brown 6350bf925e
Attempt to be more performant on PyPy (#3462) 2018-06-28 14:49:57 +01:00
Amber Brown 77ac14b960
Pass around the reactor explicitly (#3385) 2018-06-22 09:37:10 +01:00
Amber Brown 1f69693347
Merge pull request #3244 from NotAFile/py3-six-4
replace some iteritems with six
2018-05-24 13:04:07 -05:00
Adrian Tschira 933bf2dd35 replace some iteritems with six
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-05-19 17:59:26 +02:00
Adrian Tschira aafb0f6b0d py3-ize url preview 2018-05-19 17:35:20 +02:00
Richard van der Hoff 318711e139 Set Server header in SynapseRequest
(instead of everywhere that writes a response. Or rather, the subset of places
which write responses where we haven't forgotten it).

This also means that we don't have to have the mysterious version_string
attribute in anything with a request handler.

Unfortunately it does mean that we have to pass the version string wherever we
instantiate a SynapseSite, which has been c&ped 150 times, but that is code
that ought to be cleaned up anyway really.
2018-05-10 18:50:27 +01:00
Richard van der Hoff 645cb4bf06 Remove redundant request_handler decorator
This is needless complexity; we might as well use the wrapper directly.

Also rename wrap_request_handler->wrap_json_request_handler.
2018-05-10 12:19:53 +01:00
Richard van der Hoff be31adb036 Fix logcontext leak in media repo
Make FileResponder.write_to_consumer uphold the logcontext contract
2018-05-02 16:14:50 +01:00
Richard van der Hoff dbf6f28d64
Merge pull request #3155 from NotAFile/py3-bytes-1
more bytes strings
2018-04-30 00:38:21 +01:00
Richard van der Hoff aab2e4da60
Merge pull request #3140 from matrix-org/rav/use_run_in_background
Use run_in_background in preference to preserve_fn
2018-04-30 00:34:28 +01:00
Richard van der Hoff 9e2601f830
Merge pull request #3108 from NotAFile/py3-six-urlparse
Use six.moves.urlparse
2018-04-30 00:33:05 +01:00
Adrian Tschira e9143b6593 more bytes strings
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-29 00:13:57 +02:00
Richard van der Hoff fc149b4eeb Merge remote-tracking branch 'origin/develop' into rav/use_run_in_background 2018-04-27 14:31:23 +01:00
Richard van der Hoff 2a13af23bc Use run_in_background in preference to preserve_fn
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
2018-04-27 12:55:51 +01:00
Richard van der Hoff 9255a6cb17 Improve exception handling for background processes
There were a bunch of places where we fire off a process to happen in the
background, but don't have any exception handling on it - instead relying on
the unhandled error being logged when the relevent deferred gets
garbage-collected.

This is unsatisfactory for a number of reasons:
 - logging on garbage collection is best-effort and may happen some time after
   the error, if at all
 - it can be hard to figure out where the error actually happened.
 - it is logged as a scary CRITICAL error which (a) I always forget to grep for
   and (b) it's not really CRITICAL if a background process we don't care about
   fails.

So this is an attempt to add exception handling to everything we fire off into
the background.
2018-04-27 11:07:40 +01:00
Adrian Tschira 2a3c33ff03 Use six.moves.urlparse
The imports were shuffled around a bunch in py3

Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-15 21:22:43 +02:00
Adrian Tschira 4f40d058cc Replace old-style raise with six.reraise
The old style raise is invalid syntax in python3. As noted in the docs,
this adds one more frame in the traceback, but I think this is
acceptable:

    <ipython-input-7-bcc5cba3de3f> in <module>()
         16     except:
         17         pass
    ---> 18     six.reraise(*x)

    /usr/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
        691             if value.__traceback__ is not tb:
        692                 raise value.with_traceback(tb)
    --> 693             raise value
        694         finally:
        695             value = None

    <ipython-input-7-bcc5cba3de3f> in <module>()
          9
         10 try:
    ---> 11     x()
         12 except:
         13     x = sys.exc_info()

Also note that this uses six, which is not formally a dependency yet,
but is included indirectly since most packages depend on it.

Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-06 23:06:24 +02:00
Erik Johnston fa72803490 Merge branch 'master' of github.com:matrix-org/synapse into develop 2018-03-19 11:41:01 +00:00
Erik Johnston 926ba76e23 Replace ujson with simplejson 2018-03-15 23:43:31 +00:00
Erik Johnston 92c52df702 Make store_file use store_into_file 2018-02-14 17:55:18 +00:00
Erik Johnston 5fa571a91b Tell storage providers about new file so they can upload 2018-02-07 13:35:08 +00:00
Erik Johnston 1f881e0746
Merge pull request #2791 from matrix-org/erikj/media_storage_refactor
Ensure media is in local cache before thumbnailing
2018-02-05 11:28:52 +00:00
Richard van der Hoff d5352cbba8 Handle url_previews with no content-type
avoid failing with an exception if the remote server doesn't give us a
Content-Type header.

Also, clean up the exception handling a bit.
2018-02-02 00:53:46 +00:00
Matthew Hodgson ab9f844aaf
Add federation_domain_whitelist option (#2820)
Add federation_domain_whitelist

gives a way to restrict which domains your HS is allowed to federate with.
useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network
2018-01-22 19:11:18 +01:00
Richard van der Hoff b0d9e633ee
Merge pull request #2814 from matrix-org/rav/fix_urlcache_thumbs
Use the right path for url_preview thumbnails
2018-01-19 18:57:15 +00:00
Richard van der Hoff ad7ec63d08 Use the right path for url_preview thumbnails
This was introduced by #2627: we were overwriting the original media for url
previews with the thumbnails :/

(fixes https://github.com/vector-im/riot-web/issues/6012, hopefully)
2018-01-19 18:29:39 +00:00
Erik Johnston cd871a3057 Fix storage provider bug introduced when renamed to store_local 2018-01-18 18:37:59 +00:00
Erik Johnston 8ff6726c0d
Merge pull request #2812 from matrix-org/erikj/media_storage_provider_config
Make storage providers configurable
2018-01-18 18:33:57 +00:00
Erik Johnston 3fe2bae857 Missing staticmethod 2018-01-18 17:11:45 +00:00
Erik Johnston aae77da73f Fixup comments 2018-01-18 17:11:29 +00:00
Erik Johnston 9a89dae8c5 Fix typo in thumbnail resource causing access times to be incorrect 2018-01-18 15:06:24 +00:00
Erik Johnston 0af5dc63a8 Make storage providers more configurable 2018-01-18 14:07:21 +00:00
Erik Johnston 2cf6a7bc20 Use better file consumer 2018-01-18 12:00:46 +00:00
Erik Johnston 4a53f3a3e8 Ensure media is in local cache before thumbnailing 2018-01-18 12:00:46 +00:00
Erik Johnston 300edc2348 Update last access time when thumbnails are viewed 2018-01-17 10:24:43 +00:00
Erik Johnston 05f98a2224 Keep track of last access time for local media 2018-01-17 10:24:43 +00:00
Erik Johnston d728c47142 Add docstring 2018-01-17 10:06:14 +00:00
Erik Johnston d863f68cab Use local vars 2018-01-16 16:24:15 +00:00
Erik Johnston 6368e5c0ab Change _generate_thumbnails to take media_type 2018-01-16 16:17:38 +00:00
Erik Johnston 0a90d9ede4 Move setting of file_id up to caller 2018-01-16 16:03:05 +00:00
Erik Johnston 5dfc83704b Fix typo 2018-01-16 14:32:56 +00:00
Erik Johnston 307f88dfb6 Fix up log lines 2018-01-16 13:53:52 +00:00
Erik Johnston 9795b9ebb1 Correctly use server_name/file_id when generating/fetching remote thumbnails 2018-01-16 12:02:06 +00:00
Erik Johnston c5b589f2e8 Log when we respond with 404 2018-01-16 12:01:40 +00:00
Erik Johnston a4c5e4a645 Fix thumbnailing remote files 2018-01-16 11:37:50 +00:00
Erik Johnston 1159abbdd2
Merge pull request #2767 from matrix-org/erikj/media_storage_refactor
Refactor MediaRepository to separate out storage
2018-01-16 10:23:50 +00:00
Richard van der Hoff 21bf87a146 Reinstate media download on thumbnail request
We need to actually download the remote media when we get a request for a
thumbnail.
2018-01-12 15:38:06 +00:00
Erik Johnston 694f1c1b18 Fix up comments 2018-01-12 15:02:46 +00:00
Erik Johnston e21370ba54 Correctly reraise exception 2018-01-12 14:44:02 +00:00
Erik Johnston 85a4d78213 Make Responder a context manager 2018-01-12 13:32:03 +00:00
Erik Johnston dcc8eded41 Add missing class var 2018-01-12 13:16:27 +00:00
Erik Johnston 1e4edd1717 Remove unnecessary condition 2018-01-12 11:28:32 +00:00
Erik Johnston c6c009603c Remove unused variables 2018-01-12 11:24:05 +00:00
Erik Johnston 4d88958cf6 Make class var local 2018-01-12 11:23:54 +00:00
Erik Johnston 227c491510 Comments 2018-01-12 11:22:41 +00:00
Erik Johnston 8f03aa9f61 Add StorageProvider concept 2018-01-09 16:16:12 +00:00
Erik Johnston 2442e9876c Make PreviewUrlResource use MediaStorage 2018-01-09 16:15:07 +00:00
Erik Johnston 9d30a7691c Make ThumbnailResource use MediaStorage 2018-01-09 16:15:07 +00:00
Erik Johnston 9e20840e02 Use MediaStorage for remote media 2018-01-09 16:15:07 +00:00
Erik Johnston dd3092c3a3 Use MediaStorage for local files 2018-01-09 16:15:07 +00:00
Erik Johnston ada470bccb Add MediaStorage class 2018-01-09 16:15:07 +00:00
Erik Johnston 1ee787912b Add some helper classes 2018-01-09 16:15:07 +00:00
Erik Johnston 47ca5eb882 Split out add_file_headers 2018-01-09 16:15:07 +00:00
Erik Johnston b6c9deffda Remove dead TODO 2018-01-09 15:53:23 +00:00
Erik Johnston b30cd5b107 Remove dead code related to default thumbnails 2018-01-09 14:38:33 +00:00
Richard van der Hoff 5a4da5bf78
Merge pull request #2697 from matrix-org/rav/fix_urlcache_index_error
Fix error on sqlite 3.7
2017-11-27 12:25:48 +00:00
Richard van der Hoff 8132a6b7ac Fix OPTIONS on preview_url
Fixes #2706
2017-11-23 17:52:31 +00:00
Richard van der Hoff 2908f955d1 Check database in has_completed_background_updates
so that the right thing happens on workers.
2017-11-22 18:02:15 +00:00
Richard van der Hoff 7098b65cb8 Fix error on sqlite 3.7
Create the url_cache index on local_media_repository as a background update, so
that we can detect whether we are on sqlite or not and create a partial or
complete index accordingly.

To avoid running the cleanup job before we have built the index, add a bailout
which will defer the cleanup if the bg updates are still running.

Fixes https://github.com/matrix-org/synapse/issues/2572.
2017-11-21 11:14:17 +00:00
Richard van der Hoff 5d15abb120 Bit more logging 2017-11-10 16:58:04 +00:00
Richard van der Hoff 46790f50cf Cache failures in url_preview handler
Reshuffle the caching logic in the url_preview handler so that failures are
cached (and to generally simplify things and fix the logcontext leaks).
2017-11-10 16:50:50 +00:00
Maxime Vaillancourt 5287e57c86 Ignore noscript tags when generating URL previews 2017-10-25 20:44:34 -04:00
Richard van der Hoff eaaabc6c4f replace 'except:' with 'except Exception:'
what could possibly go wrong
2017-10-23 15:52:32 +01:00
Richard van der Hoff d03cfc4258 Fix a logcontext leak in the media repo 2017-10-23 14:34:27 +01:00
Erik Johnston bd5718d0ad Fix typo in thumbnail generation 2017-10-19 10:27:18 +01:00
Krombel a6245478c8 fix thumbnailing (#2548)
in commit 0e28281a the code for thumbnailing got refactored and the
renaming of this variables was not done correctly.

Signed-Off-by: Matthias Kesler <krombel@krombel.de>
2017-10-17 12:45:33 +02:00
Erik Johnston 1b6b0b1e66 Add try/finally block to close t_byte_source 2017-10-13 15:34:08 +01:00
Erik Johnston 6b725cf56a Remove old comment 2017-10-13 15:23:41 +01:00
Erik Johnston 2b24416e90 Don't reuse source but instead copy from primary media store to backup 2017-10-13 14:11:34 +01:00
Erik Johnston b92a8e6e4a PEP8 2017-10-13 13:58:57 +01:00
Erik Johnston 31aa7bd8d1 Move type into key 2017-10-13 13:47:38 +01:00
Erik Johnston ad1911bbf4 Comment 2017-10-13 13:47:05 +01:00
Erik Johnston c021c39cbd Remove spurious addition 2017-10-13 13:46:53 +01:00
Erik Johnston 1f43d22397 Don't needlessly rename variable 2017-10-13 11:42:07 +01:00
Erik Johnston a675bd08bd Add paths back in... 2017-10-13 11:41:06 +01:00
Erik Johnston 4d7e1dde70 Remove unnecessary diff 2017-10-13 11:36:32 +01:00
Erik Johnston ae5d18617a Make things be absolute paths again 2017-10-13 11:35:44 +01:00
Erik Johnston 9732ec6797 s/write_to_file/write_to_file_and_backup/ 2017-10-13 11:34:41 +01:00
Erik Johnston 0e28281a02 Fix up 2017-10-13 11:33:49 +01:00
Erik Johnston 505371414f Fix up thumbnailing function 2017-10-13 11:23:53 +01:00
Erik Johnston e3428d26ca Fix typo 2017-10-13 10:39:59 +01:00
Erik Johnston 35332298ef Fix up comments 2017-10-13 10:39:32 +01:00
Erik Johnston 64db043a71 Move makedirs to thread 2017-10-13 10:25:01 +01:00
Erik Johnston b60859d6cc Use make_deferred_yieldable 2017-10-13 10:24:19 +01:00
Erik Johnston d76621a47b Fix comments 2017-10-12 18:16:25 +01:00
Erik Johnston 4ae85ae121 Don't close prematurely.. 2017-10-12 17:57:31 +01:00
Erik Johnston cc505b4b5e getvalue closes buffer 2017-10-12 17:52:30 +01:00
Erik Johnston 1259a76047 Get len before close 2017-10-12 17:39:23 +01:00
Erik Johnston 802ca12d05 Don't close file prematurely 2017-10-12 17:37:21 +01:00
Erik Johnston e283b555b1 Copy everything to backup 2017-10-12 17:31:24 +01:00
Erik Johnston b77a13812c Typo 2017-10-12 15:32:32 +01:00
Erik Johnston 6dfde6d485 Remove dead code 2017-10-12 15:30:26 +01:00
Erik Johnston c8eeef6947 Fix typos 2017-10-12 15:28:24 +01:00
Erik Johnston 67cb89fbdf Fix typo 2017-10-12 15:23:41 +01:00
Erik Johnston bf4fb1fb40 Basic implementation of backup media store 2017-10-12 15:20:59 +01:00
Erik Johnston d5694ac5fa Only log if we've removed media 2017-09-28 16:08:08 +01:00
Erik Johnston 7cc483aa0e Clear up expired url cache every 10s 2017-09-28 13:56:53 +01:00
Erik Johnston e1e7d76cf1 Actually assign result to variable 2017-09-28 13:55:29 +01:00
Erik Johnston 5f501ec7e2 Fix typo in url cache expiry timer 2017-09-28 12:59:01 +01:00
Erik Johnston ace8079086 Support new and old style media id formats 2017-09-28 12:52:51 +01:00
Erik Johnston ae79764fe5 Change expires column to expires_ts 2017-09-28 12:37:53 +01:00
Erik Johnston 9ccb4226ba Delete expired url cache data 2017-09-28 12:18:06 +01:00
Erik Johnston 7fe8ed1787 Store URL cache preview downloads seperately
This makes it easier to clear old media out at a later date
2017-06-23 11:14:11 +01:00
Erik Johnston b8b936a6ea Add API to quarantine media 2017-06-19 17:39:21 +01:00
Erik Johnston 48d2949416 Throw exception when not retrying when downloading media 2017-06-13 10:23:14 +01:00
Matthew Hodgson 836d5c44b6 actually trim oversize og:description meta 2017-05-22 21:14:20 +01:00
Erik Johnston d12ae7fd1c Don't log exceptions for NotRetryingDestination 2017-05-15 15:42:18 +01:00
Richard van der Hoff 1d09586599 Address review comments
- don't blindly proxy all HTTPRequestExceptions
- log unexpected exceptions at error
- avoid `isinstance`
- improve docs on `from_http_response_exception`
2017-03-14 14:15:37 +00:00
Richard van der Hoff 170ccc9de5 Fix routing loop when fetching remote media
When we proxy a media request to a remote server, add a query-param, which will
tell the remote server to 404 if it doesn't recognise the server_name.

This should fix a routing loop where the server keeps forwarding back to
itself.

Also improves the error handling on remote media fetches, so that we don't
always return a rather obscure 502.
2017-03-13 16:30:36 +00:00
Jurek aea5461488 Fix dynamic thumbnails aspect 2017-02-24 22:43:27 +01:00
Mark Haines 32019c9897 Log which files we saved attachments to in the media_repository 2017-01-10 14:19:50 +00:00
Erik Johnston f7085ac84f Name linearizer's for better logs 2017-01-09 17:17:10 +00:00
Marcin Bachry 24c16fc349 Fix crash in url preview when html tag has no text
Signed-off-by: Marcin Bachry <hegel666@gmail.com>
2016-12-14 22:38:18 +01:00
Johannes Löthberg 32c8b5507c preview_url_resource: Ellipsis must be in unicode string
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
2016-12-01 13:12:13 +01:00
Mark Haines b1c27975d0 Set CORs headers on responses from the media repo 2016-11-02 11:29:25 +00:00
Erik Johnston d51b8a1674 Add quotes and be explicity about script-src 2016-09-05 17:35:01 +01:00
Erik Johnston 662b031a30 Allow PDF to be rendered from media repo 2016-09-05 17:25:26 +01:00
Erik Johnston 0af9e1a637 Set `Content-Security-Policy` on media repo
This is to inform browsers that they should sandbox the returned
media. This is particularly cruical for javascript/HTML files.
2016-08-17 16:27:39 +01:00
Erik Johnston f90b3d83a3 Add None check to _iterate_over_text 2016-08-17 15:17:17 +01:00
Erik Johnston 109a560905 Flake8 2016-08-16 14:57:21 +01:00
Erik Johnston 48b5829aea Fix up preview URL API. Add tests.
This includes:

- Splitting out methods of a class into stand alone functions, to make
  them easier to test.
- Adding unit tests to split out functions, testing HTML -> preview.
- Handle the fact that elements in lxml may have tail text.
2016-08-16 14:53:24 +01:00
Erik Johnston 5bcccfde6c Don't include html comments in description 2016-08-05 14:45:11 +01:00
Erik Johnston b5525c76d1 Typo 2016-08-04 16:10:08 +01:00
Erik Johnston e97648c4e2 Test summarization 2016-08-04 16:09:09 +01:00
Erik Johnston 58c9653c6b Don't infer paragrahs from newlines 2016-08-02 18:50:24 +01:00
Erik Johnston 6b58ade2f0 Comment on why we clone 2016-08-02 18:41:22 +01:00
Erik Johnston 9e66c58ceb Spelling. 2016-08-02 18:37:31 +01:00
Erik Johnston f83f5fbce8 Make it actually compile 2016-08-02 18:32:42 +01:00
Erik Johnston aecaec3e10 Change the way we summarize URLs
Using XPath is slow on some machines (for unknown reasons), so use a
different approach to get a list of text nodes.

Try to generate a summary that respect paragraph and then word
boundaries, adding ellipses when appropriate.
2016-08-02 18:25:53 +01:00
Erik Johnston f52cb4cd78 Remove race 2016-06-29 15:24:50 +01:00
Erik Johnston a70688445d Implement purge_media_cache admin API 2016-06-29 14:57:59 +01:00
Erik Johnston 314b146b2e Track approximate last access time for remote media 2016-06-29 11:41:20 +01:00
Erik Johnston 09a17f965c Line lengths 2016-06-15 16:58:12 +01:00
Erik Johnston 1e9026e484 Handle floats as img widths 2016-06-15 16:58:05 +01:00
Erik Johnston a60169ea09 Handle og props with not content 2016-06-15 16:57:48 +01:00
Erik Johnston eba4ff1bcb 502 on /thumbnail when can't contact remote server 2016-06-09 11:29:43 +01:00
Mark Haines eb79110beb Clean up the blacklist/whitelist handling.
Always set the config key with an empty list, even if a list isn't specified.
This means that the codepaths are the same for both the empty list and
for a missing key. Since the behaviour is the same for both cases this
makes the code somewhat easier to reason about.
2016-05-16 13:03:59 +01:00
Mark Haines 8d7ad44331 Report per request metrics for all of the things using request_handler 2016-04-28 10:57:49 +01:00
Erik Johnston e8884e5e9c Add self.media_repo to PreviewUrlResource 2016-04-19 14:51:34 +01:00
Erik Johnston a7001c311b _make_dirs was moved to MediaRepository 2016-04-19 14:49:31 +01:00
Erik Johnston 9181e2f4c7 Add store to PreviewUrlResource 2016-04-19 14:48:24 +01:00
Erik Johnston fb76a81ff7 Reorder imports 2016-04-19 14:45:05 +01:00
Erik Johnston 0c93df89b6 Move MediaRepository to media_repository module 2016-04-19 11:31:43 +01:00
Erik Johnston 43f0941e8f Split out BaseMediaResource into MediaRepository
This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.

In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time.
2016-04-19 11:24:59 +01:00
Matthew Hodgson aaabbd3e9e explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better 2016-04-15 14:32:25 +01:00
Matthew Hodgson 84f9cac4d0 fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong 2016-04-15 13:20:08 +01:00
Matthew Hodgson f78b479118 fix urlparse import thinko breaking tiny URLs 2016-04-14 15:23:55 +01:00
Matthew Hodgson bd77216d06 comment out 2c838f6459 due to risk of https://en.wikipedia.org/wiki/Billion_laughs attacks - thanks @torhve 2016-04-14 14:39:24 +01:00
Erik Johnston d0633e6dbe Sanitize the optional dependencies for spider API 2016-04-13 13:38:09 +01:00
Erik Johnston 17515bae14 PEP8 2016-04-11 11:02:50 +01:00
Matthew Hodgson 5ffacc5e84 fix typos and needless try/except from PR review 2016-04-11 10:39:16 +01:00
Matthew Hodgson 83b2f83da0 actually throw meaningful errors 2016-04-08 21:36:59 +01:00
Mark Haines b36270b5e1 Fix pep8 warning 2016-04-08 19:52:23 +01:00
Matthew Hodgson 1ccabe2965 more PR feedback 2016-04-08 18:58:08 +01:00
Matthew Hodgson dafef5a688 Add url_preview_enabled config option to turn on/off preview_url endpoint. defaults to off.
Add url_preview_ip_range_blacklist to let admins specify internal IP ranges that must not be spidered.
Add url_preview_url_blacklist to let admins specify URL patterns that must not be spidered.
Implement a custom SpiderEndpoint and associated support classes to implement url_preview_ip_range_blacklist
Add commentary and generally address PR feedback
2016-04-08 18:37:15 +01:00
Matthew Hodgson cf51c4120e report image size (bytewise) in OG meta 2016-04-03 23:57:05 +01:00
Matthew Hodgson 0834b152fb char encoding 2016-04-03 12:59:27 +01:00
Matthew Hodgson 8b98a7e8c3 pep8 2016-04-03 12:56:29 +01:00
Matthew Hodgson eab4d462f8 fix etag typing error. fix timestamp typing error 2016-04-03 02:02:46 +01:00
Matthew Hodgson c3916462f6 rebase all image URLs 2016-04-03 01:33:12 +01:00
Matthew Hodgson 110780b18b remove stale todo 2016-04-03 00:48:31 +01:00
Matthew Hodgson b09e29a03c Ensure only one download for a given URL is active at a time 2016-04-03 00:47:40 +01:00
Matthew Hodgson 7426c86eb8 add a persistent cache of URL lookups, and fix up the in-memory one to work 2016-04-03 00:31:57 +01:00
Matthew Hodgson d1b154a10f support gzip compression, and don't pass through error msgs 2016-04-02 03:06:39 +01:00
Matthew Hodgson 9377157961 how was _respond_default_thumbnail ever meant to work? 2016-04-02 02:31:45 +01:00
Matthew Hodgson 2c838f6459 pass back SVGs as their own thumbnails 2016-04-02 02:30:07 +01:00
Matthew Hodgson 5037ee0d37 handle missing dimensions without crashing 2016-04-02 02:29:57 +01:00
Matthew Hodgson b26e8604f1 make meta comparisons case insensitive 2016-04-02 01:35:44 +01:00
Matthew Hodgson 5fd07da764 refactor calc_og; spider image URLs; fix xpath; add a (broken) expiringcache; loads of other fixes 2016-04-02 00:35:49 +01:00
Matthew Hodgson c60b751694 fix assorted redirect, unicode and screenscraping bugs 2016-04-01 02:17:48 +01:00
Matthew Hodgson 683e564815 handle spidered relative images correctly 2016-03-31 23:52:58 +01:00
Matthew Hodgson 72550c3803 prevent choking on invalid utf-8, and handle image thumbnailing smarter 2016-03-31 15:14:14 +01:00