Commit Graph

172 Commits (32545d2e2611743633d9f98a40d5f62791e19a6f)

Author SHA1 Message Date
Patrick Cloke caec7d4fa0
Convert some of the media REST code to async/await (#7110) 2020-03-20 07:20:02 -04:00
Erik Johnston b0a66ab83c
Fixup synapse.rest to pass mypy (#6732) 2020-01-20 17:38:21 +00:00
Erik Johnston 4a33a6dd19 Move background update handling out of store 2019-12-05 11:11:26 +00:00
Richard van der Hoff ef1a85e773
Fix startup error when http proxy is defined. (#6421)
Guess I only tested this on python 2 :/

Fixes #6419.
2019-11-26 18:10:50 +00:00
Andrew Morgan 3916e1b97a
Clean up newline quote marks around the codebase (#6362) 2019-11-21 12:00:14 +00:00
Richard van der Hoff 5570d1c93f
Merge pull request #6334 from matrix-org/rav/url_preview_limit_title_2
Fix exception when OpenGraph tag values are ints
2019-11-05 17:28:11 +00:00
Richard van der Hoff 81d49cbb07 Fix exception when OpenGraph tag values are ints 2019-11-05 17:22:58 +00:00
Richard van der Hoff 55a7da247a
Merge branch 'develop' into rav/url_preview_limit_title 2019-11-05 17:08:07 +00:00
Richard van der Hoff e78167c94b
Apply suggestions from code review
Co-Authored-By: Brendan Abolivier <babolivier@matrix.org>
Co-Authored-By: Erik Johnston <erik@matrix.org>
2019-11-05 16:46:39 +00:00
Richard van der Hoff e9bfe719ba Strip overlong OpenGraph data from url preview
... to stop people causing DoSes with malicious web pages
2019-11-05 15:51:18 +00:00
Richard van der Hoff 1cb84c6486
Support for routing outbound HTTP requests via a proxy (#6239)
The `http_proxy` and `HTTPS_PROXY` env vars can be set to a `host[:port]` value which should point to a proxy.

The address of the proxy should be excluded from IP blacklists such as the `url_preview_ip_range_blacklist`.

The proxy will then be used for
 * push
 * url previews
 * phone-home stats
 * recaptcha validation
 * CAS auth validation

It will *not* be used for:
 * Application Services
 * Identity servers
 * Outbound federation
 * In worker configurations, connections from workers to masters

Fixes #4198.
2019-11-01 14:07:44 +00:00
Andrew Morgan 54fef094b3
Remove usage of deprecated logger.warn method from codebase (#6271)
Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
2019-10-31 10:23:24 +00:00
Michael Kaye e4d98188da Address codestyle concerns 2019-10-24 18:43:13 +01:00
Michael Kaye 8f4a808d9d Delay printf until logging is required.
Using % will cause the string to be generated even if debugging
is off.
2019-10-24 18:31:53 +01:00
Erik Johnston ca3e01e50d Fix store_url_cache using bytes 2019-10-10 14:52:29 +01:00
Andrew Morgan 2a44782666
Remove double return statements (#5962)
Remove all the "double return" statements which were a result of us removing all the instances of

```
defer.returnValue(...)
return
```

statements when we switched to python3 fully.
2019-09-03 11:42:45 +01:00
Amber Brown 4806651744
Replace returnValue with return (#5736) 2019-07-23 23:00:55 +10:00
Amber Brown 463b072b12
Move logging utilities out of the side drawer of util/ and into logging/ (#5606) 2019-07-04 00:07:04 +10:00
Amber Brown 0ee9076ffe Fix media repo breaking (#5593) 2019-07-02 19:01:28 +01:00
Amber Brown f40a7dc41f
Make the http server handle coroutine-making REST servlets (#5475) 2019-06-29 17:06:55 +10:00
Amber Brown 32e7c9e7f2
Run Black. (#5482) 2019-06-20 19:32:02 +10:00
Andrew Morgan 2f48c4e1ae
URL preview blacklisting fixes (#5155)
Prevents a SynapseError being raised inside of a IResolutionReceiver and instead opts to just return 0 results. This thus means that we have to lump a failed lookup and a blacklisted lookup together with the same error message, but the substitute should be generic enough to cover both cases.
2019-05-10 10:32:44 -07:00
Amber Brown ea6abf6724
Fix IP URL previews on Python 3 (#4215) 2018-12-22 01:56:13 +11:00
Amber Brown 8b1affe7d5
Fix Content-Disposition in media repository (#4176) 2018-11-15 15:55:58 -06:00
Amber Brown df758e155d
Use <meta> tags to discover the per-page encoding of html previews (#4183) 2018-11-15 11:05:08 -06:00
Amber Brown b3708830b8
Fix URL preview bugs (type error when loading cache from db, content-type including quotes) (#4157) 2018-11-08 01:37:43 +11:00
Richard van der Hoff ef771cc4c2 Fix a number of flake8 errors
Broadly three things here:

* disable W504 which seems a bit whacko
* remove a bunch of `as e` expressions from exception handlers that don't use
  them
* use `r""` for strings which include backslashes

Also, we don't use pep8 any more, so we can get rid of the duplicate config
there.
2018-10-24 10:39:03 +01:00
Erik Johnston f6a0a02a62 Fix bug where we raised StopIteration in a generator
This made python 3.7 unhappy
2018-10-17 16:10:52 +01:00
Erik Johnston 8601c24287 Fix some instances of ExpiringCache not expiring cache items
ExpiringCache required that `start()` be called before it would actually
start expiring entries. A number of places didn't do that.

This PR removes `start` from ExpiringCache, and automatically starts
backround reaping process on creation instead.
2018-09-21 14:19:46 +01:00
Amber Brown 02aa41809b
Port rest/ to Python 3 (#3823) 2018-09-12 20:41:31 +10:00
Amber Brown b37c472419
Rename async to async_helpers because `async` is a keyword on Python 3.7 (#3678) 2018-08-10 23:50:21 +10:00
Richard van der Hoff 03751a6420 Fix some looping_call calls which were broken in #3604
It turns out that looping_call does check the deferred returned by its
callback, and (at least in the case of client_ips), we were relying on this,
and I broke it in #3604.

Update run_as_background_process to return the deferred, and make sure we
return it to clock.looping_call.
2018-07-26 11:48:08 +01:00
Richard van der Hoff 371da42ae4 Wrap a number of things that run in the background
This will reduce the number of "Starting db connection from sentinel context"
warnings, and will help with our metrics.
2018-07-25 09:41:12 +01:00
Krombel 32fd6910d0 Use parse_{int,str} and assert from http.servlet
parse_integer and parse_string can take a request and raise errors
in case we have wrong or missing params.
This PR tries to use them more to deduplicate some code and make it
better readable
2018-07-13 21:40:14 +02:00
Amber Brown 49af402019 run isort 2018-07-09 16:09:20 +10:00
Amber Brown 6350bf925e
Attempt to be more performant on PyPy (#3462) 2018-06-28 14:49:57 +01:00
Adrian Tschira aafb0f6b0d py3-ize url preview 2018-05-19 17:35:20 +02:00
Richard van der Hoff 318711e139 Set Server header in SynapseRequest
(instead of everywhere that writes a response. Or rather, the subset of places
which write responses where we haven't forgotten it).

This also means that we don't have to have the mysterious version_string
attribute in anything with a request handler.

Unfortunately it does mean that we have to pass the version string wherever we
instantiate a SynapseSite, which has been c&ped 150 times, but that is code
that ought to be cleaned up anyway really.
2018-05-10 18:50:27 +01:00
Richard van der Hoff 645cb4bf06 Remove redundant request_handler decorator
This is needless complexity; we might as well use the wrapper directly.

Also rename wrap_request_handler->wrap_json_request_handler.
2018-05-10 12:19:53 +01:00
Richard van der Hoff 2a13af23bc Use run_in_background in preference to preserve_fn
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
2018-04-27 12:55:51 +01:00
Erik Johnston fa72803490 Merge branch 'master' of github.com:matrix-org/synapse into develop 2018-03-19 11:41:01 +00:00
Erik Johnston 926ba76e23 Replace ujson with simplejson 2018-03-15 23:43:31 +00:00
Richard van der Hoff d5352cbba8 Handle url_previews with no content-type
avoid failing with an exception if the remote server doesn't give us a
Content-Type header.

Also, clean up the exception handling a bit.
2018-02-02 00:53:46 +00:00
Erik Johnston 6368e5c0ab Change _generate_thumbnails to take media_type 2018-01-16 16:17:38 +00:00
Erik Johnston 0a90d9ede4 Move setting of file_id up to caller 2018-01-16 16:03:05 +00:00
Erik Johnston 2442e9876c Make PreviewUrlResource use MediaStorage 2018-01-09 16:15:07 +00:00
Richard van der Hoff 5a4da5bf78
Merge pull request #2697 from matrix-org/rav/fix_urlcache_index_error
Fix error on sqlite 3.7
2017-11-27 12:25:48 +00:00
Richard van der Hoff 8132a6b7ac Fix OPTIONS on preview_url
Fixes #2706
2017-11-23 17:52:31 +00:00
Richard van der Hoff 2908f955d1 Check database in has_completed_background_updates
so that the right thing happens on workers.
2017-11-22 18:02:15 +00:00
Richard van der Hoff 7098b65cb8 Fix error on sqlite 3.7
Create the url_cache index on local_media_repository as a background update, so
that we can detect whether we are on sqlite or not and create a partial or
complete index accordingly.

To avoid running the cleanup job before we have built the index, add a bailout
which will defer the cleanup if the bg updates are still running.

Fixes https://github.com/matrix-org/synapse/issues/2572.
2017-11-21 11:14:17 +00:00
Richard van der Hoff 5d15abb120 Bit more logging 2017-11-10 16:58:04 +00:00
Richard van der Hoff 46790f50cf Cache failures in url_preview handler
Reshuffle the caching logic in the url_preview handler so that failures are
cached (and to generally simplify things and fix the logcontext leaks).
2017-11-10 16:50:50 +00:00
Maxime Vaillancourt 5287e57c86 Ignore noscript tags when generating URL previews 2017-10-25 20:44:34 -04:00
Richard van der Hoff eaaabc6c4f replace 'except:' with 'except Exception:'
what could possibly go wrong
2017-10-23 15:52:32 +01:00
Erik Johnston 2b24416e90 Don't reuse source but instead copy from primary media store to backup 2017-10-13 14:11:34 +01:00
Erik Johnston 505371414f Fix up thumbnailing function 2017-10-13 11:23:53 +01:00
Erik Johnston d76621a47b Fix comments 2017-10-12 18:16:25 +01:00
Erik Johnston 802ca12d05 Don't close file prematurely 2017-10-12 17:37:21 +01:00
Erik Johnston e283b555b1 Copy everything to backup 2017-10-12 17:31:24 +01:00
Erik Johnston d5694ac5fa Only log if we've removed media 2017-09-28 16:08:08 +01:00
Erik Johnston 7cc483aa0e Clear up expired url cache every 10s 2017-09-28 13:56:53 +01:00
Erik Johnston e1e7d76cf1 Actually assign result to variable 2017-09-28 13:55:29 +01:00
Erik Johnston 5f501ec7e2 Fix typo in url cache expiry timer 2017-09-28 12:59:01 +01:00
Erik Johnston ae79764fe5 Change expires column to expires_ts 2017-09-28 12:37:53 +01:00
Erik Johnston 9ccb4226ba Delete expired url cache data 2017-09-28 12:18:06 +01:00
Erik Johnston 7fe8ed1787 Store URL cache preview downloads seperately
This makes it easier to clear old media out at a later date
2017-06-23 11:14:11 +01:00
Matthew Hodgson 836d5c44b6 actually trim oversize og:description meta 2017-05-22 21:14:20 +01:00
Marcin Bachry 24c16fc349 Fix crash in url preview when html tag has no text
Signed-off-by: Marcin Bachry <hegel666@gmail.com>
2016-12-14 22:38:18 +01:00
Johannes Löthberg 32c8b5507c preview_url_resource: Ellipsis must be in unicode string
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
2016-12-01 13:12:13 +01:00
Erik Johnston f90b3d83a3 Add None check to _iterate_over_text 2016-08-17 15:17:17 +01:00
Erik Johnston 109a560905 Flake8 2016-08-16 14:57:21 +01:00
Erik Johnston 48b5829aea Fix up preview URL API. Add tests.
This includes:

- Splitting out methods of a class into stand alone functions, to make
  them easier to test.
- Adding unit tests to split out functions, testing HTML -> preview.
- Handle the fact that elements in lxml may have tail text.
2016-08-16 14:53:24 +01:00
Erik Johnston 5bcccfde6c Don't include html comments in description 2016-08-05 14:45:11 +01:00
Erik Johnston b5525c76d1 Typo 2016-08-04 16:10:08 +01:00
Erik Johnston e97648c4e2 Test summarization 2016-08-04 16:09:09 +01:00
Erik Johnston 58c9653c6b Don't infer paragrahs from newlines 2016-08-02 18:50:24 +01:00
Erik Johnston 6b58ade2f0 Comment on why we clone 2016-08-02 18:41:22 +01:00
Erik Johnston 9e66c58ceb Spelling. 2016-08-02 18:37:31 +01:00
Erik Johnston f83f5fbce8 Make it actually compile 2016-08-02 18:32:42 +01:00
Erik Johnston aecaec3e10 Change the way we summarize URLs
Using XPath is slow on some machines (for unknown reasons), so use a
different approach to get a list of text nodes.

Try to generate a summary that respect paragraph and then word
boundaries, adding ellipses when appropriate.
2016-08-02 18:25:53 +01:00
Erik Johnston 09a17f965c Line lengths 2016-06-15 16:58:12 +01:00
Erik Johnston 1e9026e484 Handle floats as img widths 2016-06-15 16:58:05 +01:00
Erik Johnston a60169ea09 Handle og props with not content 2016-06-15 16:57:48 +01:00
Mark Haines eb79110beb Clean up the blacklist/whitelist handling.
Always set the config key with an empty list, even if a list isn't specified.
This means that the codepaths are the same for both the empty list and
for a missing key. Since the behaviour is the same for both cases this
makes the code somewhat easier to reason about.
2016-05-16 13:03:59 +01:00
Mark Haines 8d7ad44331 Report per request metrics for all of the things using request_handler 2016-04-28 10:57:49 +01:00
Erik Johnston e8884e5e9c Add self.media_repo to PreviewUrlResource 2016-04-19 14:51:34 +01:00
Erik Johnston a7001c311b _make_dirs was moved to MediaRepository 2016-04-19 14:49:31 +01:00
Erik Johnston 9181e2f4c7 Add store to PreviewUrlResource 2016-04-19 14:48:24 +01:00
Erik Johnston fb76a81ff7 Reorder imports 2016-04-19 14:45:05 +01:00
Erik Johnston 43f0941e8f Split out BaseMediaResource into MediaRepository
This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.

In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time.
2016-04-19 11:24:59 +01:00
Matthew Hodgson aaabbd3e9e explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better 2016-04-15 14:32:25 +01:00
Matthew Hodgson 84f9cac4d0 fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong 2016-04-15 13:20:08 +01:00
Matthew Hodgson f78b479118 fix urlparse import thinko breaking tiny URLs 2016-04-14 15:23:55 +01:00
Erik Johnston d0633e6dbe Sanitize the optional dependencies for spider API 2016-04-13 13:38:09 +01:00
Erik Johnston 17515bae14 PEP8 2016-04-11 11:02:50 +01:00
Matthew Hodgson 5ffacc5e84 fix typos and needless try/except from PR review 2016-04-11 10:39:16 +01:00
Matthew Hodgson 83b2f83da0 actually throw meaningful errors 2016-04-08 21:36:59 +01:00
Mark Haines b36270b5e1 Fix pep8 warning 2016-04-08 19:52:23 +01:00
Matthew Hodgson 1ccabe2965 more PR feedback 2016-04-08 18:58:08 +01:00
Matthew Hodgson dafef5a688 Add url_preview_enabled config option to turn on/off preview_url endpoint. defaults to off.
Add url_preview_ip_range_blacklist to let admins specify internal IP ranges that must not be spidered.
Add url_preview_url_blacklist to let admins specify URL patterns that must not be spidered.
Implement a custom SpiderEndpoint and associated support classes to implement url_preview_ip_range_blacklist
Add commentary and generally address PR feedback
2016-04-08 18:37:15 +01:00