Commit Graph

254 Commits (94f7befc31b65431270f1dd462113fdeb334545e)

Author SHA1 Message Date
Erik Johnston 505371414f Fix up thumbnailing function 2017-10-13 11:23:53 +01:00
Erik Johnston e3428d26ca Fix typo 2017-10-13 10:39:59 +01:00
Erik Johnston 35332298ef Fix up comments 2017-10-13 10:39:32 +01:00
Erik Johnston 64db043a71 Move makedirs to thread 2017-10-13 10:25:01 +01:00
Erik Johnston b60859d6cc Use make_deferred_yieldable 2017-10-13 10:24:19 +01:00
Erik Johnston d76621a47b Fix comments 2017-10-12 18:16:25 +01:00
Erik Johnston 4ae85ae121 Don't close prematurely.. 2017-10-12 17:57:31 +01:00
Erik Johnston cc505b4b5e getvalue closes buffer 2017-10-12 17:52:30 +01:00
Erik Johnston 1259a76047 Get len before close 2017-10-12 17:39:23 +01:00
Erik Johnston 802ca12d05 Don't close file prematurely 2017-10-12 17:37:21 +01:00
Erik Johnston e283b555b1 Copy everything to backup 2017-10-12 17:31:24 +01:00
Erik Johnston b77a13812c Typo 2017-10-12 15:32:32 +01:00
Erik Johnston 6dfde6d485 Remove dead code 2017-10-12 15:30:26 +01:00
Erik Johnston c8eeef6947 Fix typos 2017-10-12 15:28:24 +01:00
Erik Johnston 67cb89fbdf Fix typo 2017-10-12 15:23:41 +01:00
Erik Johnston bf4fb1fb40 Basic implementation of backup media store 2017-10-12 15:20:59 +01:00
Erik Johnston d5694ac5fa Only log if we've removed media 2017-09-28 16:08:08 +01:00
Erik Johnston 7cc483aa0e Clear up expired url cache every 10s 2017-09-28 13:56:53 +01:00
Erik Johnston e1e7d76cf1 Actually assign result to variable 2017-09-28 13:55:29 +01:00
Erik Johnston 5f501ec7e2 Fix typo in url cache expiry timer 2017-09-28 12:59:01 +01:00
Erik Johnston ace8079086 Support new and old style media id formats 2017-09-28 12:52:51 +01:00
Erik Johnston ae79764fe5 Change expires column to expires_ts 2017-09-28 12:37:53 +01:00
Erik Johnston 9ccb4226ba Delete expired url cache data 2017-09-28 12:18:06 +01:00
Erik Johnston 7fe8ed1787 Store URL cache preview downloads seperately
This makes it easier to clear old media out at a later date
2017-06-23 11:14:11 +01:00
Erik Johnston b8b936a6ea Add API to quarantine media 2017-06-19 17:39:21 +01:00
Erik Johnston 48d2949416 Throw exception when not retrying when downloading media 2017-06-13 10:23:14 +01:00
Matthew Hodgson 836d5c44b6 actually trim oversize og:description meta 2017-05-22 21:14:20 +01:00
Erik Johnston d12ae7fd1c Don't log exceptions for NotRetryingDestination 2017-05-15 15:42:18 +01:00
Richard van der Hoff 1d09586599 Address review comments
- don't blindly proxy all HTTPRequestExceptions
- log unexpected exceptions at error
- avoid `isinstance`
- improve docs on `from_http_response_exception`
2017-03-14 14:15:37 +00:00
Richard van der Hoff 170ccc9de5 Fix routing loop when fetching remote media
When we proxy a media request to a remote server, add a query-param, which will
tell the remote server to 404 if it doesn't recognise the server_name.

This should fix a routing loop where the server keeps forwarding back to
itself.

Also improves the error handling on remote media fetches, so that we don't
always return a rather obscure 502.
2017-03-13 16:30:36 +00:00
Jurek aea5461488 Fix dynamic thumbnails aspect 2017-02-24 22:43:27 +01:00
Mark Haines 32019c9897 Log which files we saved attachments to in the media_repository 2017-01-10 14:19:50 +00:00
Erik Johnston f7085ac84f Name linearizer's for better logs 2017-01-09 17:17:10 +00:00
Marcin Bachry 24c16fc349 Fix crash in url preview when html tag has no text
Signed-off-by: Marcin Bachry <hegel666@gmail.com>
2016-12-14 22:38:18 +01:00
Johannes Löthberg 32c8b5507c preview_url_resource: Ellipsis must be in unicode string
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
2016-12-01 13:12:13 +01:00
Mark Haines b1c27975d0 Set CORs headers on responses from the media repo 2016-11-02 11:29:25 +00:00
Erik Johnston d51b8a1674 Add quotes and be explicity about script-src 2016-09-05 17:35:01 +01:00
Erik Johnston 662b031a30 Allow PDF to be rendered from media repo 2016-09-05 17:25:26 +01:00
Erik Johnston 0af9e1a637 Set `Content-Security-Policy` on media repo
This is to inform browsers that they should sandbox the returned
media. This is particularly cruical for javascript/HTML files.
2016-08-17 16:27:39 +01:00
Erik Johnston f90b3d83a3 Add None check to _iterate_over_text 2016-08-17 15:17:17 +01:00
Erik Johnston 109a560905 Flake8 2016-08-16 14:57:21 +01:00
Erik Johnston 48b5829aea Fix up preview URL API. Add tests.
This includes:

- Splitting out methods of a class into stand alone functions, to make
  them easier to test.
- Adding unit tests to split out functions, testing HTML -> preview.
- Handle the fact that elements in lxml may have tail text.
2016-08-16 14:53:24 +01:00
Erik Johnston 5bcccfde6c Don't include html comments in description 2016-08-05 14:45:11 +01:00
Erik Johnston b5525c76d1 Typo 2016-08-04 16:10:08 +01:00
Erik Johnston e97648c4e2 Test summarization 2016-08-04 16:09:09 +01:00
Erik Johnston 58c9653c6b Don't infer paragrahs from newlines 2016-08-02 18:50:24 +01:00
Erik Johnston 6b58ade2f0 Comment on why we clone 2016-08-02 18:41:22 +01:00
Erik Johnston 9e66c58ceb Spelling. 2016-08-02 18:37:31 +01:00
Erik Johnston f83f5fbce8 Make it actually compile 2016-08-02 18:32:42 +01:00
Erik Johnston aecaec3e10 Change the way we summarize URLs
Using XPath is slow on some machines (for unknown reasons), so use a
different approach to get a list of text nodes.

Try to generate a summary that respect paragraph and then word
boundaries, adding ellipses when appropriate.
2016-08-02 18:25:53 +01:00
Erik Johnston f52cb4cd78 Remove race 2016-06-29 15:24:50 +01:00
Erik Johnston a70688445d Implement purge_media_cache admin API 2016-06-29 14:57:59 +01:00
Erik Johnston 314b146b2e Track approximate last access time for remote media 2016-06-29 11:41:20 +01:00
Mark Haines 13e334506c Remove the legacy v0 content upload API.
The existing content can still be downloaded. The last upload to the
matrix.org server was in January 2015, so it is probably safe to remove
the upload API.
2016-06-21 11:47:39 +01:00
Erik Johnston 09a17f965c Line lengths 2016-06-15 16:58:12 +01:00
Erik Johnston 1e9026e484 Handle floats as img widths 2016-06-15 16:58:05 +01:00
Erik Johnston a60169ea09 Handle og props with not content 2016-06-15 16:57:48 +01:00
Erik Johnston eba4ff1bcb 502 on /thumbnail when can't contact remote server 2016-06-09 11:29:43 +01:00
Mark Haines eb79110beb Clean up the blacklist/whitelist handling.
Always set the config key with an empty list, even if a list isn't specified.
This means that the codepaths are the same for both the empty list and
for a missing key. Since the behaviour is the same for both cases this
makes the code somewhat easier to reason about.
2016-05-16 13:03:59 +01:00
Mark Haines 8d7ad44331 Report per request metrics for all of the things using request_handler 2016-04-28 10:57:49 +01:00
Erik Johnston e8884e5e9c Add self.media_repo to PreviewUrlResource 2016-04-19 14:51:34 +01:00
Erik Johnston a7001c311b _make_dirs was moved to MediaRepository 2016-04-19 14:49:31 +01:00
Erik Johnston 9181e2f4c7 Add store to PreviewUrlResource 2016-04-19 14:48:24 +01:00
Erik Johnston fb76a81ff7 Reorder imports 2016-04-19 14:45:05 +01:00
Erik Johnston 0c93df89b6 Move MediaRepository to media_repository module 2016-04-19 11:31:43 +01:00
Erik Johnston 43f0941e8f Split out BaseMediaResource into MediaRepository
This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.

In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time.
2016-04-19 11:24:59 +01:00
Matthew Hodgson aaabbd3e9e explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better 2016-04-15 14:32:25 +01:00
Matthew Hodgson 84f9cac4d0 fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong 2016-04-15 13:20:08 +01:00
Matthew Hodgson f78b479118 fix urlparse import thinko breaking tiny URLs 2016-04-14 15:23:55 +01:00
Matthew Hodgson bd77216d06 comment out 2c838f6459 due to risk of https://en.wikipedia.org/wiki/Billion_laughs attacks - thanks @torhve 2016-04-14 14:39:24 +01:00
Erik Johnston d0633e6dbe Sanitize the optional dependencies for spider API 2016-04-13 13:38:09 +01:00
Erik Johnston 17515bae14 PEP8 2016-04-11 11:02:50 +01:00
Matthew Hodgson 5ffacc5e84 fix typos and needless try/except from PR review 2016-04-11 10:39:16 +01:00
Matthew Hodgson 83b2f83da0 actually throw meaningful errors 2016-04-08 21:36:59 +01:00
Mark Haines b36270b5e1 Fix pep8 warning 2016-04-08 19:52:23 +01:00
Matthew Hodgson 1ccabe2965 more PR feedback 2016-04-08 18:58:08 +01:00
Matthew Hodgson dafef5a688 Add url_preview_enabled config option to turn on/off preview_url endpoint. defaults to off.
Add url_preview_ip_range_blacklist to let admins specify internal IP ranges that must not be spidered.
Add url_preview_url_blacklist to let admins specify URL patterns that must not be spidered.
Implement a custom SpiderEndpoint and associated support classes to implement url_preview_ip_range_blacklist
Add commentary and generally address PR feedback
2016-04-08 18:37:15 +01:00
Matthew Hodgson cf51c4120e report image size (bytewise) in OG meta 2016-04-03 23:57:05 +01:00
Matthew Hodgson 0834b152fb char encoding 2016-04-03 12:59:27 +01:00
Matthew Hodgson 8b98a7e8c3 pep8 2016-04-03 12:56:29 +01:00
Matthew Hodgson eab4d462f8 fix etag typing error. fix timestamp typing error 2016-04-03 02:02:46 +01:00
Matthew Hodgson c3916462f6 rebase all image URLs 2016-04-03 01:33:12 +01:00
Matthew Hodgson 110780b18b remove stale todo 2016-04-03 00:48:31 +01:00
Matthew Hodgson b09e29a03c Ensure only one download for a given URL is active at a time 2016-04-03 00:47:40 +01:00
Matthew Hodgson 7426c86eb8 add a persistent cache of URL lookups, and fix up the in-memory one to work 2016-04-03 00:31:57 +01:00
Matthew Hodgson d1b154a10f support gzip compression, and don't pass through error msgs 2016-04-02 03:06:39 +01:00
Matthew Hodgson 9377157961 how was _respond_default_thumbnail ever meant to work? 2016-04-02 02:31:45 +01:00
Matthew Hodgson 2c838f6459 pass back SVGs as their own thumbnails 2016-04-02 02:30:07 +01:00
Matthew Hodgson 5037ee0d37 handle missing dimensions without crashing 2016-04-02 02:29:57 +01:00
Matthew Hodgson b26e8604f1 make meta comparisons case insensitive 2016-04-02 01:35:44 +01:00
Matthew Hodgson 5fd07da764 refactor calc_og; spider image URLs; fix xpath; add a (broken) expiringcache; loads of other fixes 2016-04-02 00:35:49 +01:00
Matthew Hodgson c60b751694 fix assorted redirect, unicode and screenscraping bugs 2016-04-01 02:17:48 +01:00
Matthew Hodgson 683e564815 handle spidered relative images correctly 2016-03-31 23:52:58 +01:00
Matthew Hodgson 72550c3803 prevent choking on invalid utf-8, and handle image thumbnailing smarter 2016-03-31 15:14:14 +01:00
Matthew Hodgson bb9a2ca87c synthesise basig OG metadata from pages lacking it 2016-03-31 14:15:09 +01:00
Matthew Hodgson a8a5dd3b44 handle requests with missing content-length headers (e.g. YouTube) 2016-03-31 01:55:21 +01:00
Matthew Hodgson ae5831d303 fix bugs 2016-03-29 03:32:55 +01:00
Matthew Hodgson 19038582d3 debug 2016-03-29 03:14:16 +01:00
Matthew Hodgson 64b4aead15 make it work 2016-03-29 03:13:25 +01:00
Matthew Hodgson dd4287ca5d make it build 2016-03-29 02:07:57 +01:00
Matthew Hodgson d9d48aad2d Merge branch 'develop' into matthew/preview_urls 2016-03-27 22:54:42 +01:00
Mark Haines 58c9f20692 Catch the exceptions thrown by twisted when you write to a closed connection 2016-02-12 13:46:59 +00:00
Erik Johnston 33c71c3a4b Preserve log context over when deferring to thread pool in media repo 2016-02-03 16:17:18 +00:00
Matthew Hodgson 7dd0c1730a initial WIP of a tentative preview_url endpoint - incomplete, untested, experimental, etc. just putting it here for safekeeping for now 2016-01-24 18:47:27 -05:00
Daniel Wagner-Hall 42aa1f3f33 Merge pull request #478 from matrix-org/daniel/userobject
Introduce a User object

I'm sick of passing around more and more things as tuple items around
the whole world, and needing to edit every call site every time there is
more information about a user. So pass them around together as an
object.

This object has incredibly poorly named fields because we have a
convention that `user` indicates a UserID object, and `user_id`
indicates a string. I tried to clean up the whole repo to fix this, but
gave up. So instead, I introduce a second convention. A user_object is a
User, and a user_id_object is a UserId. I may have cried a little bit.
2016-01-11 17:50:22 +00:00
Daniel Wagner-Hall 2110e35fd6 Introduce a Requester object
This tracks data about the entity which made the request. This is
instead of passing around a tuple, which requires call-site
modifications every time a new piece of optional context is passed
around.

I tried to introduce a User object. I gave up.
2016-01-11 17:48:45 +00:00
Mark Haines 8677b7d698 Only use cropped thumbnails when asked for a cropped thumbnail.
Even though ones cropped with scale might be technically valid.
2016-01-07 18:57:15 +00:00
Matthew Hodgson 6c28ac260c copyrights 2016-01-07 04:26:29 +00:00
Daniel Wagner-Hall cfd07aafff Allow guests to upgrade their accounts 2016-01-05 18:01:18 +00:00
Erik Johnston 8737ead008 Use larger thumbnail rather than smaller. 2016-01-05 17:30:30 +00:00
Daniel Wagner-Hall f522f50a08 Allow guests to register and call /events?room_id=
This follows the same flows-based flow as regular registration, but as
the only implemented flow has no requirements, it auto-succeeds. In the
future, other flows (e.g. captcha) may be required, so clients should
treat this like the regular registration flow choices.
2015-11-04 17:29:07 +00:00
Mark Haines a7122692d9 Merge branch 'release-v0.10.0' into develop
Conflicts:
	synapse/handlers/auth.py
	synapse/python_dependencies.py
	synapse/rest/client/v1/login.py
2015-08-28 11:15:27 +01:00
Erik Johnston ddf4d2bd98 Consistency 2015-08-27 10:50:49 +01:00
Erik Johnston 66ec6cf9b8 Check for an internationalised filename first 2015-08-27 10:48:58 +01:00
Erik Johnston 53c2eed862 None check the correct variable 2015-08-27 10:38:22 +01:00
Erik Johnston f02532baad Check for None 2015-08-27 10:37:02 +01:00
Mark Haines c9cb354b58 Give a sensible error message if the filename is invalid UTF-8 2015-08-26 17:27:23 +01:00
Mark Haines 5a9e0c3682 Handle unicode filenames given when downloading or received over federation 2015-08-26 17:08:47 +01:00
Mark Haines e85c7873dc Allow non-ascii filenames for attachments 2015-08-26 16:26:37 +01:00
Daniel Wagner-Hall a0b181bd17 Remove completely unused concepts from codebase
Removes device_id and ClientInfo

device_id is never actually written, and the matrix.org DB has no
non-null entries for it. Right now, it's just cluttering up code.

This doesn't remove the columns from the database, because that's
fiddly.
2015-08-25 16:23:06 +01:00
Mark Haines b16cd18a86 Merge remote-tracking branch 'origin/develop' into erikj/generate_presice_thumbnails 2015-08-13 17:23:39 +01:00
Mark Haines fdb724cb70 Add config option for setting the list of thumbnail sizes to precalculate 2015-08-12 10:55:27 +01:00
Mark Haines 7e3d1c7d92 Make a config option for whether to generate new thumbnail sizes dynamically 2015-08-12 10:54:38 +01:00
Erik Johnston 459085184c Factor out thumbnail() 2015-07-23 15:59:53 +01:00
Erik Johnston 2b4f47db9c Generate local thumbnails on a thread 2015-07-23 14:52:29 +01:00
Erik Johnston 33d83f3615 Fix remote thumbnailing 2015-07-23 14:24:21 +01:00
Erik Johnston ff7c2e41de Always return a thumbnail of the requested size.
Before, we returned a thumbnail that was at least as big (if possible)
as the requested size. Now, if we don't have a thumbnail of the given
size we generate (and persist) one of that size.
2015-07-23 14:12:49 +01:00
Erik Johnston 103e1c2431 Pick larger than desired thumbnail for 'crop' 2015-07-23 11:12:49 +01:00
Matthew Hodgson 8cedf3ce95 bump up image quality a bit more as it looks crap 2015-07-14 23:53:13 +01:00
Erik Johnston 12b83f1a0d If user supplies filename in URL when downloading from media repo, use that name in Content Disposition 2015-07-03 11:24:55 +01:00
Erik Johnston 9beaedd164 Enforce ascii filenames for uploads 2015-06-30 10:31:59 +01:00
Erik Johnston 2124f668db Add Content-Disposition headers to media repo v1 downloads 2015-06-30 09:35:44 +01:00
Erik Johnston 5730b20c6d Merge pull request #175 from matrix-org/erikj/thumbnail_thread
Thumbnail images on a seperate thread
2015-06-03 17:26:56 +01:00
Erik Johnston 2ef2f6d593 SYN-403: Make content repository use its own http client. 2015-06-03 10:17:37 +01:00
Erik Johnston 5044e6c544 Thumbnail images on a seperate thread 2015-06-02 15:39:08 +01:00
Erik Johnston fca28d243e Change the way we create observers to deferreds so that we don't get spammed by 'unhandled errors' 2015-05-08 16:28:08 +01:00
Erik Johnston e701aec2d1 Implement locks using create_observer for fetching media and server keys 2015-04-27 14:20:26 +01:00
Mark Haines 4e2f8b8722 Copyright notices 2015-04-24 10:35:29 +01:00
Mark Haines 812a99100b Set a version_string in BaseMediaResource so that the request_handler wrapper works 2015-04-21 16:43:58 +01:00
Mark Haines 1967650bc4 Combine the request wrappers in rest/media/v1 and http/server into a single wrapper decorator 2015-04-21 16:35:53 +01:00
Mark Haines 6375bd3e33 SYN-282: Don't log tracebacks for client errors 2015-02-18 12:01:37 +00:00
Erik Johnston 4ebbaf0d43 Blunty replace json with simplejson 2015-02-11 14:23:10 +00:00
Mark Haines 84a769cdb7 Fix code-style 2015-02-10 17:58:36 +00:00
Mark Haines b085fac735 Code-style fixes 2015-02-10 16:30:48 +00:00
Matthew Hodgson 582019f870 ...and here's the actual impl. git fail. 2015-02-07 13:32:14 +00:00
Matthew Hodgson e117bc3fc5 thou shalt specify a content-length 2015-02-07 12:56:21 +00:00
Matthew Hodgson 34c39398fa i hate weakly typed languages 2015-02-07 12:55:13 +00:00
Mark Haines 1bb0528316 Add Cache-Control header to identicon 2015-02-02 16:57:26 +00:00
Mark Haines f2eda123b7 Fix setting identicon width and height 2015-02-02 16:32:33 +00:00
Mark Haines 038f5afb07 Spell height more correctly 2015-02-02 16:29:18 +00:00