d51b8a1674 
								
							
								 
							
						 
						
							
							
								
								Add quotes and be explicit about script-src  
							
							
							
						 
						
							2016-09-05 17:35:01 +01:00  
				
					
						
							
							
								 
						
							
							
								662b031a30 
								
							
								 
							
						 
						
							
							
								
								Allow PDF to be rendered from media repo  
							
							
							
						 
						
							2016-09-05 17:25:26 +01:00  
				
					
						
							
							
								 
						
							
							
								0af9e1a637 
								
							
								 
							
						 
						
							
							
								
								Set `Content-Security-Policy` on media repo  
							
							
							
							
							This is to inform browsers that they should sandbox the returned
media. This is particularly crucial for JavaScript/HTML files. 
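
A minimal sketch of the idea, not the code from this commit: the function name and the exact directive list are assumptions made for illustration. It builds response headers that ask the browser to sandbox returned media and to refuse scripts.

```python
def media_response_headers(content_type):
    """Build headers for a media response (hypothetical helper).

    The directive list is an illustrative assumption: `sandbox` asks the
    browser to treat the response as an isolated document, and the quoted
    'none' values block script execution and other sub-resource loads.
    """
    policy = "; ".join([
        "sandbox",
        "default-src 'none'",
        "script-src 'none'",
    ])
    return {
        "Content-Type": content_type,
        "Content-Security-Policy": policy,
    }
```

Note that CSP keyword values such as 'none' must be wrapped in single quotes; an unquoted none would be read as a host name.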
							
						 
						
							2016-08-17 16:27:39 +01:00  
				
					
						
							
							
								 
						
							
							
								f90b3d83a3 
								
							
								 
							
						 
						
							
							
								
								Add None check to _iterate_over_text  
							
							
							
						 
						
							2016-08-17 15:17:17 +01:00  
				
					
						
							
							
								 
						
							
							
								109a560905 
								
							
								 
							
						 
						
							
							
								
								Flake8  
							
							
							
						 
						
							2016-08-16 14:57:21 +01:00  
				
					
						
							
							
								 
						
							
							
								48b5829aea 
								
							
								 
							
						 
						
							
							
								
								Fix up preview URL API. Add tests.  
							
							
							
							
							This includes:
- Splitting out methods of a class into standalone functions, to make
  them easier to test.
- Adding unit tests for the split-out functions, testing HTML -> preview.
- Handling the fact that elements in lxml may have tail text. 
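
To illustrate the tail text point, here is a small sketch. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml (both expose `.text` and `.tail` on elements), and `iter_text` is a hypothetical name, not the function from this commit.

```python
import xml.etree.ElementTree as ET


def iter_text(element):
    """Yield every piece of text under `element` in document order.

    `.text` is the text before the first child; `.tail` is the text that
    follows an element's closing tag. Either may be None, so both are
    checked before yielding.
    """
    if element.text:
        yield element.text
    for child in element:
        yield from iter_text(child)
        if child.tail:
            yield child.tail


root = ET.fromstring("<p>Hello <b>bold</b> world</p>")
parts = list(iter_text(root))
```

Without the tail handling, the text after the closing `</b>` tag would be lost from the preview.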
							
						 
						
							2016-08-16 14:53:24 +01:00  
				
					
						
							
							
								 
						
							
							
								5bcccfde6c 
								
							
								 
							
						 
						
							
							
								
								Don't include html comments in description  
							
							
							
						 
						
							2016-08-05 14:45:11 +01:00  
				
					
						
							
							
								 
						
							
							
								b5525c76d1 
								
							
								 
							
						 
						
							
							
								
								Typo  
							
							
							
						 
						
							2016-08-04 16:10:08 +01:00  
				
					
						
							
							
								 
						
							
							
								e97648c4e2 
								
							
								 
							
						 
						
							
							
								
								Test summarization  
							
							
							
						 
						
							2016-08-04 16:09:09 +01:00  
				
					
						
							
							
								 
						
							
							
								58c9653c6b 
								
							
								 
							
						 
						
							
							
								
								Don't infer paragraphs from newlines  
							
							
							
						 
						
							2016-08-02 18:50:24 +01:00  
				
					
						
							
							
								 
						
							
							
								6b58ade2f0 
								
							
								 
							
						 
						
							
							
								
								Comment on why we clone  
							
							
							
						 
						
							2016-08-02 18:41:22 +01:00  
				
					
						
							
							
								 
						
							
							
								9e66c58ceb 
								
							
								 
							
						 
						
							
							
								
								Spelling.  
							
							
							
						 
						
							2016-08-02 18:37:31 +01:00  
				
					
						
							
							
								 
						
							
							
								f83f5fbce8 
								
							
								 
							
						 
						
							
							
								
								Make it actually compile  
							
							
							
						 
						
							2016-08-02 18:32:42 +01:00  
				
					
						
							
							
								 
						
							
							
								aecaec3e10 
								
							
								 
							
						 
						
							
							
								
								Change the way we summarize URLs  
							
							
							
							
							Using XPath is slow on some machines (for unknown reasons), so use a
different approach to get a list of text nodes.
Try to generate a summary that respects paragraph and then word
boundaries, adding ellipses when appropriate. 
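
One way the summarization could work, as a sketch under assumptions: `summarize`, its `max_size` parameter and the exact cut-off rules are illustrative and are not taken from the commit.

```python
def summarize(paragraphs, max_size=500):
    """Join whole paragraphs while they fit within `max_size` characters.

    If even the first paragraph is too long, cut it at a word boundary
    and append an ellipsis. Otherwise stop at the last paragraph that
    fits completely.
    """
    summary = ""
    for para in paragraphs:
        para = " ".join(para.split())  # collapse runs of whitespace
        if not para:
            continue
        candidate = para if not summary else summary + "\n\n" + para
        if len(candidate) <= max_size:
            summary = candidate
            continue
        if summary:
            # Stop on a paragraph boundary.
            return summary
        # First paragraph is too long: fall back to a word boundary,
        # leaving one character of room for the ellipsis.
        cut = ""
        for word in para.split(" "):
            longer = word if not cut else cut + " " + word
            if len(longer) + 1 > max_size:
                break
            cut = longer
        return cut + "…"
    return summary
```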
							
						 
						
							2016-08-02 18:25:53 +01:00  
				
					
						
							
							
								 
						
							
							
								f52cb4cd78 
								
							
								 
							
						 
						
							
							
								
								Remove race  
							
							
							
						 
						
							2016-06-29 15:24:50 +01:00  
				
					
						
							
							
								 
						
							
							
								a70688445d 
								
							
								 
							
						 
						
							
							
								
								Implement purge_media_cache admin API  
							
							
							
						 
						
							2016-06-29 14:57:59 +01:00  
				
					
						
							
							
								 
						
							
							
								314b146b2e 
								
							
								 
							
						 
						
							
							
								
								Track approximate last access time for remote media  
							
							
							
						 
						
							2016-06-29 11:41:20 +01:00  
				
					
						
							
							
								 
						
							
							
								13e334506c 
								
							
								 
							
						 
						
							
							
								
								Remove the legacy v0 content upload API.  
							
							
							
							
							The existing content can still be downloaded. The last upload to the
matrix.org server was in January 2015, so it is probably safe to remove
the upload API. 
							
						 
						
							2016-06-21 11:47:39 +01:00  
				
					
						
							
							
								 
						
							
							
								09a17f965c 
								
							
								 
							
						 
						
							
							
								
								Line lengths  
							
							
							
						 
						
							2016-06-15 16:58:12 +01:00  
				
					
						
							
							
								 
						
							
							
								1e9026e484 
								
							
								 
							
						 
						
							
							
								
								Handle floats as img widths  
							
							
							
						 
						
							2016-06-15 16:58:05 +01:00  
				
					
						
							
							
								 
						
							
							
								a60169ea09 
								
							
								 
							
						 
						
							
							
								
								Handle og props with no content  
							
							
							
						 
						
							2016-06-15 16:57:48 +01:00  
				
					
						
							
							
								 
						
							
							
								eba4ff1bcb 
								
							
								 
							
						 
						
							
							
								
								502 on /thumbnail when can't contact remote server  
							
							
							
						 
						
							2016-06-09 11:29:43 +01:00  
				
					
						
							
							
								 
						
							
							
								eb79110beb 
								
							
								 
							
						 
						
							
							
								
								Clean up the blacklist/whitelist handling.  
							
							
							
							
							Always set the config key with an empty list, even if a list isn't specified.
This means that the codepaths are the same for both the empty list and
for a missing key. Since the behaviour is the same for both cases this
makes the code somewhat easier to reason about. 
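
A sketch of the normalization described above. The helper name and the config key used in the usage note are hypothetical.

```python
def load_list_option(raw_config, key):
    """Return the list stored under `key`, or an empty list.

    A missing key, an explicit None and an empty list all come back as
    [], so callers only ever deal with one shape of value.
    """
    value = raw_config.get(key)
    return list(value) if value else []
```

With this in place, a caller can write `for entry in load_list_option(config, "example_whitelist")` without first checking whether the key exists.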
							
						 
						
							2016-05-16 13:03:59 +01:00  
				
					
						
							
							
								 
						
							
							
								8d7ad44331 
								
							
								 
							
						 
						
							
							
								
								Report per request metrics for all of the things using request_handler  
							
							
							
						 
						
							2016-04-28 10:57:49 +01:00  
				
					
						
							
							
								 
						
							
							
								e8884e5e9c 
								
							
								 
							
						 
						
							
							
								
								Add self.media_repo to PreviewUrlResource  
							
							
							
						 
						
							2016-04-19 14:51:34 +01:00  
				
					
						
							
							
								 
						
							
							
								a7001c311b 
								
							
								 
							
						 
						
							
							
								
								_make_dirs was moved to MediaRepository  
							
							
							
						 
						
							2016-04-19 14:49:31 +01:00  
				
					
						
							
							
								 
						
							
							
								9181e2f4c7 
								
							
								 
							
						 
						
							
							
								
								Add store to PreviewUrlResource  
							
							
							
						 
						
							2016-04-19 14:48:24 +01:00  
				
					
						
							
							
								 
						
							
							
								fb76a81ff7 
								
							
								 
							
						 
						
							
							
								
								Reorder imports  
							
							
							
						 
						
							2016-04-19 14:45:05 +01:00  
				
					
						
							
							
								 
						
							
							
								0c93df89b6 
								
							
								 
							
						 
						
							
							
								
								Move MediaRepository to media_repository module  
							
							
							
						 
						
							2016-04-19 11:31:43 +01:00  
				
					
						
							
							
								 
						
							
							
								43f0941e8f 
								
							
								 
							
						 
						
							
							
								
								Split out BaseMediaResource into MediaRepository  
							
							
							
							
							This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.
In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time. 
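
The guard can be sketched as follows. This uses `asyncio` purely for illustration, which is an assumption and not necessarily the concurrency framework of the project; the `fetch` callback and method names are also hypothetical.

```python
import asyncio


class MediaRepository:
    """Shared by every resource, so in-flight downloads are tracked once."""

    def __init__(self, fetch):
        self._fetch = fetch      # coroutine function: key -> bytes
        self._in_flight = {}     # key -> task for the running download

    async def get_remote_media(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First caller starts the download; later callers reuse it.
            task = asyncio.ensure_future(self._download(key))
            self._in_flight[key] = task
        return await task

    async def _download(self, key):
        try:
            return await self._fetch(key)
        finally:
            # Allow a fresh download once this one has finished.
            self._in_flight.pop(key, None)
```

If each resource kept its own copy of this state, a thumbnail request and a download request for the same remote file could not see each other's in-flight task.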
							
						 
						
							2016-04-19 11:24:59 +01:00  
				
					
						
							
							
								 
						
							
							
								aaabbd3e9e 
								
							
								 
							
						 
						
							
							
								
								explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better  
							
							
							
						 
						
							2016-04-15 14:32:25 +01:00  
				
					
						
							
							
								 
						
							
							
								84f9cac4d0 
								
							
								 
							
						 
						
							
							
								
								fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong  
							
							
							
						 
						
							2016-04-15 13:20:08 +01:00  
				
					
						
							
							
								 
						
							
							
								f78b479118 
								
							
								 
							
						 
						
							
							
								
								fix urlparse import thinko breaking tiny URLs  
							
							
							
						 
						
							2016-04-14 15:23:55 +01:00  
				
					
						
							
							
								 
						
							
							
								bd77216d06 
								
							
								 
							
						 
						
							
							
								
								comment out  2c838f6459 due to risk of  https://en.wikipedia.org/wiki/Billion_laughs  attacks - thanks @torhve  
							
							
							
						 
						
							2016-04-14 14:39:24 +01:00  
				
					
						
							
							
								 
						
							
							
								d0633e6dbe 
								
							
								 
							
						 
						
							
							
								
								Sanitize the optional dependencies for spider API  
							
							
							
						 
						
							2016-04-13 13:38:09 +01:00  
				
					
						
							
							
								 
						
							
							
								17515bae14 
								
							
								 
							
						 
						
							
							
								
								PEP8  
							
							
							
						 
						
							2016-04-11 11:02:50 +01:00  
				
					
						
							
							
								 
						
							
							
								5ffacc5e84 
								
							
								 
							
						 
						
							
							
								
								fix typos and needless try/except from PR review  
							
							
							
						 
						
							2016-04-11 10:39:16 +01:00  
				
					
						
							
							
								 
						
							
							
								83b2f83da0 
								
							
								 
							
						 
						
							
							
								
								actually throw meaningful errors  
							
							
							
						 
						
							2016-04-08 21:36:59 +01:00  
				
					
						
							
							
								 
						
							
							
								b36270b5e1 
								
							
								 
							
						 
						
							
							
								
								Fix pep8 warning  
							
							
							
						 
						
							2016-04-08 19:52:23 +01:00  
				
					
						
							
							
								 
						
							
							
								1ccabe2965 
								
							
								 
							
						 
						
							
							
								
								more PR feedback  
							
							
							
						 
						
							2016-04-08 18:58:08 +01:00  
				
					
						
							
							
								 
						
							
							
								dafef5a688 
								
							
								 
							
						 
						
							
							
								
								Add url_preview_enabled config option to turn on/off preview_url endpoint. defaults to off.  
							
							
							
							
							Add url_preview_ip_range_blacklist to let admins specify internal IP ranges that must not be spidered.
Add url_preview_url_blacklist to let admins specify URL patterns that must not be spidered.
Implement a custom SpiderEndpoint and associated support classes to implement url_preview_ip_range_blacklist.
Add commentary and generally address PR feedback. 
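
A sketch of the kind of check that url_preview_ip_range_blacklist implies, using the standard `ipaddress` module. The function name and the example ranges are assumptions; the real check lives in the SpiderEndpoint classes mentioned above and may differ.

```python
import ipaddress


def is_blacklisted(ip, blacklist):
    """Return True if `ip` falls inside any CIDR range in `blacklist`."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in blacklist)


# Illustrative ranges an admin might list to protect internal services.
example_blacklist = ["127.0.0.0/8", "10.0.0.0/8", "192.168.0.0/16"]
```

The check has to run against the resolved address rather than the hostname in the URL, since a public hostname can resolve to an internal IP.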
							
						 
						
							2016-04-08 18:37:15 +01:00  
				
					
						
							
							
								 
						
							
							
								cf51c4120e 
								
							
								 
							
						 
						
							
							
								
								report image size (bytewise) in OG meta  
							
							
							
						 
						
							2016-04-03 23:57:05 +01:00  
				
					
						
							
							
								 
						
							
							
								0834b152fb 
								
							
								 
							
						 
						
							
							
								
								char encoding  
							
							
							
						 
						
							2016-04-03 12:59:27 +01:00  
				
					
						
							
							
								 
						
							
							
								8b98a7e8c3 
								
							
								 
							
						 
						
							
							
								
								pep8  
							
							
							
						 
						
							2016-04-03 12:56:29 +01:00  
				
					
						
							
							
								 
						
							
							
								eab4d462f8 
								
							
								 
							
						 
						
							
							
								
								fix etag typing error. fix timestamp typing error  
							
							
							
						 
						
							2016-04-03 02:02:46 +01:00  
				
					
						
							
							
								 
						
							
							
								c3916462f6 
								
							
								 
							
						 
						
							
							
								
								rebase all image URLs  
							
							
							
						 
						
							2016-04-03 01:33:12 +01:00  
				
					
						
							
							
								 
						
							
							
								110780b18b 
								
							
								 
							
						 
						
							
							
								
								remove stale todo  
							
							
							
						 
						
							2016-04-03 00:48:31 +01:00  
				
					
						
							
							
								 
						
							
							
								b09e29a03c 
								
							
								 
							
						 
						
							
							
								
								Ensure only one download for a given URL is active at a time  
							
							
							
						 
						
							2016-04-03 00:47:40 +01:00  
				
					
						
							
							
								 
						
							
							
								7426c86eb8 
								
							
								 
							
						 
						
							
							
								
								add a persistent cache of URL lookups, and fix up the in-memory one to work  
							
							
							
						 
						
							2016-04-03 00:31:57 +01:00  
				
					
						
							
							
								 
						
							
							
								d1b154a10f 
								
							
								 
							
						 
						
							
							
								
								support gzip compression, and don't pass through error msgs  
							
							
							
						 
						
							2016-04-02 03:06:39 +01:00