d51b8a1674 
								
							
								 
							
						 
						
							
							
								
								Add quotes and be explicit about script-src  
							
							
							
						 
						
							2016-09-05 17:35:01 +01:00  
				
					
						
							
							
								 
						
							
							
								662b031a30 
								
							
								 
							
						 
						
							
							
								
								Allow PDF to be rendered from media repo  
							
							
							
						 
						
							2016-09-05 17:25:26 +01:00  
				
					
						
							
							
								 
						
							
							
								0af9e1a637 
								
							
								 
							
						 
						
							
							
								
								Set `Content-Security-Policy` on media repo  
							
							
							
							
							This is to inform browsers that they should sandbox the returned
media. This is particularly crucial for JavaScript/HTML files. 
							
						 
						
							2016-08-17 16:27:39 +01:00  
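
A minimal sketch of the idea in the commit above: build a `Content-Security-Policy` header value that asks the browser to sandbox the response. The helper name `build_csp` and the exact directive list are assumptions for illustration; the log does not show which directives the commit actually sets.

```python
def build_csp(directives):
    """Join (name, values) pairs into a Content-Security-Policy header value."""
    parts = []
    for name, values in directives:
        # A directive is its name followed by space-separated values.
        parts.append(" ".join([name] + list(values)))
    return "; ".join(parts)


# Hypothetical policy for served media: sandbox the content and deny all
# sources by default. Keyword values such as 'none' must keep their quotes.
MEDIA_CSP = build_csp([
    ("sandbox", []),
    ("default-src", ["'none'"]),
    ("script-src", ["'none'"]),
])

# The value would be attached to each media response under this header name.
MEDIA_HEADERS = {"Content-Security-Policy": MEDIA_CSP}
```

Spelling out `script-src` alongside `default-src` is redundant for conforming browsers, but it makes the intent explicit.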
				
					
						
							
							
								 
						
							
							
								f90b3d83a3 
								
							
								 
							
						 
						
							
							
								
								Add None check to _iterate_over_text  
							
							
							
						 
						
							2016-08-17 15:17:17 +01:00  
				
					
						
							
							
								 
						
							
							
								109a560905 
								
							
								 
							
						 
						
							
							
								
								Flake8  
							
							
							
						 
						
							2016-08-16 14:57:21 +01:00  
				
					
						
							
							
								 
						
							
							
								48b5829aea 
								
							
								 
							
						 
						
							
							
								
								Fix up preview URL API. Add tests.  
							
							
							
							
							This includes:
- Splitting out methods of a class into stand alone functions, to make
  them easier to test.
- Adding unit tests to split out functions, testing HTML -> preview.
- Handle the fact that elements in lxml may have tail text. 
							
						 
						
							2016-08-16 14:53:24 +01:00  
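
The tail-text point in the commit above can be illustrated with a short sketch. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml, since both expose `.text` (text before the first child) and `.tail` (text after an element's closing tag). The function `iterate_over_text` is a hypothetical helper written for this example, not the project's implementation.

```python
import xml.etree.ElementTree as ET


def iterate_over_text(element):
    """Yield the text fragments of an element tree in document order."""
    # .text may be None when the element starts with a child or is empty.
    if element.text:
        yield element.text
    for child in element:
        yield from iterate_over_text(child)
        # Text that follows the child's closing tag lives on the child,
        # not on the parent, so it is easy to miss.
        if child.tail:
            yield child.tail


root = ET.fromstring("<p>Hello <b>bold</b> world</p>")
fragments = list(iterate_over_text(root))
```

Here the trailing " world" is only recovered because the tail of `<b>` is read; looking at `.text` alone would drop it.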
				
					
						
							
							
								 
						
							
							
								5bcccfde6c 
								
							
								 
							
						 
						
							
							
								
								Don't include html comments in description  
							
							
							
						 
						
							2016-08-05 14:45:11 +01:00  
				
					
						
							
							
								 
						
							
							
								b5525c76d1 
								
							
								 
							
						 
						
							
							
								
								Typo  
							
							
							
						 
						
							2016-08-04 16:10:08 +01:00  
				
					
						
							
							
								 
						
							
							
								e97648c4e2 
								
							
								 
							
						 
						
							
							
								
								Test summarization  
							
							
							
						 
						
							2016-08-04 16:09:09 +01:00  
				
					
						
							
							
								 
						
							
							
								58c9653c6b 
								
							
								 
							
						 
						
							
							
								
								Don't infer paragraphs from newlines  
							
							
							
						 
						
							2016-08-02 18:50:24 +01:00  
				
					
						
							
							
								 
						
							
							
								6b58ade2f0 
								
							
								 
							
						 
						
							
							
								
								Comment on why we clone  
							
							
							
						 
						
							2016-08-02 18:41:22 +01:00  
				
					
						
							
							
								 
						
							
							
								9e66c58ceb 
								
							
								 
							
						 
						
							
							
								
								Spelling.  
							
							
							
						 
						
							2016-08-02 18:37:31 +01:00  
				
					
						
							
							
								 
						
							
							
								f83f5fbce8 
								
							
								 
							
						 
						
							
							
								
								Make it actually compile  
							
							
							
						 
						
							2016-08-02 18:32:42 +01:00  
				
					
						
							
							
								 
						
							
							
								aecaec3e10 
								
							
								 
							
						 
						
							
							
								
								Change the way we summarize URLs  
							
							
							
							
							Using XPath is slow on some machines (for unknown reasons), so use a
different approach to get a list of text nodes.
Try to generate a summary that respects paragraph and then word
boundaries, adding ellipses when appropriate. 
							
						 
						
							2016-08-02 18:25:53 +01:00  
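
One way to read the summarization rule in the commit above, as a sketch under assumptions: take whole paragraphs while they fit, fall back to a word boundary when the first paragraph alone is too long, and append an ellipsis whenever text was cut. The function name `summarize`, its `max_size` parameter, and the default of 50 characters are hypothetical.

```python
def summarize(paragraphs, max_size=50):
    """Build a summary of at most max_size characters (plus an ellipsis)."""
    description = ""
    for para in paragraphs:
        para = para.strip()
        if not para:
            continue
        candidate = para if not description else description + "\n\n" + para
        if len(candidate) <= max_size:
            # The whole paragraph fits, so keep it intact.
            description = candidate
            continue
        if description:
            # Stop at the paragraph boundary and mark that more text follows.
            return description + "…"
        # The first paragraph is too long: cut at a word boundary instead.
        cut = ""
        for word in para.split():
            longer = word if not cut else cut + " " + word
            if len(longer) > max_size:
                break
            cut = longer
        # A single over-long word falls back to a hard cut.
        return (cut or para[:max_size]) + "…"
    return description
```

For example, `summarize(["one two three four"], max_size=10)` stops after "two" rather than splitting "three".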
				
					
						
							
							
								 
						
							
							
								f52cb4cd78 
								
							
								 
							
						 
						
							
							
								
								Remove race  
							
							
							
						 
						
							2016-06-29 15:24:50 +01:00  
				
					
						
							
							
								 
						
							
							
								a70688445d 
								
							
								 
							
						 
						
							
							
								
								Implement purge_media_cache admin API  
							
							
							
						 
						
							2016-06-29 14:57:59 +01:00  
				
					
						
							
							
								 
						
							
							
								314b146b2e 
								
							
								 
							
						 
						
							
							
								
								Track approximate last access time for remote media  
							
							
							
						 
						
							2016-06-29 11:41:20 +01:00  
				
					
						
							
							
								 
						
							
							
								09a17f965c 
								
							
								 
							
						 
						
							
							
								
								Line lengths  
							
							
							
						 
						
							2016-06-15 16:58:12 +01:00  
				
					
						
							
							
								 
						
							
							
								1e9026e484 
								
							
								 
							
						 
						
							
							
								
								Handle floats as img widths  
							
							
							
						 
						
							2016-06-15 16:58:05 +01:00  
				
					
						
							
							
								 
						
							
							
								a60169ea09 
								
							
								 
							
						 
						
							
							
								
								Handle og props with no content  
							
							
							
						 
						
							2016-06-15 16:57:48 +01:00  
				
					
						
							
							
								 
						
							
							
								eba4ff1bcb 
								
							
								 
							
						 
						
							
							
								
								502 on /thumbnail when can't contact remote server  
							
							
							
						 
						
							2016-06-09 11:29:43 +01:00  
				
					
						
							
							
								 
						
							
							
								eb79110beb 
								
							
								 
							
						 
						
							
							
								
								Clean up the blacklist/whitelist handling.  
							
							
							
							
							Always set the config key with an empty list, even if a list isn't specified.
This means that the codepaths are the same for both the empty list and
for a missing key. Since the behaviour is the same for both cases this
makes the code somewhat easier to reason about. 
							
						 
						
							2016-05-16 13:03:59 +01:00  
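
The normalization described in the commit above might look like the following sketch. The key names `url_preview_ip_range_blacklist` and `url_preview_url_blacklist` appear elsewhere in this log; `url_preview_ip_range_whitelist` and the function `read_preview_lists` are assumptions made for the example.

```python
LIST_KEYS = (
    "url_preview_ip_range_blacklist",
    "url_preview_ip_range_whitelist",  # assumed key name
    "url_preview_url_blacklist",
)


def read_preview_lists(config):
    """Return every list option as a list, treating missing keys as empty."""
    # `or []` covers both an absent key and an explicit null in the config,
    # so later code never has to distinguish "missing" from "empty".
    return {key: list(config.get(key) or []) for key in LIST_KEYS}
```

Callers can then iterate over each option unconditionally instead of checking for `None` first.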
				
					
						
							
							
								 
						
							
							
								8d7ad44331 
								
							
								 
							
						 
						
							
							
								
								Report per request metrics for all of the things using request_handler  
							
							
							
						 
						
							2016-04-28 10:57:49 +01:00  
				
					
						
							
							
								 
						
							
							
								e8884e5e9c 
								
							
								 
							
						 
						
							
							
								
								Add self.media_repo to PreviewUrlResource  
							
							
							
						 
						
							2016-04-19 14:51:34 +01:00  
				
					
						
							
							
								 
						
							
							
								a7001c311b 
								
							
								 
							
						 
						
							
							
								
								_make_dirs was moved to MediaRepository  
							
							
							
						 
						
							2016-04-19 14:49:31 +01:00  
				
					
						
							
							
								 
						
							
							
								9181e2f4c7 
								
							
								 
							
						 
						
							
							
								
								Add store to PreviewUrlResource  
							
							
							
						 
						
							2016-04-19 14:48:24 +01:00  
				
					
						
							
							
								 
						
							
							
								fb76a81ff7 
								
							
								 
							
						 
						
							
							
								
								Reorder imports  
							
							
							
						 
						
							2016-04-19 14:45:05 +01:00  
				
					
						
							
							
								 
						
							
							
								0c93df89b6 
								
							
								 
							
						 
						
							
							
								
								Move MediaRepository to media_repository module  
							
							
							
						 
						
							2016-04-19 11:31:43 +01:00  
				
					
						
							
							
								 
						
							
							
								43f0941e8f 
								
							
								 
							
						 
						
							
							
								
								Split out BaseMediaResource into MediaRepository  
							
							
							
							
							This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.
In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time. 
							
						 
						
							2016-04-19 11:24:59 +01:00  
				
					
						
							
							
								 
						
							
							
								aaabbd3e9e 
								
							
								 
							
						 
						
							
							
								
								explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better  
							
							
							
						 
						
							2016-04-15 14:32:25 +01:00  
				
					
						
							
							
								 
						
							
							
								84f9cac4d0 
								
							
								 
							
						 
						
							
							
								
								fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong  
							
							
							
						 
						
							2016-04-15 13:20:08 +01:00  
				
					
						
							
							
								 
						
							
							
								f78b479118 
								
							
								 
							
						 
						
							
							
								
								fix urlparse import thinko breaking tiny URLs  
							
							
							
						 
						
							2016-04-14 15:23:55 +01:00  
				
					
						
							
							
								 
						
							
							
								bd77216d06 
								
							
								 
							
						 
						
							
							
								
								comment out 2c838f6459 due to risk of https://en.wikipedia.org/wiki/Billion_laughs attacks - thanks @torhve  
							
							
							
						 
						
							2016-04-14 14:39:24 +01:00  
				
					
						
							
							
								 
						
							
							
								d0633e6dbe 
								
							
								 
							
						 
						
							
							
								
								Sanitize the optional dependencies for spider API  
							
							
							
						 
						
							2016-04-13 13:38:09 +01:00  
				
					
						
							
							
								 
						
							
							
								17515bae14 
								
							
								 
							
						 
						
							
							
								
								PEP8  
							
							
							
						 
						
							2016-04-11 11:02:50 +01:00  
				
					
						
							
							
								 
						
							
							
								5ffacc5e84 
								
							
								 
							
						 
						
							
							
								
								fix typos and needless try/except from PR review  
							
							
							
						 
						
							2016-04-11 10:39:16 +01:00  
				
					
						
							
							
								 
						
							
							
								83b2f83da0 
								
							
								 
							
						 
						
							
							
								
								actually throw meaningful errors  
							
							
							
						 
						
							2016-04-08 21:36:59 +01:00  
				
					
						
							
							
								 
						
							
							
								b36270b5e1 
								
							
								 
							
						 
						
							
							
								
								Fix pep8 warning  
							
							
							
						 
						
							2016-04-08 19:52:23 +01:00  
				
					
						
							
							
								 
						
							
							
								1ccabe2965 
								
							
								 
							
						 
						
							
							
								
								more PR feedback  
							
							
							
						 
						
							2016-04-08 18:58:08 +01:00  
				
					
						
							
							
								 
						
							
							
								dafef5a688 
								
							
								 
							
						 
						
							
							
								
								Add url_preview_enabled config option to turn on/off preview_url endpoint. defaults to off.  
							
							
							
							
							Add url_preview_ip_range_blacklist to let admins specify internal IP ranges that must not be spidered.
Add url_preview_url_blacklist to let admins specify URL patterns that must not be spidered.
Implement a custom SpiderEndpoint and associated support classes to implement url_preview_ip_range_blacklist
Add commentary and generally address PR feedback 
							
						 
						
							2016-04-08 18:37:15 +01:00  
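
A sketch of the kind of check that `url_preview_ip_range_blacklist` implies: refuse to spider an address that falls inside any configured range. It uses the standard library's `ipaddress` module as a stand-in; the function `is_ip_blacklisted` is hypothetical, and the log does not show how the commit's custom endpoint performs the check.

```python
import ipaddress


def is_ip_blacklisted(ip, blacklist_ranges):
    """Return True if ip lies inside any CIDR range in blacklist_ranges."""
    address = ipaddress.ip_address(ip)
    networks = [ipaddress.ip_network(cidr) for cidr in blacklist_ranges]
    return any(address in network for network in networks)


# Example ranges an admin might list to keep the spider off internal hosts.
INTERNAL_RANGES = ["10.0.0.0/8", "192.168.0.0/16", "127.0.0.0/8"]
```

The check has to run against the resolved IP address rather than the hostname, otherwise a public DNS name pointing at an internal address would slip through.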
				
					
						
							
							
								 
						
							
							
								cf51c4120e 
								
							
								 
							
						 
						
							
							
								
								report image size (bytewise) in OG meta  
							
							
							
						 
						
							2016-04-03 23:57:05 +01:00  
				
					
						
							
							
								 
						
							
							
								0834b152fb 
								
							
								 
							
						 
						
							
							
								
								char encoding  
							
							
							
						 
						
							2016-04-03 12:59:27 +01:00  
				
					
						
							
							
								 
						
							
							
								8b98a7e8c3 
								
							
								 
							
						 
						
							
							
								
								pep8  
							
							
							
						 
						
							2016-04-03 12:56:29 +01:00  
				
					
						
							
							
								 
						
							
							
								eab4d462f8 
								
							
								 
							
						 
						
							
							
								
								fix etag typing error. fix timestamp typing error  
							
							
							
						 
						
							2016-04-03 02:02:46 +01:00  
				
					
						
							
							
								 
						
							
							
								c3916462f6 
								
							
								 
							
						 
						
							
							
								
								rebase all image URLs  
							
							
							
						 
						
							2016-04-03 01:33:12 +01:00  
				
					
						
							
							
								 
						
							
							
								110780b18b 
								
							
								 
							
						 
						
							
							
								
								remove stale todo  
							
							
							
						 
						
							2016-04-03 00:48:31 +01:00  
				
					
						
							
							
								 
						
							
							
								b09e29a03c 
								
							
								 
							
						 
						
							
							
								
								Ensure only one download for a given URL is active at a time  
							
							
							
						 
						
							2016-04-03 00:47:40 +01:00  
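
The "one download per URL" guard in the commit above can be sketched as a map of in-flight requests keyed by URL. This example uses `asyncio` purely for illustration and is not the project's implementation; the class `DownloadDeduplicator` and its `fetch` callback are hypothetical names.

```python
import asyncio


class DownloadDeduplicator:
    """Share a single in-flight download among callers asking for one URL."""

    def __init__(self, fetch):
        self._fetch = fetch    # coroutine function: url -> result
        self._in_flight = {}   # url -> running task

    async def get(self, url):
        task = self._in_flight.get(url)
        if task is None:
            # First caller starts the download; later callers reuse the task.
            task = asyncio.ensure_future(self._run(url))
            self._in_flight[url] = task
        return await task

    async def _run(self, url):
        try:
            return await self._fetch(url)
        finally:
            # Forget the entry so a later request triggers a fresh download.
            self._in_flight.pop(url, None)
```

Concurrent callers for the same URL all receive the result of the first download, and the entry is dropped once it completes or fails.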
				
					
						
							
							
								 
						
							
							
								7426c86eb8 
								
							
								 
							
						 
						
							
							
								
								add a persistent cache of URL lookups, and fix up the in-memory one to work  
							
							
							
						 
						
							2016-04-03 00:31:57 +01:00  
				
					
						
							
							
								 
						
							
							
								d1b154a10f 
								
							
								 
							
						 
						
							
							
								
								support gzip compression, and don't pass through error msgs  
							
							
							
						 
						
							2016-04-02 03:06:39 +01:00  
				
					
						
							
							
								 
						
							
							
								9377157961 
								
							
								 
							
						 
						
							
							
								
								how was _respond_default_thumbnail ever meant to work?  
							
							
							
						 
						
							2016-04-02 02:31:45 +01:00