Fix PostgreSQL sometimes using table scans for `event_search` (#14409)
PostgreSQL may underestimate the number of distinct `room_id`s in `event_search`, which can cause it to use table scans for queries for multiple rooms. Fix this by setting `n_distinct` on the column. Resolves #14402. Signed-off-by: Sean Quah <seanq@matrix.org>pull/14410/head
							parent
							
								
									d10a85ec9e
								
							
						
					
					
						commit
						b2c2b03079
					
				|  | @ -0,0 +1 @@ | |||
| Fix PostgreSQL sometimes using table scans for queries against the `event_search` table, taking a long time and a large amount of IO. | ||||
|  | @ -0,0 +1,33 @@ | |||
| /* Copyright 2022 The Matrix.org Foundation C.I.C | ||||
|  * | ||||
|  * Licensed under the Apache License, Version 2.0 (the "License"); | ||||
|  * you may not use this file except in compliance with the License. | ||||
|  * You may obtain a copy of the License at | ||||
|  * | ||||
|  *    http://www.apache.org/licenses/LICENSE-2.0 | ||||
|  * | ||||
|  * Unless required by applicable law or agreed to in writing, software | ||||
|  * distributed under the License is distributed on an "AS IS" BASIS, | ||||
|  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||
|  * See the License for the specific language governing permissions and | ||||
|  * limitations under the License. | ||||
|  */ | ||||
| 
 | ||||
| 
 | ||||
| -- By default the postgres statistics collector massively underestimates the | ||||
| -- number of distinct rooms in `event_search`, which can cause postgres to use | ||||
| -- table scans for queries for multiple rooms. | ||||
| -- | ||||
| -- To work around this we can manually tell postgres the number of distinct rooms | ||||
| -- by setting `n_distinct` (a negative value here is the number of distinct values | ||||
| -- divided by the number of rows, so -0.01 means on average there are 100 rows per | ||||
| -- distinct value). We don't need a particularly accurate number here, as a) we just | ||||
| -- want it to always use index scans and b) our estimate is going to be better than the | ||||
| -- one made by the statistics collector. | ||||
| 
 | ||||
| ALTER TABLE event_search ALTER COLUMN room_id SET (n_distinct = -0.01); | ||||
| 
 | ||||
| -- Ideally we'd do an `ANALYZE event_search (room_id)` here so that | ||||
| -- the above gets picked up immediately, but that can take a bit of time so we | ||||
| -- rely on the autovacuum eventually getting run and doing that in the | ||||
| -- background for us. | ||||
		Loading…
	
		Reference in New Issue
	
	 Sean Quah
						Sean Quah