|
View:
New views
1 Messages
—
Rating Filter:
Alert me
|
| < Prev | 1 - 2 | Next > |
|
|
Re: Text search performance and postgresqlI should clarify that compression of individual revision text is only
one of several DB-opaque ways text may be stored. -- brion vibber (brion @ wikimedia.org) On Jul 30, 2008, at 1:36, Marc Cousin <mcousin@...> wrote: > Thanks for your aswers. > > Here's what I see from my (narrow) understanding of how mediawiki > works. > > - I already tried to change the parameters you mention, but as they > were > correct for my (pretty old) machine, this brought me even worse > results (this > was expected). The bad plans come from the fact we have a lot of > versions of > page revisions indexed in the page table. > > Anyhow, I find it strange that we use the pagecontent table to store > those > full text vectors when they are in fact the full text vectors for a > page. > Wouldn't this textvector be better placed either in the page table > or in a > searchindex with a 1:1 relationship like in mysql ? That would be more > logical to the planner : how could it guess that the records from the > pagecontent table that are indexed are indeed the latest revisions > from the > page table ? Whereas if the vector is in the page table, the query > becomes a > more classical join : we find the records matching the full text > criterion > from page table, then join to revision and pagecontent to retrieve the > associated text. > > - all revisions of all pages were indexed in the text table. Maybe > because > we're still using 1.9.1 ? (there is no update in SearchPostgres.php > here) > > > If I understand correctly what you explain below, the content of > text is > not 'opacified' for postgresql, as there is no point in doing that > because of > toast compression, so working with triggers is the logical way to go > with > postgresql ? > > On Tuesday 29 July 2008 17:29:44 Greg Sabino Mullane wrote: >> Marc Cousin wrote: >>> The plan is this one >> >> [explain analyze plans] >> >> Those are pretty bad plans - you might benefit from some simple >> tuning. >> Try lowering random_page_cost and boosting effective_cache_size. Or >> come >> round to #postgresql on freenode, for more free tuning help than >> you can >> shake a stick at. >> >> The query you gave ran in 56 ms for me on what I'm guessing is >> a larger wiki (~ 400K rows in pagecontent) (and a pretty busy box). >> >> You should be seeing a Bitmap Index Scan on "ts2_page_text", >> >> If you find yourself using the same namespace restrictions a lot, >> you can >> create some custom indexes as well, e.g. >> >> CREATE INDEX page_index1 ON page(page_title) WHERE page_is_redirect >> = 0 >> AND page_namespace IN (0,9,11); >> >>> Is there a reason we index every version of every content ? >> >> We don't. Or at least, we were not intending to. You can grep for >> UPDATE >> in SearchPostgres.php. I just changed (r38184) the code to double- >> check we >> nullify *all* old revisions: this may explain part of what you were >> seeing >> before. You can check on individual pages and see if they are all >> null-but-latest like so: >> >> SELECT CASE WHEN textvector IS NULL THEN 1 ELSE 0 END AS isnull, >> COUNT(*) >> FROM pagecontent WHERE old_id IN (SELECT rev_text_id FROM revision >> WHERE >> rev_page = (SELECT page_id FROM page WHERE page_namespace = 0 AND >> page_title = 'Foobar' LIMIT 1)) GROUP BY 1; >> >> (mostly for the benefit of other people reading this thread, since >> you >> said you already manually updated the database) >> >> Brion Vibber wrote: >>> Hmm; a fundamental problem here is that text.old_text is opaque to >>> the >>> database -- it may contain compressed text, text in an alternate >>> encoding, or a reference to another table or database at the >>> application level. >> >> Right: $wgCompressRevisions is not recommended for a Postgres >> backend. >> Should probably document that somewhere. Postgres will compress >> automatically anyway. >> >> Not to say that we might not want to change the way things are >> done, but >> the easiest solution to the original poster's problem is to get the >> query >> optimized first, and figure out why the old revision are not being >> nulled. >> >> Platonides wrote: >>> The postgresql backend is not too well maintained, people wanting to >>> postgre appear from time to time and give it a kick. >>> I don't remember who is currently supposed to be taking care of >>> it, but >>> i interpret that as he hasn't commented on this, he doesn't >>> oppose ;) >> >> Wow, that's a bit harsh - not too well maintained? Hardly. :) >> Myself and >> others have put in a lot of work to get it working and keep it so, >> including paving the way for other database backends by discovering >> and >> addressing lots of mysqlisms in the code. I apologize for not >> replying: >> I somehow got recently unsubscribed. > > > > _______________________________________________ > MediaWiki-l mailing list > MediaWiki-l@... > https://lists.wikimedia.org/mailman/listinfo/mediawiki-l _______________________________________________ MediaWiki-l mailing list MediaWiki-l@... https://lists.wikimedia.org/mailman/listinfo/mediawiki-l |
| < Prev | 1 - 2 | Next > |
| Free Forum Powered by Nabble | Forum Help |