|
View:
New views
5 Messages
—
Rating Filter:
Alert me
|
|
|
query planner and scanning methodsHello,
I'm running into performance issues with various queries on a PostgreSQL database (of books). I'm having trouble understanding the thinking behind the query planner in this scenario: http://dpaste.com/hold/80101/ (also attached at bottom of email) Relation sizes: dimension_books: 1998766 rows dimension_library_books: 10397943 rows Version: PostgreSQL 8.3.3 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7) Why does the query planner change when adding OFFSET? Is there a way to force it to use the first plan? The second plan is relatively slower than the first. I've run ANALYZE recently and played around with different sets of indexes, but I believe my knowledge here is limited. count() is equally as slow (SELECT count(DISTINCT "dimension_book"."call")...). Eventually I want to paginate the results, kind of like the PostgreSQL Archive search: Results 1-20 of more than 1000. Searching in 706,529 pages took 0.13221 seconds. Result pages: 1 2 3 4 5 6 7 8 9 10 11 ... Next I assume it implements something something along these lines? Thanks, colin /****************************************************** Table "public.dimension_library_books" Column | Type | Modifiers ------------+--------- +---------------------------------------------------------------------- id | integer | not null default nextval('dimension_library_books_id_seq'::regclass) book_id | integer | not null library_id | integer | not null Indexes: "dimension_library_books_pkey" PRIMARY KEY, btree (id) "dimension_library_books_book_id" btree (book_id) "dimension_library_books_library_id" btree (library_id) Foreign-key constraints: "dimension_library_books_book_id_fkey" FOREIGN KEY (book_id) REFERENCES dimension_book(id) DEFERRABLE INITIALLY DEFERRED "dimension_library_books_library_id_fkey" FOREIGN KEY (library_id) REFERENCES dimension_library(id) DEFERRABLE INITIALLY DEFERRED Table "public.dimension_book" Column | Type | Modifiers ----------+------------------------ +------------------------------------------------------------- id | integer | not null default nextval('dimension_book_id_seq'::regclass) acno | character varying(255) | title | character varying(255) | allusage | double precision | dousage | double precision | comusage | double precision | year | integer | language | character varying(255) | bclass | character varying(255) | call | character varying(255) | not null Indexes: "dimension_book_pkey" PRIMARY KEY, btree (id) "call_idx" btree (call) ******************************************************/ dimension=# EXPLAIN ANALYZE SELECT DISTINCT ON ("dimension_book"."call") "dimension_book"."title" FROM "dimension_book" INNER JOIN "dimension_library_books" ON ("dimension_book"."id" = "dimension_library_books"."book_id") WHERE ("dimension_book"."call" >= 'PA0000' AND "dimension_library_books"."library_id" IN (12,15,20)) ORDER BY "dimension_book"."call" ASC LIMIT 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..19141.37 rows=10 width=105) (actual time=0.349..1.874 rows=10 loops=1) -> Unique (cost=0.00..15389657.66 rows=8040 width=105) (actual time=0.348..1.865 rows=10 loops=1) -> Nested Loop (cost=0.00..15389443.94 rows=85489 width=105) (actual time=0.344..1.832 rows=14 loops=1) -> Index Scan using call_idx on dimension_book (cost=0.00..311156.04 rows=806644 width=105) (actual time=0.118..0.452 rows=133 loops=1) Index Cond: ((call)::text >= 'PA0000'::text) -> Index Scan using dimension_library_books_book_id on dimension_library_books (cost=0.00..18.61 rows=7 width=4) (actual time=0.009..0.009 rows=0 loops=133) Index Cond: (dimension_library_books.book_id = dimension_book.id) Filter: (dimension_library_books.library_id = ANY ('{12,15,20}'::integer[])) Total runtime: 1.947 ms (9 rows) Time: 3.157 ms dimension=# EXPLAIN ANALYZE SELECT DISTINCT ON ("dimension_book"."call") "dimension_book"."title" FROM "dimension_book" INNER JOIN "dimension_library_books" ON ("dimension_book"."id" = "dimension_library_books"."book_id") WHERE ("dimension_book"."call" >= 'PA0000' AND "dimension_library_books"."library_id" IN (12,15,20)) ORDER BY "dimension_book"."call" ASC LIMIT 10 OFFSET 100; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=137122.20..137122.73 rows=10 width=105) (actual time=3428.164..3428.180 rows=10 loops=1) -> Unique (cost=137116.88..137544.33 rows=8040 width=105) (actual time=3427.981..3428.159 rows=110 loops=1) -> Sort (cost=137116.88..137330.60 rows=85489 width=105) (actual time=3427.978..3428.039 rows=212 loops=1) Sort Key: dimension_book.call Sort Method: quicksort Memory: 34844kB -> Hash Join (cost=71699.90..133790.78 rows=85489 width=105) (actual time=1676.993..2624.015 rows=167419 loops=1) Hash Cond: (dimension_library_books.book_id = dimension_book.id) -> Bitmap Heap Scan on dimension_library_books (cost=3951.25..63069.35 rows=211789 width=4) (actual time=112.627..581.554 rows=426156 loops=1) Recheck Cond: (library_id = ANY ('{12,15,20}'::integer[])) -> Bitmap Index Scan on dimension_library_books_library_id (cost=0.00..3898.30 rows=211789 width=0) (actual time=95.030..95.030 rows=426156 loops=1) Index Cond: (library_id = ANY ('{12,15,20}'::integer[])) -> Hash (cost=57665.60..57665.60 rows=806644 width=105) (actual time=1484.803..1484.803 rows=799876 loops=1) -> Seq Scan on dimension_book (cost=0.00..57665.60 rows=806644 width=105) (actual time=37.391..1028.518 rows=799876 loops=1) Filter: ((call)::text >= 'PA0000'::text) Total runtime: 3446.154 ms (15 rows) Time: 3447.396 ms -- Colin Copeland Caktus Consulting Group, LLC P.O. Box 1454 Carrboro, NC 27510 (919) 951-0052 http://www.caktusgroup.com -- Sent via pgsql-performance mailing list (pgsql-performance@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
|
|
Re: query planner and scanning methodsOn Tue, Sep 23, 2008 at 2:22 PM, Colin Copeland <copelco@...> wrote:
> dimension=# EXPLAIN ANALYZE > SELECT DISTINCT ON ("dimension_book"."call") > "dimension_book"."title" > FROM "dimension_book" > INNER JOIN "dimension_library_books" > ON ("dimension_book"."id" = "dimension_library_books"."book_id") > WHERE ("dimension_book"."call" >= 'PA0000' > AND "dimension_library_books"."library_id" IN (12,15,20)) > ORDER BY "dimension_book"."call" ASC > LIMIT 10 OFFSET 100; Ya offset works by scanning over the first 100 rows. When the offsets get big, it become a performance looser. You can guarantee a faster index scan if you recall the last 10th value from the previous query. Then remove the offset predicate and replace it with the following WHERE clause: WHERE ... AND dimension_book.call > _last_queried_10th_row-dimension_book_call, ... LIMIT 10; -- Regards, Richard Broersma Jr. Visit the Los Angeles PostgreSQL Users Group (LAPUG) http://pugs.postgresql.org/lapug -- Sent via pgsql-performance mailing list (pgsql-performance@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
|
|
Re: query planner and scanning methodsOn Sep 23, 2008, at 6:07 PM, Richard Broersma wrote: > On Tue, Sep 23, 2008 at 2:22 PM, Colin Copeland <copelco@... > > wrote: >> dimension=# EXPLAIN ANALYZE >> SELECT DISTINCT ON ("dimension_book"."call") >> "dimension_book"."title" >> FROM "dimension_book" >> INNER JOIN "dimension_library_books" >> ON ("dimension_book"."id" = >> "dimension_library_books"."book_id") >> WHERE ("dimension_book"."call" >= 'PA0000' >> AND "dimension_library_books"."library_id" IN (12,15,20)) >> ORDER BY "dimension_book"."call" ASC >> LIMIT 10 OFFSET 100; > > Ya offset works by scanning over the first 100 rows. When the offsets > get big, it become a performance looser. > > You can guarantee a faster index scan if you recall the last 10th > value from the previous query. Then remove the offset predicate and > replace it with the following WHERE clause: > > WHERE ... > AND dimension_book.call > _last_queried_10th_row-dimension_book_call, > ... > LIMIT 10; Richard, Yes, I was thinking about this too. How would one generate a list of pages from this, though? I can't predict values of dimension_book.call (it's not a serial number). Thanks, colin -- Colin Copeland Caktus Consulting Group, LLC P.O. Box 1454 Carrboro, NC 27510 (919) 951-0052 http://www.caktusgroup.com -- Sent via pgsql-performance mailing list (pgsql-performance@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
|
|
Re: query planner and scanning methodsOn Tue, Sep 23, 2008 at 3:25 PM, Colin Copeland <copelco@...> wrote:
>>> dimension=# EXPLAIN ANALYZE >>> SELECT DISTINCT ON ("dimension_book"."call") >>> "dimension_book"."title" >>> FROM "dimension_book" >>> INNER JOIN "dimension_library_books" >>> ON ("dimension_book"."id" = "dimension_library_books"."book_id") >>> WHERE ("dimension_book"."call" >= 'PA0000' >>> AND "dimension_library_books"."library_id" IN (12,15,20)) >>> ORDER BY "dimension_book"."call" ASC >>> LIMIT 10 OFFSET 100; > Yes, I was thinking about this too. How would one generate a list of pages > from this, though? I can't predict values of dimension_book.call (it's not a > serial number). I can think of one very ugly way to get the first record for each page. Hopefully, you will not need to generate these list pages very often. Also, you could probably refine the following query in a couple of ways to improve performance. SELECT A."dimension_book"."call", SUM( B."dimension_book"."call" ) AS OrderedRowNbr FROM ( your_above_query_without_the_limits ) AS A INNER JOIN ( your_above_query_without_the_limits ) AS B ON A."dimension_book"."call" >= B."dimension_book"."call" ORDER BY A."dimension_book"."call" HAVING SUM( A."dimension_book"."call" ) % 10 = 0; -- Regards, Richard Broersma Jr. Visit the Los Angeles PostgreSQL Users Group (LAPUG) http://pugs.postgresql.org/lapug -- Sent via pgsql-performance mailing list (pgsql-performance@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
|
|
Re: query planner and scanning methodsOn Tue, Sep 23, 2008 at 3:57 PM, Richard Broersma
<richard.broersma@...> wrote: > SELECT A."dimension_book"."call", SUM( B."dimension_book"."call" ) AS > OrderedRowNbr > FROM ( your_above_query_without_the_limits ) AS A > INNER JOIN ( your_above_query_without_the_limits ) AS B > ON A."dimension_book"."call" >= B."dimension_book"."call" > ORDER BY A."dimension_book"."call" > HAVING SUM( A."dimension_book"."call" ) % 10 = 0; Oops I just noticed that I used sum() where count() should be used and that I forgot to include the group by clause. Other than that, I hope the suggestion was at least halfway helpful. -- Regards, Richard Broersma Jr. Visit the Los Angeles PostgreSQL Users Group (LAPUG) http://pugs.postgresql.org/lapug -- Sent via pgsql-performance mailing list (pgsql-performance@...) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| Free Forum Powered by Nabble | Forum Help |