Query sorting and performance on LOM ressources

by Igor Barma

Hi all,
I am having some problems with a query on a database that currently stores about 15,000 documents.
My resources are LOM metadata documents.
My searches are often broad and return 2,000 to 5,000 documents.

First, what I am trying to do is something like this:

xquery version "1.0";
declare namespace lom="http://ltsc.ieee.org/xsd/LOM";

let $col := collection('/db/repositories')
let $XML := $col//lom:lom[(.&="searchTerm*")]

let $XMLT := local:transformXML($XML[position()>=0 and position()<=10])
let $res := <results nbResults="{count($XML)}">{$XMLT}</results>
return $res

My problem is that I need to know how many documents the query returns, so that my application can offer batch (page-by-page) navigation.
I don't know whether there is another way to do this without fetching all the results and then filtering on the position.
Any ideas?

My second problem is that I want to sort my results on a field, so I did something like this:

declare function local:search() as node()*{
let $col := collection('/db/repositories')

for $lom in $col//lom:lom[(.&="searchTerm*")]
  order by xs:date($lom/dateRessource)
return $lom
};

let $XML := local:search()
let $XMLT := local:transformXML($XML[position()>=0 and position()<=10])
let $res := <results nbResults="{count($XML)}">{$XMLT}</results>
return $res


For this kind of request, I think the processor handles all the results (nearly 5,000) and sorts them on the "dateRessource" element. This operation is very time-consuming.
Speaking of the sort, my sort expression originally looked like this:

order by xs:date($lom/lom:lifeCycle/lom:contribute[lom:role/lom:value='author']/lom:date/lom:dateTime)

This lets me use the creation date of the document (per the LOM specification), but it is also very slow for the processor. Is there a way to get at this element quickly without copying it into a top-level element, as I did with "dateRessource"?
I think all my indexes are defined correctly (the log shows traces of them being used), but this part:
$lom/lom:lifeCycle/lom:contribute[lom:role/lom:value='author']/lom:date/lom:dateTime
increases the query time dramatically.

These are just a few of the problems I have with eXist database queries. If anybody has search techniques or anything else that could help me, that would be very nice.

Finally, regarding my sorting problem: I was wondering whether the database could sort its results by the last-modification date of the document, and whether there is a way to set that date. That would solve my problem if it works like that.

And just so I know: has anybody solved the accent problem (accent-insensitive search) in queries on a standalone database?

Thanks a lot to everybody 

Igor



-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Query sorting and performance on LOM ressources

by Wolfgang Meier-2

Hi,

> First, what I am trying to do is something like this:
>
> xquery version "1.0";
> declare namespace lom="http://ltsc.ieee.org/xsd/LOM";
>
> let $col := collection('/db/repositories')
> let $XML := $col//lom:lom[(.&="searchTerm*")]
>
> let $XMLT := local:transformXML($XML[position()>=0 and position()<=10])
> let $res := <results nbResults="{count($XML)}">{$XMLT}</results>
> return $res
>
> My problem is that I need to know how many documents the query returns,
> so that my application can offer batch (page-by-page) navigation.
> I don't know whether there is another way to do this without fetching all
> the results and then filtering on the position.

That's basically ok, though I would replace

$XML[position()>=0 and position()<=10]

by

subsequence($XML, 1, 10)

which should be a bit faster (note that positions are 1-based). The
fulltext query itself could be supported by an index defined on the
element's qname (see http://exist-db.org/indexing.html#N10418). You
could also drop the extra parentheses around the predicate expression,
which may confuse the query optimizer:

let $XML := $col//lom:lom[.&="searchTerm*"]
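
Putting these changes together, a paged version of the query might look roughly like the sketch below. It is untested: $start and $pageSize are names made up for illustration (in a real application they would come from the request), and local:transformXML is stubbed out only so that the query is self-contained.

```xquery
xquery version "1.0";
declare namespace lom="http://ltsc.ieee.org/xsd/LOM";

(: Hypothetical paging parameters, not part of the original query. :)
declare variable $start as xs:integer := 1;
declare variable $pageSize as xs:integer := 10;

(: Placeholder for the poster's own transformation function. :)
declare function local:transformXML($nodes as node()*) as node()* {
    $nodes
};

let $col := collection('/db/repositories')
(: Run the fulltext query once and keep the whole hit sequence. :)
let $hits := $col//lom:lom[. &= "searchTerm*"]
(: Positions are 1-based; only the current page gets transformed. :)
let $page := subsequence($hits, $start, $pageSize)
return
    <results nbResults="{count($hits)}" start="{$start}">{
        local:transformXML($page)
    }</results>
```

count($hits) still reports the total number of matches, which is what the batch navigation needs, while the transformation only touches the ten records on the current page.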

> My second problem is that I want to sort my results on a field, so I did
> something like this:
>
> declare function local:search() as node()*{
> let $col := collection('/db/repositories')
>
> for $lom in $col//lom:lom[(.&="searchTerm*")]
>   order by xs:date($lom/dateRessource)
> return $lom
> };
>
> let $XML := local:search()
> let $XMLT := local:transformXML($XML[position()>=0 and position()<=10])
> let $res := <results nbResults="{count($XML)}">{$XMLT}</results>
> return $res

This should be ok with the changes I suggested above. Unfortunately,
order by cannot be further optimized right now. We would need to
implement additional features for that.

> For this kind of request, I think the processor handles all the results
> (nearly 5,000) and sorts them on the "dateRessource" element. This
> operation is very time-consuming.
> Speaking of the sort, my sort expression originally looked like this:
>
> order by xs:date($lom/lom:lifeCycle/lom:contribute[lom:role/lom:value='author']/lom:date/lom:dateTime)

A (kind of) dirty trick to speed up expressions like this is to pull the
path expression out of the for loop, e.g.

let $lom := $col//lom:lom[(.&="searchTerm*")]
let $order :=
$lom/lom:lifeCycle/lom:contribute[lom:role/lom:value='author']/lom:date/lom:dateTime
for $l in $lom order by xs:date($lom/$order) return $l
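
Note that in the snippet as posted, the sort key is written against $lom (the whole hit sequence) rather than the loop variable $l. One way to express the same idea in plain XQuery 1.0 is sketched below: the expensive path is evaluated once over all hits, and inside the loop each record picks out its own date node with intersect. This is an untested sketch, and the names $hits, $dates and $key are made up for illustration. Whether it is actually faster than the original order by would have to be measured.

```xquery
xquery version "1.0";
declare namespace lom="http://ltsc.ieee.org/xsd/LOM";

let $col := collection('/db/repositories')
let $hits := $col//lom:lom[. &= "searchTerm*"]
(: Evaluate the long path once, over the whole hit set. :)
let $dates :=
    $hits/lom:lifeCycle/lom:contribute[lom:role/lom:value = 'author']
        /lom:date/lom:dateTime
let $sorted :=
    for $l in $hits
    (: Keep only the pre-selected date node that belongs to this record. :)
    let $key := ($l//lom:dateTime intersect $dates)[1]
    (: xs:date() is kept from the original query; it assumes the
       stored values are plain dates. :)
    order by xs:date($key)
    return $l
return
    <results nbResults="{count($hits)}">{
        subsequence($sorted, 1, 10)
    }</results>
```

The sort still has to look at every hit, so this does not remove the cost of order by itself. It only tries to make computing the sort key cheaper.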

> Finally, regarding my sorting problem: I was wondering whether the
> database could sort its results by the last-modification date of the
> document, and whether there is a way to set that date. That would solve
> my problem if it works like that.

Well, yes, it might be an idea to create an extension function which
takes a bunch of nodes and sorts them according to document properties
like creation or last-modification time. Not difficult to implement and
probably quite useful in many cases. I'll keep that in mind ;-)

Wolfgang
