Query returning large XML document

by Alessandro Vernet

I have a very simple query that returns a large document, which under the root element contains 150,000 elements, each containing another 5 to 10 elements with some text. I am using the REST API. The issue is that I need to set the VM heap to more than 1 GB for the query to complete successfully.

Could this be happening because eXist is building the whole result set in memory before returning it? Can eXist "stream out" the result set as it builds it, at least for some queries? Do you have any other suggestion on how to better handle a case like this where I need to return a large document?

Alex
Orbeon Forms - Web 2.0 Forms, open-source, for the Enterprise
http://www.orbeon.com/

Re: Query returning large XML document

by Ananth Raghuraman

Hi,

Have you tried splitting the document into multiple documents (resources)?
You can then execute the query repeatedly on each document (resource)
to make sure you get results that fit in memory.
Perhaps you could have a (sub)collection devoted to this document.

Another technique used in relational databases comes to mind.
If your resulting elements have an "id" attribute or child element
and the ids are sequential, you could query for a batch of these
where the ids fall within a range; then you could query for the "next
batch".

However, the above approaches are only to be used if there
is a way eXist provides for streaming/chunking the results.
I think the maintainers will be better able to answer your
question in this regard.
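
The batching idea can be sketched against the REST interface roughly as follows. This is only a minimal sketch: it assumes the `_query`, `_start` and `_howmany` request parameters of the eXist REST API, with `_start` counted from 1, and the base URL, class name and method names are made up for illustration. It only builds the URLs; fetching them is left out, and each request may well re-evaluate the query on the server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PagedQueryUrls {

    // Builds the URL for one batch of results.
    // Assumption: _start is 1-based and _howmany caps the number of items returned.
    static String pageUrl(String collectionUrl, String xquery, int start, int howMany) {
        String encoded = URLEncoder.encode(xquery, StandardCharsets.UTF_8);
        return collectionUrl + "?_query=" + encoded
                + "&_start=" + start + "&_howmany=" + howMany;
    }

    // Builds one URL per batch so that no single response has to hold all items.
    static List<String> allPageUrls(String collectionUrl, String xquery,
                                    int totalItems, int pageSize) {
        List<String> urls = new ArrayList<>();
        for (int start = 1; start <= totalItems; start += pageSize) {
            // The last batch may be smaller than pageSize.
            int howMany = Math.min(pageSize, totalItems - start + 1);
            urls.add(pageUrl(collectionUrl, xquery, start, howMany));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical collection URL; adjust to the local installation.
        String base = "http://localhost:8080/exist/rest/db/rows";
        for (String url : allPageUrls(base, "/rows/row", 150_000, 10_000)) {
            System.out.println(url);
        }
    }
}
```

With 150,000 items and a batch size of 10,000 this yields 15 requests, each of which should need far less heap than a single response carrying everything.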

Thanks

----- Original Message ----
From: Alessandro Vernet <avernet@...>
To: exist-open@...
Sent: Friday, June 13, 2008 8:10:29 PM
Subject: [Exist-open] Query returning large XML document


I have a very simple query that returns a large document, which under the
root element contains 150,000 elements, each containing another 5 to 10
elements with some text. I am using the REST API. The issue is that I need
to set the VM heap to more than 1 GB for the query to complete successfully.

Could this be happening because eXist is building the whole result set in
memory before returning it? Can eXist "stream out" the result set as it
builds it, at least for some queries? Do you have any other suggestion on
how to better handle a case like this where I need to return a large
document?

Alex

-----
Orbeon Forms - Web 2.0 Forms, open-source, for the Enterprise
http://www.orbeon.com/

--
View this message in context: http://www.nabble.com/Query-returning-large-XML-document-tp17834098p17834098.html
Sent from the exist-open mailing list archive at Nabble.com.


-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open



Re: Query returning large XML document

by Alessandro Vernet

Hi Ananth,

Ananth Raghuraman wrote:
> Have you tried splitting the document into multiple documents (resources)?
> You can then execute the query repeatedly on each document (resource)
> to make sure you get results that fit in memory.
> Perhaps you could have a (sub)collection devoted to this document.

The documents in the database are not large. It is only the document produced by the query which is quite large.

Ananth Raghuraman wrote:
> Another technique used in relational databases comes to mind.
> If your resulting elements have an "id" attribute or child element
> and the ids are sequential, you could query for a batch of these
> where the ids fall within a range; then you could query for the "next
> batch".

With a (good) relational database, there is a good chance that I wouldn't have this problem, as each one of the 150,000 elements I mentioned would be a row. The data would be streamed row by row from the database server to the client. Maybe there is a way to do something similar in eXist.

Alex

Re: Query returning large XML document

by Gary Larsen

I'll chip in here too.

>
> The documents in the database are not large. It is only the document
> produced by the query which is quite large.

I have a similar situation where queries can aggregate data from a large
number of documents.  Some of these queries can take several minutes to
execute but the value of the results makes that acceptable.

I noticed a large increase in memory usage of the same queries after
upgrading to version 1.2.  As a result I had to roll my application back to
1.1 which I obviously didn't want to do.

Unfortunately I haven't had the bandwidth to quantify this or document it
with an example. I didn't want to start posting on this without harder facts.

One thing I did seem to notice with a profiler was that v1.1 would perform
gc during the query, as I could see memory usage go up and down during
execution.  With v1.2 memory would only free up after the query had
completed.

gary





Re: Query returning large XML document

by Wolfgang Meier-2

> Could this be happening because eXist is building the whole result set in
> memory before returning it? Can eXist "stream out" the result set as it
> builds it, at least for some queries? Do you have any other suggestion on
> how to better handle a case like this where I need to return a large
> document?

I checked this and found that the REST interface does indeed not stream
out the query results. It serializes the entire result in memory before
it writes it to the servlet's output stream. We have to fix this...

Apart from that, memory consumption very much depends on what your query
does. eXist uses a lazy approach when constructing XML fragments. If
your constructed XML includes references to nodes stored in the db,
eXist will not resolve those references until the fragment is
serialized. Thus memory consumption should be rather low.
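
The lazy approach can be pictured with a small sketch like the one below. It is purely conceptual and does not use eXist's actual classes: `NodeStore`, `LazyCell` and `serializeRow` are hypothetical names, and the counter exists only to show that stored text is loaded at serialization time, not when the fragment is constructed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LazyFragmentSketch {

    /** Hypothetical stand-in for the database; counts how often stored text is loaded. */
    static final class NodeStore {
        private final Map<Long, String> textById = new HashMap<>();
        int loads = 0;

        void put(long nodeId, String text) {
            textById.put(nodeId, text);
        }

        String load(long nodeId) {
            loads++;
            return textById.get(nodeId);
        }
    }

    /** A constructed <td> that keeps only the id of the stored text node. */
    static final class LazyCell {
        private final long nodeId;

        LazyCell(long nodeId) {
            this.nodeId = nodeId;
        }

        // The reference is resolved only here, while writing the output.
        // XML escaping is omitted to keep the sketch short.
        void serialize(NodeStore store, StringBuilder out) {
            out.append("<td>").append(store.load(nodeId)).append("</td>");
        }
    }

    static String serializeRow(List<LazyCell> cells, NodeStore store) {
        StringBuilder out = new StringBuilder("<tr>");
        for (LazyCell cell : cells) {
            cell.serialize(store, out);
        }
        return out.append("</tr>").toString();
    }

    public static void main(String[] args) {
        NodeStore store = new NodeStore();
        store.put(1L, "alpha");
        store.put(2L, "beta");

        List<LazyCell> row = new ArrayList<>();
        row.add(new LazyCell(1L));
        row.add(new LazyCell(2L));

        System.out.println("loads before serialization: " + store.loads);
        System.out.println(serializeRow(row, store));
        System.out.println("loads after serialization: " + store.loads);
    }
}
```

Until serialization each cell costs only one stored id, which is why a constructed fragment that merely points at stored nodes should stay small in memory.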

Another issue I was working on today: eXist's default node set
implementation is not very memory efficient if you have a database with
lots of rather small documents. I'm currently testing a
reimplementation, which is much less memory intensive. For queries on a
database containing 100,000 small docs, the new implementation achieves
quite a performance boost. I will commit this into 1.2 and 1.3 soon.

Wolfgang


Re: Query returning large XML document

by Alessandro Vernet

Hi Wolfgang,

Wolfgang Meier-2 wrote:
> I checked this and found that the REST interface does indeed not stream
> out the query results. It serializes the entire result in memory before
> it writes it to the servlet's output stream. We have to fix this...
>
> Apart from that, memory consumption very much depends on what your query
> does. eXist uses a lazy approach when constructing XML fragments. If
> your constructed XML includes references to nodes stored in the db,
> eXist will not resolve those references until the fragment is
> serialized. Thus memory consumption should be rather low.

The query looks like:

for $row in /rows/row return
<tr>
    <td>{$row/a/text()}</td>
    <td>{$row/b/text()}</td>
    <td>{$row/c/text()}</td>
    <td>{$row/d/text()}</td>
</tr>

Is this considered a reference to the text nodes in the <a>, <b>, <c>, and <d> elements? Is there a better way to write this to use less memory?

I am aiming to return 150,000 of those <tr> elements, so maybe I should just wait until you implement streaming in the REST interface :). Does it make sense for me to create a "bug/RFE" in the SF.net tracker for this, so I can be notified when this is implemented?

Alex

Re: Query returning large XML document

by Adam Retter-3

Can I help with this streaming in REST Server - what needs to be done?

2008/6/17 Wolfgang <wolfgang@...>:

>> Could this be happening because eXist is building the whole result set in
>> memory before returning it? Can eXist "stream out" the result set as it
>> builds it, at least for some queries? Do you have any other suggestion on
>> how to better handle a case like this where I need to return a large
>> document?
>
> I checked this and found that the REST interface does indeed not stream
> out the query results. It serializes the entire result in memory before
> it writes it to the servlet's output stream. We have to fix this...
>
> Apart from that, memory consumption very much depends on what your query
> does. eXist uses a lazy approach when constructing XML fragments. If
> your constructed XML includes references to nodes stored in the db,
> eXist will not resolve those references until the fragment is
> serialized. Thus memory consumption should be rather low.
>
> Another issue I was working at today: eXist's default node set
> implementation is not very memory efficient if you have a database with
> lots of rather small documents. I'm currently testing a
> reimplementation, which is much less memory intensive. For queries on a
> database containing 100,000 small docs, the new implementation achieves
> quite a performance boost. I will commit this into 1.2 and 1.3 soon.
>
> Wolfgang



--
Adam Retter

eXist Developer
{ England }
adam@...
irc://irc.freenode.net/existdb


Re: Query returning large XML document

by Wolfgang Meier-2

> Can I help with this streaming in REST Server - what needs to be done?

Oh yes, some help would be great ;-) If you look into the RESTServer
class, you'll find that queries are usually executed by calling the
search() method, which returns a String:

String result = search(broker, query, path, howmany, start,
   outputProperties, wrap, cache, request, response);
...
if(!response.isCommitted()) {
   writeResponse(response, result, mimeType, encoding);
}

Somehow, instead of serializing to a string first, we should directly
write to the servlet's output stream.
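
The difference between the two styles can be sketched in isolation like this. It is a minimal illustration using only the JDK, not a patch for RESTServer: `rows`, `serializeToString` and `streamTo` are hypothetical names, and a `ByteArrayOutputStream` stands in for the stream a servlet would obtain from `response.getOutputStream()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.stream.IntStream;

public class StreamingResponseSketch {

    /** Lazily produces n small row fragments, one at a time. */
    static Iterator<String> rows(int n) {
        return IntStream.rangeClosed(1, n)
                .mapToObj(i -> "<tr><td>" + i + "</td></tr>")
                .iterator();
    }

    /** Buffered style: the whole result is held in memory as one String. */
    static String serializeToString(Iterator<String> items) {
        StringBuilder sb = new StringBuilder();
        while (items.hasNext()) {
            sb.append(items.next());
        }
        return sb.toString();
    }

    /** Streaming style: each item is written as soon as it is produced. */
    static void streamTo(Iterator<String> items, OutputStream out) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            while (items.hasNext()) {
                writer.write(items.next());
            }
            writer.flush(); // push any remaining encoded bytes to the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the servlet's output stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        streamTo(rows(3), out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In the streaming variant, memory use is bounded by the writer's internal buffer plus one item, whereas the buffered variant grows with the size of the whole result.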

Wolfgang


Re: Query returning large XML document

by Alessandro Vernet

Adam, Wolfgang,

Adam Retter-3 wrote:
> Can I help with this streaming in REST Server - what needs to be done?

Should we create an entry in the SF.net tracker to track this one?

Also, let me know if you make any changes related to streaming in the REST Server. I am looking forward to testing this! :) We may also want to change our own code (Orbeon Forms) to take advantage of it.

Alex

Re: Query returning large XML document

by Adam Retter-3

Alessandro,

Streaming changes to the REST Server were committed into trunk last
night as revision 7917.

Let me know how your testing goes...

Cheers Adam.


2008/6/25 Alessandro Vernet <avernet@...>:

>
> Adam, Wolfgang,
>
>
> Adam Retter-3 wrote:
>>
>> Can I help with this streaming in REST Server - what needs to be done?
>>
>
> Should we create an entry in the SF.net tracker to track this one?
>
> Also, let me know if you make any changes related to streaming in the REST
> Server. I am looking forward to testing this! :) And there may consequently
> also be changes we might want to do in our own code (Orbeon Forms) to
> leverage this.
>
> Alex

Re: Query returning large XML document

by Wolfgang Meier-2

> Streaming changes to the REST Server were committed into trunk last
> night as revision 7917

If your query creates and/or returns a large amount of separate, small
XML fragments, my latest commits (r7916 ff.) could be quite interesting
as well.

Wolfgang


Re: Query returning large XML document

by Dannes Wessels-2

Hi,

On Thu, Jun 26, 2008 at 12:09 AM, Wolfgang <wolfgang@...> wrote:
>> Streaming changes to the REST Server were committed into trunk last
>> night as revision 7917
>
> If your query creates and/or returns a large amount of separate, small
> XML fragments, my latest commits (r7916 ff.) could be quite interesting
> as well.

I'll upload a new snappy in a few days.....

regards

Dannes


Re: Query returning large XML document

by Alessandro Vernet

Adam,

Adam Retter-3 wrote:
> Streaming changes to the REST Server were committed into trunk last
> night as revision 7917

I finally got a chance to test this, and it is working great. Thank you!

Alex