Re: Need Saxon-SA? (was: Re: saxon-help Digest, Vol 26, Issue 18)


Re: Need Saxon-SA? (was: Re: saxon-help Digest, Vol 26, Issue 18)

by Sylvain Hallé-2

Thanks a lot for your help; the program now runs. ;-)  I also took the
opportunity to switch to the s9api interface, which is simpler to use.

However, I have trouble achieving streaming at the source.  I created a custom
URIResolver which traps calls to doc('anything') and returns a StreamSource
taken from somewhere else in my code.  I then wrote the following:

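import net.sf.saxon.s9api.*;

// 'true' requests the schema-aware (Saxon-SA) processor, needed for saxon:stream
Processor saxp = new Processor(true);
XQueryCompiler xqp = saxp.newXQueryCompiler();
XQueryExecutable xqex = xqp.compile(my_query_string);
XQueryEvaluator xqe = xqex.load();
// Route calls to doc() through the custom resolver
xqe.setURIResolver(my_uri_resolver);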
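
The resolver itself is not shown in the message. As a rough sketch only, a
URIResolver of the kind described could look like the following. The class
name FixedStreamResolver and the choice to ignore the requested URI are
assumptions made for illustration; only the standard javax.xml.transform
types are used.

```java
import java.io.Reader;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/**
 * Hypothetical resolver: ignores which document was requested and hands back
 * a StreamSource over a Reader supplied by the application.
 */
class FixedStreamResolver implements URIResolver {
    private final Reader input;

    FixedStreamResolver(Reader input) {
        this.input = input;
    }

    @Override
    public Source resolve(String href, String base) {
        // Return the raw character stream unparsed; the consumer decides
        // whether to build a tree or process it serially.
        StreamSource source = new StreamSource(input);
        // Record the requested name so errors can be reported against it.
        source.setSystemId(href);
        // Caveat: a Reader can be consumed only once, so this sketch
        // supports a single doc() call per resolver instance.
        return source;
    }
}
```

An instance of such a class would take the place of my_uri_resolver in the
snippet above.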

From there, I tried two things:

1. Iterate over query results with:

Iterator<XdmItem> it = xqe.iterator();
while (it.hasNext()) {
    XdmItem item = it.next();  // do something with item
}

2. Create a custom Destination and Receiver to trap output events, with:

xqe.run(new MyDestination(new MyReceiver()));

However, when I trace into the code with a query such as "for $x in
(#saxon:stream#) {doc('0')/a/b[1]} return $x", I observe the following:

- In situation #1, the whole source is read on the first call to it.hasNext().
- In situation #2, the whole source is read before any event is delivered to
the instance of MyReceiver.

Clearly the query result cannot change once the closing tag of the first "b"
has been read; therefore there should be no need to read the rest of the
document, let alone to postpone sending the output until the end of the file.

Yet the query "for $x in (#saxon:stream#) {doc('0')/a/b} return $x", which
returns all b elements, gives me the desired behavior:  it sends them one by
one to the output as the input is being read. I don't understand why the
processor behaves differently for the two queries.

After some experiments, I discovered, though, that as soon as the "in" clause
includes positional or sibling constructs ([1], following-sibling, etc.) or
unions of two path expressions, or if the return clause applies further
operations on the result (such as $x/b), the whole document is read before
any results are output.

Is there a way to change my code, or did I reach a limit of Saxon's streaming
capabilities?

Thanks,

Sylvain

-----Original Message-----
Date: Thu, 17 Jul 2008 08:39:05 +0100
From: "Michael Kay" <mike@...>
Subject: Re: [saxon] Need Saxon-SA? (was: Re: saxon-help Digest, Vol 26, Issue 18)
To: "'Mailing list for the SAXON XSLT and XQuery processor'" <saxon-help@...>
Message-ID: <BCF9FA33CC214C0393FBE8780C111343@Sealion>
Content-Type: text/plain; charset="iso-8859-1"

If you're invoking Saxon from the command line, you need to use the -sa
option. If you're invoking it from a Java API, you need to specify that you
want the SA processor when you start:

JAXP: instantiate com.saxonica.SchemaAwareTransformerFactory

native Saxon API: instantiate SchemaAwareConfiguration

s9api: new Processor(true).

Sorry about the inconvenience!

Michael Kay
Saxonica

 > > -----Original Message-----
 > > From: saxon-help-bounces@...
 > > [mailto:saxon-help-bounces@...] On Behalf
 > > Of Sylvain Hallé
 > > Sent: 17 July 2008 00:18
 > > To: saxon-help@...
 > > Subject: [saxon] Need Saxon-SA? (was: Re: saxon-help Digest,
 > > Vol 26, Issue 18)
 > >
 > > Thanks Michael.  I wrote a URIResolver which passes whatever
 > > I want to the engine.
 > >
 > > I downloaded an evaluation copy of Saxon-SA and a licence key
 > > in order to test the (#saxon:stream#) functionality; I
 > > deleted my Saxon-B jars, replaced them with those from
 > > saxonsa9-1-0-1j.zip and copied the licence file into my
 > > classpath and restarted the whole thing to make sure the
 > > changes would be noticed.  However I get the following when I
 > > run my test program:
 > >
 > >    XPST0003: XQuery syntax error in #for $x in (#saxon:stream#) {#:
 > >      To use saxon:stream, you need the Saxon-SA processor
 > > from http://www.saxonica.com/
 > >
 > > Besides, whether the licence file is present in the classpath
 > > or not does not change the message.  Shouldn't I be told
 > > "License file saxon-license.lic not found" if I remove the
 > > licence from the classpath?
 > >
 > > I must have missed something obvious; does anyone know what
 > > that can be?
 > >
 > > Sylvain
 > >
 > > --- Original message ---
 > >
 > > Date: Fri, 11 Jul 2008 18:12:30 +0100
 > > From: "Michael Kay" <mike@...>
 > > Subject: Re: [saxon] saxon-help Digest, Vol 26, Issue 18
 > > To: "'Mailing list for the SAXON XSLT and XQuery processor'"
 > > <saxon-help@...>
 > > Message-ID: <9AA132C547FF406881B5D680FF587985@Sealion>
 > > Content-Type: text/plain; charset="us-ascii"
 > >
 > >  > > Thanks.  However, I noticed in the documentation that this
 > >  > > streaming facility is available for an expression that must
 > >  > > start with doc() (i.e. it must read a file).  If I set my
 > >  > > document context to another source (e.g. a character stream
 > >  > > produced by another part of my code) using the bindDocument()
 > >  > > method, is there a way to achieve the same result?
 > >
 > > You can write a URIResolver that intercepts the call on doc()
 > > and returns a StreamSource. But I don't think it can be made
 > > to work with the XQJ
 > > bindDocument() method.
 > >
 > > Michael Kay
 > > http://www.saxonica.com/
 > >

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@...
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: Need Saxon-SA? (was: Re: saxon-help Digest, Vol 26, Issue 18)

by Michael Kay

> However, when I trace into the code with a query such as "for $x in
> (#saxon:stream#) {doc('0')/a/b[1]} return $x", I remark the following:
>
> - In situation #1, the whole source is read on the first call
> to it.hasNext().
> - In situation #2, the whole source is read before any event
> is called on the instance of MyReceiver.
>
> Clearly the query result cannot change once the closing tag
> of the first "b"
> has been read; therefore there should be no need to read the
> rest of the document, let alone to postpone sending the
> output until the end of the file.

There are many path expressions that could theoretically be streamed, but
which Saxon does not in fact stream. For streaming to work, you must use an
expression that follows the rules in

http://www.saxonica.com/documentation/sourcedocs/serial/streamability.html

Note in particular that this does not allow positional predicates.

You should be able to rewrite the expression to get around this restriction:
try

((#saxon:stream#) {doc('0')/a/b})[1]
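
The idea behind this rewrite, keeping the streamed path free of positional
tests and applying [1] to the sequence it produces, can be illustrated outside
Saxon with the JDK's StAX pull parser. The sketch below is not Saxon code and
makes no claim about how Saxon evaluates the expression; the class name
LazyBElements and its startTagsSeen counter are invented for illustration, and
it assumes each b element contains text only.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Lazily yields the text of each /a/b element, reading only as far as needed. */
class LazyBElements implements Iterator<String> {
    private final XMLStreamReader reader;
    private final Deque<String> path = new ArrayDeque<>(); // names of open elements
    private String pending;   // next result, if one has already been found
    int startTagsSeen = 0;    // instrumentation: how far the input was consumed

    LazyBElements(String xml) {
        try {
            reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(e);
        }
    }

    @Override
    public boolean hasNext() {
        if (pending != null) return true;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    startTagsSeen++;
                    path.addLast(reader.getLocalName());
                    boolean isB = path.size() == 2
                            && "a".equals(path.peekFirst())
                            && "b".equals(path.peekLast());
                    if (isB) {
                        // getElementText() consumes up to the matching end
                        // tag, so pop the element here rather than on END_ELEMENT.
                        pending = reader.getElementText();
                        path.removeLast();
                        return true;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.removeLast();
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return false;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String result = pending;
        pending = null;
        return result;
    }
}
```

Calling next() once on a new LazyBElements plays the role of [1]: the
producer never counts siblings, and the consumer simply stops asking after
the first item, leaving the rest of the input unread.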

>
> Yet the query "for $x in (#saxon:stream#) {doc('0')/a/b}
> return $x", which returns all b elements, gives me the
> desired behavior:  it sends them one by one to the output as
> the input is being read. I don't understand why the processor
> behaves differently for the two queries.
>
> After some experiments, I discovered though that as soon as
> the "in" clause includes sibling functions ([1],
> following-sibling, etc.) or unions of two path expressions,
> or if the return clause applies further operations on the
> result (such as $x/b above), the whole document is read prior
> to outputting the results.
>
This really shouldn't require experiments; the rules are written down on the
page referenced above.

As to "why", it's simply a question of time and effort for doing the
optimizations and testing them. I started with the streaming subset of XPath
defined in XML Schema, and then added a few extra capabilities like union
paths and simple boolean predicates. One of the reasons that positional
predicates aren't currently supported is that the streaming evaluator
currently has to make a decision whether to include a node in the result or
not based solely on knowledge of the names of the ancestors of the node and
the values of its attributes; it doesn't retain any memory about preceding
siblings of any of those nodes.
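
To make that constraint concrete, here is a toy SAX handler that decides
whether an element matches /a/b[@type='x'] using only the stack of open
element names and the element's own attributes. It is an illustration of the
principle, not Saxon's implementation; the class name AncestorPathMatcher,
the attribute names type and id, and the sample path are all assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Toy matcher for the path /a/b[@type='x']; records @id of each match. */
class AncestorPathMatcher extends DefaultHandler {
    private final List<String> ancestors = new ArrayList<>(); // open element names
    final List<String> matchedIds = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // The decision is made at the start tag, from the ancestor names
        // and this element's attributes alone.
        boolean pathOk = ancestors.size() == 1
                && ancestors.get(0).equals("a")
                && qName.equals("b");
        if (pathOk && "x".equals(atts.getValue("type"))) {
            matchedIds.add(atts.getValue("id"));
        }
        ancestors.add(qName);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        // Popping discards everything known about this element, so a later
        // sibling cannot tell whether it is the first, second, or tenth b.
        ancestors.remove(ancestors.size() - 1);
    }

    /** Parses the given XML text and returns the ids of matching elements. */
    static List<String> match(String xml) {
        try {
            AncestorPathMatcher handler = new AncestorPathMatcher();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.matchedIds;
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

A predicate such as b[1] has no place in this design: answering it would
require a per-parent counter of b children already seen, which is exactly
the sibling memory the handler does not keep.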

Also, you shouldn't need to "trace into the code" to see whether streaming
is being used. Use the -explain option on the command line, or
processor.setConfigurationProperty(FeatureKeys.TRACE_OPTIMIZER_DECISIONS,
Boolean.TRUE) from the Java API.

Michael Kay
http://www.saxonica.com/



Re: Need Saxon-SA? (was: Re: saxon-help Digest, Vol 26, Issue 18)

by Sylvain Hallé-2

Thank you Michael for these clarifications.  I discovered the rules after I
sent my message.

In my view, extending the range of streamable queries in future versions of
Saxon would be a valuable improvement.

Thanks a lot!

Sylvain

--- Original message ---
Date: Thu, 17 Jul 2008 22:46:23 +0100
From: "Michael Kay" <mike@...>
Subject: Re: [saxon] Need Saxon-SA? (was: Re: saxon-help Digest, Vol
        26, Issue 18)
To: "'Mailing list for the SAXON XSLT and XQuery processor'"
        <saxon-help@...>
Message-ID: <2C04A73E503F45048B8009E8ECC3E371@Sealion>
Content-Type: text/plain; charset="US-ASCII"


--
Sylvain Hallé, Ph.D. Candidate

Département d'informatique
Université du Québec à Montréal
C.P. 8888, Succ. Centre-ville
Montréal (Québec)
CANADA H3C 3P8

E-mail: shalle@...
Web:    www.leduotang.com/sylvain
Tel:    +1 (514) 987-4186
Fax:    +1 (514) 987-8477
