Mark Logic

14 messages

Mark Logic

by Hugh Cayless

At the risk of starting a flame war...

We're having a visit from a Mark Logic salesperson and I asked about  
how ML compares to eXist (which we're already using).  The response I  
got wasn't very enlightening other than that it has 100% support for  
XQuery and that several of their customers started with eXist and then  
switched when they moved to production (I don't want to call this FUD,  
because it's probably true, and I don't have any context re when these  
decisions might have been made). We've been running eXist in
production for a year, so I don't really buy it. I'm sure there are
differences in scalability, etc., and I was wondering if anyone on the
list has experience with ML and can talk about how the two line
up.

Thanks,
Hugh

/**
  * Hugh A. Cayless, Ph.D
  * Head, Research & Development Group
  * Carolina Digital Library and Archives
  * UNC Chapel Hill
  * hcayless@...
  */

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: Mark Logic

by Robert Koberg

On Sep 12, 2008, at 3:07 PM, Hugh Cayless wrote:

> At the risk of starting a flame war...
>
> We're having a visit from a Mark Logic salesperson and I asked about
> how ML compares to eXist (which we're already using).  The response I
> got wasn't very enlightening other than that it has 100% support for
> XQuery and that several of their customers started with eXist and then
> switched when they moved to production (I don't want to call this FUD,
> because it's probably true, and I don't have any context re when these
> decisions might have been made). We've been running eXist in
> production for a year, so I don't really buy it. I'm sure there are
> differences in scalability, etc., and I was wondering if anyone on the
> list has experience with ML and can talk about how the two line
> up.

When ML was pitched in a meeting I attended, the recommended hardware
was five 64-bit machines maxed out with RAM. I wondered how eXist would
run on hardware like that. (I also thought, sheesh, why not just use
the filesystem and load everything into memory if you have that kind of
RAM and horsepower?)

-Rob

Re: Mark Logic

by Dannes Wessels

Hi,

On Sep 12, 2008, at 21:07 , Hugh Cayless wrote:

> The response I got wasn't very enlightening other than that it has
> 100% support for XQuery....

I remember MarkLogic kept using an old 2003 draft of the XQuery spec
for a long, long time. Although I can't imagine they don't support the
final 1.0 spec by now, I can't recall seeing any announcement about
it, and consulting my friend Google turns up some pages where they
state that they try to be compatible with the 2003 version.

Does anyone have a clue?

> We've been running eXist in production for a year, so I don't
> really buy it. I'm sure there are differences in scalability, etc.,
> and I was wondering if anyone on the list has experience with ML
> and can talk about how the two line up.

Sometimes eXist is compared to MarkLogic the way MySQL is compared to,
e.g., Oracle: basically two different worlds :-)

MarkLogic does a very good job regarding scalability. It runs fine on
a PC, but it can also be run in a distributed/clustered environment,
probably dealing with terabytes of data (from public sources: Elsevier
and O'Reilly use ML for their business).

Regarding the terabytes, eXist does not scale up to that... On the
other hand, the licensing model of MarkLogic is also quite scalable:
for enterprise-level scale, you pay a significant price for a
license...

regards

Dannes


--
eXist-db Open Source Native XML Database
http://exist-db.org

Re: Mark Logic

by Dannes Wessels

On Sep 13, 2008, at 18:04 , Dannes Wessels wrote:

>> The response I got wasn't very enlightening other than that it has
>> 100% support for XQuery....
>
> I remember MarkLogic kept using an old 2003 draft of the XQuery spec
> for a long, long time. Although I can't imagine they don't support the
> final 1.0 spec by now, I can't recall seeing any announcement about
> it, and consulting my friend Google turns up some pages where they
> state that they try to be compatible with the 2003 version.

Found it (MarkLogic Server 3.2):

8.2 Compatibility with XQuery Drafts 
This release implements the XQuery language, functions and operators specified in the May 02, 
2003 W3C XQuery Working Group Draft Recommendations: 
Additionally, much of the added functionality in the January 2007 W3C XQuery 
Recommendation is implemented in MarkLogic Server 3.2. 


--
eXist-db Open Source Native XML Database

Re: Mark Logic

by Hugh Cayless

On Sep 13, 2008, at 12:04 PM, Dannes Wessels wrote:

> Sometimes eXist is compared to MarkLogic the way MySQL is compared to,
> e.g., Oracle: basically two different worlds :-)

That was the analogy I was wondering whether it was safe to make :-).
MySQL isn't nearly as Enterprisey as Oracle, but you can do a lot with
it, including using it in "Enterprise" solutions, albeit with different
architectures than you would use with Oracle. We use MySQL too (even
though there's a University site license for Oracle).

>
>
> MarkLogic does a very good job regarding scalability. It runs fine on
> a PC, but it can also be run in a distributed/clustered environment,
> probably dealing with terabytes of data (from public sources: Elsevier
> and O'Reilly use ML for their business).
>
> Regarding the terabytes, eXist does not scale up to that... On the
> other hand, the licensing model of MarkLogic is also quite scalable:
> for enterprise-level scale, you pay a significant price for a
> license...

Yeah, that's the other thing...

What are the reasonable limits of eXist right now? I'm wondering
where the boundary lies beyond which it would be economically sensible
to spend the money on the software rather than on development.

Thanks for all the helpful responses.

Hugh

>
>
> regards
>
> Dannes

Re: Mark Logic

by Robert Koberg

On Sep 13, 2008, at 3:28 PM, Hugh Cayless wrote:
>
> What are the reasonable limits of eXist right now?

That would be good to know. What are people out in the wild dealing  
with? Have you hit limits?

Another thing to think about is, do you need everything in one eXist  
instance or cluster? Are there logical boundaries where the content  
can live independently of other content?

best,
-Rob

Re: Mark Logic

by Wolfgang Meier

> What are the reasonable limits of eXist right now?

This really depends a lot on the structure of your data, the type of
queries and your update requirements. It is difficult to generalize.

However, speaking of limitations, there are some well-known issues which
I can point out (again). They are the next steps on my roadmap for this
year:

* removal/updates of documents: the time to remove a document (or some
nodes of it) depends directly on the collection size. This happens
because eXist organizes all indexes by collection. The more docs you
store in one collection (not counting sub-collections), the longer it
takes to update a document. If you have more than a few thousand docs
in one collection, updating them can become a serious bottleneck (the
usual workaround is to manually split the document space into
sub-collections).

The solution would be to completely separate collection and document
storage in eXist. Collections should be viewed as logical, not physical
units. Indexes should no longer be organized by collection (internally I
would prefer to organize them into fixed-size partitions).

* the separation of collections and documents would also help to reduce
memory consumption at query time, in particular concerning the amount of
memory needed for document metadata when working with huge document sets
(millions of docs). Loading the document metadata can take a major part
of the overall query time.

* finally, we have already pushed the limits for some types of queries -
those that can make a selective use of indexes. For those queries, the
new query-rewriting optimizer can improve query times considerably and
this is a huge step forward. However, there are other areas which need
different optimizations, e.g. when a huge node set needs to be iterated
node-by-node. As a basis for future developments, I already started to
implement a dedicated statistics module, which collects information on
the distribution of nodes in the db. The idea behind this is to better
estimate the number of nodes which can be expected as input to a certain
expression and thus apply the best optimization method.

Wolfgang

Re: Mark Logic

by Andrew Trese

Wolfgang, thanks for the great synopsis of known technical gates that affect large scales and are being targeted.

Team,
It would still be great if the people "out in the wild" could comment on how large their big implementations are and whether they have hit walls somewhere.

Somewhere in the thread, somebody said something like: ML can scale up to terabytes, where eXist cannot. Can anyone comment on the upper limit of storage in eXist? This number can probably be stretched by proper index and collection management, but can anyone comment on the upper levels they have hit, where XQuery and other performance metrics begin to degrade, etc.?

Thanks,
Andrew

Re: Mark Logic

by Dannes Wessels

Hi,

On Sep 14, 2008, at 15:46 , Andrew Trese wrote:

> Somewhere in the thread, somebody said something like: ML can scale
> up to terabytes, where eXist cannot. Can anyone comment on the
> upper limit of storage in eXist? This number can probably be
> stretched by proper index and collection management, but can anyone
> comment on the upper levels they have hit, where XQuery and other
> performance metrics begin to degrade, etc.?

The topic has come up on the mailing list a few times, e.g. http://markmail.org/message/6kp3mxg7ktwrio7g

regards

Dannes
eXist-db Open Source Native XML Database
http://exist-db.org

Re: Mark Logic

by Robert Koberg

On Sep 14, 2008, at 10:20 AM, Dannes Wessels wrote:

> Hi,
>
> On Sep 14, 2008, at 15:46 , Andrew Trese wrote:
>
>> Somewhere in the thread, somebody said something like: ML can scale
>> up to terabytes, where eXist cannot. Can anyone comment on the
>> upper limit of storage in eXist? This number can probably be
>> stretched by proper index and collection management, but can anyone
>> comment on the upper levels they have hit, where XQuery and other
>> performance metrics begin to degrade, etc.?
>
> The topic has come up on the mailing list a few times, e.g. http://markmail.org/message/6kp3mxg7ktwrio7g


So this is a sufficient examination?

And Wolfgang's:

"However, speaking of limitations, there are some well-known issues
which I can point out (again)."

Perhaps having to explain it again indicates a lack of documentation?  
Or at least a lack of easily findable information. Searches that  
include the word 'exist' tend toward irrelevance.

As someone who is trying to make a case for eXist, I find these lacking.

Re: Mark Logic

by James Fuller

I can, but I am on vacation ... give me a shout in a week
(jim.fuller@...).

cheers, Jim Fuller

On Sun, Sep 14, 2008 at 3:46 PM, Andrew Trese <atrese@...> wrote:

> Wolfgang, thanks for the great synopsis of known technical gates that affect
> large scales and are being targeted.
>
> Team,
> It would still be great if the people "out in the wild" could comment on how
> large their big implementations are and whether they have hit walls somewhere.
>
> Somewhere in the thread, somebody said something like: ML can scale up to
> terabytes, where eXist cannot. Can anyone comment on the upper limit of
> storage in eXist? This number can probably be stretched by proper index and
> collection management, but can anyone comment on the upper levels they have
> hit, where XQuery and other performance metrics begin to degrade, etc.?
>
> Thanks,
> Andrew

Re: Mark Logic

by José María Fernández González

Hi Andrew,
        Some months ago I ran some scalability experiments with eXist; the
biggest instance used about 1.5 GB of memory and about 170 GB of storage,
including the indexes needed for queries. Of course, I had to apply the tweaks
pointed out by Wolfgang so that the data load finished in a reasonable time
(about 3 days).

        Speaking of bottlenecks, there is at least one related to eXist's native
full-text indexes growing bigger than the cache size (leading to thrashing),
which can be avoided by limiting the complexity of the indexed content (for
instance, by splitting the collection). I still have to test the new Lucene
index to see which one behaves better for queries and data loads.
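
To illustrate why an index that is bigger than the cache leads to thrashing, here is a minimal Python simulation of a generic LRU page cache under repeated sequential scans. It is a simplified model under stated assumptions (LRU replacement, every pass reads all index pages in order, made-up page counts); it is not eXist's actual cache implementation.

```python
from collections import OrderedDict


def lru_hit_rate(cache_pages: int, index_pages: int, passes: int = 5) -> float:
    """Hit rate of an LRU page cache when an index is scanned sequentially.

    Generic model only: pages are numbered 0..index_pages-1 and each of
    the `passes` scans reads every page once, in order.
    """
    cache = OrderedDict()
    hits = 0
    total = 0
    for _ in range(passes):
        for page in range(index_pages):
            total += 1
            if page in cache:
                hits += 1
                cache.move_to_end(page)  # mark as most recently used
            else:
                cache[page] = None  # "read" the page from disk into the cache
                if len(cache) > cache_pages:
                    cache.popitem(last=False)  # evict the least recently used page
    return hits / total


if __name__ == "__main__":
    for index_pages in (800, 1200):
        rate = lru_hit_rate(cache_pages=1000, index_pages=index_pages)
        print(f"index of {index_pages} pages, cache of 1000 pages: hit rate {rate:.2f}")
```

With 800 index pages and room for 1,000, every pass after the first is served from the cache (hit rate 0.8 over five passes); with 1,200 pages, LRU evicts each page shortly before it is needed again and the hit rate drops to zero. Keeping each index under the cache size, for example by splitting the collection, avoids that cliff.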

        Best Regards,
                José María

Andrew Trese wrote:

> Wolfgang, thanks for the great synopsis of known technical gates that
> affect large scales and are being targeted.
>
> Team,
> It would still be great if the people "out in the wild" could comment on
> how large their big implementations are and whether they have hit walls
> somewhere.
>
> Somewhere in the thread, somebody said something like: ML can scale up
> to terabytes, where eXist cannot. Can anyone comment on the upper
> limit of storage in eXist? This number can probably be stretched by
> proper index and collection management, but can anyone comment on the
> upper levels they have hit, where XQuery and other performance metrics
> begin to degrade, etc.?
>
> Thanks,
> Andrew

--
"There is no reason why anybody would want a computer in their home" -
        Ken Olsen, founder of DEC, 1977
"640K ought to be enough for anybody" - Bill Gates, 1981
"Nobody will ever outgrow a 20Mb hard drive." - ???

"Premature optimization is the root of all evil." - Donald Knuth

José María Fernández González
Tel: (+34) 91 732 80 00 / 91 224 69 00 (ext 3061)
e-mail: jmfernandez@... Fax: (+34) 91 224 69 76
Unit of the Instituto Nacional de Bioinformática
Structural Biology and Biocomputing
Centro Nacional de Investigaciones Oncológicas
Zip Code: 28029
C/. Melchor Fernández Almagro, 3 Madrid (Spain)

**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies.

Re: Mark Logic

by Martin Mueller

I'd like to echo the question about eXist's scalability. I've been
trying to persuade programmers at my university to look at it, but
they tend to be skeptical, and they often seem to remember earlier
versions. "How big a job can eXist handle?" seems to me a question that
requires a three-dimensional answer:

1. The total number of documents or files the system can handle before
it is likely to run into performance problems
2. The maximum size of documents
3. The nesting of documents

I assume, without technical knowledge, that these questions interact.
In the world in which I work, documents can nest pretty deeply. Things
like body/div/div/epigraph/lg/l/hi are not unusual.

It would be even better to couple general remarks with some examples.
I know that eXist is being used for the Anglo-Norman dictionary, but I
don't know what kind of scale issues it faces. The question that
started this thread comes from Hugh Cayless, and I suspect, without
particular knowledge, that the question has its practical origin in
coping with the State Records of North Carolina, a very substantial
text archive, and probably of considerable complexity. I've seen some
very striking uses of eXist at the University of Victoria, but on the
face of it those projects didn't have the big-scale problems you get
when you have thousands of documents, some of which may run to
thousands of pages.

On Sep 14, 2008, at 10:36 AM, Robert Koberg wrote:

>
> On Sep 14, 2008, at 10:20 AM, Dannes Wessels wrote:
>
>> Hi,
>>
>> On Sep 14, 2008, at 15:46 , Andrew Trese wrote:
>>
>>> Somewhere in the thread, somebody said something like: ML can scale
>>> up to terabytes, where eXist cannot. Can anyone comment on the
>>> upper limit of storage in eXist? This number can probably be
>>> stretched by proper index and collection management, but can anyone
>>> comment on the upper levels they have hit, where XQuery and other
>>> performance metrics begin to degrade, etc.?
>>
>> The topic has come up on the mailing list a few times, e.g. http://markmail.org/message/6kp3mxg7ktwrio7g
>
>
> So this is a sufficient examination?
>
> And Wolfgang's:
>
> "However, speaking of limitations, there are some well-known issues
> which I can point out (again)."
>
> Perhaps having to explain it again indicates a lack of documentation?
> Or at least a lack of easily findable information. Searches that
> include the word 'exist' tend toward irrelevance.
>
> As someone who is trying to make a case for eXist, I find these lacking.

Re: Mark Logic

by Wolfgang Meier

> 3. The nesting of documents
>
> I assume, without technical knowledge, that these questions interact.
> In the world in which I work, documents can nest pretty deeply. Things
> like body/div/div/epigraph/lg/l/hi are not unusual.

The nesting depth is not that relevant, at least for query speed.
Unlike most in-memory implementations, eXist does not need to
traverse the document node tree. Instead, it uses the structural index
to directly select nodes, wherever they are in the tree. For example,
/body//div should be as fast as /body//hi, even though <hi> is much
deeper in the tree.

I often see people trying to *optimize* their query by avoiding the
descendant axis. For example, instead of a quick /TEI//div, they specify
the full path: /TEI/text/body/div. Don't do this unless it is really
relevant! a//d requires just 2 index lookups, but a/b/c/d requires 4!
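
The idea of selecting nodes through a structural index instead of walking the tree, and why a//d needs fewer lookups than a/b/c/d, can be sketched with hierarchical (Dewey-style) node IDs, where a node's ID is its parent's ID plus its position. This is a simplified Python model under that assumption, loosely in the spirit of eXist's node numbering but not its actual index code; the tiny sample document is made up. Each location step costs one lookup in a name index followed by a parent/child or ancestor/descendant join on the IDs.

```python
from collections import defaultdict

# A node ID is a tuple: the parent's ID plus the node's 1-based position,
# e.g. (1, 2, 1) is the first child of the second child of the root element.
# The document node itself gets the empty ID ().

# Tiny made-up document as (element name, [children]) pairs.
SAMPLE = ("TEI", [
    ("teiHeader", []),
    ("text", [
        ("body", [
            ("div", [
                ("p", [("hi", [])]),
                ("div", [("p", [])]),
            ]),
        ]),
    ]),
])


def build_name_index(root):
    """Walk the tree once, at load time, mapping element names to node IDs."""
    index = defaultdict(list)

    def walk(node, node_id):
        name, children = node
        index[name].append(node_id)
        for pos, child in enumerate(children, start=1):
            walk(child, node_id + (pos,))

    walk(root, (1,))
    return index


def is_child(parent, node):
    return len(node) == len(parent) + 1 and node[:len(parent)] == parent


def is_descendant(ancestor, node):
    return len(node) > len(ancestor) and node[:len(ancestor)] == ancestor


def evaluate(index, steps):
    """Evaluate a path given as [(axis, name), ...] with axis "/" or "//".

    Each step is one name-index lookup plus a structural join against the
    current context; no tree traversal happens at query time.
    Returns (matching node IDs, number of index lookups).
    """
    context = [()]  # start at the document node
    lookups = 0
    for axis, name in steps:
        candidates = index.get(name, [])
        lookups += 1
        test = is_child if axis == "/" else is_descendant
        # Naive nested-loop join for clarity; a real engine would merge sorted ID lists.
        context = [c for c in candidates if any(test(ctx, c) for ctx in context)]
    return context, lookups


if __name__ == "__main__":
    index = build_name_index(SAMPLE)
    print(evaluate(index, [("/", "TEI"), ("//", "div")]))
    print(evaluate(index, [("/", "TEI"), ("/", "text"), ("/", "body"), ("/", "div")]))
```

In this model /TEI//div costs two lookups and the spelled-out path costs four, and //body//hi costs the same two lookups as //body//div regardless of depth. Note that the two div paths are not equivalent in general: /TEI//div also matches the nested div, so the full path is only worth it when you really need its exact semantics.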

For query speed, it is important to look at the raw number of nodes that
have to be processed in a given expression. If you execute a query like
//div[p &= "xxx"], we can expect the query time to increase with a
growing number of divs and p's in the database. Assuming that the string
'xxx' is rather rare in the db, most of the time will be spent on
computing the parent-child join div/p. The earlier you can limit the
number of nodes in the context set, the better.

And this is exactly where the new query optimizations in 1.2.x come to
the rescue: with a proper index defined on p, eXist can limit the range
of nodes in advance, before it starts evaluating the rest of the
expression. So instead of computing a join between a few million div and
p nodes, we only need to do that for a dozen nodes (those which actually
contain 'xxx' in p).
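
The effect of restricting the context set early can be sketched by counting how many p nodes enter the div/p join for a query like //div[p &= "xxx"]. The Python below is a simplified model, assuming synthetic data (10,000 divs with five paragraphs each, twelve of which contain 'xxx'), a plain inverted index standing in for the full-text index, and a simple token match standing in for &=; it is not eXist's query optimizer.

```python
from collections import defaultdict


def make_sample(num_divs=10_000, paras_per_div=5, every=4000):
    """Synthetic data: div (1, d) has paragraph children (1, d, k).

    Paragraph number i (0-based, in document order) contains the rare
    term 'xxx' when i % every == every - 1, i.e. one paragraph in `every`.
    """
    divs = []
    paragraphs = {}
    i = 0
    for d in range(1, num_divs + 1):
        divs.append((1, d))
        for k in range(1, paras_per_div + 1):
            rare = i % every == every - 1
            paragraphs[(1, d, k)] = "lorem ipsum xxx" if rare else "lorem ipsum"
            i += 1
    return divs, paragraphs


def build_fulltext_index(paragraphs):
    """Plain inverted index: word -> list of paragraph IDs containing it."""
    index = defaultdict(list)
    for pid, text in paragraphs.items():
        for word in set(text.split()):
            index[word].append(pid)
    return index


def join_then_filter(divs, paragraphs, term):
    """Naive plan: every p node goes through the div/p join, then the text test."""
    div_ids = set(divs)
    joined = 0
    result = set()
    for pid, text in paragraphs.items():
        joined += 1
        parent = pid[:-1]
        if parent in div_ids and term in text.split():
            result.add(parent)
    return result, joined


def filter_then_join(divs, fulltext_index, term):
    """Index-first plan: only p nodes that contain the term reach the join."""
    div_ids = set(divs)
    joined = 0
    result = set()
    for pid in fulltext_index.get(term, []):
        joined += 1
        parent = pid[:-1]
        if parent in div_ids:
            result.add(parent)
    return result, joined


if __name__ == "__main__":
    divs, paragraphs = make_sample()
    index = build_fulltext_index(paragraphs)
    naive, naive_joined = join_then_filter(divs, paragraphs, "xxx")
    fast, fast_joined = filter_then_join(divs, index, "xxx")
    print(len(naive), "divs match; p nodes joined:", naive_joined, "vs", fast_joined)
```

Both plans return the same twelve divs, but the naive one pushes all 50,000 p nodes through the join while the index-first one handles only the twelve matching paragraphs.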

Wolfgang
