Mark Logic

At the risk of starting a flame war...

We're having a visit from a Mark Logic salesperson, and I asked how ML compares to eXist (which we're already using). The response I got wasn't very enlightening, other than that it has 100% support for XQuery and that several of their customers started with eXist and then switched when they moved to production. (I don't want to call this FUD, because it's probably true, and I don't have any context about when these decisions might have been made.) We've been running eXist in production for a year, so I don't really buy it. I'm sure there are differences in scalability, etc., and I was wondering if anyone on the list has experience with ML and is able to talk about how the two line up.

Thanks,
Hugh

/**
 * Hugh A. Cayless, Ph.D
 * Head, Research & Development Group
 * Carolina Digital Library and Archives
 * UNC Chapel Hill
 * hcayless@...
 */

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open
|
|
Re: Mark Logic

On Sep 12, 2008, at 3:07 PM, Hugh Cayless wrote:

> At the risk of starting a flame war...
>
> We're having a visit from a Mark Logic salesperson and I asked about
> how ML compares to eXist (which we're already using). The response I
> got wasn't very enlightening other than that it has 100% support for
> XQuery and that several of their customers started with eXist and then
> switched when they moved to production (I don't want to call this FUD,
> because it's probably true, and I don't have any context re when these
> decisions might have been made). We've been running eXist in
> production for a year, so I don't really buy it. I'm sure there are
> differences in scalability, etc. and I was wondering if anyone on the
> list has experience with ML and is able to talk about how the two line
> up.

When ML was pitched in a meeting I attended, the recommended hardware was five 64-bit machines maxed out with RAM. I wondered how eXist would run on hardware like that. (I also thought, sheesh, why not just use the filesystem and load it in memory if you have that kind of RAM and horsepower.)

-Rob
|
|
Re: Mark Logic

Hi,

On Sep 12, 2008, at 21:07, Hugh Cayless wrote:

> The response I got wasn't very enlightening other than that it has
> 100% support for XQuery....

I remember MarkLogic kept using an old 2003 version of the XQuery spec for a long, long time. Although I can't imagine they don't support the final 1.0 spec right now, I can't recall having seen an announcement about it, and consulting my friend Google turns up some pages where they state that they try to be compatible with the 2003 version. Does anyone have a clue?

> We've been running eXist in production for a year, so I don't
> really buy it. I'm sure there are differences in scalability, etc.
> and I was wondering if anyone on the list has experience with ML
> and is able to talk about how the two line up.

Sometimes eXist is compared to MarkLogic as MySQL is to, e.g., Oracle; basically two different worlds :-)

MarkLogic does a very good job regarding scalability. MarkLogic kinda runs fine on a PC but can also be run in a distributed/clustered environment, probably dealing with terabytes of data (from public sources: Elsevier and O'Reilly use ML for their business).

Regarding the terabytes, eXist does not scale up to that... On the other hand, the licensing model of MarkLogic is also quite scalable: for enterprise-level scale, you pay a significant price for a license...

regards

Dannes

--
eXist-db Open Source Native XML Database
http://exist-db.org
|
|
Re: Mark Logic

On Sep 13, 2008, at 18:04, Dannes Wessels wrote:

> The response I got wasn't very enlightening other than that it has
> 100% support for XQuery....

Found it (ML Server 3.2):

8.2 Compatibility with XQuery Drafts

This release implements the XQuery language, functions and operators specified in the May 02, 2003 W3C XQuery Working Group Draft Recommendations:

Additionally, much of the added functionality in the January 2007 W3C XQuery Recommendation is implemented in MarkLogic Server 3.2.
|
|
Re: Mark Logic

On Sep 13, 2008, at 12:04 PM, Dannes Wessels wrote:

> Sometimes eXist is compared to MarkLogic as MySQL to e.g. Oracle ;
> basically two different worlds :-)

That was the analogy I was wondering whether it was safe to make :-). MySQL isn't nearly as Enterprisey as Oracle, but you can do a lot with it, including using it in "Enterprise" solutions, albeit with different architectures than you would use with Oracle. We use MySQL too (even though there's a University site license for Oracle).

> MarkLogic does a very good job regarding scalability. MarkLogic
> kinda runs fine on a PC but can also be executed in a distributed/
> clustered environment probably dealing with Terabytes of data (from
> public resources: Elsevier and O'Reilly use ML for their business).
>
> Regarding the terabytes, eXist does not scale up-to that...... On
> the other hand, the licensing model of MarkLogic is also quite
> scalable. For enterprise level scale, you pay a significant price
> for a license.........

Yeah, that's the other thing... What are the reasonable limits of eXist right now? I'm wondering where the boundary is at which it would be economically sensible to spend the money on the software rather than on development.

Thanks for all the helpful responses.

Hugh
|
|
Re: Mark Logic

On Sep 13, 2008, at 3:28 PM, Hugh Cayless wrote:

> What are the reasonable limits of eXist right now?

That would be good to know. What are people out in the wild dealing with? Have you hit limits?

Another thing to think about: do you need everything in one eXist instance or cluster? Are there logical boundaries where the content can live independently of other content?

best,
-Rob
|
|
Re: Mark Logic

> What are the reasonable limits of eXist right now?

This really depends a lot on the structure of your data, the type of queries and your update requirements. It is difficult to generalize. However, speaking of limitations, there are some well-known issues which I can point out (again). They are the next steps on my roadmap for this year:

* removal/updates of documents: the time to remove a document (or some nodes of it) depends directly on the collection size. This happens because eXist organizes all indexes by collection. The more docs you store in one collection (not including sub-collections though), the longer it takes to update a document. If you have more than a few thousand docs in one collection, updating them can become a serious bottleneck (the usual workaround is to manually split the document space into sub-collections). The solution would be to completely separate collection and document storage in eXist. Collections should be viewed as logical, not physical units. Indexes should no longer be organized by collection (internally I would prefer to organize them into fixed-size partitions).

* the separation of collections and documents would also help to reduce memory consumption at query time, in particular the amount of memory needed for document metadata when working with huge document sets (millions of docs). Loading the document metadata can take a major part of the overall query time.

* finally, we have already pushed the limits for some types of queries: those that can make selective use of indexes. For those queries, the new query-rewriting optimizer can improve query times considerably, and this is a huge step forward. However, there are other areas which need different optimizations, e.g. when a huge node set needs to be iterated node by node.

As a basis for future developments, I have already started to implement a dedicated statistics module, which collects information on the distribution of nodes in the db. The idea behind this is to better estimate the number of nodes which can be expected as input to a certain expression and thus apply the best optimization method.

Wolfgang
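The workaround Wolfgang mentions, manually splitting the document space into sub-collections, could be sketched roughly as follows. This is only an illustration in Python of how a client-side loader might choose a target path before storing a document; the names `bucket_for`, `subcollection_path` and `buckets_needed`, the `part-NNNN` naming scheme and the `/db/records` root are all hypothetical and not part of eXist. Hashing keeps the average collection size near the target but does not guarantee an exact upper bound per sub-collection.

```python
import hashlib


def bucket_for(doc_name: str, num_buckets: int) -> int:
    """Map a document name to a stable bucket number in [0, num_buckets)."""
    digest = hashlib.sha1(doc_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def buckets_needed(total_docs: int, max_docs_per_collection: int) -> int:
    """Number of sub-collections needed so the average size stays under the cap."""
    return -(-total_docs // max_docs_per_collection)  # ceiling division


def subcollection_path(root: str, doc_name: str, num_buckets: int) -> str:
    """Build a hypothetical target path such as /db/records/part-0042/vol01.xml."""
    bucket = bucket_for(doc_name, num_buckets)
    return f"{root.rstrip('/')}/part-{bucket:04d}/{doc_name}"


# Example: one million documents, aiming for roughly 2000 per sub-collection.
num_buckets = buckets_needed(1_000_000, 2000)
example_path = subcollection_path("/db/records", "vol01.xml", num_buckets)
```

Because the bucket is derived from the document name alone, the same document always maps to the same sub-collection, so later updates can locate it without a directory lookup. Splitting along a natural boundary (year, volume, source) would serve the same purpose where one is available.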
|
|
|
|
|
Re: Mark Logic

Hi,

On Sep 14, 2008, at 15:46, Andrew Trese wrote:

> somewhere in the thread, somebody said something like, ML can scale
> up to terabytes, where eXist can not. can anyone comment on the
> upper limit of storage in eXist? this number can probably be
> stretched by proper index and collection management, but can anyone
> comment on upper levels they have hit, where XQuery and other
> performance metrics begin to degrade, etc?

The topic has come up on the mailing list a few times, e.g. http://markmail.org/message/6kp3mxg7ktwrio7g

regards

Dannes

eXist-db Open Source Native XML Database
http://exist-db.org
|
|
Re: Mark Logic

On Sep 14, 2008, at 10:20 AM, Dannes Wessels wrote:

> Hi,
>
> On Sep 14, 2008, at 15:46 , Andrew Trese wrote:
>
>> somewhere in the thread, somebody said something like, ML can scale
>> up to terabytes, where eXist can not. can anyone comment on the
>> upper limit of storage in eXist? this number can probably be
>> stretched by proper index and collection management, but can anyone
>> comment on upper levels they have hit, where XQuery and other
>> performance metrics begin to degrade, etc?
>
> the topic has been on the ML a few times, e.g. http://markmail.org/message/6kp3mxg7ktwrio7g

So this is a sufficient examination?

And Wolfgang's:

"However, speaking of limitations, there are some well-known issues which I can point out (again). "

Perhaps having to explain it again indicates a lack of documentation? Or at least a lack of easily findable information. Searches that include the word 'exist' tend toward irrelevance.

As someone who is trying to make a case for eXist, I find these lacking.
|
|
Re: Mark Logic

I can, but I am on vacation ... give me a shout in a week (jim.fuller@...).

cheers, Jim Fuller

On Sun, Sep 14, 2008 at 3:46 PM, Andrew Trese <atrese@...> wrote:

> Wolfgang, Thanks for the great synopsis of known technical gates that affect
> large scales and are being targeted.
>
> Team,
> it would still be great if the people "out in the wild" could comment on how
> large their big implementations are and if they have hit walls somewhere.
>
> somewhere in the thread, somebody said something like, ML can scale up to
> terabytes, where eXist can not. can anyone comment on the upper limit of
> storage in eXist? this number can probably be stretched by proper index and
> collection management, but can anyone comment on upper levels they have hit,
> where XQuery and other performance metrics begin to degrade, etc?
>
> thanks,
> andrew
|
|
Re: Mark Logic

Hi Andrew,

some months ago I ran some scalability experiments with eXist; the biggest instance used about 1.5GB of memory and about 170GB of storage, including the indexes needed for queries. Of course, I had to apply the tweaks pointed out by Wolfgang so that the data load finished in a reasonable time (about 3 days).

Talking about bottlenecks, there is at least one related to eXist native full-text indexes growing bigger than the cache size (leading to a thrashing situation), which is avoided by limiting the complexity of the indexed content (for instance, by splitting the collection). I still have to test the new Lucene index in order to see which one behaves better on queries and data loads.

Best Regards,
José María

--
"There is no reason why anybody would want a computer in their home" - Ken Olson, founder of DEC 1977
"640K ought to be enough for anybody" - Bill Gates, 1981
"Nobody will ever outgrow a 20Mb hard drive." - ???
"Premature optimization is the root of all evil." - Donald Knuth

José María Fernández González
Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 3061)
e-mail: jmfernandez@...
Fax: (+34) 91 224 69 76
Unidad del Instituto Nacional de Bioinformática
Structural Biology and Biocomputing
Centro Nacional de Investigaciones Oncológicas
C/. Melchor Fernández Almagro, 3
28029 Madrid (Spain)
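The thrashing José María describes, an index larger than the cache, can be illustrated with a toy simulation. This is a sketch under loud assumptions: it models a plain LRU page cache and a cyclic scan over index pages, and it says nothing about the replacement policy or page layout eXist actually uses. The function names `lru_hit_rate` and `cyclic_scan` are made up for the example.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, cache_size):
    """Replay a sequence of page ids against an LRU cache; return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(accesses)


def cyclic_scan(num_pages, passes):
    """Access pages 0..num_pages-1 in order, repeated `passes` times."""
    return [p for _ in range(passes) for p in range(num_pages)]


# Index fits in the cache: only the first pass misses.
fits = lru_hit_rate(cyclic_scan(100, 10), cache_size=100)

# Index is one page larger than the cache: every access misses.
thrashes = lru_hit_rate(cyclic_scan(101, 10), cache_size=100)

# Splitting the index in two and scanning one half at a time fits again.
split = lru_hit_rate(cyclic_scan(51, 10), cache_size=100)
```

In this model a working set just one page larger than the cache drops the hit rate from 90% to zero, which is why keeping each index partition under the cache size (for example by splitting the collection) can matter far more than the raw amount of data.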
|
|
Re: Mark Logic

I'd like to echo the question about eXist's scalability. I've been trying to persuade programmers in my university to look at it, but they tend to be skeptical, and they often seem to remember earlier versions.

"How big a job can eXist handle?" seems to me a question that requires a three-dimensional answer:

1. The total number of documents or files the system can handle before it is likely to run into performance problems
2. The maximum size of documents
3. The nesting of documents

I assume, without technical knowledge, that these questions interact. In the world in which I work documents can nest pretty deeply. Things like body/div/div/epigraph/lg/l/hi are not unusual.

It would be even better to couple general remarks with some examples. I know that eXist is being used for the Anglo-Norman Dictionary, but I don't know what kind of scale issues it faces. The question that started this thread comes from Hugh Cayless, and I suspect, without particular knowledge, that the question has its practical origin in coping with the State Records of North Carolina, a very substantial text archive, and probably of considerable complexity. I've seen some very striking uses of eXist at the University of Victoria, but those projects on the face of it didn't have big scale problems, where you have thousands of documents, some of which may run into thousands of pages.
|
|
Re: Mark Logic

> 3. The nesting of documents
>
> I assume, without technical knowledge, that these questions interact.
> In the world in which I work documents can nest pretty deeply. Things
> like body/div/div/epigraph/lg/l/hi are not unusual.

The nesting depth is not that relevant, at least for query speed. Contrary to most in-memory implementations, eXist does not need to traverse the document node tree. Instead, it uses the structural index to directly select nodes, wherever they are in the tree. For example, /body//div should be as fast as /body//hi, though <hi> is much deeper in the tree.

I often see people trying to *optimize* their query by avoiding the descendant axis. For example, instead of a quick /TEI//div, they specify the full path: /TEI/text/body/div. Don't do this unless it is really relevant! a//d requires just 2 index lookups, but a/b/c/d requires 4!

For query speed, it is important to look at the raw number of nodes that have to be processed in a given expression. If you execute a query like //div[p &= "xxx"], we can expect the query time to increase with a growing number of divs and p's in the database. Assuming that the string 'xxx' is rather rare in the db, most of the time will be spent on computing the parent-child join div/p.

The earlier you can limit the number of nodes in the context set, the better. And this is exactly where the new query optimizations in 1.2.x come to the rescue: with a proper index defined on p, eXist can limit the range of nodes in advance, before it starts evaluating the rest of the expression. So instead of computing a join between a few million div and p nodes, we only need to do that for a dozen nodes (those which actually contain 'xxx' in p).

Wolfgang
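Wolfgang's point that a//d costs 2 index lookups while a/b/c/d costs 4 can be made concrete with a toy model. The sketch below is an assumption-heavy illustration in Python, not eXist's implementation: it uses a simple (start, end, level) interval encoding for nodes, which is swapped in here for readability and is not eXist's real node-id scheme, and the class `ToyStructuralIndex` and the helpers `descendants` and `children` are hypothetical names. Each path step does one lookup by element name and then joins the result against the previous step, without ever walking the tree.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class ToyStructuralIndex:
    """Maps element names to (start, end, level) intervals; counts lookups."""

    def __init__(self, xml_text):
        self.by_name = defaultdict(list)
        self.lookups = 0
        self._counter = 0
        self._walk(ET.fromstring(xml_text), 0)  # tree is walked once, at index time

    def _walk(self, elem, level):
        start = self._counter
        self._counter += 1
        for child in elem:
            self._walk(child, level + 1)
        end = self._counter
        self._counter += 1
        self.by_name[elem.tag].append((start, end, level))

    def lookup(self, name):
        self.lookups += 1
        return self.by_name.get(name, [])


def descendants(index, ancestors, name):
    """ancestor//name: one lookup, then an interval containment join."""
    return [n for n in index.lookup(name)
            if any(a[0] < n[0] and n[1] < a[1] for a in ancestors)]


def children(index, parents, name):
    """parent/name: one lookup, containment plus a level check."""
    return [n for n in index.lookup(name)
            if any(a[0] < n[0] and n[1] < a[1] and n[2] == a[2] + 1
                   for a in parents)]


idx = ToyStructuralIndex("<a><b><c><d/></c></b></a>")

# a//d: look up a, look up d, join once.
via_descendant = descendants(idx, idx.lookup("a"), "d")
lookups_descendant = idx.lookups  # 2

# a/b/c/d: one lookup per step.
idx.lookups = 0
step = idx.lookup("a")
for name in ("b", "c", "d"):
    step = children(idx, step, name)
lookups_full_path = idx.lookups  # 4
```

Both expressions select the same node; the full path simply pays for two extra lookups and joins. In a model like this the cost is driven by how many candidate nodes each lookup returns, not by how deep they sit, which matches the advice above to narrow the context set as early as possible.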