Having problem indexing/uploading large xml document (450MB)

Hi,

I'm working with the DBLP XML dataset, which is about 450MB. I went to the admin section of eXist, then to Browse Collection, and tried to upload the document. It takes a few seconds, and then a blank page shows up with just the menu on the left side. The document was not uploaded, and of course I can't query it.

What's wrong?

Another question I have is not really about eXist, but I think someone here can answer it. I'm parsing this DBLP XML dataset (450MB) with the Xerces SAX parser (Java). The problem is that I'm getting the 64000 entity expansion limit error. With the limit raised to 100000 I still get the error, but when I raise it again to 150000 or more, the parser takes hours and doesn't finish. I also tried setting FEATURE_SECURE_PROCESSING to false, so that the parser stops worrying about the entity limit, but it still takes hours to index.

Does anyone know this problem?

Thanks
Felipe Hummel

-------------------------------------------------------------------------
Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
Studies have shown that voting for your favorite open source project,
along with a healthy diet, reduces your potential for chronic lameness
and boredom. Vote Now at http://www.sourceforge.net/community/cca08
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open
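On the Xerces question, here is a minimal sketch of lifting the entity expansion limit with the JAXP parser bundled in the JDK. It assumes a recent JDK, where the limit is read from the `jdk.xml.entityExpansionLimit` system property (older setups typically passed `-DentityExpansionLimit=...` on the command line). The class name, the helper method and the tiny inline document are made up for illustration, and the sketch says nothing about the running time on the full 450MB file.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of eXist or Xerces.
class EntityLimitSketch {

    // Parses the XML string and returns the number of elements seen.
    static int countElements(String xml) {
        // Assumption: recent JDK; a value of 0 (or less) means "no limit".
        System.setProperty("jdk.xml.entityExpansionLimit", "0");
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // The switch mentioned in the post: turn secure processing off.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, false);
            SAXParser parser = factory.newSAXParser();
            final int[] count = {0};
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++; // one call per opening tag
                }
            });
            return count[0];
        } catch (Exception e) {
            // Wrapped so callers do not have to deal with checked exceptions.
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Tiny stand-in for dblp.xml with one entity declared inline.
        String xml = "<!DOCTYPE dblp [<!ENTITY eacute \"&#233;\">]>"
                + "<dblp><article><author>Andr&eacute;</author></article></dblp>";
        System.out.println(countElements(xml));
    }
}
```

For the real dataset the `StringReader` would be replaced by a file-based `InputSource`, with the handler doing the actual indexing work.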
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

Hi,

Felipe Hummel wrote:
> Hi, I'm working with de DBLP xml dataset, which is about 450MB, I went
> to the admin section of eXist, then to the Browse Selection and tried to
> upload the document. It takes a few seconds, and then a blank page shows
> up with just the menu on the left side. The document was not uploaded
> and of course I can't query it.
> What's wrong?

See the logs: they should tell you what's going wrong. Possibly a memory issue.

FYI: eXist's benchmark uses the DBLP without any problem.

Cheers,
p.b.
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

> Hi, I'm working with de DBLP xml dataset, which is about 450MB, I went to
> the admin section of eXist, then to the Browse Selection and tried to upload
> the document. It takes a few seconds, and then a blank page shows up with
> just the menu on the left side. The document was not uploaded and of course
> I can't query it.
> What's wrong?

450MB are a bit too much for uploading via web forms. I guess the web server times out the request or runs out of memory. Please use the Java admin client for jobs like this. For maximum performance, you can launch the client in embedded mode, which saves you the time for uploading the data.

To parse the DBLP fast enough, you should also increase cacheSize in conf.xml, as well as provide a little more memory to Java. Please read the following notes though:

http://atomic.exist-db.org/blogs/eXist/WarningBadMemory

Wolfgang
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

Hi, thanks for the answers.

I tried to upload dblp.xml through the Java admin client (embedded mode), but I got this error:

Impossible to store a resource /home/felipehummel/workspace/dblp.xml: error at (4,74) : Attribute "mdate" must be declared for element type "incollection".

It found something wrong in the dblp.dtd, but if I remove the DTD line from dblp.xml, then I get this error:

Impossible to store a resource /home/felipehummel/workspace/dblp.xml: fatal error at (18,20) : The entity "eacute" was referenced, but not declared.

Any suggestions?

Thank you
Felipe Hummel

On Thu, Jul 3, 2008 at 4:56 AM, Wolfgang Meier <wolfgang@...> wrote:
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

Hi,

On Fri, Jul 4, 2008 at 4:09 AM, Felipe Hummel <felipehummel@...> wrote:
> It found something wrong in the dblp.dtd, but if I remove the DTD line from
> the dblp.xml, then I get this error:
>
>> Impossible to store a resource /home/felipehummel/workspace/dblp.xml:
>> fatal error at (18,20) : The entity "eacute" was referenced, but not
>> declared.

You could register the DTD in the 'catalog.xml'; other files should be resolved relative to this catalog.

regards

Dannes
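The `eacute` error appears because the entity is declared in dblp.dtd, so the parser still has to find the DTD even when no validation is wanted. Inside eXist that is what the catalog.xml entry is for. For a standalone SAX parser, a rough equivalent is an `EntityResolver` that hands the parser a local copy. The sketch below is self-contained and only illustrates the idea: the one-line DTD string stands in for the real dblp.dtd, and the class and method names are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of eXist.
class LocalDtdSketch {

    // Stand-in for the real dblp.dtd, which declares many more entities.
    static final String FAKE_DTD = "<!ENTITY eacute \"&#233;\">";

    // Parses the XML string and returns all character data it contains.
    static String textContent(String xml) {
        try {
            // JAXP parsers are non-validating by default, but they still
            // load the external DTD to pick up entity declarations.
            XMLReader reader = SAXParserFactory.newInstance()
                    .newSAXParser().getXMLReader();

            // Serve the DTD from memory instead of the file system or network.
            reader.setEntityResolver((publicId, systemId) ->
                    systemId != null && systemId.endsWith("dblp.dtd")
                            ? new InputSource(new StringReader(FAKE_DTD))
                            : null); // null = fall back to default resolution

            StringBuilder text = new StringBuilder();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length); // may arrive in chunks
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return text.toString();
        } catch (Exception e) {
            // Wrapped so callers do not have to deal with checked exceptions.
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE dblp SYSTEM \"dblp.dtd\">"
                + "<dblp><author>Andr&eacute;</author></dblp>";
        System.out.println(textContent(xml));
    }
}
```

With the resolver in place the reference to `&eacute;` is expanded instead of raising the "referenced, but not declared" error.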
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

Hi,

On Fri, Jul 4, 2008 at 8:30 AM, Dannes Wessels <dizzzz@...> wrote:
> You could register the DTD in the 'catalog.xml' ; other files should
> be resolved relative to this catalog.

or switch off validation in conf.xml;

regards

Dannes
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

Hi.

After letting the indexing run overnight, I got this error:

"Impossible to store a resource /home/felipehummel/workspace/dblp.xml: 68"

The process finished (100%), but still with this error. The thing is, dblp.xml now shows up in the resources alongside the other XML documents, but I wasn't able to query it through the Sandbox interface. I'm still going to check the logs for the reason for the error.

Thanks for the tips.

Felipe Hummel

On Fri, Jul 4, 2008 at 2:30 AM, Dannes Wessels <dizzzz@...> wrote:
> Hi,
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

I've checked the logs for the time I tried to upload the 450MB dblp.xml database.

In xmldb.log:

2008-07-07 00:34:49,833 [Thread-8] DEBUG (LocalCollection.java [storeResource]:647) - storing document dblp.xml

In exist.log, I found 3 log files with almost the same error, all around the time of the error above (01:26:05). I copied only a few occurrences here (skipping the stack trace), but the messages appear many more times.

2008-07-07 01:26:00,408 [Thread-8] DEBUG (BFile.java [<init>]:2556) - not a data-page: page: 32282; file = words.dbx; address = 7e1b000; page header = 64; data start = 7e1b040

As a reminder from my previous e-mail, the message I got from the admin client after it finished indexing the dblp document was:

"Impossible to store a resource /home/felipehummel/workspace/dblp.xml: 68"

Any help?

Thank you!
Felipe Hummel

On Mon, Jul 7, 2008 at 9:46 AM, Felipe Hummel <felipehummel@...> wrote:
> Hi. After letting the indexing running over the night. I got this error:
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

Hi,

Felipe Hummel wrote:
> At exist.log, I found 3 log files of almost the same error, all about
> the same time of the error above (01:26:05), I copied here (skipping
> stacktrace) only a few times those messages appears, but they do appear
> a lot more .
>
> 2008-07-07 01:26:00,408 [Thread-8] DEBUG (BFile.java [<init>]:2556)
> - not a data-page: page: 32282; file = words.dbx; address = 7e1b000;
> page header = 64; data start = 7e1b040
> java.io.IOException: not a data-page: 0
>
> 2008-07-07 01:26:00,385 [Thread-8] DEBUG (BFile.java [<init>]:2556)
> - not a data-page: page: 32282; file = words.dbx; address = 7e1b000;
> page header = 64; data start = 7e1b040
> java.io.IOException: not a data-page: 0
>
> 2008-07-07 01:26:00,384 [Thread-8] DEBUG (BFile.java [<init>]:2556)
> - not a data-page: page: 32282; file = words.dbx; address = 7e1b000;
> page header = 64; data start = 7e1b040
> java.io.IOException: not a data-page: 0
>
> Any help?

You apparently have a corruption in the fulltext index. Disable fulltext indexing, upload, re-enable fulltext indexing, then reindex?

Cheers,
p.b.
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

> 2008-07-07 00:34:49,833 [Thread-8] DEBUG (LocalCollection.java
> [storeResource]:647) - storing document dblp.xml
> 2008-07-07 01:26:05,011 [Thread-8] ERROR (LocalCollection.java
> [storeXMLResource]:755) - java.lang.ArrayIndexOutOfBoundsException: 68

Are you sure you have enough disk space? I'm using the dblp for some tests and, while it does take some time to store the file, it does work as expected. You can try running

build.sh benchmark

I use cacheSize="128M" in conf.xml and 768m max memory for Java.

Wolfgang
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
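To confirm that a Java memory setting such as the 768m quoted above actually reached the JVM, it can help to print the maximum heap from inside Java. This is a small sketch using only `Runtime.maxMemory()`; the class and method names are made up, and 768 is simply the figure from the post, not an eXist requirement. Depending on the garbage collector, the reported value may come out slightly below the `-Xmx` setting.

```java
// Hypothetical helper, not part of eXist.
class HeapCheckSketch {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // True if the JVM was started with at least the given heap size.
    static boolean heapAtLeast(long megabytes) {
        return maxHeapMegabytes() >= megabytes;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapMegabytes());
        // 768 is the value quoted in the thread; adjust as needed.
        System.out.println("at least 768 MB: " + heapAtLeast(768));
    }
}
```

Running it with the same JVM options as the eXist client shows whether the option was picked up.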
|
|
Re: Having problem indexing/uploading large xml document (450MB)

I do have disk space. But I think the problem may be my swap space, which is only 512MB (my main memory is 1GB).

I just tried to run it again and got almost the same error:

[Thread-8] ERROR (LocalCollection.java [storeXMLResource]:755) - java.lang.ArrayIndexOutOfBoundsException: 68

I'll try to run it on another machine. Can eXist DB be installed entirely from the terminal?

Thanks a lot.

On Thu, Jul 10, 2008 at 6:16 PM, Wolfgang <wolfgang@...> wrote:
> 2008-07-07 00:34:49,833 [Thread-8] DEBUG (LocalCollection.java
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

I just uploaded dblp.xml to eXist on a server machine, through the command line:

bin/client.sh -l -u admin -P 123 -m /db/dblp -p /root/hummel/xlabrador/dblp.xml

with this output:

creating '/db/dblp'

Although it seems to be a successful operation, dblp.xml doesn't show up on the browse collection webpage (nor in the Java admin client interface). I also tried to store another (much smaller) dataset and got the same result.

I'm trying to upload the document so I can query it (I'm testing through the Sandbox). One problem is solved: I could upload dblp.xml (450MB). But it doesn't appear in the admin interface, and I can't query it either.

Anyone?

Thank you guys for the answers.

Felipe Hummel

On Fri, Jul 11, 2008 at 1:46 PM, Felipe Hummel <felipehummel@...> wrote:
> I do have disk space. But I think the problem may be my swap space that is only 512MB (my main memory is 1GB).
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

Sorry for almost double posting, but I just managed to upload the dblp.xml dataset correctly. I executed it like this:

bin/client.sh -u admin -P 123 -C /root/hummel/exist/conf.xml -m /db/dblp -p /root/hummel/xlabrador/dblp.xml

Without the "-l" option.

Well, now the sandbox querying is working, but not properly. I'm getting this type of error:

The error is repeated for each result. I think it is an interface problem, since through the SOAP interface (using PheXist) I could retrieve the results properly.

Thanks
Felipe Hummel
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

> Well, now the sandbox quering is working, but not properly, I'm getting this
> type of error:
>
>> Error found
>>
>> Error checking function parameter 2 in call transform:transform($item,
>> doc(untyped-value-check[xs:string, $sandbox:XML_HIGHLIGHT_STYLE]), ): The
>> actual cardinality for parameter 2 does not match the cardinality declared

For the sandbox to work properly, you should open the admin webapp and go through the "Example Setup" once. This will install the stylesheets required for the sandbox.

Wolfgang
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Re: Having problem indexing/uploading large xml document (450MB)

Thanks a lot. Now everything is working fine. I'm developing an XML keyword search engine on top of eXist for research purposes. Now I'm just working on the socket connection between my system and the PHP interface.

Once again, thanks for the help.

On Mon, Jul 14, 2008 at 5:20 PM, Wolfgang Meier <wolfgangmm@...> wrote: