Slow parsing of rtf files even if they are small.


by Soeren Laursen

Hi,

I was looking at iTextSharp for generating PDF files and found out that it can also be used for parsing and generating RTF files.

So I have been using iTextSharp to merge RTF files. The RTF files are very small and use almost no advanced features, just text and color. Even so, when I merge RTF data using MemoryStream, with the data retrieved from a database, it takes a very long time and the memory footprint goes up to about 400 to 450 MB.

Is anybody else doing the same kind of batch merging?

I can see that if I make one of the files empty the speed goes up, so I guess it must be the parsing?

If I know that I will be parsing, for example, 4 files, is there anything to be gained by importing them in one go instead of merging them in single steps as I retrieve them from the database?

Doing:

rtfWriter.importRtfDocument(memFile1);
rtfWriter.importRtfDocument(memFile2);
rtfWriter.importRtfDocument(memFile3);
rtfWriter.importRtfDocument(memFile4);

instead of (because of the database layout):

// Pseudocode: newRtfFile and newRtfFile2 stand for the merged RTF produced so far.
rtfWriter.importRtfDocument(memFile1);
newRtfFile = rtfWriter.importRtfDocument(memFile2);

// Find next document
rtfWriter.importRtfDocument(newRtfFile);
newRtfFile2 = rtfWriter.importRtfDocument(memFile3);

// Find next document
rtfWriter.importRtfDocument(newRtfFile2);
newRtfFile = rtfWriter.importRtfDocument(memFile4);

Regards,

Søren
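
A rough way to compare the two import sequences in the message above is to count how many characters the parser has to read in each case. The Java sketch below is only a cost model and does not call iText. It assumes that parse cost is proportional to input length and that a merged document is about as long as the sum of its parts; both are simplifications. The class `ParseCost` and its methods are made up for illustration.

```java
// Hypothetical cost model for illustration; it does not call iText.
class ParseCost {
    // One pass: every source document is parsed exactly once.
    static long onePass(int[] sizes) {
        long total = 0;
        for (int size : sizes) {
            total += size;
        }
        return total;
    }

    // Stepwise: the first two documents are merged directly; in every later
    // step the previously merged result is parsed again along with one
    // new document.
    static long stepwise(int[] sizes) {
        if (sizes.length == 0) {
            return 0;
        }
        long merged = sizes[0];   // assumed length of the merged result so far
        long total = sizes[0];    // characters parsed so far
        for (int i = 1; i < sizes.length; i++) {
            if (i > 1) {
                total += merged;  // re-parse the stored merge result
            }
            total += sizes[i];    // parse the new document
            merged += sizes[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] sizes = {618, 618, 618, 618};
        System.out.println("one pass: " + onePass(sizes));   // 2472
        System.out.println("stepwise: " + stepwise(sizes));  // 5562
    }
}
```

Under these assumptions the stepwise sequence parses 2.25 times as many characters for four equally sized documents, and the gap widens as more documents are added. The model ignores header processing, so it says nothing about where the real time goes.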

 

 

 


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
iText-questions mailing list
iText-questions@...
https://lists.sourceforge.net/lists/listinfo/itext-questions

Do you like iText?
Buy the iText book: http://www.1t3xt.com/docs/book.php
Or leave a tip: https://tipit.to/itexttipjar

Re: Slow parsing of rtf files even if they are small.

by Howard Shank

Hi Søren,

I have not tested the C# port personally, but the Java version as written does not exhibit the massive memory use you describe. Which version are you using?

As for being slow, are we talking seconds, minutes, or hours?

I can tell you this: as you import additional documents, the time required to process the header and map the data structures so they match the new document will increase.

When you import full documents, additional color, font, and other table entries have to be constructed in the header, along with mappings for those entries. The imported document then goes through a conversion each time the parser encounters one of the mapped values.

The parser is also new, and I am looking for ways to improve it. I'm sure there are lots of them at this point.

Howard
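
The header mapping described in the message above can be illustrated with a small model. The Java sketch below is not iText's parser code; the class `ColorTableMerge`, its methods, and the color strings are hypothetical. It only shows the general idea: each imported color table is folded into the merged table, and the import yields an index mapping that every color reference in the imported body would then have to go through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of merging color tables; not iText's actual code.
class ColorTableMerge {
    // Colors of the merged document, in header order.
    private final List<String> mergedColors = new ArrayList<>();
    // Reverse lookup: color value -> index in mergedColors.
    private final Map<String, Integer> indexOf = new HashMap<>();

    // Folds an imported color table into the merged table and returns a
    // mapping from the imported document's indices to merged indices.
    int[] importTable(List<String> importedColors) {
        int[] mapping = new int[importedColors.size()];
        for (int i = 0; i < importedColors.size(); i++) {
            String color = importedColors.get(i);
            Integer index = indexOf.get(color);
            if (index == null) {
                // New color: append it to the merged header table.
                index = mergedColors.size();
                mergedColors.add(color);
                indexOf.put(color, index);
            }
            mapping[i] = index;
        }
        return mapping;
    }

    int size() {
        return mergedColors.size();
    }

    public static void main(String[] args) {
        ColorTableMerge merged = new ColorTableMerge();
        int[] first = merged.importTable(Arrays.asList("0,0,0", "255,0,0"));
        int[] second = merged.importTable(Arrays.asList("255,0,0", "0,0,255"));
        // Every reference to color i in the second document must be
        // rewritten to second[i] in the merged document.
        System.out.println(Arrays.toString(first));   // [0, 1]
        System.out.println(Arrays.toString(second));  // [1, 2]
        System.out.println(merged.size());            // 3
    }
}
```

A font table would need the same treatment, so the per-document bookkeeping grows with every import even when the documents themselves are small.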


----- Original Message ----
From: Søren Laursen <sl@...>
To: itext-questions@...
Sent: Wednesday, May 14, 2008 7:42:08 AM
Subject: [iText-questions] Slow parsing of rtf files even if they are small.


Re: Slow parsing of rtf files even if they are small.

by Soeren Laursen

Merging 4-5 files with an average size of 618 characters takes up to 5 seconds. I should mention that it is an iterative merge, because I have to check whether the merged size is larger than 3K (a database restriction) before I add a new document.

I tried a new approach where I set up RtfWriter2 in one method and, in another, add an RTF document and get the current RTF document. This fails because of the streams. I guess that if I could keep adding to an existing RTF document, I would save some time by not having to parse the merged RTF from the previous run.

The memory use just keeps increasing. The first version of the program used RichTextBox and copied and pasted the RTF text; that program usually finished in about 20 minutes and used 32 MB. Now we are talking about 12-14 hours! I had to drop the RichTextBox approach because it sometimes fails and returns an empty RTF file. iText solves this problem, but the runtime and memory use are a big concern.

 

Søren
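
One possible way to avoid re-parsing the merged result just to check its size is to plan the batches from the source lengths first and merge each batch only once. The Java sketch below shows that idea under loud assumptions: the merged size is estimated as a fixed header overhead plus the sum of the fragment lengths, and both the overhead (300) and the limit (3000, standing in for the 3K restriction) are guesses rather than measured values. `RtfBatcher` and `plan` are hypothetical names, and nothing here calls iText.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batching helper; the size estimate is an assumption.
class RtfBatcher {
    // Groups fragment indices into batches whose estimated merged size stays
    // within limit. headerOverhead is a guessed constant added per batch.
    // A single fragment that is too large on its own still gets a batch.
    static List<List<Integer>> plan(int[] lengths, int headerOverhead, int limit) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int estimate = headerOverhead;
        for (int i = 0; i < lengths.length; i++) {
            if (!current.isEmpty() && estimate + lengths[i] > limit) {
                // Adding this fragment would exceed the limit: close the batch.
                batches.add(current);
                current = new ArrayList<>();
                estimate = headerOverhead;
            }
            current.add(i);
            estimate += lengths[i];
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        int[] lengths = {618, 618, 618, 618, 618, 618};
        // Prints [[0, 1, 2, 3], [4, 5]]
        System.out.println(plan(lengths, 300, 3000));
    }
}
```

Because the estimate is only approximate, the real merged output of each batch would still need one final size check before it is written to the database.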

 

 



Re: Slow parsing of rtf files even if they are small.

by Paulo Soares

Post a small standalone program with all the required RTF files. I'll run it
in C# and Java and compare the times. It may or may not be a porting issue.

Paulo
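
For the Java side of such a comparison, a timing harness could look roughly like the sketch below. It uses only the standard library; `MergeTimer` and its methods are hypothetical names, and the workload in `main` is a placeholder that would be replaced by the code importing the RTF files.

```java
// Timing helper; the workload passed in is a placeholder for the real merge.
class MergeTimer {
    // Runs the task the given number of times and returns the average
    // wall-clock time per run in milliseconds.
    static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }

    // Approximate heap currently in use by the JVM, in bytes.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        long before = usedBytes();
        // Placeholder workload: replace with the RTF import code under test.
        double ms = averageMillis(() -> sink.append("x"), 1000);
        long after = usedBytes();
        System.out.println("average ms per run: " + ms);
        System.out.println("heap delta in bytes: " + (after - before));
    }
}
```

The heap delta is only indicative, since garbage collection can run at any point during the measurement.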

----- Original Message -----
From: "Søren Laursen" <sl@...>
To: <itext-questions@...>
Sent: Wednesday, May 14, 2008 11:18 PM
Subject: Re: [iText-questions] Slow parsing of rtf files even if they are small.

