Can iText meet my idea?

1. Suppose I have an ebook (book_en.pdf) in English.
2. Parse book_en.pdf:
   2.1 Export a paragraph or a phrase, translate it to Chinese, and write it to book_zh.pdf;
   2.2 and so on, and so on...
   2.3 until the end of book_en.pdf.

Generally speaking, book_zh.pdf is the same as book_en.pdf except for the language.

Most important of all, the formatting (chapter, section, subsection and so on) must be identical.

Thanks in advance!

Best regards!
Jin Jiankang
E-mail: jinjiankang1980@...

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
iText-questions mailing list
iText-questions@...
https://lists.sourceforge.net/lists/listinfo/itext-questions

Do you like iText?
Buy the iText book: http://www.1t3xt.com/docs/book.php
Or leave a tip: https://tipit.to/itexttipjar

Re: Can iText meet my idea?

金健康 wrote:
> Can iText meet my idea?
>
> 1. Suppose I have an ebook (book_en.pdf) in English.

Consider this PDF to be like a vector image. (This is an ASSUMPTION based on an educated guess.)

> 2. Parse book_en.pdf:

You can parse the file structure of a PDF and discover PDF objects such as null, boolean, number, string, name, array, dictionary, stream. And you can parse content streams that consist of operators and operands (the Adobe Imaging Model).

> 2.1 Export a paragraph or a phrase, translate it to Chinese, and write it
> to book_zh.pdf;

The concept of a paragraph and a phrase is UNKNOWN in PDF. If your PDF is tagged, then you could have a chance of retrieving the English text; otherwise you need OCR software. Translating English to Chinese is off-topic on this list.

> 2.2 and so on, and so on...

You should learn more about PDF before even thinking about "and so on" options.

> Generally speaking, book_zh.pdf is the same as book_en.pdf except for the language.
>
> Most important of all, the formatting (chapter, section, subsection and so
> on) must be identical.

The best way to do this is to hire a human being to translate the PDF and to create a new PDF based on the translation.

br,
Bruno

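To make Bruno's point about content streams concrete, here is a minimal sketch in plain Java (no PDF library involved). The sample stream, the class name `ContentStreamSketch` and the `Fragment` type are all assumptions invented for illustration; a real content stream is usually compressed and uses font-specific encodings, so this is not how a production extractor works. It only shows that what comes out is positioned strings, not paragraphs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentStreamSketch {
    /** A string shown by Tj, plus the offset passed to the preceding Td operator. */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Hypothetical, uncompressed content stream written by hand for this sketch.
    // Each BT ... ET block starts a fresh text object, so Td acts like a position here.
    static final String SAMPLE =
        "BT /F1 18 Tf 72 720 Td (Chapter 1) Tj ET\n"
        + "BT /F1 11 Tf 72 700 Td (PDF draws text) Tj ET\n"
        + "BT /F1 11 Tf 72 688 Td (at fixed positions.) Tj ET\n";

    // Matches "<x> <y> Td (<text>) Tj". Escaped parentheses, TJ arrays,
    // hex strings and font encodings are deliberately ignored.
    private static final Pattern SHOW_TEXT =
        Pattern.compile("([-\\d.]+)\\s+([-\\d.]+)\\s+Td\\s+\\(([^)]*)\\)\\s+Tj");

    /** Collects every (x, y, text) triple the simplified pattern can find. */
    static List<Fragment> extract(String contentStream) {
        List<Fragment> fragments = new ArrayList<>();
        Matcher m = SHOW_TEXT.matcher(contentStream);
        while (m.find()) {
            fragments.add(new Fragment(
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2)),
                m.group(3)));
        }
        return fragments;
    }

    public static void main(String[] args) {
        for (Fragment f : extract(SAMPLE)) {
            System.out.println(f.x + ", " + f.y + ": " + f.text);
        }
    }
}
```

Nothing in the three fragments says that the first one is a chapter title or that the last two belong to the same sentence; that structure has to be guessed from positions and font sizes.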
Re: Can iText meet my idea?

If you have the legal rights to translate the book, you should just get the original source material from the author and translate it there.

Leonard

On May 15, 2008, at 3:51 AM, Bruno Lowagie wrote:
> The best way to do this is to hire a human being to
> translate the PDF and to create a new PDF based on
> the translation.

Re: Can iText meet my idea?

There are many excellent open source third-party projects (for example iText), and each publishes some documents freely: Tutorial.pdf, DevelopersGuide.pdf, Spec.pdf and so on. Why not develop a tool that would RAPIDLY translate those English *.pdf files into Chinese.pdf, French.pdf, Korea.pdf ... with the help of the Google AJAX Language API? Admittedly, the translation would NOT be accurate.

After contacting the person in charge at each third party, I think legal issues would not be an obstacle, and the translations could help extend the third party's influence.

I'm looking forward to your reply.

Thanks.

Best regards.

Re: Can iText meet my idea?

jinjiankang wrote:
> There are many excellent open source third-party projects (for example iText),
> and each publishes some documents freely: Tutorial.pdf, DevelopersGuide.pdf,
> Spec.pdf and so on. Why not develop a tool that would RAPIDLY translate
> those English *.pdf files into Chinese.pdf, French.pdf, Korea.pdf ... with the
> help of the Google AJAX Language API? Admittedly, the translation would NOT be accurate.
>
> After contacting the person in charge at each third party, I think legal issues
> would not be an obstacle, and the translations could help extend the third party's influence.
>
> I'm looking forward to your reply.

Legal and translation problems set aside, you still have the first obstacle: how to extract the text from the existing PDFs. In my book, I recommend PdfBox, and maybe JPedal can help you get a long way too, but... due to the nature of PDF there are limitations: text in a PDF is 'drawn' on a page. The concept of paragraphs, chapters, sections, tables,... is lost. How are you going to solve that?

br,
Bruno

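One common way to attack the question Bruno ends with is a layout heuristic: sort the drawn text fragments top to bottom and start a new paragraph whenever the vertical gap between two fragments exceeds some threshold. The sketch below is only an illustration of that idea under stated assumptions; `ParagraphHeuristic`, `Fragment` and the `maxLineGap` parameter are hypothetical names, not part of iText, PdfBox or JPedal, and the approach breaks down on multi-column pages, tables and running headers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ParagraphHeuristic {
    /** A piece of drawn text with the position where it was placed on the page. */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Joins fragments into paragraphs. A new paragraph starts whenever the
     * vertical distance to the previous fragment is larger than maxLineGap.
     */
    static List<String> group(List<Fragment> fragments, double maxLineGap) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        // PDF y coordinates grow upward, so sort by descending y, then by x.
        sorted.sort(Comparator.comparingDouble((Fragment f) -> -f.y)
            .thenComparingDouble(f -> f.x));

        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        double previousY = Double.NaN;
        for (Fragment f : sorted) {
            boolean gapTooLarge =
                !Double.isNaN(previousY) && (previousY - f.y) > maxLineGap;
            if (gapTooLarge && current.length() > 0) {
                paragraphs.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(f.text);
            previousY = f.y;
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        List<Fragment> page = new ArrayList<>();
        page.add(new Fragment(72, 720, "Chapter 1"));
        page.add(new Fragment(72, 700, "PDF draws text"));
        page.add(new Fragment(72, 688, "at fixed positions."));
        for (String paragraph : group(page, 14.0)) {
            System.out.println(paragraph);
        }
    }
}
```

The threshold has to be tuned per document, and even a good guess recovers only text blocks. It does not tell you which block is a chapter title, a section heading or a table cell, which is exactly the information the original poster wants to preserve.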
Re: Can iText meet my idea?

Hi Jin Jiankang,

DTP (DeskTop Publishing) suites such as texlive/miktex or framemaker meet your requirements better than iText or Pdfbox. In order to guarantee the identical layout and the identical bells and whistles in the book, you basically need some layout templates from the original English publisher and then stuff the Chinese text into the templates; XSL-FO might be the programmatic way to do so. If the layout is relatively simple, building a template with texlive/miktex or framemaker is not too difficult.

According to Pdfbox's source code, it is awkward when you are using a character set other than ASCII or UTF-16. Our problem with Pdfbox is that it does not write common western European characters such as ä, ö, ü into target pdf files; that's why we switched to iText. I don't think Pdfbox will write Chinese characters correctly into your target pdf file.

Best Regards,
Ellen N. Zhao

--
HaCon Ingenieurgesellschaft mbH
Email: ning.zhao@...
http://www.hacon.de

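The "stuff the translated text into a layout template" idea can be sketched in a few lines of Java. Everything specific here is an assumption made for illustration: the tiny XSL-FO template, the `${title}` and `${body}` placeholder syntax and the class name `FoTemplateSketch` are invented, and a real template would come from the publisher. The result is only an XSL-FO document; an FO processor with a font that covers Chinese would still be needed to render it to PDF.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FoTemplateSketch {
    // Hypothetical layout template. The placeholders ${title} and ${body}
    // are a convention invented for this sketch, not part of XSL-FO.
    static final String TEMPLATE =
        "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">\n"
        + "  <fo:layout-master-set>\n"
        + "    <fo:simple-page-master master-name=\"page\""
        + " page-height=\"29.7cm\" page-width=\"21cm\">\n"
        + "      <fo:region-body/>\n"
        + "    </fo:simple-page-master>\n"
        + "  </fo:layout-master-set>\n"
        + "  <fo:page-sequence master-reference=\"page\">\n"
        + "    <fo:flow flow-name=\"xsl-region-body\">\n"
        + "      <fo:block font-size=\"18pt\">${title}</fo:block>\n"
        + "      <fo:block font-size=\"11pt\">${body}</fo:block>\n"
        + "    </fo:flow>\n"
        + "  </fo:page-sequence>\n"
        + "</fo:root>\n";

    /** Escapes the characters that are not allowed verbatim in XML text. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Replaces every ${key} placeholder with the escaped text for that key. */
    static String fill(String template, Map<String, String> texts) {
        String result = template;
        for (Map.Entry<String, String> e : texts.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", escapeXml(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> zh = new LinkedHashMap<>();
        // Chinese sample strings written as Unicode escapes so the source
        // file compiles regardless of its encoding.
        zh.put("title", "\u7b2c\u4e00\u7ae0");
        zh.put("body", "PDF \u662f\u6700\u7ec8\u4ea7\u54c1\u3002");
        System.out.println(fill(TEMPLATE, zh));
    }
}
```

Because the layout lives in the template and the text lives in the map, the English and Chinese editions share one layout definition, which is the property the original poster asked for.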
Re: Can iText meet my idea?

Ning Zhao wrote:
> Hi Jin Jiankang,
>
> DTP (DeskTop Publishing) suites such as texlive/miktex or framemaker meet
> your requirements better than iText or Pdfbox. In order to guarantee the
> identical layout and the identical bells and whistles in the book,
> you basically need some layout templates from the original English
> publisher

YES! This is very good advice. And while you are at it, ask for the source that was used to build the PDF (that was implicitly advised by Leonard too).

> and stuff the Chinese text into the templates; XSL-FO might be
> the programmatic way to do so.

If the source of the PDF file is XML, choosing the XSL-FO path is a good suggestion.

> If the layout is relatively simple,
> building a template with texlive/miktex or framemaker is not too difficult.

Thanks for the advice, Ellen. I think that the OP's idea is very noble, but he simply doesn't understand that PDF is supposed to be the end product. He really should contact the publisher(s) of those PDFs and ask them for the layout templates and the content sources of the PDFs he wants to translate.

br,
Bruno

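If the publisher does hand over an XML source, the XSL-FO path Bruno endorses could look roughly like the sketch below, which uses only the `javax.xml.transform` API that ships with the JDK. The `<book>`/`<chapter>`/`<title>`/`<para>` vocabulary, the stylesheet and the class name `XmlToFoSketch` are assumptions made up for this example; a real source format and stylesheet would come from the publisher, and a separate FO processor would still be needed to produce the PDF.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlToFoSketch {
    // Hypothetical stylesheet: maps an invented <book> vocabulary to XSL-FO.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
        + "<xsl:template match=\"/book\">"
        + "<fo:root><fo:layout-master-set>"
        + "<fo:simple-page-master master-name=\"page\""
        + " page-height=\"29.7cm\" page-width=\"21cm\">"
        + "<fo:region-body/></fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference=\"page\">"
        + "<fo:flow flow-name=\"xsl-region-body\">"
        + "<xsl:apply-templates select=\"chapter\"/>"
        + "</fo:flow></fo:page-sequence></fo:root>"
        + "</xsl:template>"
        + "<xsl:template match=\"chapter\">"
        + "<fo:block font-size=\"18pt\"><xsl:value-of select=\"title\"/></fo:block>"
        + "<xsl:for-each select=\"para\">"
        + "<fo:block font-size=\"11pt\"><xsl:value-of select=\".\"/></fo:block>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to one book source and returns the XSL-FO text. */
    static String toFo(String bookXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(bookXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String english =
            "<book><chapter><title>Chapter 1</title>"
            + "<para>PDF is the end product.</para></chapter></book>";
        // Placeholder Chinese content, written as Unicode escapes.
        String chinese =
            "<book><chapter><title>\u7b2c\u4e00\u7ae0</title>"
            + "<para>\u4f60\u597d</para></chapter></book>";
        System.out.println(toFo(english));
        System.out.println(toFo(chinese));
    }
}
```

Both language editions pass through the same stylesheet, so chapter titles and paragraphs get the same formatting without anyone having to reverse-engineer it from a finished PDF.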