Can iText meet my idear?

Can iText meet my idear?

by jinjiankang

Can iText meet my idear?

1. Suppose I have an ebook (book_en.pdf) in English.

2. Parse book_en.pdf:
  2.1 Extract a paragraph or a phrase, translate it to Chinese, and write it
to book_zh.pdf;
  2.2 and so on, and so on...
  2.3 until the end of book_en.pdf.

Generally speaking, book_zh.pdf is the same as book_en.pdf except for the language.

Most important of all, the formatting (chapters, sections, subsections and so
on) must be identical.

Thanks in advance!

Best regards!

Jin Jiankang

E-mail:jinjiankang1980@...

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
iText-questions mailing list
iText-questions@...
https://lists.sourceforge.net/lists/listinfo/itext-questions

Do you like iText?
Buy the iText book: http://www.1t3xt.com/docs/book.php
Or leave a tip: https://tipit.to/itexttipjar

Re: Can iText meet my idear?

by Bruno Lowagie (iText)

金健康 wrote:
> Can iText meet my idear?
>
> 1. Suppose I have an ebook (book_en.pdf) in English.

Consider this PDF to be like a vector image.
(This is an ASSUMPTION based on an educated guess.)

> 2. Parse book_en.pdf:

You can parse the file structure of a PDF and discover PDF objects
such as null, boolean, number, string, name, array, dictionary, stream.
And you can parse content streams that consist of operators and
operands (the Adobe Imaging Model).
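
To make the operator/operand structure concrete, here is a minimal Python sketch that tokenizes a tiny, hand-written content stream fragment. The fragment and the function names `tokenize` and `instructions` are assumptions made for illustration. The sketch handles only numbers, names, simple literal strings (no escapes or nesting), and operators, and it is not how iText itself parses PDF.

```python
import re

# A tiny hand-written content stream fragment (an assumption for
# illustration, not taken from a real file). It selects a font,
# moves to a position, and shows a string.
CONTENT = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"

# One alternative per token kind; the first alternative that matches wins.
TOKEN = re.compile(
    rb"\((?P<string>[^()\\]*)\)"       # literal string without escapes/nesting
    rb"|(?P<name>/[^\s()<>\[\]/]+)"    # name such as /F1
    rb"|(?P<number>[+-]?\d*\.?\d+)"    # integer or real number
    rb"|(?P<operator>[A-Za-z'*]+)"     # operator such as BT, Tf, Td, Tj
)

def tokenize(stream):
    """Yield (kind, text) pairs for the simple subset described above."""
    for match in TOKEN.finditer(stream):
        kind = match.lastgroup
        yield kind, match.group(kind).decode("latin-1")

def instructions(stream):
    """Group tokens into (operator, operands); operands precede the operator."""
    operands = []
    for kind, text in tokenize(stream):
        if kind == "operator":
            yield text, operands
            operands = []
        else:
            operands.append(text)

for operator, operands in instructions(CONTENT):
    print(operator, operands)
```

The output lists each operator with the operands that came before it, for example `Tj ['Hello']`. Nothing in the stream says whether `Hello` is a heading, part of a paragraph, or a table cell.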

>   2.1 Extract a paragraph or a phrase, translate it to Chinese, and write it
> to book_zh.pdf;

The concepts of a paragraph and a phrase are UNKNOWN in PDF.
If your PDF is tagged, then you could have a chance of retrieving
the English text; otherwise you need OCR software.
Translating English to Chinese is off-topic on this list.

>   2.2 and so on, and so on...

You should learn more about PDF before even thinking
about "and so on" options.

> Generally speaking, book_zh.pdf is the same as book_en.pdf except for the language.
>
> Most important of all, the formatting (chapters, sections, subsections and so
> on) must be identical.

The best way to do this is to hire a human being to
translate the PDF and to create a new PDF based on
the translation.

br,
Bruno

Re: Can iText meet my idear?

by Leonard Rosenthol

If you have the legal rights to translate the book, you should just
get the original source material from the author and translate it there.

Leonard

On May 15, 2008, at 3:51 AM, Bruno Lowagie wrote:

> The best way to do this is to hire a human being to
> translate the PDF and to create a new PDF based on
> the translation.

Re: Can iText meet my idear?

by jinjiankang

There are many excellent open source third-party projects (for example iText), and each freely publishes some electronic documents: Tutorial.pdf, DevelopersGuide.pdf, Spec.pdf and so on. Why not develop a tool that would RAPIDLY translate those English PDFs into Chinese.pdf, French.pdf, Korean.pdf ... with the help of the Google AJAX Language API? Admittedly, the translation would NOT be accurate.

After contacting the person in charge at the corresponding third party, I think legal issues would not be an obstacle, and the translations could help extend the third party's influence.

I'm looking forward to your reply.

Thanks.

Best regards.

Re: Can iText meet my idear?

by Bruno Lowagie (iText)

jinjiankang wrote:

> There are many excellent open source third-party projects (for example
> iText), and each freely publishes some electronic documents: Tutorial.pdf,
> DevelopersGuide.pdf, Spec.pdf and so on. Why not develop a tool that would
> RAPIDLY translate those English PDFs into Chinese.pdf, French.pdf,
> Korean.pdf ... with the help of the Google AJAX Language API? Admittedly,
> the translation would NOT be accurate.
>
> After contacting the person in charge at the corresponding third party, I
> think legal issues would not be an obstacle, and the translations could
> help extend the third party's influence.
>
> I'm looking forward to your reply.

Legal and translation problems set aside, you still have
the first obstacle: how to extract the text from the existing
PDFs. In my book, I recommend PDFBox, and maybe JPedal can help
you get a long way too, but... due to the nature of PDF there
are limitations: text in a PDF is 'drawn' on a page. The concept
of paragraphs, chapters, sections, tables,... is lost.
How are you going to solve that?
br,
Bruno
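
The point about text being 'drawn' on a page can be illustrated with a small Python sketch. It assumes that some extraction tool has already produced text fragments with page coordinates; the `Fragment` type, the sample data, and the `tolerance` value are all hypothetical. It shows a naive heuristic that regroups the fragments into lines. Anything beyond lines, such as paragraphs, sections, or tables, would need further guesswork of the same kind.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    x: float      # horizontal position on the page, in points
    y: float      # baseline position, in points (the PDF y axis points up)
    text: str

def group_into_lines(fragments, tolerance=2.0):
    """Heuristically rebuild lines: a fragment whose baseline is within
    `tolerance` points of the current line's baseline joins that line."""
    lines = []  # each entry: [baseline_y, [fragments on that line]]
    # Visit fragments from the top of the page downwards.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append([frag.y, [frag]])
    # Within a line, read fragments from left to right.
    return [
        " ".join(f.text for f in sorted(members, key=lambda f: f.x))
        for _, members in lines
    ]

# Hypothetical fragments, deliberately out of reading order, as they
# may well be in a real content stream.
sample = [
    Fragment(150, 700, "iText"),
    Fragment(72, 680, "iText is a PDF library."),
    Fragment(72, 700, "1. About"),
]

for line in group_into_lines(sample):
    print(line)
```

The sketch recovers the two lines in reading order, but it cannot tell that the first one is a chapter heading. Recovering that would take more heuristics, for example comparing font sizes, and each of them can guess wrong.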

Re: Can iText meet my idear?

by Ning Zhao

Hi Jin Jiankang,

DTP (desktop publishing) suites such as TeX Live/MiKTeX or FrameMaker meet
your requirements better than iText or PDFBox. To guarantee the
identical layout and the identical bells and whistles in the book,
you basically need the layout templates from the original English
publisher, and then you stuff the Chinese text into those templates. XSL-FO
might be the programmatic way to do so. If the layout is relatively simple,
building a template with TeX Live/MiKTeX or FrameMaker is not too difficult.
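
As a rough illustration of the template idea, the Python sketch below builds a minimal XSL-FO document from a list of (style, text) pairs, so the same layout code can be fed either the English or the Chinese text. The `build_fo` function, the `STYLES` table, and the sample strings are assumptions for illustration. A real template would come from the publisher, and an FO processor (not shown here) would still be needed to turn the result into a PDF.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"   # the XSL-FO namespace
ET.register_namespace("fo", FO)

# Hypothetical layout rules shared by every language edition.
STYLES = {
    "chapter": {"font-size": "18pt", "font-weight": "bold"},
    "body": {"font-size": "11pt"},
}

def build_fo(content):
    """Return an XSL-FO document (as a string) for (style, text) pairs."""
    root = ET.Element(f"{{{FO}}}root")
    masters = ET.SubElement(root, f"{{{FO}}}layout-master-set")
    page = ET.SubElement(masters, f"{{{FO}}}simple-page-master",
                         {"master-name": "page",
                          "page-height": "297mm", "page-width": "210mm"})
    ET.SubElement(page, f"{{{FO}}}region-body")
    sequence = ET.SubElement(root, f"{{{FO}}}page-sequence",
                             {"master-reference": "page"})
    flow = ET.SubElement(sequence, f"{{{FO}}}flow",
                         {"flow-name": "xsl-region-body"})
    # Only the text differs between editions; the styles stay the same.
    for style, text in content:
        block = ET.SubElement(flow, f"{{{FO}}}block", STYLES[style])
        block.text = text
    return ET.tostring(root, encoding="unicode")

english = [("chapter", "Introduction"), ("body", "iText is a PDF library.")]
chinese = [("chapter", "简介"), ("body", "iText 是一个 PDF 库。")]

print(build_fo(english))
print(build_fo(chinese))
```

Both calls produce the same element structure and styling, and only the block text changes. This works only because the content starts out structured (chapter, body), which is exactly the information that is no longer available once only the finished PDF is at hand.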

Judging from PDFBox's source code, it is awkward to use with
character sets other than ASCII or UTF-16. Our problem with PDFBox was that it
did not write common Western European characters such as ä, ö, ü into
target PDF files; that's why we switched to iText. I don't think PDFBox
will write Chinese characters correctly into your target PDF file.


Best Regards,
Ellen N. Zhao

--
HaCon Ingenieurgesellschaft mbH
Lister Str. 15
30163 Hannover
Germany/Deutschland
Tel. +49 511 33699-363
Fax. +49 511 33699-99
Email: ning.zhao@...
http://www.hacon.de

Registry Court/Amtsgericht Hannover HRB 1712
Managing Directors/Geschäftsführer: Michael Frankenberg,
Dr.-Ing. Marian Gaidzik, Dr.-Ing. Werner Kretschmer,
Werner Sommerfeld, Dr.-Ing. Volker Sustrate, Peter Talke


Re: Can iText meet my idear?

by Bruno Lowagie (iText)

Ning Zhao wrote:
> Hi Jin Jiankang,
>
> DTP (desktop publishing) suites such as TeX Live/MiKTeX or FrameMaker meet
> your requirements better than iText or PDFBox. To guarantee the
> identical layout and the identical bells and whistles in the book,
> you basically need the layout templates from the original English
> publisher

YES! This is very good advice.
And while you are at it, ask for the source that was used to
build the PDF (Leonard implicitly advised that too).

> and stuff the Chinese text into the template, XSL-FO might be
> the programmatic way to do so.

If the source of the PDF file is XML, choosing the XSL-FO path
is a good suggestion.

> If the layout is relatively simple,
> building a template with TeX Live/MiKTeX or FrameMaker is not too difficult.

Thanks for the advice, Ellen.
I think that the OP's idea is very noble, but he simply doesn't
understand that PDF is supposed to be the end product. He really
should contact the publisher(s) of those PDFs and ask them for
the layout templates and the content sources of the PDFs he
wants to translate.

br,
Bruno
