Migrating documentation from HTML files

by krycho fandino

I'm new to Doxia. I have a lot of documentation in HTML format and I'd
like to convert these files to APT format. Is there an easy way to do
this? I want to create a Maven site for my project and, right now, I only
have this documentation as plain HTML, without CSS styles or a menu.

Could you help me? Many thanks,
Cristóbal

Re: Migrating documentation from HTML files

by Vincent Siveton

Hi,

Frankly, I have never tested your use case.

But I guess you need an XHTML file as input, with no header, footer or
navbar, i.e. something like the content of the bodyColumn div in [1].

The snippet should be something like the following:

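import java.io.*;
import org.apache.maven.doxia.module.apt.AptSink;
import org.apache.maven.doxia.module.xhtml.XhtmlParser;
import org.apache.maven.doxia.sink.Sink;
File f = new File( "blabla.html" );
XhtmlParser parser = new XhtmlParser();
StringWriter output = new StringWriter();
Sink sink = new AptSink( output ); // APT markup is written into 'output'
parser.parse( new FileReader( f ), sink ); // pass the sink, not the writer
sink.close(); // flush anything the sink still buffers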

After parsing, output.toString() will contain the APT markup.
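If a page is already well-formed XHTML but still carries site chrome (header, navbar, footer), one way to keep only the content block before running the conversion snippet above is a small DOM/XPath pass using only the JDK. This is a minimal sketch, not part of Doxia: the class and method names are hypothetical, it assumes Java 7 or later, and it assumes the content sits in an element with a known id (such as the bodyColumn div mentioned above). Whether XhtmlParser needs the minimal html/body wrapper added here is also an assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: keeps only the element with a given id from an XHTML page. */
final class ContentExtractor {

    /**
     * Returns a minimal <html><body>...</body></html> document holding only the element
     * whose id attribute equals {@code id}, or null if there is no such element.
     * The input must already be well-formed XML; {@code id} must not contain quotes.
     */
    static String extractById(String xhtml, String id) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Do not fetch an external DTD (e.g. the XHTML DOCTYPE) over the network.
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));

            Node content = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate("//*[@id='" + id + "']", doc, XPathConstants.NODE);
            if (content == null) {
                return null;
            }

            // Serialize just that element (and its children) back to markup.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(content), new StreamResult(out));
            return "<html><body>" + out + "</body></html>";
        } catch (Exception e) {
            throw new IllegalArgumentException("could not extract element '" + id + "'", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><div id=\"navbar\">Menu</div>"
                + "<div id=\"bodyColumn\"><p>Grupos de usuarios</p></div></body></html>";
        System.out.println(extractById(page, "bodyColumn"));
    }
}
```

Note that this only helps once the input parses as XML at all; it does not repair HTML that is not well-formed.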

HTH,

Vincent

[1] http://maven.apache.org/doxia/

(In reply to krycho fandino <cristobalft@...>, 2008/3/1)

Re: Migrating documentation from HTML files

by Lukas Theussl

If you use the current development branch of Doxia (beta-1-SNAPSHOT),
this should work rather well for simple HTML files. However, you will
probably lose a lot of information if you have anything fancy (e.g.
special layout; tables and figures are not well supported), so don't
expect it to be perfect. In particular, if you have figures you might try
translating to xdoc instead of APT (use XdocSink); that should work better.

Cheers,
-Lukas


Re: Migrating documentation from HTML files

by krycho fandino

Thanks for your help. However, my HTML files aren't XHTML, and XhtmlParser
throws a lot of exceptions. Perhaps I should convert these HTML files to
XHTML, but I have a lot of pages, so it would be a hard task.

Actually, I generated these HTML files with the latex2html conversion tool.
I don't know how I could transform LaTeX files into one of the markup
languages supported by Doxia (APT or xdoc). Could you give me some advice?


(In reply to Lukas Theussl <ltheussl@...>, 2008/3/2)

Re: Migrating documentation from HTML files

by Lukas Theussl

Doxia doesn't have a LaTeX parser (I'd like to have one too!), so
latex2html is the only solution I can think of (other LaTeX translators
exist, but that's the only one I know). I am not sure what kind of output
latex2html produces; however, the difference between HTML and XHTML
shouldn't matter here. What kind of exceptions do you get? Maybe you
could attach an example file to JIRA [1], together with a snippet of your
code, so we can try to reproduce the problem?

-Lukas

[1] http://jira.codehaus.org/browse/DOXIA

Re: Migrating documentation from HTML files

by krycho fandino

latex2html does not produce XHTML. For example:

HTML
==========
<LINK REL="STYLESHEET" HREF="embebidos.css">

XhtmlParser
==========
org.apache.maven.doxia.parser.ParseException: Error parsing the model: end tag name </HEAD> must be the same as start tag <LINK> from line 19 (position: TEXT seen ...<LINK REL="STYLESHEET" HREF="embebidos.css">\n\n</HEAD>... @21:8)
    at org.apache.maven.doxia.parser.AbstractXmlParser.parse(AbstractXmlParser.java:57)


HTML
==========
<H2><A NAME="SECTION00221000000000000000"></A>
<A NAME="74"></A>
<BR>
Grupos de usuarios
</H2>

XhtmlParser
==========
org.apache.maven.doxia.parser.ParseException: Error parsing the model: end tag name </H2> must be the same as start tag <BR> from line 119 (position: TEXT seen ...<BR>\nGrupos de usuarios\n</H2>... @121:6)
    at org.apache.maven.doxia.parser.AbstractXmlParser.parse(AbstractXmlParser.java:57)


XhtmlParser
==========
org.apache.maven.doxia.parser.ParseException: Error parsing the model: attribute value must start with quotation or apostrophe not 3 (position: TEXT seen ...<A NAME="91"></A>\n<TABLE CELLPADDING=3... @171:21)
    at org.apache.maven.doxia.parser.AbstractXmlParser.parse(AbstractXmlParser.java:57)

... and many more.
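The three errors above are all well-formedness violations (an unclosed LINK, an unclosed BR, an unquoted attribute value), which any XML parser rejects, not only XhtmlParser. With many pages, it may help to list which files would fail before converting anything. The sketch below uses only the JDK's SAX parser rather than Doxia's parser, so its messages will differ from the ones above. The class name is hypothetical, it assumes Java 7 or later, and it decodes files as ISO-8859-1 on the assumption that only the markup structure matters for this check.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical pre-flight check: which HTML files are not well-formed XML? */
final class WellFormedCheck {

    /** Returns null if {@code text} is well-formed XML, else "line:column message". */
    static String firstError(String text) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Do not download external DTDs referenced by a DOCTYPE.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            // DefaultHandler rethrows fatal errors, so the first violation ends the parse.
            factory.newSAXParser().parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        int bad = 0;
        try (DirectoryStream<Path> pages = Files.newDirectoryStream(dir, "*.html")) {
            for (Path page : pages) {
                // ISO-8859-1 maps every byte to a character, so decoding never fails.
                String text = new String(Files.readAllBytes(page), StandardCharsets.ISO_8859_1);
                String error = firstError(text);
                if (error != null) {
                    bad++;
                    System.out.println(page.getFileName() + " @ " + error);
                }
            }
        }
        System.out.println(bad + " file(s) are not well-formed XML");
    }
}
```

Passing the HTML directory as the first argument prints the first problem in each failing file.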


(In reply to Lukas Theussl <ltheussl@...>, 2008/3/3)

Re: Migrating documentation from HTML files

by Lukas Theussl

Ehm, yes, sorry, I spoke faster than I thought. Of course, the parser is
an XML parser, so it will choke on any tags that are not properly closed.
So the input has to be XHTML. You can use tools like HTML Tidy [1] to
convert HTML to XHTML.
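With many pages, running HTML Tidy by hand is tedious, but it can be driven in a loop. A minimal sketch, assuming the tidy command-line program is installed and on the PATH, Java 7 or later, and a hypothetical class name. The flags are standard Tidy options: -q (quiet), -asxhtml (convert HTML to well-formed XHTML), -numeric (emit numeric rather than named entities, since names such as &nbsp; are not predefined in XML) and -o (output file). Depending on how latex2html encoded the pages, an encoding option such as -latin1 may also be needed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical batch driver: tidy every *.html file in one directory into another. */
final class TidyBatch {

    /** Builds the tidy command line that converts {@code in} to XHTML at {@code out}. */
    static List<String> tidyCommand(Path in, Path out) {
        return Arrays.asList("tidy", "-q", "-asxhtml", "-numeric",
                "-o", out.toString(), in.toString());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: TidyBatch <html-dir> <xhtml-dir>");
            return;
        }
        Path src = Paths.get(args[0]);
        Path dst = Files.createDirectories(Paths.get(args[1]));
        try (DirectoryStream<Path> pages = Files.newDirectoryStream(src, "*.html")) {
            for (Path page : pages) {
                Path target = dst.resolve(page.getFileName());
                Process tidy = new ProcessBuilder(tidyCommand(page, target))
                        .inheritIO()
                        .start();
                // tidy exits with 0 (clean), 1 (warnings only) or 2 (errors);
                // on errors it may not write the output file at all.
                if (tidy.waitFor() >= 2) {
                    System.err.println("tidy reported errors for " + page);
                }
            }
        }
    }
}
```

Keeping the command construction in its own method makes it easy to adjust the flags without touching the loop.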

Btw, Vincent just added a simple tool for converting documents between
formats with Doxia: http://svn.apache.org/viewvc?view=rev&revision=633328
Feel free to test it and comment! :)

Cheers,
-Lukas

[1] http://tidy.sourceforge.net/


(In reply to Cristóbal Fandiño)

Re: Migrating documentation from HTML files

by Vincent Siveton

2008/3/4, Lukas Theussl <ltheussl@...>:
> Ehm, yes, sorry, I talked quicker than I thought. Of course, the parser
>  is an xml parser so it will cough up any tags that are not properly
>  closed. So it has to be xhtml. You can use tools like htmltidy [1] to
>  convert html to xhtml.
>
>  Btw, Vincent just added a simple tool to do document translations with
>  doxia: http://svn.apache.org/viewvc?view=rev&revision=633328
>  Feel free to test and comment! :)

You need to use the entire trunk for this.

I guess it would be easy to patch the converter with JTidy to support
HTML as an input format. Patches are welcome :)

Cheers,

Vincent
