Wikipedia HTML & Syntax specification


Wikipedia HTML & Syntax specification

by O. Olson-2

Hi,

I have two related questions.

First, is there a simple way to get static HTML from Wikipedia articles, i.e. to extract them as HTML files?

Second, I have loaded the text table into a MySQL database, so I can get the wikitext, which I could then parse. However, the Wikipedia syntax turns out to be more complicated than what I used when contributing to Wikipedia myself. Is the complete syntax specified anywhere? With a specification in hand, I could start working on a parser.

Thanks a lot.
O.O.



_______________________________________________
Wikipedia-l mailing list
Wikipedia-l@...
https://lists.wikimedia.org/mailman/listinfo/wikipedia-l

Re: Wikipedia HTML & Syntax specification

by Rolf Lampa

O. Olson wrote:
> Hi,
>
> I have two questions related to each other.
>
> I would like to know if there is a simple way to get the Static HTML from the Wikipedia Articles i.e. extraction as HTML files.

Try this:
http://static.wikipedia.org/

Regards,

// Rolf Lampa



Re: Wikipedia HTML & Syntax specification

by O. Olson-2

Dear Rolf,

I don't think I understood what you are referring to. Do you mean that, to get the static HTML, I would have to spider/crawl those pages? I thought Wikipedia explicitly did not allow this.

Also, any idea regarding the syntax specification?

Thanks again for your post.
O.O.



--- On Sat, 5 Jul 2008, Rolf Lampa <rolf.lampa@...> wrote:

> From: Rolf Lampa <rolf.lampa@...>
> Subject: Re: [Wikipedia-l] Wikipedia HTML & Syntax specification
> To: wikipedia-l@...
> Date: Saturday, 5 July 2008, 5:49 am
> O. Olson wrote:
> > Hi,
> >
> > I have two questions related to each other.
> >
> > I would like to know if there is a simple way to get
> the Static HTML from the Wikipedia Articles i.e. extraction
> as HTML files.
>
> Try this:
> http://static.wikipedia.org/
>
> Regards,
>
> // Rolf Lampa


Re: Wikipedia HTML & Syntax specification

by Rolf Lampa

O. Olson wrote:
> Dear Rolf,
>
> I don't think I understood what you are referring to.
> To get the Static HTML you mean I would have to
> spider/crawl through those pages? I thought Wikipedia
> explicitly did not allow this.

???

You wouldn't have to crawl. There's a download link in the middle of the
page I linked to. If, for example, you would like the entire English
Wikipedia's content in HTML format, download it from:
http://static.wikipedia.org/downloads/2008-06/en/

Or did I misunderstand your question entirely?

> Also any idea regarding the syntax specification?

Countless pages have been written on that subject, but I don't
know of any single page that covers all of it. For a start,
see:
http://en.wikipedia.org/wiki/Help:Contents
and
http://en.wikipedia.org/wiki/Wikipedia:Cheatsheet
and then follow the pages they refer to, such as:
http://meta.wikimedia.org/wiki/Help:Link
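
To give a feel for what a parser built from those help pages has to handle, here is a minimal sketch in Python that converts a tiny subset of the markup (headings, bold, italics, and internal links) to HTML. It is an illustration only, not MediaWiki's own parser: the function names wikitext_to_html and render_inline are hypothetical, the /wiki/ link prefix is an assumption, and templates, tables, lists, references, and nested markup are not handled at all.

```python
import html
import re

# [[Target]] or [[Target|label]]
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")
# == Heading == up to ====== Heading ======
HEADING = re.compile(r"(={2,6})\s*(.+?)\s*\1")


def _link(match):
    """Turn one internal link into an anchor tag (URL prefix is assumed)."""
    target = match.group(1).strip()
    label = (match.group(2) or target).strip()
    return f'<a href="/wiki/{target.replace(" ", "_")}">{label}</a>'


def render_inline(text):
    """Handle links, then bold (three apostrophes), then italics (two)."""
    text = LINK.sub(_link, text)
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    return text


def wikitext_to_html(text):
    """Convert a very small subset of wiki markup to HTML, line by line."""
    out = []
    for line in text.splitlines():
        # Escape &, <, > but keep apostrophes, which carry bold/italic markup.
        line = html.escape(line, quote=False)
        heading = HEADING.fullmatch(line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{render_inline(heading.group(2))}</h{level}>")
        elif line.strip():
            out.append(f"<p>{render_inline(line)}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "== History ==\n'''Wikipedia''' is a [[free content|free]] ''encyclopedia''."
    print(wikitext_to_html(sample))
```

Even this subset shows why a line-by-line regular-expression approach runs out quickly: the real syntax allows constructs that span lines and nest inside each other, which is presumably why the documentation is spread over so many pages.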

Regards,

// Rolf Lampa


Re: Wikipedia HTML & Syntax specification

by O. Olson-2

--- On Mon, 7 Jul 2008, Rolf Lampa <rolf.lampa@...> wrote:

> You wouldn't have to crawl. There's a download-link in the middle of the
> page I linked to. If you for example would like to have the entire
> English Wikipedia's content in html format, then download it:
> http://static.wikipedia.org/downloads/2008-06/en/

Thanks, Rolf. This was not clear from your original post, but I have since downloaded it. However, it seems too big to extract into the 200 GB of free space on my drive. I am trying to borrow a terabyte drive from a friend over the next week to see how everything looks.
Thanks again.
O.O.
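
One way to find out in advance whether an archive will fit is to add up the file sizes recorded in it without extracting anything. The sketch below assumes the dump can be read as a tar stream (any outer compression layer that Python's standard library cannot read would have to be removed first); the function names uncompressed_size and fits_on_disk are hypothetical, and the tiny in-memory archive only stands in for the real download.

```python
import io
import shutil
import tarfile


def uncompressed_size(fileobj):
    """Sum the sizes of all regular files in a tar stream without extracting."""
    total = 0
    # Mode "r|*" reads the archive as a forward-only stream; nothing is written to disk.
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            if member.isfile():
                total += member.size
    return total


def fits_on_disk(needed_bytes, target_dir="."):
    """Return True if the filesystem holding target_dir has enough free space."""
    return shutil.disk_usage(target_dir).free >= needed_bytes


if __name__ == "__main__":
    # Build a tiny in-memory archive as a stand-in for the real dump.
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        data = b"<html>example</html>"
        info = tarfile.TarInfo(name="articles/Example.html")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)

    size = uncompressed_size(buffer)
    print(size, fits_on_disk(size))
```

The total is a lower bound: an archive made of very many small HTML files will usually occupy more space on disk than the sum of the file sizes, because each file is rounded up to whole filesystem blocks.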

