wkipedia rendering engine

Hi,
I was at the Erlang Exchange and heard the *magnificent* talk "Building a transactional distributed data store with Erlang", by Alexander Reinefeld.

I'll be blogging about this as soon as I have the URL of the video of the talk.

(Before this there was a talk at the Google conference on scalability:

http://video.google.com/videoplay?docid=-6526287646296437003&q=erlang+scalable&ei=cZ9oSLiDNIiCiwLL9fGwCA&hl=en

Oh, and they also seem to have won the SCALE 2008 prize at the CCGrid conference in Lyon, but there is zero publicity about this AFAICS.)

We (collectively) promised to help Alexander - I promised to provide him with a rendering engine (in Erlang) for the Wikipedia markup language.

Before I start hacking, has anybody done this before?

/Joe Armstrong

_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions
Re: wkipedia rendering engine

On Mon, Jun 30, 2008 at 12:53 PM, Joe Armstrong <erlang@...> wrote:
>> Also deciding on what point of the analysis to expand {{templates}}
>> can lead the same code to get very different results.
>
> Glurk - you mean it's *undefined* - wow - I'll guess I'll discover this

Might well be undefined in the sense that the only definition is "what the (PHP) code does is right"; I'm not sure. The code seems to be in
http://svn.wikimedia.org/viewvc/mediawiki/branches/stable/phase3/includes/OutputPage.php?view=log

--
Andre Engels, andreengels@...
ICQ: 6260644 -- Skype: a_engels
Re: wkipedia rendering engine

On Mon, Jun 30, 2008 at 12:58 PM, Joe Armstrong <erlang@...> wrote:
{joke, "It's not amazing, it's wikiworld!"}.
--
--Hynek (Pichi) Vychodil
Re: wkipedia rendering engine

Followup to myself.
I guess Wikipedia is stored internally in the format that is presented to the users for editing, i.e. in the MediaWiki markup language.

Is there a REST interface so that I can retrieve the latest version of the MediaWiki markup for a specific page with, for example, a wget command?

Has anybody made an Erlang interface to scrape individual pages from Wikipedia - or to bulk convert the entire Wikipedia to Erlang terms :-)

/Joe

--
fra@...; ingvar.akesson@...

[A copy of this message is being sent to FRA for surveillance purposes. They want to read my e-mail anyway.]

[A copy of this mail has been sent to FRA for monitoring purposes. FRA wants to read all my e-mail and has been allowed to do so by the Swedish parliament - in violation of article 12 of the UN Universal Declaration of Human Rights]
Re: wkipedia rendering engine

On Mon, Jun 30, 2008 at 1:23 PM, Joe Armstrong <erlang@...> wrote:
> Is there a REST interface so that I can retrieve the latest version of
> the MediaWiki markup for a specific page with, for example,
> a wget command.

What's a REST interface? There are several ways to get the MediaWiki markup of a specific page:
* Go to the edit page; it contains the latest version of the markup
* Go to [[Special:Export]], where you can get either the current version or all versions of a number of pages, in XML
* At http://download.wikimedia.org/backup-index.html are the complete database dumps of the various wikis; the content of the page is in one of the tables

--
Andre Engels, andreengels@...
ICQ: 6260644 -- Skype: a_engels
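
The [[Special:Export]] route in the list above can be scripted. What follows is a minimal sketch in Python, not something posted in this thread. The host (`https://en.wikipedia.org`), the helper names `export_url` and `fetch_export`, and the User-Agent string are all assumptions chosen for illustration.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: the English Wikipedia; any MediaWiki site with Special:Export
# should follow the same URL pattern.
BASE = "https://en.wikipedia.org"


def export_url(title):
    """Build the Special:Export URL for the current revision of one page."""
    # MediaWiki page names use underscores instead of spaces.
    return BASE + "/wiki/Special:Export/" + quote(title.replace(" ", "_"), safe="")


def fetch_export(title):
    """Download the export XML for a page (needs network access)."""
    # Hypothetical User-Agent string; a descriptive one is good practice.
    request = Request(export_url(title),
                      headers={"User-Agent": "wikitext-fetch-sketch/0.1"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# Only builds the URL; call fetch_export(...) to actually download.
print(export_url("Erlang (programming language)"))
```

The printed URL can equally be handed to wget. The markup itself sits inside the `<text>` element of the returned XML.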
Re: wkipedia rendering engine

On Jun 30, 2008, at 13:23, Joe Armstrong wrote:
> Is there a REST interface so that I can retrieve the latest version of
> the MediaWiki markup for a specific page with, for example,
> a wget command.

You can get bulk dumps:
http://en.wikipedia.org/wiki/Wikipedia:Database_download#Where_do_I_get ...

Why would you do individual scraping? In order to keep up to date with changes that happened between the last dump and now()?

Cheers
Jan
--
Re: wkipedia rendering engine

On Mon, Jun 30, 2008 at 1:33 PM, Andre Engels <andreengels@...> wrote:
> On Mon, Jun 30, 2008 at 1:23 PM, Joe Armstrong <erlang@...> wrote:
>
>> Is there a REST interface so that I can retrieve the latest version of
>> the MediaWiki markup for a specific page with, for example,
>> a wget command.
>
> What's a REST interface?

http://en.wikipedia.org/wiki/Representational_State_Transfer

/J
Re: wkipedia rendering engine

On Mon, Jun 30, 2008 at 1:36 PM, Jan Lehnardt <jan@...> wrote:
> On Jun 30, 2008, at 13:23, Joe Armstrong wrote:
>>
>> Is there a REST interface so that I can retrieve the latest version of
>> the MediaWiki markup for a specific page with, for example,
>> a wget command.
>
> You can get bulk dumps
> http://en.wikipedia.org/wiki/Wikipedia:Database_download#Where_do_I_get...
>
> Why would you do individual scraping? In order to keep up to date with
> changes that happened between the last dump and now()?

To get a few test cases to test my parser on *before* downloading the entire thing.

Also I suspect the dumps are in MySQL format with XML junk - so it might not be a trivial job to extract the raw data. I (presumably) will have to install MySQL and turn some XML stuff into the raw data (just guessing here) - thought that could be a job for a volunteer :-)

/Joe
Re: wkipedia rendering engine

Hi all,
as I am partially to blame for the noise around the wikirenderer, I will add my two cents.

For our experiments, we used the XML dumps available at http://download.wikimedia.org. We have a small Java program which converts the XML dump to Erlang terms (http://www.zib.de/schuett/dumpreader.tgz). E.g. converting the Bavarian dump:

    java -jar dumpreader.jar /home/schuett/barwiki-20080225-pages-meta-history.xml

But you still have to parse the MediaWiki text and convert it to HTML. For the last step we currently have two solutions:

1. Early experiments used flexbisonparse (http://svn.wikimedia.org/viewvc/mediawiki/trunk/flexbisonparse/) to convert the MediaWiki text to XML, and XSLT to convert the XML to HTML.

2. The current code is based on plog4u/bliki (see http://matheclipse.org/en/Java_Wikipedia_API).

Thorsten
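
To make the "XML dump to Erlang terms" step concrete, here is a minimal sketch in Python. It is not the dumpreader program mentioned above, whose output format is not shown in this thread. The `{page, Title, Text}.` term shape, the helper names, and the tiny sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def erl_binary(s):
    """Render a Python string as an Erlang binary literal (assumed format)."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '<<"' + escaped + '"/utf8>>'


def pages_to_terms(xml_text):
    """Yield one '{page, Title, Text}.' line per <page> element."""
    root = ET.fromstring(xml_text)
    for page in root.iter():
        if local(page.tag) != "page":
            continue
        title, text = "", ""
        for node in page.iter():
            name = local(node.tag)
            if name == "title":
                title = node.text or ""
            elif name == "text":
                # If a page lists several revisions, the last one listed wins.
                text = node.text or ""
        yield "{page, %s, %s}." % (erl_binary(title), erl_binary(text))


# Hypothetical miniature export document, shaped like a MediaWiki XML dump.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Erlang</title>
    <revision><text>'''Erlang''' is a "language".</text></revision>
  </page>
</mediawiki>"""

for term in pages_to_terms(SAMPLE):
    print(term)
```

This version loads the whole document into memory, which is fine for a few test pages. A full dump would need a streaming parser such as `xml.etree.ElementTree.iterparse`.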
Re: wkipedia rendering engine

Rock and roll....
Can you be more explicit than http://download.wikimedia.org? Can you point me to a specific file that I can download that works with your dump reader?

Thanks

/Joe