Tragical dynamics: that run for the number of articles


Tragical dynamics: that run for the number of articles

by Ziko van Dijk

Maybe this is not the most popular topic, but I would like to comment
on the news about the Japanese and Polish Wikipedias and their 500,000
articles each. In fact, ja.WP actually has 500,000, but pl.WP does
not.
In an attempt to compare Wikipedia language editions, I clicked the
"random article" button and, with a sample of 50 clicks each,
calculated how many articles a language edition really has, minus
all the pseudo articles.

Examples of pseudo articles:
http://pdc.wikipedia.org/wiki/Bikini
http://co.wikipedia.org/wiki/191
http://ksh.wikipedia.org/wiki/Varsseveld
http://pl.wikipedia.org/wiki/Tandil
http://vo.wikipedia.org/wiki/Poplar_Bluff

Many Wikipedias lose, by my calculation, a huge percentage of their
articles. There is one honourable exception: the Japanese Wikipedia,
which in 50 clicks showed no pseudo article at all. If the Japanese
Wikipedia had as lax a policy on new articles as many others have,
ja.WP would already be close to one million "articles". Pl.WP has
about 300,000 real articles, which is very respectable, but not what
it seems to be.
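
The extrapolation described above can be sketched in Python. This is only an illustration under stated assumptions: the function name and the sample figures (20 pseudo articles among 50 clicks, an official count of 500,000) are hypothetical, chosen so that the result lands at the roughly 300,000 mentioned for pl.WP. They are not the actual tallies from the sample.

```python
def estimate_real_articles(official_count: int, sample_size: int,
                           pseudo_in_sample: int) -> int:
    """Scale the official article count by the share of sampled
    articles that were judged to be real (not pseudo) articles."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= pseudo_in_sample <= sample_size:
        raise ValueError("pseudo_in_sample must lie between 0 and sample_size")
    real_in_sample = sample_size - pseudo_in_sample
    return round(official_count * real_in_sample / sample_size)


# Hypothetical tally: 20 of 50 random clicks were pseudo articles.
print(estimate_real_articles(500_000, 50, 20))
# No pseudo articles in the sample: the official count is kept as is.
print(estimate_real_articles(500_000, 50, 0))
```

With only 50 clicks, every single pseudo article shifts the estimate by 2% of the official count, so the result is a rough indication rather than a precise figure.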

Since the beginning, Wikipedians have reported the number of articles,
to have something to tell the media and to be proud of their
achievements. They rank Wikipedia language editions by the number of
articles. This has caused tragical dynamics: many Wikipedians and
Wikipedias are so obsessed with this number that they produce rubbish
articles to show off. The Volapük Wikipedia, with more than 100,000
pseudo articles created by a single bot-using user, is only the tip of
the iceberg, and when someone called for closing vo.WP, it was
supported by an amazing number of users from many language editions:
così fan tutte. Wikipedians could and should use their time for more
useful article work.

It would be good if the community found a different way to compare or
to measure its successes.

Ziko





--
Ziko van Dijk
NL-Silvolde

_______________________________________________
foundation-l mailing list
foundation-l@...
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

Re: Tragical dynamics: that run for the number of articles

by Andre Engels

On Fri, Jun 27, 2008 at 3:49 PM, Ziko van Dijk <zvandijk@...> wrote:

> A pseudo article is e.g.
> http://pdc.wikipedia.org/wiki/Bikini
> http://co.wikipedia.org/wiki/191
> http://ksh.wikipedia.org/wiki/Varsseveld
> http://pl.wikipedia.org/wiki/Tandil
> http://vo.wikipedia.org/wiki/Poplar_Bluff

Ok, I understand numbers 2, 4 and 5 in your list. Number 1 is
presumably included for being extremely stubby, but what's the issue
with the ksh: page? The only thing I notice is that the text part
hasn't got any internal links. But to consider something like that a
'non-article' like the co: and pl: examples seems harsh in the extreme.


--
Andre Engels, andreengels@...
ICQ: 6260644 -- Skype: a_engels


Re: Tragical dynamics: that run for the number of articles

by Ziko van Dijk

http://ksh.wikipedia.org/wiki/Varsseveld
It's not Ripuarian (ksh) but Nedersaksisch; the text is taken
directly from nds-nl.
Ziko




--
Ziko van Dijk
NL-Silvolde


Re: Tragical dynamics: that run for the number of articles

by Harel Cain

The depth criterion available here:
http://meta.wikimedia.org/wiki/List_of_wikipedias is a good starting
point. I quote: "The "Depth" column ((Edits/Articles) ×
(Non-Articles/Articles) × (Stub-ratio)) is a rough indicator of a
Wikipedia's quality, showing how frequently its articles are updated."

Note that Volapük, Polish, Ripuarian and others do indeed have very
low depth rankings.
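
As a minimal sketch, the quoted formula can be written out in Python. The function name, parameter names and the numbers in the example call are hypothetical, and the stub ratio is simply passed in as a given value because the quoted passage does not say how it is computed.

```python
def depth(edits: int, articles: int, non_articles: int,
          stub_ratio: float) -> float:
    """Depth as quoted from the Meta page:
    (Edits/Articles) x (Non-Articles/Articles) x (Stub-ratio)."""
    if articles <= 0:
        raise ValueError("articles must be positive")
    return (edits / articles) * (non_articles / articles) * stub_ratio


# Made-up wiki: 1,000 edits, 100 articles, 50 non-article pages,
# and an assumed stub ratio of 0.5.
print(depth(edits=1000, articles=100, non_articles=50, stub_ratio=0.5))
```

Because the edit count sits in the numerator, a wiki whose articles are created in a single edit each (by a bot, or by pasting in a finished translation) scores low regardless of how good those articles are.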


Harel

On Fri, Jun 27, 2008 at 4:49 PM, Ziko van Dijk <zvandijk@...> wrote:

>
> It would be good if the community found a different way to compare or
> to measure it's successes.


--
Quidquid latine dictum sit, altum viditur.


Re: Tragical dynamics: that run for the number of articles

by Andrew Lih

On Fri, Jun 27, 2008 at 9:49 PM, Ziko van Dijk <zvandijk@...> wrote:
> Maybe this is not the most popular topic, but I would like to comment
> on the news about the Japanese and Polish Wikipedias and their 500,000
> articles each. In fact, ja.WP actually has 500,000, but pl.WP does
> not.
> In an attempt to compare Wikipedia language editions, I clicked the
> "random article" button and, with a sample of 50 clicks each,
> calculated how many articles a language edition really has, minus
> all the pseudo articles.

Yes, it's good to remind folks that "article count" is not a good
metric as it fails to take into account the cultural norms within the
language communities.

For a really startling view of what you are observing, look at the
wikistats plot: ja: (orange) has never had a "bot bump" like pl:, where
all those jagged jumps (yellow) are bot additions, meaning those
articles have very likely never been edited by humans.

http://stats.wikimedia.org/EN/PlotsPngArticlesTotal.htm#p2

-Andrew (User:Fuzheado)


Re: Tragical dynamics: that run for the number of articles

by Ziko van Dijk

Alas, judging a language edition by Wikimedia statistics does not work.

The Indonesian, Asturian and Volapük WPs have the same "depth" (8), but
id.WP is a very good WP. How come? There are not so many edits per
article in id.WP because much of it has been translated from English.
That is a legitimate way to create (good) articles, but it does not
need a lot of edits.

Bot activity: Indeed, "bot bumps" can often be detected easily in stats
tables. The small Wikipedias especially (I suppose) show a relatively
large amount of bot activity due to interwiki linking. On the other
hand, pseudo articles can also be created by hand (let a script create
the text outside WP and then insert it "manually").

Ziko






--
Ziko van Dijk
NL-Silvolde


Re: Tragical dynamics: that run for the number of articles

by Przykuta

> Alas, judging a language edition by Wikimedia statistics does not work.
>
> The Indonesian, Asturian and Volapük WPs have the same "depth" (8), but
> id.WP is a very good WP. How come? There are not so many edits per
> article in id.WP because much of it has been translated from English.
> That is a legitimate way to create (good) articles, but it does not
> need a lot of edits.
>
> Bot activity: Indeed, "bot bumps" can often be detected easily in stats
> tables. The small Wikipedias especially (I suppose) show a relatively
> large amount of bot activity due to interwiki linking. On the other
> hand, pseudo articles can also be created by hand (let a script create
> the text outside WP and then insert it "manually").
>
> Ziko
>

Hi Ziko. The article standard in pl wiki is 1.5 kB (in 2008):

http://tools.wikimedia.pl/~warx/dnb/index.xml

500k is nothing special for us; it is a good event for PR, and only that. If we have good PR, we get more new users. I can't see a "count obsession" in pl wiki. I see other obsessions: copyvio, POV, vandals, trolls, lack of sources, etc. Users in pl wiki know how others look at them. Believe me, we try to be better, but our community is not huge.

Przykuta


Re: Tragical dynamics: that run for the number of articles

by Tomasz Ganicz

2008/6/27 Ziko van Dijk <zvandijk@...>:

> Maybe this is not the most popular topic, but I would like to comment
> on the news about the Japanese and Polish Wikipedias and their 500,000
> articles each. In fact, ja.WP actually has 500,000, but pl.WP does
> not.
> In an attempt to compare Wikipedia language editions, I clicked the
> "random article" button and, with a sample of 50 clicks each,
> calculated how many articles a language edition really has, minus
> all the pseudo articles.
>
> Examples of pseudo articles:
> http://pdc.wikipedia.org/wiki/Bikini
> http://co.wikipedia.org/wiki/191
> http://ksh.wikipedia.org/wiki/Varsseveld
> http://pl.wikipedia.org/wiki/Tandil
> http://vo.wikipedia.org/wiki/Poplar_Bluff
>
> Many Wikipedias lose, by my calculation, a huge percentage of their
> articles. There is one honourable exception: the Japanese Wikipedia,
> which in 50 clicks showed no pseudo article at all. If the Japanese
> Wikipedia had as lax a policy on new articles as many others have,
> ja.WP would already be close to one million "articles". Pl.WP has
> about 300,000 real articles, which is very respectable, but not what
> it seems to be.
>
> Since the beginning, Wikipedians have reported the number of articles,
> to have something to tell the media and to be proud of their
> achievements. They rank Wikipedia language editions by the number of
> articles. This has caused tragical dynamics: many Wikipedians and
> Wikipedias are so obsessed with this number that they produce rubbish
> articles to show off. The Volapük Wikipedia, with more than 100,000
> pseudo articles created by a single bot-using user, is only the tip of
> the iceberg, and when someone called for closing vo.WP, it was
> supported by an amazing number of users from many language editions:
> così fan tutte. Wikipedians could and should use their time for more
> useful article work.
>

Well... Bear in mind that the English Wikipedia also contains quite a
lot of bot-created articles, and in fact the English Wikipedia was the
first to produce them. The others just followed the idea and started
doing it in order to artificially increase the number of articles. The
Polish Wikipedia started doing it when our rank went down due to the
mass production of bot-created articles in the Swedish, Italian, French
and other Wikipedias.

Compare:

http://pl.wikipedia.org/wiki/Aignerville

and

http://en.wikipedia.org/wiki/Aignerville

or

http://pl.wikipedia.org/wiki/Is%C3%B2vol

and

http://it.wikipedia.org/wiki/Is%C3%B2vol

http://nl.wikipedia.org/wiki/Eksj%C3%B6_(stad)

and

http://pl.wikipedia.org/wiki/Eksj%C3%B6

http://pl.wikipedia.org/wiki/Dystrykt_Set%C3%BAbal

and

http://nn.wikipedia.org/wiki/Set%C3%BAbal

etc...

There is nothing really special about the Polish Wikipedia; many others
do exactly the same, including the English one. We simply had more
active coders who knew how to feed bots. But, as you can see by
comparing with other Wikipedias, they sometimes did a really good job,
in the sense that many bot-created stubs in the Polish Wikipedia
contain more data than their equivalents in, for example, the Swedish
or French Wikipedia.

http://fr.wikipedia.org/wiki/Gr%C3%B3dek

http://fr.wikipedia.org/wiki/Drzewica

http://fr.wikipedia.org/wiki/Pszczyna

http://fr.wikipedia.org/wiki/Jas%C5%82o

etc...


--
Tomek "Polimerek" Ganicz
http://pl.wikimedia.org/wiki/User:Polimerek
http://www.ganicz.pl/poli/
http://www.ptchem.lodz.pl/en/TomaszGanicz.html


Re: Tragical dynamics: that run for the number of articles

by Ziko van Dijk

Among the big Wikipedias, pl.WP has one of the lowest quotas of
real articles:

      Articles (official)   Real articles (est.)   Quota
EN    1400000               1344000                0.96
DE    696000                668160                 0.96
FR    613000                514920                 0.84
JA    466000                466000                 1.00
IT    408000                301920                 0.74
PL    467000                298880                 0.64
ES    326000                293400                 0.90
NL    404000                274720                 0.68
SV    272000                217600                 0.80
PT    338000                209560                 0.62
RU    233000                195720                 0.84
ZH    164000                144320                 0.88
(most numbers from Jan. 2008; en, de and pt are older; the estimates
should really be rounded)

Only 64% of the articles in pl.WP are real, while the much-criticized
sv.WP has 80%. But this is not about blaming some Wikipedians; it is
about finding out how to compare WPs in a more effective way.
The average size (bytes per article) does not work either. Take the
article "Berlin" in Upper Sorbian (hsb). It has 3740 bytes. That sounds
good, but only 454 bytes (six short sentences) are actual text; 1823
bytes alone are for the interwikis. This is not manipulation, but it
shows the difficulties of reading Wikimedia statistics. Even a
"geographical stub" with infoboxes, categories and interwikis produces
a lot of bytes.
It takes a human to evaluate.
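
To make the byte breakdown of the hsb "Berlin" example concrete, here is a small Python sketch. Only the three byte counts come from the message above; the function name and the "other" bucket (everything that is neither running text nor interwiki links) are assumptions added for illustration.

```python
def byte_shares(total_bytes: int, parts: dict) -> dict:
    """Return each named part as a fraction of the page size, plus an
    'other' entry for the bytes not covered by the named parts."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    covered = sum(parts.values())
    if covered > total_bytes:
        raise ValueError("parts add up to more than the page size")
    shares = {name: size / total_bytes for name, size in parts.items()}
    shares["other"] = (total_bytes - covered) / total_bytes
    return shares


# Figures quoted for the hsb article "Berlin".
berlin = byte_shares(3740, {"text": 454, "interwikis": 1823})
for name, share in berlin.items():
    print(f"{name}: {share:.0%}")
```

On these figures the actual text is roughly an eighth of the page, while the interwiki links take up nearly half of it.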
Ziko




--
Ziko van Dijk
NL-Silvolde


Re: Tragical dynamics: that run for the number of articles

by Mathias Schindler-2

On Fri, Jun 27, 2008 at 4:51 PM, Andrew Lih <andrew.lih@...> wrote:

> Yes, it's good to remind folks that "article count" is not a good
> metric as it fails to take into account the cultural norms within the
> language communities.

A year ago, some admins at de.wp (including me) tried and failed to
replace the "article count" on the de.wp front page with a rather vague
statement of "hundreds of thousands of articles" along with the
counter for articles with "featured article" status. Not that it
would be impossible to game that number, but at least I felt it
was something worth focusing on and displaying to our visitors.

Mathias


Re: Tragical dynamics: that run for the number of articles

by Tomasz Ganicz

2008/6/27 Ziko van Dijk <zvandijk@...>:

> Among the big Wikipedias, pl.WP has one of the lowest quotas of
> real articles:
>
>       Articles (official)   Real articles (est.)   Quota
> EN    1400000               1344000                0.96
> DE    696000                668160                 0.96
> FR    613000                514920                 0.84
> JA    466000                466000                 1.00
> IT    408000                301920                 0.74
> PL    467000                298880                 0.64
> ES    326000                293400                 0.90
> NL    404000                274720                 0.68
> SV    272000                217600                 0.80
> PT    338000                209560                 0.62
> RU    233000                195720                 0.84
> ZH    164000                144320                 0.88
> (most numbers from Jan. 2008; en, de and pt are older; the estimates
> should really be rounded)
>


Can you explain how this evaluation was done? How do you distinguish
between "real" and other articles? In particular, I don't believe the
statistics shown for en Wikipedia. I have a feeling that there are many
more bot-created articles in en Wikipedia than your statistics show.

About a year ago I wanted to evaluate the number of bot-created
articles in the Polish Wikipedia and then evaluate how many of them
had been expanded by humans. Unfortunately this was impossible because
the bot owners do not keep records of their bots' activity. Anyway, we
randomly checked what had happened to the bot-created articles about
Polish villages and small towns, which were the very first bot
production in our Wikipedia. As I was strongly opposed several years
ago to producing bot-created articles but failed to persuade my fellow
Wikipedians, I just wanted to prove that it was indeed a bad idea.
However, the study showed that around 70% of them were effectively
expanded by humans. Villagers added quite a lot of useful material to
these articles, such as histories of their villages, pictures of
interesting buildings, etc. Can you explain whether these articles are
treated as "real" or "not real" in your statistics, and why?


--
Tomek "Polimerek" Ganicz
http://pl.wikimedia.org/wiki/User:Polimerek
http://www.ganicz.pl/poli/
http://www.ptchem.lodz.pl/en/TomaszGanicz.html


Re: Tragical dynamics: that run for the number of articles

by Andre Engels

On Sat, Jun 28, 2008 at 9:47 AM, Tomasz Ganicz <polimerek@...> wrote:

> Can you explain how this evaluation was done? How do you distinguish
> between "real" and other articles? In particular, I don't believe the
> statistics shown for en Wikipedia. I have a feeling that there are many
> more bot-created articles in en Wikipedia than your statistics show.

That is described in his first mail: He did 'random article' 50 times
and used that as a sample.


--
Andre Engels, andreengels@...
ICQ: 6260644 -- Skype: a_engels


Re: Tragical dynamics: that run for the number of articles

by Tomasz Ganicz

2008/6/28 Andre Engels <andreengels@...>:

> On Sat, Jun 28, 2008 at 9:47 AM, Tomasz Ganicz <polimerek@...> wrote:
>
>> Can you explain how this evaluation was done? How do you distinguish
>> between "real" and other articles? In particular, I don't believe the
>> statistics shown for en Wikipedia. I have a feeling that there are many
>> more bot-created articles in en Wikipedia than your statistics show.
>
> That is described in his first mail: He did 'random article' 50 times
> and used that as a sample.
>

Well, it is not described; I mean that no clear evaluation criteria
are mentioned.
Does he speak Japanese or Polish? Is it possible to recognize "real"
and "unreal" articles without understanding them?

Compare:

http://he.wikipedia.org/wiki/%D7%9C%D7%95%D7%93%D7%96%27

Is it a "real" or an "unreal" article, and why? I have a feeling that
it is bot-created, but I am not sure about it, as I don't speak Hebrew :-)

And what about this:

http://uk.wikipedia.org/wiki/%D0%A4%D1%96%D0%B3%D1%83%D0%BB%D1%81_%D1%96_%D0%90%D0%BB%D1%96%D0%BD%D1%8C%D1%8F

It is quite long, but I am almost sure that it is bot-created and
untouched by any human, because it contains only statistical data and
sentences that look machine-generated. I don't speak Ukrainian well
but understand it a little. But this is still just my feeling...

It is funny that this article is longer than the corresponding one in
es-Wikipedia, although the Spanish one was certainly edited by humans :-)

http://es.wikipedia.org/wiki/F%C3%ADgols_y_Ali%C3%B1%C3%A1

And moreover, if you check all the Wikipedias that contain an article
about Fígols i Alinyà, only the Spanish one looks as if it was edited
by a human (but this is just my feeling; I may be wrong).

And this:

http://ta.wikipedia.org/wiki/%E0%AE%B5%E0%AE%BE%E0%AE%B0%E0%AF%8D%E0%AE%9A%E0%AE%BE

real or not real? I really don't know, probably bot-created :-)

I think that if we would like to perform a serious evaluation of "real"
and "unreal" articles, it should be based on clear criteria rather than
on "feelings", done on larger samples (at least 500 articles), and
carried out by people who understand what they are reading.
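
Why the sample size matters can be illustrated with a rough calculation. The sketch below uses the textbook normal approximation for the margin of error of a sampled proportion, which is an assumption brought in here and not a method anyone in this thread used. The share of 0.64 is the pl.WP quota reported earlier in the thread, and the function name is hypothetical.

```python
import math


def margin_of_error(share: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion estimated from
    a simple random sample (normal approximation, z = 1.96)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie between 0 and 1")
    return z * math.sqrt(share * (1.0 - share) / sample_size)


# Compare 50 clicks with the suggested minimum of 500 articles.
for n in (50, 500):
    print(n, round(margin_of_error(0.64, n), 3))
```

Under this approximation, 50 clicks leave an uncertainty of about 13 percentage points either way, while 500 articles narrow it to about 4. The approximation is crude for shares close to 0 or 1, and it says nothing about the separate problem of judging articles in a language one cannot read.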


--
Tomek "Polimerek" Ganicz
http://pl.wikimedia.org/wiki/User:Polimerek
http://www.ganicz.pl/poli/
http://www.ptchem.lodz.pl/en/TomaszGanicz.html


Re: Tragical dynamics: that run for the number of articles

by Ziko van Dijk

There is Google Translate, and the interwikis help as well. I would
count that he.WP article about Lodz as a real article, because it
contains more information than a database entry (links to
Holocaust-related articles, something about the 19th century, the
economy (textiles)).
Indeed, I would like to devise a more scientific scheme and apply it to
a larger sample; maybe a research group will be established for this. I
believe that my method does give a reasonable picture; of course,
whether my results say "50,000" real articles or "52,000" is not
really a measurable difference.
Ziko

PS: By the way, it is fun to browse a foreign-language Wikipedia with
the help of Google Translate: not perfect, but it is interesting to see
what others write about.





--
Ziko van Dijk
NL-Silvolde


Re: Tragical dynamics: that run for the number of articles

by Tomasz Ganicz

2008/6/28 Ziko van Dijk <zvandijk@...>:
> There is Google Translate, and the interwikis help as well. I would
> count that he.WP article about Lodz as a real article, because it
> contains more information than a database entry (links to
> Holocaust-related articles, something about the 19th century, the
> economy (textiles)).
> Indeed, I would like to devise a more scientific scheme and apply it to
> a larger sample; maybe a research group will be established for this. I
> believe that my method does give a reasonable picture; of course,
> whether my results say "50,000" real articles or "52,000" is not
> really a measurable difference.

Sorry, but this only shows that your results are not reliable,
because they are based on your feelings and poor-quality machine
translations which could change in unpredicta