Wiktionary quality issues


Wiktionary quality issues

by Gerard Meijssen-3

On the Wiktionary <http://wiktionary.org/> project I run the interwiki bot.
The process is simple: when an article with exactly the same spelling exists
on another language's Wiktionary, I create an "interwiki" link. This lets you
see the information on the other Wiktionary. The process is automated and
unattended, and it runs on all Wiktionaries.
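
As a rough illustration of the matching rule just described (this is not the
bot's actual code), the sketch below links a title to every other Wiktionary
that has a page with exactly the same spelling. The in-memory dictionary of
titles and the function names are assumptions made for the example; only the
`[[lang:title]]` link form is standard wiki syntax.

```python
# Hypothetical model: language code -> set of page titles on that Wiktionary.
def interwiki_langs(title, titles_by_wiki, home):
    """Language codes of other wikis that have a page with exactly this title."""
    return sorted(
        lang
        for lang, titles in titles_by_wiki.items()
        if lang != home and title in titles
    )


def format_links(title, langs):
    """Render the interwiki links as wikitext, one per language."""
    return [f"[[{lang}:{title}]]" for lang in langs]


titles_by_wiki = {
    "pl": {"dispersion", "kot"},
    "ru": {"dispersion"},
    "vi": {"dispersion", "kot"},
    "nl": {"kat"},
}

langs = interwiki_langs("dispersion", titles_by_wiki, home="pl")
print(format_links("dispersion", langs))
```

Because the rule is a plain string comparison, it needs no knowledge of any
language, which is presumably why it can run unattended.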

I have received a request from the Polish Wiktionary to stop adding
interwiki links for the Russian and for the Vietnamese Wiktionary. The
reason given is one of quality. On the Russian Wiktionary many of the
articles are created by a bot and they do not provide good information. An
example is dispersion <http://ru.wiktionary.org/wiki/dispersion>; there is
really nothing in there. The Vietnamese Wiktionary is more problematic
because a bot was used to generate declension and conjugation tables of
Russian words and they got it wrong.

The Russian Wiktionary has some 81,000 empty shells and refuses to remove
them. The Vietnamese are not willing to remove their incorrect data.

I have been asked to stop including the Russian Wiktionary and the
Vietnamese Wiktionary when I run the interwiki process. To be honest, I run
the bot as a service and I do not think excluding them is the right thing to
do. I think the Vietnamese are wrong not to correct the wrong data that they
have. I am less sure about the Russian approach; in essence each page is a
stub. However, creating a Wiktionary in this way is like stamp collecting:
you can look at it, but there is no information in it.

Given how the process works, I am not sure that I can exclude either the
Russian or the Vietnamese Wiktionary. The way it works is that I run
explicitly on all Wiktionaries. When I exclude Russian or Vietnamese, I will
probably end up removing all references to these projects. They are the
third and fourth largest Wiktionaries.

When I do not exclude the Russian and the Vietnamese Wiktionary, the bot may
end up being blocked on the Polish Wiktionary. This will also kill off the
interwiki process.

From my point of view, using bots to generate content in a Wiktionary only
makes sense when there is at least a link to the word in the base language.
When the initial creation of stubs is followed by the enrichment of these
stubs it is acceptable. For having information that is completely wrong,
there is no excuse.

The question is whether there will be a discussion about acceptable
practices in Wiktionary. The questions are:

   - Can the Polish demand what they do?
   - Is having a project that consists mainly of stubs acceptable?
   - Is having incorrect data acceptable?

Thanks,
GerardM

PS I copied this from my blog.
_______________________________________________
Wiktionary-l mailing list
Wiktionary-l@...
http://lists.wikimedia.org/mailman/listinfo/wiktionary-l

Re: Wiktionary quality issues

by Muke Tever

GerardM <gerard.meijssen@...> wrote:
>    - Can the Polish demand what they do?

Absolutely.  You continue saying your bot is a "service" but a service
works for the people who need it and does what they want; it doesn't (except
perhaps incidentally) work for the person providing it, doing what he wants.

>    - Is having a project that consists mainly of stubs acceptable?

Stubs? Yes.  When I worked with the English Wikipedia it was mainly stubs.

The Russian example, though, is more a project that has been pre-seeded with
templates.  There is nothing wrong with this in itself--though it does inflate
the page count--and we have already gone over the usefulness of knowing a word
exists in a language.

>    - Is having incorrect data acceptable?

Isn't it the point of wiki that one has incorrect and incomplete data, but that
one is building a community who will take the effort to improve it?  In such
a case you would, rather than wanting to hide the links, make the information
_more_ public so, say, Russian visitors curious to see how the Vietnamese handle
their words can contribute to correcting the information.  (After all--was this
problem brought to your attention by vi.wikt regulars, or by those following
interwiki links to it?)


        *Muke!
--
website:     http://frath.net/


Re: Wiktionary quality issues

by the dave ross

If your bot were blocked on a given wiki all that would happen is that your
bot could no longer edit their entries.  Your bot could still get data from
that wiki, and it could still write that data to all other wikis.  Sounds
like a painless control over the bot, and one that any wiki which doesn't
want that interwiki data should use.  How do you figure that either solution
will actually affect the process as a whole, anyway?
-Dave
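
The behaviour Dave describes can be sketched as follows. This is only an
illustration of the suggestion, under the assumption that the bot keeps a
per-wiki plan of links to write; the names `plan_writes`, `links_by_wiki` and
`blocked` are hypothetical and do not come from the real bot.

```python
def plan_writes(links_by_wiki, blocked):
    """Split planned edits into those to perform and those to skip.

    links_by_wiki: target wiki -> language codes it should link to.
    blocked: set of wikis on which the bot may not edit.
    A blocked wiki is still read as a link source; it is only never written to.
    """
    to_write, skipped = {}, []
    for wiki, links in links_by_wiki.items():
        if wiki in blocked:
            skipped.append(wiki)  # cannot edit here, carry on with the others
        else:
            to_write[wiki] = links
    return to_write, skipped


links_by_wiki = {
    "pl": ["ru", "vi"],
    "ru": ["pl", "vi"],
    "vi": ["pl", "ru"],
}
to_write, skipped = plan_writes(links_by_wiki, blocked={"pl"})
print(to_write)
print(skipped)
```

In this sketch a block on pl costs only the pl edits; the ru and vi pages
still receive their links, including links that point to pl.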

On 4/21/07, Muke Tever <muke@...> wrote:

> GerardM <gerard.meijssen@...> wrote:
> >    - Can the Polish demand what they do?
>
> Absolutely.  You continue saying your bot is a "service" but a service
> works for the people who need it and does what they want; it doesn't (except
> perhaps incidentally) work for the person providing it, doing what he wants.
>
> >    - Is having a project that consists mainly of stubs acceptable?
>
> Stubs? Yes.  When I worked with the English Wikipedia it was mainly stubs.
>
> The Russian example, though, is more a project that has been pre-seeded with
> templates.  There is nothing wrong with this in itself--though it does inflate
> the page count--and we have already gone over the usefulness of knowing a word
> exists in a language.
>
> >    - Is having incorrect data acceptable?
>
> Isn't it the point of wiki that one has incorrect and incomplete data, but that
> one is building a community who will take the effort to improve it?  In such
> a case you would, rather than wanting to hide the links, make the information
> _more_ public so, say, Russian visitors curious to see how the Vietnamese handle
> their words can contribute to correcting the information.  (After all--was this
> problem brought to your attention by vi.wikt regulars, or by those following
> interwiki links to it?)
>
>         *Muke!
> --
> website:     http://frath.net/

Re: Wiktionary quality issues

by Minh Nguyen-2

Thanks for bringing this up, Gerard. As I noted on your blog entry,
we're aware of the problem and are working to correct it. Right now,
we're about to blank the existing templates and import new ones directly
from the Russian Wiktionary. [1] There aren't many of us working on the
Vietnamese Wiktionary, as I've said before, and only one of us knows any
Russian (and not that much). That's why it's taken so long for anyone to
notice the mistakes. You're all welcome to join in on our discussion.

I would oppose delisting the Vietnamese Wiktionary on the grounds that
our Vietnamese, English, and French entries -- which make up the vast
majority of our site -- are rather good. In fact, the source that we
used for all our imports is *the* Vietnamese translationary on the Web.
It's just the conjugation tables that PiedBot created that are the
problem. I think having your bot distinguish between the Russian and
non-Russian entries would be more trouble than it's worth.

By the way, you might want to have a look at the Lombard Wikipedia
sometime. They have thousands of articles in English that claim to be in
a variety of Lombard. [2] By comparison, the Russian Wiktionary doesn't
look that bad. :)

[1]
<http://vi.wiktionary.org/wiki/Thảo_luận_Thành_viên:David#Re:_.5B.5BTh.E1.BA.A3o_lu.E1.BA.ADn_Th.C3.A0nh_vi.C3.AAn:Mxn.23Russian_conjugations.7CRussian_conjugations.5D.5D>
[2]
<http://lmo.wikipedia.org/wiki/14th_Street_(IRT_Broadway-Seventh_Avenue_Line)>


--
Minh Nguyen <mxn@...>
[[en:User:Mxn]] [[vi:User:Mxn]] [[m:User:Mxn]]
AIM: trycom2000; Jabber: mxn@...; Blog: http://mxn.f2o.org/



Re: Wiktionary quality issues

by Dmcdevit

GerardM wrote:
> On the Russian Wiktionary many of the
> articles are created by a bot and they do not provide good information. An
> example is dispersion <http://ru.wiktionary.org/wiki/dispersion>; there is
> really nothing in there.
Would it be possible to have the Russian bot creating content-free
articles include some kind of tag, to be removed by a human editor when
adding content, that the interwiki bot could recognize? Ideally, we
should not link to non-content, but that is preferable to not linking to
ruwikt at all.
> The Vietnamese Wiktionary is more problematic
> because a bot was used to generate declension and conjugation tables of
> Russian words and they got it wrong.
I agree with Muke here. Factual inaccuracies are undesirable, but all
Wiktionaries have them to some extent, and it is not another
Wiktionary's job to police them. It is inherent to the wiki process that
there will always be room for improvement; excluding interwiki links for
inaccuracies is unworkable. There are many good viwikt articles, and
there will be more good viwikt articles in the future, regardless of
their problems. At the same time, this is a plwikt local issue, and if
they develop consensus on the matter, I would feel uncomfortable
imposing any outsiders' rules on them.
> Given how the process works, I am not sure that I can exclude either the
> Russian or the Vietnamese Wiktionary. The way it works is that I run
> explicitly on all Wiktionaries. When I exclude Russian or Vietnamese, I will
> probably end up removing all references to these projects. They are the
> third and fourth largest Wiktionaries.
>
> When I do not exclude the Russian and the Vietnamese Wiktionary, the bot may
> end up being blocked on the Polish Wiktionary. This will also kill off the
> interwiki process.
Does this mean that you couldn't just have the bot not add Russian and
Vietnamese interwiki links to plwikt only? Even if we don't like the
policy of excluding certain projects' interwiki links, it is better than
having no links.
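
A per-target exclusion of the kind asked about here could look roughly like
the sketch below. It assumes the bot can consult a small configuration table
before writing to a wiki; the table `EXCLUDED_SOURCES` and the function
`links_for_target` are made up for illustration, and nothing in this thread
says the existing bot supports such a setting.

```python
# Hypothetical configuration: target wiki -> source wikis it does not want linked.
EXCLUDED_SOURCES = {"pl": {"ru", "vi"}}


def links_for_target(target, candidate_langs, excluded=EXCLUDED_SOURCES):
    """Languages to link from `target`, honouring that wiki's own exclusions."""
    banned = excluded.get(target, set())
    return [
        lang
        for lang in candidate_langs
        if lang != target and lang not in banned
    ]


candidates = ["en", "pl", "ru", "vi"]
print(links_for_target("pl", candidates))  # plwikt gets no ru or vi links
print(links_for_target("en", candidates))  # other wikis are unaffected
```

The exclusion applies only to pages on the excluding wiki, so ru and vi keep
their incoming links from every other project.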

Dominic


Re: Wiktionary quality issues

by Gerard Meijssen-3

Hoi,
When my bot is blocked on one Wiktionary it does not work at all any
more. The way it is configured is that it works on all projects at all
times. This is completely different from how it works on the Wikipedia
projects. It is also why one guy can run this as a service.
Thanks,
    Gerard


Re: Wiktionary quality issues

by Rovdyr

Hi,
I would like to draw your attention to one issue. We should think about the
aim of the project. Allowing the creation of an enormous number of stubs has
many effects. I definitely agree that it helps with filling in the rather
difficult Russian template, and that the Russian editors have less work
checking articles created by newcomers. However, it may influence the way
the Russian Wiktionary is perceived. It seems obvious to me that all projects
aim to be reliable sources. Users expect that a dictionary claiming over
100,000 entries actually has them. If 8 out of 11 words they look up in one
sitting lead to completely empty articles, they will probably not come back
to the dictionary, as it is a waste of their time. The same goes for
interwiki links. People often want to compare articles. The Russians write
really fantastic ones; I like the very specific and precise information they
give, but empty templates are really discouraging. I would even dare to say
that leading people to empty pages through interwiki links is disrespectful
to them. I write all this because I believe that Wiktionaries are created to
serve people and that they should be as ergonomic as possible.

While looking through posts on this topic, here and on the Russian
Wiktionary, I came across the idea that templates show how much there is to
do and encourage people to fill them in. Well, if we knew how many
unregistered users who search Wiktionaries also add some sort of
information, compared with those who do not, we could say to what extent
this is true. But as we do not know (or do we?), it is safer to assume that
a user is searching for information rather than willing to share his
knowledge.

Somebody also mentioned that templates may function as a spell-checker: an
article may simply inform you that a word exists. But who really needs that,
considering that everybody has Microsoft Word or OpenOffice Writer with a
pretty good spell-checker that, moreover, suggests the correct spelling?

I liked the idea that the bot would recognise a mark indicating an empty
template and would then not link to the page. Is that possible?
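
In principle such a check is simple, provided the stub-creating bot leaves a
recognisable marker in the page text and human editors remove it when they
add content. A minimal sketch under that assumption; the marker name
`{{empty-stub}}` and the sample page texts are invented for the example and
are not an existing Russian Wiktionary template or real page content.

```python
STUB_MARKER = "{{empty-stub}}"  # hypothetical tag left by the stub-creating bot


def has_content(wikitext, marker=STUB_MARKER):
    """A page counts as having content once an editor has removed the marker."""
    return marker not in wikitext


def linkable_langs(pages_by_wiki):
    """Language codes whose page for this title is worth an interwiki link."""
    return sorted(
        lang for lang, text in pages_by_wiki.items() if has_content(text)
    )


# Invented sample wikitext for one title on three Wiktionaries.
pages = {
    "ru": "{{empty-stub}}\n== dispersion ==",
    "en": "== dispersion ==\nThe state of being dispersed.",
    "vi": "== dispersion ==\nsự phân tán",
}
print(linkable_langs(pages))
```

The cost is that the interwiki bot must fetch the text of each candidate
page instead of only checking that the title exists.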

I am so much against leaving lacunae because we have a great opportunity to
build on Wikipedia's reputation as a very good source of information, and I
am worried that by doing what the Russians do we may spoil it.


I wonder what you think.
Helena Polyak / Rovdyr (PL)

Re: Wiktionary quality issues

by Ray Saintonge

Gerard Meijssen wrote:

>Hoi,
>When my bot is blocked on one Wiktionary it does not work at all any
>more. The way it is configured is that it works on all projects at all
>times. This is completely different from how it works on the Wikipedia
>projects. It is also why one guy can run this as a service.
>Thanks,
>    Gerard
>
>the dave ross wrote:
>
>>If your bot were blocked on a given wiki all that would happen is that your
>>bot could no longer edit their entries.  Your bot could still get data from
>>that wiki, and it could still write that data to all other wikis.  Sounds
>>like a painless control over the bot, and one that any wiki which doesn't
>>want that interwiki data should use.  How do you figure that either solution
>>will actually affect the process as a whole, anyway?

Then reconfigure it so that its operation can be blocked on a project
that doesn't want it.  In the interest of autonomy of projects, a
particular Wiktionary should be able to block it the way it blocks any
other user.

Ec



Re: Wiktionary quality issues

by Gerard Meijssen-3

Hoi,
It is precisely because the projects are not independent that this bot can
function in the first place. I cannot reconfigure the bot either. Also, when
a Wiktionary is excluded, its links will probably be removed everywhere.
This is imho NOT a good idea. The Vietnamese currently have a problem that
will get fixed. Having all their interwiki links removed is imho a BAD idea.
Thanks,
     GerardM

On 4/24/07, Ray Saintonge <saintonge@...> wrote:

> Gerard Meijssen wrote:
>
> >Hoi,
> >When my bot is blocked on one Wiktionary it does not work at all any
> >more. The way it is configured is that it works on all projects at all
> >times. This is completely different from how it works on the Wikipedia
> >projects. It is also why one guy can run this as a service.
> >Thanks,
> >    Gerard
> >
> >the dave ross wrote:
> >
> >>If your bot were blocked on a given wiki all that would happen is that your
> >>bot could no longer edit their entries.  Your bot could still get data from
> >>that wiki, and it could still write that data to all other wikis.  Sounds
> >>like a painless control over the bot, and one that any wiki which doesn't
> >>want that interwiki data should use.  How do you figure that either solution
> >>will actually affect the process as a whole, anyway?
> >
> Then reconfigure it so that its operation can be blocked on a project
> that doesn't want it.  In the interest of autonomy of projects, a
> particular Wiktionary should be able to block it the way it blocks any
> other user.
>
> Ec