
Wiktionary size, format, long tail of languages

by Lars Aronsson



There's a list of Wiktionaries by raw size at
http://meta.wikimedia.org/wiki/Wiktionary#List_of_Wiktionaries

Do all Wiktionaries follow the same format, with one wiki article
per word, containing sections for language / part of speech /
aspects and then numbered lists for meanings?  E.g.

[[Snow]]
==English==
===Noun===
# The frozen, crystalline state of water
# A shade of white
# Random electrical noise
====Derived terms====
====Translations====
===Verb===
# Weather when snow is falling
# Bluff draw in poker
====Derived terms====
====Translations====

Or is there any Wiktionary that breaks this pattern?  Does this
pattern have a name?  What do you call it when/if some Wiktionary
breaks this pattern?

How did we end up with disambiguation pages on Wikipedia, strictly
keeping one page per meaning of a word, but not on Wiktionary?  
Is that because Wiktionary spun off before disambiguation pages
were invented on Wikipedia, and the news never spread to
Wiktionary?  Or is it because the Oxford English Dictionary
differs from Encyclopaedia Britannica in this respect, and we want
to keep the best practice?  Or why?  One could say that all
meanings of "snow" are the same word (by etymology), and should
logically be on one page.  But this is not true of "pen"
(etymologies 1-4), nor of foreign words with the same spelling
that are kept on the same page (Norwegian "pen" meaning "fine").
Has there been a discussion about this, and where can it be found? I
found something from December 2002,
http://en.wiktionary.org/wiki/Wiktionary_talk:Entry_layout_explained/archive_2002 
But the voice of reason, Imran, left the project a year later.
Another discussion took place in December 2005,
http://en.wiktionary.org/wiki/Wiktionary:Beer_parlour_archive/October-December_05#Basic_flaw_in_Wiktionary--What_is_a_.27word.27.3F.3F 
(It appears to be a December issue, so I apologize for bringing it
up a few weeks early this year.)

In the English Wiktionary, what percentage of words are in
English?  And is the "long tail" of foreign languages similar
across all Wiktionaries?  Is there any major Wiktionary that has a
higher concentration of words in its own language?

If the above pattern holds, a simple count of all level-2 headings
from the database dump could give the answer.  For example, in the
dump of the Swedish Wiktionary, having 46500 articles and being
the 13th biggest, these level-2 headings appear most frequently:

   2510 ==Svenska==              Swedish
   1847 ==Tvärspråkligt==        Translingual
    625 ==Engelska==             English
    343 ==Historik==             Etymology
    267 ==Tyska==                German
    245 ==Danska==               Danish
    230 ==Norska==               Norwegian
    217 ==Spanska==              Spanish
    217 ==Franska==              French
    192 ==Italienska==           Italian
    184 ==Nederländska==         Dutch
    169 ==Finska==               Finnish
    152 ==Polska==               Polish
    135 ==Serbiska==             Serbian
    122 ==Rumänska==             Romanian
    116 ==Interlingua==          Interlingua
    109 ==Ungerska==             Hungarian
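
For anyone who wants to repeat the count on another dump, here is a
minimal sketch in Python. It is not necessarily how the numbers above
were produced. It assumes the dump has been decompressed to plain text
and that each heading sits on a line of its own; the function name
count_level2_headings is made up for illustration. A heading that
shares a line with the XML <text> tag (the first line of a page) would
be missed by this simple line match.

```python
import re
from collections import Counter

# Matches a level-2 heading such as "==Svenska==", but not deeper
# headings such as "===Noun===" or "====Translations====".
LEVEL2 = re.compile(r"^==([^=]+)==\s*$")

def count_level2_headings(lines):
    """Count how often each level-2 heading occurs in an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = LEVEL2.match(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical sample input; a real run would iterate over the dump file.
    sample = [
        "==English==",
        "===Noun===",
        "# The frozen, crystalline state of water",
        "====Translations====",
        "==English==",
        "==Svenska==",
    ]
    for heading, n in count_level2_headings(sample).most_common():
        print(f"{n:7d} =={heading}==")
```

Passing an open file object instead of the sample list works the same
way, since the function only iterates over lines.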




--
  Lars Aronsson (lars@...)
  Aronsson Datateknik - http://aronsson.se


_______________________________________________
Wiktionary-l mailing list
Wiktionary-l@...
http://lists.wikimedia.org/mailman/listinfo/wiktionary-l
