[Mediawiki-i18n] [Wikitech-l] Now live: Shared structured data

mathieu stumpf guntz psychoslave at culture-libre.org
Fri Dec 30 14:26:45 UTC 2016


Since this is, to my mind, a very interesting topic, I searched a bit more.

https://www.w3.org/International/articles/article-text-size.en
     which quotes 
http://www-01.ibm.com/software/globalization/guidelines/a3.html

According to the latter, for English source strings over 70 characters, 
you might expect the translation to average 130% of the original length. 
So, with an admittedly very loose inference, the 400 character limit for 
all languages is equivalent to a 307 character limit for English 
(400 / 1.3 ≈ 307). Would you say it would seem OK to have a 307 
character limit there?
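
A back-of-the-envelope check of that inference, as a Lua snippet, 
assuming a uniform 1.3 expansion factor across the whole string:

    -- Effective source-language limit when translations may grow to
    -- `expansion` times the original and storage is capped at `limit`.
    local function effective_source_limit(limit, expansion)
        return math.floor(limit / expansion)
    end

    print(effective_source_limit(400, 1.3))  -- prints 307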


On 29/12/2016 at 12:11, mathieu stumpf guntz wrote:
>
>
> On 28/12/2016 at 23:08, Yuri Astrakhan wrote:
>> The 400 char limit is to be in sync with Wikidata, which has the same
>> limitation. The origin of this limit is to encourage storage of 
>> "values"
>> rather than full strings (sentences).
> Well, that's probably not the best constraint for a glossary then. To 
> my mind, a 400 char limit regardless of the language is rather 
> surprising. Surely you can tell much more with a set of 400 ideograms 
> than with, well, whichever language happens to have the longest 
> average sentence length (any idea?). Also, at least for some 
> translation pairs, there is a tendency for translations to be longer 
> than the original[1].
>
> [1] http://www.sid.ir/en/VEWSSID/J_pdf/53001320130303.pdf
>>   Also, it discourages storage of wiki
>> markup.
> What about disallowing it explicitly? You might even enforce that with 
> a quick parse that prevents saving, or simply show a reminder when 
> such a string is detected, so as to avoid blocking users in legitimate 
> corner cases.
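
[For illustration, a naive check along these lines could be written in 
Lua; the patterns below are an assumption about which markup is worth 
flagging, not an exhaustive wikitext parser:]

    -- Flag strings that appear to contain wiki markup so the editor
    -- can be warned, or the save rejected, before the value is stored.
    local markup_patterns = {
        "%[%[",  -- wikilink opening
        "{{",    -- template invocation
        "'''",   -- bold (italics markers would need a separate check)
        "<%a+",  -- HTML-like tag such as <ref> or <span>
    }

    local function looks_like_wikitext(s)
        for _, pattern in ipairs(markup_patterns) do
            if s:find(pattern) then
                return true
            end
        end
        return false
    end

    -- looks_like_wikitext("see [[Earth]]")     --> true
    -- looks_like_wikitext("plain definition")  --> false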
>
>>
>> On Wed, Dec 28, 2016, 16:45 mathieu stumpf guntz <
>> psychoslave at culture-libre.org> wrote:
>>
>>> Thank you Yuri. Is there some rational explanation behind these 
>>> limits? I
>>> understand the limit as a performance concern, and 2 MB seems already
>>> very large for the intended glossaries. But 400 chars might be
>>> problematic for some definitions I guess, especially since
>>> translations can lead to varying length needs.
>>>
>>>
>>> On 25/12/2016 at 17:03, Yuri Astrakhan wrote:
>>>> Hi Mathieu, yes, I think you can totally build up this glossary in a
>>>> dataset. Just remember that each string can be no longer than 400 
>>>> chars,
>>>> and the total size must stay under 2 MB.
>>>>
>>>> On Sun, Dec 25, 2016, 10:45 mathieu stumpf guntz <
>>>> psychoslave at culture-libre.org> wrote:
>>>>
>>>>> Hi Yuri,
>>>>>
>>>>> Seems very interesting. Am I wrong in thinking this could help to
>>>>> create a multilingual glossary as drafted in
>>>>> https://phabricator.wikimedia.org/T150263#2860014 ?
>>>>>
>>>>>
>>>>> On 22/12/2016 at 20:30, Yuri Astrakhan wrote:
>>>>>> Gift season! We have launched structured data on Commons,
>>>>>> available from all wikis.
>>>>>>
>>>>>> TL;DR: One data store, used everywhere. Upload tabular data to
>>>>>> Commons, with localization, and use it to build wiki tables and
>>>>>> lists, or use it directly in graphs. Works for GeoJSON maps too.
>>>>>> Must be licensed as CC0. Try this per-state GDP map demo, and
>>>>>> select multiple years. More demos at the bottom.
>>>>>>
>>>>>> US Map state highlight
>>>>>> <https://en.wikipedia.org/wiki/Template:Graph:US_Map_state_highlight>
>>>>>>
>>>>>> Data can now be stored as *.tab and *.map pages in the data
>>>>>> namespace on Commons. That data may contain localization, so a
>>>>>> table cell could be in multiple languages. And that data is
>>>>>> accessible from any wiki, by Lua scripts, Graphs, and Maps.
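
[For illustration, a minimal .tab page is a JSON document along the 
lines below; the layout follows Help:Tabular_Data, but the field names 
and values are invented:]

    {
        "license": "CC0-1.0",
        "description": { "en": "Sample dataset" },
        "schema": {
            "fields": [
                { "name": "year",  "type": "number",
                  "title": { "en": "Year" } },
                { "name": "label", "type": "localized",
                  "title": { "en": "Label" } }
            ]
        },
        "data": [
            [ 2015, { "en": "hello", "fr": "bonjour" } ],
            [ 2016, { "en": "world", "fr": "monde" } ]
        ]
    }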
>>>>>>
>>>>>> Lua lets you generate wiki tables from the data by filtering,
>>>>>> converting, mixing, and formatting the raw data. Lua also lets you
>>>>>> generate lists, or any other wiki markup.
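
[A minimal sketch of such a Lua module, assuming the mw.ext.data.get 
interface described in Help:Tabular_Data; the dataset name is a 
placeholder:]

    -- Render a Commons tabular dataset as a basic wikitable.
    local p = {}

    function p.show(frame)
        local tab = mw.ext.data.get('Sandbox/Example.tab')
        local out = { '{| class="wikitable"' }

        -- Header row from the schema's field names.
        local header = {}
        for _, field in ipairs(tab.schema.fields) do
            table.insert(header, field.name)
        end
        table.insert(out, '! ' .. table.concat(header, ' !! '))

        -- One table row per data record; tostring() hedges against
        -- non-string cell values.
        for _, row in ipairs(tab.data) do
            local cells = {}
            for _, value in ipairs(row) do
                table.insert(cells, tostring(value))
            end
            table.insert(out, '|-')
            table.insert(out, '| ' .. table.concat(cells, ' || '))
        end

        table.insert(out, '|}')
        return table.concat(out, '\n')
    end

    return p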
>>>>>>
>>>>>> Graphs can use both .tab and .map directly to visualize the data
>>>>>> and let users interact with it. The GDP demo above uses a map from
>>>>>> Commons, and colors each segment based on values from a data table.
>>>>>>
>>>>>> Kartographer (<maplink>/<mapframe>) can use the .map data as an
>>>>>> extra layer on top of the base map. This way we can show an
>>>>>> endangered species' habitat.
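
[For illustration, Help:Map_Data describes pulling such a layer into a 
<mapframe> with an ExternalData reference; the page title and 
coordinates here are invented:]

    <mapframe width="400" height="300" zoom="6"
              latitude="35.8" longitude="-106.4">
    {
        "type": "ExternalData",
        "service": "page",
        "title": "Sample habitat.map"
    }
    </mapframe>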
>>>>>> == Demo ==
>>>>>> * Raw data example
>>>>>> <https://commons.wikimedia.org/wiki/Data:Weather/New_York_City.tab>
>>>>>> * Interactive Weather data
>>>>>> <https://en.wikipedia.org/wiki/Template:Graph:Weather_monthly_history>
>>>>>> * Same data in Weather template
>>>>>> <https://en.wikipedia.org/wiki/User:Yurik/WeatherDemo>
>>>>>> * Interactive GDP map
>>>>>> <https://en.wikipedia.org/wiki/Template:Graph:US_Map_state_highlight>
>>>>>> * Endangered Jemez Mountains salamander - habitat
>>>>>> <https://en.wikipedia.org/wiki/Jemez_Mountains_salamander#/maplink/0>
>>>>>> * Population history
>>>>>> <https://en.wikipedia.org/wiki/Template:Graph:Population_history>
>>>>>> * Line chart <https://en.wikipedia.org/wiki/Template:Graph:Lines>
>>>>>>
>>>>>> == Getting started ==
>>>>>> * Try creating a page at data:Sandbox/<user>.tab on Commons. Don't
>>>>>> forget the .tab extension, or it won't work.
>>>>>> * Try using some data with the Line chart graph template.
>>>>>> A thorough guide is needed; help is welcome!
>>>>>>
>>>>>> == Documentation links ==
>>>>>> * Tabular help <https://www.mediawiki.org/wiki/Help:Tabular_Data>
>>>>>> * Map help <https://www.mediawiki.org/wiki/Help:Map_Data>
>>>>>> If you find a bug, create a Phabricator ticket with the
>>>>>> #tabular-data tag, or comment on the documentation talk pages.
>>>>>>
>>>>>> == FAQ ==
>>>>>> * Relation to Wikidata: Wikidata is about "facts" (small pieces of
>>>>>> information). Structured data is about "blobs" - large amounts of
>>>>>> data like the historical weather or the outline of the state of
>>>>>> New York.
>>>>>>
>>>>>> == TODOs ==
>>>>>> * Add a nice "table editor" - editing JSON by hand is cruel. T134618
>>>>>> * "What links here" should track data usage across wikis. Will allow
>>>>>> quicker auto-refresh of the pages too. T153966
>>>>>> * Support data redirects. T153598
>>>>>> * Mega epic: Support external data feeds.



