Re: [Wikitech-l] Now live: Shared structured data

2016-12-22 Thread Brad Jorsch (Anomie)
On Thu, Dec 22, 2016 at 2:30 PM, Yuri Astrakhan 
wrote:

> Gift season! We have launched structured data on Commons, available from
> all wikis.
>

I was momentarily excited, then I read a little farther and discovered this
isn't about https://commons.wikimedia.org/wiki/Commons:Structured_data.


-- 
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Now live: Shared structured data

2016-12-22 Thread Yuri Astrakhan
Yes, there seems to have been a bit of a naming collision. Tabular data and
map data have been jointly known as structured data, but there is also the
Structured Data project, which IMO should be called the Structured Metadata
project :) Naming suggestions are welcome!

P.S. Brad, I'm sorry tabular and map data did not excite you :(

On Thu, Dec 22, 2016 at 2:38 PM Brad Jorsch (Anomie) 
wrote:

> On Thu, Dec 22, 2016 at 2:30 PM, Yuri Astrakhan 
> wrote:
>
> > Gift season! We have launched structured data on Commons, available from
> > all wikis.
> >
>
> I was momentarily excited, then I read a little farther and discovered this
> isn't about https://commons.wikimedia.org/wiki/Commons:Structured_data.

Re: [Wikitech-l] Now live: Shared structured data

2016-12-22 Thread David Cuenca Tudela
On Thu, Dec 22, 2016 at 8:38 PM, Brad Jorsch (Anomie)  wrote:

> On Thu, Dec 22, 2016 at 2:30 PM, Yuri Astrakhan 
> wrote:
>
> > Gift season! We have launched structured data on Commons, available from
> > all wikis.
> >
>
> I was momentarily excited, then I read a little farther and discovered this
> isn't about https://commons.wikimedia.org/wiki/Commons:Structured_data.
>

Same here, I think it needs a better name...

What about calling it datasets or structured datasets?

Cheers,
Micru

Re: [Wikitech-l] Now live: Shared structured data

2016-12-22 Thread Yuri Astrakhan
Micru, thanks, I think Datasets sounds like a good name too!

On Thu, Dec 22, 2016 at 2:44 PM David Cuenca Tudela 
wrote:

> Same here, I think it needs a better name...
>
> What about calling it datasets or structured datasets?
>
> Cheers,
> Micru

Re: [Wikitech-l] Now live: Shared structured data

2016-12-22 Thread David Cuenca Tudela
Anyway, this is great news! I hope that it gets adopted by the community.
Congratulations, Yuri!

I was going to suggest a Wikidata property, but I see that the data type
for datasets is not there yet:
https://phabricator.wikimedia.org/T151334

On Thu, Dec 22, 2016 at 8:48 PM, Yuri Astrakhan 
wrote:

> Micru, thanks, I think Datasets sounds like a good name too!

-- 
Etiamsi omnes, ego non

Re: [Wikitech-l] Now live: Shared structured data

2016-12-25 Thread mathieu stumpf guntz

Hi Yuri,

Seems very interesting. Am I wrong in thinking this could help to create
a multilingual glossary as drafted in
https://phabricator.wikimedia.org/T150263#2860014 ?



On 22/12/2016 at 20:30, Yuri Astrakhan wrote:

Gift season! We have launched structured data on Commons, available from
all wikis.

TL;DR: One data store. Use it everywhere. Upload table data to Commons, with
localization, and use it to create wiki tables or lists, or use it directly
in graphs. Works for GeoJSON maps too. Data must be licensed as CC0. Try
this per-state GDP map demo (US Map state highlight), and select multiple
years. More demos at the bottom.

Data can now be stored as *.tab and *.map pages in the data namespace on
Commons. That data may contain localization, so a table cell could be in
multiple languages. And that data is accessible from any wiki, by Lua
scripts, Graphs, and Maps.
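As a rough illustration of the localization point, here is a Python sketch of what the JSON body of a .tab page might look like, plus a helper that picks one language out of a localized cell. The key names (`license`, `description`, `schema`, `fields`, `data`) and the `localized` column type reflect my understanding of the Tabular help page and are assumptions to check against it; the page title, columns and values are made up.

```python
import json

# Hypothetical body of a page such as Data:Sandbox/Example.tab.
# Assumed layout: license, localized description, column schema, rows.
page = {
    "license": "CC0-1.0",
    "description": {"en": "Example table", "fr": "Tableau d'exemple"},
    "schema": {
        "fields": [
            {"name": "year", "type": "number", "title": {"en": "Year"}},
            {"name": "label", "type": "localized", "title": {"en": "Label"}},
        ]
    },
    "data": [
        [2015, {"en": "first", "fr": "premier"}],
        [2016, {"en": "second", "fr": "deuxième"}],
    ],
}


def localized(cell, lang, fallback="en"):
    """Return one language from a localized cell, falling back to English."""
    if isinstance(cell, dict):
        return cell.get(lang, cell.get(fallback))
    return cell  # plain values (numbers, strings) are returned unchanged


body = json.dumps(page, ensure_ascii=False, indent=2)  # text for the page
print(localized(page["data"][1][1], "fr"))  # deuxième
print(localized(page["data"][1][1], "de"))  # second (no German, falls back)
```

The fallback-to-English rule here is a choice made for the sketch, not a documented behavior.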

Lua lets you generate wiki tables from the data by filtering, converting,
mixing, and formatting the raw data. Lua also lets you generate lists. Or
any wiki markup.
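On the wiki itself this filtering and formatting would be done in a Lua module (as far as I know, the JsonConfig extension exposes the data to Lua through `mw.ext.data.get`). The Python sketch below only mirrors the idea: it filters the rows of a .tab-like structure and emits wikitable markup. The `to_wikitable` helper and the sample table are hypothetical.

```python
def to_wikitable(page, lang="en", keep=lambda row: True):
    """Render the rows of a .tab-like dict as MediaWiki table markup."""
    fields = page["schema"]["fields"]
    # Header: localized column title if available, else the column name.
    headers = [f.get("title", {}).get(lang, f["name"]) for f in fields]
    lines = ['{| class="wikitable"', "! " + " !! ".join(headers)]
    for row in page["data"]:
        if not keep(row):
            continue  # filtering step: skip rows the caller does not want
        cells = []
        for value in row:
            if isinstance(value, dict):  # localized cell: pick one language
                value = value.get(lang, value.get("en", ""))
            cells.append(str(value))
        lines.append("|-")
        lines.append("| " + " || ".join(cells))
    lines.append("|}")
    return "\n".join(lines)


# Hypothetical sample data: a year column and a made-up value column.
sample = {
    "schema": {"fields": [
        {"name": "year", "type": "number", "title": {"en": "Year"}},
        {"name": "value", "type": "number", "title": {"en": "Value"}},
    ]},
    "data": [[2014, 100], [2015, 110], [2016, 121]],
}

print(to_wikitable(sample, keep=lambda row: row[0] >= 2015))
```

Swapping the `"|-"` / `"| "` lines for `"* "` prefixes would produce a bulleted list instead, which is the "lists" case mentioned above.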

Graphs can use both .tab and .map directly to visualize the data and let
users interact with it. The GDP demo above uses a map from Commons and
colors each state based on values from a data table.
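The GDP demo is essentially a choropleth: each region of the map is joined to a table row by a shared key, and the row's value picks a color. The Graph extension expresses this declaratively in a Vega specification; the Python sketch below only shows the join-and-bucket logic. The `id` property, the state codes, the numbers and the color list are all hypothetical.

```python
def color_features(features, rows, colors):
    """Map each feature id to a color picked from its table value."""
    values = dict(rows)  # join key -> numeric value
    low, high = min(values.values()), max(values.values())
    span = (high - low) or 1  # avoid dividing by zero if all values match
    result = {}
    for feature in features:
        key = feature["properties"]["id"]
        if key not in values:
            result[key] = None  # region has no data in the table
            continue
        # Scale the value linearly onto the indexes of the color list.
        index = int((values[key] - low) / span * (len(colors) - 1))
        result[key] = colors[index]
    return result


# Hypothetical inputs: map features keyed by state code, made-up values.
features = [{"type": "Feature", "properties": {"id": code}, "geometry": None}
            for code in ("NY", "CA", "TX", "AK")]
rows = [("NY", 2.0), ("CA", 2.5), ("TX", 1.0)]
print(color_features(features, rows, ["#fee0d2", "#fc9272", "#de2d26"]))
```

Regions missing from the table come back as `None`, so a renderer can draw them in a neutral color.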

Kartographer can use the .map data as an extra layer on top of the base
map. This way we can show an endangered species' habitat.
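A .map page wraps an ordinary GeoJSON FeatureCollection. The sketch below assumes the page keys `license`, `description`, `zoom`, `latitude`, `longitude` and `data` (check the Map help page for the exact schema), uses a made-up rectangle rather than any real habitat, and computes the layer's bounding box, which is what a viewer needs to frame the overlay on the base map. Note that GeoJSON positions are `[longitude, latitude]`.

```python
# Hypothetical body of a .map page; the top-level keys are assumed and the
# rectangle below is made up, not a real habitat outline.
page = {
    "license": "CC0-1.0",
    "description": {"en": "Example area"},
    "zoom": 9,
    "latitude": 50.15,
    "longitude": 10.25,
    "data": {
        "type": "FeatureCollection",
        "features": [{
            "type": "Feature",
            "properties": {"title": "Example area"},
            "geometry": {
                "type": "Polygon",
                # Positions are [longitude, latitude]; the ring is closed.
                "coordinates": [[[10.0, 50.0], [10.5, 50.0],
                                 [10.5, 50.3], [10.0, 50.3],
                                 [10.0, 50.0]]],
            },
        }],
    },
}


def bounding_box(collection):
    """Return (west, south, east, north) for the Polygon features given."""
    lons, lats = [], []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # this sketch only handles simple polygons
        for ring in geometry["coordinates"]:
            for lon, lat in ring:
                lons.append(lon)
                lats.append(lat)
    return min(lons), min(lats), max(lons), max(lats)


print(bounding_box(page["data"]))  # (10.0, 50.0, 10.5, 50.3)
```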

== Demo ==
* Raw data example
* Interactive Weather data
* Same data in Weather template
* Interactive GDP map
* Endangered Jemez Mountains salamander - habitat
* Population history
* Line chart

== Getting started ==
* Try creating a page at data:Sandbox/.tab on Commons. Don't forget
the .tab extension, or it won't work.
* Try using some data with the Line chart graph template
A thorough guide is needed, help is welcome!
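Since a missing extension is an easy mistake, a pre-flight check on the page title may help. This is a hypothetical helper, not part of any MediaWiki API; it treats the namespace prefix case-insensitively and assumes the title must end in exactly `.tab` or `.map` with a non-empty name before it.

```python
def data_page_kind(title):
    """Return "tab" or "map" for a plausible data page title, else None."""
    namespace, sep, name = title.partition(":")
    if not sep or namespace.strip().lower() != "data":
        return None  # not in the data namespace
    for extension in (".tab", ".map"):
        # Require a non-empty name before the extension.
        if name.endswith(extension) and len(name) > len(extension):
            return extension[1:]
    return None  # missing or unknown extension: the page "won't work"


for title in ("Data:Sandbox/Example.tab", "Data:Sandbox/Example", "Data:Example.map"):
    print(title, "->", data_page_kind(title))
```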

== Documentation links ==
* Tabular help
* Map help
If you find a bug, create a Phabricator ticket with the #tabular-data tag, or
comment on the documentation talk pages.

== FAQ ==
* Relation to Wikidata: Wikidata is about "facts" (small pieces of
information). Structured data is about "blobs": large amounts of data like
the historical weather or the outline of the state of New York.

== TODOs ==
* Add a nice "table editor" - editing JSON by hand is cruel. T134618
* "What links here" should track data usage across wikis. Will allow
quicker auto-refresh of the pages too. T153966
* Support data redirects. T153598
* Mega epic: Support external data feeds.

Re: [Wikitech-l] Now live: Shared structured data

2016-12-25 Thread Yuri Astrakhan
Hi Mathieu, yes, I think you can totally build up this glossary in a
dataset. Just remember that each string can be no longer than 400 chars,
and the total size must be under 2 MB.
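Both limits are easy to check before saving. A minimal sketch, assuming the 400-character cap applies to every string inside the `data` rows (including each language of a localized cell) and that "2mb" means 2 × 1024 × 1024 bytes of UTF-8 JSON. Both are assumptions, as is counting characters as Python code points rather than, say, UTF-16 units.

```python
import json

MAX_STRING_CHARS = 400            # per-string limit mentioned in this thread
MAX_PAGE_BYTES = 2 * 1024 * 1024  # "under 2mb": byte interpretation assumed


def iter_strings(value):
    """Yield every string nested anywhere in a JSON-like value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def check_limits(page):
    """Return a list of human-readable problems; empty means within limits."""
    problems = []
    for text in iter_strings(page.get("data", [])):
        if len(text) > MAX_STRING_CHARS:  # counts Python code points
            problems.append(f"string of {len(text)} chars starts {text[:20]!r}")
    size = len(json.dumps(page, ensure_ascii=False).encode("utf-8"))
    if size > MAX_PAGE_BYTES:
        problems.append(f"page is {size} bytes")
    return problems


# Hypothetical glossary rows: a term plus a localized definition.
ok = {"data": [["term", {"en": "x" * 400, "fr": "y" * 10}]]}
too_long = {"data": [["term", {"en": "x" * 401}]]}
print(check_limits(ok))        # []
print(check_limits(too_long))  # one problem reported
```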

On Sun, Dec 25, 2016, 10:45 mathieu stumpf guntz <
psychosl...@culture-libre.org> wrote:

> Hi Yuri,
>
> Seems very interesting. Am I wrong in thinking this could help to create
> a multilingual glossary as drafted in
> https://phabricator.wikimedia.org/T150263#2860014 ?

Re: [Wikitech-l] Now live: Shared structured data

2016-12-28 Thread mathieu stumpf guntz
Thank you Yuri. Is there some rationale behind these limits? I
understand the limit for performance reasons, and 2 MB already seems
very large for the intended glossaries. But 400 chars might be problematic
for some definitions, I guess, especially since translations can lead to
varying length needs.



On 25/12/2016 at 17:03, Yuri Astrakhan wrote:

> Hi Mathieu, yes, I think you can totally build up this glossary in a
> dataset. Just remember that each string can be no longer than 400 chars,
> and the total size must be under 2 MB.


Re: [Wikitech-l] Now live: Shared structured data

2016-12-28 Thread Yuri Astrakhan
The 400-char limit is meant to stay in sync with Wikidata, which has the same
limitation. The origin of this limit is to encourage storage of "values"
rather than full strings (sentences). Also, it discourages storage of wiki
markup.

On Wed, Dec 28, 2016, 16:45 mathieu stumpf guntz <
psychosl...@culture-libre.org> wrote:

> Thank you Yuri. Is there some rationale behind these limits? I
> understand the limit for performance reasons, and 2 MB already seems
> very large for the intended glossaries. But 400 chars might be problematic
> for some definitions, I guess, especially since translations can lead to
> varying length needs.

Re: [Wikitech-l] Now live: Shared structured data

2016-12-29 Thread mathieu stumpf guntz



On 28/12/2016 at 23:08, Yuri Astrakhan wrote:

> The 400-char limit is meant to stay in sync with Wikidata, which has the
> same limitation. The origin of this limit is to encourage storage of
> "values" rather than full strings (sentences).

Well, that's probably not the best constraint for a glossary, then. To
my mind, a 400-character limit regardless of the language is rather
surprising. Surely you can say much more with 400 ideograms than with,
well, whichever language happens to have the longest average sentence
length (any idea?). Also, at least for some translation pairs, there is
a tendency for translations to be longer than the original[1].

[1] http://www.sid.ir/en/VEWSSID/J_pdf/53001320130303.pdf

> Also, it discourages storage of wiki markup.

What about disallowing it explicitly? You could even enforce that with a
quick parse that prevents saving, or simply show a reminder when such a
string is detected, to avoid blocking users in legitimate corner cases.
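The "reminder rather than block" idea could look roughly like the sketch below. The patterns are a rough, hypothetical heuristic for a few common wikitext constructs, not MediaWiki's parser, so they will produce some false positives and misses; that is why the sketch only reports warnings and never rejects a value.

```python
import re

# Hypothetical heuristic patterns for a few common wikitext constructs.
WIKITEXT_PATTERNS = {
    "link": re.compile(r"\[\[[^\]]+\]\]"),
    "template": re.compile(r"\{\{[^}]+\}\}"),
    "bold/italic": re.compile(r"''"),
    "html tag": re.compile(r"</?[A-Za-z][^>]*>"),
}


def markup_warnings(text):
    """Name the wikitext constructs that seem to appear in a cell value.

    Meant for a non-blocking reminder, so false positives are tolerable.
    """
    return [name for name, pattern in WIKITEXT_PATTERNS.items()
            if pattern.search(text)]


for value in ("plain value", "See [[Paris]]", "{{lang|fr|chat}}", "l'eau"):
    print(repr(value), markup_warnings(value))
```

A single apostrophe, as in "l'eau", is deliberately not flagged; only doubled apostrophes count as bold/italic markup.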





Re: [Wikitech-l] Now live: Shared structured data

2016-12-30 Thread mathieu stumpf guntz


Since I find this a very interesting topic, I searched a bit more.

https://www.w3.org/International/articles/article-text-size.en
which quotes
http://www-01.ibm.com/software/globalization/guidelines/a3.html

According to these, for English source strings longer than 70
characters, you might expect the translation to average 130% of the
English length. So, with an admittedly very loose inference, a
400-character limit for all languages is equivalent to a 307-character
limit for English (400 / 1.3 ≈ 307.7). Would you say a 307-character
limit would be OK there?
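The 307 figure is just the limit divided by the expansion ratio, rounded down. A tiny sketch using exact fractions to avoid floating-point surprises; the function name is made up, and 1.3 is the single ratio quoted above for strings over 70 characters.

```python
import math
from fractions import Fraction


def source_limit(target_limit, expansion_ratio):
    """Longest source string whose average translation still fits the limit."""
    # Exact rational arithmetic: 400 / (13/10) = 4000/13, about 307.7.
    return math.floor(Fraction(target_limit) / Fraction(expansion_ratio))


print(source_limit(400, "1.3"))  # 307
```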



On 29/12/2016 at 12:11, mathieu stumpf guntz wrote:


