Dear Pierpaolo,

This thread was only about Julian and Gregorian calendar dates. Whether and how other calendar models should be supported in the future is another (potentially big) discussion. As you said, there are many issues there. Let's first make sure that we handle the "easy" 99.9% of cases correctly before discussing any more complicated options.

Best regards,

Markus

On 01.07.2015 01:11, Pierpaolo Bernardi wrote:
Please also keep in mind that not all calendars set the start of day
at the same time.  This is not a problem if you only have Julian and
Gregorian, but it certainly is if you introduce other calendars.

Two events may happen in the same day in one calendar, and on two
different days in another calendar.

Also, there surely exist ancient calendars whose exact delta from
Gregorian is not known with certainty.

All dates must have their associated calendar, as converting between
calendars may be extremely difficult, or impossible.

P.



On Tue, Jun 30, 2015 at 11:32 PM, Markus Krötzsch
<mar...@semantic-mediawiki.org> wrote:
Hi everyone,

Thanks to Lydia and the team for containing this issue and providing the
necessary documentation for fixing it. For all of you who wonder what the
scale of the issue is (a.k.a. "How bad is it?"), here are some numbers.

The most important years for better understanding:

1582: Gregorian Calendar is introduced; some countries switch quickly
1753: most countries have made the switch (Sweden and UK quite late),
       only Greece and Russia continue to use Julian dates
1923: Greece finally switches to Gregorian, all countries switched

Only dates with day-level precision are affected (the shift between
calendars was never more than half a month). That said, the following
numbers (as of 22 June 2015) should make some sense:

* Dates in Wikidata overall: 12,549,662 (100%)
* Dates that are precise at least to the day: 11,216,635 (89%):
** Since 1923: 10,406,630 (83%)
** 1753-1922:     716,711 (5.7%)
** 1582-1752:      64,325 (0.5%)
** Until 1581:     28,969 (0.2%)

In other words, disregarding Greece and Russia, at most 0.7% of Wikidata
dates are affected. This still makes a notable number of over 93,000 dates,
though the 64,325 from 1582-1752 are an overestimate (many countries had
already switched).
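
The shares above can be re-derived from the quoted counts. The following is only a small arithmetic check in Python; the variable and function names are made up for illustration and are not part of any Wikidata tooling.

```python
# Counts quoted in the list above (as of 22 June 2015).
total = 12_549_662
day_precision = {
    "since 1923": 10_406_630,
    "1753-1922": 716_711,
    "1582-1752": 64_325,
    "until 1581": 28_969,
}

def share(count, whole=total):
    """Return count as a percentage of whole."""
    return 100.0 * count / whole

# All dates that are precise at least to the day.
day_total = sum(day_precision.values())

# Dates before 1753, i.e. the ones affected when Greece and Russia are ignored.
before_1753 = day_precision["1582-1752"] + day_precision["until 1581"]

print(day_total, round(share(day_total)))         # 11216635 89
print(before_1753, round(share(before_1753), 1))  # 93294 0.7
```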

It is not easy to say how many potentially Julian dates are among the 5.7%
that happened before Greece introduced Gregorian, but it seems likely that a
good majority did not occur in Russia or Greece.

These numbers were as expected for me. What I was more surprised by is the
rare use of "Julian calendar" as a calendar model in the data:

* Dates in Wikidata overall: 12,549,662 (100%)
** Gregorian dates: 12,529,635 (99.8%)
** Julian dates:        20,027 (0.16%)

This means that not even the 0.2% that happened before the introduction of
the Gregorian calendar are tagged as Julian. Now some dates before 1582 may
correctly use Gregorian calendar (e.g., no calendar model makes sense for
the beginning of the universe, so we might as well leave Gregorian there).
However, I would expect that basically all dates with day-precision before
1582 are from historic records and should therefore use Julian. So it seems
that there is some work to do there.
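
As a rough illustration of the check suggested here, the sketch below flags day-precision dates before 1582 that are tagged as Gregorian. The `DateRecord` shape and the `needs_review` helper are hypothetical and do not reflect the actual Wikidata data model; a flagged date is only a candidate for manual review, not necessarily wrong.

```python
from dataclasses import dataclass

GREGORIAN_INTRODUCED = 1582  # year the Gregorian calendar was introduced

@dataclass
class DateRecord:
    """Hypothetical, simplified date record (not the Wikidata format)."""
    year: int
    day_precision: bool
    calendar: str  # "gregorian" or "julian"

def needs_review(rec):
    """True for day-precision dates before 1582 that are tagged Gregorian."""
    return (
        rec.day_precision
        and rec.year < GREGORIAN_INTRODUCED
        and rec.calendar == "gregorian"
    )

records = [
    DateRecord(1452, True, "gregorian"),   # suspicious: likely a Julian record
    DateRecord(1452, True, "julian"),      # tagged as expected
    DateRecord(1200, False, "gregorian"),  # year precision only, not affected
    DateRecord(1969, True, "gregorian"),   # modern date
]
flagged = [r for r in records if needs_review(r)]
print(len(flagged))  # 1
```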

It might be good to find out where these many historic dates came from. When
entering dates in the UI, it will suggest Julian for these times, so it
seems unlikely that users have entered most of them. If they came through a
bot, it would be good to find out what the bot author was doing. If you
upload historic dates (maybe birth dates), they should come in Julian too.
As Lydia explained, there are two things that the bot author might have
thought:

(1) "I should use the calendar model setting to tell Wikidata which calendar
model my dates are in"
(2) "I should upload Gregorian dates and use the calendar model setting to
tell Wikidata which calendar model my dates should be displayed in"

In either case, the natural choice would be to use Julian. Why could a bot
author possibly have specified "Gregorian" for a date before 1582? A bot
author might convert Julian dates to Gregorian if (s)he would expect option
(2) to be correct (since this would require all dates to be sent in
Gregorian). But in this case, the bot would still set "Julian" as the
calendar model to use for display.

Whatever way I look at it, it seems likely that our historic dates need some
validation anyway. Maybe the calendar model confusion Lydia explained is not
the only issue here. And adding more references would also be very useful in
its own right.

Best regards,

Markus




On 30.06.2015 19:38, Lydia Pintscher wrote:

Hi everyone,

I have some bad news. We screwed up. I’m really sorry about this. I’d
really appreciate everyone’s help with fixing it.

TLDR: We have a bad mixup of calendar models for the dates in Wikidata
and we need to fix them.

==== What happened? ====
Wikidata dates have a calendar model. This can be Julian or Gregorian
and the plan is to support more in the future. There are two ways to
interpret this calendar model:
# the given date is in this calendar model
# the given date is Gregorian and this calendar model says if the date
should be displayed in Gregorian or Julian in the user interface

Unfortunately, among both the developers and the bot operators there
was confusion about which of those is to be used. This led to
inconsistencies in the backend/frontend code as well as different bot
authors treating the calendar model differently. In addition the user
interface had problematic defaults. We now have a number of dates with
a potentially wrong calendar model. The biggest issue started when we
moved code from the frontend to the backend in mid-2014 in order to
improve performance. Prior to the move, the user interface used to
make the conversion from one model to the other. After the move, the
conversion was not done anywhere anymore - but the calendar model was
still displayed. We made one part better but in the process broke
another part badly :(

==== What now? ====
* Going forward the date data value will be given in both the
normalized proleptic Gregorian calendar as well as in the calendar
model explicitly given (which currently supports, as said, proleptic
Gregorian and proleptic Julian).
* The user interface will again indicate which calendar model the date
is given in. We will improve documentation around this to make sure
there is no confusion from now on.
* We made a flowchart to help decide what the correct calendar model
for a date should be to help with the clean up.
* We are improving the user interface to make it easier to understand
what is going on and by default do the right thing.
* We are providing a list of dates that need to be checked and
potentially fixed.

==== How are we making sure it doesn’t happen again? ====
* We are improving documentation around dates and will look for other
potential ambiguous concepts we have.

==== How can we fix it? ====
We have created a list of all dates that potentially need checking. We
can either provide this as a list on some wiki page or run a bot to
add “instance of: date needing calendar model check“ or something
similar as a qualifier to the respective dates. What do you prefer?
The list probably contains dates we can batch-change or approve but
we’d need your help with figuring out which those are.
We also created a flowchart that should help with making the decision
which calendar model to pick for a given date:

https://commons.wikimedia.org/wiki/File:Wikidata_Calendar_Model_Decision_Tree.svg

Thank you to everyone who helped us investigate and get to the bottom
of the issue. Sorry again this has happened and is causing work. I
feel miserable about this and if there is anything more we can do to
help with the cleanup please do let me know.


Let's please keep further discussion about this in one place on-wiki
at
https://www.wikidata.org/wiki/Wikidata:Project_chat#calendar_model_screwup


Cheers
Lydia



_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
