[Wikidata] Re: Timezone, before, and after fields in JSON dump

2022-01-10 Thread Mitar
Hi!

On Mon, Jan 10, 2022 at 4:50 PM Lydia Pintscher wrote:
> Thanks for checking. Do you have a few examples so we can have a closer look?

There are many. A few cases (many seem to involve a reference with P813):

Q5198 P21 reference 1 snaks P813 0: timezone 60
Q5664 P20 reference 1 snaks P813 0: timezone 60
Q5721 P106 reference 0 snaks P813 0: timezone 60
Q5869 P194 reference 0 snaks P813 0: timezone -5
Q5826 P194 reference 0 snaks P813 0: timezone -5
Q5816 P106 reference 0 snaks P813 0: timezone 60
Q11618 P4632 reference 0 snaks P813 0: before 1
Q12018 P4632 reference 0 snaks P813 0: before 1
Q12773 P106 reference 0 snaks P813 0: timezone 60
Q12773 P106 reference 0 snaks P813 0: timezone 60
Q13283 P1999 reference 0 snaks P813 0: after 1
Q13293 P1999 reference 0 snaks P813 0: after 1
Q13307 P1999 reference 0 snaks P813 0: after 1
Q13334 P355 reference 0 snaks P813 0: before 1
Q13353 P2853 reference 0 snaks P813 0: timezone 120
Q13361 P1999 reference 0 snaks P813 0: after 1
Q14430 P106 reference 0 snaks P813 0: timezone 60
Q14524 P106 reference 1 snaks P813 0: timezone 60
Q15174 P194 reference 0 snaks P813 0: timezone -5
Q16019 P21 reference 0 snaks P813 0: timezone 60
Q16285 P106 reference 1 snaks P813 0: timezone 60
Q16285 P106 reference 1 snaks P813 0: timezone 60
Q16389 P106 reference 0 snaks P813 0: timezone 60
Q16403 P4632 reference 0 snaks P813 0: before 1
Q16572 P194 reference 0 snaks P813 0: timezone -5
Q16967 P194 reference 0 snaks P813 0: timezone -5
Q18809 P106 reference 0 snaks P813 0: timezone 60
Q18809 P106 reference 0 snaks P813 0: timezone 60
Q19214 P1001 reference 0 snaks P813 0: timezone -5
Q20456 P4632 reference 0 snaks P813 0: before 1
Q22432 P4632 reference 0 snaks P813 0: before 1

You cannot see them in the web UI, but you can see them in the JSON, e.g.:

https://www.wikidata.org/wiki/Special:EntityData/Q5198.json

A few that are not related to P813:

Q28287 P2046 qualifier P585: timezone 1
Q38573 P166 qualifier P585: after 1
Q54764 P2046 qualifier P585: timezone 1
Q82986 P580: after 1

There are really many of them. I can produce the whole list if you need it.
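
A listing like the one above can be produced with a short scan over an
entity's JSON. The following is only a minimal sketch, not the script
actually used for this thread: the function names, the helper time_snak,
and the sample entity "Q0" are made up for illustration, and it assumes
the usual entity layout of claims with mainsnak, qualifiers, and
references.

```python
from typing import Iterator

# Fields of a time datavalue that the documentation describes as unused.
UNUSED_FIELDS = ("timezone", "before", "after")


def nonzero_unused_fields(snak: dict) -> Iterator[str]:
    """Yield 'field value' for every unused field that is set to non-zero."""
    datavalue = snak.get("datavalue")  # absent for somevalue/novalue snaks
    if not datavalue or datavalue.get("type") != "time":
        return
    value = datavalue["value"]
    for field in UNUSED_FIELDS:
        if value.get(field, 0) != 0:
            yield f"{field} {value[field]}"


def scan_entity(entity: dict) -> Iterator[str]:
    """Report non-zero unused fields in main snaks, qualifiers, and references."""
    qid = entity.get("id", "?")
    for pid, statements in entity.get("claims", {}).items():
        for statement in statements:
            for hit in nonzero_unused_fields(statement.get("mainsnak", {})):
                yield f"{qid} {pid}: {hit}"
            for qpid, snaks in statement.get("qualifiers", {}).items():
                for snak in snaks:
                    for hit in nonzero_unused_fields(snak):
                        yield f"{qid} {pid} qualifier {qpid}: {hit}"
            for r, reference in enumerate(statement.get("references", [])):
                for rpid, snaks in reference.get("snaks", {}).items():
                    for s, snak in enumerate(snaks):
                        for hit in nonzero_unused_fields(snak):
                            yield f"{qid} {pid} reference {r} snaks {rpid} {s}: {hit}"


def time_snak(prop: str, **overrides: int) -> dict:
    """Build a hypothetical time snak; keyword arguments override fields."""
    value = {"time": "+2022-01-10T00:00:00Z", "timezone": 0, "before": 0,
             "after": 0, "precision": 11,
             "calendarmodel": "http://www.wikidata.org/entity/Q1985727"}
    value.update(overrides)
    return {"snaktype": "value", "property": prop,
            "datavalue": {"type": "time", "value": value}}


# Hand-made entity for illustration only; "Q0" is not a real item.
sample = {
    "id": "Q0",
    "claims": {
        "P580": [{"mainsnak": time_snak("P580", after=1)}],
        "P106": [{
            "mainsnak": {"snaktype": "somevalue", "property": "P106"},
            "qualifiers": {"P585": [time_snak("P585", timezone=1)]},
            "references": [{"snaks": {"P813": [time_snak("P813", timezone=60)]}}],
        }],
    },
}

for line in scan_entity(sample):
    print(line)
```

The same function could be fed the entity object found under "entities"
in the Special:EntityData JSON linked above.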


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: +0000-00-00T00:00:00Z in JSON dump

2022-01-10 Thread Mitar
Hi!

I took some time and went over all the cases I found, and all of them were
simply bad data. I suspect that most of them were added in some
automated way that passed this timestamp in when there was no
data. So I cleaned them up or fixed them (in a few cases the right value
was "unknown" with a range, in some cases it was 1 BCE, but in most
cases I just removed the claim, because not only is it false, it is
simply invalid: it is not even a valid timestamp).

You can see examples in my recent changes [1].

At this point I would ask more about how this got in (why it is not
rejected at insertion time) and, even more interesting, why the web UI
does not show any warning about those values. For many other cases you
get various warnings about possibly invalid data, but not here. So it
would be great if a warning were shown next to any value that is such a
timestamp. Of course, even better would be to prevent insertion (because
in 99% of cases it means somebody is blindly inserting a default zero
value).
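
To make the suggested check concrete, here is a minimal sketch of how a
bot or import script could refuse such a value before saving it. This is
not Wikibase's own validation: the regular expression and the function
name has_zero_year are assumptions for illustration. It only flags a
year of zero, because a month or day of 00 is also used for values with
year precision or coarser.

```python
import re

# Signed year, then month, day, and time of day, as in "+2022-01-10T00:00:00Z".
TIME_RE = re.compile(r"^([+-])(\d+)-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$")


def has_zero_year(time_string: str) -> bool:
    """Return True if the year component of a time string is zero."""
    match = TIME_RE.match(time_string)
    if match is None:
        raise ValueError(f"unrecognized time string: {time_string!r}")
    return int(match.group(2)) == 0


for candidate in ("+0000-00-00T00:00:00Z", "+2022-01-10T00:00:00Z"):
    verdict = "reject" if has_zero_year(candidate) else "accept"
    print(candidate, verdict)
```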

[1] https://www.wikidata.org/w/index.php?title=Special:Contributions/Mitar&offset=&limit=500&target=Mitar


Mitar

On Mon, Jan 10, 2022 at 4:50 PM Lydia Pintscher wrote:
>
> Hey Mitar,
>
> Also here a few examples would help to better understand what's going on.
>
>
> Cheers
> Lydia
>
> On Sun, Jan 9, 2022 at 9:52 AM Mitar wrote:
> >
> > Hi!
> >
> > I have been processing a recent Wikidata JSON dump. I have noticed
> > that some claims have +0000-00-00T00:00:00Z as the time value. My
> > understanding is that those are invalid values for time, at least
> > according to [1]. I think they can be safely removed, yes?
> >
> > [1] https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_json.html
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


[Wikidata] +0000-00-00T00:00:00Z in JSON dump

2022-01-09 Thread Mitar
Hi!

I have been processing a recent Wikidata JSON dump. I have noticed
that some claims have +0000-00-00T00:00:00Z as the time value. My
understanding is that those are invalid values for time, at least
according to [1]. I think they can be safely removed, yes?

[1] https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_json.html


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


[Wikidata] Timezone, before, and after fields in JSON dump

2022-01-09 Thread Mitar
Hi!

I have been processing a recent Wikidata JSON dump. According to the
documentation [1], a time datavalue has timezone, before, and after
fields, which are documented as currently not used. But I noticed that
in the dump some claims do have them set. What should be done about
them? Are they errors? Are they information? Can they be safely
ignored? Should those claims be updated in Wikidata to remove those
fields?

I can provide a list of those if anyone is interested.
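
For anyone who wants to look at the dump themselves, here is a minimal
sketch of reading it one entity at a time. It assumes the dump is a
single JSON array with one entity per line, each line ending with a
comma; the function name iter_entities and the tiny in-memory sample
are made up for illustration.

```python
import io
import json
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity per dump line, skipping the array brackets."""
    for line in lines:
        line = line.strip().rstrip(",")  # drop the newline and trailing comma
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


# Tiny stand-in for a dump file; the IDs are placeholders, not real items.
fake_dump = io.StringIO('[\n{"id": "Q0", "claims": {}},\n{"id": "Q00", "claims": {}}\n]\n')

for entity in iter_entities(fake_dump):
    print(entity["id"], len(entity["claims"]))
```

A real compressed dump could be passed in the same way, for example as
the file object returned by bz2.open(path, "rt") from the standard
library, so that the whole file never has to be held in memory.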

[1] https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_json.html


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m