For "casual matching", try the game mode:
https://tools.wmflabs.org/mix-n-match/#/random/473

On Mon, Jun 19, 2017 at 10:16 AM Osma Suominen <osma.suomi...@helsinki.fi>
wrote:

> Hi Magnus!
>
> It's even higher now - 45%. Thanks a lot! This helps a lot with
> verification.
>
> Also, matching of names with parenthetical qualifiers works better now. I
> see that "Ala-Malmi (Helsinki)" was automatched to "Ala-Malmi". However,
> "Ahjo (Kerava)" was not matched to "Ahjo (Kerava)" (Q11849902) but to
> Q1368573 (which is "Ahjo" in Finnish but means a type of metalworking
> workshop, not a specific place). Neither Wikidata entity has a type
> statement; the latter has a "subclass-of <workshop>" statement.
>
> In any case, I think this is now good enough for serious work, so we
> will start verifying the suggested matches. 2.5% (173) already done...
>
> -Osma
>
>
> Magnus Manske wrote on 19.06.2017 at 12:02:
> > I fiddled with it a bit, now 35% automatched.
> >
> > Will try some more, but there are some sanity constraints on the
> > matching. If it finds more than one match for the name, it does not set
> > any match, because random matches on the same name were annoying in the
> > past. There is also a type constraint, which might skip some Wikidata
> > items without appropriate instance/subclass.
> >
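
A minimal sketch of the two sanity constraints described above, in Python. This is not Mix'n'match's actual code: the `Item` class, the `automatch` function, the `allowed_types` parameter and the type values are all assumptions made for illustration, as is the order in which the two checks run.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a Wikidata item."""
    qid: str
    label: str
    # Targets of the item's instance-of / subclass-of statements
    # (illustrative strings, not real property values).
    types: set = field(default_factory=set)


def automatch(name, items, allowed_types):
    """Return the QID of the only acceptable item for `name`, or None."""
    same_name = [it for it in items if it.label.casefold() == name.casefold()]
    # Type constraint: drop items without an appropriate instance/subclass value.
    typed = [it for it in same_name if it.types & allowed_types]
    # Ambiguity constraint: anything other than exactly one hit sets no match.
    if len(typed) != 1:
        return None
    return typed[0].qid


items = [
    Item("Q837", "Nepal", {"country"}),
    Item("Q11849902", "Ahjo (Kerava)"),      # no type statement
    Item("Q1368573", "Ahjo", {"workshop"}),  # subclass-of <workshop>
]
place_types = {"country", "city", "village"}  # assumed whitelist of place types

print(automatch("Nepal", items, place_types))          # Q837
print(automatch("Ahjo (Kerava)", items, place_types))  # None
```

Under these assumptions, an item with no instance/subclass statement is never a candidate, which is one way an identical label could still go unmatched.
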
> > On Mon, Jun 19, 2017 at 8:09 AM Osma Suominen <osma.suomi...@helsinki.fi>
> > wrote:
> >
> >     Hi Magnus, all,
> >
> >     I've been looking a bit closer at the YSO places catalog [1] in
> >     Mix'n'match and I'm wondering why only 20% of the places were
> >     automatically matched.
> >
> >     For example, Nepal (http://www.yso.fi/onto/yso/p107682) was
> >     automatically matched to Nepal (Q837).
> >
> >     But:
> >
> >     Accra (http://www.yso.fi/onto/yso/p138653) was not matched to Accra
> >     (Q3761).
> >
> >     Aceh (http://www.yso.fi/onto/yso/p147889) was not matched to Aceh
> >     (Q1823).
> >
> >     Akkunusjoki (http://www.yso.fi/onto/yso/p109251) was not matched to
> >     Akkunusjoki (Q12253027).
> >
> >     There are many more cases like this. So the precision of the
> >     automatic matching seems good (all but one were correct so far),
> >     but the recall is rather low, and even in cases where the label is
> >     identical a match has not been suggested. Is there anything that
> >     could be done about this?
> >
> >
> >     Somewhat related to this, it seems that none of the places with
> >     parenthetical qualifiers in their names were matched. For example
> >     "Ahjo (Kerava)" could have been matched to Q11849902 (which has a
> >     Finnish label that is identical) and "Ala-Malmi (Helsinki)" could
> >     have been matched to Q2829441 ("Ala-Malmi"). Since almost 60% of the
> >     place names include parenthetical qualifiers - to make them unique
> >     despite different places having identical names - this means that a
> >     lot of potential matches are missing. Could something be done to
> >     improve the situation?
> >
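
One possible way to handle the parenthetical qualifiers mentioned above, sketched in Python: try the full name first, and fall back to the name without its trailing qualifier only if the full name finds nothing. The function names and the `label_index` dictionary are hypothetical; the labels and QIDs are the ones quoted in this thread.

```python
import re

# A trailing "(qualifier)", e.g. "Ala-Malmi (Helsinki)" -> base "Ala-Malmi".
QUALIFIED = re.compile(r"^(?P<base>.+?)\s*\((?P<qualifier>[^()]+)\)$")


def labels_to_try(name):
    """Labels to look up, most specific first."""
    labels = [name]
    m = QUALIFIED.match(name)
    if m:
        labels.append(m.group("base"))
    return labels


def match_place(name, label_index):
    """Return a QID for `name`, or None if nothing or several things match."""
    for label in labels_to_try(name):
        qids = label_index.get(label, [])
        if len(qids) == 1:
            return qids[0]
        if len(qids) > 1:
            return None  # ambiguous label: do not guess
    return None


# Label -> QIDs, using only items mentioned in this thread.
label_index = {
    "Ahjo (Kerava)": ["Q11849902"],
    "Ahjo": ["Q1368573"],
    "Ala-Malmi": ["Q2829441"],
}

print(match_place("Ahjo (Kerava)", label_index))         # Q11849902
print(match_place("Ala-Malmi (Helsinki)", label_index))  # Q2829441
```

Trying the full name first means "Ahjo (Kerava)" resolves to the item carrying that exact label rather than to the generic "Ahjo" item.
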
> >
> >     If Mix'n'match is incapable of automatically matching cases like
> >     this, would it help if I did an automatic matching externally using
> >     some other tool, and then gave the potential matches as e.g. a CSV
> >     file that could then be imported into Mix'n'match so that they can
> >     be verified there?
> >
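
A sketch of what such an externally produced candidate file might look like, written with Python's standard `csv` module. The column names and the `candidates_to_csv` helper are assumptions; the thread does not say what format Mix'n'match would accept for import. The rows reuse the examples given above.

```python
import csv
import io


def candidates_to_csv(candidates):
    """Serialize (YSO URI, label, candidate QID) tuples as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["yso_uri", "label", "wikidata_qid"])  # assumed header
    for uri, label, qid in candidates:
        writer.writerow([uri, label, qid])
    return buf.getvalue()


candidates = [
    ("http://www.yso.fi/onto/yso/p138653", "Accra", "Q3761"),
    ("http://www.yso.fi/onto/yso/p147889", "Aceh", "Q1823"),
    ("http://www.yso.fi/onto/yso/p109251", "Akkunusjoki", "Q12253027"),
]

print(candidates_to_csv(candidates))
```
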
> >     -Osma
> >
> >     [1] https://tools.wmflabs.org/mix-n-match/#/catalog/473
> >
> >
> >     Osma Suominen wrote on 17.06.2017 at 13:13:
> >      > Hi Magnus,
> >      >
> >      > Thanks a lot, that was fast! And the results look very good!
> >      >
> >      > I confirmed a couple dozen automated mappings and fixed an
> >      > incorrect one ("Amerikka" was matched to USA, but I changed it to
> >      > "Americas"). Then I started hitting rate limit errors. I guess it
> >      > would be possible to avoid those with some extra permissions?
> >      >
> >      > About 20% of the places were automatically matched. Probably most
> >      > of the remaining ones - around 5000 - do not exist in Wikidata
> >      > because they are e.g. towns and villages in Finland. Would it be
> >      > fair game to create all of them in Wikidata?
> >      >
> >      > -Osma
> >      >
> >
> >     --
> >     Osma Suominen
> >     D.Sc. (Tech), Information Systems Specialist
> >     National Library of Finland
> >     P.O. Box 26 (Kaikukatu 4)
> >     00014 HELSINGIN YLIOPISTO
> >     Tel. +358 50 3199529
> >     osma.suomi...@helsinki.fi
> >     http://www.nationallibrary.fi
> >
> >     _______________________________________________
> >     Wikidata mailing list
> >     Wikidata@lists.wikimedia.org
> >     https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
>