For "casual matching", try the game mode: https://tools.wmflabs.org/mix-n-match/#/random/473

On Mon, Jun 19, 2017 at 10:16 AM Osma Suominen <osma.suomi...@helsinki.fi> wrote:

> Hi Magnus!
>
> It's even higher now - 45%. Thanks a lot! This helps a lot with the
> verifying.
>
> Also matching of names with parenthetical qualifiers works better now. I
> see that "Ala-Malmi (Helsinki)" was automatched to "Ala-Malmi". However,
> "Ahjo (Kerava)" was not matched to "Ahjo (Kerava)" (Q11849902) but to
> Q1368573 (which is "Ahjo" in Finnish but means a type of metalworking
> workshop, not a specific place). Neither Wikidata entity has a type
> statement; the latter has a "subclass-of <workshop>" statement.
>
> In any case, I think this is now good enough for serious work, so we
> will start verifying the suggested matches. 2.5% (173) already done...
>
> -Osma
>
>
> Magnus Manske wrote on 19.06.2017 at 12:02:
> > I fiddled with it a bit, now 35% automatched.
> >
> > Will try some more, but there are some sanity constraints on the
> > matching. If it finds more than one match for the name, it does not set
> > any match, because random matches on the same name were annoying in the
> > past. There is also a type constraint, which might skip some Wikidata
> > items without appropriate instance/subclass.
> >
> > On Mon, Jun 19, 2017 at 8:09 AM Osma Suominen
> > <osma.suomi...@helsinki.fi> wrote:
> >
> > > Hi Magnus, all,
> > >
> > > I've been looking a bit closer at the YSO places catalog [1] in
> > > Mix'n'match and I'm wondering why only 20% of the places were
> > > automatically matched.
> > >
> > > For example, Nepal (http://www.yso.fi/onto/yso/p107682) was
> > > automatically matched to Nepal (Q837).
> > >
> > > But:
> > >
> > > Accra (http://www.yso.fi/onto/yso/p138653) was not matched to Accra
> > > (Q3761).
> > >
> > > Aceh (http://www.yso.fi/onto/yso/p147889) was not matched to Aceh
> > > (Q1823).
> > >
> > > Akkunusjoki (http://www.yso.fi/onto/yso/p109251) was not matched to
> > > Akkunusjoki (Q12253027).
> > >
> > > There are many more cases like this. So the precision of the
> > > automatic matching seems good (all but one were correct so far), but
> > > the recall is rather low, and even in cases where the label is
> > > identical a match has not been suggested. Is there anything that
> > > could be done about this?
> > >
> > >
> > > Somewhat related to this, it seems that none of the places with
> > > parenthetical qualifiers in their names were matched. For example
> > > "Ahjo (Kerava)" could have been matched to Q11849902 (which has a
> > > Finnish label that is identical) and "Ala-Malmi (Helsinki)" could
> > > have been matched to Q2829441 ("Ala-Malmi"). Since almost 60% of the
> > > place names include parenthetical qualifiers - to make them unique
> > > despite different places having identical names - this means that a
> > > lot of potential matches are missing. Could something be done to
> > > improve the situation?
> > >
> > >
> > > If Mix'n'match is incapable of automatically matching cases like
> > > this, would it help if I did an automatic matching externally using
> > > some other tool, and then gave the potential matches as e.g. a CSV
> > > file that could then be imported into Mix'n'match so that they can
> > > be verified there?
> > >
> > > -Osma
> > >
> > > [1] https://tools.wmflabs.org/mix-n-match/#/catalog/473
> > >
> > >
> > > Osma Suominen wrote on 17.06.2017 at 13:13:
> > > > Hi Magnus,
> > > >
> > > > Thanks a lot, that was fast! And the results look very good!
> > > >
> > > > I confirmed a couple dozen automated mappings and fixed an
> > > > incorrect one ("Amerikka" was matched to USA, but I changed it to
> > > > "Americas"). Then I started hitting rate limit errors. I guess it
> > > > would be possible to avoid those with some extra permissions?
> > > >
> > > > About 20% of the places were automatically matched. Probably most
> > > > of the remaining ones - around 5000 - do not exist in Wikidata
> > > > because they are e.g. towns and villages in Finland. Would it be
> > > > fair game to create all of them in Wikidata?
> > > >
> > > > -Osma
> > >
> > > --
> > > Osma Suominen
> > > D.Sc. (Tech), Information Systems Specialist
> > > National Library of Finland
> > > P.O. Box 26 (Kaikukatu 4)
> > > 00014 HELSINGIN YLIOPISTO
> > > Tel. +358 50 3199529
> > > osma.suomi...@helsinki.fi
> > > http://www.nationallibrary.fi
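
For anyone who wants to try the external matching idea from the thread, here is a minimal Python sketch of two of the rules discussed above: fall back to the name without its parenthetical qualifier, and set no match at all when a name has more than one candidate. This is not Mix'n'match's actual code. The function names, the in-memory candidate list, and the two "Q_HYPOTHETICAL" entries are assumptions made up for illustration; the type constraint is left out entirely.

```python
import re
from collections import defaultdict

# Illustrative candidate list of (Wikidata ID, label) pairs.
# The Q_HYPOTHETICAL entries are invented to show an ambiguous name.
CANDIDATES = [
    ("Q837", "Nepal"),
    ("Q2829441", "Ala-Malmi"),
    ("Q11849902", "Ahjo (Kerava)"),
    ("Q_HYPOTHETICAL_1", "Example"),
    ("Q_HYPOTHETICAL_2", "Example"),
]

# Matches one trailing parenthetical qualifier, e.g. " (Helsinki)".
QUALIFIER = re.compile(r"\s*\([^()]*\)\s*$")


def strip_qualifier(name):
    """Drop a trailing parenthetical qualifier: 'Ala-Malmi (Helsinki)' -> 'Ala-Malmi'."""
    return QUALIFIER.sub("", name).strip()


def build_index(candidates):
    """Map each case-folded label to the set of IDs that carry it."""
    index = defaultdict(set)
    for qid, label in candidates:
        index[label.casefold()].add(qid)
    return index


def automatch(name, index):
    """Return one ID, or None if the name has no candidate or several.

    The full name (qualifier included) is tried first; the bare name is
    only used as a fallback.
    """
    for key in (name, strip_qualifier(name)):
        hits = index.get(key.casefold(), set())
        if len(hits) == 1:
            return next(iter(hits))
        if len(hits) > 1:
            return None  # ambiguous: leave it for manual matching
    return None


INDEX = build_index(CANDIDATES)
```

With this data, automatch("Ahjo (Kerava)", INDEX) picks Q11849902 through the exact label, automatch("Ala-Malmi (Helsinki)", INDEX) picks Q2829441 through the fallback, and automatch("Example (Town)", INDEX) returns None because two items share the bare label. The fallback alone can still produce the kind of wrong hit described above (a place name landing on a same-named non-place item), so some form of type check would be needed on top of this.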
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata