Actually, my suggestion would be to switch on Primary Sources as a default tool for everyone. That should increase exposure and turnover without compromising the quality of the data.
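As a quick check on the figures in the message quoted below (19.6k approved, 5.1k rejected, 12.4M queued), the extrapolation can be reproduced with a short Python sketch. The variable names are mine, and it assumes, as the quoted message does, that the current rejection ratio holds for the whole queue.

```python
# Figures taken from the quoted message; variable names are illustrative only.
approved = 19_600       # statements approved through the tool
rejected = 5_100        # statements rejected through the tool
queued = 12_400_000     # statements currently in the tool

# Share of reviewed statements that were rejected.
rejection_rate = rejected / (approved + rejected)

# Assumption: the same ratio applies to everything still in the queue.
projected_unsuitable = queued * rejection_rate

print(f"rejection rate: {rejection_rate:.1%}")                          # 20.6%
print(f"projected unsuitable: {projected_unsuitable / 1e6:.2f}M")       # 2.56M
```

So "about 1 in 5" and "about 2.5M" are consistent with the numbers on the status page, under that assumption.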
On Mon, Sep 28, 2015 at 2:23 PM Denny Vrandečić <vrande...@google.com> wrote:
> Hi Gerard,
>
> given the statistics you cite from
>
> https://tools.wmflabs.org/wikidata-primary-sources/status.html
>
> I see that 19.6k statements have been approved through the tool, and 5.1k statements have been rejected, which means that about 1 in 5 statements is deemed unsuitable by the users of Primary Sources.
>
> Given that there are 12.4M statements in the tool, this means that about 2.5M statements will turn out to be unsuitable for inclusion in Wikidata (if the current ratio holds). Are you suggesting uploading all of these statements to Wikidata?
>
> Tpt has already uploaded pieces of the data that have sufficient quality outside the Primary Sources tool, and more is planned. But for the data whose suitability for Wikidata seems questionable, I would not know what other approach to use. Do you have a suggestion?
>
> Once you have a suggestion and there is community consensus for doing it, no one will stand in the way of implementing that suggestion.
>
> Cheers,
> Denny
>
>
> On Mon, Sep 28, 2015 at 1:19 PM John Erling Blad <jeb...@gmail.com> wrote:
>
>> Another: make a kind of worklist on Wikidata that reflects the watchlist on the clients (Wikipedias). But then, we often have items on our watchlist that we don't know much about. (Digression: Somehow we should be able to sort out those things we know (the place we live, the persons we have met) from those things we have done (edited, copy-pasted).)
>>
>> I have been trying in the past to get some interest in worklists on Wikipedia, but there isn't much interest in making them. They would speed up the tedious task of finding the next page to edit after a given edit is completed. It is the same problem with imports from Freebase on Wikidata: locate the next item on Wikidata with the same queued statement from Freebase, but within some worklist that the user has some knowledge about.
>>
>> Imagine "municipalities within a county" or "municipalities that are also on the user's watchlist", and combine that with available unhandled Freebase statements.
>>
>> On Mon, Sep 28, 2015 at 10:09 PM, John Erling Blad <jeb...@gmail.com> wrote:
>>
>>> Could it be possible to create some kind of info (notification?) in a Wikipedia article that additional data is available in a queue ("freebase") somewhere?
>>>
>>> If you have the article on your watchlist, then you will get a warning that says "You lazy boy, get your ass over here and help us out!" Or perhaps slightly rephrased.
>>>
>>> On Mon, Sep 28, 2015 at 4:52 PM, Markus Krötzsch <mar...@semantic-mediawiki.org> wrote:
>>>
>>>> Hi Gerard, hi all,
>>>>
>>>> The key misunderstanding here is that the main issue with the Freebase import would be data quality. It is actually community support. The goal of the current slow import process is for the Wikidata community to "adopt" the Freebase data. It's not about "storing" the data somewhere, but about finding a way to maintain it in the future.
>>>>
>>>> The import statistics show that Wikidata does not currently have enough community power for a quick import. This is regrettable, but not something that we can fix by dumping in more data that will then be orphaned.
>>>>
>>>> Freebase people: this is not a small amount of data for our young community. We really need your help to digest this huge amount of data! I am absolutely convinced from the emails I saw here that none of the former Freebase editors on this list would support low quality standards. They have fought hard to fix errors and avoid issues coming into their data for a long time.
>>>>
>>>> Nobody believes that either Freebase or Wikidata can ever be free of errors, and this is really not the point of this discussion at all [1]. The experienced community managers among us know that it is not about the amount of data you have. Data is cheap and easy to get, even free data with very high quality. But the value proposition of Wikidata is not that it can provide storage space for a lot of data -- it is that we have a functioning community that can maintain it. For the Freebase data donation, we do not seem to have this community yet. We need to find a way to engage people to do this. Ideas are welcome.
>>>>
>>>> What I can see from the statistics, however, is that some users (and I cannot say if they are "Freebase users" or "Wikidata users" ;-) are putting a lot of effort into integrating the data already. This is great, and we should thank these people because they are the ones who are now working on what we are just talking about here. In addition, we should think about ways of engaging more of the community in this. Some ideas:
>>>>
>>>> (1) Find a way to clean and import some statements using bots. Maybe there are cases where Freebase already had a working import infrastructure that could be migrated to Wikidata? This would also solve the community support problem in one way. We just need to import the maintenance infrastructure together with the data.
>>>>
>>>> (2) Find a way to expose specific suggestions to more people. The Wikidata Games have attracted so many contributions. Could some of the Freebase data be solved in this way, with a dedicated UI?
>>>>
>>>> (3) Organise Freebase edit-a-thons where people come together to work through a bunch of suggested statements.
>>>>
>>>> (4) Form wiki projects that discuss a particular topic domain in Freebase and how it could be imported faster using (1)-(3) or any other idea.
>>>>
>>>> (5) Connect to existing Wiki projects to make them aware of valuable data they might take from Freebase.
>>>>
>>>> Freebase is a much better resource than many other data resources we are already using with approaches similar to (1)-(5) above, and yet it seems many people are waiting for Google alone to come up with a solution.
>>>>
>>>> Cheers,
>>>>
>>>> Markus
>>>>
>>>> [1] Gerard, if you think otherwise, please let us know which error rates you think are typical or acceptable for Freebase and Wikidata, respectively. Without giving actual numbers you just produce empty strawman arguments (for example: claiming that anyone would think that Wikidata is better quality than Freebase and then refuting this point, which nobody is trying to make). See https://en.wikipedia.org/wiki/Straw_man
>>>>
>>>>
>>>> On 26.09.2015 18:31, Gerard Meijssen wrote:
>>>>
>>>>> Hoi,
>>>>> When you analyse the statistics, it shows how bad the current state of affairs is. Slightly over one thousandth of the content of the primary sources tool has been included.
>>>>>
>>>>> Markus, Lydia and I agree that the content of Freebase may be improved. Where we differ is that the same can be said for Wikidata. It is not much better, and by including the data from Freebase we have a much improved coverage of facts. The same can be said for the content of DBpedia and probably other sources as well.
>>>>>
>>>>> I seriously hate this procrastination and the denial of the efforts of others. It is one type of discrimination that is utterly deplorable.
>>>>>
>>>>> We should concentrate on comparing Wikidata with other sources that are maintained. We should do this repeatedly and concentrate on workflows that seek the differences and provide workflows that help our community to improve what we have. What we have is the sum of all available knowledge and by splitting it up, we are weakened as a result.
>>>>> Thanks,
>>>>> GerardM
>>>>>
>>>>> On 26 September 2015 at 03:32, Thad Guidry <thadgui...@gmail.com> wrote:
>>>>>
>>>>> Also, Freebase users themselves who did daily, weekly work... some were passing users, some tried harder but made lots of erroneous entries (battling against our Experts at times). We could probably provide a list of those sort-of community-blacklisted users whose data submissions should probably not be trusted.
>>>>>
>>>>> +1 for looking at better maintained specific properties.
>>>>> +1 for being cautious about some Freebase usernames and their entries.
>>>>> +1 for trusting wholesale all of the Freebase Experts' submissions. We policed each other quite well.
>>>>>
>>>>> Thad
>>>>> +ThadGuidry <https://www.google.com/+ThadGuidry>
>>>>>
>>>>> On Fri, Sep 25, 2015 at 11:45 AM, Jason Douglas <jasondoug...@google.com> wrote:
>>>>>
>>>>> > It would indeed be interesting to see which percentage of proposals are being approved (and stay in Wikidata after a while), and whether there is a pattern (100% approval on some type of fact that could then be merged more quickly; or very low approval on something else that would maybe better be revisited for mapping errors or other systematic problems).
>>>>>
>>>>> +1, I think that's your best bet. Specific properties were much better maintained than others -- identify those that meet the bar for wholesale import and leave the rest to the primary sources tool.
>>>>>
>>>>> On Thu, Sep 24, 2015 at 4:03 PM Markus Krötzsch <mar...@semantic-mediawiki.org> wrote:
>>>>>
>>>>> On 24.09.2015 23:48, James Heald wrote:
>>>>> > Has anybody actually done an assessment on Freebase and its reliability?
>>>>> >
>>>>> > Is it *really* too unreliable to import wholesale?
>>>>>
>>>>> From experience with the Primary Sources tool proposals, the quality is mixed. Some things it proposes are really very valuable, but other things are also just wrong. I added a few very useful facts and fitting references based on the suggestions, but I also rejected others. Not sure what the success rate is for the cases I looked at, but my feeling is that some kind of "supervised import" approach is really needed when considering the total amount of facts.
>>>>>
>>>>> An issue is that it is often fairly hard to tell if a suggestion is true or not (mainly in cases where no references are suggested to check). In other cases, I am just not sure if a fact is correct for the property used. For example, I recently ended up accepting "architect: Charles Husband" for Lovell Telescope (Q555130), but to be honest I am not sure that this is correct: he was the leading engineer contracted to design the telescope, which seems different from an architect; no official web site uses the word "architect" it seems; I could not find a better property though, and it seemed "good enough" to accept it (as opposed to the post code of the location of this structure, which apparently was just wrong).
>>>>>
>>>>> >
>>>>> > Are there any stats/progress graphs as to how the actual import is in fact going?
>>>>>
>>>>> It would indeed be interesting to see which percentage of proposals are being approved (and stay in Wikidata after a while), and whether there is a pattern (100% approval on some type of fact that could then be merged more quickly; or very low approval on something else that would maybe better be revisited for mapping errors or other systematic problems).
>>>>>
>>>>> Markus
>>>>>
>>>>> >
>>>>> > -- James.
>>>>> >
>>>>> > On 24/09/2015 19:35, Lydia Pintscher wrote:
>>>>> >> On Thu, Sep 24, 2015 at 8:31 PM, Tom Morris <tfmor...@gmail.com> wrote:
>>>>> >>>> This is to add MusicBrainz to the primary source tool, not anything else?
>>>>> >>>
>>>>> >>> It's apparently worse than that (which I hadn't realized until I re-read the transcript). It sounds like it's just going to generate little warning icons for "bad" facts and not lead to the recording of any new facts at all.
>>>>> >>>
>>>>> >>> 17:22:33 <Lydia_WMDE> we'll also work on getting the extension deployed that will help with checking against 3rd party databases
>>>>> >>> 17:23:33 <Lydia_WMDE> the result of constraint checks and checks against 3rd party databases will then be used to display little indicators next to a statement in case it is problematic
>>>>> >>> 17:23:47 <Lydia_WMDE> i hope this way more people become aware of issues and can help fix them
>>>>> >>> 17:24:35 <sjoerddebruin> Do you have any names of databases that are supported? :)
>>>>> >>> 17:24:59 <Lydia_WMDE> sjoerddebruin: in the first version the german national library. it can be extended later
>>>>> >>>
>>>>> >>> I know Freebase is deemed to be nasty and unreliable, but is MusicBrainz considered trustworthy enough to import directly or will its facts need to be dripped through the primary source soda straw one at a time too?
>>>>> >>
>>>>> >> The primary sources tool and the extension that helps us check against other databases are two independent things.
>>>>> >> Imports from MusicBrainz have been happening for a very long time already.
>>>>> >>
>>>>> >> Cheers
>>>>> >> Lydia
_______________________________________________ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata