Re: [Wikidata] Importing Freebase (Was: next Wikidata office hour)

Denny Vrandečić Mon, 28 Sep 2015 14:31:38 -0700

Actually, my suggestion would be to switch on Primary Sources as a default
tool for everyone. That should increase exposure and turnover, without
compromising quality of data.




On Mon, Sep 28, 2015 at 2:23 PM Denny Vrandečić <vrande...@google.com>
wrote:

> Hi Gerard,
>
> given the statistics you cite from
>
> https://tools.wmflabs.org/wikidata-primary-sources/status.html
>
> I see that 19.6k statements have been approved through the tool, and 5.1k
> statements have been rejected - which means that about 1 in 5 statements is
> deemed unsuitable by the users of primary sources.
>
> Given that there are 12.4M statements in the tool, this means that about
> 2.5M statements will turn out to be unsuitable for inclusion in Wikidata
> (if the current ratio holds). Are you suggesting to upload all of these
> statements to Wikidata?
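
The estimate in the paragraph above can be checked with a short calculation. This is only a sketch: the variable and function names are made up for illustration, and the input figures are the rounded ones quoted from the status page.

```python
# Rounded figures quoted in the message above (assumed exact here for the arithmetic).
approved = 19_600      # statements approved through the tool
rejected = 5_100       # statements rejected through the tool
queued = 12_400_000    # statements currently in the tool


def rejection_rate(approved: int, rejected: int) -> float:
    """Share of reviewed statements that were rejected."""
    return rejected / (approved + rejected)


def projected_unsuitable(queued: int, approved: int, rejected: int) -> float:
    """Extrapolate the observed rejection rate to the whole queue."""
    return queued * rejection_rate(approved, rejected)


rate = rejection_rate(approved, rejected)
projected = projected_unsuitable(queued, approved, rejected)
print(f"rejection rate: {rate:.1%}")
print(f"projected unsuitable statements: {projected:,.0f}")
```

The rate comes out at roughly one in five, and the projection lands at about 2.5M, under the stated assumption that the current ratio holds for the rest of the queue.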
>
> Tpt already did upload pieces of the data which have sufficient quality
> outside the primary sources tool, and more is planned. But for the data
> where the suitability for Wikidata seems questionable, I would not know
> what other approach to use. Do you have a suggestion?
>
> Once you have a suggestion and there is community consensus in doing it,
> no one will stand in the way of implementing that suggestion.
>
> Cheers,
> Denny
>
>
> On Mon, Sep 28, 2015 at 1:19 PM John Erling Blad <jeb...@gmail.com> wrote:
>
>> Another idea: make a kind of worklist on Wikidata that reflects the watchlist
>> on the clients (Wikipedias). But then, we often have items on our watchlist
>> that we don't know much about. (Digression: Somehow we should be able to
>> sort out those things we know (the place we live, the persons we have met)
>> from those things we have done (edited, copy-pasted).)
>>
>> I have been trying to get some interest in the past for worklists on
>> Wikipedia, but there isn't much interest in making them. They would speed up
>> the tedious task of finding the next page to edit after a given edit is
>> completed. It is the same problem with imports from Freebase on Wikidata:
>> locate the next item on Wikidata with the same queued statement from
>> Freebase, but within some worklist that the user has some knowledge about.
>>
>> Imagine "municipalities within a county" or "municipalities that are also
>> on the user's watchlist", and combine that with available unhandled
>> Freebase statements.
>>
>> On Mon, Sep 28, 2015 at 10:09 PM, John Erling Blad <jeb...@gmail.com>
>> wrote:
>>
>>> Could it be possible to create some kind of info (notification?) in a
>>> wikipedia article that additional data is available in a queue ("freebase")
>>> somewhere?
>>>
>>> If you have the article on your watch-list, then you will get a warning
>>> that says "You lazy boy, get your ass over here and help us out!" Or
>>> perhaps slightly rephrased.
>>>
>>> On Mon, Sep 28, 2015 at 4:52 PM, Markus Krötzsch <
>>> mar...@semantic-mediawiki.org> wrote:
>>>
>>>> Hi Gerard, hi all,
>>>>
>>>> The key misunderstanding here is that the main issue with the Freebase
>>>> import would be data quality. It is actually community support. The goal of
>>>> the current slow import process is for the Wikidata community to "adopt"
>>>> the Freebase data. It's not about "storing" the data somewhere, but about
>>>> finding a way to maintain it in the future.
>>>>
>>>> The import statistics show that Wikidata does not currently have enough
>>>> community power for a quick import. This is regrettable, but not something
>>>> that we can fix by dumping in more data that will then be orphaned.
>>>>
>>>> Freebase people: this is not a small amount of data for our young
>>>> community. We really need your help to digest this huge amount of data! I
>>>> am absolutely convinced from the emails I saw here that none of the former
>>>> Freebase editors on this list would support low quality standards. They
>>>> have fought hard to fix errors and avoid issues coming into their data for
>>>> a long time.
>>>>
>>>> Nobody believes that either Freebase or Wikidata can ever be free of
>>>> errors, and this is really not the point of this discussion at all [1]. The
>>>> experienced community managers among us know that it is not about the
>>>> amount of data you have. Data is cheap and easy to get, even free data with
>>>> very high quality. But the value proposition of Wikidata is not that it can
>>>> provide storage space for a lot of data -- it is that we have a functioning
>>>> community that can maintain it. For the Freebase data donation, we do not
>>>> seem to have this community yet. We need to find a way to engage people to
>>>> do this. Ideas are welcome.
>>>>
>>>> What I can see from the statistics, however, is that some users (and I
>>>> cannot say if they are "Freebase users" or "Wikidata users" ;-) are putting
>>>> a lot of effort into integrating the data already. This is great, and we
>>>> should thank these people because they are the ones who are now working on
>>>> what we are just talking about here. In addition, we should think about
>>>> ways of engaging more community in this. Some ideas:
>>>>
>>>> (1) Find a way to clean and import some statements using bots. Maybe
>>>> there are cases where Freebase already had a working import infrastructure
>>>> that could be migrated to Wikidata? This would also solve the community
>>>> support problem in one way. We just need to import the maintenance
>>>> infrastructure together with the data.
>>>>
>>>> (2) Find a way to expose specific suggestions to more people. The
>>>> Wikidata Games have attracted so many contributions. Could some of the
>>>> Freebase data be solved in this way, with a dedicated UI?
>>>>
>>>> (3) Organise Freebase edit-a-thons where people come together to work
>>>> through a bunch of suggested statements.
>>>>
>>>> (4) Form wiki projects that discuss a particular topic domain in
>>>> Freebase and how it could be imported faster using (1)-(3) or any other
>>>> idea.
>>>>
>>>> (5) Connect to existing Wiki projects to make them aware of valuable
>>>> data they might take from Freebase.
>>>>
>>>> Freebase is a much better resource than many other data resources we
>>>> are already using with similar approaches as (1)-(5) above, and yet it
>>>> seems many people are waiting for Google alone to come up with a solution.
>>>>
>>>> Cheers,
>>>>
>>>> Markus
>>>>
>>>> [1] Gerard, if you think otherwise, please let us know which error
>>>> rates you think are typical or acceptable for Freebase and Wikidata,
>>>> respectively. Without giving actual numbers you just produce empty strawman
>>>> arguments (for example: claiming that anyone would think that Wikidata is
>>>> better quality than Freebase and then refuting this point, which nobody is
>>>> trying to make). See https://en.wikipedia.org/wiki/Straw_man
>>>>
>>>>
>>>> On 26.09.2015 18:31, Gerard Meijssen wrote:
>>>>
>>>>> Hoi,
>>>>> When you analyse the statistics, it shows how bad the current state of
>>>>> affairs is. Slightly over one thousandth of the content of the
>>>>> primary sources tool has been included.
>>>>>
>>>>> Markus, Lydia and I agree that the content of Freebase may be
>>>>> improved. Where we differ is that the same can be said for Wikidata. It
>>>>> is not much better and by including the data from Freebase we have a
>>>>> much improved coverage of facts. The same can be said for the content
>>>>> of DBpedia and probably other sources as well.
>>>>>
>>>>> I seriously hate this procrastination and the denial of the efforts of
>>>>> others. It is one type of discrimination that is utterly deplorable.
>>>>>
>>>>> We should concentrate on comparing Wikidata with other sources that are
>>>>> maintained. We should do this repeatedly and concentrate on workflows
>>>>> that seek the differences and provide workflows that help our community
>>>>> to improve what we have. What we have is the sum of all available
>>>>> knowledge and by splitting it up, we are weakened as a result.
>>>>> Thanks,
>>>>>        GerardM
>>>>>
>>>>> On 26 September 2015 at 03:32, Thad Guidry <thadgui...@gmail.com> wrote:
>>>>>
>>>>>     Also, Freebase users themselves who did daily, weekly work.... some
>>>>>     were passing users, some tried harder, but made lots of erroneous
>>>>>     entries (battling against our Experts at times).  We could probably
>>>>>     provide a list of those sorta community blacklisted users whose
>>>>>     data submissions should probably not be trusted.
>>>>>
>>>>>     +1 for looking at better maintained specific properties.
>>>>>     +1 for being cautious for some Freebase usernames and their entries.
>>>>>     +1 for trusting wholesale all of the Freebase Experts submissions.
>>>>>     We policed each other quite well.
>>>>>
>>>>>
>>>>>
>>>>>     Thad
>>>>>     +ThadGuidry <https://www.google.com/+ThadGuidry>
>>>>>
>>>>>     On Fri, Sep 25, 2015 at 11:45 AM, Jason Douglas
>>>>>     <jasondoug...@google.com> wrote:
>>>>>
>>>>>         > It would indeed be interesting to see which percentage of
>>>>>         > proposals are being approved (and stay in Wikidata after a
>>>>>         > while), and whether there is a pattern (100% approval on some
>>>>>         > type of fact that could then be merged more quickly; or very
>>>>>         > low approval on something else that would maybe be better
>>>>>         > revisited for mapping errors or other systematic problems).
>>>>>
>>>>>         +1, I think that's your best bet. Specific properties were much
>>>>>         better maintained than others -- identify those that meet the
>>>>>         bar for wholesale import and leave the rest to the primary
>>>>>         sources tool.
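
The per-property triage suggested above could be sketched roughly as follows. Everything here is an assumption made for illustration: the review records are invented sample data, and the function names, the 95% bar and the minimum review count are hypothetical, not values anyone in the thread proposed.

```python
from collections import Counter


def approval_by_property(reviews):
    """Map each property to (approval rate, number of reviews)."""
    approved, total = Counter(), Counter()
    for prop, ok in reviews:
        total[prop] += 1
        approved[prop] += int(ok)
    return {prop: (approved[prop] / total[prop], total[prop]) for prop in total}


def split_for_import(reviews, bar=0.95, min_reviews=3):
    """Split properties into wholesale-import candidates and the rest.

    A property is a candidate only if it has enough reviews and its
    approval rate meets the bar; both thresholds are hypothetical.
    """
    wholesale, supervised = [], []
    for prop, (rate, n) in sorted(approval_by_property(reviews).items()):
        if n >= min_reviews and rate >= bar:
            wholesale.append(prop)
        else:
            supervised.append(prop)
    return wholesale, supervised


# Invented sample decisions: (property, approved?)
sample = (
    [("date of birth", True)] * 20
    + [("architect", True)] * 6
    + [("architect", False)] * 4
    + [("postal code", False)] * 5
)

wholesale, supervised = split_for_import(sample)
print("wholesale import candidates:", wholesale)
print("leave to the primary sources tool:", supervised)
```

With real review logs in place of the sample, the same split would show which properties clear whatever bar the community agrees on and which should stay in the primary sources tool.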
>>>>>
>>>>>         On Thu, Sep 24, 2015 at 4:03 PM Markus Krötzsch
>>>>>         <mar...@semantic-mediawiki.org> wrote:
>>>>>
>>>>>             On 24.09.2015 23:48, James Heald wrote:
>>>>>              > Has anybody actually done an assessment on Freebase and its
>>>>>              > reliability?
>>>>>              >
>>>>>              > Is it *really* too unreliable to import wholesale?
>>>>>
>>>>>             From experience with the Primary Sources tool proposals, the
>>>>>             quality is mixed. Some things it proposes are really very
>>>>>             valuable, but other things are also just wrong. I added a few
>>>>>             very useful facts and fitting references based on the
>>>>>             suggestions, but I also rejected others. Not sure what the
>>>>>             success rate is for the cases I looked at, but my feeling is
>>>>>             that some kind of "supervised import" approach is really
>>>>>             needed when considering the total amount of facts.
>>>>>
>>>>>             An issue is that it is often fairly hard to tell if a
>>>>>             suggestion is true or not (mainly in cases where no references
>>>>>             are suggested to check). In other cases, I am just not sure if
>>>>>             a fact is correct for the property used. For example, I
>>>>>             recently ended up accepting "architect: Charles Husband" for
>>>>>             Lovell Telescope (Q555130), but to be honest I am not sure
>>>>>             that this is correct: he was the leading engineer contracted
>>>>>             to design the telescope, which seems different from an
>>>>>             architect; no official web site uses the word "architect" it
>>>>>             seems; I could not find a better property though, and it
>>>>>             seemed "good enough" to accept it (as opposed to the post code
>>>>>             of the location of this structure, which apparently was just
>>>>>             wrong).
>>>>>
>>>>>              >
>>>>>              > Are there any stats/progress graphs as to how the actual
>>>>>              > import is in fact going?
>>>>>
>>>>>             It would indeed be interesting to see which percentage of
>>>>>             proposals are being approved (and stay in Wikidata after a
>>>>>             while), and whether there is a pattern (100% approval on some
>>>>>             type of fact that could then be merged more quickly; or very
>>>>>             low approval on something else that would maybe be better
>>>>>             revisited for mapping errors or other systematic problems).
>>>>>
>>>>>             Markus
>>>>>
>>>>>
>>>>>              >
>>>>>              >    -- James.
>>>>>              >
>>>>>              >
>>>>>              > On 24/09/2015 19:35, Lydia Pintscher wrote:
>>>>>              >> On Thu, Sep 24, 2015 at 8:31 PM, Tom Morris
>>>>>              >> <tfmor...@gmail.com> wrote:
>>>>>              >>>> This is to add MusicBrainz to the primary source tool, not
>>>>>              >>>> anything else?
>>>>>              >>>
>>>>>              >>>
>>>>>              >>> It's apparently worse than that (which I hadn't realized
>>>>>              >>> until I re-read the transcript).  It sounds like it's just
>>>>>              >>> going to generate little warning icons for "bad" facts and
>>>>>              >>> not lead to the recording of any new facts at all.
>>>>>              >>>
>>>>>              >>> 17:22:33 <Lydia_WMDE> we'll also work on getting the
>>>>>              >>> extension deployed that will help with checking against 3rd
>>>>>              >>> party databases
>>>>>              >>> 17:23:33 <Lydia_WMDE> the result of constraint checks and
>>>>>              >>> checks against 3rd party databases will then be used to
>>>>>              >>> display little indicators next to a statement in case it is
>>>>>              >>> problematic
>>>>>              >>> 17:23:47 <Lydia_WMDE> i hope this way more people become
>>>>>              >>> aware of issues and can help fix them
>>>>>              >>> 17:24:35 <sjoerddebruin> Do you have any names of databases
>>>>>              >>> that are supported? :)
>>>>>              >>> 17:24:59 <Lydia_WMDE> sjoerddebruin: in the first version
>>>>>              >>> the german national library. it can be extended later
>>>>>              >>>
>>>>>              >>>
>>>>>              >>> I know Freebase is deemed to be nasty and unreliable, but is
>>>>>              >>> MusicBrainz considered trustworthy enough to import directly
>>>>>              >>> or will its facts need to be dripped through the primary
>>>>>              >>> source soda straw one at a time too?
>>>>>              >>
>>>>>              >> The primary sources tool and the extension that helps us
>>>>>              >> check against other databases are two independent things.
>>>>>              >> Imports from MusicBrainz have been happening for a very long
>>>>>              >> time already.
>>>>>              >>
>>>>>              >>
>>>>>              >> Cheers
>>>>>              >> Lydia
>>>>>              >>
>>>>>              >
>>>>>              >
>>>>>              > _______________________________________________
>>>>>              > Wikidata mailing list
>>>>>              > Wikidata@lists.wikimedia.org
>>>>>              > https://lists.wikimedia.org/mailman/listinfo/wikidata

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
