Hi!

Thank you, Thad, for your support!

First, some news on current progress:

Work on Primary Sources and the Freebase mapping has been on hold since the
last day of my Google internship (in late August). We already have a lot of
statements (13.7M) in the Primary Sources tool, and I think we should try to
get Wikidata to adopt them before creating more.

Some answers:

> First ... it looks like you REALLY need my help to finish the Freebase 
> mapping ? Hardly anything looks done...and I have the time and knowledge to 
> fill it all in completely...  
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Class_mapping

This page is an attempt to map Freebase types to Wikidata classes. But it seems
to me that it won't lead to many good new statements: the Wikidata class
hierarchy is very different from the Freebase type hierarchy, which makes the
mapping difficult. I have already done something for people by creating a file
with the Qids of Wikidata items that are mapped to a /people/person topic but
have no P31 (instance of) Q5 (human) statement. Roughly half of these turned
out not to be items about a person (this is a rough estimate), so I decided not
to add this data to Primary Sources. But I gave the file to Magnus, who has
imported it into his "person" game (thank you Magnus :-)).
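
To make the filter concrete, here is a minimal Python sketch. It is an
illustration only, not the script that produced the file: the three input
dictionaries, the function name, and the sample MIDs and Qids are all
assumptions made up for the example.

```python
from typing import Dict, List, Set

HUMAN = "Q5"                    # Wikidata class "human"
PERSON_TYPE = "/people/person"  # Freebase type for people


def person_candidates(
    mid_to_qid: Dict[str, str],
    mid_types: Dict[str, Set[str]],
    qid_p31: Dict[str, Set[str]],
) -> List[str]:
    """Return Qids mapped to a /people/person topic that lack P31 Q5."""
    candidates = []
    for mid, qid in mid_to_qid.items():
        # Keep only topics that Freebase types as a person.
        if PERSON_TYPE not in mid_types.get(mid, set()):
            continue
        # Skip items that Wikidata already marks as human.
        if HUMAN in qid_p31.get(qid, set()):
            continue
        candidates.append(qid)
    # Sort numerically so that Q2 comes before Q10.
    return sorted(candidates, key=lambda q: int(q[1:]))


# Hypothetical sample data (placeholder MIDs and Qids, not real entities).
mid_to_qid = {"/m/aaa": "Q1001", "/m/bbb": "Q1002", "/m/ccc": "Q1003"}
mid_types = {
    "/m/aaa": {"/people/person"},
    "/m/bbb": {"/people/person", "/music/artist"},
    "/m/ccc": {"/location/location"},
}
qid_p31 = {"Q1001": {"Q5"}, "Q1003": {"Q515"}}

print(person_candidates(mid_to_qid, mid_types, qid_p31))  # ['Q1002']
```

Given the error rate mentioned above, the output of such a filter is a list of
candidates for human review, not a list of statements to import directly.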

> It looks like TPT had another page where the WD Properties were being mapped 
> to Freebase here: 
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Mapping Do you 
> need help in filling that out more ?

I believe that the top properties are now mapped (we have 360 properties
mapped). For example, if I take the facts tagged as reviewed in the dump [1]
whose subject is a mapped topic, I am able to map 92% of them to Wikidata
claims. So improving the property mapping would be a very nice task if you have
time, but I don't think it would be the most rewarding one. I believe that
improving the mapping between Freebase topics and Wikidata items would lead to
far more additions (the mapping used to create the current content of the
Primary Sources tool has only 4.56M connections).
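
As a rough illustration of how such a coverage figure can be computed, here is
a minimal Python sketch. It is a simplification under stated assumptions, not
the actual pipeline: facts are plain (subject, property, value) triples, a
fact counts as mapped as soon as its property is mapped (value conversion is
ignored), and the function name and sample data are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Fact = Tuple[str, str, str]  # (subject MID, Freebase property, value)


def mapping_coverage(
    facts: Iterable[Fact],
    topic_map: Dict[str, str],
    property_map: Dict[str, str],
) -> float:
    """Share of facts about mapped topics whose property is also mapped."""
    in_scope = 0
    mapped = 0
    for subject, prop, _value in facts:
        # Only facts whose subject has a Wikidata item are considered.
        if subject not in topic_map:
            continue
        in_scope += 1
        if prop in property_map:
            mapped += 1
    return mapped / in_scope if in_scope else 0.0


# Hypothetical sample data; the MIDs and the Qid are placeholders.
topic_map = {"/m/aaa": "Q1001"}
property_map = {
    "/people/person/date_of_birth": "P569",
    "/people/person/place_of_birth": "P19",
}
facts = [
    ("/m/aaa", "/people/person/date_of_birth", "1900-01-01"),
    ("/m/aaa", "/people/person/place_of_birth", "/m/ccc"),
    ("/m/aaa", "/hypothetical/unmapped_property", "x"),
    ("/m/zzz", "/people/person/date_of_birth", "1950-05-05"),  # unmapped topic
]

print(f"{mapping_coverage(facts, topic_map, property_map):.0%}")  # 67%
```

The last fact is excluded from the denominator because its subject has no
Wikidata item, which is why a larger topic mapping adds statements that a
better property mapping cannot reach.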

>  This is great, and we should thank these people because they are the ones 
> who are now working on what we are just talking about here. In addition, we 
> should think about ways of engaging more community in this. Some ideas:

Thank you very much for all these ideas. I am currently working on two fronts
to move the import of the already mapped statements forward:

1. Import some "good" datasets using my bot. I have already done it for the 
"simple" facts about humans (birth date, birth place...) that are tagged as 
reviewed in the Freebase dump [1]. I have created a wiki page to coordinate 
this work: 
https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Good_datasets

2. Optimize the Primary Sources tool to make it more usable. I have already
reduced the load time, and my next aim is to avoid unneeded page reloads.

Cheers,

Thomas

[1] See http://www.freebase.com/freebase/valuenotation/is_reviewed


> On 28 Sept. 2015 at 21:36, Markus Krötzsch <mar...@semantic-mediawiki.org>
> wrote:
> 
> Gerard,
> 
> Why do you spend so much energy on criticising the work of other volunteers 
> and companies that want to help Wikidata? Switching off Primary Sources would 
> not achieve any progress towards what you want. I have made some proposals in 
> my email on what else could be done to speed things up. You could work on 
> realising some of these ideas, you could propose other activities to the 
> community, or you could just help elsewhere on Wikidata. Focussing on a tool 
> you don't like and don't want to use will not make you (or the rest of us) 
> happy.
> 
> Markus
> 
> 
> On 28.09.2015 20:01, Gerard Meijssen wrote:
>> Hoi,
>> 
>> Sorry I disagree with your analysis. The fundamental issue is not
>> quality and it is not the size of our community. The issue is that we
>> have our priorities wrong. As far as I am concerned the "primary sources
>> tool" is a wrong approach for a dataset like Freebase or DBpedia.
>> 
>> What we should concentrate on is find likely issues that exist in
>> Wikidata. Make people aware of them and have a proper workflow that will
>> point people to the things they care about. When I care about "polders"
>> show me content where another source disagrees with what we have. As I
>> care about "polders" I will spend time on it BECAUSE I care and am
>> invited to resolve issues. I will be challenged because every item I
>> touch has an issue. I do not mind to do this when the data in Wikidata
>> differs from DBpedia, Freebase or whatever. My time is well spent. THAT
>> is why I will be challenged, that is why I will be willing to work on this.
>> 
>> I will not do this for new data in the primary sources tool. At most I
>> will give it a glance and accept it. I would only do this where data in
>> the primary sources tool differs. That however is exactly the same
>> scenario that I just described.
>> 
>> I am not willing to look at data in Wikidata Freebase or DBpedia in the
>> primary sources tool one item/statement at a time; we know that they are
>> of a similar quality as Wikidata. The percentages make it a waste of
>> time. With iterative comparisons of other sources we will find the
>> booboos easy enough. We will spend the time of our communities
>> effectively and we will increase quality and quality and community.
>> 
>> The approach of the primary sources tool is wrong. It should only be
>> about linking data and define how this is done.
>> 
>> The problem is indeed with the community. Its time is wasted and it is
>> much more effective for me to add new data than work on data that is
>> already in the primary sources tool.
>> Thanks,
>>        GerardM
>> 
>> On 28 September 2015 at 16:52, Markus Krötzsch
>> <mar...@semantic-mediawiki.org> wrote:
>> 
>>    Hi Gerard, hi all,
>> 
>>    The key misunderstanding here is that the main issue with the
>>    Freebase import would be data quality. It is actually community
>>    support. The goal of the current slow import process is for the
>>    Wikidata community to "adopt" the Freebase data. It's not about
>>    "storing" the data somewhere, but about finding a way to maintain it
>>    in the future.
>> 
>>    The import statistics show that Wikidata does not currently have
>>    enough community power for a quick import. This is regrettable, but
>>    not something that we can fix by dumping in more data that will then
>>    be orphaned.
>> 
>>    Freebase people: this is not a small amount of data for our young
>>    community. We really need your help to digest this huge amount of
>>    data! I am absolutely convinced from the emails I saw here that none
>>    of the former Freebase editors on this list would support low
>>    quality standards. They have fought hard to fix errors and avoid
>>    issues coming into their data for a long time.
>> 
>>    Nobody believes that either Freebase or Wikidata can ever be free of
>>    errors, and this is really not the point of this discussion at all
>>    [1]. The experienced community managers among us know that it is not
>>    about the amount of data you have. Data is cheap and easy to get,
>>    even free data with very high quality. But the value proposition of
>>    Wikidata is not that it can provide storage space for a lot of data --
>>    it is that we have a functioning community that can maintain it. For
>>    the Freebase data donation, we do not seem to have this community
>>    yet. We need to find a way to engage people to do this. Ideas are
>>    welcome.
>> 
>>    What I can see from the statistics, however, is that some users (and
>>    I cannot say if they are "Freebase users" or "Wikidata users" ;-)
>>    are putting a lot of effort into integrating the data already. This
>>    is great, and we should thank these people because they are the ones
>>    who are now working on what we are just talking about here. In
>>    addition, we should think about ways of engaging more community in
>>    this. Some ideas:
>> 
>>    (1) Find a way to clean and import some statements using bots. Maybe
>>    there are cases where Freebase already had a working import
>>    infrastructure that could be migrated to Wikidata? This would also
>>    solve the community support problem in one way. We just need to
>>    import the maintenance infrastructure together with the data.
>> 
>>    (2) Find a way to expose specific suggestions to more people. The
>>    Wikidata Games have attracted so many contributions. Could some of
>>    the Freebase data be solved in this way, with a dedicated UI?
>> 
>>    (3) Organise Freebase edit-a-thons where people come together to
>>    work through a bunch of suggested statements.
>> 
>>    (4) Form wiki projects that discuss a particular topic domain in
>>    Freebase and how it could be imported faster using (1)-(3) or any
>>    other idea.
>> 
>>    (5) Connect to existing Wiki projects to make them aware of valuable
>>    data they might take from Freebase.
>> 
>>    Freebase is a much better resource than many other data resources we
>>    are already using with similar approaches as (1)-(5) above, and yet
>>    it seems many people are waiting for Google alone to come up with a
>>    solution.
>> 
>>    Cheers,
>> 
>>    Markus
>> 
>>    [1] Gerard, if you think otherwise, please let us know which error
>>    rates you think are typical or acceptable for Freebase and Wikidata,
>>    respectively. Without giving actual numbers you just produce empty
>>    strawman arguments (for example: claiming that anyone would think
>>    that Wikidata is better quality than Freebase and then refuting this
>>    point, which nobody is trying to make). See
>>    https://en.wikipedia.org/wiki/Straw_man
>> 
>> 
>>    On 26.09.2015 18:31, Gerard Meijssen wrote:
>> 
>>        Hoi,
>>        When you analyse the statistics, it shows how bad the current state
>>        of affairs is. Slightly over one thousandth of the content of the
>>        primary sources tool has been included.
>> 
>>        Markus, Lydia and myself agree that the content of Freebase may be
>>        improved. Where we differ is that the same can be said for
>>        Wikidata. It is not much better and by including the data from
>>        Freebase we have a much improved coverage of facts. The same can be
>>        said for the content of DBpedia and probably other sources as well.
>> 
>>        I seriously hate this procrastination and the denial of the efforts
>>        of others. It is one type of discrimination that is utterly
>>        deplorable.
>> 
>>        We should concentrate on comparing Wikidata with other sources that
>>        are maintained. We should do this repeatedly and concentrate on
>>        workflows that seek the differences and provide workflows that help
>>        our community to improve what we have. What we have is the sum of
>>        all available knowledge and by splitting it up, we are weakened as
>>        a result.
>>        Thanks,
>>                GerardM
>> 
>>        On 26 September 2015 at 03:32, Thad Guidry <thadgui...@gmail.com>
>>        wrote:
>> 
>>             Also, Freebase users themselves who did daily, weekly work....
>>             some were passing users, some tried harder, but made lots of
>>             erroneous entries (battling against our Experts at times).  We
>>             could probably provide a list of those sorta community
>>             blacklisted users whose data submissions should probably not
>>             be trusted.
>> 
>>             +1 for looking at better maintained specific properties.
>>             +1 for being cautious for some Freebase usernames and their
>>             entries.
>>             +1 for trusting wholesale all of the Freebase Experts
>>             submissions.
>>             We policed each other quite well.
>> 
>> 
>> 
>>             Thad
>>             +ThadGuidry <https://www.google.com/+ThadGuidry>
>> 
>>             On Fri, Sep 25, 2015 at 11:45 AM, Jason Douglas
>>             <jasondoug...@google.com> wrote:
>> 
>>                 > It would indeed be interesting to see which percentage of
>>                 > proposals are being approved (and stay in Wikidata after a
>>                 > while), and whether there is a pattern (100% approval on
>>                 > some type of fact that could then be merged more quickly;
>>                 > or very low approval on something else that would maybe
>>                 > be better revisited for mapping errors or other systematic
>>                 > problems).
>> 
>>                 +1, I think that's your best bet. Specific properties were
>>                 much better maintained than others -- identify those that
>>                 meet the bar for wholesale import and leave the rest to the
>>                 primary sources tool.
>> 
>>                 On Thu, Sep 24, 2015 at 4:03 PM Markus Krötzsch
>>                 <mar...@semantic-mediawiki.org> wrote:
>> 
>>                     On 24.09.2015 23:48, James Heald wrote:
>>                      > Has anybody actually done an assessment on Freebase
>>                      > and its reliability?
>>                      >
>>                      > Is it *really* too unreliable to import wholesale?
>> 
>>                     From experience with the Primary Sources tool
>>                     proposals, the quality is mixed. Some things it
>>                     proposes are really very valuable, but other things
>>                     are also just wrong. I added a few very useful facts
>>                     and fitting references based on the suggestions, but I
>>                     also rejected others. Not sure what the success rate
>>                     is for the cases I looked at, but my feeling is that
>>                     some kind of "supervised import" approach is really
>>                     needed when considering the total amount of facts.
>> 
>>                     An issue is that it is often fairly hard to tell if a
>>                     suggestion is true or not (mainly in cases where no
>>                     references are suggested to check). In other cases, I
>>                     am just not sure if a fact is correct for the property
>>                     used. For example, I recently ended up accepting
>>                     "architect: Charles Husband" for Lovell Telescope
>>                     (Q555130), but to be honest I am not sure that this is
>>                     correct: he was the leading engineer contracted to
>>                     design the telescope, which seems different from an
>>                     architect; no official web site uses the word
>>                     "architect" it seems; I could not find a better
>>                     property though, and it seemed "good enough" to accept
>>                     it (as opposed to the post code of the location of
>>                     this structure, which apparently was just wrong).
>> 
>>                      >
>>                      > Are there any stats/progress graphs as to how the
>>                      > actual import is in fact going?
>> 
>>                     It would indeed be interesting to see which percentage
>>                     of proposals are being approved (and stay in Wikidata
>>                     after a while), and whether there is a pattern (100%
>>                     approval on some type of fact that could then be
>>                     merged more quickly; or very low approval on something
>>                     else that would maybe be better revisited for mapping
>>                     errors or other systematic problems).
>> 
>>                     Markus
>> 
>> 
>>                      >
>>                      >    -- James.
>>                      >
>>                      >
>>                      > On 24/09/2015 19:35, Lydia Pintscher wrote:
>>                      >> On Thu, Sep 24, 2015 at 8:31 PM, Tom Morris
>>                      >> <tfmor...@gmail.com> wrote:
>>                      >>>> This is to add MusicBrainz to the primary source
>>                      >>>> tool, not anything else?
>>                      >>>
>>                      >>>
>>                      >>> It's apparently worse than that (which I hadn't
>>                      >>> realized until I re-read the transcript).  It sounds
>>                      >>> like it's just going to generate little warning
>>                      >>> icons for "bad" facts and not lead to the recording
>>                      >>> of any new facts at all.
>>                      >>>
>>                      >>> 17:22:33 <Lydia_WMDE> we'll also work on getting the
>>                      >>> extension deployed that will help with checking
>>                      >>> against 3rd party databases
>>                      >>> 17:23:33 <Lydia_WMDE> the result of constraint checks
>>                      >>> and checks against 3rd party databases will then be
>>                      >>> used to display little indicators next to a
>>                      >>> statement in case it is problematic
>>                      >>> 17:23:47 <Lydia_WMDE> i hope this way more people
>>                      >>> become aware of issues and can help fix them
>>                      >>> 17:24:35 <sjoerddebruin> Do you have any names of
>>                      >>> databases that are supported? :)
>>                      >>> 17:24:59 <Lydia_WMDE> sjoerddebruin: in the first
>>                      >>> version the german national library. it can be
>>                      >>> extended later
>>                      >>>
>>                      >>>
>>                      >>> I know Freebase is deemed to be nasty and unreliable,
>>                      >>> but is MusicBrainz considered trustworthy enough to
>>                      >>> import directly or will its facts need to be dripped
>>                      >>> through the primary source soda straw one at a time
>>                      >>> too?
>>                      >>
>>                      >> The primary sources tool and the extension that helps
>>                      >> us check against other databases are two independent
>>                      >> things.
>>                      >> Imports from Musicbrainz have been happening since a
>>                      >> very long time already.
>>                      >>
>>                      >>
>>                      >> Cheers
>>                      >> Lydia
>>                      >>
>>                      >
>>                      >
>>                      > _______________________________________________
>>                      > Wikidata mailing list
>>                      > Wikidata@lists.wikimedia.org
>>        <mailto:Wikidata@lists.wikimedia.org>
>>                     <mailto:Wikidata@lists.wikimedia.org
>>        <mailto:Wikidata@lists.wikimedia.org>>
>>                      >
>>        https://lists.wikimedia.org/mailman/listinfo/wikidata


