Amir,

Thanks for your work! I like this one, showing how our Sum-of-all-Paintings project is doing compared to sculptures (which have many copyright issues, but you could still put the data on Wikidata):
http://tools.wmflabs.org/wd-analyst/index.php?p=p31&q=Q3305213%7CQ860861
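(If anyone wants to generate comparison links like this in bulk, the URL scheme looks like it is just p=<property> and q=<pipe-separated values>. A quick Python sketch, guessing the scheme from the link above:)

    from urllib.parse import urlencode

    def wd_analyst_url(prop, values):
        # Build a wd-analyst comparison link (assumed parameter scheme:
        # p = property ID, q = values joined by "|", percent-encoded).
        base = "http://tools.wmflabs.org/wd-analyst/index.php"
        return base + "?" + urlencode({"p": prop, "q": "|".join(values)})

    # paintings (Q3305213) vs. sculptures (Q860861)
    print(wd_analyst_url("p31", ["Q3305213", "Q860861"]))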
Jane

On Wed, Dec 16, 2015 at 12:23 PM, Amir Ladsgroup <ladsgr...@gmail.com> wrote:
> Hey,
> Thanks for your feedback. That's exactly what I'm looking for.
>
> On Mon, Dec 14, 2015 at 5:29 PM Paul Houle <ontolo...@gmail.com> wrote:
>
>> It's a step in the right direction, but it took a very long time to load
>> on my computer.
>
> That's probably related to the recent Labs issues. Now I get reasonable
> load times:
> http://tools.pingdom.com/fpt/#!/eq1i3s/http://tools.wmflabs.org/wd-analyst/index.php
>
>> After the initial load it was pretty peppy. Then I ran the default
>> example that is grayed in but not active (I had to retype it).
>
> I made some modifications that might help.
>
>> Then I get the page that says "results are ready" and how cool they are.
>> It takes me a while to figure out what I am looking at, and I finally
>> realize it is a comparison of data quality metrics (which I think are
>> all fact counts) between all of the P31 predicates and the Q5.
>
> I made some changes so you can see things more easily. I'd appreciate it
> if you could suggest some wording I could put in the description.
>
>> The use of the graphic on the first row complicated this for me.
>
> Please suggest something I can write there for people :)
>
>> There are a lot of broken links on this page too, such as
>>
>> http://tools.wmflabs.org/wd-analyst/sitelink.php
>> https://www.wikidata.org/wiki/P31
>
> The broken property links should be fixed by now, and the sitelink page
> is broken because it's not there yet. I'll add it very soon.
>
>> and of course no merged-in documentation about what P31 and Q5 are.
>> Opaque identifiers are necessary for your project, but
>>
>> Also, some way to find the P's and Q's hooked up to this would be most
>> welcome.
>
> Done. Now we have labels for everything.
>
>> It's a great start and is completely in the right direction, but it
>> could take many sprints of improvement.
>>
>> On Wed, Dec 9, 2015 at 4:36 AM, Gerard Meijssen <gerard.meijs...@gmail.com> wrote:
>>
>>> Hoi,
>>> What would be nice is an option to understand progress from one dump
>>> to the next, like you can with the Statistics by Magnus. Magnus also
>>> has data on sources, but this is more global.
>>> Thanks,
>>> GerardM
>>>
>>> On 8 December 2015 at 21:41, Markus Krötzsch <mar...@semantic-mediawiki.org> wrote:
>>>
>>>> Hi Amir,
>>>>
>>>> Very nice, thanks! I like the general approach of having a stand-alone
>>>> tool for analysing the data, and maybe pointing you to issues. Like a
>>>> dashboard for Wikidata editors.
>>>>
>>>> What backend technology are you using to produce these results? Is
>>>> this live data or dumped data? One could also get those numbers from
>>>> the SPARQL endpoint, but performance might be problematic (since you
>>>> compute averages over all items; a custom approach would of course be
>>>> much faster, but then you have the data update problem).
>>>>
>>>> An obvious feature request would be to display entity ids as links to
>>>> the appropriate page, and maybe with their labels (in a language of
>>>> your choice).
>>>>
>>>> But overall very nice.
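>>>>
>>>> For concreteness, getting those numbers from the SPARQL endpoint
>>>> could look roughly like the untested sketch below (standard
>>>> query.wikidata.org prefixes assumed; aggregating over all items like
>>>> this may well hit the timeout, which is exactly the performance
>>>> concern I mean):
>>>>
>>>>     import requests
>>>>
>>>>     # Count P31 statements and how many references they carry.
>>>>     QUERY = """
>>>>     SELECT (COUNT(DISTINCT ?st) AS ?statements)
>>>>            (COUNT(?ref) AS ?references)
>>>>     WHERE {
>>>>       ?item p:P31 ?st .
>>>>       OPTIONAL { ?st prov:wasDerivedFrom ?ref }
>>>>     }
>>>>     """
>>>>
>>>>     r = requests.get("https://query.wikidata.org/sparql",
>>>>                      params={"query": QUERY, "format": "json"})
>>>>     print(r.json()["results"]["bindings"])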
>>>>
>>>> Regards,
>>>>
>>>> Markus
>>>>
>>>> On 08.12.2015 18:48, Amir Ladsgroup wrote:
>>>>
>>>>> Hey,
>>>>> There have been several discussions regarding the quality of
>>>>> information in Wikidata. I wanted to work on Wikidata quality, but we
>>>>> didn't have any good source of information to see where we are ahead
>>>>> and where we are behind. So I thought the best thing I could do is
>>>>> build something that shows people, in detail, exactly how well
>>>>> sourced our data is. So here we have
>>>>> http://tools.wmflabs.org/wd-analyst/index.php
>>>>>
>>>>> You can give it just a property (let's say P31) and it gives you the
>>>>> four most used values, plus an analysis of sources and overall
>>>>> quality (check this out:
>>>>> http://tools.wmflabs.org/wd-analyst/index.php?p=P31). You can see
>>>>> that about ~33% of these statements are sourced, of which 29.1% are
>>>>> based on Wikipedia.
>>>>> You can also give a property and multiple values you want to compare.
>>>>> Let's say you want to compare P27:Q183 (country of citizenship:
>>>>> Germany) and P27:Q30 (US). Check this out:
>>>>> http://tools.wmflabs.org/wd-analyst/index.php?p=P27&q=Q30|Q183
>>>>> You can see that US biographies are more abundant (300K versus 200K)
>>>>> but German biographies are more descriptive (3.8 descriptions per
>>>>> item versus 3.2).
>>>>>
>>>>> One important note: compare P31:Q5 (a trivial statement), where 46%
>>>>> of statements are not sourced at all and 49% are based on Wikipedia,
>>>>> with the statistics for the population property (P1082:
>>>>> http://tools.wmflabs.org/wd-analyst/index.php?p=P1082). Population is
>>>>> not a trivial statement, and we need to be careful about those. It
>>>>> turns out there is slightly more than one reference per statement,
>>>>> and only 4% of them are based on Wikipedia. So we can relax and enjoy
>>>>> this highly-sourced data.
>>>>>
>>>>> Requests:
>>>>>
>>>>> * Please tell me whether you want this tool at all.
>>>>> * Please suggest more ways to analyze the data and catch unsourced
>>>>>   material.
>>>>>
>>>>> Future plans (if you agree to keep using this tool):
>>>>>
>>>>> * Support more datatypes (e.g. date of birth based on year,
>>>>>   coordinates).
>>>>> * Sitelink-based and reference-based analysis (to check how many
>>>>>   articles of, let's say, Chinese Wikipedia are unsourced).
>>>>> * Free-style analysis: there is a database behind this tool that can
>>>>>   be used for many more applications. For example, you could get the
>>>>>   most unsourced statements of P31 and then go fix them (see the
>>>>>   sketch below). I'm trying to build a playground for these kinds of
>>>>>   tasks.
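>>>>>
>>>>> To give a taste of that free-style analysis, pulling unsourced P31
>>>>> statements could look roughly like this (a hypothetical sketch using
>>>>> the public SPARQL endpoint rather than the tool's own database, which
>>>>> isn't exposed yet):
>>>>>
>>>>>     import requests
>>>>>
>>>>>     # Find P31 statements that carry no reference at all
>>>>>     # (standard query.wikidata.org prefixes assumed).
>>>>>     QUERY = """
>>>>>     SELECT ?item ?value WHERE {
>>>>>       ?item p:P31 ?st .
>>>>>       ?st ps:P31 ?value .
>>>>>       FILTER NOT EXISTS { ?st prov:wasDerivedFrom ?ref }
>>>>>     }
>>>>>     LIMIT 100
>>>>>     """
>>>>>
>>>>>     r = requests.get("https://query.wikidata.org/sparql",
>>>>>                      params={"query": QUERY, "format": "json"})
>>>>>     for row in r.json()["results"]["bindings"]:
>>>>>         print(row["item"]["value"], row["value"]["value"])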
>>>>>
>>>>> I hope you like this and rock on!
>>>>> <http://tools.wmflabs.org/wd-analyst/index.php?p=P136&q=Q11399>
>>>>>
>>>>> Best
>>
>> --
>> Paul Houle
>>
>> Applying Schemas for Natural Language Processing, Distributed Systems,
>> Classification and Text Mining and Data Lakes
>>
>> (607) 539 6254    paul.houle on Skype    ontolo...@gmail.com
>>
>> :BaseKB -- Query Freebase Data With SPARQL
>> http://basekb.com/gold/
>>
>> Legal Entity Identifier Lookup
>> https://legalentityidentifier.info/lei/lookup/
>>
>> Join our Data Lakes group on LinkedIn
>> https://www.linkedin.com/grp/home?gid=8267275
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata