Re: [Wikidata] Aggregate info on Wikidata items

2016-08-27 Thread Markus Kroetzsch

On 27.08.2016 07:18, Sumit Asthana wrote:

Hi,

I'm trying to use an offline Wikidata dump, but when I run an example
from Wikidata Toolkit - EntityStatisticsProcessor - I hit the
following error: https://dpaste.de/TNpd.

Apparently it is unable to parse the dump, but I can't figure out why.
Help would be appreciated :)


This happens if your dump download was incomplete. It seems that 
(recently) downloads are sometimes interrupted and need to be resumed 
to get the whole file. Our implementation is not smart enough to resume 
by itself, so it ends up with an incomplete dump.
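
If you want to check whether a download is complete, you can stream the
file to the end yourself: a truncated .json.gz typically fails with an
EOFException ("Unexpected end of ZLIB input stream"). A minimal sketch
(a standalone utility of my own, not part of WDTK; pass the dump path
as the argument):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public class DumpIntegrityCheck {
    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[1 << 16];
        long total = 0;
        // Read the whole stream; a truncated download fails before EOF
        // with an EOFException ("Unexpected end of ZLIB input stream").
        try (GZIPInputStream in =
                new GZIPInputStream(new FileInputStream(args[0]))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        System.out.println("Complete; uncompressed bytes: " + total);
    }
}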


You can download the dump in any way you like, including using a browser 
with "Save As". I prefer to use wget. You just need to put the file into 
the right directory, where WDTK also puts its dumps. When you start WDTK, 
it reports the file it would download and the place where it puts the 
download, so this is one way to find out.
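
For example, something like

wget -c -P ./dumpfiles/wikidatawiki/json-20160801 https://dumps.wikimedia.org/other/wikidata/20160801.json.gz

should work: -c resumes a partial download (which covers the
interruption problem above), and -P puts the file into the directory
described below (the 20160801 name is just that example).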


Dump files are the ones found at 
https://dumps.wikimedia.org/other/wikidata/ (with the file names used 
there). They go into a directory named like 
./dumpfiles/wikidatawiki/json-20160801 (for the dump 
https://dumps.wikimedia.org/other/wikidata/20160801.json.gz). The 
dumpfiles directory is relative to the directory from which you run 
your program.
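
Putting this together, a minimal sketch of processing such a local dump
with WDTK in offline mode (this follows the API of the WDTK examples as
far as I recall it; the class name and the do-nothing processor are
just illustration):

import org.wikidata.wdtk.datamodel.interfaces.EntityDocumentProcessor;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.datamodel.interfaces.PropertyDocument;
import org.wikidata.wdtk.dumpfiles.DumpProcessingController;

public class OfflineDumpExample {
    public static void main(String[] args) throws Exception {
        DumpProcessingController controller =
                new DumpProcessingController("wikidatawiki");
        // Offline mode: only use dumps already present under
        // ./dumpfiles, never try to download anything.
        controller.setOfflineMode(true);
        controller.registerEntityDocumentProcessor(
                new EntityDocumentProcessor() {
                    long items = 0;

                    @Override
                    public void processItemDocument(ItemDocument doc) {
                        if (++items % 100000 == 0) {
                            System.out.println(items + " items processed");
                        }
                    }

                    @Override
                    public void processPropertyDocument(PropertyDocument doc) {
                        // properties not needed here
                    }
                }, null, true); // no site filter, current revisions only
        controller.processMostRecentJsonDump();
    }
}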


Best,

Markus

Re: [Wikidata] Aggregate info on Wikidata items

2016-08-26 Thread Sumit Asthana
Hi,

I'm trying to use an offline Wikidata dump, but when I run an example
from Wikidata Toolkit - EntityStatisticsProcessor - I hit the
following error: https://dpaste.de/TNpd.

Apparently it is unable to parse the dump, but I can't figure out why.
Help would be appreciated :)

-Thanks,
Sumit


Re: [Wikidata] Aggregate info on Wikidata items

2016-08-26 Thread Stas Malyshev
Hi!

> For example: "I want to know the average number of statements with
> dead external reference links."

Since there are over a million links in references, you probably want
to use a dump - either JSON or RDF - and look for references there. It
would be relatively easy to find those in reference statements.
However, checking a million links might require some careful planning :)
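
A rough sketch of what that could look like with Wikidata Toolkit
(untested illustration only; P854 is the "reference URL" property, and
actually checking the collected links for liveness would be the
separate, slow step):

import org.wikidata.wdtk.datamodel.interfaces.*;

public class ReferenceUrlCollector implements EntityDocumentProcessor {

    long statementsWithRefUrl = 0;

    @Override
    public void processItemDocument(ItemDocument doc) {
        for (StatementGroup sg : doc.getStatementGroups()) {
            for (Statement st : sg.getStatements()) {
                if (hasReferenceUrl(st)) {
                    statementsWithRefUrl++;
                }
            }
        }
    }

    @Override
    public void processPropertyDocument(PropertyDocument doc) {
        // not needed for this statistic
    }

    private boolean hasReferenceUrl(Statement st) {
        for (Reference ref : st.getReferences()) {
            for (SnakGroup snaks : ref.getSnakGroups()) {
                // P854 is the "reference URL" property
                if (!"P854".equals(snaks.getProperty().getId())) {
                    continue;
                }
                for (Snak snak : snaks.getSnaks()) {
                    if (snak instanceof ValueSnak
                            && ((ValueSnak) snak).getValue() instanceof StringValue) {
                        // the URL itself would be:
                        // ((StringValue) ((ValueSnak) snak).getValue()).getString()
                        return true;
                    }
                }
            }
        }
        return false;
    }
}
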
-- 
Stas Malyshev
smalys...@wikimedia.org



[Wikidata] Aggregate info on Wikidata items

2016-08-26 Thread Sumit Asthana
Hi,

I'm working on a project to assess Wikidata item quality. As part of this,
to begin with, I'm trying to get basic statistics on Wikidata items, which
requires working both at the level of individual items and across many
items together.

For example: "I want to know the average number of statements with dead
external reference links."

What would be the best way to do this programmatically: scanning all the
items (or a subset of them) using the API, or downloading the dump and
working on it offline?


-Thanks,
Sumit Asthana,
B.Tech Final Year,
Dept. of CSE,
IIT Patna
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata