Ok, mystery solved:
WDTK checks online which dump files are available. The only way of doing
this is to fetch the Wikimedia-provided listings of the file
directories. These listings are in HTML format, so to find out which
files are available, we need to parse the HTML. It seems that the layout
of this HTML file changed last week (it must have been last week, since
I got the Jan 12 dump in the usual way). As a result, WDTK no longer
finds any files on those pages. If you have already downloaded a file
before, it will just use that, but if you have no local files either, then
the "most recent dump" will be null, leading to an exception downstream.
It is easy to fix this (though I will do it tomorrow, not tonight) by
just adjusting the HTML strings we parse for. We should also, obviously,
improve our error reporting for this case. I don't know how stable this
will be: if the HTML pages change layout again, it will break again.
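Both fixes can be sketched together, again as a hypothetical illustration rather than WDTK's API: a slightly more tolerant pattern (accepting double, single, or no quotes around the link target, while still assuming the YYYYMMDD.json.gz naming) and a descriptive exception in place of a null result. The exception class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not WDTK code): a more tolerant listing parser plus
 * a descriptive error instead of a null "most recent dump".
 */
public class DumpSelectionSketch {

    // ASSUMPTION: dumps are still named YYYYMMDD.json.gz; the quotes around
    // the href value are now optional (double, single, or none).
    private static final Pattern DUMP_LINK =
            Pattern.compile("href=[\"']?(\\d{8})\\.json\\.gz");

    /** Thrown when no dump can be found online or locally. */
    static class NoDumpAvailableException extends RuntimeException {
        NoDumpAvailableException(String message) {
            super(message);
        }
    }

    /** Returns every dump date (YYYYMMDD) linked from the listing. */
    static List<String> findDumpDates(String html) {
        List<String> dates = new ArrayList<String>();
        Matcher m = DUMP_LINK.matcher(html);
        while (m.find()) {
            dates.add(m.group(1));
        }
        return dates;
    }

    /** Picks the newest dump, or fails early with an explanation. */
    static String requireMostRecentDump(List<String> online, List<String> local,
                                        String downloadDirectory) {
        List<String> all = new ArrayList<String>(online);
        all.addAll(local);
        if (all.isEmpty()) {
            throw new NoDumpAvailableException(
                    "No dump files found: the online listing matched no dump links"
                    + " (has its HTML layout changed?) and the download directory "
                    + downloadDirectory + " contains no dumps either.");
        }
        // YYYYMMDD strings sort chronologically, so the max is the newest.
        return Collections.max(all);
    }

    public static void main(String[] args) {
        List<String> none = Collections.<String>emptyList();
        String changedLayout = "<a href='20150112.json.gz'>20150112.json.gz</a>";
        System.out.println(requireMostRecentDump(
                findDumpDates(changedLayout), none, "dumpfiles/wikidatawiki")); // 20150112

        try {
            requireMostRecentDump(findDumpDates("<p>no links here</p>"), none,
                    "dumpfiles/wikidatawiki");
        } catch (NoDumpAvailableException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
```

A looser pattern only postpones the next breakage; the explicit exception is what turns a future layout change into a readable message instead of a NullPointerException deep in the processing code.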
Question to the MW folks: Is there any machine-readable API to get the
list of available dump files?
Cheers,
Markus
On 17.01.2015 22:53, Egon Willighagen wrote:
OK, thanks! Please let me know if I can test anything on my system.
Egon
On 17 Jan 2015 22:50, "Markus Krötzsch" <mar...@semantic-mediawiki.org> wrote:
On 17.01.2015 22:43, Egon Willighagen wrote:
This last test from the cmd line is already with master from
GitHub...
Thanks, we will investigate. I created a bug report at
https://github.com/Wikidata/Wikidata-Toolkit/issues/114
Markus
Egon
On 17 Jan 2015 22:40, "Markus Krötzsch" <mar...@semantic-mediawiki.org> wrote:
Hi Egon,
WDTK 0.3.0 is rather old, and we are about to prepare a new release
(there are other issues with 0.3.0: the JSON format has changed since
its release, so it won't read the files anyway). Could you check whether
the problem also occurs with the current development code on GitHub?
Cheers,
Markus
On 17.01.2015 16:59, Egon Willighagen wrote:
Hi all,
I have been trying today to get the Java library Wikidata-Toolkit going,
but I am about to give up... With both 0.3.0 and the current master, I
keep running into a NullPointerException... I thought it was how I called
the code, added several System.out calls, and in the end just tried to
get it running from the command line... I tried the example from the
website (though I replaced the Dump examples, which I don't see in
master; btw, "mvn test" runs fine) using a pristine master:
$ cd wdtk-examples/
$ mvn compile
$ mvn exec:java -Dexec.mainClass="org.wikidata.wdtk.examples.EntityStatisticsProcessor"
In doing so, I get the same NPE:
**********************************************************************
*** Wikidata Toolkit: EntityStatisticsProcessor
***
*** This program will download and process dumps from Wikidata.
*** It will print progress information and some simple statistics.
*** Results about property usage will be stored in a CSV file.
*** See source code for further details.
**********************************************************************
2015-01-17 16:53:00 INFO - Using download directory /home/egonw/var/Projects/GitHub/Wikidata-Toolkit/wdtk-examples/dumpfiles/wikidatawiki
[WARNING]
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.wikidata.wdtk.dumpfiles.DumpProcessingController.processDumpFile(DumpProcessingController.java:470)
    at org.wikidata.wdtk.dumpfiles.DumpProcessingController.processMostRecentDump(DumpProcessingController.java:456)
    at org.wikidata.wdtk.dumpfiles.DumpProcessingController.processMostRecentJsonDump(DumpProcessingController.java:426)
    at org.wikidata.wdtk.examples.ExampleHelpers.processEntitiesFromWikidataDump(ExampleHelpers.java:158)
    at org.wikidata.wdtk.examples.EntityStatisticsProcessor.main(EntityStatisticsProcessor.java:88)
    ... 6 more
I tried to find out what goes wrong, but I cannot grasp all the magic
that is going on... The directory it reports was created, but it is
empty...
$ mvn --version
Apache Maven 3.0.5
Maven home: /usr/share/maven
Java version: 1.7.0_65, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-7-openjdk-i386/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.16.0-4-686-pae", arch: "i386", family: "unix"
Can someone give me some pointers on where and how it tests whether dump
files exist? Is this problem platform-dependent?
Thanks,
Egon
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l