Ok, mystery solved:

WDTK checks online which dump files are available. The only way of doing this is to fetch the Wikimedia-provided listings of the file directories. These listings are in HTML format, so to find out which files are available, we need to parse the HTML. It seems that the layout of these HTML pages changed last week (it must have been last week, since I got the Jan 12 dump in the usual way). As a result, WDTK no longer finds any files on those pages. If you have already downloaded a dump file, WDTK will just use that, but if you have no local files either, then the "most recent dump" will be null -- leading to the NullPointerException downstream.
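
To illustrate the failure mode, here is a minimal sketch of this kind of listing parser. It is not the actual WDTK code: the class name, the regular expression, and both HTML snippets are invented for the example, and the real Wikimedia listing may look different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DumpListingSketch {

    // Assumed link format <a href="20150112/">; the real listing may differ.
    private static final Pattern DATE_LINK =
            Pattern.compile("<a href=\"(\\d{8})/?\">");

    /** Returns all dump dates (yyyyMMdd) found in the listing, oldest first. */
    static List<String> findDumpDates(String html) {
        List<String> dates = new ArrayList<>();
        Matcher m = DATE_LINK.matcher(html);
        while (m.find()) {
            dates.add(m.group(1));
        }
        Collections.sort(dates); // yyyyMMdd strings sort chronologically
        return dates;
    }

    /** Returns the newest date, or null if the listing yielded nothing. */
    static String mostRecent(List<String> dates) {
        return dates.isEmpty() ? null : dates.get(dates.size() - 1);
    }

    public static void main(String[] args) {
        // Both layouts are made up for illustration.
        String oldLayout = "<a href=\"20150105/\">20150105/</a>\n"
                + "<a href=\"20150112/\">20150112/</a>";
        String newLayout =
                "<li><a class=\"dump\" href=\"20150112/\">20150112</a></li>";
        System.out.println(mostRecent(findDumpDates(oldLayout))); // 20150112
        System.out.println(mostRecent(findDumpDates(newLayout))); // null
    }
}
```

A small change in the markup (here, an extra attribute before href) is enough to make the pattern miss every link, so the caller silently ends up with null.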

This is easy to fix by adjusting the HTML strings we parse for (I will not do it tonight, but tomorrow). We should obviously also improve our error reporting for this case. I don't know how stable the fix will be: if the HTML pages change layout again, it will break again.
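
As a rough idea of what better error reporting could look like, here is a sketch that fails early with a descriptive message instead of passing null along. The class, method, and exception names are hypothetical and not part of WDTK, and the URL is a placeholder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DumpAvailabilityCheck {

    /** Hypothetical exception type for this sketch; not part of WDTK. */
    static class NoDumpFilesException extends RuntimeException {
        NoDumpFilesException(String message) {
            super(message);
        }
    }

    /**
     * Returns the newest dump date (yyyyMMdd) among online and local dumps,
     * or throws with an explanation if neither source has any.
     */
    static String requireMostRecent(List<String> onlineDates,
            List<String> localDates, String listingUrl) {
        List<String> candidates = new ArrayList<>(onlineDates);
        candidates.addAll(localDates);
        if (candidates.isEmpty()) {
            throw new NoDumpFilesException("No dump files found: the listing at "
                    + listingUrl + " yielded no entries (its HTML layout may"
                    + " have changed) and no downloaded dumps exist locally.");
        }
        return Collections.max(candidates); // yyyyMMdd sorts chronologically
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        String url = "https://example.org/dumps/"; // placeholder URL
        System.out.println(requireMostRecent(
                none, Collections.singletonList("20150112"), url));
        try {
            requireMostRecent(none, none, url);
        } catch (NoDumpFilesException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this, the user would see why no dump was found rather than a bare NullPointerException several calls later.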

Question to the MW folks: Is there any machine-readable API to get the list of available dump files?

Cheers,

Markus

On 17.01.2015 22:53, Egon Willighagen wrote:
OK, thanks! Please let me know if I can test anything on my system.

Egon

On 17 Jan 2015 22:50, "Markus Krötzsch" <mar...@semantic-mediawiki.org> wrote:

    On 17.01.2015 22:43, Egon Willighagen wrote:

        This last test from the cmd line is already with master from
        GitHub...


    Thanks, we will investigate. I created a bug report at

    https://github.com/Wikidata/Wikidata-Toolkit/issues/114

    Markus


        Egon

        On 17 Jan 2015 22:40, "Markus Krötzsch" <mar...@semantic-mediawiki.org> wrote:

             Hi Egon,

             WDTK 0.3.0 is rather old and we are about to prepare a new release
             (there are other issues with 0.3.0: the JSON format has changed
             since its release and it won't read the files anyway). Could you
             check whether the problem occurs with the current development
             code on GitHub?

             Cheers,

             Markus

             On 17.01.2015 16:59, Egon Willighagen wrote:

                 Hi all,

                 I have been trying today to get the Java library Wikidata-Toolkit
                 going, but I am about to give up... I keep running into a
                 NullPointerException with both 0.3.0 and current master... I
                 thought it was how I called the code, and added several
                 System.out calls, and in the end just tried to get it running
                 from the command line... I tried the example from the website
                 (though I replaced the Dump examples, which I don't see in
                 master; btw, "mvn test" runs fine) using a pristine master:

                 $ cd wdtk-examples/
                 $ mvn compile
                 $ mvn exec:java -Dexec.mainClass="org.wikidata.wdtk.examples.EntityStatisticsProcessor"

                 In doing so, I get the same NPE:


        
                 ********************************************************************
                 *** Wikidata Toolkit: EntityStatisticsProcessor
                 ***
                 *** This program will download and process dumps from Wikidata.
                 *** It will print progress information and some simple statistics.
                 *** Results about property usage will be stored in a CSV file.
                 *** See source code for further details.
                 ********************************************************************
                 2015-01-17 16:53:00 INFO  - Using download directory /home/egonw/var/Projects/GitHub/Wikidata-Toolkit/wdtk-examples/dumpfiles/wikidatawiki
                 [WARNING]
                 java.lang.reflect.InvocationTargetException
                           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                           at java.lang.reflect.Method.invoke(Method.java:606)
                           at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
                           at java.lang.Thread.run(Thread.java:745)
                 Caused by: java.lang.NullPointerException
                           at org.wikidata.wdtk.dumpfiles.DumpProcessingController.processDumpFile(DumpProcessingController.java:470)
                           at org.wikidata.wdtk.dumpfiles.DumpProcessingController.processMostRecentDump(DumpProcessingController.java:456)
                           at org.wikidata.wdtk.dumpfiles.DumpProcessingController.processMostRecentJsonDump(DumpProcessingController.java:426)
                           at org.wikidata.wdtk.examples.ExampleHelpers.processEntitiesFromWikidataDump(ExampleHelpers.java:158)
                           at org.wikidata.wdtk.examples.EntityStatisticsProcessor.main(EntityStatisticsProcessor.java:88)
                           ... 6 more


                 I tried finding what goes wrong, but cannot grasp all the magic
                 that is going on... the directory it reports was created, but
                 is empty...

                 $ mvn --version
                 Apache Maven 3.0.5
                 Maven home: /usr/share/maven
                 Java version: 1.7.0_65, vendor: Oracle Corporation
                 Java home: /usr/lib/jvm/java-7-openjdk-i386/jre
                 Default locale: en_US, platform encoding: UTF-8
                 OS name: "linux", version: "3.16.0-4-686-pae", arch: "i386", family: "unix"

                 Can someone give me some pointers on where and how it tests
                 whether dump files exist? Is this problem platform dependent?

                 Thanks,

                 Egon



             _______________________________________________
             Wikidata-l mailing list
             Wikidata-l@lists.wikimedia.org
             https://lists.wikimedia.org/mailman/listinfo/wikidata-l



