Addshore closed this task as "Invalid".
Addshore added a comment.


  Run an empty Blazegraph container.
  
    docker run -d -p 9999:9999 \
      --env WIKIBASE_SCHEME=https \
      --env WIKIBASE_HOST=intentionally-empty.wiki.opencura.com \
      --env WDQS_HOST=localhost \
      --env WDQS_PORT=9999 \
      --name demo-wdqs wikibase/wdqs:0.3.40 /runBlazegraph.sh
  
  Wait for the service to come up, and make sure it is empty
  
    curl "localhost:9999/bigdata/sparql?query=SELECT%20%2A%20WHERE%20%7B%3Fa%20%3Fb%20%3Fc%7D"
  
  You should see something like this
  
    <?xml version='1.0' encoding='UTF-8'?>
    <sparql xmlns='http://www.w3.org/2005/sparql-results#'>
            <head>
                    <variable name='a'/>
                    <variable name='b'/>
                    <variable name='c'/>
            </head>
            <results>
            </results>
    </sparql>
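
  The percent-encoded query in the curl call above is simply "select every triple". As a rough sketch, the same check can be written with the query in readable form and wrapped in a retry loop that waits for the service to come up. The names `ALL_TRIPLES`, `retry` and `query_all_triples` are hypothetical helpers, not part of the wdqs image; the endpoint is the one used above.

```shell
# Readable form of the percent-encoded query used in the curl call above.
ALL_TRIPLES='SELECT * WHERE {?a ?b ?c}'

# retry TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts. Hypothetical helper.
retry() {
    tries=$1; delay=$2; shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Ask the endpoint for every triple. With -G, curl sends the
# --data-urlencode parameter as a URL-encoded GET query string.
query_all_triples() {
    curl -sS -f -G --data-urlencode "query=$ALL_TRIPLES" \
        "http://localhost:9999/bigdata/sparql"
}
```

  For example, `retry 30 2 query_all_triples` would poll for up to about a minute and print the result of the first successful request.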
  
  Run the updater once, pointing it at some Wikibase and at the query service we just started.
  
    docker exec demo-wdqs /runUpdate.sh
  
  You should see something like this. You can stop it with Ctrl+C after a few loops.
  
    wait-for-it.sh: waiting 300 seconds for intentionally-empty.wiki.opencura.com:80
    wait-for-it.sh: intentionally-empty.wiki.opencura.com:80 is available after 0 seconds
    wait-for-it.sh: waiting 300 seconds for localhost:9999
    wait-for-it.sh: localhost:9999 is available after 0 seconds
    Updating via http://localhost:9999/bigdata/namespace/wdq/sparql
    #logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
    18:00:17.284 [main] INFO  org.wikidata.query.rdf.tool.Update - Starting Updater 0.3.40 (a115a80eec974454d140389e1f52aad0e54913f9)
    18:00:18.959 [main] INFO  o.w.q.r.t.change.ChangeSourceContext - Checking where we left off
    18:00:18.960 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
    18:00:19.267 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the dump
    18:00:19.333 [main] INFO  o.w.q.r.t.change.ChangeSourceContext - Defaulting start time to 30 days ago: 2021-02-15T18:00:19.333Z
    18:00:20.452 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
    18:00:20.780 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2021-02-15T18:00:19.333Z at (0.0, 0.0, 0.0) updates per second and (0.0, 0.0, 0.0) milliseconds per second
    18:00:21.066 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
    18:00:21.067 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
    18:00:31.661 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
    18:00:31.662 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
  
  Run the updater again.
  
    docker exec demo-wdqs /runUpdate.sh
  
  This time you should see an error:
  
    wait-for-it.sh: waiting 300 seconds for intentionally-empty.wiki.opencura.com:80
    wait-for-it.sh: intentionally-empty.wiki.opencura.com:80 is available after 0 seconds
    wait-for-it.sh: waiting 300 seconds for localhost:9999
    wait-for-it.sh: localhost:9999 is available after 0 seconds
    Updating via http://localhost:9999/bigdata/namespace/wdq/sparql
    #logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
    18:00:55.545 [main] INFO  org.wikidata.query.rdf.tool.Update - Starting Updater 0.3.40 (a115a80eec974454d140389e1f52aad0e54913f9)
    18:00:57.495 [main] INFO  o.w.q.r.t.change.ChangeSourceContext - Checking where we left off
    18:00:57.496 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
    18:00:57.996 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Found left off time from the updater
    18:00:58.000 [main] ERROR org.wikidata.query.rdf.tool.Update - Error during initialization.
    java.lang.IllegalStateException: RDF store reports the last update time is before the minimum safe poll time.  You will have to reload from scratch or you might have missing data.
            at org.wikidata.query.rdf.tool.change.ChangeSourceContext.getStartTime(ChangeSourceContext.java:100)
            at org.wikidata.query.rdf.tool.Update.initialize(Update.java:145)
            at org.wikidata.query.rdf.tool.Update.main(Update.java:98)
    Exception in thread "main" java.lang.IllegalStateException: RDF store reports the last update time is before the minimum safe poll time.  You will have to reload from scratch or you might have missing data.
            at org.wikidata.query.rdf.tool.change.ChangeSourceContext.getStartTime(ChangeSourceContext.java:100)
            at org.wikidata.query.rdf.tool.Update.initialize(Update.java:145)
            at org.wikidata.query.rdf.tool.Update.main(Update.java:98)
  
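  This is because the first run stored a timestamp recording how far the updates have progressed, and that timestamp is no longer considered "safe" to resume polling from.
  
  The timestamp is stored as a triple, and when nothing else is known it defaults to 30 days ago. You can see it by querying the store again: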
  
    curl "localhost:9999/bigdata/sparql?query=SELECT%20%2A%20WHERE%20%7B%3Fa%20%3Fb%20%3Fc%7D"
  
  
  
    <?xml version='1.0' encoding='UTF-8'?>
    <sparql xmlns='http://www.w3.org/2005/sparql-results#'>
            <head>
                    <variable name='a'/>
                    <variable name='b'/>
                    <variable name='c'/>
            </head>
            <results>
                    <result>
                            <binding name='a'>
                                    <uri>https://intentionally-empty.wiki.opencura.com</uri>
                            </binding>
                            <binding name='b'>
                                    <uri>http://schema.org/dateModified</uri>
                            </binding>
                            <binding name='c'>
                                    <literal datatype='http://www.w3.org/2001/XMLSchema#dateTime'>2021-02-15T18:00:18Z</literal>
                            </binding>
                    </result>
            </results>
    </sparql>
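
  To make the failure concrete, here is a minimal sketch of the kind of check that appears to be failing. It assumes the "minimum safe poll time" is simply 30 days before the current time, which is my reading of the log and error message rather than something confirmed from the updater source. `is_stale` is a hypothetical helper and relies on GNU date's `-d` option. The date of the second run (2021-03-17) is inferred from the "30 days ago" line in the first log.

```shell
# is_stale STORED NOW [DAYS]: succeed (exit 0) when STORED is more than
# DAYS days (default 30) before NOW. Both are ISO 8601 UTC timestamps.
# Hypothetical helper; relies on GNU date's -d option.
is_stale() {
    stored=$(date -u -d "$1" +%s) || return 2
    now=$(date -u -d "$2" +%s) || return 2
    max_age_days=${3:-30}
    [ "$stored" -lt $((now - max_age_days * 86400)) ]
}

# Values from the output above: the stored triple, and roughly the time
# of the second run (18:00:58 UTC on the inferred date 2021-03-17).
if is_stale 2021-02-15T18:00:18Z 2021-03-17T18:00:58Z; then
    echo "stored time is outside the 30-day window"
fi
```

  Under that assumption, the stored time was written as "30 days ago" during the first run, so by the second run, some 40 seconds later, it has already slipped just outside the window.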
  
  If it is safe to update and you are not going to end up missing data, you can reset this time to a date within the last 30 days.
  This overrides what runUpdate.sh normally does: https://github.com/wmde/wikibase-docker/blob/0c561dd6c17a918323b44c7282b5e5acccfd4e45/wdqs/0.3.40/runUpdate.sh#L9
  
    docker exec demo-wdqs bash -c '/wdqs/runUpdate.sh -h http://${WDQS_HOST}:${WDQS_PORT} -- --wikibaseUrl ${WIKIBASE_SCHEME}://${WIKIBASE_HOST} --conceptUri ${WIKIBASE_SCHEME}://${WIKIBASE_HOST} --entityNamespaces ${WDQS_ENTITY_NAMESPACES} --init --start 20210301010101'
  
  The date is now updated
  
    curl "localhost:9999/bigdata/sparql?query=SELECT%20%2A%20WHERE%20%7B%3Fa%20%3Fb%20%3Fc%7D"
  
  This should show something like
  
    <?xml version='1.0' encoding='UTF-8'?>
    <sparql xmlns='http://www.w3.org/2005/sparql-results#'>
            <head>
                    <variable name='a'/>
                    <variable name='b'/>
                    <variable name='c'/>
            </head>
            <results>
                    <result>
                            <binding name='a'>
                                    <uri>https://intentionally-empty.wiki.opencura.com</uri>
                            </binding>
                            <binding name='b'>
                                    <uri>http://schema.org/dateModified</uri>
                            </binding>
                            <binding name='c'>
                                    <literal datatype='http://www.w3.org/2001/XMLSchema#dateTime'>2021-03-01T01:01:00Z</literal>
                            </binding>
                    </result>
            </results>
    </sparql>
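
  The `--start` value used above has the shape YYYYMMDDhhmmss, and judging by the resulting triple it is read as UTC. As a small sketch, a value in that shape for "N days ago" can be generated like this. `start_stamp` is a hypothetical helper and relies on GNU date's `-d` option; the 7 is an arbitrary example, and the right value depends on when your store was last known to be complete.

```shell
# start_stamp DAYS: print the UTC time DAYS days ago as YYYYMMDDhhmmss,
# the same shape as the --start value used above.
# Hypothetical helper; relies on GNU date's -d option.
start_stamp() {
    date -u -d "$1 days ago" +%Y%m%d%H%M%S
}

start_stamp 7
```

  The printed value can then be substituted for 20210301010101 in the reset command.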
  
  I'm going to close this ticket now, as its scope is rather unclear.
  The case described above should not really happen during regular operation of a Wikibase, but perhaps we need to make the last step here (resetting the timestamp) more resilient, and perhaps make the default behaviour with an empty Wikibase a bit better.
  This would need some collaboration between WMDE and the Wikidata Query Service team.
  If people have individual bugs or feature requests, new tickets are welcome!

TASK DETAIL
  https://phabricator.wikimedia.org/T186161
