Re: [Wikidata-tech] Found some missing wdt properties -- evidence of a bug?

2017-08-10 Thread Eric Scott
Ah. Thanks. I was going to ask why this ranking thing wasn't exposed in 
the interface, but now I see those little arrows over on the far left.


On 08/10/2017 07:03 AM, Marius Hoch wrote:

Hi Eric,

this is because the values that are not present are not the "best 
statements". That means that there are statements of a higher rank 
(https://www.wikidata.org/wiki/Wikidata:Glossary#Rank).
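
As a rough illustration of that rule, here is a minimal Python sketch. It assumes the selection works as the linked documentation describes: deprecated statements never become wdt: triples, and normal-rank statements do so only when the property has no preferred-rank statement. The function name best_statements, the (value, rank) tuples and the ranks assigned below are made up for the example; they are not read from Wikidata.

```python
from typing import List, Tuple

Statement = Tuple[str, str]  # (value, rank); rank is "preferred", "normal" or "deprecated"


def best_statements(statements: List[Statement]) -> List[str]:
    """Return the values that would surface as wdt: ("truthy") triples."""
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        # Any preferred-rank statement hides all normal-rank ones.
        return preferred
    # Otherwise every normal-rank statement counts; deprecated ones never do.
    return [value for value, rank in statements if rank == "normal"]


# Hypothetical ranks for two P26 statements on the same item.
spouses = [("Q13909", "preferred"), ("other-spouse", "normal")]
print(best_statements(spouses))
```

With these assumed ranks only Q13909 is returned, which matches the single wdt:P26 triple in the query result quoted below.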


It's still possible to access these values; please have a look at 
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Truthy_statements. 
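
One way to get at the non-truthy values is to go through the statement nodes instead of wdt:. The sketch below only builds the query text; it assumes the p:, ps: and wikibase: prefixes are predefined by the endpoint (as they are on the public query service), and the helper name all_statements_query is hypothetical.

```python
import re


def all_statements_query(entity_id: str, property_id: str) -> str:
    """Build a SPARQL query listing every statement value with its rank."""
    if not re.fullmatch(r"Q[1-9]\d*", entity_id):
        raise ValueError(f"not an item id: {entity_id!r}")
    if not re.fullmatch(r"P[1-9]\d*", property_id):
        raise ValueError(f"not a property id: {property_id!r}")
    return (
        "SELECT ?statement ?value ?rank WHERE {\n"
        f"  wd:{entity_id} p:{property_id} ?statement .\n"
        f"  ?statement ps:{property_id} ?value ;\n"
        "             wikibase:rank ?rank .\n"
        "}"
    )


# Item and property taken from the query in this thread.
print(all_statements_query("Q35332", "P26"))
```

The ?rank column then shows which statements are preferred, normal or deprecated, so it is visible why some of them have no wdt: counterpart.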



Cheers,
Marius

On 08/10/2017 08:55 AM, Eric Scott wrote:


I know of two cases where there are statements in the wikibase which 
have not been translated to wdt's.


One is for spouses of Brad Pitt, which retrieves a wdt for Angelina 
Jolie, but poor Jennifer Aniston is left out in the cold:


Select * Where
{
  wd:Q35332 ?p ?o.
  Filter Regex(Str(?p), "P26$")

 }
  ==>
  p                                          o
  <http://www.wikidata.org/prop/direct/P26>  <http://www.wikidata.org/entity/Q13909>
  <http://www.wikidata.org/prop/P26>         <http://www.wikidata.org/entity/statement/Q35332-9167b7b7-422a-6027-0449-e7d2096900e5>
  <http://www.wikidata.org/prop/P26>         <http://www.wikidata.org/entity/statement/q35332-E60D2807-57AF-40D3-90D8-BC89F5AC3140>


This second wdt is also absent from the ttl cache: 
https://www.wikidata.org/wiki/Special:EntityData/Q35332.ttl


The analogous query for Humphrey Bogart retrieves all four of his 
marriages, each of which has beginning and end dates, just as Brad's 
marriages do.



Another example is Barack Obama's birthplace: 
http://tinyurl.com/ya7f948a (returns only the hospital he was born 
at, and ignores Honolulu)


Contrast this with more complete coverage of Putin's birthplace: 
http://tinyurl.com/ya7f948a


Is this evidence of a bug?

Thanks,




___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech








[Wikidata-tech] Found some missing wdt properties -- evidence of a bug?

2017-08-10 Thread Eric Scott


I know of two cases where there are statements in the wikibase which 
have not been translated to wdt's.


One is for spouses of Brad Pitt, which retrieves a wdt for Angelina 
Jolie, but poor Jennifer Aniston is left out in the cold:


Select * Where
{
  wd:Q35332 ?p ?o.
  Filter Regex(Str(?p), "P26$")

 }
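  ==>
  p                                          o
  <http://www.wikidata.org/prop/direct/P26>  <http://www.wikidata.org/entity/Q13909>
  <http://www.wikidata.org/prop/P26>         <http://www.wikidata.org/entity/statement/Q35332-9167b7b7-422a-6027-0449-e7d2096900e5>
  <http://www.wikidata.org/prop/P26>         <http://www.wikidata.org/entity/statement/q35332-E60D2807-57AF-40D3-90D8-BC89F5AC3140>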


This second wdt is also absent from the ttl cache: 
https://www.wikidata.org/wiki/Special:EntityData/Q35332.ttl


The analogous query for Humphrey Bogart retrieves all four of his 
marriages, each of which has beginning and end dates, just as Brad's 
marriages do.



Another example is Barack Obama's birthplace: 
http://tinyurl.com/ya7f948a (returns only the hospital he was born at, 
and ignores Honolulu)


Contrast this with more complete coverage of Putin's birthplace: 
http://tinyurl.com/ya7f948a


Is this evidence of a bug?

Thanks,






Re: [Wikidata-tech] Does a rollback also roll back revision history?

2017-07-31 Thread Eric Scott
My apologies. I was looking at the wrong Q-number for the revision history. The 
pertinent Q-number to check here was 
https://www.wikidata.org/w/index.php?title=Q4925477&action=history (the 
name "John"). My mistake.




On 07/31/2017 08:37 AM, Daniel Kinzler wrote:

Am 31.07.2017 um 17:01 schrieb Eric Scott:

* Is it indeed the case that rollbacks also roll back the revision history?

No. All edits are visible in the page history, including rollback, revert,
restore, undo, etc. The only kind of edit that is not recorded is a "null edit"
- an edit that changes nothing compared to the previous version (so it's not
actually an edit). This is sometimes used to rebuild cached derived data.


* Is there some other place we could look that records such rollbacks?

No. The page history is authoritative. It reflects all changes to the page
content. If you could find a way to trigger this kind of behavior, that would be
a HUGE bug. Let us know.

Note that for wikitext content, this doesn't mean that it contains all changes
to the visible rendering: when a transcluded template is changed, this changes
the rendering, but is not visible in the page's history (but it is instead
visible in the template's history). However, no transclusion mechanism exists
for Wikidata entities.






[Wikidata-tech] org.wikidata.query.rdf.tool.Update-Contained error syncing. Giving up on Qxxxxx

2017-06-15 Thread Eric Scott
Starting last Tuesday, I've been getting a persistent error running the 
runUpdate.sh script from the Wikidata stand-alone facility.


Example:

06:34:20.402 [update 6] WARN  org.wikidata.query.rdf.tool.Update - Contained error syncing.  Giving up on Q13873706
org.wikidata.query.rdf.tool.exception.ContainedException: Unexpected status code fetching RDF for https://www.wikidata.org/wiki/Special:EntityData/Q13873706.ttl?nocache=1497533660269&flavor=dump: 429
    at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.fetchRdfForEntity(WikibaseRepository.java:267) ~[wikidata-query-tools-0.2.3-SNAPSHOT-jar-with-dependencies.jar:na]
    at org.wikidata.query.rdf.tool.Update.handleChange(Update.java:474) ~[wikidata-query-tools-0.2.3-SNAPSHOT-jar-with-dependencies.jar:na]
    at org.wikidata.query.rdf.tool.Update.access$0(Update.java:472) ~[wikidata-query-tools-0.2.3-SNAPSHOT-jar-with-dependencies.jar:na]
    at org.wikidata.query.rdf.tool.Update$1.run(Update.java:370) ~[wikidata-query-tools-0.2.3-SNAPSHOT-jar-with-dependencies.jar:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_131]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]


Does anyone have an explanation for this?

Thanks,
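
For reference, status 429 is HTTP's "Too Many Requests" (RFC 6585): the server is rate-limiting the client and may send a Retry-After header saying how long to wait. A minimal sketch of how a client might back off in that situation follows. It is not how the updater behaves; fetch_with_backoff, the fetch callable and its (status, retry_after, body) return shape are all hypothetical.

```python
import time
from typing import Callable, Optional, Tuple

# fetch(url) -> (status_code, retry_after_seconds_or_None, body); hypothetical shape.
Fetch = Callable[[str], Tuple[int, Optional[float], str]]


def fetch_with_backoff(
    fetch: Fetch,
    url: str,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry on 429, honouring Retry-After when the server supplies it."""
    for attempt in range(max_attempts):
        status, retry_after, body = fetch(url)
        if status == 200:
            return body
        if status != 429:
            raise RuntimeError(f"unexpected status {status} for {url}")
        if attempt + 1 < max_attempts:
            # Prefer the server's hint; otherwise back off exponentially.
            delay = retry_after if retry_after is not None else base_delay * 2 ** attempt
            sleep(delay)
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts: {url}")
```

Passing sleep as a parameter keeps the function testable without real waiting.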





[Wikidata-tech] Synchronizing multiple copies of Wikibase to the WD mothership efficiently

2017-06-13 Thread Eric Scott

Greetings  -

We have three installations of Blazegraph/Wikibase sitting behind a load 
balancer. We synchronize each of these servers to the main Wikidata 
repository through the wikidata stand-alone facility 
(https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Standalone_service).


We would like to keep all three servers in sync and current without 
duplicating three separate sequences of update requests to the main 
Wikidata API. Could anyone provide us with guidance as to how best to do 
this? For example is there a straightforward way to set up a complete 
installation of the WD API on one of our servers, synchronize that one 
to the main WD service, then synchronize our other two Wikibase servers 
to our local installation? Alternatively, has anyone done this using 
some kind of proxy scheme whose details they could share?


Any help would be appreciated.

Thanks,

Eric Scott



Re: [Wikidata-tech] lagging runUpdate.sh on wikidata stand-alone

2016-11-04 Thread Eric Scott
Thanks for your response.  Actually the systems it's being run on are 
pretty well equipped with multiple cores and plenty of memory.


I believe the problem arose from the fact that the rccontinue parameter 
is not being carried forward from previous calls to the wikibase API. 
Refactoring the code to do so seems to have fixed the problem.
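
To illustrate what carrying the parameter forward means: the MediaWiki API returns a "continue" object (containing rccontinue for list=recentchanges), and the client is expected to merge it into the next request. The Python sketch below shows that loop only; it is not the actual change made to the Java updater, and call_api is a hypothetical callable that sends the parameters and returns the decoded JSON response.

```python
from typing import Callable, Dict, Iterator


def iter_recent_changes(
    call_api: Callable[[Dict[str, str]], dict],
    base_params: Dict[str, str],
) -> Iterator[dict]:
    """Yield recent changes, carrying the continuation values between calls."""
    params = dict(base_params)
    while True:
        response = call_api(params)
        yield from response.get("query", {}).get("recentchanges", [])
        cont = response.get("continue")
        if not cont:
            return  # no continuation block: this batch was the last one
        # Merge rccontinue (and the generic "continue" marker) into the next call.
        params = {**base_params, **cont}
```

If the continuation values are dropped, each call starts again from the same timestamp, and a batch with more changes than the limit is never fully read.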


Cheers,

On 11/02/2016 11:57 AM, Stas Malyshev wrote:

Hi!


We've been using a locally installed wikidata stand-alone service
(https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Standalone_service)
for several months now. Recently the service went down for a significant
amount of time, and when we ran runUpdate.sh -n wdq, instead of catching
up to real time as it usually does, the update process lagged, failing
even to keep parity with real time.

Hmm... This usually means that the Blazegraph install is underpowered and
the queries for update can't run in time. Try increasing batch size,
maybe, but usually that doesn't change much if the host is not
performant enough to keep up with the data.


INFO org.wikidata.query.rdf.tool.RdfRepository - HTTP request failed:
org.apache.http.NoHttpResponseException: wikidata.cb.ntent.com:
failed to respond, retrying in 2175 ms.

Do you have any other exceptions surrounding it, or any accompanying
exceptions on Blazegraph side?


This problem started about 3 days ago, and we're now polling up to a
point in time 18 hours earlier than real time.

It can also happen if the edit volume spikes, and then it should catch
up when the spike passes. But if that's not the case, I'd try to run
Blazegraph on a stronger machine.


Also: is this an appropriate list to write to with such problems? Are
there more appropriate places?

The Blazegraph list could help too, for BG-specific questions:
bigdata-develop...@lists.sourceforge.net
It is a good place to discuss performance/optimization questions.






[Wikidata-tech] lagging runUpdate.sh on wikidata stand-alone

2016-10-28 Thread Eric Scott

Hi all -

We've been using a locally installed wikidata stand-alone service 
(https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Standalone_service) 
for several months now. Recently the service went down for a significant 
amount of time, and when we ran runUpdate.sh -n wdq, instead of catching 
up to real time as it usually does, the update process lagged, failing 
even to keep parity with real time.


Example output from the log:

09:30:39.805 [main] INFO  org.wikidata.query.rdf.tool.Update - Polled up 
to 2016-10-24T23:01:05Z at (0.0, 0.0, 0.0) updates per second and 
(271.8, 56.2, 18.8) milliseconds per second


This is normal when starting the update of course, but the system never 
seems to find its feet, and continues to stumble and lag. Restarting 
both the blazegraph process and the update process has no lasting effect.


From time to time, a message like this will appear:

INFO org.wikidata.query.rdf.tool.RdfRepository - HTTP request failed: 
org.apache.http.NoHttpResponseException: wikidata.cb.ntent.com: 
failed to respond, retrying in 2175 ms.


I have experienced this effect in the past, and had success replacing an 
old journal, which was the product of a long update process, with a new 
journal rebuilt from the latest dump. This time that strategy did not work. 
I also tried rebuilding with the latest git pull from origin and rebuilding 
the journal, again with no effect.


This problem started about 3 days ago, and we're now polling up to a 
point in time 18 hours earlier than real time.
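
The lag can be read straight off the "Polled up to" lines. A small sketch, assuming the log format shown above and that the timestamp is UTC (the trailing Z); the function name update_lag and the "now" value used in the example are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# \s+ tolerates the line wrapping seen in the mailed log excerpt.
POLLED = re.compile(r"Polled\s+up\s+to\s+(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z")


def update_lag(log_line: str, now: datetime) -> timedelta:
    """Return how far behind 'now' the updater had polled."""
    match = POLLED.search(log_line)
    if match is None:
        raise ValueError("no 'Polled up to' timestamp in line")
    polled = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return now - polled.replace(tzinfo=timezone.utc)


line = ("09:30:39.805 [main] INFO  org.wikidata.query.rdf.tool.Update - Polled up "
        "to 2016-10-24T23:01:05Z at (0.0, 0.0, 0.0) updates per second")
# Hypothetical wall-clock time, chosen only to make the arithmetic easy to follow.
now = datetime(2016, 10, 25, 17, 1, 5, tzinfo=timezone.utc)
print(update_lag(line, now))
```

Logging this value over time shows whether the updater is closing the gap or falling further behind.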


I would appreciate any guidance.

Also: is this an appropriate list to write to with such problems? Are 
there more appropriate places?


Thanks,

Eric Scott




[Wikidata-tech] Wikidata stand-alone service performance problems and the Blazegraph multi-GPU architecture

2016-10-27 Thread Eric Scott

Hi all -

We've been using a locally installed wikidata stand-alone service 
(https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Standalone_service) 
for several months now. We're becoming increasingly plagued by 
performance issues, and are wondering if one approach to the problem 
might be to adopt the Blazegraph multi-GPU architecture 
(https://www.blazegraph.com/product/gpu-accelerated/).


Could anyone provide guidance as to how much pain would be involved in 
making such a transition?


Thanks,

Eric Scott




[Wikidata-tech] Enabled federated queries on my local stand-alone, but bindings don't 'take'

2016-04-11 Thread Eric Scott
I posted this last week to wikid...@lists.wikimedia.org, but perhaps 
this is a better forum for this problem. Re-posting it here.


Federated queries are disabled on the stand-alone code out-of-the-box.

I followed the advice on this post to un-disable federated queries for 
my local installation: 
https://lists.wikimedia.org/pipermail/wikidata/2016-March/008444.html


(Thanks Mr. Malyshev!)

I did have one problem with the build: I found myself having to remove 
the checkstyle plugin from the pom.xml file to get the maven build to go 
to completion.


Now on my local installation I can get this query to work just fine:

(Current namespace: wdq)
Select *
Where
{
  Service 
  {
    Bind ("Algeria"@en as ?countryLabel)
  }
}

Also this:

(current namespace: test)
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
Select *
Where
{
  Service 
  {
    Bind ("Algeria"@en as ?countryLabel)
    ?country rdfs:label ?countryLabel.
  }
}


But this query (and every variation I could think of) unfortunately 
times out:

(current namespace: wdq)
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
Select *
Where
{
  {
    Select ?countryLabel
    {
      Service 
      {
        Bind ("Algeria"@en as ?countryLabel)
      }
    }
  }
  ?country rdfs:label ?countryLabel.
}

This is also the case when addressing a Jena service I have set up on a 
different port.


I would appreciate any guidance you could give me.

Thanks and regards,
