Re: [Wikidata] Wikidata - short biographies

2016-02-02 Thread Hampton Snowball
Okay, thanks!

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata - short biographies

2016-02-02 Thread Edgard Marx
Hey,

I recommend not posting questions about third-party systems or
software that are not related to Wikidata or Wikimedia here.
For RDFSlice there is an issues page
(https://bitbucket.org/emarx/rdfslice/issues),
where you can open an issue and someone will answer you.

I also advise you to post your command line and the error message, so the
developers can understand the problem and fix it quickly (if there is one).

best regards,
Edgard



Re: [Wikidata] Wikidata - short biographies

2016-02-02 Thread Hampton Snowball
I was able to semi-successfully use RDFSlice on the dump from the Windows
command prompt. However, maybe because it's a 5 GB dump file, I am getting
Java errors line after line as it goes through the file
(java.lang.StringIndexOutOfBoundsException: String index out of range: -1;
sometimes the last number changes).

I thought it might be a memory issue, but increasing the heap with the
-Xmx2G option (or 3G, 4G) hasn't helped. Any tips would be
appreciated.

Thanks
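One thing worth checking with the heap setting is argument order. JVM options such as -Xmx only take effect when they appear before -jar; anything after the jar file name is passed to the program itself.

Below is a minimal Python sketch that assembles the command as an argument list. The function names, file names and graph pattern are hypothetical. The -order and -debug flags are left out on the assumption that they are optional. Check the RDFSlice usage text for the exact argument formats.

```python
import subprocess


# Hypothetical helper: builds the command only, does not run it.
def rdfslice_command(source, patterns, out, heap="4G", jar="rdfslice.jar"):
    """Build the java command line for RDFSlice as a list of arguments.

    JVM options such as -Xmx must come before -jar; everything after the
    jar name is passed to RDFSlice, not to the JVM.
    """
    return [
        "java",
        f"-Xmx{heap}",  # maximum heap size, read by the JVM
        "-jar", jar,
        "-source", source,  # dump file (or file list) to slice
        "-patterns", patterns,  # SPARQL query or graph pattern
        "-out", out,  # where the extracted triples are written
    ]


# Hypothetical helper: runs the command built above.
def run_rdfslice(*args, **kwargs):
    """Run RDFSlice; raises CalledProcessError on a non-zero exit code."""
    cmd = rdfslice_command(*args, **kwargs)
    # Passing a list avoids hand-quoting the pattern (braces, spaces).
    return subprocess.run(cmd, check=True)
```

Printing `" ".join(rdfslice_command(...))` shows the equivalent line to type at the Windows command prompt.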



Re: [Wikidata] Wikidata - short biographies

2016-02-01 Thread Hampton Snowball
Of course I meant sorry if this is a dumb question :)





Re: [Wikidata] Wikidata - short biographies

2016-02-01 Thread Hampton Snowball
Sorry if this is a dump question (I'm not a developer). To run the command
that RDFSlice mentions ("java -jar rdfslice.jar -source
| -patterns  -out  -order 
-debug"), can this be done from the Windows command prompt, or
do I need some special developer version of Java or a console?

Thanks for the tool.



Re: [Wikidata] Wikidata - short biographies

2016-02-01 Thread Hampton Snowball
Thanks.

I only plan on using a query to extract from all English Wikidata
"articles" (or all articles) anyway, so hopefully the other queries
will work.



Re: [Wikidata] Wikidata - short biographies

2016-02-01 Thread Stas Malyshev
Hi!

> *The first one, which seems to be only for 1 record, just as a test
> seemed to give me an ERROR though:*
> 
> 
> PREFIX wd: 
> PREFIX wdt: 
> PREFIX rdfs: 
> PREFIX schema: 
> 
> SELECT *
> WHERE
> {
>
>   ?o .   
> filter(lang(?o)='en').
> 
> ?article schema:about ?item .
> ?article schema:inLanguage "en" .
> FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
> }

This one is not correct: ?item should be replaced by the IRI of the
specific item. Alternatively, you can use the BIND or VALUES syntax in SPARQL
to bind all instances of ?item to one or more specific items. But if you
just leave it as ?item, it matches any value, which means you just made it
scan through all 15M items :) That will time out.
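The VALUES approach can be sketched in Python by writing the query text so that ?item is bound to a given list of items. The endpoint then only looks at those items instead of all of them.

The function name is hypothetical. The PREFIX IRIs are the standard Wikidata ones. STRSTARTS (SPARQL 1.1) stands in for the SUBSTR comparison. Treat the output as a starting point rather than a tested query.

```python
import re


# Hypothetical helper that only builds the query text.
def description_and_article_query(qids, lang="en"):
    """Query the description and Wikipedia article of specific items,
    binding ?item with VALUES instead of leaving it free."""
    for qid in qids:
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError(f"not an item id: {qid!r}")
    if not re.fullmatch(r"[a-z][a-z-]*", lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return f"""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX schema: <http://schema.org/>

SELECT ?item ?description ?article WHERE {{
  VALUES ?item {{ {values} }}
  ?item schema:description ?description .
  FILTER(LANG(?description) = "{lang}")
  OPTIONAL {{
    ?article schema:about ?item ;
             schema:inLanguage "{lang}" .
    FILTER(STRSTARTS(STR(?article), "https://{lang}.wikipedia.org/"))
  }}
}}"""
```

The resulting string can be pasted into https://query.wikidata.org. Passing several ids (e.g. `["Q1652291", "Q42"]`) keeps the query bounded to exactly those items.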

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Wikidata - short biographies

2016-02-01 Thread Edgard Marx
The Wikidata endpoint seems to be heavily queried right now; it is a public
endpoint.

However, the query below might work in RDFSlice.

PS: notice that the subject variable (?article) contains the Wikipedia link,
and it will be extracted.

SELECT *
WHERE
{
   ?article  ?o .
   ?article  ?o1 .
   ?article  ?o2 .
}

best,
Edgard


Re: [Wikidata] Wikidata - short biographies

2016-02-01 Thread Hampton Snowball
Thank you. This will give me the bios; however, I still want the associated
Wikipedia links. Previously someone gave me a query that included the
English Wikipedia link along with another property. You can see it below:


PREFIX wd: 
PREFIX wdt: 
PREFIX rdfs: 
PREFIX schema: 

SELECT ?item ?twitter ?article WHERE {
  ?item wdt:P2002 ?twitter .
  OPTIONAL { ?item rdfs:label ?item_label . FILTER (lang(?item_label) = "en") }

  ?article schema:about ?item .
  ?article schema:inLanguage "en" .
  FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
}
ORDER BY ASC(?article)


*I tried taking the PREFIX header and this portion and appending them to
some of your queries:*

  ?article schema:about ?item .
  ?article schema:inLanguage "en" .
  FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")


*The first one, which seems to cover only one record, gave me an ERROR,
though, even just as a test:*


PREFIX wd: 
PREFIX wdt: 
PREFIX rdfs: 
PREFIX schema: 

SELECT *
WHERE
{
     
?o .
filter(lang(?o)='en').

?article schema:about ?item .
?article schema:inLanguage "en" .
FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
}

*So I assume the other queries like this would not work either (they would
time out on query.wikidata.org, so I can't test them):*


PREFIX wd: 
PREFIX wdt: 
PREFIX rdfs: 
PREFIX schema: 

SELECT *
WHERE
{
   ?s <http://schema.org/description> ?o .
   filter(lang(?o)='en').

?article schema:about ?item .
?article schema:inLanguage "en" .
FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
}


So am I doing something wrong in the syntax of these combined queries?

Thanks again in advance, and thanks for the help so far!
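The SUBSTR filter in these queries works only because "https://en.wikipedia.org/" is exactly 25 characters long. With any other count, the comparison silently matches nothing.

Here is a small Python sketch of two equivalent checks; the helper names are hypothetical. The second mirrors SPARQL 1.1's STRSTARTS, which avoids keeping a length in sync with the prefix.

```python
EN_PREFIX = "https://en.wikipedia.org/"


# Hypothetical helper mirroring the SUBSTR-based FILTER.
def substr_filter(article, prefix=EN_PREFIX, length=25):
    """Mimic FILTER(SUBSTR(str(?article), 1, 25) = prefix).

    SPARQL's SUBSTR is 1-based, so SUBSTR(s, 1, n) corresponds to s[:n].
    """
    return article[:length] == prefix


# Hypothetical helper mirroring FILTER(STRSTARTS(str(?article), prefix)).
def strstarts_filter(article, prefix=EN_PREFIX):
    """Prefix test with no separate length to get wrong."""
    return article.startswith(prefix)
```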



Re: [Wikidata] Wikidata - short biographies

2016-02-01 Thread Edgard Marx
Yep,

Please note that RDFSlice will extract only the subset,
that is, the triples that contain the property you are looking for.
Here are three example SPARQL queries:

PS: you can try them at https://query.wikidata.org.

*For your example:*

SELECT *
WHERE
{
     
?o .
filter(lang(?o)='en').
}


*For all English bios:*

SELECT *
WHERE
{
   ?s <http://schema.org/description> ?o .
   filter(lang(?o)='en').
}

*For all language bios:*

SELECT *
WHERE
{
     
?o .
}


best,
Edgard
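The thread does not show how RDFSlice works internally. The streaming filter below is a rough Python illustration of the idea described above: keep only the triples that contain the property you want.

It assumes the data is in N-Triples, one triple per line. A Turtle file uses prefixes and statements that span several lines, so it would need a proper RDF parser instead. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Optional

DESCRIPTION = "<http://schema.org/description>"


# Hypothetical helper: decides whether one N-Triples line is kept.
def keep_triple(line: str, predicate: str = DESCRIPTION,
                lang: Optional[str] = "en") -> bool:
    """True if the line uses `predicate` and, when `lang` is given, its
    object is a literal tagged with exactly that language."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return False  # blank line or comment
    parts = stripped.split(None, 2)  # subject, predicate, rest of the line
    if len(parts) < 3 or parts[1] != predicate:
        return False
    if lang is None:
        return True
    obj = parts[2].rstrip(".").rstrip()  # drop the terminating " ."
    return obj.endswith(f'"@{lang}')


# Hypothetical helper: streams over the input, so memory use stays flat.
def slice_lines(lines: Iterable[str], predicate: str = DESCRIPTION,
                lang: Optional[str] = "en") -> Iterator[str]:
    """Yield only the matching triples."""
    for line in lines:
        if keep_triple(line, predicate, lang):
            yield line
```

Passing an open file object as `lines` reads it lazily, one line at a time. Passing `lang=None` corresponds to the "all language bios" case.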





Re: [Wikidata] Wikidata - short biographies

2016-01-31 Thread Edgard Marx
Yep,

One more reason to use RDFSlice ;-)

thanks



Re: [Wikidata] Wikidata - short biographies

2016-01-31 Thread Stas Malyshev
Hi!

> 
> *For all English bios:*
> 
> SELECT *
> WHERE
> {
>    ?s <http://schema.org/description> ?o .
>    filter(lang(?o)='en').
> }

Please don't run this on query.wikidata.org, though; please add a LIMIT.
Otherwise you'd be trying to download several million data items,
which would probably time out anyway. Add something like "LIMIT 10" to it.
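Here is a minimal Python sketch of that advice; the helper name is hypothetical. It appends a LIMIT clause to a query string before the query is sent, and leaves queries that already end in LIMIT unchanged.

```python
import re


# Hypothetical helper: adds a LIMIT clause to a SPARQL query string.
def with_limit(query: str, limit: int = 10) -> str:
    """Append a LIMIT clause unless the query already ends with one.

    Only a trailing "LIMIT n" is detected; queries ending in
    "LIMIT n OFFSET m" are not handled by this sketch.
    """
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    body = query.rstrip()
    if re.search(r"\bLIMIT\s+\d+$", body, flags=re.IGNORECASE):
        return body
    return f"{body}\nLIMIT {limit}"
```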

Thanks,
-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Wikidata - short biographies

2016-01-31 Thread Hampton Snowball
Thanks. I see it requires constructing a query to extract only the data you
want, e.g. the graph pattern:

 - desired query, e.g. "SELECT * WHERE {?s ?p ?o}" or graph
pattern e.g. "{?s ?p ?o}"

Since I don't know how to construct queries, could you tell me the proper
query to extract the short bio, the English Wikipedia link, and maybe other
Wikipedia links from all the pages?

For example, from: https://www.wikidata.org/wiki/Q1652291

"Turkish female given name"
https://en.wikipedia.org/wiki/H%C3%BClya
and optionally https://de.wikipedia.org/wiki/H%C3%BClya

Thanks in advance!




Re: [Wikidata] Wikidata - short biographies

2016-01-31 Thread Edgard Marx
Hey,
you can simply use RDFSlice (https://bitbucket.org/emarx/rdfslice/overview)
directly on the dump file
(https://dumps.wikimedia.org/wikidatawiki/entities/20160125/).

best,
Edgard



Re: [Wikidata] Wikidata - short biographies

2016-01-31 Thread Gerard Meijssen
Hi,
Magnus created automated descriptions; they are a start. Your only problem
is that they do not use SPARQL.
Thanks,
 GerardM



[Wikidata] Wikidata - short biographies

2016-01-31 Thread Hampton Snowball
Hello,

I am interested in a subset of Wikidata, and I am trying to find the best
way to get it without getting a larger dataset than necessary.

Is there a way to get just the "bios" that appear on the Wikidata pages
below the name of the person or organization, as well as the link to the
English Wikipedia page (or to all Wikipedia pages)?

For example, from: https://www.wikidata.org/wiki/Q1652291

"Turkish female given name"
https://en.wikipedia.org/wiki/H%C3%BClya
and optionally https://de.wikipedia.org/wiki/H%C3%BClya

I know there is SPARQL, and this list previously helped me construct a
query, but some requests seem to time out when looking at a large
amount of data, so I am not sure this would work.

I know the dumps are the full dataset, but I am not sure whether there are
any subset dumps available or a better way of grabbing this data.

Thanks in advance,
HS
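If you go the dump route, the sketch below pulls just these two fields out of the JSON version of the dump.

It assumes the JSON dump layout: one large array with one entity per line, each entity carrying "descriptions" and "sitelinks" maps. The function names and any file path you pass in are hypothetical. The article URL is rebuilt from the sitelink title, which reproduces the percent-encoded form shown in the example.

```python
import gzip
import json
from urllib.parse import quote


# Hypothetical helper: sitelink -> article URL.
def wikipedia_url(site: str, title: str) -> str:
    """Turn a sitelink such as ("enwiki", "Hülya") into an article URL.

    Assumes a Wikipedia language edition ("<lang>wiki"); other projects such
    as "commonswiki" or "enwikiquote" are not handled.
    """
    lang = site[: -len("wiki")].replace("_", "-")
    # Article URLs use underscores for spaces and percent-encoded UTF-8.
    return f"https://{lang}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"


# Hypothetical helper: one dump line -> (id, description, article URL).
def extract_entity(line: str, lang: str = "en"):
    """Return (id, description, url) for one dump line, or None for the
    lines that are not entities (the opening "[" and closing "]")."""
    line = line.strip().rstrip(",")
    if not line.startswith("{"):
        return None
    entity = json.loads(line)
    # Empty maps may be serialized as [] in the dump; treat them as {}.
    descriptions = entity.get("descriptions") or {}
    sitelinks = entity.get("sitelinks") or {}
    description = (descriptions.get(lang) or {}).get("value")
    link = sitelinks.get(f"{lang}wiki")
    url = wikipedia_url(link["site"], link["title"]) if link else None
    return entity["id"], description, url


# Hypothetical helper: streams a gzipped dump instead of loading it whole.
def iter_dump(path: str, lang: str = "en"):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            row = extract_entity(line, lang)
            if row is not None:
                yield row
```

Because `iter_dump` reads one line at a time, memory use stays roughly constant however large the dump is. Calling it with `lang="de"` gives the German description and German Wikipedia link instead.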