Re: [Wikidata] How to split a label by whitespace in WDQS ?

Thad Guidry Tue, 19 Sep 2017 11:41:13 -0700

Thanks Christopher,

But I am really looking to split on whitespace, without knowing in advance
how many tokens a label contains.  My human-names example was only meant to
simplify things; the label could be anything, for any Wikidata QID.
Like "Castle of Saint Pée sur Nivelle"
I would want 6 columns created automatically for that. Or, in JSON terms,
an array of strings:
[
"Castle",
"of",
"Saint",
"Pée",
"sur",
"Nivelle"
]
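
To make the desired result concrete outside of WDQS, here is a minimal Python sketch of the same whitespace split. It is only an illustration of the output I am after, not something the query service offers; the function name split_label is made up for this example.

```python
from typing import List


def split_label(label: str) -> List[str]:
    """Split a label into tokens on runs of whitespace."""
    # str.split() with no arguments splits on any run of whitespace
    # and drops the empty strings that leading, trailing, or repeated
    # spaces would otherwise produce.
    return label.split()


tokens = split_label("Castle of Saint Pée sur Nivelle")
print(tokens)  # ['Castle', 'of', 'Saint', 'Pée', 'sur', 'Nivelle']
```

The number of tokens is not fixed in advance, which is why a list (rather than a fixed set of columns) is the natural shape for the result.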


This has to do with a use case of pre-processing the label names for data
ingestion into further analysis workflows.
I was hoping that I could easily leverage a bit of horsepower for free from
the WDQS for this (splitting label names)...perhaps even using the Label
service itself to do the splitting.

The indexing service behind the scenes already does much of this and
stores the tokens for each label.
The problem is that we don't currently have a way to get the tokens of a
label for any particular QID and its labels in various languages.
And that's what I want to solve, either through SPARQL or an enhancement to
the Label service or something else.
If the answer is that I will have to resort to my own programmatic methods
via the dump files then so be it, I guess, but I'd rather not have to put
in the work for something that is done already behind the scenes.
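
For reference, the do-it-myself fallback would look roughly like the Python sketch below. It assumes an entity has already been parsed from the JSON dump into a dict with the usual "labels" map (language code to an object with a "value" field). The function name tokenize_labels, the "Q0" id, and the sample labels are all made up for illustration and are not taken from real Wikidata data.

```python
from typing import Dict, List


def tokenize_labels(entity: dict) -> Dict[str, List[str]]:
    """Map each language code to the whitespace-split tokens of its label."""
    labels = entity.get("labels", {})
    # Each entry is expected to look like {"language": "en", "value": "..."}.
    return {lang: term["value"].split() for lang, term in labels.items()}


# Hypothetical sample entity; the id and label values are illustrative only.
sample_entity = {
    "id": "Q0",
    "labels": {
        "en": {"language": "en", "value": "Castle of Saint Pée sur Nivelle"},
        "fr": {"language": "fr", "value": "château de Saint-Pée-sur-Nivelle"},
    },
}

for lang, label_tokens in tokenize_labels(sample_entity).items():
    print(lang, label_tokens)
```

Note that a plain whitespace split keeps hyphenated forms as a single token, which may or may not match what the search index does internally.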

-Thad
+ThadGuidry <https://plus.google.com/+ThadGuidry>

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
