This is something you might want to bring up with the Blazegraph team.
Jena, for example, provides the apf:strSplit property function in ARQ.
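
Until something comparable is available on the endpoint, the split can be done client-side once the labels have been fetched. Below is a minimal sketch in Python, assuming the label text is already in hand as a plain string; the helper name split_label is hypothetical and is not part of Jena, Blazegraph, or WDQS.

```python
from typing import List


def split_label(label: str) -> List[str]:
    """Split a label into whitespace-delimited tokens.

    Hypothetical helper for illustration only.
    """
    # str.split() with no arguments splits on runs of any whitespace
    # and drops leading/trailing whitespace, so no empty tokens appear.
    return label.split()


if __name__ == "__main__":
    print(split_label("Castle of Saint Pée sur Nivelle"))
```

The number of tokens is not fixed in advance, so the result is a list rather than a fixed set of columns.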

On Tue, Sep 19, 2017 at 2:39 PM, Thad Guidry <thadgui...@gmail.com> wrote:
> Thanks Christopher,
>
> But I am really looking to split by whitespace, with an unknown number of
> tokens in a label.  My example of human names was just to simplify, but it
> could be anything, not just human names: the label of any Wikidata QID,
> like "Castle of Saint Pée sur Nivelle".
> I would want 6 columns automatically created for that. Or, in JSON terms, an
> array of strings:
> [
> "Castle",
> "of",
> "Saint",
> "Pée",
> "sur",
> "Nivelle"
> ]
>
> This has to do with a use case of pre-processing the label names for data
> ingestion into further analysis workflows.
> I was hoping that I could easily leverage a bit of horsepower for free from
> the WDQS for this (splitting label names)...perhaps even using the Label
> service itself to do the splitting.
>
> The indexing service behind the scenes already stores much of this, and
> stores those tokens for each label.
> The problem is that we don't currently have a way to get the tokens of a
> label for any particular QID and its labels in various languages.
> And that's what I want to solve, either through SPARQL or an enhancement to
> the Label service or something else.
> If the answer is that I will have to resort to my own programmatic methods
> via the dump files then so be it, I guess, but I'd rather not have to put in
> the work for something that is done already behind the scenes.
>
> -Thad
> +ThadGuidry
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 


---
Marco Neumann
KONA
