Re: default / customized fields in KeywordLinkingEngine

Rupert Westenthaler Thu, 10 May 2012 06:19:25 -0700

On 10.05.2012, at 14:18, seralf wrote:

> thanks for 1)
> 
> for the 2) point i was not very clear sorry.
> I have on my test a particular weird use case where i am trying to provide
> results for almost two different cases on the same rdfs:label field, (where
> i have to use different tokenization approach, if that work)
> So my idea is to try to create a parallel field with a different
> tokenization approach and then copy it on the _text field. This is most
> common on solr, but i am at the beginning with stanbol, so i have some
> doubt: for example i'm not sure if the _text field is the field always used
> for the matches or not.
> I hope i was more clear this time, but i'm probably trying to do something
> which is strange, i know :-)
> 
Let me restate your use case to make sure we do not misunderstand each other:

You have two types of Entities in your vocabulary that both use rdfs:label,
but you would like to index their labels in two different fields so that you
can use different Solr field configurations (e.g. Tokenizers).

Copying values of rdfs:label to another field is easily possible with the 
Entityhub indexing tool.

If those two different Entities do have some distinct feature (e.g. a different 
rdf:type) you could use the

    org.apache.stanbol.entityhub.indexing.core.processor.LdpathProcessor

with an LDPath program like

    @prefix my : <http://www.example.com/my#>;
    my:label1 = .[rdf:type is my:type1]/rdfs:label;
    my:label2 = .[rdf:type is my:type2]/rdfs:label;

this would ensure that 

* labels of Entities of type my:type1 are indexed in my:label1 and 
* labels of Entities of type my:type2 are indexed in my:label2
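
To illustrate the intended effect of such a program, here is a minimal Python
sketch. It is not Stanbol or LDPath code: plain dicts stand in for the indexed
Entities, and the TYPE_TO_FIELD mapping and the route_labels helper are
hypothetical names made up for this example. It assumes the original
rdfs:label values are kept alongside the new fields.

```python
# Toy model of the routing described above: copy rdfs:label values into a
# field chosen by the entity's rdf:type. Dicts stand in for indexed entities.
RDF_TYPE = "rdf:type"
RDFS_LABEL = "rdfs:label"

# Hypothetical mapping from rdf:type value to the target label field.
TYPE_TO_FIELD = {
    "my:type1": "my:label1",
    "my:type2": "my:label2",
}


def route_labels(entity):
    """Return a copy of the entity with labels copied to per-type fields."""
    result = {key: list(values) for key, values in entity.items()}
    for rdf_type in entity.get(RDF_TYPE, []):
        field = TYPE_TO_FIELD.get(rdf_type)
        if field is not None:
            # Append the labels to the type-specific field.
            result.setdefault(field, []).extend(entity.get(RDFS_LABEL, []))
    return result


city = {RDF_TYPE: ["my:type1"], RDFS_LABEL: ["Salzburg"]}
person = {RDF_TYPE: ["my:type2"], RDFS_LABEL: ["Mozart"]}

indexed_city = route_labels(city)
indexed_person = route_labels(person)
```

Each of the two resulting fields can then get its own Solr field
configuration, because it only ever holds labels of one type of Entity.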

The default "indexing.properties" file of the Entityhub Indexing tool also 
contains an example of how to configure the LdpathProcessor.

Note also that if you keep using the FiledMapperProcessor, then rdfs:label 
would still contain the labels of all Entities.

For extraction you would need to configure two KeywordLinkingEngines (one for 
my:label1 and one for my:label2).
The dereferenced Entities included by those two engine configurations would, 
however, lack the rdfs:label field. So if you would like to have the rdfs:label 
values in the Enhancement metadata, I would need to implement the possibility 
to configure the list of included properties.

Regarding the *_text* field:

This is configured (by default) in a way that any text value of a property is 
copied to it. So it would not only contain the rdfs:labels, but also all other 
textual values of any outgoing relation of an Entity.
Also note that this field can NOT be used with the KeywordLinkingEngine, 
because it is only indexed, but does not store the values.
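
The difference between an indexed-only and a stored field can be sketched
with a toy model in Python. This is not Solr code: ToyField and its methods
are hypothetical names for this example, and whitespace tokenization is an
assumption. The point is only that an indexed-only field can answer "which
Entities match?", but cannot give back the original values.

```python
class ToyField:
    """Toy field that can be indexed (searchable) and/or stored (retrievable)."""

    def __init__(self, indexed, stored):
        self.indexed = indexed
        self.stored = stored
        self.postings = {}  # token -> set of entity ids
        self.values = {}    # entity id -> list of original values

    def add(self, entity_id, value):
        if self.indexed:
            # Assumed tokenization: lower-case and split on whitespace.
            for token in value.lower().split():
                self.postings.setdefault(token, set()).add(entity_id)
        if self.stored:
            self.values.setdefault(entity_id, []).append(value)

    def search(self, token):
        """Return the ids of entities whose value contains the token."""
        return sorted(self.postings.get(token.lower(), set()))

    def stored_values(self, entity_id):
        """Return the original values, empty if the field is not stored."""
        return list(self.values.get(entity_id, []))


text_field = ToyField(indexed=True, stored=False)   # like _text
label_field = ToyField(indexed=True, stored=True)   # like rdfs:label

for field in (text_field, label_field):
    field.add("urn:entity:1", "Bischofshofen Austria")

matches = text_field.search("austria")
text_values = text_field.stored_values("urn:entity:1")
label_values = label_field.stored_values("urn:entity:1")
```

Here both fields find the Entity, but only the stored one can return the
label text afterwards.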

I hope this helps.
best
Rupert

[1] https://issues.apache.org/jira/browse/STANBOL-596

> 
> 2012/5/10 Rupert Westenthaler <[email protected]>
> 
>> Hi
>> 
>> On Thu, May 10, 2012 at 12:00 PM, seralf <[email protected]> wrote:
>>> Hi i'm trying to use the keyword linking engine with a customized solr
>>> configuration. Basically i need to understand two different things:
>>> 
>>>  1. what are the default fields indexed and then used in the retrieval
>>>  process? i look at the DEREFERENCE_FIELDS in the source, and i'm not sure
>>>  if this is or not the place to look at.
>> 
>> Currently it is hard coded in the "DEREFERENCE_FIELDS" constant
>> defining fields required by the Web UI of the enhancer. Currently it
>> includes:
>> 
>> * rdfs:comment
>> * geo:lat/geo:long
>> * foaf:depiction
>> * dbp-ont:thumbnail
>> 
>> However note that in addition to this also the
>> 
>> * nameField (the field configured to be used as label for extraction -
>> default: rdfs:label)
>> * redirectField (the field used to follow redirections - default:
>> rdfs:seeAlso)
>> * typeField (the field used to determine the type of Entities -
>> default: rdf:type)
>> 
>> are included.
>> 
>> If you want this to be configurable I can easily add this feature. Not
>> sure why I have not enabled that in the beginning.
>> 
>>>  2. starting from the fact that if i am sure about the field that is used
>>>  as a base to have a textual enhancement i could simple copy in that the
>>>  results from other fields in the config, i wonder if i could define new
>>>  fields and then consuming them into the process
>>> 
>> 
>> Sorry, I do not understand what you mean with that.
>> 
>>> thanks in advance if someone could give me some suggestion
>>> 
>>> Alfredo Serafini
>> 
>> best
>> Rupert
>> 
>> 
>> 
>> --
>> | Rupert Westenthaler             [email protected]
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
