Re: default / customized fields in KeywordLinkingEngine

seralf Thu, 10 May 2012 06:28:03 -0700

Uhm, thank you, I have a clearer idea now. I have to re-check what I'm
doing and then I'll try to follow your suggestion, as I had
misunderstood some points, sorry :-)


thanks very much for the explanation!

Alfredo Serafini

2012/5/10 Rupert Westenthaler <[email protected]>

>
> On 10.05.2012, at 14:18, seralf wrote:
>
> > thanks for 1)
> >
> > For point 2), sorry, I was not very clear.
> > In my tests I have a rather unusual use case where I am trying to provide
> > results for two different cases on the same rdfs:label field (where I
> > have to use a different tokenization approach, if that works).
> > So my idea is to create a parallel field with a different tokenization
> > approach and then copy it into the _text field. This is common practice
> > in Solr, but I am just getting started with Stanbol, so I have some
> > doubts: for example, I'm not sure whether the _text field is always the
> > field used for the matches.
> > I hope I was clearer this time, but I'm probably trying to do something
> > strange, I know :-)
> >
> Let me restate it to make sure we do not misunderstand each other:
>
> You have two types of Entities in your vocabulary that both use
> rdfs:label.
> But you would like to use two different fields so that you can use
> different Solr field configurations (e.g. Tokenizers).
>
> Copying the values of rdfs:label to another field is easily possible with
> the Entityhub indexing tool.
>
> If those two different Entities do have some distinct feature (e.g. a
> different rdf:type) you could use the
>
>    org.apache.stanbol.entityhub.indexing.core.processor.LdpathProcessor
>
> with an LDPath program like
>
>    @prefix my : <http://www.example.com/my#>;
>    my:label1 = .[rdf:type is my:type1]/rdfs:label;
>    my:label2 = .[rdf:type is my:type2]/rdfs:label;
>
> this would ensure that
>
> * labels of Entities of type my:type1 are indexed in my:label1 and
> * labels of Entities of type my:type2 are indexed in my:label2
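
A minimal Java sketch of the selection logic described in the list above, assuming a simplified entity that carries only its rdf:type and rdfs:label values. `LabelRouting`, `Entity` and `route` are hypothetical names for illustration only; this is not the LdpathProcessor's API, just what the two rules express per entity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LabelRouting {

    // Hypothetical, simplified entity: only rdf:type and rdfs:label values.
    record Entity(String id, Set<String> types, List<String> labels) {}

    // Intent of the two rules: labels of my:type1 entities go to my:label1,
    // labels of my:type2 entities go to my:label2. Other entities get neither.
    static Map<String, List<String>> route(Entity e) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        if (e.types().contains("my:type1")) {
            fields.put("my:label1", new ArrayList<>(e.labels()));
        }
        if (e.types().contains("my:type2")) {
            fields.put("my:label2", new ArrayList<>(e.labels()));
        }
        return fields;
    }

    public static void main(String[] args) {
        Entity a = new Entity("ex:a", Set.of("my:type1"), List.of("Foo Bar"));
        Entity b = new Entity("ex:b", Set.of("my:type2"), List.of("foo-bar"));
        System.out.println(route(a)); // {my:label1=[Foo Bar]}
        System.out.println(route(b)); // {my:label2=[foo-bar]}
    }
}
```

Each target field can then get its own Solr field type (and tokenizer), which is the point of splitting the labels.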
>
> The default "indexing.properties" file of the Entityhub Indexing tool also
> contains an example for how to configure the LdpathProcessor.
>
> Note also that if you keep using the FiledMapperProcessor, then the
> rdfs:label field would still contain the labels of all Entities.
>
> For extraction you would need to configure two KeywordLinkingEngines (for
> my:label1 and my:label2).
> The dereferenced Entities included by those two engine configurations
> would, however, lack the rdfs:label field. So if you would like to have the
> rdfs:label values in the Enhancement metadata, I would need to make the
> list of included properties configurable.
>
>
> Regarding the *_text* field:
>
> This is configured (by default) in a way that every text value of any
> property is copied to it. So it would contain not only the rdfs:labels, but
> also all other textual values of any outgoing relation of an entity.
> Also note that this field can NOT be used with the KeywordLinkingEngine,
> because it is only indexed and does not store the values.
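
To illustrate why `_text` cannot serve the KeywordLinkingEngine, here is a toy Java model of a document whose text values are also copied into a catch-all field that is indexed but not stored, roughly what a Solr copyField into an unstored field does. It is not Solr's or Stanbol's API: `CatchAllFieldDemo`, `Doc`, `addText`, `matches` and `storedValues` are made up for illustration, and the tokenization (lower-casing, splitting on whitespace) is a deliberate simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class CatchAllFieldDemo {

    // Toy model of one indexed document (hypothetical, not Solr's API).
    static final class Doc {
        final Map<String, List<String>> stored = new HashMap<>(); // retrievable values
        final Map<String, Set<String>> indexed = new HashMap<>(); // searchable tokens

        // Adds a value to a regular field (indexed + stored) and copies it
        // into the catch-all "_text" field, which is indexed only.
        void addText(String field, String value) {
            stored.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
            index(field, value);
            index("_text", value); // searchable, but never stored
        }

        private void index(String field, String value) {
            Set<String> tokens = indexed.computeIfAbsent(field, k -> new HashSet<>());
            for (String t : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                tokens.add(t);
            }
        }

        boolean matches(String field, String token) {
            return indexed.getOrDefault(field, Set.of())
                          .contains(token.toLowerCase(Locale.ROOT));
        }

        List<String> storedValues(String field) {
            return stored.getOrDefault(field, List.of());
        }
    }

    public static void main(String[] args) {
        Doc d = new Doc();
        d.addText("rdfs:label", "Paris");
        d.addText("rdfs:comment", "Capital of France");

        System.out.println(d.matches("_text", "france"));  // true: comment text was copied
        System.out.println(d.storedValues("_text"));       // []: nothing to read back
        System.out.println(d.storedValues("rdfs:label"));  // [Paris]
    }
}
```

A search on `_text` finds the entity (even through non-label text), but it yields no stored values, and the label text itself is what the linking engine needs.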
>
> I hope this helps.
> best
> Rupert
>
>
> >
> > 2012/5/10 Rupert Westenthaler <[email protected]>
> >
> >> Hi
> >>
> >> On Thu, May 10, 2012 at 12:00 PM, seralf <[email protected]> wrote:
> >>> Hi, I'm trying to use the KeywordLinkingEngine with a customized Solr
> >>> configuration. Basically I need to understand two different things:
> >>>
> >>>  1. What are the default fields that are indexed and then used in the
> >>>  retrieval process? I looked at DEREFERENCE_FIELDS in the source, and
> >>>  I'm not sure whether this is the right place to look.
> >>
> >> It is currently hard-coded in the "DEREFERENCE_FIELDS" constant,
> >> which defines the fields required by the Web UI of the Enhancer. It
> >> includes:
> >>
> >> * rdfs:comment
> >> * geo:lat/geo:long
> >> * foaf:depiction
> >> * dbp-ont:thumbnail
> >>
> >> Note, however, that in addition to these, the following fields are
> >> also included:
> >>
> >> * nameField (the field used as label for extraction - default: rdfs:label)
> >> * redirectField (the field used to follow redirections - default: rdfs:seeAlso)
> >> * typeField (the field used to determine the type of Entities - default: rdf:type)
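
A minimal Java sketch of how the dereferenced field set described above is composed: the hard-coded list plus the three configured fields. The list values are taken from the mail (with geo:lat/geo:long split into two entries); `DereferencedFields` and `dereferencedFields` are hypothetical names, and the actual Stanbol source may hold these as full URIs in a different type. The second call shows the consequence discussed in the newer reply above: an engine whose nameField is `my:label1` would not dereference rdfs:label.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DereferencedFields {

    // Values as listed in the mail (qnames here; the real constant may differ).
    static final List<String> DEREFERENCE_FIELDS = List.of(
        "rdfs:comment", "geo:lat", "geo:long", "foaf:depiction", "dbp-ont:thumbnail");

    // Hypothetical helper: fields copied into the enhancement metadata for a
    // linked entity, given the engine's configured name/redirect/type fields.
    static Set<String> dereferencedFields(String nameField, String redirectField,
                                          String typeField) {
        Set<String> fields = new LinkedHashSet<>(DEREFERENCE_FIELDS);
        fields.add(nameField);
        fields.add(redirectField);
        fields.add(typeField);
        return fields;
    }

    public static void main(String[] args) {
        // Default configuration as described in the mail.
        Set<String> defaults = dereferencedFields("rdfs:label", "rdfs:seeAlso", "rdf:type");
        System.out.println(defaults.contains("rdfs:label")); // true

        // Engine configured with nameField = my:label1.
        Set<String> custom = dereferencedFields("my:label1", "rdfs:seeAlso", "rdf:type");
        System.out.println(custom.contains("rdfs:label"));   // false
    }
}
```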
> >>
> >> If you want this to be configurable, I can easily add this feature. I'm
> >> not sure why I did not enable that in the first place.
> >>
> >>>  2. Assuming I know for sure which field is used as the basis for the
> >>>  textual enhancement, I could simply copy the values of other fields
> >>>  into it in the config. So I wonder whether I could define new fields
> >>>  and then consume them in the process.
> >>>
> >>
> >> Sorry, I do not understand what you mean by that.
> >>
> >>> Thanks in advance if someone could give me some suggestions.
> >>>
> >>> Alfredo Serafini
> >>
> >> best
> >> Rupert
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             [email protected]
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
>
>
