Re: Solr Endpoint

Rupert Westenthaler Mon, 26 Sep 2011 10:40:25 -0700

Hi

On Mon, Sep 26, 2011 at 3:16 PM, Olivier Grisel
<[email protected]> wrote:
> 2011/9/26 João Pedro Oliveira <[email protected]>:
>> Good Afternoon.
>>
>> Is there any way to use the Solr Service through Apache Stanbol? I need to
>> make a faceting search over my entities stored in the entity hub. I'm
>> currently using the Query and Find endpoints from Stanbol but what I wanted
>> was to make a more simple search, just dividing my files by categories to
>> get the total number of indexed files in each one.


Currently the only possibility is to configure the Entityhub to use an
external SolrServer.

You can use a normal SolrServer (version 3.3+). However, you need to
configure it with a core that is compatible with the configuration
expected by the SolrYard.
If you want to start from scratch you can find the default configuration at [1].
If you want to reuse the current data you can find the currently used
index under

    {stanbol-root}/sling/entityhub/solrYard/indexes/{index}

Just copy the {index} over to the external SolrServer.
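
As a rough sketch, the copy step could be scripted as below. The function
name and the "solr_home" parameter are hypothetical; only the source path
layout is taken from the description above. How the external SolrServer
picks up the copied core depends on its own setup and is not covered here.

```python
import shutil
from pathlib import Path


def copy_index(stanbol_root, index_name, solr_home):
    """Copy a SolrYard index directory to an external Solr home directory.

    stanbol_root and index_name stand for {stanbol-root} and {index};
    solr_home is an assumed target directory of the external SolrServer.
    """
    source = (Path(stanbol_root) / "sling" / "entityhub"
              / "solrYard" / "indexes" / index_name)
    target = Path(solr_home) / index_name
    # copytree raises an error if the target already exists, so an
    # existing core is never overwritten by accident.
    shutil.copytree(source, target)
    return target
```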

To configure the Entityhub to use an external SolrServer:

* go to the configuration tab of the Apache Felix Webconsole
(http://localhost:8080/system/console/configMgr)
* search for the "Apache Stanbol Entityhub Yard: Solr Yard Configuration"
* open the configuration of the correct SolrYard
* change the value of the "Solr Index/Core" to the external "http://.." URL.

Before writing queries you need to know how the SolrYard encodes RDF
properties in field names:

In general:

* All triples with the same subject are added to the same Solr
document with the "uri":"{subject}"
* RDF properties are encoded as "{prefix}/{ns-prefix}:{local-name}/"
where the {prefix} represents the datatype/language of the value
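
To illustrate the pattern, here is a minimal sketch that builds such a
field name. The helper name is hypothetical, and the namespace prefixes
used in the example ("rdfs", "rdf") are assumptions; the prefixes actually
used by an index need to be looked up in the index itself.

```python
def solr_field(prefix, ns_prefix, local_name):
    """Build a field name following "{prefix}/{ns-prefix}:{local-name}/"."""
    return f"{prefix}/{ns_prefix}:{local_name}/"


# English labels, assuming "rdfs" is the namespace prefix in the index
label_en = solr_field("@en", "rdfs", "label")   # "@en/rdfs:label/"
# URI values of rdf:type, assuming "rdf" is the namespace prefix
type_ref = solr_field("ref", "rdf", "type")     # "ref/rdf:type/"
```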

(1) namespace prefix mappings

All {ns-prefix} used within the index are stored in a special document
within the index.
This document has the id ("uri" is the field used for ids)

    "urn:eu.iksproject:rick.yard.solr:config.namespacePrefixConfig"

all fields within this document start with "_config/"
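
A minimal sketch of how one could build a request for this document. It
assumes the core exposes the standard Solr "/select" handler and the JSON
response writer; the function name and the core URL in the usage line are
hypothetical.

```python
from urllib.parse import urlencode

CONFIG_DOC_ID = "urn:eu.iksproject:rick.yard.solr:config.namespacePrefixConfig"


def config_query_url(core_url):
    """Return a select URL that asks for the namespace prefix document."""
    params = {
        # the id is quoted because it contains ":" characters
        "q": f'uri:"{CONFIG_DOC_ID}"',
        "wt": "json",
    }
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"
```

For example, config_query_url("http://localhost:8983/solr/myindex") gives a
URL that can be opened in a browser or fetched with any HTTP client.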

(2) field prefixes

The schema.xml gives a good overview of the defined prefixes. This
file can be found under "{index}/conf/schema.xml"

Short overview:

* "@{lang}" for languages
* "_!@" contains all text AND string values
* "bool", "int", "lon", "flo", "dou", "cal", "dur" for primitive datatypes
* "ref" for references (URI values)
* "str" for string values of the datatype xsd:string

special fields:

* "uri" document id field
* "_domain" is used by the SolrYard in cases where more than one
SolrYard instance uses the same SolrServer/Core
* "_text" stores all text AND string values of ALL fields (it is also
the default search field)
* "_ref" stores all URI values of ALL fields (can be used for semantic
context searches)
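
For the faceting use case from the original question, a request against
the external SolrServer could look roughly like the sketch below. It uses
only standard Solr parameters (q, rows, facet, facet.field,
facet.mincount, wt). The function name, the core URL and the facet field
"ref/rdf:type/" are assumptions; the real field name depends on the
property that holds the categories and on the namespace prefixes stored
in the index.

```python
from urllib.parse import urlencode


def facet_query_url(core_url, facet_field, query="*:*"):
    """Return a select URL that counts documents per value of facet_field."""
    params = {
        "q": query,
        "rows": 0,              # only the facet counts are of interest
        "facet": "true",
        "facet.field": facet_field,
        "facet.mincount": 1,    # skip values without any document
        "wt": "json",
    }
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


# hypothetical core URL and facet field
url = facet_query_url("http://localhost:8983/solr/myindex", "ref/rdf:type/")
```

If several SolrYards share the same core, an additional filter on the
"_domain" field would presumably be needed as well.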

I recommend opening an index of a SolrYard in Luke [2] and having a
look for yourself at how the data is stored.

@João: I know this is a little bit complex ... if you have any
additional questions feel free to ask. You can also join the #stanbol
channel on IRC and ask me directly.

>
> I also think we should make it possible to enable the raw Solr servlet
> from the SolrYard configuration. I wonder if this would be complicated
> to implement though.
>
> Rupert, any thought?
>

@Olivier

In my opinion it is a borderline use case, because the way the
SolrYard encodes fields would make custom queries very complex.
However, if more users request this feature we definitely need to have
a look.

In case of an external SolrServer we could simply forward requests. In
case of an EmbeddedSolrServer I do have access to the SolrCore.
Hopefully one can initialize the Solr servlet based on that.

best
Rupert


[1] 
http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/default.solrindex.zip
[2] http://code.google.com/p/luke/

-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
