Re: Indexing a token to a different field in a custom filter

Erick Erickson Tue, 12 Nov 2013 04:44:05 -0800

Whether what Alvaro outlined works for you or
not, do NOT commit after every document if you
use SolrJ. The commit will hurt performance much
more than the HTTP overhead.


And you can always batch up, say, 1,000 documents
and use the server.add(doclist) method.
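
A minimal sketch of that batching pattern, in case it helps. To keep it
runnable without SolrJ on the classpath, DocSink is a hypothetical stand-in
(not a SolrJ type): its add() is where server.add(doclist) would go and its
commit() is where a single server.commit() would go. Plain strings stand in
for the documents, and the batch size of 1,000 is just the figure mentioned
above.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingIndexer {

    /** Hypothetical stand-in for the SolrJ server; NOT part of SolrJ. */
    interface DocSink {
        void add(List<String> docs); // would wrap server.add(doclist)
        void commit();               // would wrap server.commit()
    }

    /**
     * Sends docs to the sink in batches of batchSize and commits once
     * at the very end. Returns the number of batches sent.
     */
    static int indexAll(Iterable<String> docs, int batchSize, DocSink sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batch = new ArrayList<>(batchSize);
        int batchesSent = 0;
        for (String doc : docs) {
            batch.add(doc);
            if (batch.size() == batchSize) {
                // One request per full batch, no commit here.
                sink.add(new ArrayList<>(batch));
                batch.clear();
                batchesSent++;
            }
        }
        if (!batch.isEmpty()) {
            // Flush the final, partially filled batch.
            sink.add(new ArrayList<>(batch));
            batchesSent++;
        }
        // A single commit for the whole run, not one per document.
        sink.commit();
        return batchesSent;
    }
}
```

With 2,500 documents and a batch size of 1,000 this makes three add calls
and one commit, instead of 2,500 adds and 2,500 commits.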

Overall, worrying about HTTP overhead is usually a
red herring.

Best,
Erick


On Tue, Nov 12, 2013 at 3:20 AM, Alvaro Cabrerizo <topor...@gmail.com> wrote:

> Hi,
>
> Maybe the synonym filter
> <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory>
> is a model you can follow. You can start by creating a new field type in
> your schema that is Stanbol-enhanced. To continue the parallel, in the
> case of synonyms we could have this schema:
>
> ...
> <fieldType name="synonymtext" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
>   </analyzer>
> </fieldType>
> ...
> <field name="id" type="string" indexed="true" stored="true" required="true" />
> <field name="description" type="synonymtext" indexed="true" stored="true" multiValued="true" />
> ...
>
> In the case of Stanbol:
>
> ...
> <fieldType name="stanboltext" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="StanbolFilterFactory" your Stanbol filter parameters here />
>   </analyzer>
> </fieldType>
> ...
> <field name="id" type="string" indexed="true" stored="true" required="true" />
> <field name="description" type="stanboltext" indexed="true" stored="true" multiValued="true" />
> ...
>
> Thus the StanbolFilterFactory is in charge of connecting to Stanbol and
> enhancing the data coming from WhitespaceTokenizerFactory, producing output
> that can be used by other filters.
>
> How do you index your data, then?
>
> Just send your doc:
>
> id:your id
> description:the data to be enhanced
>
>
> Another path you can follow is to imitate the behaviour of CopyField
> <http://wiki.apache.org/solr/SchemaXml#Copy_Fields> in a more
> sophisticated fashion, i.e. copy, enhance and put the result in a new field.
> Then you can have the following schema:
>
> ...
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>   </analyzer>
> </fieldType>
> ...
> <field name="id" type="string" indexed="true" stored="true" required="true" />
> <field name="description" type="text" indexed="true" stored="true" multiValued="true" />
> <field name="enhancedDescription" type="text" indexed="true" stored="true" multiValued="true" />
> <copyEnhanceField source="description" dest="enhancedDescription" />
>
> The copyEnhanceField is now in charge of taking the original field, sending
> it to Stanbol, getting the response and writing it to the new field.
>
> How do you index your data then?
>
> Just send your doc:
>
> id:your id
> description:the original data
>
> And you will get in Solr:
>
> id:your id
> description:the original data
> enhancedDescription:the enhanced data
>
>
> Regards
>
