Whether what Alvaro outlined works for you or not, do NOT commit after every document if you use SolrJ. The commit will hurt performance much more than the HTTP overhead.
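To make that concrete, here is a minimal sketch of the pattern: add documents in batches and commit once at the end. It deliberately does not depend on SolrJ, so treat the names as assumptions. `DocSink` is a hypothetical stand-in for the SolrJ server object (its `add` and `commit` mirror `server.add(doclist)` and `server.commit()`), and `CountingSink` only counts calls so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two SolrJ calls discussed here: add(doclist) and commit(). */
interface DocSink<D> {
    void add(List<D> batch);
    void commit();
}

final class BatchIndexingSketch {

    /** Sends docs in batches of batchSize, commits exactly once, and returns the number of batches sent. */
    static <D> int indexAll(Iterable<D> docs, DocSink<D> sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<D> buffer = new ArrayList<>(batchSize);
        int batches = 0;
        for (D doc : docs) {
            buffer.add(doc);
            if (buffer.size() == batchSize) {
                sink.add(new ArrayList<>(buffer)); // hand over a copy so the buffer can be reused
                buffer.clear();
                batches++;
            }
        }
        if (!buffer.isEmpty()) {                   // flush the final, partial batch
            sink.add(new ArrayList<>(buffer));
            batches++;
        }
        sink.commit();                             // one commit for the whole run, not one per document
        return batches;
    }

    /** In-memory sink that only counts calls; a real sink would delegate to SolrJ. */
    static final class CountingSink<D> implements DocSink<D> {
        int addCalls;
        int commitCalls;
        int docsSeen;

        @Override public void add(List<D> batch) { addCalls++; docsSeen += batch.size(); }
        @Override public void commit() { commitCalls++; }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc-" + i);
        }
        CountingSink<String> sink = new CountingSink<>();
        int batches = indexAll(docs, sink, 1000);
        System.out.println(batches + " add calls, " + sink.commitCalls + " commit");
    }
}
```

With 2,500 documents and a batch size of 1,000 this makes three add calls and a single commit, which is the ratio you want to keep when the sink is a real Solr server.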
And you can always batch up, say, 1,000 documents and use the
server.add(doclist) method. Overall, worrying about HTTP overhead is
usually a red herring.

Best,
Erick

On Tue, Nov 12, 2013 at 3:20 AM, Alvaro Cabrerizo <topor...@gmail.com> wrote:

> Hi,
>
> Maybe the synonym filter
> <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory>
> is a model you can look at. You can start by creating a new field type in
> your schema that is Stanbol-enhanced. To follow the parallel: in the case
> of synonyms we could have this schema:
>
> ...
> <fieldType name="synonymtext" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true" />
>   </analyzer>
> </fieldType>
> ...
> <field name="id" type="string" indexed="true" stored="true" required="true" />
> <field name="description" type="synonymtext" indexed="true" stored="true"
>        multiValued="true" />
> ...
>
> In the case of Stanbol:
>
> ...
> <fieldType name="stanboltext" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <!-- add your Stanbol filter parameters as attributes here -->
>     <filter class="StanbolFilterFactory" />
>   </analyzer>
> </fieldType>
> ...
> <field name="id" type="string" indexed="true" stored="true" required="true" />
> <field name="description" type="stanboltext" indexed="true" stored="true"
>        multiValued="true" />
> ...
>
> Thus the StanbolFilterFactory is in charge of connecting to Stanbol and
> enhancing the data coming from WhitespaceTokenizerFactory, creating an
> output that can be used by other filters.
>
> How do you index your data, then?
>
> Just send your doc:
>
> id:your id
> description:the data to be enhanced
>
>
> Another path you can follow is to imitate the behaviour of CopyField
> <http://wiki.apache.org/solr/SchemaXml#Copy_Fields> in a more
> sophisticated fashion, i.e. copy, enhance and put in a new field.
> Then you can have the following schema:
>
> ...
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>   </analyzer>
> </fieldType>
> ...
> <field name="id" type="string" indexed="true" stored="true" required="true" />
> <field name="description" type="text" indexed="true" stored="true"
>        multiValued="true" />
> <field name="enhancedDescription" type="text" indexed="true" stored="true"
>        multiValued="true" />
> <copyEnhanceField source="description" dest="enhancedDescription" />
>
> The copyEnhanceField is now in charge of taking the original field,
> sending it to Stanbol, getting the response and writing it to the new
> field.
>
> How do you index your data, then?
>
> Just send your doc:
>
> id:your id
> description:the original data
>
> And you will get in Solr:
>
> id:your id
> description:the original data
> enhancedDescription:the enhanced data
>
>
> Regards
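
For what it's worth, the "copy, enhance and put in a new field" idea from the quoted message can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `CopyEnhanceSketch` and `copyEnhance` are hypothetical names, the document is modelled as a simple map with single-valued fields (the schema above uses multiValued ones), and the enhancer is a placeholder function rather than a real Stanbol call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

final class CopyEnhanceSketch {

    /**
     * Returns a copy of doc in which dest holds enhancer(value of source).
     * The input map is left untouched; if source is absent, nothing is added.
     */
    static Map<String, String> copyEnhance(Map<String, String> doc,
                                           String source,
                                           String dest,
                                           UnaryOperator<String> enhancer) {
        Map<String, String> out = new LinkedHashMap<>(doc);
        String original = doc.get(source);
        if (original != null) {
            out.put(dest, enhancer.apply(original));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "your id");
        doc.put("description", "the original data");

        // Placeholder enhancer (assumption): a real one would send the text to Stanbol.
        UnaryOperator<String> fakeEnhancer = text -> "[enhanced] " + text;

        System.out.println(copyEnhance(doc, "description", "enhancedDescription", fakeEnhancer));
    }
}
```

The original field is kept as sent and only the destination field carries the enhanced text, which matches the id / description / enhancedDescription result described above.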