I'd go for this option as well. The example update processor couldn't be easier to adapt, and it's a very flexible approach. Judging from the patch in SOLR-2105, it should still work with the current 3.2 branch.
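For reference, the core of such a FieldSplitterProcessor is just a configurable split-and-map step. Below is a minimal, Solr-free sketch of that logic (the class name, delimiter, and field names are invented for illustration; in a real processor they would come from the chain's init args in solrconfig.xml, and the resulting map would be written onto the SolrInputDocument inside processAdd() before calling the next processor in the chain):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the per-document splitting logic a FieldSplitterProcessor
// would apply. Names and delimiter here are examples only.
public class FieldSplitter {

    // Split 'value' on 'regex' and map the pieces onto 'targetFields'
    // in order; extra pieces beyond the target list are dropped.
    public static Map<String, String> split(String value, String regex,
                                            String[] targetFields) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        String[] parts = value.split(regex, targetFields.length);
        for (int i = 0; i < parts.length && i < targetFields.length; i++) {
            result.put(targetFields[i], parts[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // e.g. the 'custom1' field holding "customerId|customerName"
        Map<String, String> fields = split("42|Acme Corp", "\\|",
                new String[] {"customer_id", "customer_name"});
        System.out.println(fields); // {customer_id=42, customer_name=Acme Corp}
    }
}
```

In the real processor this would run once per document: read the configured source field from the SolrInputDocument, call something like the method above, and addField() each entry of the result before delegating to super.processAdd(cmd). The factory would then be registered in an updateRequestProcessorChain in solrconfig.xml as the wiki pages below describe.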
https://issues.apache.org/jira/browse/SOLR-2105

> Hi,
>
> Write a custom UpdateProcessor, which gives you full control of the
> SolrDocument prior to indexing. The best would be if you write a generic
> FieldSplitterProcessor which is configurable on what field to take as
> input, what delimiter or regex to split on, and finally what fields to
> write the result to. This way others may re-use your code for their
> splitting needs.
>
> See http://wiki.apache.org/solr/UpdateRequestProcessor and
> http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 27. mai 2011, at 15.47, Joe Fitzgerald wrote:
> > Hello,
> >
> > I am in an odd position. The application server I use has built-in
> > integration with Solr. Unfortunately, its native capabilities are
> > fairly limited; specifically, it only supports a standard, pre-defined
> > set of fields which can be indexed. As a result, it has left me
> > kludging how I work with Solr and doing things like putting what I'd
> > like to be multiple, separate fields into a single Solr field.
> >
> > As an example, I may put a customer id and name into a single field
> > called 'custom1'. Ideally, I'd like this information to be returned in
> > separate fields, and even better would be for them to be indexed as
> > separate fields, but I can live without the latter. Currently, I'm
> > building out a JSON representation of this information, which makes it
> > easy for me to deal with when I extract the results... but it all feels
> > wrong.
> >
> > I do have complete control over the actual Solr installation (just not
> > the indexing call to Solr), so I was hoping there may be a way to
> > configure Solr to take my single field and split it up into a different
> > field for each key in my JSON representation.
> > I don't see anything native to Solr that would do this for me, but
> > there are a few features that sounded similar, and I was hoping to get
> > some opinions on how I may be able to move forward with this...
> >
> > Poly fields, such as the spatial location type, might help? Can I
> > build my own poly field that would split up the main field into
> > subfields? Do poly fields let me return the subfields? I don't quite
> > have my head around poly fields yet.
> >
> > Another option, although I suspect this won't be considered a good
> > approach: what about extending the copyField functionality of
> > schema.xml to support my needs? It would seem not entirely
> > unreasonable for copyField to provide a means to extract only a
> > portion of the contents of the source field to place in the
> > destination field, no? I'm sure people more familiar with Solr's
> > architecture could explain why this isn't really an appropriate thing
> > for Solr to handle (just because it could doesn't mean it should)...
> >
> > The other - and probably best - option would be to leverage Solr
> > directly, bypassing the native integration of my application server,
> > which we've already done for most cases. I'd love to go this route,
> > but I'm having a hard time figuring out how to "easily" accomplish the
> > same functionality provided by my app server integration... perhaps
> > someone on the list could help me with this path forward? Here is what
> > I'm trying to accomplish:
> >
> > I'm indexing documents (text, PDF, HTML...), but I need to include
> > fields in the results of my searches which are only available from a
> > db query. I know how to have Solr index results from a db query, but
> > I'm having trouble getting it to index the documents that are
> > associated with each record of that query (the full path/filename is
> > one of the fields of that query).
> > I started to try to use the DataImportHandler to do this, by setting
> > up a FileDataSource in addition to my JDBC data source. I tried to
> > leverage the FileDataSource to populate a sub-entity based on the db
> > field that contains the full path/filename, but I wasn't sure how to
> > reference that db field from the root query/entity. Before I spent too
> > much time, I also realized I wasn't sure how to get Solr to deal with
> > binary file types this way either, which upon further reading seemed
> > like I would need to leverage Tika - can that be done within the
> > confines of DataImportHandler?
> >
> > Any advice is greatly appreciated. Thanks in advance,
> >
> > Joe
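On the DataImportHandler question above: since Solr 3.1, DIH ships a TikaEntityProcessor that can parse binary files inside a sub-entity, and a sub-entity can reference a column of its parent entity with the ${entityName.COLUMN} syntax. A sketch of a data-config.xml along those lines (the driver, JDBC URL, table, and column names are placeholders for illustration; use a BinFileDataSource so Tika receives a binary stream rather than a string):

```xml
<dataConfig>
  <!-- JDBC source for the metadata rows -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/docs"/>
  <!-- Binary source for the files themselves -->
  <dataSource name="bin" type="BinFileDataSource"/>

  <document>
    <entity name="rec" dataSource="db"
            query="SELECT id, customer_id, filepath FROM documents">
      <field column="id" name="id"/>
      <field column="customer_id" name="customer_id"/>
      <!-- Sub-entity: Tika parses the file named by the parent row -->
      <entity name="doc" dataSource="bin" processor="TikaEntityProcessor"
              url="${rec.filepath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The key piece is url="${rec.filepath}", which pulls the path column from the parent db entity for each row, so the db fields and the extracted file text end up in the same Solr document.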