: IMHO this could be something to add for future versions of solr. The
: Terrier IR-framework for example and other IR solutions allow to specify
: different XML-elements, which should be indexed in only one (lucene) field.

I don't know anything about Terrier, but there are lots of simple ways to
achieve things like this with Solr, depending on what exactly you want. Two
off the top of my head...

1) Use an XSLT on the client to extract only the fields you want from your
   XML file and build up the text fields you send to Solr (we have the
   framework in place for you to even do that XSLT server side).

2) Send each element that you care about as a separate field -- you could
   use XPath-like descriptors for the names, i.e. ...

     <field name="//root/AAA/BBB/CCC">body of tag CCC</field>

   ...and then use copyField with a wildcard in the source to consolidate
   all tags into a single text field...

     <dynamicField name="//*" type="text" indexed="true" stored="true" />
     <copyField source="//*" dest="text" />

-Hoss
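
For option 2, the client-side step of turning an XML file into one field per element could look roughly like the sketch below. It is only an illustration under a few assumptions: the function name to_solr_add and the optional "wanted" filter are made up for this example, the input uses no XML namespaces, and a real document would typically also need whatever uniqueKey field your schema defines.

```python
import xml.etree.ElementTree as ET


def to_solr_add(xml_text, wanted=None):
    """Build a Solr <add> message with one <field> per element that has text.

    Field names are XPath-like paths such as //root/AAA/BBB/CCC.
    `wanted` is a hypothetical optional set of paths to keep; None keeps all.
    """
    root = ET.fromstring(xml_text)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def walk(elem, path):
        # Extend the path with this element's tag, e.g. "/" -> "//root".
        here = path + "/" + elem.tag
        text = (elem.text or "").strip()
        # Emit a field only for elements that carry non-whitespace text.
        if text and (wanted is None or here in wanted):
            field = ET.SubElement(doc, "field", name=here)
            field.text = text
        for child in elem:
            walk(child, here)

    walk(root, "/")
    return ET.tostring(add, encoding="unicode")


sample = "<root><AAA><BBB><CCC>body of tag CCC</CCC></BBB></AAA></root>"
print(to_solr_add(sample))
```

The resulting message can then be posted to Solr's update handler, and the dynamicField/copyField pair above takes care of collecting every "//..." field into the single text field.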