: The design issue for this is to be clear about the schema and how
: documents are mapped into the schema. If all document types are
: mapped into the same schema, then one type of query will work
: for all. If the documents have different schemas (in the search
: index), then the query needs an expansion specific to each
: document type.

Right, the only way to provide a general-purpose solution is to make sure any out-of-the-box "UpdateParsers" (using the interface names from my previous email) can be configured in the solrconfig.xml to map the native concepts in the document format to user-defined schema fields. (People writing their own custom UpdateParsers could always hardcode their schema fields.)

I don't know anything about PDF structure, but using your RFC-2822 email as an example, the configuration for an Rfc2822UpdateParser would need to be able to specify which headers map to which fields, and what to do with body text -- in theory, it could also be configured with references to other UpdateParser instances for dealing with multi-part MIME messages.

(One other good out-of-the-box UpdateParser that I forgot to mention before would be an XSLTUpdateParser that could take in XML in any format the user wanted to send, along with the URL of an XSLT to apply to convert it to the Solr standard <add><doc> format.)

-Hoss
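
To make the header mapping idea concrete, here is a rough sketch in plain Java of what the configurable part might look like. Nothing in it is an existing Solr class: the HeaderFieldMapper name, its constructor arguments, and the choice to silently drop unmapped headers are all assumptions made for illustration, and the map passed to the constructor stands in for whatever would actually be read from solrconfig.xml. It only handles simple headers and folded continuation lines, not multi-part MIME.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not a Solr class: maps RFC-2822 headers and body
// text onto user-defined schema fields according to a configured mapping.
public class HeaderFieldMapper {
    private final Map<String, String> headerToField = new LinkedHashMap<>();
    private final String bodyField; // schema field for the body, or null to skip it

    public HeaderFieldMapper(Map<String, String> mapping, String bodyField) {
        // Header names are case-insensitive, so store them lower-cased.
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            headerToField.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        this.bodyField = bodyField;
    }

    // Returns schema field name -> values for one message.
    public Map<String, List<String>> map(String message) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        String[] lines = message.split("\r?\n", -1);
        String name = null;
        StringBuilder value = new StringBuilder();
        int i = 0;
        // Headers run until the first empty line.
        for (; i < lines.length && !lines[i].isEmpty(); i++) {
            String line = lines[i];
            boolean folded = line.startsWith(" ") || line.startsWith("\t");
            if (folded && name != null) {
                value.append(' ').append(line.trim()); // continuation of previous header
                continue;
            }
            addHeader(doc, name, value.toString());
            value.setLength(0);
            int colon = line.indexOf(':');
            if (colon < 0) { // not a valid header line, ignore it
                name = null;
                continue;
            }
            name = line.substring(0, colon).trim();
            value.append(line.substring(colon + 1).trim());
        }
        addHeader(doc, name, value.toString());

        // Everything after the blank separator line is the body.
        int start = Math.min(i + 1, lines.length);
        String body = String.join("\n", Arrays.copyOfRange(lines, start, lines.length));
        if (bodyField != null && !body.isEmpty()) {
            doc.computeIfAbsent(bodyField, k -> new ArrayList<>()).add(body);
        }
        return doc;
    }

    private void addHeader(Map<String, List<String>> doc, String name, String value) {
        if (name == null) return;
        String field = headerToField.get(name.toLowerCase(Locale.ROOT));
        if (field == null) return; // assumption: unmapped headers are dropped
        doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        HeaderFieldMapper mapper = new HeaderFieldMapper(
                Map.of("Subject", "title", "From", "author"), "text");
        String msg = "From: hoss@example.com\r\nSubject: update parsers\r\n\r\nBody text here.";
        System.out.println(mapper.map(msg));
    }
}
```

The point of the sketch is only that the parser itself never names a schema field: every field name comes in through configuration, so the same parser works against any schema.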