Re: Handling disparate data sources in Solr

Chris Hostetter Mon, 08 Jan 2007 11:26:57 -0800

: The design issue for this is to be clear about the schema and how
: documents are mapped into the schema. If all document types are
: mapped into the same schema, then one type of query will work
: for all. If the documents have different schemas (in the search
: index), then the query needs an expansion specific to each
: document type.


Right, the only way to provide a general purpose solution is to make sure
any out of the box "UpdateParsers" (using the interface names from my
previous email) can be configured in the solrconfig.xml to map the native
concepts in the document format to user defined schema fields.

(people writing their own custom UpdateParsers could always hardcode
their schema fields)

I don't know anything about PDF structure, but using your RFC-2822 email
as an example, the configuration for an Rfc2822UpdateParser would need to
be able to specify which headers map to which fields, and what to do with
body text -- in theory, it could also be configured with references to
other UpdateParser instances for dealing with multi-part mime messages.
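
To make the header-to-field idea concrete, here is a rough sketch in
Python rather than as a real Solr UpdateParser. Everything in it is an
assumption for illustration: the mapping dict stands in for whatever
solrconfig.xml would declare, the schema field names (title, author, id,
text) are made up, and multi-part messages are simply skipped.

```python
from email import message_from_string

# Hypothetical mapping, standing in for configuration in solrconfig.xml:
# RFC-2822 header name -> user-defined schema field name.
HEADER_TO_FIELD = {
    "Subject": "title",
    "From": "author",
    "Message-ID": "id",
}
BODY_FIELD = "text"  # hypothetical schema field for the body text


def parse_rfc2822(raw, header_to_field=HEADER_TO_FIELD, body_field=BODY_FIELD):
    """Map one RFC-2822 message to a dict of schema field -> value."""
    msg = message_from_string(raw)
    doc = {}
    for header, field in header_to_field.items():
        value = msg.get(header)
        if value is not None:
            doc[field] = value
    # Multi-part messages would be delegated to other parsers; this
    # sketch only indexes the body of single-part messages.
    if not msg.is_multipart():
        doc[body_field] = msg.get_payload()
    return doc


raw_message = (
    "From: alice@example.com\n"
    "Subject: Hello\n"
    "Message-ID: <1@example.com>\n"
    "\n"
    "Body text here.\n"
)
doc = parse_rfc2822(raw_message)
```

Headers that are not listed in the mapping are dropped, which is one
possible policy; a real parser might instead route them to a catch-all
field.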

(one other good out of the box UpdateParser that I forgot to mention before
would be an XSLTUpdateParser that could take in XML in any format the user
wanted to send, along with the URL of an XSLT to apply to convert it to
the Solr Standard <add><doc> format)
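
For reference, the target of any such conversion is the <add><doc>
message with one <field name="..."> element per value. The sketch below
is not XSLT: it just builds that format from a dict using Python's
standard ElementTree, and the field names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(doc):
    """Serialize a dict of field name -> value as an <add><doc> message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical field names; a real schema defines its own.
xml_out = to_solr_add_xml({"id": "1", "title": "Hello"})
```

An XSLT supplied by the user would need to emit the same element
structure from whatever XML they send.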


-Hoss
