Hi,

I'm not too familiar with Nutch these days, but my guess is that a Solr analyser is being applied to the field. To keep a field's value exactly as is, use the string field type in Solr's schema.xml rather than the text field type.
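
A minimal sketch of what that might look like in schema.xml. The field name "type" is an assumption; use whatever name your Nutch schema gives the content-type field, and check whether it needs to be multiValued in your setup. solr.StrField stores and indexes the value verbatim, with no analyser involved.

```xml
<!-- Inside <types>: a non-analysed field type; values are indexed exactly as given -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- Inside <fields>: the field name "type" is assumed here, adjust it to your schema -->
<field name="type" type="string" stored="true" indexed="true"/>
```

After changing a field's type, the existing documents would need to be reindexed for the change to show up. If the three values are still there afterwards, the split is probably happening on the Nutch side before the document reaches Solr.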
Regards,
Gora

On 05-Aug-2011 6:35 PM, "Marek Bachmann" <[email protected]> wrote:
> Hello people,
>
> I was just wondering how to avoid that the content-type string is split
> into multiple values.
> For example: If a document has the content-type: "Application/pdf" it is
> broken into three pieces "Application/pdf", "Application", "pdf" in the
> solr field type.
>
> I am not sure if this is done by nutch, or if it is an index topic in solr.
>
> Sure someone knows the answer to that.
>
> Thank you.

