Hi

Of course we all understand that these two types are not the same and serve
different purposes, but since Nutch doesn't distinguish between them,
it would be possible and reasonable to make the content-type the same.

But there might be some problems. Some Nutch users might rely on the
content-type and apply a special parser for application/xhtml+xml,
perhaps to handle additional namespaces.

Of course, for indexing and searching the replacement would be good.


In fact, there are many other examples where different content types can
be treated in the same way. What if we had a feature for grouping several
content types into a single one?
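
To make the grouping idea a bit more concrete, here is a rough sketch of what
I have in mind. It is only an illustration: the class ContentTypeGroups and its
methods are hypothetical names, not an existing Nutch API, and where it would
be hooked in (for example in index-more) is left open.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of Nutch: maps several content types to one group type. */
final class ContentTypeGroups {
    // Key: normalized member type, value: the group type it should be reported as.
    private final Map<String, String> canonical = new HashMap<>();

    /** Registers every member type as an alias of the given group type. */
    void addGroup(String groupType, String... memberTypes) {
        for (String member : memberTypes) {
            canonical.put(normalize(member), groupType);
        }
    }

    /** Returns the group type for a raw content type, or the normalized type itself if ungrouped. */
    String resolve(String rawContentType) {
        String type = normalize(rawContentType);
        return canonical.getOrDefault(type, type);
    }

    /** Drops parameters such as "; charset=UTF-8" and lowercases the media type. */
    private static String normalize(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }
}
```

With a group registered as addGroup("text/html", "application/xhtml+xml"),
resolve("application/xhtml+xml") would give "text/html". The original type
could still be used to pick the parser, and only the indexed value replaced.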

Best Regards
Alexander Aristov


On 30 January 2012 17:12, Markus Jelsma <[email protected]> wrote:

> Hi,
>
> Should we not provide an optional replace for the content type field in
> index-more? They are the same for end-users but end up differently in an
> index.
>
> Thoughts?
> Thanks
>
