> outlinks :http://portal.vcl.uci.cu/
> outlinks :http://postgresql.uci.cu
> outlinks :http://www.redmine.org/
> outlinks :http://www.redmine.org/guide
> contentLength : 5280
>
> And this is the page source that I checked with Firefox.
>
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.
Hi Markus,
I followed your recommendations, but my problem persists: some documents still
appear with application/xhtml+xml instead of text/html.
I added the following property to nutch-site.xml and created the
conf/contenttype-mapping.txt file:
<property>
  <name>moreIndexingFilter.mapMimeTypes</name>
  <value>true</value>
</property>
I'm using Nutch 1.5.1.
[...] an application/xhtml+xml to text/html in solr index.
-Original message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Sunday, November 25, 2012 4:33 AM
To: user@nutch.apache.org
Subject: RE: problem with text/html content type of documents appears
application/xhtml+xml in solr index
Hi - trunk's more indexing filter can map mime types to any target. With it you
can map both (x)html mimes to text/html or to `web page`.
https://issues.apache.org/jira/browse/NUTCH-1262
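
As a rough illustration of the mapping idea described above, here is a small Python sketch. It is not Nutch's code. The function names parse_mapping and map_mime_type are hypothetical, and the line layout (target first, then one or more source MIME types, separated by tabs) is an assumption about how such a mapping file might be organized; check the issue linked above for the actual format.

```python
def parse_mapping(lines):
    """Build a {source_mime: target} dict from mapping lines.

    Assumed layout (hypothetical): target<TAB>source1[<TAB>source2...].
    Blank lines and lines starting with '#' are skipped.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            # A target with no source types maps nothing.
            continue
        target = parts[0].strip()
        for source in parts[1:]:
            mapping[source.strip()] = target
    return mapping


def map_mime_type(mime, mapping):
    """Return the mapped target, or the original type if it is unmapped."""
    return mapping.get(mime, mime)


# Example: fold the XHTML type into text/html, leave everything else alone.
example_lines = [
    "# target<TAB>source1<TAB>source2 (assumed layout)",
    "text/html\tapplication/xhtml+xml",
]
mapping = parse_mapping(example_lines)
print(map_mime_type("application/xhtml+xml", mapping))
print(map_mime_type("application/pdf", mapping))
```

Types that have no entry pass through unchanged, so only the (x)html types listed in the mapping end up under the common target.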
-Original message-
> From: Eyeris Rodriguez Rueda
> Sent: Sun 25-Nov-2012 00:48
> To: user@nutch.