Re: problem with text/html content type of documents appears application/xhtml+xml in solr index

2012-11-27 Thread Eyeris Rodriguez Rueda
utlinks :http://portal.vcl.uci.cu/
> outlinks :http://postgresql.uci.cu
> outlinks :http://www.redmine.org/
> outlinks :http://www.redmine.org/guide
> contentLength : 5280
>
> and this is the page code that i check with firefox.
>
> "http://www.w3.org/T

RE: problem with text/html content type of documents appears application/xhtml+xml in solr index

2012-11-27 Thread Markus Jelsma
tp://portal.vcl.uci.cu/
> outlinks :http://postgresql.uci.cu
> outlinks :http://www.redmine.org/
> outlinks :http://www.redmine.org/guide
> contentLength : 5280
>
> and this is the page code that i check with firefox.
>
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.

Re: problem with text/html content type of documents appears application/xhtml+xml in solr index

2012-11-27 Thread Eyeris Rodriguez Rueda
> To: user@nutch.apache.org
> Subject: RE: problem with text/html content type of documents appears
> application/xhtml+xml in solr index
>
> Hi. Markus.
> I was doing your recommendations but, my problem persist, some documents
> still with application/xhtml+xml instead of text/ht

RE: problem with text/html content type of documents appears application/xhtml+xml in solr index

2012-11-27 Thread Markus Jelsma
th text/html content type of documents appears
> application/xhtml+xml in solr index
>
> Hi. Markus.
> I was doing your recommendations but, my problem persist, some documents
> still with application/xhtml+xml instead of text/html.
> I add the property to nutch-site.xml an

RE: problem with text/html content type of documents appears application/xhtml+xml in solr index

2012-11-27 Thread Eyeris Rodriguez Rueda
Hi Markus. I followed your recommendations, but my problem persists: some documents are still indexed as application/xhtml+xml instead of text/html. I added the property moreIndexingFilter.mapMimeTypes (value: true) to nutch-site.xml and created the conf/contenttype-mapping.txt file. I'm using Nutch 1.5.1.
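
As a rough illustration of what a MIME-type mapping of this kind does, here is a minimal Python sketch. It is not Nutch code. The file layout (one tab-separated line per target, target first, then the source types) and the names `parse_mapping`, `map_mime_type` and `EXAMPLE_MAPPING` are assumptions made for the example, not something this thread confirms.

```python
# Hypothetical sketch of MIME-type mapping; NOT the Nutch implementation.
# Assumed file format: target<TAB>source1[<TAB>source2 ...], '#' starts a comment.

def parse_mapping(text):
    """Build a {source_mime: target} dict from mapping-file text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        target, *sources = line.split("\t")
        for source in sources:
            mapping[source.strip()] = target.strip()
    return mapping


def map_mime_type(mime, mapping):
    """Return the mapped target, or the original type if no rule applies."""
    return mapping.get(mime, mime)


# Assumed example content: collapse the XHTML type into text/html.
EXAMPLE_MAPPING = "# target\tsources\ntext/html\tapplication/xhtml+xml\n"

if __name__ == "__main__":
    rules = parse_mapping(EXAMPLE_MAPPING)
    for mime in ("application/xhtml+xml", "text/html", "application/pdf"):
        print(mime, "->", map_mime_type(mime, rules))
```

Types without a rule pass through unchanged, so in this sketch only the XHTML type is rewritten before the value would reach the index.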

RE: problem with text/html content type of documents appears application/xhtml+xml in solr index

2012-11-25 Thread Markus Jelsma
e an application/xhtml+xml to text/html in solr index.
>
> -Original message-
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Sunday, November 25, 2012 4:33 AM
> To: user@nutch.apache.org
> Subject: RE: problem with text/html

RE: problem with text/html content type of documents appears application/xhtml+xml in solr index

2012-11-25 Thread Eyeris Rodriguez Rueda
ginal-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Sunday, November 25, 2012 4:33 AM
To: user@nutch.apache.org
Subject: RE: problem with text/html content type of documents appears application/xhtml+xml in solr index

Hi - trunk's more indexing filter can map mim

RE: problem with text/html content type of documents appears application/xhtml+xml in solr index

2012-11-25 Thread Markus Jelsma
Hi - trunk's more indexing filter can map MIME types to any target. With it you can map both (X)HTML MIME types to text/html or to `web page`.

https://issues.apache.org/jira/browse/NUTCH-1262

-Original message-
> From:Eyeris Rodriguez Rueda
> Sent: Sun 25-Nov-2012 00:48
> To: user@nutch.