Hi everybody, I am trying to build search for my own website, and I am using Nutch and Solr for that.
The problem is that Nutch's HtmlParser seems to be a flat parser: it concatenates everything (title, meta tags, body) into a single content field, which is not what I want in my search results. Is it possible to separate the body out of content, or to create a body field that holds only the <body></body> content of the HTML page? Thanks.
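
To make the question concrete, here is a rough sketch of the separation I am after. It is plain Python using only the standard library html.parser, not Nutch code, and the names FieldSplitter and split_fields are made up for illustration. It keeps the title, the meta tags, and the visible body text in separate fields instead of one concatenated content string.

```python
from html.parser import HTMLParser


class FieldSplitter(HTMLParser):
    """Hypothetical helper: collects title, meta tags and body text separately."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.meta = {}
        self.body_parts = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr_map = dict(attrs)
            name = attr_map.get("name")
            content = attr_map.get("content")
            if name and content is not None:
                self.meta[name] = content
        elif tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth > 0:
            return  # ignore script and style contents
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_body:
            self.body_parts.append(data)


def normalize(parts):
    """Join text fragments with spaces and collapse runs of whitespace."""
    return " ".join(" ".join(parts).split())


def split_fields(html):
    """Return title, meta and body as separate fields of one document."""
    parser = FieldSplitter()
    parser.feed(html)
    parser.close()
    return {
        "title": normalize(parser.title_parts),
        "meta": parser.meta,
        "body": normalize(parser.body_parts),
    }
```

What I would like is for the index to end up with something like the body value above as its own field, next to (or instead of) the combined content field.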