Is there an out-of-the-box way to tell Solr to index only certain parts of a page? For example, say a bunch of pages on my site contain some common navigation elements, roughly like this:

<html>
  <head><title></title></head>
  <body>
    <div id="myNavBar">
      Stuff here about parts of my site
    </div>
    <div id="navBar2">
      More stuff about other parts of the site
    </div>
    ....A bunch of stuff particular to each individual page...
  </body>
</html>

Is there some way to tell Solr not to index what's in the two divs whenever it encounters them (and it will, on nearly every page)? Failing that, is there an easy way to give content in those areas a large negative score to get the same effect?
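
To make the desired effect concrete, here is a minimal sketch of the filtering I have in mind, written as a standalone preprocessing step in Python rather than as anything Solr or Nutch provides. The names `NavStripper`, `indexable_text`, and `SKIP_IDS` are made up for illustration, and it assumes the nav divs can be identified by their `id` and that the markup is reasonably well formed:

```python
from html.parser import HTMLParser

# Hypothetical ids, taken from the example page above.
SKIP_IDS = {"myNavBar", "navBar2"}


class NavStripper(HTMLParser):
    """Collects page text, ignoring everything inside divs whose id is in skip_ids."""

    def __init__(self, skip_ids):
        super().__init__(convert_charrefs=True)
        self.skip_ids = set(skip_ids)
        self.skip_depth = 0  # > 0 while inside a skipped div (counts nested divs)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.skip_depth > 0:
            # A div nested inside a skipped div: track it so its close tag
            # does not end the skipped region early.
            self.skip_depth += 1
        elif dict(attrs).get("id") in self.skip_ids:
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped div.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html, skip_ids=SKIP_IDS):
    """Return the text of `html` with the skipped divs left out."""
    parser = NavStripper(skip_ids)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The text returned by `indexable_text(page_html)` is what I would want to end up in the index. Note that this sketch does not treat `<script>` or `<style>` contents specially, so a real version would need to handle those too.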

FWIW, we are using Nutch to do the crawling, but as I understand it there's no way to get Nutch to skip only parts of pages without writing custom code, right?
