On Friday 12 August 2011 14:07:42 Johan Svensson wrote:
> Hello,
>
> This is my first question to this list, and I am equally new to Nutch.
> Sorry if this question might be too general. I'd be happy with good
> links to documentation, if there are any. Google won't help me find
> them.
>
> I need to understand how facets can be extracted from a web site crawled
> by Nutch then indexed by Solr. On the web site, pages have meta tags,
> like <meta name="price" content="123.45"/> or <meta name="categories"
> content="category1, category2"/>. Can I tell Nutch to extract those and
> Solr to treat them as facets?

You first need to extract the metadata from your documents in Nutch and add
it as fields to your Nutch documents. I have never tried it, but there are
some discussions about `extracting meta data using nutch` on the internet.
Once the fields are in Solr, you can use them as facets with ease.

>
> In the example above, I want to specify manually that the meta name
> "categories" is to be treated as a facet, but the content should be
> dynamically used as categories.
>
> Does it make sense? Is it possible to do with Nutch and Solr, or should
> I rethink my way of using it?
>
> Best Regards,
>
> Johan

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
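
To make the two steps above concrete, here is a rough sketch in Python. It is not Nutch code and does not use any Nutch or Solr client API; it only illustrates the idea of turning the meta tags from the question into document fields and then asking Solr to facet on one of them. The class and function names (`MetaTagExtractor`, `build_document`, `facet_query_params`) are made up for this illustration, and it assumes that `categories` would be declared as an indexed, multi-valued field in the Solr schema so that each comma-separated value is counted as its own facet value.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class MetaTagExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs for a chosen set of names."""

    def __init__(self, wanted_names):
        super().__init__()
        self.wanted_names = set(wanted_names)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name in self.wanted_names and content is not None:
            self.fields[name] = content


def build_document(html, wanted_names, multi_valued=()):
    """Turn the selected meta tags into a dict of fields (hypothetical helper).

    Fields listed in multi_valued are split on commas, so that
    "category1, category2" becomes two separate values.
    """
    parser = MetaTagExtractor(wanted_names)
    parser.feed(html)
    doc = {}
    for name, content in parser.fields.items():
        if name in multi_valued:
            doc[name] = [v.strip() for v in content.split(",") if v.strip()]
        else:
            doc[name] = content
    return doc


def facet_query_params(facet_fields):
    """Build the query string for a Solr request that returns only facet counts."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


page = """
<html><head>
<meta name="price" content="123.45"/>
<meta name="categories" content="category1, category2"/>
</head><body>Example page</body></html>
"""

doc = build_document(page, ["price", "categories"], multi_valued=["categories"])
query = facet_query_params(["categories"])
print(doc)
print(query)
```

Only the field name `categories` is chosen by hand here. The facet values themselves come from whatever the pages contain, which matches the "manual name, dynamic content" behaviour asked about in the question.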

