Re: strategy for post-processing answer set

2011-09-28 Thread Chris Hostetter
: it looks to me as if Solr just brings back the URLs. what I want to do is to
: get the actual documents in the answer set, simplify their HTML and remove
: all the javascript, ads, etc., and append them into a single document.
:
: Now ... does Nutch already have the documents? can I get them

Re: strategy for post-processing answer set

2011-09-24 Thread Fred Zimmerman
ok. this is a very basic question so please bear with me. I see where the velocity templates are and I have looked at the documentation and get the idea of how to write them. it looks to me as if Solr just brings back the URLs. what I want to do is to get the actual documents in the answer set,

Re: strategy for post-processing answer set

2011-09-23 Thread Fred Zimmerman
This seems to be out of date. I am running Solr 3.4
* the file structure of apachehome/contrib is different and I don't see velocity anywhere underneath
* the page referenced below only talks about Solr 1.4 and 4.0 ?
On Thu, Sep 22, 2011 at 19:51, Markus Jelsma markus.jel...@openindex.io wrote:

Re: strategy for post-processing answer set

2011-09-23 Thread Fred Zimmerman
ok, answered my own question, found velocity rw in solrconfig.xml. next question: where does velocity look for its templates? On Fri, Sep 23, 2011 at

Re: strategy for post-processing answer set

2011-09-23 Thread Erik Hatcher
conf/velocity by default. See Solr's example configuration. Erik On Sep 23, 2011, at 12:37, Fred Zimmerman w...@nimblebooks.com wrote: ok, answered my own question, found velocity rw in solrconfig.xml. next question: where does velocity look for its templates?
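
For illustration only, here is a minimal Python sketch of how a request that selects the Velocity writer and a template might be built. The `wt=velocity` and `v.template` parameters are described on the VelocityResponseWriter wiki page; the base URL, the `/select` path, and the helper name `velocity_url` are assumptions made for this sketch, not something stated in the thread. A template named `browse` would be looked up as `browse.vm` under conf/velocity.

```python
from urllib.parse import urlencode


def velocity_url(base, query, template):
    """Build a Solr select URL that asks for Velocity-rendered output.

    base     -- assumed Solr base URL, e.g. "http://localhost:8983/solr"
    query    -- the user's query string
    template -- template name without the .vm suffix (hypothetical choice)
    """
    params = {"q": query, "wt": "velocity", "v.template": template}
    return base.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(velocity_url("http://localhost:8983/solr", "nutch", "browse"))
```

Only the URL is constructed here; nothing is sent to a server.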

strategy for post-processing answer set

2011-09-22 Thread Fred Zimmerman
Hi, I would like to take the HTML documents that are the result of a Solr search and combine them into a single HTML document that combines the body text of each individual document. What is a good strategy for this? I am crawling with Nutch and Carrot2 for clustering. Fred
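
One possible way to approach the combining step, independent of Solr or Nutch, is sketched below in Python using only the standard library. It assumes you already have each result's URL and raw HTML as strings (how to obtain the HTML is left open); the names `BodyTextExtractor`, `body_text`, and `combine` are made up for this sketch. It keeps only the text inside `<body>`, drops `<script>`, `<style>`, and `<noscript>` content, and appends one section per document. It does not attempt ad removal.

```python
from html import escape
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects visible text inside <body>, skipping script/style content."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside <body> and outside skipped elements.
        if self.in_body and self.skip_depth == 0:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self.chunks.append(text)


def body_text(html):
    """Return the body text of one HTML document as a single string."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def combine(docs):
    """Turn an iterable of (url, html) pairs into one HTML document string."""
    sections = []
    for url, html in docs:
        sections.append(
            "<section>\n<h2>{}</h2>\n<p>{}</p>\n</section>".format(
                escape(url), escape(body_text(html))
            )
        )
    return "<html>\n<body>\n{}\n</body>\n</html>".format("\n".join(sections))
```

Each source document becomes one `<section>` headed by its URL, so the combined file still shows where each block of text came from.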

Re: strategy for post-processing answer set

2011-09-22 Thread Markus Jelsma
Hi, Solr supports the Velocity template engine, and the support is very good. It is ideal for generating properly formatted output from the search engine. There's a clustering example and it's easy to format documents indexed by Nutch. http://wiki.apache.org/solr/VelocityResponseWriter Cheers Hi,

Re: strategy for post-processing answer set

2011-09-22 Thread Fred Zimmerman
can you say a bit more about this? I see Velocity and will download it and start playing around but I am not quite sure I understand all the steps that you are suggesting. Fred On Thu, Sep 22, 2011 at 19:51, Markus Jelsma markus.jel...@openindex.io wrote: Hi, Solr supports the Velocity