RE: Nutch 1.7 + AJAX Solr returning ALL contents vs. SPECIFIC

Markus Jelsma Mon, 11 Nov 2013 11:17:16 -0800
Ah yes, this is probably about extracting it from pages, not returning it. 
Headings can be extracted using the headings plugin, which is available in 1.7. 
You can also use XPath for extraction, but there's no plugin available for that 
yet, and it won't work with parse-tika.
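
To illustrate the XPath idea outside of Nutch, here is a minimal Python sketch. 
It is not a Nutch plugin and does not use any Nutch API: the sample page, the 
function name extract_fields and the output keys are all hypothetical, and it 
assumes the page is well-formed XML (real crawled HTML usually is not and 
would need a tolerant HTML parser first).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page used only for illustration.
SAMPLE_PAGE = """<html>
  <body>
    <h1>Site title</h1>
    <div id="content">
      <h2>Section heading</h2>
      <p>First paragraph of the section.</p>
      <p>Second paragraph.</p>
    </div>
  </body>
</html>"""


def _text_or_none(element):
    """Return the stripped text of an element, or None if it is missing or empty."""
    if element is None or element.text is None:
        return None
    return element.text.strip()


def extract_fields(page):
    """Pull the first h2 and the first p out of a page.

    Assumption: the extracted values would later be indexed as separate
    fields, so that a search front-end can show them instead of the
    full page content.
    """
    root = ET.fromstring(page)
    # ElementTree supports a limited XPath subset; ".//h2" matches the
    # first h2 descendant in document order.
    return {
        "h2": _text_or_none(root.find(".//h2")),
        "p": _text_or_none(root.find(".//p")),
    }


if __name__ == "__main__":
    print(extract_fields(SAMPLE_PAGE))
```

Storing the heading and paragraph as separate fields is what would let the 
search results page show only those values rather than the whole page text.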
 
-----Original message-----
> From:Olle Romo <[email protected]>
> Sent: Monday 11th November 2013 19:50
> To: [email protected]
> Subject: Re: Nutch 1.7 + AJAX Solr returning ALL contents vs. SPECIFIC
> 
> Hi Mark,
> 
> Not sure if this is exactly what you're looking for but maybe try the 
> whitelist_blacklist_plugin from NUTCH-585 
> https://issues.apache.org/jira/browse/NUTCH-585
> 
> Best,
> Olle
> 
> On Nov 11, 2013, at 7:01 PM, "Reyes, Mark" <[email protected]> wrote:
> 
> > Hi:
> > 
> > I’m using Nutch 1.7 to crawl the pages of my domain and index them into 
> > Solr, and the JavaScript library AJAX Solr to fetch that index as JSON and 
> > print it on the front-end.
> > 
> > My question is whether it’s possible to have only specific content returned 
> > (e.g. an H2 tag and a p tag) on the search results page, rather than the 
> > full contents of that page.
> > 
> > Thank you,
> > Mark
> > 
> > 
> 
>