Re: Block certain parts of HTML code from being indexed

2018-11-16 Thread Semyon Semyonov
/DOMContentUtils.java at private boolean getTextHelper(StringBuffer sb, Node node, boolean abortOnNestedAnchors, int anchorDepth)  Semyon Sent: Friday, November 16, 2018 at 10:34 AM From: "Jorge Betancourt" To: user@nutch.apache.org Subject: Re: Block certain parts of HTML code from being
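
The message above is cut off, so the change it proposes around getTextHelper is not shown. As a rough illustration only, here is a minimal Java sketch of the general idea of skipping header and footer elements while collecting text from a DOM tree. It is not Nutch's DOMContentUtils code: the class name SkipSectionsSketch, the methods collectText and extract, and the SKIPPED list are all hypothetical, and the sketch assumes well-formed XHTML input so that the JDK's built-in XML parser can read it.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SkipSectionsSketch {
    // Hypothetical list of element names whose subtrees are left out of the text.
    private static final Set<String> SKIPPED =
            new HashSet<>(Arrays.asList("header", "footer"));

    // Depth-first walk: append text nodes, but do not descend into skipped elements.
    static void collectText(StringBuilder sb, Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && SKIPPED.contains(node.getNodeName().toLowerCase(Locale.ROOT))) {
            return; // drop the whole subtree (e.g. <header>...</header>)
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(text);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(sb, children.item(i));
        }
    }

    // Parses a well-formed XHTML string and returns its text minus skipped sections.
    static String extract(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        StringBuilder sb = new StringBuilder();
        collectText(sb, doc.getDocumentElement());
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><header>Site menu</header>"
                + "<p>Main article text.</p><footer>Copyright</footer></body></html>";
        System.out.println(extract(page)); // prints "Main article text."
    }
}
```

Real-world HTML is rarely well-formed XML, so an actual change would work on the DOM that the HTML parser already produced rather than re-parsing the page as in this sketch.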

Re: Block certain parts of HTML code from being indexed

2018-11-16 Thread Jorge Betancourt
> > -----Original Message----- > > From: Hany NASR > > Sent: Thursday, November 15, 2018 4:18 PM > > To: user@nutch.apache.org > > Subject: RE: Block certain parts of HTML code from being indexed

RE: Block certain parts of HTML code from being indexed

2018-11-16 Thread hany . nasr
: Thursday, November 15, 2018 4:18 PM To: user@nutch.apache.org Subject: RE: Block certain parts of HTML code from being indexed Hello Markus, What if I want to remove a specific component or page section? Kind regards, Hany Shehata Solutions Architect, Marketing and Communications IT Corporate

RE: Block certain parts of HTML code from being indexed

2018-11-15 Thread hany . nasr
Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Wednesday, November 14, 2018 4:11 PM To: user@nutch.apache.org Subject: RE: Block certain parts of HTML code from being indexed Hello Hany, Using parse-tika as your HTML parser, you can enable Boilerpipe (see nutch-default

RE: Block certain parts of HTML code from being indexed

2018-11-14 Thread Markus Jelsma
Hello Hany, Using parse-tika as your HTML parser, you can enable Boilerpipe (see nutch-default). Regards, Markus -----Original message----- > From: hany.n...@hsbc.com > Sent: Wednesday 14th November 2018 15:53 > To: user@nutch.apache.org > Subject: Block certain parts of HT

RE: Block certain parts of HTML code from being indexed

2018-11-14 Thread Yossi Tamari
h.apache.org > Subject: Block certain parts of HTML code from being indexed > > Hello All, > > I am using Nutch 1.15, and wondering if there is a feature for blocking certain parts of HTML code from being indexed (header & footer). > > Kind regards, >

Block certain parts of HTML code from being indexed

2018-11-14 Thread hany . nasr
Hello All, I am using Nutch 1.15, and wondering if there is a feature for blocking certain parts of HTML code from being indexed (header & footer). Kind regards, Hany Shehata Solutions Architect, Marketing and Communications IT Corporate Functions | HSBC Operations, Services and Technology