We could create an account for the project on SO, give the user list as its email address, and set up an alert so that any question tagged [nutch] gets sent to user@nutch.apache.org. That should work, shouldn't it?

On 12 February 2016 at 15:11, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

> That's a cool idea, but how would we set up the redirect, since
> wouldn't that have to occur at SO?
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Julien Nioche <lists.digitalpeb...@gmail.com>
> Reply-To: "user@nutch.apache.org" <user@nutch.apache.org>
> Date: Wednesday, February 10, 2016 at 6:48 AM
> To: "user@nutch.apache.org" <user@nutch.apache.org>
> Subject: Re: [MASSMAIL]Extract Contact Information - Custom Parser
>
> >See SO =>
> >http://stackoverflow.com/questions/35299744/nutch-parser-plugin-collect-contact-information
> >
> >There seem to be more and more people sending their questions to both
> >the ML and SO. I am wondering whether we should set up a redirect so
> >that any question asked there lands automatically on the user list.
> >Any thoughts?
> >
> >On 10 February 2016 at 14:43, Markus Jelsma <markus.jel...@openindex.io>
> >wrote:
> >
> >> Yes, I would also implement an HtmlParseFilter plugin, but execute
> >> the regex on the parseText, because that is where you are going to
> >> find phone numbers etc.
> >>
> >> Markus
> >>
> >> -----Original message-----
> >> > From: Jorge Luis Betancourt González <jlbetanco...@uci.cu>
> >> > Sent: Tuesday 9th February 2016 19:59
> >> > To: user@nutch.apache.org
> >> > Subject: Re: [MASSMAIL]Extract Contact Information - Custom Parser
> >> >
> >> > Is there any particular requirement that prevents you from
> >> > implementing your logic as a HtmlParser plugin? Essentially, the
> >> > parsing will be done for you (by parse-html or parse-tika) and all
> >> > you need to do is find the right nodes and extract the desired
> >> > information (see [1]).
> >> >
> >> > Regards,
> >> >
> >> > [1] http://svn.apache.org/repos/asf/nutch/trunk/src/plugin/headings/
> >> >
> >> > ----- Original message -----
> >> > From: "Bin Wang" <binwang...@gmail.com>
> >> > To: "Apache.Nutch.User" <user@nutch.apache.org>
> >> > Sent: Tuesday, 9 February 2016 13:19:35
> >> > Subject: [MASSMAIL]Extract Contact Information - Custom Parser
> >> >
> >> > Hi there,
> >> >
> >> > I am working on a project that needs to identify contact points on
> >> > company websites, to be used for the purpose of enhancing security.
> >> >
> >> > Right now, I have managed to crawl several rounds of sites. The next
> >> > step will be to parse the HTML pages and locate where the contact
> >> > information is. In this case, I am only interested in email
> >> > addresses and phone numbers....
> >> >
> >> > Here is what I am planning to do: we can write a MapReduce job to
> >> > parse the HTML files and use things like regular expressions in
> >> > combination with Jsoup/Beautifulsoup HTML parsers to find the
> >> > matches.
> >> >
> >> > However, I am wondering whether there is any parser plugin that has
> >> > already been implemented, and maybe tested, for this purpose?
> >> >
> >> > Also, any feedback on how to achieve this is much appreciated!
> >> >
> >> > Best regards,
> >> >
> >> > Bin
> >
> >--
> >
> >*Open Source Solutions for Text Engineering*
> >
> >http://www.digitalpebble.com
> >http://digitalpebble.blogspot.com/
> >#digitalpebble <http://twitter.com/digitalpebble>

--

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>
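
For the original question in the thread, here is a rough sketch of the "run a regex over the parse text" idea that Markus describes. It is plain Java with no Nutch dependency so that it runs on its own. The class name ContactExtractor, its method names, and both patterns are illustrative assumptions and are not taken from any existing plugin. The patterns are deliberately loose and would need tuning for real pages. In an actual plugin, the input string would presumably be the extracted parse text rather than a literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls email addresses and phone-like strings out of plain text.
final class ContactExtractor {

    // Assumed pattern: a simple, permissive email shape (not a full RFC 5322 matcher).
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Assumed pattern: optional '+', then at least eight characters made of digits
    // and common separators, starting and ending with a digit.
    private static final Pattern PHONE =
            Pattern.compile("\\+?\\d[\\d\\s().-]{6,}\\d");

    private ContactExtractor() {
    }

    // Collects every non-overlapping match of the pattern, in order of appearance.
    private static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    static List<String> findEmails(String text) {
        return findAll(EMAIL, text);
    }

    static List<String> findPhones(String text) {
        return findAll(PHONE, text);
    }

    public static void main(String[] args) {
        // Stand-in for the text a parser would have extracted from a page.
        String parseText = "Write to info@example.com or call +1 (555) 010-9999.";
        System.out.println(findEmails(parseText));
        System.out.println(findPhones(parseText));
    }
}
```

Working on the extracted text rather than the raw HTML keeps the patterns simple, at the cost of missing addresses that only appear in attributes such as mailto: links. Those would need a pass over the DOM nodes instead.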