Re: Help in developing a vertical search using nutch

Guy McDowell Wed, 18 Jun 2014 09:59:01 -0700

Hey Vishal,

I'm attempting to do a very similar thing, but not with real estate. I'm
only about one step ahead of you in this process though, so I can't offer
much help.

I think you are on the right path in having Nutch crawl only websites
related to real estate. A whole-web crawl starting from seed URLs outside
that vertical would probably be a waste of your time. You might as well
start with seed URLs inside the vertical.
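To make the "seeds in the vertical" idea concrete, here is a rough Python sketch of the check a crawler would apply to each outlink: keep a link only if its host is one of the seed hosts (or a subdomain of one). The function names and the example.* sites are made up for illustration. In Nutch 1.x itself I believe you'd get a similar effect through configuration rather than code, e.g. with +/- patterns in conf/regex-urlfilter.txt or the db.ignore.external.links property, so check those before writing anything.

```python
from urllib.parse import urlparse


def seed_hosts(seed_urls):
    """Collect the (lowercased) host names of the seed URLs."""
    hosts = set()
    for url in seed_urls:
        host = urlparse(url).hostname  # None for URLs without a host part
        if host:
            hosts.add(host)
    return hosts


def in_vertical(url, hosts):
    """True if url's host is one of the seed hosts or a subdomain of one."""
    host = urlparse(url).hostname
    if host is None:
        return False
    return any(host == h or host.endswith("." + h) for h in hosts)


def filter_outlinks(outlinks, hosts):
    """Keep only the outlinks that stay inside the vertical."""
    return [u for u in outlinks if in_vertical(u, hosts)]


if __name__ == "__main__":
    # Hypothetical seed list; replace with real estate sites you trust.
    hosts = seed_hosts(["https://example-realty.com/", "https://homes.example.org/listings"])
    links = [
        "https://example-realty.com/house/1",
        "https://www.example-realty.com/about",
        "https://news.example.net/",
    ]
    print(filter_outlinks(links, hosts))
```

Matching on host suffix with a leading dot means a look-alike such as evil-example-realty.com is not treated as part of example-realty.com.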

If you're using Nutch with Solr as the front-end search for your users, I
think Solr will rank the results by their relevance to the keywords entered
in the search. I'm focusing on learning Nutch right now, so I'm not certain
of everything Solr does.
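For a rough feel of what "rank by relevancy" means, here's a toy TF-IDF ranker in Python. It is a simplification I put together, not Solr's actual formula: it borrows the square-root term frequency of Lucene's classic scoring plus a log-scaled inverse document frequency, and ignores field norms and boosts. (Newer Solr versions default to BM25 instead.) The document IDs and texts are made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_rank(query, docs):
    """Return doc IDs from docs (id -> text) ordered by a simple TF-IDF score."""
    n = len(docs)
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: in how many docs each term appears at least once.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    scores = {}
    for doc_id, c in counts.items():
        score = 0.0
        for term in tokenize(query):
            if c[term]:
                idf = math.log((n + 1) / (df[term] + 1)) + 1.0  # rarer terms weigh more
                score += math.sqrt(c[term]) * idf              # dampened term frequency
        scores[doc_id] = score
    # Highest score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    docs = {
        "a": "three bedroom house for sale",
        "b": "house house for rent",
        "c": "car for sale",
    }
    print(tfidf_rank("house sale", docs))
```

A document matching both query terms outranks one that repeats a single term, which is roughly the behaviour you'd expect from keyword relevancy ranking.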

From the research I've done, Nutch 1.x seems to be a better choice than 2.x,
as it is more stable and has more features. I could be wrong, though, so
that's worth double-checking.

I look forward to following your progress and learning from you. Hopefully
my progress will be able to help you as well.

Cheers!

Guy McDowell
[email protected]
http://www.GuyMcDowell.com

On Wed, Jun 18, 2014 at 9:27 AM, Vishal Tomar <[email protected]>
wrote:

> Hi,
>
> I am new to Apache Nutch and to web crawlers in general. I am trying to
> build a vertical search engine for real estate.
>
> Now, how do I implement the crawler? Probably by using Nutch for the
> crawling and modifying it to extract links from a page only if the page
> contents are relevant to real estate. I'd probably need to write some kind
> of relevancy scoring function that uses a mixture of keywords, an ontology
> and some kind of similarity detection based on sites I know to be relevant.
>
> Is there any way to configure Nutch to use my relevancy scoring function,
> or do I need to change the source code? Also, I would prefer working in
> Python over Java, as I am much more familiar with it. Is there a Python
> library for Nutch?
>
> Apart from this, I would really appreciate any other pointers regarding
> Nutch in general.
>
> Thanks
> Vishal
>
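A rough sketch of the kind of relevancy function Vishal describes, in Python since that's his preference: a keyword score blended with cosine similarity to pages already known to be relevant, and a threshold that decides whether a page's outlinks get followed. Everything here is hypothetical (the keyword list, the 0.5/0.5 weights, the 0.2 threshold), and the ontology part is left out. It is not Nutch API; in Nutch 1.x I believe the hook for this would be a Java plugin implementing the ScoringFilter extension point (org.apache.nutch.scoring.ScoringFilter), so this would only be a prototype to port.

```python
import math
from collections import Counter

# Hypothetical keyword list; a real one might come from a domain ontology.
REAL_ESTATE_TERMS = {"apartment", "bedroom", "broker", "condo", "house", "lease",
                     "listing", "mortgage", "realtor", "rent", "sale"}


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w]


def keyword_score(tokens, terms):
    """Fraction of tokens that are domain keywords (0.0 to 1.0)."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in terms) / len(tokens)


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def relevance(text, reference_pages, w_kw=0.5, w_sim=0.5):
    """Weighted mix of keyword density and similarity to known-relevant pages."""
    tokens = tokenize(text)
    centroid = Counter()
    for page in reference_pages:
        centroid.update(tokenize(page))
    return (w_kw * keyword_score(tokens, REAL_ESTATE_TERMS)
            + w_sim * cosine(Counter(tokens), centroid))


def outlinks_to_follow(text, outlinks, reference_pages, threshold=0.2):
    """Focused-crawl rule: only follow a page's links if the page looks relevant."""
    return list(outlinks) if relevance(text, reference_pages) >= threshold else []


if __name__ == "__main__":
    known = ["Three bedroom house for sale, contact our realtor.",
             "Condo for rent near downtown, two bedroom apartment."]
    page = "Spacious two bedroom condo for sale with low mortgage rates."
    print(outlinks_to_follow(page, ["https://example-realty.com/next"], known))
```

The threshold is the knob that trades recall for focus: lower it and the crawl drifts into neighbouring topics, raise it and you risk missing real estate pages that use unusual wording.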
