Re: Frontera: large-scale, distributed web crawling framework

2015-10-02 Thread Jessica Glover

Re: Frontera: large-scale, distributed web crawling framework

2015-10-02 Thread Jessica Glover
Sorry, just re-read and saw that it's open source. And under what license? I apologize if you're not trying to sell this.

Re: Frontera: large-scale, distributed web crawling framework

2015-10-02 Thread Jessica Glover
Hmm... you're asking for a free consultation on an open source software user mailing list? First, this doesn't exactly seem like the appropriate place for that. Second, offer some incentive if you want someone to help you with your business.

Re: Nutch 2.3 server job status listener?

2015-06-23 Thread Jessica Glover
I could query less frequently, but I thought maybe there was a better way to determine whether a process had completed.

Re: Nutch crawls not appearing in Kibana

2015-06-18 Thread Jessica Glover
I am having the same problem as Brooks, and I noticed that the timestamp field is one month ahead. This explains why you can't see the results in Kibana, but I'm wondering how to fix the timestamp problem.
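The thread doesn't say where the extra month comes from, but a classic Java source of timestamps landing exactly one month ahead is passing a 1-based month into java.util.Calendar, whose MONTH field is zero-based. A sketch of the pitfall in general, offered as a possible lead rather than a claim about Nutch's actual indexing code:

    import java.util.Calendar;
    import java.util.GregorianCalendar;

    public class MonthPitfall {
        public static void main(String[] args) {
            Calendar cal = new GregorianCalendar();

            // Intent: June 17, 2015. But Calendar months are zero-based,
            // so 6 means July and the date silently lands one month ahead.
            cal.set(2015, 6, 17);
            System.out.println(cal.getTime());

            // Correct: use the zero-based constant (or subtract 1).
            cal.set(2015, Calendar.JUNE, 17);
            System.out.println(cal.getTime());
        }
    }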

Nutch 2.3 server job status listener?

2015-06-18 Thread Jessica Glover
I'm writing a Java application that uses the Nutch REST API to execute the crawl cycle. I need to be able to call the next job only when the previous job is finished. Right now, the only way I know to achieve this is by using GET /job/{jobId} and checking for "state":"FINISHED" within the returned…
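Since the message leaves the loop itself implicit, here is a minimal Java sketch of that polling approach. The localhost:8081 base URL is an assumption about where the Nutch REST server is bound, the substring check stands in for real JSON parsing, and the "FAILED" terminal state is hypothetical, not confirmed by the thread:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class JobPoller {
        // Assumed host/port for the Nutch REST server; adjust as needed.
        static final String BASE = "http://localhost:8081";

        // Poll GET /job/{jobId} until the body reports "state":"FINISHED".
        static void waitForJob(HttpClient client, String jobId) throws Exception {
            HttpRequest req = HttpRequest
                    .newBuilder(URI.create(BASE + "/job/" + jobId))
                    .GET()
                    .build();
            while (true) {
                String body = client.send(req, HttpResponse.BodyHandlers.ofString()).body();
                if (body.contains("\"state\":\"FINISHED\"")) {
                    return; // previous job done; safe to start the next one
                }
                if (body.contains("\"state\":\"FAILED\"")) { // assumed terminal state
                    throw new IllegalStateException("job " + jobId + " failed: " + body);
                }
                Thread.sleep(5_000); // poll every 5 seconds rather than spinning
            }
        }

        public static void main(String[] args) throws Exception {
            waitForJob(HttpClient.newHttpClient(), args[0]);
            System.out.println("job " + args[0] + " finished");
        }
    }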

2.3 REST API and batchId

2015-06-17 Thread Jessica Glover
I'm having trouble understanding the concept of a batch and which elements of the crawl cycle require a batchId. I've found that I need to specify a batch ID when I run a generate job, but a batchId is not required for the fetch job to finish. But then my parse job fails with: ERROR impl.JobWorke…
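For orientation, the way the pieces are meant to hang together: generate stamps a set of rows with a batch id, fetch pulls the rows carrying that id, and parse only sees pages fetched under the same id, which is why parse fails when the ids don't line up. Below is a sketch of threading one id through the cycle; the "batchId" key and payload shape are assumptions about the 2.3 REST API, not confirmed by the truncated error:

    public class BatchCycle {
        public static void main(String[] args) {
            // Hypothetical batch id shared by every step of one crawl cycle.
            String batchId = "1434567890-1234";

            for (String type : new String[] {"GENERATE", "FETCH", "PARSE"}) {
                String payload = "{\"type\":\"" + type + "\",\"confId\":\"default\","
                               + "\"args\":{\"batchId\":\"" + batchId + "\"}}";
                // Submit each to POST /job/create in turn, waiting for the previous
                // job to reach FINISHED before starting the next, so parse works on
                // exactly the rows that generate marked and fetch fetched.
                System.out.println("POST /job/create  " + payload);
            }
        }
    }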

Re: REST API for crawling

2015-06-12 Thread Jessica Glover
…om/document/d/1OGg22ATohapP2ycewIaTcUnENc2FeyYzni0ED_Jjxz8/edit?usp=sharing > Best Regards, > Dzmitry…

REST API for crawling

2015-06-12 Thread Jessica Glover
Hello. I am trying to test out the 2.3 REST API using curl, but I'm having trouble with the commands. I found out what arguments to use for the inject job from searching the archives, and that was successful, but when I try generate with no args, it fails: { "args": {}, "conf…
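For anyone following along in Java rather than curl, a bare-bones sketch of submitting a job to the REST server. The host/port, the /job/create endpoint, and the payload fields ("type", "confId", "args") are assumptions reconstructed from this thread, not documented fact; swap in whatever args the target job type actually requires:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CreateJob {
        public static void main(String[] args) throws Exception {
            // Assumed endpoint and payload shape -- adjust to your server.
            String payload = "{\"type\":\"INJECT\",\"confId\":\"default\",\"args\":{}}";

            HttpRequest req = HttpRequest
                    .newBuilder(URI.create("http://localhost:8081/job/create"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(payload))
                    .build();

            HttpResponse<String> resp = HttpClient.newHttpClient()
                    .send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode() + " " + resp.body());
        }
    }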