Re: Fetcher threads & automation

2007-01-29 Thread Justin Hartman
obStream.py script? In the top of the logging file there is a section called formatters like this: [formatters] keys=simple Dennis Kubes Justin Hartman wrote: > Hi Dennis > > This is a great contribution and I personally thank you for making it > available to the community. > > I

Re: Fetcher threads & automation

2007-01-28 Thread Justin Hartman
>> you a copy. >> >> We are currently working on a more in-depth framework for automating >> these types of job streams in python but that is not complete yet. >> >> Andrzej, do you think this is something we should post to the wiki? > > Sure, if it's ok for you to release it I'm sure many people would find > it useful. > -- Regards Justin Hartman PGP Key ID: 102CC123

Re: Error while accessing Nutch from browser/tomcat, command-line works fine

2007-01-28 Thread Justin Hartman
mission java.security.AllPermission; Once done, restart everything as described above. On some systems the first hack will suffice; however, some setups require the AllPermission directive. Hope this helps. -- Regards Justin Hartman PGP Key ID: 102CC123

Re: Fetcher threads & automation

2007-01-28 Thread Justin Hartman
and/or index once it has been fetched or will the whole index need to be re-created? -- Regards Justin Hartman PGP Key ID: 102CC123

Fetcher threads & automation

2007-01-28 Thread Justin Hartman
s a daemon in the background and I can worry about other issues. Thank you in advance -- Regards Justin Hartman PGP Key ID: 102CC123

Re: Using Nutch for special content pages

2007-01-09 Thread Justin Hartman
nd how would you use it? I've been very interested in this plugin but it's not altogether documented that well (I don't think). -- Regards Justin Hartman PGP Key ID: 102CC123

Re: Error after SVN update

2007-01-08 Thread Justin Hartman
apred.JobClient.runJob(JobClient.java:399) > at org.apache.nutch.indexer.Indexer.index(Indexer.java:297) > at org.apache.nutch.crawl.Crawl.main(Crawl.java:134) > -- Regards Justin Hartman PGP Key ID: 102CC123

Google Search on Nutch?

2007-01-03 Thread Justin Hartman
I'm sorry, but I have to ask this question, stupid as it may seem: why does the Nutch home page [1] have Google Search integrated into the site when surely it should be using Nutch? What better demonstration of the Nutch system than the Nutch home page? -- Regards Justin Hartman PGP K

Re: fetcher : some doubts

2007-01-02 Thread Justin Hartman
't delete the index, as people will have nothing to search for while the index is being rebuilt. Is there another way of doing this or am I missing the plot here big time? -- Regards Justin Hartman PGP Key ID: 102CC123

Re: fetcher : some doubts

2007-01-02 Thread Justin Hartman
On 1/2/07, Sean Dean <[EMAIL PROTECTED]> wrote: There actually isn't much of a reason to generate "huge" multi-million page fetch lists when you can create lots of smaller ones and merge them together. This allows for more of a ladder-style approach, and in some cases reduces the risk of errors

(SOLVED) Searching via http & statistical data

2006-12-29 Thread Justin Hartman
work. -- Regards Justin Hartman PGP Key ID: 102CC123

Re: Searching via http & statistical data

2006-12-29 Thread Justin Hartman
h-0.8.1.war file is located in this directory. Not an ideal situation this Regards Justin On 12/29/06, Nitin Borwankar <[EMAIL PROTECTED]> wrote: Nitin Borwankar wrote: > Justin Hartman wrote: > >> Hi guys >> >> I have my nutch system working pretty reasonably

Searching via http & statistical data

2006-12-29 Thread Justin Hartman
[2] http://localhost:9080/search.jsp?lang=en&query=apache [3] http://wiki.apache.org/nutch/NutchTutorial [4] http://lucene.apache.org/nutch/tutorial8.html [5] http://wiki.apache.org/nutch/FAQ#head-0c5dd359a76f9ac5ed54f9d81d79130e4c9c3302 -- Regards Justin Hartman PGP Key ID: 102CC123

Re: DmozParser Question

2006-12-28 Thread Justin Hartman
Hi Alan Just added the regex as suggested and running a fetch now. All is working brilliantly. Thanks for the help! Justin On 12/29/06, Justin Hartman <[EMAIL PROTECTED]> wrote: On 12/29/06, Alan Tanaman <[EMAIL PROTECTED]> wrote: > Hope that does the trick (haven't actua

Re: DmozParser Question

2006-12-28 Thread Justin Hartman
On 12/29/06, Alan Tanaman <[EMAIL PROTECTED]> wrote: Hope that does the trick (haven't actually tested it though...) Thanks Alan. I will implement it tomorrow and test it out to see if all is ok. I'll let you know how it all went. Regards Justin Justin, Normally, you can include the hyphen

Re: DmozParser Question

2006-12-28 Thread Justin Hartman
into our new file, which is "co-uk-urls" and ready to be injected into the Nutch DB. Lazy man's solution right here. Enjoy! - Original Message From: Justin Hartman <[EMAIL PROTECTED]> To: nutch-user@lucene.apache.org Sent: Thursday, December 28, 2006 5:08:30 AM Subject:

DmozParser Question

2006-12-28 Thread Justin Hartman
filter the Dmoz file to only include certain TLDs, such as .co.uk, in the dmoz/url file? I noticed that DmozParser supports both boolean and pattern; however, I'm not really sure how to implement it. Any help appreciated. -- Regards Justin Hartman PGP Key ID: 102CC123
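
For anyone landing on this thread: the question above can also be handled outside Nutch by filtering a plain URL list before injecting it. The following is only a rough Python sketch under that assumption. It does not use DmozParser or any Nutch API, and the function names and sample URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def host_matches_suffix(url, suffix=".co.uk"):
    """Return True if the URL's host name ends with the given suffix.

    Only the host is checked, so a path that merely contains
    ".co.uk" does not count as a match.
    """
    host = urlsplit(url).hostname or ""
    return host.lower().endswith(suffix)


def filter_urls(lines, suffix=".co.uk"):
    """Keep the non-empty lines whose host ends with the suffix.

    `lines` is any iterable of strings, e.g. an open text file
    with one URL per line (an assumed input format).
    """
    kept = []
    for line in lines:
        url = line.strip()
        if url and host_matches_suffix(url, suffix):
            kept.append(url)
    return kept


if __name__ == "__main__":
    # Hypothetical sample input; in practice this would be read from a file.
    sample = [
        "http://www.example.co.uk/about\n",
        "http://example.com/page.co.uk\n",
        "\n",
        "http://shop.example.co.uk:8080/items\n",
    ]
    for url in filter_urls(sample):
        print(url)
```

The filtered list could then be written to a file, one URL per line, and injected in the usual way. Checking the parsed host rather than the whole string avoids matching URLs that only mention the suffix in their path.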