obStream.py script? In the top
of the logging file there is a section called formatters like this:
[formatters]
keys=simple
Dennis Kubes
Justin Hartman wrote:
> Hi Dennis
>
> This is a great contribution and I personally thank you for making it
> available to the community.
>
> I
>> you a copy.
>>
>> We are currently working on a more in-depth framework for automating
>> these types of job streams in python but that is not complete yet.
>>
>> Andrzej, do you think this is something we should post to the wiki?
>
> Sure, if it's ok for you to release it I'm sure many people would find
> it useful.
>
--
Regards
Justin Hartman
PGP Key ID: 102CC123
mission java.security.AllPermission;
Once done restart all as described above.
On some systems the first hack will suffice; however, some setups
require the AllPermission directive.
Hope this helps.
--
Regards
Justin Hartman
PGP Key ID: 102CC123
and/or index once it has been fetched or will the whole
index need to be re-created?
--
Regards
Justin Hartman
PGP Key ID: 102CC123
s
a daemon in the background and I can worry about other issues.
Thank you in advance
--
Regards
Justin Hartman
PGP Key ID: 102CC123
nd how would you use it? I've been very
interested in this plugin but it's not altogether documented that well
(I don't think).
--
Regards
Justin Hartman
PGP Key ID: 102CC123
apred.JobClient.runJob(JobClient.java:399)
> at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
> at org.apache.nutch.crawl.Crawl.main(Crawl.java:134)
>
--
Regards
Justin Hartman
PGP Key ID: 102CC123
I'm sorry but I have to ask this question, stupid as it may seem.
Why does the Nutch home page [1] have Google Search integrated into
the site when surely it should be using Nutch? What better
demonstration of the Nutch system than the Nutch home page?
--
Regards
Justin Hartman
PGP K
't
delete the index as people will have nothing to search for while the
index is being re-built.
Is there another way of doing this or am I missing the plot here big time?
--
Regards
Justin Hartman
PGP Key ID: 102CC123
On 1/2/07, Sean Dean <[EMAIL PROTECTED]> wrote:
There actually isn't much of a reason to generate "huge" multi-million page
fetch lists when you can create lots of smaller ones and merge them together. This allows
for more of a ladder-style approach, and in some cases reduces the risk of errors
work.
--
Regards
Justin Hartman
PGP Key ID: 102CC123
h-0.8.1.war file is
located in this directory.
Not an ideal situation this
Regards
Justin
On 12/29/06, Nitin Borwankar <[EMAIL PROTECTED]> wrote:
Nitin Borwankar wrote:
> Justin Hartman wrote:
>
>> Hi guys
>>
>> I have my nutch system working pretty reasonably
[2] http://localhost:9080/search.jsp?lang=en&query=apache
[3] http://wiki.apache.org/nutch/NutchTutorial
[4] http://lucene.apache.org/nutch/tutorial8.html
[5] http://wiki.apache.org/nutch/FAQ#head-0c5dd359a76f9ac5ed54f9d81d79130e4c9c3302
--
Regards
Justin Hartman
PGP Key ID: 102CC123
Hi Alan
Just added the regex as suggested and running a fetch now. All is
working brilliantly. Thanks for the help!
Justin
On 12/29/06, Justin Hartman <[EMAIL PROTECTED]> wrote:
On 12/29/06, Alan Tanaman <[EMAIL PROTECTED]> wrote:
> Hope that does the trick (haven't actua
On 12/29/06, Alan Tanaman <[EMAIL PROTECTED]> wrote:
Hope that does the trick (haven't actually tested it though...)
Thanks Alan. I will implement it tomorrow and test it out to see if
all is ok. I'll let you know how it all went.
Regards
Justin
Justin,
Normally, you can include the hyphen
into our new file, which is "co-uk-urls" and ready to be injected into
the Nutch DB.
Lazy man's solution right here. Enjoy!
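For anyone who would rather script the filtering step, here is a minimal Python sketch of the same idea: read a plain list of URLs, one per line, and keep only those whose host ends in .co.uk. The function names and the input path are hypothetical (they are not part of Nutch or DmozParser); only the "co-uk-urls" output name comes from this thread.

```python
from urllib.parse import urlsplit


def filter_urls_by_suffix(lines, suffix=".co.uk"):
    """Yield URLs whose host name ends with the given suffix.

    Assumes one absolute URL per line (e.g. http://www.example.co.uk/).
    Blank lines and lines without a parseable host are skipped.
    """
    for line in lines:
        url = line.strip()
        if not url:
            continue
        try:
            host = urlsplit(url).hostname or ""
        except ValueError:
            # Malformed URL (for example a broken IPv6 literal): skip it.
            continue
        if host.endswith(suffix):
            yield url


def write_filtered(src_path, dst_path, suffix=".co.uk"):
    """Copy matching URLs from src_path to dst_path, one per line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for url in filter_urls_by_suffix(src, suffix):
            dst.write(url + "\n")
```

Something like write_filtered("dmoz-urls.txt", "co-uk-urls") would then produce the file to inject, where "dmoz-urls.txt" is a placeholder for wherever your extracted URL list lives. Matching on the parsed host rather than on the whole line avoids picking up URLs that merely contain ".co.uk" somewhere in the path.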
----- Original Message -----
From: Justin Hartman <[EMAIL PROTECTED]>
To: nutch-user@lucene.apache.org
Sent: Thursday, December 28, 2006 5:08:30 AM
Subject:
filter the Dmoz file to include only
certain TLDs, such as .co.uk, in the dmoz/url file?
I noticed that DmozParser supports both boolean and pattern however
I'm not really sure how to implement it.
Any help appreciated.
--
Regards
Justin Hartman
PGP Key ID: 102CC123