There is a nutch developer in my neighborhood. Yes sir.
So let's stay in touch.
Mario
2009/9/24, Marko Bauhardt :
> Hi list.
> we have pushed the second nutch gui release version 0.2.
>
> You can download the binary or the sources on
> http://github.com/101tec/nutch/downloads
> Two main features
BELLINI ADAM wrote:
me again,
i forgot to tell u the easiest way...
once the crawl is finished you can dump the whole db (it contains all the links
to your html pages) in a text file..
./bin/nutch readdb crawl_folder/crawldb/ -dump DBtextFile
and you can perform the wget on this db and archi
me again,
i forgot to tell u the easiest way...
once the crawl is finished you can dump the whole db (it contains all the links
to your html pages) in a text file..
./bin/nutch readdb crawl_folder/crawldb/ -dump DBtextFile
and you can perform the wget on this db and archive the files
> Fro
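A rough sketch of the dump-then-wget idea above, not a tested recipe. It assumes the dump is plain text in which every record begins with the page URL at the start of a line, followed by whitespace and other fields; check that against your own dump first. The function name `urls_from_dump` is made up for illustration.

```python
def urls_from_dump(lines):
    """Collect page URLs from the lines of a crawldb text dump.

    Assumption: each record starts with the URL at the beginning of a
    line; all other lines (status, timestamps, ...) are ignored.
    """
    seen = set()
    urls = []
    for line in lines:
        if line.startswith(("http://", "https://")):
            url = line.split()[0]  # the URL is the first field on the line
            if url not in seen:    # keep first occurrence, preserve order
                seen.add(url)
                urls.append(url)
    return urls
```

Writing the returned URLs one per line to a file (say urls.txt, a hypothetical name) gives a list that wget can read with its -i option.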
hi
maybe you can run a crawl (don't forget to filter the pages to keep only html or
htm files (you will do it in conf/crawl-urlfilter.txt) )
after that you will go to the hadoop.log file and grep the sentence
'fetcher.Fetcher - fetching http' to get all the fetched urls.
don't forget to sort the f
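The grep-and-sort step described above could look roughly like this. It is only a sketch: it assumes each matching log line contains the text 'fetcher.Fetcher - fetching ' immediately followed by the URL, as the quoted sentence suggests, and `fetched_urls` is a made-up helper name. URLs are de-duplicated in case the same one is logged more than once.

```python
MARKER = "fetcher.Fetcher - fetching "

def fetched_urls(log_lines):
    """Return the sorted, de-duplicated URLs found after MARKER in log lines."""
    urls = set()
    for line in log_lines:
        idx = line.find(MARKER)
        if idx == -1:
            continue  # not a fetch line
        rest = line[idx + len(MARKER):].strip()
        if rest.startswith("http"):
            urls.add(rest.split()[0])  # take the URL, drop anything after it
    return sorted(urls)
```

Calling it on an open hadoop.log file object (files iterate line by line) yields the list of fetched URLs.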
Thanks Magnús and Susam for your responses and pointing me in the right
direction. I think I will spend time over the next few weeks trying out
Nutch. I only needed the HTML – I don’t care if it is in the Database or in
separate files.
Thanks guys,
O.O.
--- Wed 30/9/09, Magnús Skúlaso
hi,
do you have some metadata 'lang' on the pages? because the plugin first tries to
get the language from the metadata..
if you look in the java source of the plugin, LanguageIndexingFilter.java
// check if LANGUAGE found, possibly put there by HTMLLanguageParser
String lang = parse.getData().g
That's 1.0
Thanks a lot. I'll give it a try.
Respectfully,
David Jashi
On Wed, Sep 30, 2009 at 18:37, Marko Bauhardt wrote:
>
>
>> Sorry for my bad English, I'll rephrase:
>
> :) No Problem.
>
>>
>>
>> Can I add this GUI to an existing Nutch installation? I've made some
>> modifications to mine,
I've solved this problem using ant 1.6.5 instead of 1.7
On September 29, 2009 at 12:18, Jaime Martín wrote:
> Hi again:
> I just want to be able to build nutch in eclipse. What version do you use?
> Is last official release 1.0 not advisable? any plugin or reliable svn
> version required?
>
Sorry for my bad English, I'll rephrase:
:) No Problem.
Can I add this GUI to an existing Nutch installation? I've made some
modifications to mine, so starting from scratch would be quite
time-consuming.
Ah ok, I understand. Hm. The gui is forked from the release-1.0 tag. what
for nutch ver
Thanks,
Sorry for my bad English, I'll rephrase:
Can I add this GUI to an existing Nutch installation? I've made some
modifications to mine, so starting from scratch would be quite
time-consuming.
Respectfully,
David Jashi
On Wed, Sep 30, 2009 at 18:19, Marko Bauhardt wrote:
> Hi David.
> sorry
Hi David.
sorry, i don't understand your question. you can find documentation about
the nutch gui here
http://wiki.github.com/101tec/nutch
marko
On Sep 30, 2009, at 4:02 PM, David Jashi wrote:
Any documentation on how to add this GUI to an existing Nutch instance?
Respectfully,
David Jashi
Any documentation on how to add this GUI to an existing Nutch instance?
Respectfully,
David Jashi
2009/9/30 Bartosz Gadzimski :
> Hello,
>
> First - great job, it looks and works very nice.
>
> I have a question about urlfilters. Is it possible to get regex-urlfilter
> per instance (different fo
On Sep 30, 2009, at 3:47 PM, Bartosz Gadzimski wrote:
Hello,
Hi Bartosz
First - great job, it looks and works very nice.
:) Thanks!
I have a question about urlfilters. Is it possible to get a
regex-urlfilter per instance (different for each instance)?
good idea. i think you cou
Hello,
First - great job, it looks and works very nice.
I have a question about urlfilters. Is it possible to get a
regex-urlfilter per instance (different for each instance)?
Also, what is the nutch-gui/conf/regex-urlfilter.txt file for?
Feature request - option to merge segments or maybe remo
On Wed, Sep 30, 2009 at 01:12, BELLINI ADAM wrote:
>
> hi
>
> try to activate the language-identifier plugin
> you must add it in the nutch-site.xml file in the
> plugin.includes section.
Ooops. It IS activated.
2009-09-29 16:39:15,671 INFO plugin.PluginRepository -
Language Identification Pa
On Wed, Sep 30, 2009 at 01:12, BELLINI ADAM wrote:
>
> hi
>
> try to activate the language-identifier plugin
> you must add it in the nutch-site.xml file in the
> plugin.includes section.
Shame on me! Thanks a lot.
>
> it's something like that
>
>
>
>
> plugin.includes
> protocol-httpclien
Actually it's quite easy to modify the parse-html filter to do this.
That is, saving the HTML to a file or to some database; you could then
configure it to skip all unnecessary plugins. I think it depends a lot on
the other requirements you have whether using nutch for this task is the
right way to
17 matches