Re: graphical user interface v0.2 for nutch

2009-09-30 Thread Mario Schroeder
There is a nutch developer in my neighborhood. Yes sir. So let's stay in touch. Mario 2009/9/24, Marko Bauhardt : > Hi list. > We have pushed the second nutch gui release, version 0.2. > > You can download the binary or the sources at > http://github.com/101tec/nutch/downloads > Two main features

Re: R: Using Nutch for only retriving HTML

2009-09-30 Thread Andrzej Bialecki
BELLINI ADAM wrote: me again, I forgot to tell you the easiest way... Once the crawl is finished you can dump the whole db (it contains all the links to your html pages) into a text file: ./bin/nutch readdb crawl_folder/crawldb/ -dump DBtextFile and you can perform the wget on this db and archi

RE: R: Using Nutch for only retriving HTML

2009-09-30 Thread BELLINI ADAM
me again, I forgot to tell you the easiest way... Once the crawl is finished you can dump the whole db (it contains all the links to your html pages) into a text file: ./bin/nutch readdb crawl_folder/crawldb/ -dump DBtextFile and you can perform the wget on this db and archive the files > Fro
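
The dump-then-wget idea above could be sketched roughly as follows. This is only an illustration, not part of Nutch: it assumes each record in the text dump starts with a line of the form "<url><TAB>Version: <n>", and the helper names urls_from_dump and html_urls are hypothetical.

```python
import re

# Assumption: a record in the readdb text dump begins with
# "<url>\tVersion: <n>"; the remaining lines of the record
# (Status, Fetch time, ...) carry no URL and are skipped.
RECORD_START = re.compile(r"^(https?://\S+)\s+Version:\s*\d+")


def urls_from_dump(lines):
    """Yield the URL of each record found in an iterable of dump lines."""
    for line in lines:
        match = RECORD_START.match(line)
        if match:
            yield match.group(1)


def html_urls(lines):
    """Return only the URLs whose path ends in .html or .htm."""
    return [
        url
        for url in urls_from_dump(lines)
        if url.lower().split("?", 1)[0].endswith((".html", ".htm"))
    ]
```

Writing the result of html_urls to a file, one URL per line, would give a list that wget can read with its -i option.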

RE: R: Using Nutch for only retriving HTML

2009-09-30 Thread BELLINI ADAM
hi, maybe you can run a crawl (don't forget to filter the pages so that only html or htm files are kept; you do that in conf/crawl-urlfilter.txt). After that, go to the hadoop.log file and grep for the sentence 'fetcher.Fetcher - fetching http' to get all the fetched urls. Don't forget to sort the f
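
A minimal sketch of the grep step described above, written in Python so the URLs can be collected in one pass. It assumes only that each fetch is logged on one line containing the sentence 'fetcher.Fetcher - fetching ' followed by the URL; the function name fetched_urls is hypothetical, and the de-duplication is an added assumption.

```python
import re

# Matches the log sentence quoted in the message and captures the URL after it.
FETCH_LINE = re.compile(r"fetcher\.Fetcher - fetching (http\S+)")


def fetched_urls(log_lines):
    """Return the sorted, de-duplicated URLs found in Fetcher log lines."""
    found = set()
    for line in log_lines:
        match = FETCH_LINE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)
```

Called as fetched_urls(open("hadoop.log")), this would return the list of fetched URLs; the path of hadoop.log depends on the installation.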

Re: R: Using Nutch for only retriving HTML

2009-09-30 Thread O. Olson
Thanks Magnús and Susam for your responses and for pointing me in the right direction. I think I will spend time over the next few weeks trying out Nutch. I only needed the HTML – I don’t care if it is in the Database or in separate files. Thanks guys, O.O. --- Wed 30/9/09, Magnús Skúlaso

RE: Multilanguage support in Nutch 1.0

2009-09-30 Thread BELLINI ADAM
hi, do you have some 'lang' metadata on the pages? Because the plugin first tries to get the language from the metadata, as you can see in the java source of the plugin, LanguageIndexingFilter.java: // check if LANGUAGE found, possibly put there by HTMLLanguageParser String lang = parse.getData().g

Re: graphical user interface v0.2 for nutch

2009-09-30 Thread David Jashi
That's 1.0. Thanks a lot. I'll give it a try. Respectfully, David Jashi On Wed, Sep 30, 2009 at 18:37, Marko Bauhardt wrote: > > >> Sorry for my bad English, I'll rephrase: > > :) No Problem. > >> >> >> Can I add this GUI to existing Nutch installation? I've made some >> modifications to mine,

Re: Specify at least one source--a file or resource collection error

2009-09-30 Thread Jaime Martín
I've solved this problem by using ant 1.6.5 instead of 1.7. On 29 September 2009 12:18, Jaime Martín wrote: > Hi again: > I just want to be able to build nutch in eclipse. What version do you use? > Is the last official release 1.0 not advisable? Is any plugin or reliable svn > version required? >

Re: graphical user interface v0.2 for nutch

2009-09-30 Thread Marko Bauhardt
> Sorry for my bad English, I'll rephrase: :) No Problem. > Can I add this GUI to existing Nutch installation? I've made some modifications to mine, so starting from scratch would be quite time-consuming. Ah, OK, I understand. Hm. The gui is forked from the release-1.0 tag. what for nutch ver

Re: graphical user interface v0.2 for nutch

2009-09-30 Thread David Jashi
Thanks. Sorry for my bad English, I'll rephrase: Can I add this GUI to an existing Nutch installation? I've made some modifications to mine, so starting from scratch would be quite time-consuming. Respectfully, David Jashi On Wed, Sep 30, 2009 at 18:19, Marko Bauhardt wrote: > Hi David. > sorry

Re: graphical user interface v0.2 for nutch

2009-09-30 Thread Marko Bauhardt
Hi David. Sorry, I don't understand your question. You can find documentation about the nutch gui here: http://wiki.github.com/101tec/nutch Marko On Sep 30, 2009, at 4:02 PM, David Jashi wrote: > Any documentation on how to add this GUI to an existing Nutch instance? > Respectfully, David Jashi

Re: graphical user interface v0.2 for nutch

2009-09-30 Thread David Jashi
Any documentation on how to add this GUI to an existing Nutch instance? Respectfully, David Jashi 2009/9/30 Bartosz Gadzimski : > Hello, > > First - great job, it looks and works very nice. > > I have a question about urlfilters. Is this possible to get regex-urlfilter > per instance (different fo

Re: graphical user interface v0.2 for nutch

2009-09-30 Thread Marko Bauhardt
On Sep 30, 2009, at 3:47 PM, Bartosz Gadzimski wrote: > Hello, Hi Bartosz. > First - great job, it looks and works very nice. :) Thanks! > I have a question about urlfilters. Is this possible to get regex-urlfilter per instance (different for each instance)? Good idea. I think you cou

Re: graphical user interface v0.2 for nutch

2009-09-30 Thread Bartosz Gadzimski
Hello, First - great job, it looks and works very nice. I have a question about urlfilters. Is it possible to have a regex-urlfilter per instance (different for each instance)? Also, what is the nutch-gui/conf/regex-urlfilter.txt file for? Feature request - option to merge segments or maybe remo

Re: Multilanguage support in Nutch 1.0

2009-09-30 Thread David Jashi
On Wed, Sep 30, 2009 at 01:12, BELLINI ADAM wrote: > > hi > > try to activate the language-identifier plugin > you must add it in the nutch-site.xml file in the   > plugin.includes section. Ooops. It IS activated. 2009-09-29 16:39:15,671 INFO plugin.PluginRepository - Language Identification Pa

Re: Multilanguage support in Nutch 1.0

2009-09-30 Thread David Jashi
On Wed, Sep 30, 2009 at 01:12, BELLINI ADAM wrote: > > hi > > try to activate the language-identifier plugin > you must add it in the nutch-site.xml file in the > plugin.includes section. Shame on me! Thanks a lot. > > it's something like this > > > > > plugin.includes > protocol-httpclien

Re: R: Using Nutch for only retriving HTML

2009-09-30 Thread Magnús Skúlason
Actually, it's quite easy to modify the parse-html filter to do this, that is, to save the HTML to a file or to some database; you could then configure it to skip all unnecessary plugins. I think it depends a lot on the other requirements you have whether using nutch for this task is the right way to