[Nutch-dev] Re: mapred crawling exception - Job failed!

2006-01-04 Thread Andrzej Bialecki
Lukas Vlcek wrote: I gave it another try tonight and I still have trouble. This is the very end of my log (full version is attached) and you can see another nasty exception: Do you use the Fetcher in parsing or non-parsing mode, i.e. do you run a ParseSegment as a separate step? -- B

[Nutch-dev] Re: mapred crawling exception - Job failed!

2006-01-04 Thread Lukas Vlcek
I gave it another try tonight and I still have trouble. This is the very end of my log (full version is attached) and you can see another nasty exception: ... 060104 213644 map 100% 060104 213645 Optimizing index. java.lang.NullPointerException: value cannot be null at org.apache.lucen

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread ilango gurusamy
Stefan, I would like to help you with your project on the Nutch-based search appliance daemon. The reason is that I want to gain experience and learn. I started playing around with Nutch. I wrote a scraper in Perl and now I am trying to run one of the sample plugins too. ilango Stefa

[Nutch-dev] Re: IndexSorter optimizer

2006-01-04 Thread Byron Miller
Great reading and great ideas. In such a system where you have, say, 3 segment partitions, is it possible to build a mapreduce job to efficiently fetch, retrieve and update these segments? Use a map job to process a segment for deletion and somehow process that segment to create a new fetchlist from

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread Stefan Groschupf
Another use case for eliminating the static uses of NutchConf is to simplify the construction of a configuration gui. It would be nice to have a web-based interface which permits one to configure parameters and then have it run the system. This should be able to run multiple Nutch instanc

[Nutch-dev] injection infinite loop

2006-01-04 Thread Andy Liu
If you inject the crawldb with a url file that doesn't end with a line feed, an infinite loop is entered. Anybody else encounter this problem? 060104 160950 Running job: job_7uku5w 060104 160952 map 0% 060104 160954 map 50% 060104 160957 map -2631% 060104 160959 map -259756% 060104 161002 ma
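
For comparison, here is a minimal sketch of a line reader that terminates whether or not the last line ends with a line feed. It is not the Nutch injector's code, and the message above does not say where the loop actually occurs; `UrlLines` and `read` are hypothetical names used only for illustration. It relies on `BufferedReader.readLine()`, which returns the final unterminated line and then `null` at end of input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Nutch: reads one URL per line. */
final class UrlLines {
    static List<String> read(String content) {
        List<String> urls = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(content))) {
            String line;
            // readLine() returns null at end of input, so the loop always ends,
            // even when the last line has no trailing line feed.
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    urls.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return urls;
    }
}
```

Calling `UrlLines.read` on the same two URLs with and without a final `\n` yields the same two-element list.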

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread David Wallace
Hi Stefan, I think these are fine things to be doing. Just two points: (1) Why not just always pass the NutchConf to the constructor of any class that needs it? Instead of distinguishing between the case of whether the class will use 1 or 2 configuration parameters; or more than that. Just for
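
Point (1) can be sketched as follows, assuming a simple instance-based configuration object. `Conf` is a hypothetical stand-in for NutchConf, `FetcherSketch` is not a real Nutch class, and the property name and default value are used only for illustration. The idea shown is that each component receives its configuration through the constructor, so two instances in the same JVM can run with different settings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for NutchConf: a plain instance, no static accessor. */
final class Conf {
    private final Map<String, String> props = new HashMap<>();

    Conf set(String key, String value) {
        props.put(key, value);
        return this;
    }

    int getInt(String key, int defaultValue) {
        String value = props.get(key);
        return value == null ? defaultValue : Integer.parseInt(value);
    }
}

/** Hypothetical component that takes its configuration in the constructor. */
final class FetcherSketch {
    private final int threads;

    FetcherSketch(Conf conf) {
        // Illustrative property name and default; read once at construction time.
        this.threads = conf.getInt("fetcher.threads.fetch", 10);
    }

    int threads() {
        return threads;
    }
}
```

With this shape, `new FetcherSketch(new Conf().set("fetcher.threads.fetch", "50"))` and `new FetcherSketch(new Conf())` can coexist with different thread counts, which a single static configuration cannot express.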

[Nutch-dev] [jira] Closed: (NUTCH-142) NutchConf should use the thread context classloader

2006-01-04 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-142?page=all ] Piotr Kosiorowski closed NUTCH-142: --- Fix Version: 0.7.2-dev 0.8-dev Resolution: Fixed > NutchConf should use the thread context classloader > ---

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread Doug Cutting
Andrzej Bialecki wrote: Example: what happens now if you try to run more than one fetcher at the same time, where the fetcher parameters differ (or a set of activated plugins differs)? You can't - the local tasks on each tasktracker will use whatever local config is there. That's true when ma

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread Thomas Jaeger
Hi, Stefan Groschupf wrote: [...] > Any comments, improvement suggestions, more use-cases? I completely agree with you. I have two more ideas: 1) create NutchConf as interface (not class) 2) make it work as plugin 1) If NutchConf is an interface, the NutchConf implementation can be written with

[Nutch-dev] [jira] Commented: (NUTCH-39) pagination in search result

2006-01-04 Thread Neal Whitley (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-39?page=comments#action_12361786 ] Neal Whitley commented on NUTCH-39: --- Sorry I'm new to Java but finally figured out what the problem was and resolved it: (Declaration tags) <%! private static int suppose

[Nutch-dev] Re: svn commit: r365850 - in /lucene/nutch/trunk/src/plugin/protocol-httpclient: ./ lib/ src/java/org/apache/nutch/protocol/httpclient/

2006-01-04 Thread Andrzej Bialecki
Piotr Kosiorowski wrote: Andrzej, Do you think it would be a good idea to commit it in 0.7 branch for 0.7.2 release? I personally prefer to use released libraries instead of RC if possible. It does not require a lot of changes and you have already tested it with existing code... Piotr I d

[Nutch-dev] Re: svn commit: r365850 - in /lucene/nutch/trunk/src/plugin/protocol-httpclient: ./ lib/ src/java/org/apache/nutch/protocol/httpclient/

2006-01-04 Thread Piotr Kosiorowski
Andrzej, Do you think it would be a good idea to commit it in 0.7 branch for 0.7.2 release? I personally prefer to use released libraries instead of RC if possible. It does not require a lot of changes and you have already tested it with existing code... Piotr [EMAIL PROTECTED] wrote: Author

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread Piotr Kosiorowski
+1 in general. In fact I like the approach presented by Stefan to pass only required parameters to objects that have a small number of configurable params instead of NutchConf - it makes it obvious which parameters are required for such basic objects to run and as they are usually building blocks

[Nutch-dev] [jira] Commented: (NUTCH-164) Locale (language) choice by first session has global effect to all sessions

2006-01-04 Thread KuroSaka TeruHiko (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-164?page=comments#action_12361782 ] KuroSaka TeruHiko commented on NUTCH-164: - Actually, the current language selection scheme needs an overhaul. The locale for the message bundle is determined only by th

[Nutch-dev] [jira] Commented: (NUTCH-39) pagination in search result

2006-01-04 Thread Neal Whitley (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-39?page=comments#action_12361781 ] Neal Whitley commented on NUTCH-39: --- When I try to add Jack's code to search.jsp I'm getting an Exception report: org.apache.jasper.JasperException: Unable to compile class fo

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread Andrzej Bialecki
Jérôme Charron wrote: Excuse me in advance, I probably missed something, but what are the use cases for having many NutchConf instances with different values? Running many different tasks in parallel, each using different config, inside the same JVM. Ok, I understand this Andrzej,

[Nutch-dev] [jira] Created: (NUTCH-164) Locale (language) choice by first session has global effect to all sessions

2006-01-04 Thread KuroSaka TeruHiko (JIRA)
Locale (language) choice by first session has global effect to all sessions --- Key: NUTCH-164 URL: http://issues.apache.org/jira/browse/NUTCH-164 Project: Nutch Type: Bug Components: web gui

[Nutch-dev] Re: IndexSorter optimizer

2006-01-04 Thread Andrzej Bialecki
Doug Cutting wrote: Byron Miller wrote: On optimizing performance, does anyone know if google is exporting its entire dataset as an index or only somehow indexing the topN % (since they only show the first 1000 or so results anyway) Both. The highest-scoring pages are kept in separate inde

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread Jérôme Charron
> >Excuse me in advance, I probably missed something, but what are the use > >cases for having many NutchConf instances with different values? > Running many different tasks in parallel, each using different config, > inside the same JVM. Ok, I understand this Andrzej, but it is not really what I

[Nutch-dev] RE: no static NutchConf

2006-01-04 Thread Steve Betts
If you are going to be able to reconfigure a nutch component at runtime, you need to remove any configuration from the constructor and have a method that allows you to get/set the configuration for the component. The problem with keeping the entire configuration in a single component is trying to d

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread Andrzej Bialecki
Jérôme Charron wrote: Excuse me in advance, I probably missed something, but what are the use cases for having many NutchConf instances with different values? Running many different tasks in parallel, each using different config, inside the same JVM. -- Best regards, Andrzej Bialecki

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread Jérôme Charron
> My idea is to be able to use low-level things outside of Nutch as well. > It may be a philosophical question: in the case of the map file writer > you pass a complete hashmap with a bunch of properties to the object, > but the object only reads one int from this hashmap. I personally > don't like to use

[Nutch-dev] Re: IndexSorter optimizer

2006-01-04 Thread Doug Cutting
Byron Miller wrote: On optimizing performance, does anyone know if google is exporting its entire dataset as an index or only somehow indexing the topN % (since they only show the first 1000 or so results anyway) Both. The highest-scoring pages are kept in separate indexes that are searched f

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread Stefan Groschupf
I don't fully agree with this. In most such cases, you already have a NutchConf instance in the method or class context, so it makes sense to use it in the constructor. You could add these constructors with all parameters iterated, but I'd expect that the constructors using NutchConf would

[Nutch-dev] Re: no static NutchConf

2006-01-04 Thread Andrzej Bialecki
Stefan Groschupf wrote: Hi, to move forward in the direction of having a Nutch GUI, I would love to start removing the static access of NutchConf. Based on experience, I would first love to get a kind of general agreement and a 'go' before wasting too much time on an unaccepted solution.

[Nutch-dev] no static NutchConf

2006-01-04 Thread Stefan Groschupf
Hi, to move forward in the direction of having a Nutch GUI, I would love to start removing the static access of NutchConf. Based on experience, I would first love to get a kind of general agreement and a 'go' before wasting too much time on an unaccepted solution. I suggest: + removing Nut

[Nutch-dev] [jira] Created: (NUTCH-163) LogFormatter design

2006-01-04 Thread Daniel Feinstein (JIRA)
LogFormatter design --- Key: NUTCH-163 URL: http://issues.apache.org/jira/browse/NUTCH-163 Project: Nutch Type: Improvement Environment: All platforms Reporter: Daniel Feinstein In Nutch project LogFormatter has duplicated functionality: 1) Log

[Nutch-dev] Re: mapred crawling exception - Job failed!

2006-01-04 Thread Lukas Vlcek
Thanks guys! I really didn't have the latest copy... L. On 1/4/06, Byron Miller <[EMAIL PROTECTED]> wrote: > Fixed in the copy i run as i've been able to get my > 100k pages indexed without getting that error. > > -byron > > --- Andrzej Bialecki <[EMAIL PROTECTED]> wrote: > > > Lukas Vlcek wrote: >

[Nutch-dev] Re: mapred crawling exception - Job failed!

2006-01-04 Thread Byron Miller
Fixed in the copy i run as i've been able to get my 100k pages indexed without getting that error. -byron --- Andrzej Bialecki <[EMAIL PROTECTED]> wrote: > Lukas Vlcek wrote: > > >Hi, > > > >I am trying to use the latest nutch-trunk version > but I am facing > >unexpected "Job failed!" exceptio

[Nutch-dev] Re: mapred crawling exception - Job failed!

2006-01-04 Thread Andrzej Bialecki
Lukas Vlcek wrote: Hmmm... If I am looking correctly into my local SVN copy then I see I last updated yesterday - thus I have revision 365850 (Update of HTTPClient to v3.0). So this should be already fixed... :-( Andrzej, since you probably did the fix, is there anything special I should check

[Nutch-dev] Re: mapred crawling exception - Job failed!

2006-01-04 Thread Lukas Vlcek
Hmmm... If I am looking correctly into my local SVN copy then I see I last updated yesterday - thus I have revision 365850 (Update of HTTPClient to v3.0). So this should be already fixed... :-( Andrzej, since you probably did the fix, is there anything special I should check to be sure I have the

[Nutch-dev] Re: [bug] Re: NegativeArraySizeException in search server

2006-01-04 Thread Gal Nitzan
Yes, correct. For a second I thought it was fixed :) On Wed, 2006-01-04 at 10:57 +0100, Marko Bauhardt wrote: > Hi, > I got the same Exception. The cause of this exception is the default > value of searcher.max.hits property in the nutch-default.xml. The > default value is Integer.MAX_VALUE.

[Nutch-dev] [bug] Re: NegativeArraySizeException in search server

2006-01-04 Thread Marko Bauhardt
Hi, I got the same Exception. The cause of this exception is the default value of the searcher.max.hits property in nutch-default.xml. The default value is Integer.MAX_VALUE. But the class org.apache.lucene.util.PriorityQueue increments this maximum value. The next number after Integer.MAX_VALUE
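
The overflow described above can be reproduced in isolation. This is a minimal sketch, not Lucene's actual PriorityQueue code: `MaxHitsOverflow` and its methods are hypothetical, and the only assumption taken from the message is that the queue sizes its backing array as the maximum plus one. In Java, `Integer.MAX_VALUE + 1` wraps around to `Integer.MIN_VALUE`, and allocating an array with a negative length throws `NegativeArraySizeException`.

```java
/** Hypothetical illustration of the overflow; not Lucene's implementation. */
final class MaxHitsOverflow {
    /** Allocates maxSize + 1 slots, as a heap with an unused slot 0 might. */
    static Object[] allocateHeap(int maxSize) {
        // Wraps to Integer.MIN_VALUE when maxSize == Integer.MAX_VALUE.
        int heapSize = maxSize + 1;
        return new Object[heapSize];
    }

    /** Returns true if allocating for maxSize fails with a negative array size. */
    static boolean overflows(int maxSize) {
        try {
            allocateHeap(maxSize);
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }
}
```

Here `MaxHitsOverflow.overflows(Integer.MAX_VALUE)` returns `true`, while a bounded value such as `overflows(1000)` returns `false`, which is why lowering the configured maximum avoids the exception.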

[Nutch-dev] Re: mapred crawling exception - Job failed!

2006-01-04 Thread Gal Nitzan
Yes, it was fixed. Just update your code from trunk. On Wed, 2006-01-04 at 08:51 +0100, Andrzej Bialecki wrote: > Lukas Vlcek wrote: > > >Hi, > > > >I am trying to use the latest nutch-trunk version but I am facing > >unexpected "Job failed!" exception. It seems that all crawling work > >has been