Lukas Vlcek wrote:
I gave it another try tonight and I am still having trouble.
This is the very end of my log (full version is attached) and you can
see another nasty exception:
Do you use the Fetcher in parsing or non-parsing mode, i.e. do you run a
ParseSegment as a separate step?
--
B
I gave it another try tonight and I am still having trouble.
This is the very end of my log (full version is attached) and you can
see another nasty exception:
...
060104 213644 map 100%
060104 213645 Optimizing index.
java.lang.NullPointerException: value cannot be null
at org.apache.lucen
Stefan
I would like to help you with your project on the Nutch-based search
appliance daemon. The reason is that I want to gain experience and learn. I
started playing around with Nutch. I wrote a scraper in Perl and now I am
trying to run one of the sample plugins too
ilango
Stefa
Great reading and great ideas.
In such a system where you have say 3 segment
partitions is it possible to build a mapreduce job to
efficiently fetch, retrieve and update these segments?
Use a map job to process a segment for deletion and
somehow process that segment to create a new fetchlist
from
Another use case for eliminating the static uses of NutchConf is to
simplify the construction of a configuration gui. It would be nice
to have a web-based interface which permits one to configure
parameters and then have it run the system. This should be able to
run multiple Nutch instanc
If you inject the crawldb with a url file that doesn't end with a line feed,
an infinite loop is entered. Anybody else encounter this problem?
060104 160950 Running job: job_7uku5w
060104 160952 map 0%
060104 160954 map 50%
060104 160957 map -2631%
060104 160959 map -259756%
060104 161002 ma
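The report above does not say what causes the loop, so the following is only a generic sketch of reading a seed list so that a missing final line feed cannot matter. It is not Nutch's injector code; the class and method names are made up for illustration. `BufferedReader.readLine()` returns `null` at end of input whether or not the last line is terminated, so the loop always ends.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch.
final class SeedLinesSketch {
    // Returns the non-blank lines of the given seed-file content.
    static List<String> readUrls(String content) {
        List<String> urls = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(content))) {
            String line;
            // readLine() yields null at end of input, with or without a final '\n'.
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    urls.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Same two URLs, once without and once with a trailing line feed.
        System.out.println(readUrls("http://a.example/\nhttp://b.example/"));
        System.out.println(readUrls("http://a.example/\nhttp://b.example/\n"));
    }
}
```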
Hi Stefan,
I think these are fine things to be doing. Just two points:
(1) Why not just always pass the NutchConf to the constructor of any
class that needs it? Instead of distinguishing between the case of
whether the class will use 1 or 2 configuration parameters; or more than
that. Just for
[ http://issues.apache.org/jira/browse/NUTCH-142?page=all ]
Piotr Kosiorowski closed NUTCH-142:
---
Fix Version: 0.7.2-dev
0.8-dev
Resolution: Fixed
> NutchConf should use the thread context classloader
> ---
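For readers unfamiliar with the idea in the issue title, here is a minimal sketch of looking up a resource through the thread context classloader, falling back to the defining class's loader when none is set. This is an illustration only, not the change that was committed for NUTCH-142; the class name and the resource name are hypothetical.

```java
import java.net.URL;

// Hypothetical helper, not the NUTCH-142 patch.
final class ContextLoaderSketch {
    // Prefer the thread context classloader; fall back to this class's loader.
    static ClassLoader pickLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        return cl != null ? cl : ContextLoaderSketch.class.getClassLoader();
    }

    // Returns null when the resource is not visible to the chosen loader.
    static URL find(String name) {
        return pickLoader().getResource(name);
    }

    public static void main(String[] args) {
        // "sketch-conf.xml" is a made-up resource name.
        System.out.println(find("sketch-conf.xml"));
    }
}
```

The context loader matters in containers such as servlet engines, where the class that reads the configuration and the configuration files themselves may be loaded by different classloaders.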
Andrzej Bialecki wrote:
Example: what happens now if you try to run more than one fetcher at the
same time, where the fetcher parameters differ (or a set of activated
plugins differs)? You can't - the local tasks on each tasktracker will
use whatever local config is there.
That's true when ma
Hi,
Stefan Groschupf wrote:
[...]
> Any comments, improvement suggestions, more use-cases?
I completely agree with you.
I have two more ideas:
1) create NutchConf as interface (not class)
2) make it work as plugin
1) If NutchConf is an interface, the NutchConf implementation can be
written with
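A rough sketch of the first idea, assuming a small read-only contract and one in-memory implementation. The names `Conf` and `MapConf` and the property key are invented here; the real NutchConf API may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only; none of these types exist in Nutch.
final class ConfInterfaceSketch {
    /** Hypothetical read-only configuration contract. */
    interface Conf {
        String get(String key);

        // Shared convenience logic lives in the interface, so every
        // implementation only has to supply get().
        default int getInt(String key, int fallback) {
            String raw = get(key);
            return raw == null ? fallback : Integer.parseInt(raw.trim());
        }
    }

    /** One possible backing store: a plain in-memory map. */
    static final class MapConf implements Conf {
        private final Map<String, String> values = new HashMap<>();

        MapConf set(String key, String value) {
            values.put(key, value);
            return this;
        }

        @Override
        public String get(String key) {
            return values.get(key);
        }
    }

    public static void main(String[] args) {
        Conf conf = new MapConf().set("sketch.max.hits", "1000");
        System.out.println(conf.getInt("sketch.max.hits", 10));
        System.out.println(conf.getInt("sketch.missing", 10));
    }
}
```

Callers that depend only on `Conf` would not need to know whether the values come from XML files, a database, or a GUI.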
[
http://issues.apache.org/jira/browse/NUTCH-39?page=comments#action_12361786 ]
Neal Whitley commented on NUTCH-39:
---
Sorry I'm new to Java but finally figured out what the problem was and resolved
it: (Declaration tags)
<%!
private static int suppose
Piotr Kosiorowski wrote:
Andrzej,
Do you think it would be a good idea to commit it in 0.7 branch for
0.7.2 release? I personally prefer to use released libraries instead
of RC if possible. It does not require a lot of changes and you have
already tested it with existing code...
Piotr
I d
Andrzej,
Do you think it would be a good idea to commit it in 0.7 branch for
0.7.2 release? I personally prefer to use released libraries instead of
RC if possible. It does not require a lot of changes and you have
already tested it with existing code...
Piotr
[EMAIL PROTECTED] wrote:
Author
+1 in general
In fact I like the approach presented by Stefan to pass only required
parameters to objects that have small number of configurable params
instead of NutchConf - it makes it obvious which parameters are required
for such basic objects to run and as they are usually building blocks
[
http://issues.apache.org/jira/browse/NUTCH-164?page=comments#action_12361782 ]
KuroSaka TeruHiko commented on NUTCH-164:
-
Actually, the current language selection scheme needs an overhaul.
The locale for the message bundle is determined only by th
[
http://issues.apache.org/jira/browse/NUTCH-39?page=comments#action_12361781 ]
Neal Whitley commented on NUTCH-39:
---
When I try to add Jack's code to search.jsp I get an exception report:
org.apache.jasper.JasperException: Unable to compile class fo
Jérôme Charron wrote:
Excuse me in advance, I probably missed something, but what are the use
cases for having many NutchConf instances with different values?
Running many different tasks in parallel, each using different config,
inside the same JVM.
Ok, I understand this Andrzej,
Locale (language) choice by first session has global effect to all sessions
---
Key: NUTCH-164
URL: http://issues.apache.org/jira/browse/NUTCH-164
Project: Nutch
Type: Bug
Components: web gui
Doug Cutting wrote:
Byron Miller wrote:
On optimizing performance, does anyone know if Google
is exporting its entire dataset as an index or only
somehow indexing the topN % (since they only show the
first 1000 or so results anyway)
Both. The highest-scoring pages are kept in separate inde
> >Excuse me in advance, I probably missed something, but what are the use
> >cases for having many NutchConf instances with different values?
> Running many different tasks in parallel, each using different config,
> inside the same JVM.
Ok, I understand this Andrzej, but it is not really what I
If you are going to be able to reconfigure a nutch component at runtime, you
need to remove any configuration from the constructor and have a method that
allows you to get/set the configuration for the component. The problem with
keeping the entire configuration in a single component is trying to
d
Jérôme Charron wrote:
Excuse me in advance, I probably missed something, but what are the use
cases for having many NutchConf instances with different values?
Running many different tasks in parallel, each using different config,
inside the same JVM.
--
Best regards,
Andrzej Bialecki
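The use case can be pictured with a small sketch: two tasks run in the same JVM, each holding its own configuration instance instead of reading a JVM-wide static. `java.util.Properties` stands in for NutchConf here, and the property key and class name are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only; Properties is a stand-in for a per-task NutchConf.
final class PerTaskConfSketch {
    // The task reads settings from the instance it was handed.
    static Callable<String> task(Properties conf) {
        return () -> "threads=" + conf.getProperty("sketch.fetcher.threads");
    }

    static Properties confWith(String threads) {
        Properties p = new Properties();
        p.setProperty("sketch.fetcher.threads", threads);
        return p;
    }

    // Runs two differently configured tasks in parallel in one JVM.
    static List<String> runBoth() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<String> a = pool.submit(task(confWith("10")));
            Future<String> b = pool.submit(task(confWith("50")));
            return Arrays.asList(a.get(), b.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runBoth()); // prints [threads=10, threads=50]
    }
}
```

With a single static configuration, both tasks would see the same value; passing an instance keeps them independent.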
> My idea is to be able to use low-level things outside of Nutch as well.
> It may be a philosophical question: in the case of the map file writer
> you pass a complete hashmap with a bunch of properties to the object,
> but the object only reads one int from this hashmap. I personally
> don't like to use
Byron Miller wrote:
On optimizing performance, does anyone know if Google
is exporting its entire dataset as an index or only
somehow indexing the topN % (since they only show the
first 1000 or so results anyway)
Both. The highest-scoring pages are kept in separate indexes that are
searched f
I don't fully agree with this. In most such cases, you already have
a NutchConf instance in the method or class context, so it makes
sense to use it in the constructor. You could add these constructors
with all parameters iterated, but I'd expect that the constructors
using NutchConf would
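The two constructor styles under discussion can live side by side. The sketch below is only an illustration under assumed names: `IntervalWriter` is a hypothetical low-level component, `java.util.Properties` stands in for NutchConf, and the key `sketch.index.interval` is invented.

```java
import java.util.Properties;

// Illustration only; none of these names come from Nutch.
final class ConstructorStylesSketch {
    /** Hypothetical low-level component that needs a single int setting. */
    static final class IntervalWriter {
        private final int indexInterval;

        // Explicit-parameter constructor: usable without any config object.
        IntervalWriter(int indexInterval) {
            if (indexInterval <= 0) {
                throw new IllegalArgumentException("indexInterval must be positive");
            }
            this.indexInterval = indexInterval;
        }

        // Convenience constructor: reads the one value it needs and delegates.
        IntervalWriter(Properties conf) {
            this(Integer.parseInt(conf.getProperty("sketch.index.interval", "128")));
        }

        int indexInterval() {
            return indexInterval;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("sketch.index.interval", "64");
        System.out.println(new IntervalWriter(conf).indexInterval());
        System.out.println(new IntervalWriter(32).indexInterval());
    }
}
```

The explicit constructor documents exactly what the object needs, while the config-based one saves callers that already hold a configuration from unpacking it themselves.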
Stefan Groschupf wrote:
Hi,
to move forward in the direction of having a nutch gui, I would love
to start removing the static access of NutchConf.
Based on experience, I would first love to get a kind of general
agreement and a 'go' before wasting too much time on an unaccepted
solution.
Hi,
to move forward in the direction of having a nutch gui, I would love
to start removing the static access of NutchConf.
Based on experience, I would first love to get a kind of general
agreement and a 'go' before wasting too much time on an unaccepted
solution.
I suggest:
+ removing Nut
LogFormatter design
---
Key: NUTCH-163
URL: http://issues.apache.org/jira/browse/NUTCH-163
Project: Nutch
Type: Improvement
Environment: All platforms
Reporter: Daniel Feinstein
In the Nutch project, LogFormatter has duplicated functionality:
1) Log
Thanks guys!
I really didn't have the latest copy...
L.
On 1/4/06, Byron Miller <[EMAIL PROTECTED]> wrote:
> Fixed in the copy I run, as I've been able to get my
> 100k pages indexed without getting that error.
>
> -byron
>
> --- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
>
> > Lukas Vlcek wrote:
>
Fixed in the copy I run, as I've been able to get my
100k pages indexed without getting that error.
-byron
--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Lukas Vlcek wrote:
>
> >Hi,
> >
> >I am trying to use the latest nutch-trunk version
> but I am facing
> >unexpected "Job failed!" exceptio
Lukas Vlcek wrote:
Hmmm...
If I am looking correctly into my local SVN copy then I see I last
updated yesterday - thus I have revision 365850 (Update of HTTPClient
to v3.0). So this should be already fixed... :-(
Andrzej, since you probably did the fix, is there anything special I
should check
Hmmm...
If I am looking correctly into my local SVN copy then I see I last
updated yesterday - thus I have revision 365850 (Update of HTTPClient
to v3.0). So this should be already fixed... :-(
Andrzej, since you probably did the fix, is there anything special I
should check to be sure I have the
Yes, correct. For a second I thought it was fixed :)
On Wed, 2006-01-04 at 10:57 +0100, Marko Bauhardt wrote:
> Hi,
> I got the same Exception. The cause of this exception is the default
> value of searcher.max.hits property in the nutch-default.xml. The
> default value is Integer.MAX_VALUE.
Hi,
I got the same Exception. The cause of this exception is the default
value of searcher.max.hits property in the nutch-default.xml. The
default value is Integer.MAX_VALUE. But the class
org.apache.lucene.util.PriorityQueue increments this maximum value.
The next number after Integer.MAX_VALUE
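As background on the arithmetic involved: Java `int` addition wraps silently on overflow. The sketch below shows only that language behaviour plus one possible guard; it is not Lucene's PriorityQueue code, and the class and method names are hypothetical.

```java
// Illustration of int overflow only; not taken from Lucene or Nutch.
final class MaxHitsOverflowSketch {
    // Adds one to a requested size, as code that reserves an extra slot might.
    static int plusOne(int requested) {
        return requested + 1; // wraps silently when requested is Integer.MAX_VALUE
    }

    // Hypothetical guard: clamp so that adding one cannot overflow.
    static int safePlusOne(int requested) {
        return requested == Integer.MAX_VALUE ? Integer.MAX_VALUE : requested + 1;
    }

    public static void main(String[] args) {
        System.out.println(plusOne(Integer.MAX_VALUE) == Integer.MIN_VALUE); // prints true
        System.out.println(safePlusOne(Integer.MAX_VALUE) == Integer.MAX_VALUE); // prints true
    }
}
```

A default below `Integer.MAX_VALUE` in the configuration sidesteps the problem without touching the code that does the increment.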
Yes, it was fixed. Just update your code from trunk.
On Wed, 2006-01-04 at 08:51 +0100, Andrzej Bialecki wrote:
> Lukas Vlcek wrote:
>
> >Hi,
> >
> >I am trying to use the latest nutch-trunk version but I am facing
> >unexpected "Job failed!" exception. It seems that all crawling work
> >has been
36 matches