Jack Tang wrote:
It was odd that whenever I entered a command, the NameNode would throw an exception:
051206 003714 Server connection on port 9000 from 127.0.0.1: starting
051206 003715 Server connection on port 9000 from 127.0.0.1 caught:
java.net.SocketException: Connection reset
java.net.SocketExc
When you run multiple commands within Nutch, it seems to process the
pending tasks in the order that they were added to the queue. In some
cases this means you may be 50% through many jobs (map complete but not
reduce) while processing maps for yet more jobs.
I think Nutch should prioritize a pendi
Hey folks,
We're looking at launching a search engine at the beginning of the new
year that will eventually grow into a multi-billion-page index.
Three questions:
First, and most important for now, does anyone have any useful numbers
for what the hardware requirements are to run such an
[ http://issues.apache.org/jira/browse/NUTCH-133?page=comments#action_12359770 ]
Jerome Charron commented on NUTCH-133:
--
Doug,
Oh, yes, I understand what you mean. Yes, that really makes sense.
I will commit a patch to Content for this (and remove all
Doug Cutting wrote:
Implementing something like this for Lucene would not be too difficult.
The index would need to be re-sorted by document boost: documents would
be re-numbered so that highly-boosted documents had low document
numbers.
In particular, one could:
1. Create an array of int[ma
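The renumbering idea above (highly-boosted documents get low document numbers) can be illustrated with a small sketch. This is not Lucene's API and not the exact procedure proposed in the message: it assumes, hypothetically, that the per-document boosts are already available in a plain float array indexed by the current document number, and it only computes the old-to-new document number mapping. Rewriting the index itself is not shown.

```java
import java.util.Arrays;

public class BoostRenumber {
    /**
     * Hypothetical helper: returns oldToNew, where oldToNew[oldDoc] is the
     * new document number. Higher boost means a lower new number.
     */
    public static int[] renumberByBoost(float[] boosts) {
        int maxDoc = boosts.length;

        // Start with document numbers 0..maxDoc-1 in their current order.
        Integer[] byBoost = new Integer[maxDoc];
        for (int i = 0; i < maxDoc; i++) {
            byBoost[i] = i;
        }

        // Sort by descending boost. Arrays.sort on objects is stable,
        // so documents with equal boosts keep their original relative order.
        Arrays.sort(byBoost, (a, b) -> Float.compare(boosts[b], boosts[a]));

        // Position in the sorted order becomes the new document number.
        int[] oldToNew = new int[maxDoc];
        for (int newDoc = 0; newDoc < maxDoc; newDoc++) {
            oldToNew[byBoost[newDoc]] = newDoc;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        float[] boosts = {0.5f, 2.0f, 1.0f, 2.0f};
        // Prints [3, 0, 2, 1]: docs 1 and 3 (boost 2.0) move to the front.
        System.out.println(Arrays.toString(renumberByBoost(boosts)));
    }
}
```

Because the sort is stable, ties are broken by the original document number, so the renumbering is deterministic for a given index.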
Doug Cutting wrote:
Andrzej Bialecki wrote:
Hmm... Please define what "adequate" means. :-) IMHO, "adequate" is
when for any query the response time is well below 1 second.
Otherwise the service seems sluggish. Response times over 3 seconds
are normally not acceptable.
It depends. Clearl
[ http://issues.apache.org/jira/browse/NUTCH-133?page=comments#action_12359753 ]
Doug Cutting commented on NUTCH-133:
Stefan,
The primary reason to keep class and method names the same is to simplify
evaluation of your patch. A good patch should
Andrzej Bialecki wrote:
Hmm... Please define what "adequate" means. :-) IMHO, "adequate" is when
for any query the response time is well below 1 second. Otherwise the
service seems sluggish. Response times over 3 seconds are normally not
acceptable.
It depends. Clearly an average response ti
Piotr Kosiorowski wrote:
Hi,
Some time ago I started thinking about implementing a special kind of Lucene
Query optimized for Nutch (if I remember correctly, I would have to write my
own Scorer and probably a few other classes). I assumed that with a
specialized query I would be able to avoid accessing
[ http://issues.apache.org/jira/browse/NUTCH-133?page=comments#action_12359729 ]
Jerome Charron commented on NUTCH-133:
--
Stefan:
Taking a closer look at the ParserFactory patch:
1. You can use the MimeType.clean(String) static method to clean the
cont
Hi,
Some time ago I started thinking about implementing a special kind of Lucene
Query optimized for Nutch (if I remember correctly, I would have to write my
own Scorer and probably a few other classes). I assumed that with a
specialized query I would be able to avoid accessing some of the Lucene index
structu
[ http://issues.apache.org/jira/browse/NUTCH-133?page=all ]
Stefan Groschupf closed NUTCH-133:
--
Resolution: Won't Fix
We will split the problems described here into a set of bugs to fix things step
by step.
> ParserFactory does not work as expect
[ http://issues.apache.org/jira/browse/NUTCH-133?page=comments#action_12359725 ]
Stefan Groschupf commented on NUTCH-133:
Doug,
OK, I will split things into separate patches and open a set of new bugs.
Jerome:
If you take a careful look at my pat
Dear all,
I'm currently using the Nutch plug-in "clustering-carrot2" and would like to
ask for some help. When I build the search result clusters, only search
results that occur twice or more are grouped into a cluster. At the
same time, if some results (keywords) occur only once, i
[ http://issues.apache.org/jira/browse/NUTCH-133?page=comments#action_12359717 ]
Jerome Charron commented on NUTCH-133:
--
I think Doug's proposal is the right way to solve this content-type issue.
This solution just misses the "guess mime-type from file ext
(Moving the discussion to nutch-dev, please drop the cc: when responding)
Doug Cutting wrote:
Andrzej Bialecki wrote:
It's nice to have these couple percent... however, it doesn't solve
the main problem; I need 50 or more percent increase... :-) and I
suspect this can be achieved only by som