[Nutch-dev] [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-16 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360645 ] Doug Cutting commented on NUTCH-139: I'm confused as to why all of the constant names have "X_nutch" in them. I'd expect to see something like that in their string values,

[Nutch-dev] [jira] Commented: (NUTCH-140) Add alias capability in parse-plugins.xml file that allows mimeType->extensionId mapping

2005-12-16 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-140?page=comments#action_12360643 ] Chris A. Mattmann commented on NUTCH-140: - Hey Stefan, Mainly, it would be to make them more human readable. Also, if I go in there and define all the aliases for th

[Nutch-dev] [jira] Updated: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-16 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=all ] Chris A. Mattmann updated NUTCH-139: Attachment: NUTCH-139.Mattmann.patch.txt Okay Folks, Here's the patch file. Phew. Spent the whole day working on this today. Finally got all the unit

[Nutch-dev] RE: "Something is Wrong with Google's Mathematical Model"

2005-12-16 Thread Paul Sutter
The paper claims that he's developed a better algorithm. Has he published that yet? Paul Sutter -Original Message- From: Stefan Groschupf [mailto:[EMAIL PROTECTED] Sent: Friday, December 16, 2005 11:28 AM To: nutch-dev@lucene.apache.org Subject: "Something is Wrong with Google's Mathe

[Nutch-dev] bug in parse-rtf?

2005-12-16 Thread Chris Mattmann
Hi Folks, Anybody been experiencing problems building the parse-rtf plugin? I just noticed while working on NUTCH-139 that there's a line at the end of RTFParser.java in parse-rtf that returns a new ParseImpl, however, the constructor for ParseData uses the old ParseData constructor (pre Andrz

Re: [Nutch-dev] Re: [VOTE] Commiter access for Stefan Groschupf

2005-12-16 Thread Kashif Khadim
+1 Thanks Stefan. --- Florent Gluck <[EMAIL PROTECTED]> wrote: > Totally agree, +1 > > Thanks for the help :) > > --Flo > > Andrzej Bialecki wrote: > > > Hi, > > > > During the past year and more Stefan participated > actively in the > > development, and contributed many high-quality > patc

[Nutch-dev] Re: [VOTE] Commiter access for Stefan Groschupf

2005-12-16 Thread Florent Gluck
Totally agree, +1 Thanks for the help :) --Flo Andrzej Bialecki wrote: > Hi, > > During the past year and more Stefan participated actively in the > development, and contributed many high-quality patches. He's been > spending considerable effort on addressing many issues in JIRA, and > proposin

[Nutch-dev] Re: [VOTE] Commiter access for Stefan Groschupf

2005-12-16 Thread Byron Miller
+1 Thanks for all the hard work! Very much appreciated --- Andrzej Bialecki <[EMAIL PROTECTED]> wrote: > Hi, > > During the past year and more Stefan participated > actively in the > development, and contributed many high-quality > patches. He's been > spending considerable effort on addressing

[Nutch-dev] [VOTE] Commiter access for Stefan Groschupf

2005-12-16 Thread Andrzej Bialecki
Hi, During the past year and more Stefan participated actively in the development, and contributed many high-quality patches. He's been spending considerable effort on addressing many issues in JIRA, and proposing fixes and improvements. Apparently he has too much free time on his hands, and it'

[Nutch-dev] Re: "Something is Wrong with Google’s Mathematical Model"

2005-12-16 Thread Fredrik Andersson
I know (or at least suspect) that Google has a distributed way of computing singular value decompositions for large matrices (i.e., for the term-document matrix). I think the same technique for dimension reduction can be applied to approximate some eigenvalues of sparse matrices (the link matrix, f

[Nutch-dev] [jira] Updated: (NUTCH-3) multi values of header discarded

2005-12-16 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-3?page=all ] Stefan Groschupf updated NUTCH-3: - Attachment: multiValuesPropertyPatch.txt Attached a patch that adds a getProperties method to the ContentProperties class to receive a string array of values fo

[Nutch-dev] "Something is Wrong with Google’s Mathematical Model"

2005-12-16 Thread Stefan Groschupf
Hi, found this link on a news site, maybe some will find this interesting. "An Israeli mathematician, Hillel Tal-Ezer, of the Academic College of Tel Aviv in Yaffo has written a paper on the faults of google's mathematical algorithms for page ranking" http://www2.mta.ac.il/~hillel/data_mining/f

[Nutch-dev] Re: mapred merge to trunk

2005-12-16 Thread Doug Cutting
Doug Cutting wrote: Barring objections, I will do this tomorrow morning, Pacific time. The mapred branch has now been merged to trunk. Use the following command to switch your mapred working copies to trunk: svn switch https://svn.apache.org/repos/asf/lucene/nutch/trunk Doug --

RE: [Nutch-dev] distributed seach

2005-12-16 Thread Ledio Ago
Thank you Stefan for the reply. I did have separate physical indexes on separate machines with about 900K URLs in each of them. I ran Tomcat on one of those boxes and tested the load. I got the same numbers as I got when I didn't use the distribute

[Nutch-dev] [jira] Commented: (NUTCH-39) pagination in search result

2005-12-16 Thread Dima (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-39?page=comments#action_12360573 ] Dima commented on NUTCH-39: --- I have the same problem as byron. Does anyone know how to fix it? > pagination in search result > --- > > Key: NUTCH-39

Re: [Nutch-dev] distributed seach

2005-12-16 Thread Stefan Groschupf
Hi Ledio, the current Nutch release is 0.7, or you can also use the 0.8 branch code. Also, you are using the old mailing lists; I suggest you use the Apache Nutch user mailing list: http://lucene.apache.org/nutch/mailing_lists.html To answer your question, Nutch does forward the query to all search servers and c

[Nutch-dev] [jira] Commented: (NUTCH-143) Improper error numbers returned on exit

2005-12-16 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-143?page=comments#action_12360571 ] Stefan Groschupf commented on NUTCH-143: Would be great in case you can provide a patch. > Improper error numbers returned on exit > ---