[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-21 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361041 ] Jerome Charron commented on NUTCH-139: OK, Chris and I will implement MetadataNames in this way. Just a few comments: I plan to move the MetadataNames to a class rath

Re: nutch-0.8-dev *mapred.input.subdir* problem ?

2005-12-21 Thread Stefan Groschupf
Lukas, the input folders are normally set by the tools, so you cannot change that. However, if you use a Unix box, check that the user that runs Nutch has read and write access to all the folders defined in nutch-site/default.xml. (I guess that can be the problem; Nutch uses e.g. /tmp

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-21 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361043 ] Andrzej Bialecki commented on NUTCH-139: Regarding the move to a class with public static fields: I don't have any problem with that. Regarding the Levenshtein dista

Re: nightly build

2005-12-21 Thread Stefan Groschupf
Try to check out the latest sources from the Subversion server. There will be no new nightly builds until the new Western year. Stefan On 20.12.2005 at 21:35, tigger wrote: Hi All, the nightly build is not working: bin/nutch admin db -create Exception in thread "main" java.lang.NoClassDe

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-21 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361045 ] Jerome Charron commented on NUTCH-139: Andrzej, do you read my mind? Yes, of course, that's the way I want to do it: first check for the most common cases (lower case

Re: nutch-0.8-dev *mapred.input.subdir* problem ?

2005-12-21 Thread Lukas Vlcek
Stefan, Nutch created folders in /tmp, so I think it should be able to create files there as well. I also tried changing all /tmp* entries in the conf file to my home folder, with the same result (i.e., folders were created and several files were dumped there, but it yielded the same exception). Are you able t

Re: nutch-0.8-dev *mapred.input.subdir* problem ?

2005-12-21 Thread Stefan Groschupf
Yes, I'm able to run it with no problem, but I'm using the step-by-step commands, not the crawl (allinOne) command. Can you try "ant test"? Do all tests pass? On 21.12.2005 at 12:52, Lukas Vlcek wrote: Stefan, Nutch created folders in /tmp, so I think it should be able to create files there as w

IndexSorter optimizer

2005-12-21 Thread Andrzej Bialecki
Hi, I'm happy to report that further tests performed on a larger index seem to show that the overall impact of the IndexSorter is definitely positive: performance improvements are significant, and the overall quality of results seems at least comparable, if not actually better. The reason wh

Crawling a nutch index with Lucene

2005-12-21 Thread Oliver Hummel
Hi, I'm rather new to nutch, but is there something wrong with the idea of creating an index with nutch (using the intranet search from the nutch tutorial) and searching this index with Lucene? I.e. doing something like this: import org.apache.lucene.search.IndexSearcher; import org.apache.lucene

Re: Crawling a nutch index with Lucene

2005-12-21 Thread Daniel Naber
On Wednesday 21 December 2005 17:13, Oliver Hummel wrote: > java.lang.ArrayIndexOutOfBoundsException: -1 That's the error you get when you open a Lucene 1.9 index with Lucene 1.4. But I don't know if that's also the case here. Regards Daniel -- http://www.danielnaber.de

Re: Crawling a nutch index with Lucene

2005-12-21 Thread Oliver Hummel
Yep, that's it. Nutch has Lucene 1.9 in its lib. Many thanks! Oliver Daniel Naber wrote: > On Wednesday 21 December 2005 17:13, Oliver Hummel wrote: > >> java.lang.ArrayIndexOutOfBoundsException: -1 > > That's the error you get when you open a Lucene 1.9 index with Lucene 1.4. > But I

Re: [VOTE] Commiter access for Stefan Groschupf

2005-12-21 Thread ogjunk-nutch
I'm late, but better late than never: +1 (I thought Stefan was already a committer, actually). Stefan: will you be putting some of those media-style Nutch tutorials in Nutch's own Wiki? Otis - Original Message From: Andrzej Bialecki <[EMAIL PROTECTED]> To: nutch-dev@lucene.apache.org

Re: nutch-0.8-dev *mapred.input.subdir* problem ?

2005-12-21 Thread Paul Baclace
You can ignore mapred.input.subdir; I find it is an unneeded option. Now that the mapred branch is merged to be the trunk, there is a need to clarify the documentation, since a change was made to have the input be specified as a directory and then all files in that directory are considered inp

Re: IndexSorter optimizer

2005-12-21 Thread Stefan Groschupf
Hi Andrzej, wow, this is really great news! > Using the optimized index, I reported previously that some of the top-scoring results were missing. As it happens, the missing results were typically the "junk" pages with high tf/idf but low "boost". Since we collect up to N hits, going from higher to

Re: IndexSorter optimizer

2005-12-21 Thread Byron Miller
I've got a 400mill db I can run this against over the next few days. -byron --- Stefan Groschupf <[EMAIL PROTECTED]> wrote: > Hi Andrzej, > > wow, this is really great news! > > Using the optimized index, I reported previously > that some of the > > top-scoring results were missing. As it happens, >

Re: Static initializers

2005-12-21 Thread Stefan Groschupf
Andrzej, well, I'm not done digging into the problem yet, but I want to ask some more questions. BTW, I counted 195 places that use NutchConf.get(), so this will be a bigger patch. :) As I mentioned, I would love to go the inversion-of-control way, so not using NutchConf in the constructor but

Re: IndexSorter optimizer

2005-12-21 Thread American Jeff Bowden
Andrzej Bialecki wrote: Hi, I'm happy to report that further tests performed on a larger index seem to show that the overall impact of the IndexSorter is definitely positive: performance improvements are significant, and the overall quality of results seems at least comparable, if not actual

[jira] Created: (NUTCH-147) nutch map reduce does not work in windows map reduce runs in a loop

2005-12-21 Thread raghavendra prabhu (JIRA)
nutch map reduce does not work in windows map reduce runs in a loop. Key: NUTCH-147 URL: http://issues.apache.org/jira/browse/NUTCH-147 Project: Nutch Type: Bug Components: indexer Versions: 0

Re: IndexSorter optimizer

2005-12-21 Thread Andrzej Bialecki
American Jeff Bowden wrote: Andrzej Bialecki wrote: Hi, I'm happy to report that further tests performed on a larger index seem to show that the overall impact of the IndexSorter is definitely positive: performance improvements are significant, and the overall quality of results seems at l