Build failed in Hudson: Nutch-trunk #1209

2010-07-21 Thread Apache Hudson Server
See -- [...truncated 3878 lines...] deps-test: deploy: copy-generated-lib: test: [echo] Testing plugin: urlnormalizer-basic [junit] Running org.apache.nutch.net.urlnormalizer.basic.T

Re: [Nutchbase] Multi-value ParseResult missing

2010-07-21 Thread Mattmann, Chris A (388J)
Hey Andrzej, We're having the same sorts of discussions in Tika-ville right now. Check out this page on the Tika wiki: http://wiki.apache.org/tika/MetadataDiscussion Comments, thoughts, welcome. Depending on what comes out of Tika, we may be able to leverage upon it... Cheers, Chris On 7/21

[Nutchbase] Multi-value ParseResult missing

2010-07-21 Thread Andrzej Bialecki
Hi, I noticed that nutchbase doesn't use the multi-valued ParseResult, instead all parse plugins return a simple Parse. As a consequence, it's not possible to return multiple values from parsing a single WebPage, something that parsers for compound documents absolutely require (archives, rss,

[jira] Commented: (NUTCH-858) No longer able to set per-field boosts on lucene documents

2010-07-21 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890876#action_12890876 ] Edward Drapkin commented on NUTCH-858: Ah, okay. Is there an ETA on a 1.2 release or

[jira] Commented: (NUTCH-858) No longer able to set per-field boosts on lucene documents

2010-07-21 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890873#action_12890873 ] Andrzej Bialecki commented on NUTCH-858: Unfortunately no. The patch was included

[jira] Commented: (NUTCH-858) No longer able to set per-field boosts on lucene documents

2010-07-21 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890864#action_12890864 ] Andrzej Bialecki commented on NUTCH-858: This was fixed in trunk/ - there will be

[jira] Commented: (NUTCH-858) No longer able to set per-field boosts on lucene documents

2010-07-21 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890865#action_12890865 ] Edward Drapkin commented on NUTCH-858: Is there a patch against 1.1 that exists anywhe

[jira] Updated: (NUTCH-858) No longer able to set per-field boosts on lucene documents

2010-07-21 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-858: Assignee: Andrzej Bialecki Fix Version/s: 1.2 > No longer able to set per-field bo

[jira] Created: (NUTCH-858) No longer able to set per-field boosts on lucene documents

2010-07-21 Thread Edward Drapkin (JIRA)
No longer able to set per-field boosts on lucene documents -- Key: NUTCH-858 URL: https://issues.apache.org/jira/browse/NUTCH-858 Project: Nutch Issue Type: Bug Components: in

Re: Nutchbase merge strategy

2010-07-21 Thread Andrzej Bialecki
On 2010-07-21 21:12, Mattmann, Chris A (388J) wrote: Hey Andrzej, +1 to all of the above - see below. So if 1-4 make sense, let's do 1, 2 and 3 today or tomorrow -- 4 can happen over the next few weeks. WDYT? This is a serious move - let's wait a bit, say until Monday, to give chance to ot

Re: Nutchbase merge strategy

2010-07-21 Thread Mattmann, Chris A (388J)
Hey Andrzej, > +1 to all of the above - see below. > >> >> So if 1-4 make sense, let's do 1, 2 and 3 today or tomorrow -- 4 can happen >> over the next few weeks. WDYT? > > This is a serious move - let's wait a bit, say until Monday, to give > chance to others to comment. Agreed. Let's wait un

Re: Nutchbase merge strategy

2010-07-21 Thread Andrzej Bialecki
On 2010-07-21 20:36, Mattmann, Chris A (388J) wrote: Hmmm, interesting. OK, my one comment would be: why wait? trunk is traditionally not guaranteed to be stable and it seems like you guys have nutchbase *sorta* working enough that the time is ripe to just switch now. And then you won't further

Re: Nutchbase merge strategy

2010-07-21 Thread Mattmann, Chris A (388J)
Hmmm, interesting. OK, my one comment would be: why wait? trunk is traditionally not guaranteed to be stable and it seems like you guys have nutchbase *sorta* working enough that the time is ripe to just switch now. And then you won't further confuse folks like me that are happy to check out the n

Nutchbase merge strategy

2010-07-21 Thread Andrzej Bialecki
Hi all, I'd like to discuss what is the best way forward to merging the nutchbase code with trunk. First some important facts: * nutchbase is almost totally API incompatible with Nutch 1.x. While the main ideas remain the same, and most of the tools remain as well, their implementation is v

Build failed in Hudson: Nutch-trunk #1208

2010-07-21 Thread Apache Hudson Server
See -- [...truncated 3881 lines...] deps-test: deploy: copy-generated-lib: test: [echo] Testing plugin: urlnormalizer-basic [junit] Running org.apache.nutch.net.urlnormalizer.basic.T