Re: [jira] Updated: (NUTCH-627) Minimize host address lookup

2008-04-10 Thread Chris Mattmann
(even though svn allows you to commit directly) - >> witness the recent situation with Grant. If you wish I can start a vote, >> and I'm sure it will be positive, and we will have a clean situation >> from the formal POV. Ok? >> > +1 >> +1, as well. Chee

Re: End-Of-Life status for 0.7.x?

2008-01-17 Thread Chris Mattmann
tively support it, whether >> we have enough resources to make any new releases or apply patches that >> sit in JIRA? >> >> My opinion is that we should mark it EOL, and close all JIRA issues that >> are relevant only to 0.7.x, with the status Won't Fix.

Tika 0.1-incubating released

2008-01-07 Thread Chris Mattmann
ssue currently in JIRA. Thanks! Cheers, Chris ______ Chris Mattmann, Ph.D. [EMAIL PROTECTED] Cognizant Development Engineer Early Detection Research Network Project _ Jet Propulsion Laboratory

Re: Student contributions

2008-01-02 Thread Chris Mattmann
ignificant > contribution. Are there any implementation tasks you guys think would > be appropriate for a small group of undergrad, upperclass CS students? > I'm looking for ideas for improving Nutch that they could accomplish > in a few weeks time. > > Thanks, ___

Re: Commit Times for Issues

2007-11-16 Thread Chris Mattmann
love to hear from > others in the community. What I think would be best is to come to a > consensus on this and then have a wiki page describing this and other > processes for committers. > > Dennis Kubes

Re: Tika API

2007-11-06 Thread Chris Mattmann
sure if I should submit a JIRA > issue for this, but I'm happy to do so if anyone else has seen this issue. No problem: let's discuss the JIRA issue once we get an answer to the above questions. Thanks for being more descriptive and looking forward to your response. Cheers, Chris

Re: Tika API

2007-11-06 Thread Chris Mattmann
> I think there may be a bug in the Content.java when it tries to convert > the textual representation of the type to a MimeType. It always returns > null. I'm trying to fix it but I can't find an API for Tika (or even > src). Can someone point me in the right direction?

Re: JIRA, Resolving and Closing Issues

2007-10-18 Thread Chris Mattmann
:58 AM, "Dennis Kubes" <[EMAIL PROTECTED]> wrote: > Quick question about Jira. When we commit, are we supposed to first > resolve and then close the issue. What is the process on this. > > Dennis Kubes

Re: writing a new parse-exe plugin

2007-10-17 Thread Chris Mattmann
eption in EXE parser: "+e.getMessage()); > e.printStackTrace(LogUtil.getWarnStream(LOG)); > } > return new ParseStatus(ParseStatus.FAILED, > "Can't be handled as exe document. " + > e).getEmptyParse(getConf()); > } > > /// i'm not sure

Re: [jira] Closed: (NUTCH-562) Port mime type framework to use Tika mime detection framework

2007-10-10 Thread Chris Mattmann
velopers that were interested in that portion of the code started developing in that arena. I'm not comparing Hadoop to Tika, but certainly there are some similarities here. -Chris

Re: [jira] Closed: (NUTCH-562) Port mime type framework to use Tika mime detection framework

2007-10-09 Thread Chris Mattmann
Folks, Either way is fine with me. I committed the patch for the following reasons: 1. Though the patch sat for around 36 hrs, the JIRA issue has been around nearly 2 weeks, without any comment at all. I used this as a baseline for relative interest in the patch. Though a patch file is ultimate

Re: svn commit: r550669 - in /lucene/nutch/trunk/src: java/org/apache/nutch/util/ plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/ plugin/parse-html/src/java/org/apache/nutch/parse/h

2007-06-25 Thread Chris Mattmann
No problemo! Thanks! Cheers, Chris On 6/25/07 9:45 PM, "Dennis Kubes" <[EMAIL PROTECTED]> wrote: > ooopsgotta remember to do that. Done. > > Dennis > > Chris Mattmann wrote: >> On 6/25/07 8:34 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTE

Re: svn commit: r550669 - in /lucene/nutch/trunk/src: java/org/apache/nutch/util/ plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/ plugin/parse-html/src/java/org/apache/nutch/parse/h

2007-06-25 Thread Chris Mattmann
On 6/25/07 8:34 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > Author: kubes > Date: Mon Jun 25 20:33:59 2007 > New Revision: 550669 > > URL: http://svn.apache.org/viewvc?view=rev&rev=550669 > Log: > NUTCH-497: Fixes problems relating to StackOverflow errors > and extreme nested tags. Adds

Re: [jira] Updated: (NUTCH-497) Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap

2007-06-25 Thread Chris Mattmann
Dennis, +1 On 6/25/07 4:42 PM, "Dennis Kubes" <[EMAIL PROTECTED]> wrote: > If no one has any objections, I will go ahead and commit this. > > Dennis Kubes > > Dennis Kubes (JIRA) wrote: >> [ >> https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugi >> n.system.issu

Re: Build failed in Hudson: Nutch-Nightly #123

2007-06-20 Thread Chris Mattmann
On 6/20/07 8:17 AM, "Doğacan Güney" <[EMAIL PROTECTED]> wrote: > Since you are doing compile-core, no plugins get compiled > (say, urlfilter-prefix), then when you do a ant test in feed > only protocol-file gets compiled. So, no urlfilter-prefix, no problem :). > I have to say that I am certain t

Re: Build failed in Hudson: Nutch-Nightly #123

2007-06-20 Thread Chris Mattmann
On 6/20/07 7:17 AM, "Doğacan Güney" <[EMAIL PROTECTED]> wrote: > It never passes for me (not even when I do it in src/plugin/feed). If you > check the output, parseResult only contains a single entry which is > rsstest.rss. Okay, please tell me I'm not crazy here. I'm on Mac OS X 10.4, Java vers

Re: Build failed in Hudson: Nutch-Nightly #123

2007-06-20 Thread Chris Mattmann
Doğacan, This is strange indeed. I noticed this during my testing of parse-feed, however, thought it was an anomaly. I got this same strange cryptic unit test error message, and then after some frustration figuring it out, I did ant clean, then ant compile-core test, and miraculously the error se

Re: Welcome Doğacan as Nutch committer

2007-06-12 Thread Chris Mattmann
+1 Welcome to the team, Doğacan! Cheers, Chris On 6/12/07 9:43 AM, "Sami Siren" <[EMAIL PROTECTED]> wrote: > Doğacan Güney wrote: >> Hi all, >> I hope that together we will make nutch rock even harder. > > By looking at your earlier efforts there should be no doubt. > > Welcome! >

Committer

2007-05-30 Thread Chris Mattmann
Hi Folks, I'd just like to throw out my +1 for Doğacan Güney's committer status. I've been impressed by several of his contributions and the guy just keeps them coming and coming. I'm not a member of the Lucene PMC, so I don't have official voting rights, however, I would like to express my suppo

Nutch 0.9 officially released!

2007-04-05 Thread Chris Mattmann
Hi Folks, After some hard work from all folks involved, we've managed to push out Apache Nutch, release 0.9. This is the second release of Nutch based entirely on the underlying Hadoop platform. This release includes several critical bug fixes, as well as key speedups described in more detail at

Re: Nutch Release 0.9 - Waiting for release to propagate to mirrors

2007-04-05 Thread Chris Mattmann
list announcing the completion of the release. Thanks! Cheers, Chris On 4/4/07 7:21 PM, "Chris Mattmann" <[EMAIL PROTECTED]> wrote: > Hi Guys, > > I've just moved forward with step 13 in the release process (waiting for > release to propagate to mirro

Nutch Release 0.9 - Waiting for release to propagate to mirrors

2007-04-04 Thread Chris Mattmann
Hi Guys, I've just moved forward with step 13 in the release process (waiting for release to propagate to mirrors). Should I just go ahead and do the other steps (update Nutch site, update Lucene site, Update javadoc, create version in JIRA, etc.)? It seems that I could do these without the relea

Re: [VOTE] Release Apache Nutch 0.9

2007-04-04 Thread Chris Mattmann
thing wrapped up tonight! :-) Cheers, Chris On 4/4/07 8:04 AM, "Sami Siren" <[EMAIL PROTECTED]> wrote: > Chris Mattmann wrote: >> Hi Folks, >> >> I have posted a candidate for the Apache Nutch 0.9 release at >> >> http://people.apache.org/~mat

Re: [VOTE] Release Apache Nutch 0.9

2007-04-02 Thread Chris Mattmann
Folks, As an FYI, here is a link to the log of the steps that I followed to get to this point in the release: http://people.apache.org/~mattmann/NUTCH_0.9_release_log_v2.doc Cheers, Chris On 4/2/07 10:52 PM, "Chris Mattmann" <[EMAIL PROTECTED]> wrote: > Hi Folks,

[VOTE] Release Apache Nutch 0.9

2007-04-02 Thread Chris Mattmann
Hi Folks, I have posted a candidate for the Apache Nutch 0.9 release at http://people.apache.org/~mattmann/nutch_0.9/rc2/ See the included CHANGES-0.9.txt file for details on release contents and latest changes. The release was made from the 0.9-dev trunk, including the recent patch applied

Re: svn commit: r524932 - in /lucene/nutch/trunk/src/java/org/apache/nutch/segment: SegmentMerger.java SegmentReader.java

2007-04-02 Thread Chris Mattmann
issue. Sorry about > not getting to it sooner. > > Dennis Kubes > > Chris Mattmann wrote: >> Hi Dennis, >> >> Thanks for taking care of this. :-) Could you update CHANGES.txt as well? >> Once you take care of that, in about 2 hrs (when I get home), I

Re: svn commit: r524932 - in /lucene/nutch/trunk/src/java/org/apache/nutch/segment: SegmentMerger.java SegmentReader.java

2007-04-02 Thread Chris Mattmann
Hi Dennis, Thanks for taking care of this. :-) Could you update CHANGES.txt as well? Once you take care of that, in about 2 hrs (when I get home), I'll begin the release process again. Thanks! Cheers, Chris On 4/2/07 2:40 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > Author: kubes

Re: [VOTE] Release Apache Nutch 0.9

2007-04-02 Thread Chris Mattmann
Hi Guys, >> I think we're discussing about the same thing(improving the process), I >> just don't think 0.9 is out yet :) >> >> >> But to wrap it up for me: >> >> +1 for creating 0.9 branch after fixing the bug (and removing the tag), >> creating new rc >> and starting a vote. > > > +1. +1.

Re: Next release - 0.10.0 or 1.0.0 ?

2007-03-28 Thread Chris Mattmann
My +1 for 1.0.0. I already changed it to 0.10.0, but this can be easily reverted, and was probably something that I should have brought to the attention of the dev list before I did that (sorry about that). In any case, I think 1.0.0 makes a lot of sense, politically, and software wise. Nutch is pr

Re: [VOTE] Release Apache Nutch 0.9

2007-03-28 Thread Chris Mattmann
Well, it's just going to add more work for me, but in the end, it's probably something that needs to be in there. I could go either way on this though, as in, if we don't commit it, 0.9.1 shouldn't be far off. Here's my +1 for going ahead and committing it... On 3/28/07 10:21 AM, "Dennis Kubes" <

Re: [VOTE] Release Apache Nutch 0.9

2007-03-28 Thread Chris Mattmann
Folks, Discussing this with Andrzej, and reading his email below, I tend to agree more with this procedure below. I would like to call for a vote to change the existing as-documented procedure (on the wiki) to branch first, do testing in branch (apply patches where needed), and then when the bra

Re: [VOTE] Release Apache Nutch 0.9

2007-03-27 Thread Chris Mattmann
Hey Sami, > > Well the sum itself is obviously the same :) The point in this is to use > same > conventions in Lucene family, not strictly required, but still IMO it just > looks better. Okey dok -- I will run the md5sum command, and generate a .md5 for the nutch release that matches that. I wi

Re: [VOTE] Release Apache Nutch 0.9

2007-03-27 Thread Chris Mattmann
st/lucene/nutch/, using the same convention as the others. To get the header, I did a gpg --list-keys. Thanks! Cheers, Chris On 3/27/07 8:14 AM, "Chris Mattmann" <[EMAIL PROTECTED]> wrote: > Hi Sami, > >> A very limited acid test shows that I can do crawling and sear

Re: [VOTE] Release Apache Nutch 0.9

2007-03-27 Thread Chris Mattmann
Hi Sami, > A very limited acid test shows that I can do crawling and searching > through web app so that part is ok. Great! Similar tests of my own showed the same. > > About signatures: I can't find your public gpg key anywhere (to verify > the signature), not in KEYS file nor in keyservers I

[VOTE] Release Apache Nutch 0.9

2007-03-26 Thread Chris Mattmann
Hi Folks, I have posted a candidate for the Apache Nutch 0.9 release at http://people.apache.org/~mattmann/nutch_0.9/ See the included CHANGES-0.9.txt file for details on release contents and latest changes. The release was made from the 0.9-dev trunk. Please vote on releasing these packages a

Re: Nutch 0 .9 release progress update

2007-03-26 Thread Chris Mattmann
!) Thanks! Cheers, Chris On 3/26/07 10:22 PM, "Sami Siren" <[EMAIL PROTECTED]> wrote: > Chris Mattmann wrote: >> Hi Folks, >> >> Just to update everyone on progress. I've made it to Step 13 (waiting for >> release to appear on mirrors) in the R

Nutch 0 .9 release progress update

2007-03-26 Thread Chris Mattmann
Hi Folks, Just to update everyone on progress. I've made it to Step 13 (waiting for release to appear on mirrors) in the Release Process: http://wiki.apache.org/nutch/Release_HOWTO You can view a full log of the fun that I've been having by going to: http://people.apache.org/~mattmann/NUT

Re: Initiation of 0.9 release process

2007-03-26 Thread Chris Mattmann
he process goes smoothly, I can probably get it done on my own. Thanks for the offer: I'll be sure to call on you if I get stuck. :-) Cheers, Chris On 3/26/07 10:06 AM, "Dennis Kubes" <[EMAIL PROTECTED]> wrote: > Let me know if I can help in any way? > > Dennis

Initiation of 0.9 release process

2007-03-26 Thread Chris Mattmann
Hi Folks, As your friendly neighborhood 0.9 release manager, I just wanted to give you all a heads up that I'd like to begin the release process today. If I hear no objections by 00:00:00 UTC time, I will begin the release process then. I will notify the list as soon as I'm done. Thanks! Chee

FW: [jira] Created: (HADOOP-1147) remove all @author tags from source

2007-03-22 Thread Chris Mattmann
Hey Doug, Do you think we should do this in Nutch too? I'm in favor of doing this -- what does everyone else feel? Thanks! Cheers, Chris __ Chris A. Mattmann [EMAIL PROTECTED] Staff Member Modeling and Data Management Systems Section (387) Data Ma

Re: svn commit: r516759 - /lucene/nutch/trunk/CHANGES.txt

2007-03-10 Thread Chris Mattmann
Dennis, No probs. Thanks, a lot! Cheers, Chris On 3/10/07 5:35 PM, "Dennis Kubes" <[EMAIL PROTECTED]> wrote: > > > Chris Mattmann wrote: >> Hi Dennis, >> >> Not to nit-pick, but the place where you inserted your change isn't at the >

Re: svn commit: r516759 - /lucene/nutch/trunk/CHANGES.txt

2007-03-10 Thread Chris Mattmann
Hi Dennis, Not to nit-pick, but the place where you inserted your change isn't at the end (where they typically should be placed). You inserted in the middle of the file, throwing off the numbering (there are now 2 sets of 18, and 19 in the unreleased changes section). Could you please append you

Re: [jira] Commented: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2007-03-08 Thread Chris Mattmann
Cheers, Chris On 3/8/07 1:55 PM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote: > Chris Mattmann wrote: >> Hi Andrzej, >> >> Yep, +1. I also want to make a small update, where instead of creating a >> new NutchConf object, to just pass it throug

Re: [jira] Commented: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2007-03-08 Thread Chris Mattmann
Hi Andrzej, Yep, +1. I also want to make a small update, where instead of creating a new NutchConf object, to just pass it through (maybe via the protocol layer?). Does this make sense? Cheers, Chris On 3/8/07 1:47 PM, "Andrzej Bialecki (JIRA)" <[EMAIL PROTECTED]> wrote: > > [ > h

0.9 release

2007-03-07 Thread Chris Mattmann
Hi Folks, As suggested by Sami, I'm moving this discussion to the nutch-dev list. Seems like I am the guy that is going to do the Nutch 0.9 release :-) However, it seems also that there are some issues that need to be sorted out first. I'd like to follow up to Andrzej's email about loose ends be

FW: Nutch release process help

2007-03-06 Thread Chris Mattmann
ssions on the nutch list in the future. Cheers, Chris -- Forwarded Message From: Chris Mattmann <[EMAIL PROTECTED]> Date: Mon, 05 Mar 2007 21:25:30 -0800 To: Piotr Kosiorowski <[EMAIL PROTECTED]> Cc: Chris Mattmann <[EMAIL PROTECTED]>, Andrzej Bialecki <[EMAIL PROTEC

Re: Issues pending before 0.9 release

2007-03-05 Thread Chris Mattmann
Hi Guys, > Blocker > > * NUTCH-400 (Update & add missing license headers) - I believe this is > fixed and should be closed +1, thanks to Sami for closing it. > > * NUTCH-353 (pages that serverside forwards will be refetched every > time) - this was partially fixed in NUTCH-273, but a m

Re: Welcome Dennis Kubes as Nutch committer

2007-02-28 Thread Chris Mattmann
Dennis, I take my coffee black: with a single creamer ;) Okay, okay, sorry: I thought we were talking about *real* hazing ;) Cheers, Chris On 2/28/07 12:31 PM, "Dennis Kubes" <[EMAIL PROTECTED]> wrote: > Hi All, > > Thank you Andrzej for your kind words. I am looking forward to working >

Re: log guards

2007-02-28 Thread Chris Mattmann
rdinate our efforts? > > Dennis Kubes > > Jérôme Charron wrote: >> Hi Chris, >> >> The JIRA issue is the 309 : https://issues.apache.org/jira/browse/NUTCH-309 >> Thanks for your help. >> >> Jérôme >> >> On 2/13/07, Chris Mattman

Re: log guards

2007-02-13 Thread Chris Mattmann
Hi Doug, and Jerome, Ah, yes, the log guard conversation. I remember this from a while back. Hmmm, do you guys know what issue that this recorded as in JIRA? I have some free time recently, so I will be able to add this to my list of Nutch stuff to work on, and would be happy to take the lead on

Re: RSS-fecter and index individul-how can i realize this function

2007-02-08 Thread Chris Mattmann
uired, and contacting the folks who've begun work on this issue. Thanks! Cheers, Chris On 2/7/07 1:31 PM, "Doug Cutting" <[EMAIL PROTECTED]> wrote: > Chris Mattmann wrote: >> Got it. So, the logic behind this is, why bother waiting until the >> followin

Re: RSS-fecter and index individul-how can i realize this function

2007-02-07 Thread Chris Mattmann
07 11:11 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote: > Chris Mattmann wrote: >> Sorry to be so thick-headed, but could someone explain to me in really >> simple language what this change is requesting that is different from the >> current Nutch API? I still don't

Re: RSS-fecter and index individul-how can i realize this function

2007-02-07 Thread Chris Mattmann
Guys, Sorry to be so thick-headed, but could someone explain to me in really simple language what this change is requesting that is different from the current Nutch API? I still don't get it, sorry... Cheers, Chris On 2/7/07 9:58 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote: > Renaud Richa

Re: RSS-fecter and index individul-how can i realize this function

2007-02-06 Thread Chris Mattmann
Hi Doug, > Since the target of the link must still be indexed separately from the > item itself, how much use is all this? If the RSS document is > considered a single page that changes frequently, and item's links are > considered ordinary outlinks, isn't much the same effect achieved? IMHO, ye

Re: RSS-fecter and index individul-how can i realize this function

2007-02-01 Thread Chris Mattmann
in the site. > > IMHO the only thing "missing" in the parse-rss plugin is storing the data in > the CrawlDatum and "parsing" it in the next fetch phase. Maybe adding a new > flag to CrawlDatum, that would flag the URL as "parsable" not "fetchab

Re: RSS-fecter and index individul-how can i realize this function

2007-01-30 Thread Chris Mattmann
.nutch > nutch > > > > > > > > > http://lucene.apache.org/nutch > > > > > > news > > > > > > > kauu On 1/31/07, Chris > Mattmann <[EMAIL PROTECTED]> wrote: > > Hi there, > > I could most > li

Re: RSS-fecter and index individul-how can i realize this function

2007-01-30 Thread Chris Mattmann
el and callback framework for parsing RSS/Atom/Feed XML documents. When you mention asynchronous above, are you talking about the protocol for fetching the different RSS documents? Thanks! Cheers, Chris > > Thanks > > > -Original Message- > From: Chris Mattmann <[

Re: RSS-fecter and index individul-how can i realize this function

2007-01-30 Thread Chris Mattmann
Hi there, I could most likely be of assistance, if you gave me some more information. For instance: I'm wondering if the use case you describe below is already supported by the current RSS parse plugin? The current RSS parser, parse-rss, does in fact index individual items that are pointed to b

Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Chris Mattmann
> It's at least out-of-date and perhaps obsolete. A quick read of > Fetcher.java looks like there might be a case where a "fatal" error is > logged but the fetcher doesn't exit, in FetcherThread#output(). > So this raises an interesting question: People (such as Scott G.) out there -- are you f

Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Chris Mattmann
Hi Doug, So, does this render the patch that I wrote obsolete? Cheers, Chris On 1/25/07 10:08 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote: > Scott Ganyo (JIRA) wrote: >> ... since Hadoop hijacks and reassigns all log formatters (also a bad >> practice!) in the org.apache.hadoop.util.LogF

Re: Reviving Nutch 0.7

2007-01-22 Thread Chris Mattmann
> Before doubling (or after 0.9.0 tripling?) the maintenance/development work > please consider the following: > > One option would be re factoring the code in a way that the parts that are > usable to other projects like protocols?, parsers (this actually was > proposed by > Jukka Zitting some

Re: How to Become a Nutch Developer

2007-01-21 Thread Chris Mattmann
Hi Dennis, On 1/21/07 11:47 AM, "Dennis Kubes" <[EMAIL PROTECTED]> wrote: > All, > > I am working on a "How to Become a Nutch Developer" document for the > wiki and I need some input. > > I need an overview of how the process for JIRA works? If I am a > developer new to Nutch and just startin

Re: Next Nutch release

2007-01-16 Thread Chris Mattmann
Folks, When would you like to make the release? I've been working on NUTCH-185, but got a bit bogged down with other work. If there is interest in having NUTCH-185 included in the release, I could make a push to get out a patch by week's end... As for the rest, my +1 for NUTCH-61 being included

Re: svn commit: r485076 - in /lucene/nutch/trunk/src: java/org/apache/nutch/metadata/SpellCheckedMetadata.java test/org/apache/nutch/metadata/TestSpellCheckedMetadata.java

2006-12-09 Thread Chris Mattmann
org.apache.nutch.metadata that aggregates all the met key fields from HttpHeaders, and it would be the place that the met key fields for FileHeaders, etc. could go into. Let me know what you think, and thanks! Cheers, Chris On 12/9/06 3:53 PM, "Sami Siren" <[EMAIL PROTECTED]> wrote: > C

Re: svn commit: r485076 - in /lucene/nutch/trunk/src: java/org/apache/nutch/metadata/SpellCheckedMetadata.java test/org/apache/nutch/metadata/TestSpellCheckedMetadata.java

2006-12-09 Thread Chris Mattmann
Hi Sami, On 12/9/06 2:27 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > Author: siren > Date: Sat Dec 9 14:27:07 2006 > New Revision: 485076 > > URL: http://svn.apache.org/viewvc?view=rev&rev=485076 > Log: > Optimize SpellCheckedMetadata further by taking into account the fact that it > i

Re: Welcome Chris Mattmann as Nutch committer

2006-11-23 Thread Chris Mattmann
Thanks, Andrzej, thanks to the rest of the folks who voted me in! I really appreciate the honor and pledge to help maintain the high quality of the Nutch source code. Best wishes and happy holidays to all the folks on the list! Cheers, Chris On 11/23/06 4:10 AM, "Andrzej Bialecki" <[EMAIL PR

Re: [jira] Closed: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris Mattmann
Hi Sami, On 11/23/06 9:45 AM, "Sami Siren" <[EMAIL PROTECTED]> wrote: > Couple of points: > > 1. You used tabs I just installed a new version of Eclipse, and forgot to change the default preference for using tabs versus just whitespaces. I've gone ahead and changed this in my Eclipse and will

Re: What's the status of Nutch-GUI?

2006-11-21 Thread Chris Mattmann
ieldvalue2",...)); Both the values "fieldvalue" and "fieldvalue2" will both get stored in the index for the key "fieldname". So, if I understand you correctly (which I may not ;) ), then I think you can omit the check that you are talking about above and just g

Re: [jira] Updated: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory

2006-10-13 Thread Chris Mattmann
Hi Guys, Can we disable the selection of "released versions" within JIRA for issues so that people like me don't continue to get confused? Thanks! Cheers, Chris On 10/13/06 9:32 AM, "Sami Siren (JIRA)" <[EMAIL PROTECTED]> wrote: > [ http://issues.apache.org/jira/browse/NUTCH-379?page

Re: Nutch requires JDK 1.5 now?

2006-10-03 Thread Chris Mattmann
d really change the email address for JIRA to not use the Apache incubator one anymore, and to use the Lucene one. Sound good? If so, could someone with permissions please take care of it? :-) Cheers, Chris On 10/3/06 9:04 AM, "Sami Siren" <[EMAIL PROTECTED]> wrote: > Andrzej

Re: Nutch requires JDK 1.5 now?

2006-10-03 Thread Chris Mattmann
> > The switch to 1.5 format was also logged on jira issue > http://issues.apache.org/jira/browse/NUTCH-360 > -- > Sami Siren Ahh, I didn't see this. Way to go Sami, I love it when people actually keep records of changes! ;) Cheers, Chris

Nutch requires JDK 1.5 now?

2006-10-03 Thread Chris Mattmann
Hi Folks, I noticed that Nutch now requires JDK 5 in order to compile, due to recent changes to the PluginRepository and some other classes. I think that this is a good move, however, I wasn't sure that I had seen any "official" announcement that Nutch now requires 1.5... Cheers, Chris

Re: Patch Available status?

2006-08-31 Thread Chris Mattmann
Hi Doug, > > But the nutch-developers Jira group pretty closely corresponds to > Nutch's committers, so perhaps all committers should be permitted to > close, although this should be exercised with caution, only at releases, > since closes cannot be undone in this workflow. > > Another alternati

Re: Patch Available status?

2006-08-30 Thread Chris Mattmann
Hi Doug and Andrzej, +1. I think that workflow makes a lot of sense. Currently users in the nutch-developers group can close and resolve issues. In the Hadoop workflow, would this continue to be the case? Cheers, Chris On 8/30/06 3:14 PM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote: > Do

Re: 0.8 not loading plugins

2006-08-17 Thread Chris Mattmann
Hi Chris, It seems from your email message that your plugin is located in $NUTCH_HOME/build/custom-meta? Is this where your plugin * code * is currently stored? If so, this is the wrong location and the most likely reason that your plugin isn't being loaded. Plugin code should live in $NUTCH_HO

Re: Any plans to move to build Nutchusing Maven?

2006-08-16 Thread Chris Mattmann
Hi Steven, On 8/16/06 7:36 AM, "steven shingler" <[EMAIL PROTECTED]> wrote: > (This thread moved from the User List.) > > OK Lukas, lets open it up to the dev list! :) > > Particularly, does the group feel moving to Maven would be _a good thing_ ? +1 I suggested this (however did not make an

Re: Tika update

2006-08-16 Thread Chris Mattmann
agement on the Nutch mailing lists recently. From that interest, we have gathered the following list of candidate committers who have expressed interest in our proposed project. The leader of the Tika project would be Chris Mattmann. Chris works at NASA's Jet Propulsion Laboratory as a Member

Re: Tika update

2006-08-16 Thread Chris Mattmann
Hi Jukka, Thanks for your email. Indeed, there was discussion on the Lucene PMC email list, about the Tika project. It was decided by the powers that be to discuss it more on the Nutch mailing list before moving forward with any vote on making Tika a sub-project of Apache Lucene. With regards to

Patch Available status?

2006-08-15 Thread Chris Mattmann
Hi Guys, I've seen on the Hadoop mailing list recently that there was a new status added for issues in JIRA called "Patch Available" to let committers know that a patch is ready for review to commit. How about we add this to the Nutch jira instance as well? I tried doing this, but I don't think I

Re: parse-plugins.xml

2006-08-03 Thread Chris Mattmann
Hey Andrzej, On 8/3/06 8:19 AM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote: > Chris Mattmann wrote: >> Hi Marko, >> >>Thanks for your question. Basically it was set up as a sort of "last >> resort" of getting at least * some * i

Re: parse-plugins.xml

2006-08-03 Thread Chris Mattmann
Hi Marko, Thanks for your question. Basically it was set up as a sort of "last resort" of getting at least * some * information from the PDF file, albeit littered with garbage. If indeed the parse-text does not really make sense in terms of a backup parser to handle PDF files and get at least s

RE: Library for extracting text content from binaries

2006-07-24 Thread Chris Mattmann
Hi Jukka, Thanks for your email. Jerome Charron and I proposed a project with a similar goal in mind that we wanted to dub "Tika". Tika would effectively be a Lucene sub-project, and would factor out some of the capabilities you mention below from Nutch, incl: 1. MimeType repository 2. Parser i

Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Chris Mattmann
Hi Andrzej, > > The main problem, as Scott observed, is that the static flag affects all > instances of the task executing inside the same JVM. If there are > several Fetcher tasks (or any other tasks that check for SEVERE flag!), > belonging to different jobs, all of them will quit. This is cert

Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Chris Mattmann
Folks, Before I (or someone else) reopens the issue, I think it's important to understand the implications: >1) Having a *side-effect* of the entire system stop processing after merely > logging a message at a certain event level is a poor practice. I'm not sure that the Fetcher quitting is a *

Re: Nutch Parser Bug

2006-04-25 Thread Chris Mattmann
Hi Alex, I also noticed this issue a while back. It's described here: http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200510.mbox/%3c435 [EMAIL PROTECTED] Cheers, Chris On 4/25/06 2:41 PM, "Alex" <[EMAIL PROTECTED]> wrote: > Hi there, > > I'm fairly new to nutch and in working

RE: [Proposal] New Lucene sub-project

2006-04-24 Thread Chris Mattmann
ts own Stand-alone library. Just my two cents, thanks! Cheers, Chris > > Otis > > - Original Message > From: Jérôme Charron <[EMAIL PROTECTED]> > To: nutch-dev@lucene.apache.org > Sent: Friday, April 7, 2006 4:26:54 AM > Subject: [Proposal] New Luc

RE: plugin.dtd

2006-04-16 Thread Chris Mattmann
Hi Stefan, The DTD actually does allow for custom attributes: Jerome factored them out of the form: ="" ="" . > Into the form: ... See the difference? Using the parameter tags, we can have a generic DTD that supports any parameter name and value. The other way, I had to go t

0.8 release?

2006-04-12 Thread Chris Mattmann
Hi Guys, Any progress on the 0.8 release? Was there any resolution about which JIRA issues to complete before the 0.8 release? We had a bit of conversation there and some ideas, but no definitive answer... Thanks for your help, and sorry to pester ;) Cheers, Chris

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Chris Mattmann
Hi Andrzej, On 4/7/06 12:18 PM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote: > Do you guys have any additional insights / suggestions whether NUTCH-240 > and/or NUTCH-61 should be included in this release? Looking at the JIRA popular issues pane for Nutch ( http://issues.apache.org/jira/browse

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Chris Mattmann
+1 On 4/7/06 10:20 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote: > Chris Mattmann wrote: >> +1 for a release sooner rather than later. > > I think this is a good plan. There's no reason we can't do another > release in a month. If it is back-co

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-06 Thread Chris Mattmann
+1 for a release sooner rather than later. Several interesting features contributed since the 0.7 branch I believe are now tested and production-worthy, at least in my environment. Hats off to the folks who were able to split the MapReduce and NDFS into Hadoop -- I'm going to be experimenting with

Re: Null Pointer exception in AnalyzerFactory?

2006-03-13 Thread Chris Mattmann
Thanks Jerome! :-) Cheers, Chris On 3/13/06 4:02 PM, "Jérôme Charron" <[EMAIL PROTECTED]> wrote: >> I updated to the latest SVN revision (385691) today, and I am now seeing a Null Pointer exception in the AnalyzerFactory.java class. > > Fixed (r385702). Thanks Chris. > > >> NOTE:

Null Pointer exception in AnalyzerFactory?

2006-03-13 Thread Chris Mattmann
Hi Folks, I updated to the latest SVN revision (385691) today, and I am now seeing a Null Pointer exception in the AnalyzerFactory.java class. It seems that in some cases, the method: private Extension getExtension(String lang) { Extension extension = (Extension) this.conf.getObject(lang)
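The snippet is cut off before the cause is shown, but the method appears to look up a per-language object cached in the configuration. As a generic, hypothetical illustration of that pattern, the sketch below treats a missing cache entry or an unsupported language as "no extension" instead of dereferencing null. `ExtensionCache` and its lookup function are invented names, not Nutch's `AnalyzerFactory` API, and this is not necessarily what r385702 changed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; ExtensionCache is not Nutch's AnalyzerFactory.
class ExtensionCache<E> {
    private final Map<String, E> cache = new HashMap<>();
    // Computes the extension for a language; may return null if none handles it.
    private final Function<String, E> lookup;

    ExtensionCache(Function<String, E> lookup) {
        this.lookup = lookup;
    }

    // Returns the cached extension for lang, computing it on first use.
    // Returns null (rather than throwing) when lang is null or unsupported;
    // null results are not cached, so an unsupported lang is looked up again.
    E getExtension(String lang) {
        if (lang == null) {
            return null;
        }
        E ext = cache.get(lang);
        if (ext == null) {
            ext = lookup.apply(lang);
            if (ext != null) {
                cache.put(lang, ext);
            }
        }
        return ext;
    }
}
```

Callers still have to handle a null return, for example by falling back to a default analyzer.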

RE: found resource parse-plugins.xm?

2006-03-06 Thread Chris Mattmann
if (this.extensionPoint == null) { throw new RuntimeException("x point " + Parser.X_POINT_ID + " not found."); > -Original Message- > From: Chris Mattmann [mailto:[EMAIL PROTECTED] > Sent: Monday, March 06, 2006 7:51 PM > To: 'nutch-dev@lucene.ap

RE: found resource parse-plugins.xm?

2006-03-06 Thread Chris Mattmann
oint == null) { throw new RuntimeException("x point " + Parser.X_POINT_ID + " not found."); Cheers, Chris > Cheers, > Stefan > > On 07.03.2006 at 04:38, Chris Mattmann wrote: > > > Hi Stefan, > > > >> after a short time

RE: found resource parse-plugins.xm?

2006-03-06 Thread Chris Mattmann
Hi Stefan, > after a short time I already had this line 1602 times in my > tasktracker log files. > 060307 022707 task_m_2bu9o4 found resource parse-plugins.xml at > file:/home/joa/nutch/conf/parse-plugins.xml > > Sounds like this file is loaded 1602 times (after, let's say, 3 minutes). I > guess that was
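One general way to avoid re-reading a file such as parse-plugins.xml for every task is to load it lazily once and reuse the result. The sketch below is a hypothetical, thread-safe "load once" holder using double-checked locking on a `volatile` field. `LoadOnce` is an invented name, and this is not necessarily how the Nutch code was changed.

```java
import java.util.function.Supplier;

// Hypothetical sketch: loads a resource on first use and reuses it afterwards,
// instead of re-reading it each time a task asks for it.
class LoadOnce<T> implements Supplier<T> {
    private final Supplier<T> loader;
    // volatile makes the double-checked locking below safe under the Java memory model.
    private volatile T value;

    LoadOnce(Supplier<T> loader) {
        this.loader = loader;
    }

    @Override
    public T get() {
        T v = value;
        if (v == null) {
            synchronized (this) {
                v = value;
                if (v == null) {
                    // Note: if the loader returns null, it will be called again next time.
                    v = loader.get();
                    value = v;
                }
            }
        }
        return v;
    }
}
```

Used this way, 1602 calls to `get()` trigger exactly one load, so the "found resource" line would appear once rather than once per call.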

RE: duplicate libs

2006-02-13 Thread Chris Mattmann
Hi Andrzej, > > commons-httpclient-3.0-beta1.jar src/plugin/parse-rss/lib > > commons-httpclient-3.0.jar src/plugin/protocol-httpclient/lib > > Not sure what was the reason to use the beta1, perhaps no reason except > that it was the latest available at the moment... Yup, I think that w

Re: duplicate libs

2006-02-13 Thread Chris Mattmann
Hey Doug, I think that, at least in the case of parse-rss, parse-pdf, and the Nutch core, there's probably some utility in having lib-xxx plugins (or at least putting these jars in $NUTCH_HOME/lib) for: commons-httpclient, log4j, xerces. Then, protocol-httpclient, parse-pdf and the rest of t

Re: ignore eclipse .project and .classpath

2006-02-09 Thread Chris Mattmann
:15 PM EST > Subject: Re: ignore eclipse .project and .classpath > > +1 > > On 08.02.2006 at 06:16, Chris Mattmann wrote: > >> Hi Folks, >> Just wondering if someone could add to the svn:ignore property for >> Nutch the

ignore eclipse .project and .classpath

2006-02-07 Thread Chris Mattmann
Hi Folks, Just wondering if someone could add the files .classpath and .project to the svn:ignore property for Nutch. I happen to use Eclipse for Nutch development and always ignore these files in my other Eclipse projects as well. Cheers, Chris
