[Nutch-dev] [jira] Closed: (NUTCH-33) MIME content type detector (using magic char sequences)

2005-04-17 Thread John Xing (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-33?page=history ] John Xing closed NUTCH-33: Resolution: Fixed > MIME content type detector (using magic char sequences) > Key: NUTCH-33

[Nutch-dev] [jira] Updated: (NUTCH-45) Log corrupt segments in SegmentMergeTool

2005-04-17 Thread Otis Gospodnetic (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-45?page=history ] Otis Gospodnetic updated NUTCH-45: Attachment: SegmentMergeTool.patch. A small patch. I also trimmed some trailing lines. I tried making a diff/patch that ignored whitespace changes, but

[Nutch-dev] [jira] Created: (NUTCH-45) Log corrupt segments in SegmentMergeTool

2005-04-17 Thread Otis Gospodnetic (JIRA)
Log corrupt segments in SegmentMergeTool. Key: NUTCH-45; URL: http://issues.apache.org/jira/browse/NUTCH-45; Project: Nutch; Type: Improvement; Reporter: Otis Gospodnetic; Priority: Trivial; Attachments: SegmentMergeTool.pat

[Nutch-dev] [jira] Commented: (NUTCH-33) MIME content type detector (using magic char sequences)

2005-04-17 Thread John Xing (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-33?page=comments#action_63022 ] John Xing commented on NUTCH-33: Just committed. Thanks. Nutch is licensed under the Apache License. If the freedesktop MIME database uses the GPL, it could be problematic to have it i

[Nutch-dev] going backwards? svn getting deprecated errors

2005-04-17 Thread Byron Miller
Trying to run a recent svn checkout (today), I'm getting deprecated API errors on search.jsp. I've also noted the following errors in the opensearch servlet: compile-core: [javac] Compiling 243 source files to /home2/mozdex/svn/nutch/build/classes [javac] /home2/mozdex/svn/nutch/src/java/or

[Nutch-dev] Re: [jira] Commented: (NUTCH-39) pagination in search result

2005-04-17 Thread Jack Tang
Hi Doug, I checked the code this morning. Unfortunately I was without network access over the weekend, so I implemented NutchSearchRss in my own way: I extended rss4j (I know one goal of Nutch is to keep the search engine small, and rss4j is a small jar without any dependencies) to generate OpenSearch RSS and NutchSearch RSS. If yo

[Nutch-dev] [jira] Updated: (NUTCH-33) MIME content type detector (using magic char sequences)

2005-04-17 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-33?page=history ] Jerome Charron updated NUTCH-33: Attachment: NUTCH-33-050417.patch. John, here is a patch that removes the uses of deprecated APIs in the unit tests. Jerome > MIME content type detector (using m

[Nutch-dev] [jira] Created: (NUTCH-44) too many search results

2005-04-17 Thread Emilijan Mirceski (JIRA)
too many search results. Key: NUTCH-44; URL: http://issues.apache.org/jira/browse/NUTCH-44; Project: Nutch; Type: Bug; Components: web gui; Environment: web environment; Reporter: Emilijan Mirceski. There should be a limitation (user define

[Nutch-dev] [jira] Commented: (NUTCH-33) MIME content type detector (using magic char sequences)

2005-04-17 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-33?page=comments#action_63015 ] Jerome Charron commented on NUTCH-33: John, I will fix the use of deprecated APIs in unit tests as soon as possible. Damian, thank you very much for the link on freedesk

[Nutch-dev] [jira] Assigned: (NUTCH-30) rss feed parser

2005-04-17 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-30?page=history ] Chris A. Mattmann reassigned NUTCH-30: Assign To: Chris A. Mattmann > rss feed parser > Key: NUTCH-30 > URL: http://issues.apache.org/jira/browse/N

[Nutch-dev] [jira] Updated: (NUTCH-30) rss feed parser

2005-04-17 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-30?page=history ] Chris A. Mattmann updated NUTCH-30: Attachment: parse-rss-srcbin-incl-path.zip. Hi John, here ya go. The zip file includes: 1. up-to-date zipped up src of the plugin, incl. required

[Nutch-dev] [jira] Commented: (NUTCH-33) MIME content type detector (using magic char sequences)

2005-04-17 Thread Damian Gajda (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-33?page=comments#action_63012 ] Damian Gajda commented on NUTCH-33: Hello guys, one question: wouldn't it be more reasonable to use an already developed description of magic numbers than to develop your own? O
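NUTCH-33 is about detecting a document's content type from "magic" byte sequences at its start. As a rough, hypothetical illustration of the idea only (the class name and the four hard-coded signatures are assumptions, not the committed Nutch code), the core check is a prefix match against a table of known signatures:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the NUTCH-33 code): guess a MIME type by
 * comparing the first bytes of a document against known magic sequences.
 */
public class MagicSniffer {

    // MIME type -> leading byte signature. Illustrative subset only; a real
    // detector would load these from a shared database instead of hard-coding.
    private static final Map<String, byte[]> MAGIC = new LinkedHashMap<>();

    static {
        MAGIC.put("application/pdf", "%PDF-".getBytes(StandardCharsets.US_ASCII));
        MAGIC.put("image/png", new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'});
        MAGIC.put("image/gif", "GIF8".getBytes(StandardCharsets.US_ASCII));
        MAGIC.put("application/zip", new byte[] {'P', 'K', 0x03, 0x04});
    }

    /** Returns the first MIME type whose signature prefixes {@code data}, or null. */
    public static String detect(byte[] data) {
        for (Map.Entry<String, byte[]> e : MAGIC.entrySet()) {
            byte[] sig = e.getValue();
            if (data.length >= sig.length
                    && Arrays.equals(Arrays.copyOf(data, sig.length), sig)) {
                return e.getKey();
            }
        }
        return null; // no signature matched; caller must fall back to other hints
    }

    public static void main(String[] args) {
        byte[] doc = "%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(doc)); // prints application/pdf
    }
}
```

A shared database such as the freedesktop one mentioned in this thread could supply the signature table instead of hard-coded entries; note that such databases also describe matches at non-zero offsets, which this prefix-only sketch does not handle.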

[Nutch-dev] [jira] Closed: (NUTCH-19) Space in Java.exe path chokes bin/nutch

2005-04-17 Thread John Xing (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-19?page=history ] John Xing closed NUTCH-19: Resolution: Fixed > Space in Java.exe path chokes bin/nutch > Key: NUTCH-19 > URL: http://issues.apache

[Nutch-dev] [jira] Closed: (NUTCH-22) ontology supported query refinement

2005-04-17 Thread John Xing (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-22?page=history ] John Xing closed NUTCH-22: Resolution: Fixed > ontology supported query refinement > Key: NUTCH-22 > URL: http://issues.apache.org/jir

[Nutch-dev] RE: [jira] Commented: (NUTCH-30) rss feed parser

2005-04-17 Thread Chris A Mattmann
Hi John, You got it. I'll check out the latest SVN right now, and create the patch. I'm not sure I understand what you mean about the text/xml though? Are you talking about how in the plugin.xml for the parse-rss plugin, that it tries to associate text/xml with rss? I can take that out, not probl

[Nutch-dev] [jira] Commented: (NUTCH-30) rss feed parser

2005-04-17 Thread John Xing (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-30?page=comments#action_63010 ] John Xing commented on NUTCH-30: Could we have an updated patch & zip against the most recent svn? Also I am not sure it is a good idea to have parse-rss capture any mime type text

[Nutch-dev] [jira] Commented: (NUTCH-33) MIME content type detector (using magic char sequences)

2005-04-17 Thread John Xing (JIRA)
deprecated method: [javac] C:\cygwin\home\john\nutch\devel\nutch-svn-20050417\src\test\org\apache\nutch\util\mime\TestMimeTypes.java:79: warning: [deprecation] readLine() in java.io.DataInputStream has been deprecated [javac] String[] tokens = in.readLine().split(";"
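The warning above comes from calling the deprecated java.io.DataInputStream.readLine(); its Javadoc recommends BufferedReader.readLine() instead, because the old method does not properly convert bytes to characters. A minimal sketch of that substitution, written for a current JDK (the class name, the UTF-8 charset and the sample lines are assumptions, not the actual TestMimeTypes code, which is truncated in this archive):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: read ';'-separated lines without DataInputStream.readLine(). */
public class ReadLineFix {

    public static List<String[]> readTokens(InputStream raw) throws IOException {
        List<String[]> rows = new ArrayList<>();
        // BufferedReader decodes bytes with an explicit charset, which the
        // deprecated DataInputStream.readLine() does not do.
        // UTF-8 is an assumption about the data's encoding.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(line.split(";"));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample input, one "type;extension" pair per line.
        byte[] data = "text/html;html\napplication/pdf;pdf\n".getBytes(StandardCharsets.UTF_8);
        for (String[] tokens : readTokens(new ByteArrayInputStream(data))) {
            System.out.println(tokens[0] + " -> " + tokens[1]);
        }
    }
}
```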

[Nutch-dev] Re: language identifier

2005-04-17 Thread Andrzej Bialecki
Andy Liu wrote: One thing that can be done is to move the n-gram language detection calls to the HTMLLanguageParser (an HtmlParseFilter plugin). After This is not practical, because content comes from other parser plugins as well, so this code fragment would have to be added to all plugins... get
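The discussion refers to n-gram language detection. As a hedged illustration of the general technique only (the class, the cosine-similarity scoring and the two tiny training samples are assumptions; n-gram identifiers are also commonly built on ranked profiles with an "out-of-place" distance instead), comparing character trigram counts can be sketched as:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of n-gram language guessing: compare character
 * trigram counts of a text against small per-language samples using
 * cosine similarity. Not Nutch's language-identifier code.
 */
public class NGramLangGuess {

    /** Counts overlapping character n-grams, padding with spaces at the edges. */
    public static Map<String, Integer> profile(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        String padded = " " + text.toLowerCase(Locale.ROOT) + " ";
        for (int i = 0; i + n <= padded.length(); i++) {
            counts.merge(padded.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine similarity between two n-gram count vectors. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
            normA += (double) e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the language key whose sample text is most similar to {@code text}. */
    public static String guess(String text, Map<String, String> samples) {
        Map<String, Integer> target = profile(text, 3);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, String> e : samples.entrySet()) {
            double score = cosine(target, profile(e.getValue(), 3));
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> samples = new LinkedHashMap<>();
        samples.put("en", "the quick brown fox jumps over the lazy dog and the cat");
        samples.put("de", "der schnelle braune fuchs springt über den faulen hund und die katze");
        System.out.println(guess("the dog and the fox", samples));
    }
}
```

A check like this needs only extracted plain text, so nothing in it depends on which parser plugin produced that text.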

[Nutch-dev] [jira] Commented: (NUTCH-34) Parsing different content formats

2005-04-17 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-34?page=comments#action_62996 ] Andrzej Bialecki commented on NUTCH-34: Currently there is such a "registry", and it is built and maintained by PluginRepository. So, it seems to me that the only ch
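Andrzej notes that a content-type "registry" already exists and is maintained by PluginRepository. To illustrate what such a lookup involves, here is a hypothetical, self-contained sketch (the ParserRegistry class, its methods and the wildcard fallback order are assumptions, not PluginRepository's API; the parser ids are illustrative):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of a content-type -> parser lookup table.
 * Names and fallback rules are illustrative only.
 */
public class ParserRegistry {

    private final Map<String, String> byType = new HashMap<>();

    public void register(String contentType, String parserId) {
        byType.put(contentType.toLowerCase(Locale.ROOT), parserId);
    }

    /** Exact type first, then "major/*", then the "*" catch-all; null if none. */
    public String lookup(String contentType) {
        String type = contentType.toLowerCase(Locale.ROOT);
        int semi = type.indexOf(';'); // drop parameters such as "; charset=UTF-8"
        if (semi >= 0) {
            type = type.substring(0, semi);
        }
        type = type.trim();
        String id = byType.get(type);
        if (id != null) {
            return id;
        }
        int slash = type.indexOf('/');
        if (slash > 0) {
            id = byType.get(type.substring(0, slash) + "/*");
            if (id != null) {
                return id;
            }
        }
        return byType.get("*");
    }

    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();
        registry.register("text/html", "parse-html");
        registry.register("application/rss+xml", "parse-rss");
        registry.register("text/*", "parse-text");
        System.out.println(registry.lookup("text/html; charset=UTF-8")); // parse-html
        System.out.println(registry.lookup("text/xml"));                 // parse-text
    }
}
```

Keying on exact types, with wildcards only as an explicit fallback, makes it visible when a plugin claims a broad type such as text/xml.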

[Nutch-dev] Re: Someone working on NUTCH-34?

2005-04-17 Thread Andrzej Bialecki
Jérôme Charron wrote: Is anyone working on the NUTCH-34 issue? http://issues.apache.org/jira/browse/NUTCH-34 If not, I'm a candidate ... If so, can I help? I just made a comment on JIRA on this issue. Your help would be appreciated! -- Best regards, Andrzej Bialecki