[Nutch-dev] Chinese in Nutch:My solution

2005-04-11 Thread cao yuzhong
Hi everyone, I have integrated Nutch with an intelligent Chinese Lexical Analysis System, so Nutch can now segment Chinese words effectively. My solution is as follows: 1. Modify NutchAnalysis.jj: -| <#CJK:// non-alphabets - [ - "\u3040"-"\u318f",

Re: [Nutch-dev] Feature request - pluggable Analyzer

2005-04-11 Thread Jason Tang
David, please tell us more about your own Analyzer :) First, I think we should know what NutchDocumentAnalyzer should focus on and what it should not (can anyone explain?). BTW: I would like an AnalyzerFactory to maintain/cache all analyzers. /Jack === At 2005-04-12, 12:34:37 you wrote: === >Hi al

[Nutch-dev] Feature request - pluggable Analyzer

2005-04-11 Thread David Wallace
Hi all, I have found a need to do document analysis other than that which is provided by the NutchDocumentAnalyzer class. I have written my own Analyzer class, and I need to plug it into the Nutch framework. What I've done is the following, and I'd like to suggest that it be made part of the main

[Nutch-dev] Re: How to do OR search in Nutch?

2005-04-11 Thread zhang jin
What should I do if I want to support OR? Thanks very much! On Apr 12, 2005 1:30 AM, Doug Cutting <[EMAIL PROTECTED]> wrote: > > Kannan Sundaramoorthy wrote: > > When searching for two terms using OR operator (or with just space > > between two terms), nutch returns the results as if the terms wer

[Nutch-dev] Re: XML OUTPUT

2005-04-11 Thread Jack Tang
Guys, please join the "Nutch-39 issue" thread on the nutch-dev mailing list. Thanks /Jack On Apr 12, 2005 8:28 AM, zhang jin <[EMAIL PROTECTED]> wrote: > Thanks very much, that's very good! > > On Apr 12, 2005 12:56 AM, Orlando Tempobono - AtlasVision < > [EMAIL PROTECTED]> wrote: > > > > Hi, > >

[Nutch-dev] Re: [jira] Commented: (NUTCH-36) Chinese in Nutch

2005-04-11 Thread Jack Tang
Cutting, I agree with you! All segmentation of the character stream should be done in NutchAnalysis.jj. Moreover, there is something wrong in my solution. I feel so sorry about my "impulsive" patch. I found it some days ago, and I am working on it. In my project I just replace my CJKAnalyzer wi

[Nutch-dev] Re: XML OUTPUT

2005-04-11 Thread zhang jin
Thanks very much, that's very good! On Apr 12, 2005 12:56 AM, Orlando Tempobono - AtlasVision < [EMAIL PROTECTED]> wrote: > > Hi, > > We are working in a network of search websites here in Brazil called > www.sitedebusca.com the complete list is at > http://www.servi

[Nutch-dev] [jira] Closed: (NUTCH-15) ipc client timeout should be configurable

2005-04-11 Thread Stefan Grroschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-15?page=history ] Stefan Grroschupf closed NUTCH-15: -- submitted. > ipc client timeout should be configurable > - > > Key: NUTCH-15 > URL: http://iss

[Nutch-dev] Bot information within server log

2005-04-11 Thread Michael Wechner
Hi I am using the SVN version of Nutch (Last Changed Rev: 160963) and receive the following bot information within my server log when crawling my site "NutchCVS/0.06-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])" I think it would make sense to update this re the move to Ap

[Nutch-dev] [jira] Commented: (NUTCH-36) Chinese in Nutch

2005-04-11 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-36?page=comments#action_62604 ] Doug Cutting commented on NUTCH-36: --- I like what this patch does, but not how it does it. Nutch should perform bi-gram segmentation of CJK character sequences. This patch
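
The bi-gram segmentation mentioned in the comment above can be illustrated with a small standalone sketch. This is not the NUTCH-36 patch and not Nutch's NutchAnalysis.jj grammar; the class name `CjkBigramSketch`, its methods, and the choice of which Unicode scripts count as CJK are assumptions made only for illustration. Each maximal run of CJK characters is split into overlapping two-character tokens, and non-CJK text is simply skipped here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Nutch or of the NUTCH-36 patch.
final class CjkBigramSketch {

    // Assumption of this sketch: Han, Hiragana, Katakana and Hangul count as CJK.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA
            || script == Character.UnicodeScript.HANGUL;
    }

    // Returns overlapping bigrams for every maximal run of CJK characters.
    // Non-CJK characters only act as run boundaries and produce no tokens.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        List<Integer> run = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isCjk(cp)) {
                run.add(cp);
            } else {
                flush(run, tokens);
            }
            i += Character.charCount(cp);
        }
        flush(run, tokens);
        return tokens;
    }

    // Emits the tokens for one finished run, then clears it.
    private static void flush(List<Integer> run, List<String> tokens) {
        if (run.size() == 1) {
            // An isolated CJK character becomes a single-character token.
            tokens.add(new String(Character.toChars(run.get(0))));
        }
        for (int k = 0; k + 1 < run.size(); k++) {
            tokens.add(new StringBuilder()
                .appendCodePoint(run.get(k))
                .appendCodePoint(run.get(k + 1))
                .toString());
        }
        run.clear();
    }

    public static void main(String[] args) {
        System.out.println(bigrams("Nutch 中文分词"));
    }
}
```

Overlapping bigrams need no dictionary, which is one reason they are a common fallback for CJK text; a real analyzer would also have to tokenize the non-CJK portions of the stream.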

[Nutch-dev] RE: sorting search results

2005-04-11 Thread Chirag Chaman
This is definitely helpful. We can get rid of quite a bit of custom code we wrote and manually merge with each release. CC- -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Monday, April 11, 2005 5:30 PM To: nutch-dev@incubator.apache.org Subject: sorting search re

[Nutch-dev] resolve or close bugs?

2005-04-11 Thread Stefan Groschupf
Hi, I'm confused by this comment: Doug: "In the future, things shouldn't be resolved until the patch is committed. I just committed this." http://issues.apache.org/jira/browse/NUTCH-29#action_61664 I personally understand the life cycle of an issue like this: - Create an issue. - Assign an issue to

[Nutch-dev] [jira] Updated: (NUTCH-35) modify XML parsing code in Nutch to use single API

2005-04-11 Thread Stefan Grroschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-35?page=history ] Stefan Grroschupf updated NUTCH-35: --- Attachment: xml_API_patch.txt A patch that removes all dom4j dependencies in the plugin manifestparser. > modify XML parsing code in Nutch to use singl

[Nutch-dev] sorting search results

2005-04-11 Thread Doug Cutting
Currently query filters generate a Lucene Query. But Lucene's search methods have other parameters besides a query. In particular, these can also be passed a Filter and a Sort. In part as a work-around, Nutch automatically converts Lucene query clauses with zero boost into filters. But ther

[Nutch-dev] Re: How to do OR search in Nutch?

2005-04-11 Thread Doug Cutting
Kannan Sundaramoorthy wrote: When searching for two terms using OR operator (or with just space between two terms), nutch returns the results as if the terms were united by AND, i.e. does not search for each term separately. Nutch just looks for documents where both terms are present. Does "OR" se

[Nutch-dev] Re: [jira] Commented: (NUTCH-39) pagination in search result

2005-04-11 Thread Doug Cutting
Nick Lothian wrote: Shouldn't things like that, , and be defined in the XSLT as presentation data? I agree. The XML should avoid presentation-specific stuff. Navigation urls are okay (next page, more from site, etc.) as they can be tricky to compute. Doug

[Nutch-dev] Re: [jira] Commented: (NUTCH-39) pagination in search result

2005-04-11 Thread Doug Cutting
[EMAIL PROTECTED] wrote: I use velocity, and I rewrite the jsp pages as servlets. I think in this case the program and html code are better separated. I agree. The RSS generator should be a servlet, not a jsp page. Doug --- SF email is sponsored by -

[Nutch-dev] Re: XML OUTPUT

2005-04-11 Thread Orlando Tempobono - AtlasVision
Hi, We are working in a network of search websites here in Brazil called www.sitedebusca.com the complete list is at http://www.servicodebusca.com/sitesdebusca.php and we added some patches to search.jsp to show the results in a simple XML format, to read in your own application actually write

Re: [Nutch-dev] Re: Image and Video Search

2005-04-11 Thread Hasan Diwan
On Apr 10, 2005 6:24 AM, Stefan Groschupf <[EMAIL PROTECTED]> wrote: > Feel free to contribute such plugins, it isn't difficult to write. I am working on a plugin to index Jpeg metadata. Contact me off-list if you'd like to help. -- Cheers, Hasan Diwan <[EMAIL PROTECTED]> --

AW: [Nutch-dev] Re: tools cleanup

2005-04-11 Thread Strittmatter, Stephan
Hi, I already started some tests on using cli2. CLI v. 1 in my opinion does not support all required parameters. I defined an interface "Tool" and created an AbstractTool class. I have now started to change the existing tools to extend them. In addition, I want to create a commons-laun

[Nutch-dev] when compile nutch-0.6,there is a problem

2005-04-11 Thread Zhou LiBing
When compiling nutch-0.6, there is a problem. First I execute the command "ant jar", then "ant war", and then the compilation fails. The error report is as follows: [xslt] Loading stylesheet /home/bluegrid/nutch/src/web/style/nutch-header.xsl [xslt]

[Nutch-dev] rank of hits

2005-04-11 Thread Siva Bandhamravuri
Hi, I want to boost the ranks of some documents based on some criteria. How can I access the rank of a document after the search is returned in search.jsp? Thanks, Siva

[Nutch-dev] Re: nutch engines

2005-04-11 Thread Stefan Groschupf
Siva, 1.) I asked myself this question as well some weeks ago. I guess these are firefox plugins. 2.) There are already tools to compare results, however they are web-based like the tool from Antonio Gulli. http://rankcomparison.di.unipi.it/ Some weeks ago I was starting to write a small tool to be ab

[Nutch-dev] Question re index merge call in crawl tool

2005-04-11 Thread David Wallace
Hi all, I am trying to understand Nutch a little better, so that I can evaluate its suitability for a project I am soon to embark on. I have been studying the code in CrawlTool.java (used for an "intranet search"). The line that bothers me is the call to IndexMerger.main(), near the end of main()

[Nutch-dev] Re: Image and Video Search

2005-04-11 Thread Stefan Groschupf
Nutch does not support image or video search yet, but there was an mp3 parser plugin. Check this plugin code and then you can think about writing your own image or video parser plugin. You can index meta data you can extract from images or videos and anchor and meta information from the html page as

[Nutch-dev] How to do OR search in Nutch?

2005-04-11 Thread Kannan Sundaramoorthy
Hi, When searching for two terms using OR operator (or with just space between two terms), nutch returns the results as if the terms were united by AND, i.e. does not search for each term separately. Nutch just looks for documents where both terms are present. Does "OR" search require change in a

[Nutch-dev] XML OUTPUT

2005-04-11 Thread lumavanossi
Hi! Does anybody know how to output search results in XML format? I would like to provide my data like Google/Yahoo do with their APIs. Thanks!

Re: [Nutch-dev] Supported web server platform & version

2005-04-11 Thread Stefan Groschupf
The nutch user frontend will run in any servlet container that conforms to the Servlet/JSP Specification 2.4/2.0. On 11.04.2005 at 11:57, [EMAIL PROTECTED] wrote: I want to know what is the supported web server platform and version that Nutch would support? Is IPlanet 5.0 SP 5 supported? regards Brian --

[Nutch-dev] Supported web server platform & version

2005-04-11 Thread Brian . Liew
I want to know which web server platforms and versions Nutch supports. Is IPlanet 5.0 SP 5 supported? Regards, Brian