RE: Hi Experts

2006-03-28 Thread Ranjan K. Baisak
I am using HTMLparser to parse all HTML pages and extract the required information from them. Let me describe my crawler. I have to search all pages of a group website (e.g. www.group.com, which contains links to news.groups.com, forum.news.com, etc.). So I will get all links of group website using P

RE: Hi Experts

2006-03-28 Thread Babu, KameshNarayana (GE, Research, consultant)
-----Original Message----- From: Ranjan K. Baisak [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 29, 2006 12:06 PM To: java-user@lucene.apache.org Subject: Re: Hi Experts For internet searching, Nutch is the best tool. However, as you don't want to use Cygwin, you need to use Lucene in

Re: Hi Experts

2006-03-28 Thread Ranjan K. Baisak
For internet searching, Nutch is the best tool. However, as you don't want to use Cygwin, you need to use Lucene in the following way. You need to download the whole page and create an index out of that page. Then use Lucene to search the offline content rather than the online content. I have used Lucene in this way and I h

RE: Hi Experts

2006-03-28 Thread Aditya Liviandi
The way Lucene works is that you need to have the index first. Only then can you search it. So if you want to search within a given URL, you need to somehow create the index of all the web pages within that URL. If the web server linked to that URL is also yours, then that would not be a big deal. But i
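To make the "index first, search second" point concrete, here is a minimal sketch of an in-memory inverted index. It is a toy under stated assumptions, not Lucene's API or storage format: the class name TinyIndex, its methods, and the example URLs are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only: this is NOT Lucene's API or on-disk format.
class TinyIndex {
    // term -> ids of the pages that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // page id -> URL (the id is the position in this list)
    private final List<String> urls = new ArrayList<>();

    // Index step: a page must be added here before any search can return it.
    void addPage(String url, String text) {
        int docId = urls.size();
        urls.add(url);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Search step: consults only the index, never the live website.
    List<String> search(String term) {
        Set<Integer> ids = postings.getOrDefault(term.toLowerCase(), Collections.<Integer>emptySet());
        List<String> hits = new ArrayList<>();
        for (int docId : ids) {
            hits.add(urls.get(docId));
        }
        return hits;
    }
}
```

Searching before any page has been added returns nothing, which is the situation the original poster is in: the pages behind the URL have to be fetched and fed to the indexer before a query can find them.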

Hi Experts

2006-03-28 Thread Babu, KameshNarayana (GE, Research, consultant)
Hi Experts, I am a newbie. I am supposed to select a search engine for my project, which should search from the given URL and display the result. I should use the search engine on Windows only. I should not use any other external tool like Cygwin, which is used in Nutch. Any expert guidance w

Re: API for setting lock directory

2006-03-28 Thread Daniel Noll
Otis Gospodnetic wrote: This was the old behaviour in Lucene. You may want to check the ML archives to see why the lock was moved to temp dir by default, I remember there being some discussion around that. This might have been 2 years ago? To answer my own question, I've just discovered

Data structure of a Lucene Index

2006-03-28 Thread Prasenjit Mukherjee
It seems to me that Lucene doesn't use a B-tree for its index storage. Is there any paper/article which explains the theory behind the data structure of a single index (segment)? I am not referring to the merge algorithm; I am curious to know the storage structure of a single optimized Lucene index. Any po

Re: API for setting lock directory

2006-03-28 Thread Otis Gospodnetic
This was the old behaviour in Lucene. You may want to check the ML archives to see why the lock was moved to temp dir by default, I remember there being some discussion around that. This might have been 2 years ago? Otis ----- Original Message ----- From: Daniel Noll <[EMAIL PROTECTED]>

Re: API for setting lock directory

2006-03-28 Thread Daniel Noll
Gopikrishnan Subramani wrote: With Lucene 1.4.3 I used to use a different lock directory from the default by setting org.apache.lucene.lockdir system property. But as per http://lucene.apache.org/java/docs/systemproperties.html this is not supported in Lucene 1.9 and above. I don't see any equiva

Re: BooleanQuery containing SpanNearQuery throws ArrayOutOfBoundsException .

2006-03-28 Thread Paul Elschot
Comments inline below. On Tuesday 28 March 2006 18:29, Ramana Jelda wrote: > > Hi, > I have got a strange problem. > My searchterm : "mp3 player" > Lucene Query : > +( > +( > spanNear([productName:mp, productName:3], 3, true) > spanNear([subName:mp, subName:3], 3, true) >) > +(p

Re: API for setting lock directory

2006-03-28 Thread Daniel Naber
On Tuesday 28 March 2006 13:24, Gopikrishnan Subramani wrote: > With Lucene 1.4.3 I used to use a different lock directory from the > default by setting org.apache.lucene.lockdir system property. But as per > http://lucene.apache.org/java/docs/systemproperties.html this is not > supported in Lucen

Install problem

2006-03-28 Thread Jim Douglas
I use ant for many other builds, so I know that's not the problem. When I run 'ant' from the directory I untarred Lucene into, I get a 'build failed' error that says: Cannot find common-build.xml imported from /root/lucene-1.9.1/build.xml Where could I find common-build.xml, and build-deprec

RE: Commercial vendors monitoring this ML? was: Lucene Performance Issues

2006-03-28 Thread Runde, Kevin
Of course they are monitoring this mailing list: Lucene rocks and it is beating them. Do yourself a favor and dedicate some time to testing Lucene vs. any commercial application. A little time spent up front testing the tools can save you significant time later optimizing, hacking in a new tool, or re

Commercial vendors monitoring this ML? was: Lucene Performance Issues

2006-03-28 Thread jwang
Weird, I was just about to comment on the fact that since posting that my organization has decided to use Lucene, I got calls from two commercial vendors that didn't give me the time of day while I was doing my comparison analysis. Both of them referred to some random "colleague" in the busine

Re: Nutch? Solr? Red-piranha?

2006-03-28 Thread Eivind Hasle Amundsen
I don't have any experience with Red-piranha. I'm wondering if the Red-Piranha project is still going strong. The so-called "community edition" seems to be lacking development according to the CVS repository: http://cvs.sourceforge.net/viewcvs.py/red-piranha/CVSROOT/ (The last commits seem

Re: to OR or not

2006-03-28 Thread Erik Hatcher
I much prefer the one catch-all field approach, personally. As long as scoring works out how you'd like using this technique, then it'll make for simpler (and thus faster) queries. However, reindexing is necessary to bring another field into the mix, whereas run-time ORing is more flexibl
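The trade-off described here can be sketched with a toy example. This is not Lucene code: the class CatchAllDemo, its methods, and the field names used with it are assumptions made up for illustration, and term matching is reduced to simple token comparison.

```java
import java.util.Map;

// Toy comparison of the two approaches; not Lucene's API.
class CatchAllDemo {
    // Index-time approach: concatenate every field value into one extra field.
    // Adding another field later means rebuilding this string, i.e. reindexing.
    static String buildCatchAll(Map<String, String> fields) {
        return String.join(" ", fields.values());
    }

    // One lookup against the single catch-all field.
    static boolean matchesCatchAll(String catchAll, String term) {
        return containsTerm(catchAll, term);
    }

    // Query-time approach: OR the same term across whichever fields are named.
    static boolean matchesAnyField(Map<String, String> fields, String term, String... names) {
        for (String name : names) {
            String value = fields.get(name);
            if (value != null && containsTerm(value, term)) {
                return true;
            }
        }
        return false;
    }

    // Case-insensitive whole-token match.
    private static boolean containsTerm(String text, String term) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.equals(term.toLowerCase())) {
                return true;
            }
        }
        return false;
    }
}
```

With the catch-all string the query is a single check, while the OR version lets the caller decide per query which fields take part, which is the flexibility mentioned above.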

Re: Nutch? Solr? Red-piranha?

2006-03-28 Thread Yonik Seeley
On 3/28/06, Michael Levy <[EMAIL PROTECTED]> wrote: > I'm looking for advice on selecting a search application. I'm > responsible for developing a new search platform for use in a historical > research organization and museum. I've pretty much decided on Lucene as > the library for custom servlet

Re: Lucene Performance Issues

2006-03-28 Thread thomasg
Thanks v. much for your thoughts, a lot to think about. I'm currently doing some benchmark tests on typical usage scenarios with lucene. I'm actually using lucene through its integration with Jackrabbit dms so may not be easy/possible to use a different search engine anyway. Of course I'd rather b

Re: Lucene Performance Issues

2006-03-28 Thread Otis Gospodnetic
Hi Thomas, Sounds like FUD to me. No concrete numbers, and the benchmark they mention... eh, haven't we all seen "funny" benchmarks before? Lucene is used in many large operations (e.g. Technorati, Simpy) that involve a LOT of indexing and searching, large indices, etc. I suggest you try bot

Re: Lucene Performance Issues

2006-03-28 Thread Doug Cutting
thomasg wrote: Hi, we are currently intending to implement a document storage / search tool using Jackrabbit and Lucene. We have been approached by a commercial search and indexing organisation called ISYS who are suggesting the following problems with using Lucene. We do have a requirement to st

BooleanQuery containing SpanNearQuery throws ArrayOutOfBoundsException .

2006-03-28 Thread Ramana Jelda
Hi, I have got a strange problem. My searchterm : "mp3 player" Lucene Query : +( +( spanNear([productName:mp, productName:3], 3, true) spanNear([subName:mp, subName:3], 3, true) ) +(productName:player subName:player) ) Throws following lucene BooleanScorer2 exception. Caused by:

Re: Phrase Query query

2006-03-28 Thread Richard Gunderson
Hi Otis, thanks for the information. I'm actually writing something to search files containing code (such as JSP files) so I do expect there will be a few problems like this because I guess Lucene's out-of-the-box analyzers are really suited to natural languages. But, I was wondering if you could

Nutch? Solr? Red-piranha?

2006-03-28 Thread Michael Levy
I'm looking for advice on selecting a search application. I'm responsible for developing a new search platform for use in a historical research organization and museum. I've pretty much decided on Lucene as the library for custom servlet apps that would use the Lucene API directly. At the sa

RE: API for setting lock directory

2006-03-28 Thread Koji Sekiguchi
Hi Gopi, I don't know whether org.apache.lucene.lockDir is available in 2.0. You may want to see the release notes for the system property: http://svn.apache.org/repos/asf/lucene/java/tags/lucene_1_9_1/CHANGES.txt regards, Koji > -----Original Message----- > From: Gopikrishnan Subramani [mailto:[EM

Re: API for setting lock directory

2006-03-28 Thread Gopikrishnan Subramani
Thanks, Koji. I tried org.apache.lucene.lockDir and found that it works. But my only concern is whether this is the suggested approach or a deprecated one. I wanted to know this to make sure my application is Lucene 2.0 compliant. Thanks again. Gopi On 3/28/06, Koji Sekiguchi <[EMAIL PROTECT

RE: API for setting lock directory

2006-03-28 Thread Koji Sekiguchi
Hi Gopi, The name of the system property has been changed to org.apache.lucene.lockDir and it should still be available. Thank you, Koji > -----Original Message----- > From: Gopikrishnan Subramani [mailto:[EMAIL PROTECTED] > Sent: Tuesday, March 28, 2006 8:24 PM > To: java-user@lucene.apache.o
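For readers following this thread, a small sketch of setting that system property from Java may help. Only the property name org.apache.lucene.lockDir comes from the thread; the helper class LockDirConfig and its methods are hypothetical, and the fallback to java.io.tmpdir is an assumption based on the remark elsewhere in this thread that the lock was moved to the temp dir by default. How and when Lucene itself reads the property is not shown here.

```java
import java.io.File;

// Hypothetical helper; only the property name is taken from the thread.
class LockDirConfig {
    static final String LOCK_DIR_PROPERTY = "org.apache.lucene.lockDir";

    // Programmatic equivalent of passing -Dorg.apache.lucene.lockDir=<path> to the JVM.
    static void useLockDir(File dir) {
        System.setProperty(LOCK_DIR_PROPERTY, dir.getAbsolutePath());
    }

    // Resolve the directory this JVM would report, falling back to the temp dir
    // when the property is unset (assumed default, see the lead-in).
    static String effectiveLockDir() {
        return System.getProperty(LOCK_DIR_PROPERTY, System.getProperty("java.io.tmpdir"));
    }
}
```

A cautious usage pattern is to call useLockDir(...) at the very start of main, before any index is opened, since a value set after the library has already read the property would presumably be ignored.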

API for setting lock directory

2006-03-28 Thread Gopikrishnan Subramani
Hello, With Lucene 1.4.3 I used to use a different lock directory from the default by setting org.apache.lucene.lockdir system property. But as per http://lucene.apache.org/java/docs/systemproperties.html this is not supported in Lucene 1.9 and above. I don't see any equivalent API for lockDir pro

Re: Lucene Performance Issues

2006-03-28 Thread Eric Jain
thomasg wrote: 1) By default, Lucene only indexes the first 10,000 words from each document. When increasing this default, out-of-memory errors can occur. This implies that documents, or large sections thereof, are loaded into memory. ISYS has a very small memory footprint which is not affected by
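The practical effect of such a per-document word limit can be shown with a toy sketch. This is not Lucene code: the class FieldLengthCapDemo and its method are made up for illustration, and the sketch says nothing about memory use, which is the vendor's claim and is not verified here. In Lucene releases of this period the limit is, as far as I know, adjustable on the IndexWriter (maximum field length); check the API documentation of your version for the exact call.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of a token cap; the real indexing pipeline is more involved.
class FieldLengthCapDemo {
    // Collect the distinct terms of a document, but stop after maxTokens tokens.
    static Set<String> indexedTerms(String text, int maxTokens) {
        Set<String> terms = new HashSet<>();
        int seen = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (seen >= maxTokens) {
                break; // everything past the cap is never indexed
            }
            seen++;
            terms.add(token);
        }
        return terms;
    }
}
```

Any word that first appears after the cap cannot be found by a search, so long documents need the limit raised if their tail must be searchable.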

Lucene Performance Issues

2006-03-28 Thread thomasg
Hi, we are currently intending to implement a document storage / search tool using Jackrabbit and Lucene. We have been approached by a commercial search and indexing organisation called ISYS who are suggesting the following problems with using Lucene. We do have a requirement to store and search l

Re: How to write to and read from the same index

2006-03-28 Thread Patrick Kimber
Hi Nick, have you tried the Lucene Index Accessor contribution? We have a similar update/search pattern and it works very well. http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049 Patrick On 28/03/06, Nick Atkins <[EMAIL PROTECTED]> wrote: > I'm using Lucene runn