Index File System Limits

2007-04-24 Thread Andreas Guther
I am currently dealing with Lucene indexes about 8 GB in size. Searching is fast, but retrieving documents slows down returning results to the user. The index is also updated very frequently, about 3 times a minute or more. This leads to an index that grows very fast in number o

Re: Clustering in MultiSearcher Searchables

2007-04-24 Thread Sawan Sharma
Hi Doron, Yes, we can do it using MultiSearcher.subSearcher(int n), but that does not give us a cluster for each individual searcher. For that we would have to loop from i = 0 to 3000, which we want to avoid in our case. We need to show the number of hits from each searcher without looping over the hits. S

Re: DBSight Turns Free! Instant Lucene Search on Database!

2007-04-24 Thread Chris Lu
Just like Winamp, Trillian, or other excellent software, the free version can satisfy most of your needs, while advanced features such as scripting are something your boss needs to pay for. DBSight is still improving. If we can get any support, like how Doug is supported by Yahoo, we would be happy

Re: DBSight Turns Free! Instant Lucene Search on Database!

2007-04-24 Thread James liu
Does this mean DBSight is free? On 2007/4/24, Chris Lu <[EMAIL PROTECTED]> wrote: For those who may be interested, DBSight 1.4.0 now has unlimited index size in the free version! Basically, DBSight is like Solr plus a database adapter. You just point it at one or several SQL queries against any database, and you can

RE: MultiSearcher w/per-index filtering

2007-04-24 Thread Peter Goldstein
Hi Doron, Thanks for the help. I think you're right. I haven't tried this yet, and I hadn't noticed that CachingWrapperFilter caches a separate BitSet per IndexReader. So this may be simpler than I thought. I'll give it a whirl and see what happens. Regards, Peter

Re: MultiSearcher w/per-index filtering

2007-04-24 Thread Doron Cohen
Hi Peter, I think this is already taken care of by CachingWrapperFilter: its caching (like its filtering) is per IndexReader, a search through a MultiSearcher eventually filters against each underlying reader, and those per-reader "sub-" filters are cached. So it seems to me that if you ju
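Doron's point is that the cache is keyed by reader, so a MultiSearcher that filters each sub-reader separately ends up with one cached BitSet per sub-index. A minimal plain-Java sketch of that caching pattern follows. `PerReaderBitSetCache` and its compute function are hypothetical names, the generic key stands in for an IndexReader, and this is written in the spirit of CachingWrapperFilter rather than copied from Lucene's source.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch: cache one BitSet per "reader" key, computing it only on
 * first use. R stands in for an IndexReader.
 */
final class PerReaderBitSetCache<R> {
    // Weak keys (an assumption here): once a reader is no longer referenced
    // anywhere else, its cached BitSet becomes eligible for garbage collection.
    private final Map<R, BitSet> cache = new WeakHashMap<>();
    private final Function<R, BitSet> computeBits;
    private int computations = 0;

    PerReaderBitSetCache(Function<R, BitSet> computeBits) {
        this.computeBits = computeBits;
    }

    // Returns the cached bits for this reader, computing them the first time.
    synchronized BitSet bits(R reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            cached = computeBits.apply(reader);
            computations++;
            cache.put(reader, cached);
        }
        return cached;
    }

    // How many times the underlying (expensive) computation actually ran.
    synchronized int computations() {
        return computations;
    }
}
```

With three sub-readers behind a MultiSearcher, repeated searches would compute at most three BitSets and reuse them afterwards.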

Re: Straigtforward stemming example? Dictionary needed?

2007-04-24 Thread damien . mccarthy
I guess there are a few points. It is impossible to stem with total accuracy using rules alone, and combining a rule-based stemmer with a dictionary could also be error-prone. Unrelated words can have the same stem: consider "saw", the past tense of "see", and the stem of "sawing" (cutting wood). Stemming

Re: Straigtforward stemming example? Dictionary needed?

2007-04-24 Thread Andrew Green
On Tue, 24-04-2007 at 21:49 +0100, [EMAIL PROTECTED] wrote: > > For example, if I search for "eat", I'd like Lucene to find "eating", "eaten", "ate", etc. > Hi Andrew, The example you provide can only partially be performed using a rule-based stemmer, such as those used by S

Re: Straigtforward stemming example? Dictionary needed?

2007-04-24 Thread damien . mccarthy
Hi Andrew, The example you provide can only partially be performed using a rule-based stemmer, such as those used by Snowball. Most stemmers are capable of stemming eating, eats, and eaten to eat. However, they will not stem ate to eat. While in theory you could construct some form of dictionary
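To illustrate what combining a dictionary with rules might look like, here is a toy sketch that checks a small table of irregular forms before applying a few naive suffix rules. The `IRREGULAR` table, the suffix list, and the class name are assumptions for illustration only. This is not how Snowball works, and real rules would need to be far more careful.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Toy illustration, not Snowball: look the word up in a hypothetical table of
 * irregular forms first, then fall back to a few naive suffix rules.
 */
final class DictionaryThenRulesStemmer {
    // Hypothetical exception list; a usable one would be much larger.
    private static final Map<String, String> IRREGULAR = Map.of(
            "ate", "eat",
            "went", "go",
            "gone", "go");

    // Naive suffix rules, tried in order (longest first).
    private static final String[] SUFFIXES = {"ing", "en", "s"};

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        String irregular = IRREGULAR.get(w);
        if (irregular != null) {
            return irregular;
        }
        for (String suffix : SUFFIXES) {
            // Keep at least three characters so short words are not mangled.
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }
}
```

With this, eating, eats, eaten, and ate all reduce to eat. It also reproduces the collision mentioned in this thread: sawing reduces to saw, the same token as saw, the past tense of see. Whatever stemming is used has to be applied identically at index time and query time, for example inside a custom Analyzer (not shown).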

Re: Clustering in MultiSearcher Searchables

2007-04-24 Thread Doron Cohen
Hi Sawan, If I understand the question correctly, you use a MultiSearcher over three searchers s[0], s[1], s[2], get some 3000 search results, and for result x (0 <= x < 3000) need to know whether it came from s[0], s[1], or s[2]. If so, take a look at MultiSearcher.subSearcher(int n) (n would be the
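The mapping Doron points to can be sketched without Lucene: if starts[i] is the first global document number belonging to searcher i, a binary search finds the searcher that owns any hit. The class and method names below are hypothetical, and the sketch mirrors the idea behind MultiSearcher.subSearcher(int n) rather than reproducing Lucene's code. Per-searcher counts still require visiting each matching document once; this only makes each lookup cheap, and the tally could be kept inside a hit collector instead of a second pass over Hits.

```java
/**
 * Hypothetical sketch: map MultiSearcher-style global doc numbers back to the
 * sub-searcher they came from, and count hits per sub-searcher.
 * starts must be non-empty, ascending, and start at 0.
 */
final class SubSearcherCounts {
    // Returns the largest index i such that starts[i] <= n.
    static int subSearcher(int[] starts, int n) {
        int lo = 0;
        int hi = starts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always makes progress
            if (starts[mid] <= n) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Tallies how many of the given global doc numbers fall in each sub-searcher.
    static int[] countPerSearcher(int[] starts, int[] hitDocs) {
        int[] counts = new int[starts.length];
        for (int doc : hitDocs) {
            counts[subSearcher(starts, doc)]++;
        }
        return counts;
    }
}
```

For three indexes of 1000 documents each, starts would be {0, 1000, 2000}, and a hit with global number 1500 maps to searcher 1.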

Re: DBSight Turns Free! Instant Lucene Search on Database!

2007-04-24 Thread Chris Lu
Hi jaf, This is not new; I learned it from Doug. Basically, you maintain a mapping from document id to values and collect the values for each hit in a hit collector. Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.n
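Chris's description amounts to a lookup table from document id to a category value, plus a counter updated for every hit. A minimal plain-Java sketch follows. `docYear` is a hypothetical precomputed table (in Lucene it might be loaded from an indexed field, for example through FieldCache, which is an assumption here), and `collect(int doc, float score)` has the same shape as HitCollector's method, so the same body could go into a HitCollector subclass.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of counting categories during a single search:
 * docYear[doc] holds a precomputed value (here a year) for every document,
 * and collect() is called once per matching document.
 */
final class YearCountingCollector {
    private final int[] docYear;                          // doc id -> year
    private final Map<Integer, Integer> counts = new TreeMap<>();

    YearCountingCollector(int[] docYear) {
        this.docYear = docYear;
    }

    // Same shape as HitCollector.collect; the score is ignored for counting.
    void collect(int doc, float score) {
        counts.merge(docYear[doc], 1, Integer::sum);
    }

    // Year -> number of matching documents, in ascending year order.
    Map<Integer, Integer> counts() {
        return counts;
    }
}
```

Because the counts come from the same pass that collects the hits, one query yields both the results and the "Narrow By Year" numbers.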

Re: Straigtforward stemming example? Dictionary needed?

2007-04-24 Thread Doron Cohen
Hi Andrew, ahg <[EMAIL PROTECTED]> wrote on 24/04/2007 12:18:22: > Hi, all, I'm looking for a simple, straightforward example of how to use the Snowball stemmer to make Lucene search results return all variants of the terms searched for. For example, if I search for "eat", I'd like Lu

Re: DBSight Turns Free! Instant Lucene Search on Database!

2007-04-24 Thread jafarim
Hi Chris, Can you explain how? I know the source is available, but a short summary would be very useful for list readers. --jaf On 4/24/07, Chris Lu <[EMAIL PROTECTED]> wrote: Hi Saurabh, It's just one query, and it returns both hits and categorized counts. Chris --- Saurabh Dani <

Straigtforward stemming example? Dictionary needed?

2007-04-24 Thread ahg
Hi, all, I'm looking for a simple, straightforward example of how to use the Snowball stemmer to make Lucene search results return all variants of the terms searched for. For example, if I search for "eat", I'd like Lucene to find "eating", "eaten", "ate", etc. In particular, I'm not clear on wh

MultiSearcher w/per-index filtering

2007-04-24 Thread Peter Goldstein
All, I'm trying to solve the following problem and could use some help. My preferred approach appears to be blocked by Java access restrictions, and I'm not sure whether that's by design or by accident. I have a set of fixed search indices that get rebuilt on a 5-hour cycle; these indices are not up

re: DBSight Turns Free! Instant Lucene Search on Database!

2007-04-24 Thread Chris Lu
Hi Saurabh, It's just one query, and it returns both the hits and the categorized counts. Chris --- Saurabh Dani <[EMAIL PROTECTED]> wrote: > Hi Chris, How are you showing the hit counts in the "Narrow By Year" options on the left? Is this one query for each year, or does a single query return both the to

RE: Copy index while updating the index

2007-04-24 Thread Rajendranath, Divya
Would this be costlier than an fssync (filesystem sync) of the index folder from the primary site to the backup site? How is it different from a normal file sync operation? Would there be any data consistency issues? One option is to incrementally reindex the files on the primary site to replicate the i

RE: Adding large files to index

2007-04-24 Thread David Xiao
Consider reducing the size of each file. Splitting them into smaller pieces will definitely help the indexer work faster. A 50 MB plain-text file is an unusual size; very few text files reach it. There must be a very good reason if you have to keep all the information in one such big file. What do you think?
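Splitting as David suggests could look like the following sketch, where each piece would then be indexed as its own document. `TextSplitter` and `maxChars` are hypothetical names, and the split point falls back to the last space before the limit so that words are not cut in half. For a 50 MB file, reading everything into one String already costs memory, so a streaming, Reader-based variant would probably be preferable in practice.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: split a large text into pieces of at most maxChars
 * characters, preferring to break at a space.
 */
final class TextSplitter {
    static List<String> split(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxChars, text.length());
            if (end < text.length()) {
                // Break at the last space at or before the limit, if there is one.
                int space = text.lastIndexOf(' ', end);
                if (space > start) {
                    end = space;
                }
            }
            pieces.add(text.substring(start, end));
            start = end;
            // Skip the spaces we broke on so the next piece starts at a word.
            while (start < text.length() && text.charAt(start) == ' ') {
                start++;
            }
        }
        return pieces;
    }
}
```

Each returned piece could carry the same metadata (file name, path) as the original file, so search results can still point back to it.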

Re: Out of memory exception for big indexes

2007-04-24 Thread Artem Vasiliev
Hi Ivan! By the way, maybe forbidding sorted search when there are too many results is an option? That's what I did in my case. Regards, Artem. On 4/24/07, Artem Vasiliev <[EMAIL PROTECTED]> wrote: Ahhh, you said in your original post that your search matches _all_ the results. Yup, my patch will not h

Re: Out of memory exception for big indexes

2007-04-24 Thread Artem Vasiliev
Ahhh, you said in your original post that your search matches _all_ the results. Yup, my patch will not help much in this case; after all, all the values have to be read to be compared while sorting! :) The LUCENE-769 patch helps only if the result set is significantly smaller than the full index. Regards
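Artem's point about result-set size can be illustrated with a sort that loads each sort value lazily, and only for documents that actually matched. In this hypothetical sketch, `loadValue` stands in for reading a stored field; it is not the LUCENE-769 code. When a query matches every document, every value gets loaded anyway, which is why this kind of approach only pays off for small result sets.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

/**
 * Hypothetical sketch: sort hits by a value that is loaded on demand, only for
 * documents that are in the result set.
 */
final class LazyValueSorter {
    private final IntFunction<String> loadValue;          // stand-in for reading a stored field
    private final Map<Integer, String> loaded = new HashMap<>();

    LazyValueSorter(IntFunction<String> loadValue) {
        this.loadValue = loadValue;
    }

    // Returns the hit doc ids ordered by their (lazily loaded) sort value.
    List<Integer> sort(List<Integer> hitDocs) {
        List<Integer> sorted = new ArrayList<>(hitDocs);
        sorted.sort(Comparator.comparing(this::valueOf));
        return sorted;
    }

    // Loads a document's value on first use and reuses it afterwards.
    private String valueOf(Integer doc) {
        return loaded.computeIfAbsent(doc, d -> loadValue.apply(d));
    }

    // Number of documents whose value has been loaded so far.
    int loadedCount() {
        return loaded.size();
    }
}
```

Sorting 3 hits out of a large index loads 3 values; sorting a query that matches everything loads them all.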

Re: Out of memory exception for big indexes

2007-04-24 Thread Artem Vasiliev
Hello Ivan! I'm sorry to hear that you had bad results with that patch. :) The discussion in the ticket is out of date: the patch initially spanned several classes and used a WeakHashMap, but it then evolved into what it is now, a single StoredFieldSortFactory class. I use it in my sharehound app in pretty m

RE: Adding large files to index

2007-04-24 Thread Rajendranath, Divya
But I am facing the problem even with -Xms256m and -Xmx1024m. Yes, the file was not added to the index because the Java process was already using 1156 MB of memory, which is higher than the maximum heap size. But even after waiting a few minutes until memory usage came below the max heap value,

RE: Adding large files to index

2007-04-24 Thread David Xiao
Start your program with java -Xms50m; that gives a 50 MB initial heap (the maximum heap size is set separately with -Xmx). The OutOfMemoryError occurs because the default heap is not large enough for your application. -Original Message- From: Divya Rajendranath [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 24, 2007 7:01 PM To: java-use
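To check what -Xms and -Xmx actually gave the JVM, the standard Runtime API reports the current heap figures. This small sketch (the `HeapReport` name is hypothetical) returns them in megabytes. Keep in mind that the operating system's view of the process size also includes non-heap memory (thread stacks, JVM internals, native buffers), which is how a process can show more than its -Xmx value, as with the 1156m figure reported in this thread.

```java
import java.util.Locale;

/**
 * Hypothetical helper: summarizes the JVM heap as seen by Runtime.
 * maxMemory() reflects -Xmx, totalMemory() is the heap currently reserved
 * (starting near -Xms), and freeMemory() is the unused part of that.
 */
final class HeapReport {
    static String report() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return String.format(Locale.ROOT, "max=%dMB total=%dMB free=%dMB",
                rt.maxMemory() / mb, rt.totalMemory() / mb, rt.freeMemory() / mb);
    }
}
```

Printing `HeapReport.report()` at startup of a program launched with, say, `java -Xms256m -Xmx1024m ...` is a quick way to confirm that the flags were picked up.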

Adding large files to index

2007-04-24 Thread Divya Rajendranath
Hello All, Could anyone help me find a solution to the following problem? I am having trouble adding 50 MB files to my application. The application indexes documents on demand: whenever we add a file to our application, we first put the file details/metadata

How to get term frequency of multi terms and TimeRange?

2007-04-24 Thread SK R
Hi, How can I get the term frequency of multiple terms in a particular document? Is there any API method other than TermVector that may help? Also, how can I calculate the term frequency for a time range? For example, my index has a field "TIME" with values in milliseconds (like 1176281188000), and I want to calculate the term frequency of
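One way to frame the time-range question: if a document's per-term frequencies for the TIME field were available in sorted order, the frequency for a range is the sum over the terms inside that range. The sketch below uses a NavigableMap as a stand-in for that data, which is an assumption; in Lucene the values would have to come from walking the term dictionary (for example with a TermEnum and TermDocs). Since Lucene terms sort as strings, the numeric ordering used here only carries over if all millisecond values have the same number of digits or are zero-padded.

```java
import java.util.NavigableMap;

/**
 * Hypothetical sketch: sum the frequencies of all TIME terms whose value lies
 * in [fromMillis, toMillis], inclusive. termFreqs maps a TIME value to its
 * frequency in one document. Requires fromMillis <= toMillis.
 */
final class TimeRangeFreq {
    static int freqInRange(NavigableMap<Long, Integer> termFreqs, long fromMillis, long toMillis) {
        int total = 0;
        // subMap gives a live view of just the keys inside the range.
        for (int freq : termFreqs.subMap(fromMillis, true, toMillis, true).values()) {
            total += freq;
        }
        return total;
    }
}
```

With frequencies 2, 1, and 5 stored for 1176281188000, 1176281189000, and 1176281200000, the range 1176281188000 to 1176281190000 sums to 3.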

Re: How to get termfreq. of each doc for wildcard terms?

2007-04-24 Thread SK R
Hi, Does anybody have an idea about my previous post? Regards, RSK On 4/23/07, SK R <[EMAIL PROTECTED]> wrote: Hi, In my application, I sometimes need to find doc ids, with the term frequency of my terms, in my index of multi-line text tokenized and indexed with StandardAnalyzer. For this, I'm currently using *

Clustering in MultiSearcher Searchables

2007-04-24 Thread Sawan Sharma
Hi all, I am using a MultiSearcher to search more than one index folder. I have an IndexSearcher array containing 3 IndexSearchers: 01. C:\IndexFolder1 02. C:\IndexFolder2 03. C:\IndexFolder3 When I searched the 3 index folders using a MultiSearcher, I got 3000 hits: 1 to 1000 from C

re: DBSight Turns Free! Instant Lucene Search on Database!

2007-04-24 Thread Saurabh Dani
Hi Chris, How are you showing the hit counts in the "Narrow By Year" options on the left? Is this one query for each year, or does a single query return both the top 30 results and the hit counts for every category? Thanks, Saurabh [EMAIL PROTECTED]> Sent: Tuesday,

DBSight Turns Free! Instant Lucene Search on Database!

2007-04-24 Thread Chris Lu
For those who may be interested, DBSight 1.4.0 now has unlimited index size in the free version! Basically, DBSight is like Solr plus a database adapter. You just point it at one or several SQL queries against any database, and you have Lucene search! It has incremental indexing, index re-creation, Synch

Re: IndexReader method semantics

2007-04-24 Thread Daniel Noll
Chris Hostetter wrote: : Basically I'm thinking of writing a different kind of IndexReader which uses a database to return fake terms for things like tags. The idea would be that it can be slotted in alongside a real index via ParallelReader in order to provide the fast-changing part. So