Re: Re-Indexing a moving target???

2005-02-01 Thread Nader Henein
details? Yousef Ourabi wrote: Saad, Here is what I got. I will post again, and be more specific. -Y --- Nader Henein <[EMAIL PROTECTED]> wrote: We'll need a little more detail to help you, what are the sizes of your updates and how often are they updated. 1) No just re-open the i

Re: Re-Indexing a moving target???

2005-01-28 Thread Nader Henein
2) It all comes down to your needs, more detail would help us help you. Nader Henein Yousef Ourabi wrote: Hey, We are using lucene to index a moderately changing database, and I have a couple of questions on a performance strategy. 1) Should we just have one index writer open until the system comes

Re: QUERYPARSIN & BOOSTING

2005-01-11 Thread Nader Henein
oost Phrase Terms as in the example: "jakarta apache"^4 "jakarta lucene" By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2) Regards. Nader Henein Karthik N S w
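
The boost syntax quoted in this reply can be illustrated with a small helper that builds the query string. This is a sketch in Python, not Lucene's own Java API; `boost_phrase` is a hypothetical name, and the positivity check simply mirrors the rule stated in the reply.

```python
def boost_phrase(phrase: str, boost: float = 1.0) -> str:
    """Format a phrase term with an optional boost, query-parser style."""
    if boost <= 0:
        raise ValueError("boost factor must be positive")
    quoted = f'"{phrase}"'
    # A boost of 1 is the default, so it can be left off.
    if boost == 1:
        return quoted
    return f"{quoted}^{boost:g}"


# Rebuild the example from the reply: the first phrase is boosted by 4,
# the second keeps the default boost of 1.
query = " ".join([boost_phrase("jakarta apache", 4), boost_phrase("jakarta lucene")])
print(query)
```

A boost below 1 (for example 0.2) is accepted and lowers the weight of the phrase relative to the others.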

Re: Advice on indexing content from a database

2005-01-05 Thread Nader Henein
it's all been indexed and then continue on with incremental updates / deletes. Nader Henein [EMAIL PROTECTED] wrote: Hi I'm working on integrating lucene with a cms. All the data is stored in a database. I'm looking at about 2 million records. Any advice on an effective techniqu

Re: time of indexer

2004-12-28 Thread Nader Henein
Download Luke, it makes life easy when you inspect the index, so you can actually look at what you've indexed, as opposed to what you may think you indexed. Nader Daniel Cortes wrote: Hi to everybody, and Merry Christmas for all (and specially people who that me today are "working" instead of st

Re: index question

2004-12-27 Thread Nader Henein
this: stored/ indexed stored/ un-indexed stored/ un-indexed stored / indexed indexed / un stored Enjoy Nader Henein Daniel Cortes wrote: thks nader I need a general search of documents, it's for this that I ask your recommendations, because fields are only for info in the searc

Re: index question

2004-12-27 Thread Nader Henein
It comes down to your searching needs: do you need your documents to be searchable by these fields, or do you need a general search of the whole document? Your decisions will impact the size of the index and the speed of indexing and searching, so give it due thought; start from your GUI requi

Re: MergerIndex + Searchables

2004-12-21 Thread Nader Henein
As obvious as it may seem, you could always store the ID of the index in which you are indexing the document in the document itself and have that fetched with the search results. Or is there something stopping you from doing that? Nader Henein Karthik N S wrote: Hi Guys Apologies... I
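
The suggestion in this reply, recording the source index on each document so it comes back with the hits, can be sketched without Lucene at all. The snippet below is a minimal illustration in Python using plain dictionaries as documents; the field name `index_id` and both function names are hypothetical, and the substring match stands in for a real search.

```python
def tag_with_index_id(doc: dict, index_id: str) -> dict:
    """Return a copy of the document with its source index recorded as a field."""
    tagged = dict(doc)
    tagged["index_id"] = index_id  # hypothetical field name
    return tagged


def search(merged_index: list, term: str) -> list:
    """Naive substring search; each hit reports which sub-index it came from."""
    return [
        (doc["index_id"], doc["title"])
        for doc in merged_index
        if term.lower() in doc["body"].lower()
    ]


# Documents from two sub-indexes end up in one merged collection.
merged_index = [
    tag_with_index_id({"title": "Intro", "body": "Lucene basics"}, "index_a"),
    tag_with_index_id({"title": "FAQ", "body": "Merging Lucene indexes"}, "index_b"),
    tag_with_index_id({"title": "News", "body": "Unrelated text"}, "index_b"),
]
hits = search(merged_index, "lucene")
print(hits)
```

Because the identifier travels with the document, no lookup back into the individual sub-indexes is needed once they have been merged.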

Re: LUCENE1.4.1 - LUCENE1.4.2 - LUCENE1.4.3 Exception

2004-12-15 Thread Nader Henein
This is an OS file system error, not a Lucene issue (not for this board). Google it for Gentoo specifically and you'll get a whole bunch of results, one of which is this thread on the Gentoo Forums: http://forums.gentoo.org/viewtopic.php?t=9620 Good Luck Nader Henein Karthik N S wrote: Hi Guys

Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread Nader Henein
ut, as you mentioned, it comes down to what you need the Thin DB for. Lucene is a wonderful search engine, but if I were looking at a fast and dirty relational DB, MySQL wins hands down; put them both together and you've really got something. My 2 cents Nader Henein Kevin L. Cobb wrote: I

Re: HITCOLLECTOR+SCORE+DELIMMA

2004-12-13 Thread Nader Henein
Dude, and I say this with love, it's open source, you've got the code, take the initiative, DIY, be creative and share your findings with the rest of us. Personally I would be interested to see how you do this, keep your changes documented and share. Nader Henein Karthik N S wrot

Re: SEARCH CRITERIA

2004-11-30 Thread Nader Henein
depending on your needs. My two galiuns Nader Henein Karthik N S wrote: Hi Guys Apologies. On Yahoo and Altavista, a search on a word like 'kid' returns similar searches, as below. Also try: kid rock, kid games, star wars kid, karate kid More... How to obtain the s

Re: disadvantages

2004-11-21 Thread Nader Henein
You may singe your fingers if you touch the keyboard during indexing Nader Miguel Angel wrote: What are disadvantages the Lucene?? - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Optimized??

2004-11-20 Thread Nader Henein
The down and dirty answer is it's like defragmenting your hard drive: you're basically compacting and sorting out index references. What you need to know is that it makes searching so much faster after you've updated the index. Nader Henein Miguel Angel wrote: What`s mean Opt

Re: Need help with filtering

2004-11-16 Thread Nader Henein
Well, if the document ID is a number (even if it isn't really) you could use a range query, or just rebuild your index using that specific field as a sorted field, but if it is numeric be aware that if you use an integer it limits how high your numbers can get. nader Edwin Tang wrote: Hello, I have been
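
One practical detail behind a range query on numeric IDs: range queries in Lucene releases of this period compared terms as strings, so numbers generally had to be zero-padded to a fixed width to sort correctly. The Python sketch below illustrates that idea only; `ID_WIDTH`, `pad_id`, and `in_range` are hypothetical names, and the chosen width is an assumption that, as the reply warns, caps how high the IDs can go.

```python
ID_WIDTH = 10  # assumed width; IDs with more than 10 digits would not fit


def pad_id(doc_id: int, width: int = ID_WIDTH) -> str:
    """Zero-pad an ID so that string order matches numeric order."""
    if doc_id < 0 or len(str(doc_id)) > width:
        raise ValueError("document ID does not fit the fixed width")
    return str(doc_id).zfill(width)


def in_range(term: str, lower: int, upper: int) -> bool:
    """Inclusive range check performed purely on strings."""
    return pad_id(lower) <= term <= pad_id(upper)


terms = [pad_id(i) for i in [7, 42, 100, 2005]]
matches = [t for t in terms if in_range(t, 10, 500)]
print(matches)

# Without padding, string comparison puts "100" before "42".
print("100" < "42")
```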

Re: _4c.fnm missing

2004-11-16 Thread Nader Henein
That's it, you need to batch your updates. It comes down to whether you need to give your users search accuracy to the second. Take your database and put an is_dirty column on the master table of the object you're indexing, run a scheduled task every x minutes, and have your process read the objects
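
The is_dirty batching idea in this reply can be sketched as follows. This is a minimal Python illustration using an in-memory SQLite database, not the poster's actual setup: the `articles` table, its columns other than `is_dirty`, the batch size, and the `index_fn` callback (a stand-in for the real indexing call) are all assumptions.

```python
import sqlite3


def index_dirty_rows(conn, index_fn, batch_size=100):
    """Index one batch of rows flagged dirty, then clear their flags."""
    rows = conn.execute(
        "SELECT id, body FROM articles WHERE is_dirty = 1 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, body in rows:
        index_fn(row_id, body)  # stand-in for adding the row to the index
    conn.executemany(
        "UPDATE articles SET is_dirty = 0 WHERE id = ?",
        [(row_id,) for row_id, _ in rows],
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT, is_dirty INTEGER DEFAULT 1)"
)
conn.executemany(
    "INSERT INTO articles (body) VALUES (?)", [("first",), ("second",), ("third",)]
)
conn.commit()

indexed = []
record = lambda row_id, body: indexed.append((row_id, body))
# Each call plays the role of one run of the scheduled task.
runs = [index_dirty_rows(conn, record, batch_size=2) for _ in range(3)]
print(runs, indexed)
```

A scheduler (cron or similar) would call `index_dirty_rows` every few minutes, so the index lags the database by at most one interval while a single process does all the writing.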

Re: _4c.fnm missing

2004-11-16 Thread Nader Henein
ng to overwhelm Lucene. What's your update schedule, how big is the index, and after how many updates does the system crash? Nader Henein Luke Shannon wrote: It consistently breaks when I run more than 10 concurrent incremental updates. I can post the code on Bugzilla (hopefully when I get to the si

Re: Backup strategies

2004-11-16 Thread Nader Henein
We've recently implemented something similar, with the backup process creating a file (much like the lock files during indexing) that the IndexWriter recognizes (tweak) and doesn't attempt to start an indexing or a delete while it's there. It wasn't that much work actually. Nader Doug Cutting wrot
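
The marker-file approach described in this reply can be sketched in a few lines. The Python below is an illustration under stated assumptions, not the poster's tweak to IndexWriter: the marker name `backup.in_progress` and both function names are hypothetical, and appending to a list stands in for the real indexing work.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path

MARKER_NAME = "backup.in_progress"  # hypothetical marker file name


@contextmanager
def backup_in_progress(index_dir: Path):
    """Create the marker for the duration of a backup and remove it afterwards."""
    marker = index_dir / MARKER_NAME
    marker.touch()
    try:
        yield marker
    finally:
        marker.unlink()


def try_index(index_dir: Path, doc: str, log: list) -> bool:
    """Skip the update while a backup marker is present."""
    if (index_dir / MARKER_NAME).exists():
        return False
    log.append(doc)  # stand-in for the real indexing or delete
    return True


with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp)
    log = []
    with backup_in_progress(index_dir):
        during_backup = try_index(index_dir, "doc-1", log)
    after_backup = try_index(index_dir, "doc-1", log)

print(during_backup, after_backup, log)
```

The check and the write are not atomic here, so this only works when the indexer and the backup cooperate on a schedule; it is a convention, not a lock.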

Re: How to efficiently get # of search results, per attribute

2004-11-13 Thread Nader Henein
boost up speed using RAMDirectory if you need more speed from the search, but whichever approach you choose I would recommend that you sit down and do some number crunching to figure out which way to go. Hope this helps Nader Henein Chris Lamprecht wrote: I'd like to implement a searc

Re: UPDATION+MERGERINDEX

2004-11-07 Thread Nader Henein
cents Nader Henein Karthik N S wrote: Hi Guys Apologies. a) 1) SEARCH FOR SUBINDEX IN A OPTIMISED MERGED INDEX 2) DELETE THE FOUND SUBINDEX FROM THE OPTIMISED MERGERINDEX 3) OPTIMISE THE MERGERINDEX 4) ADD A NEW VERSION OF THE SUBINDEX TO THE MERGER INDEX 5) OPTIMISE THE MERGERINDEX

Re: commit lock, graceful handler

2004-11-02 Thread Nader Henein
Graceful, no. I started a discussion on this about two years ago. What I'm doing is batched indexing, so if a crash occurs, the next time the application starts up I have a LuceneInit class that goes and ensures that all indices have no locks on them by simply deleting the lock file and opti
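
The startup cleanup described in this reply might look like the sketch below. It is written in Python for illustration and is not the poster's LuceneInit class: the `*.lock` pattern and the file names in the demo are assumptions, and the follow-up optimize step is left out. Deleting locks this way is only safe when no other process could still be writing to the index.

```python
import tempfile
from pathlib import Path


def clear_stale_locks(lock_dir: Path, pattern: str = "*.lock") -> list:
    """Delete leftover lock files and return their names, sorted."""
    removed = []
    for lock_file in sorted(lock_dir.glob(pattern)):
        lock_file.unlink()
        removed.append(lock_file.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    lock_dir = Path(tmp)
    # Simulate a crash that left two lock files next to an index file.
    (lock_dir / "write.lock").touch()
    (lock_dir / "commit.lock").touch()
    (lock_dir / "segments").touch()
    removed = clear_stale_locks(lock_dir)
    remaining = sorted(p.name for p in lock_dir.iterdir())

print(removed, remaining)
```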

Re: Atomicity in Lucene operations

2004-10-18 Thread Nader Henein
eciate it if you could CC me on the docs or the code. Thanks! Yonik --- Nader Henein <[EMAIL PROTECTED]> wrote: It's pretty integrated into our system at this point, I'm working on packaging it and cleaning up my documentation and then I'll make it available, I can give you th

Re: simultanous search and indexing

2004-10-17 Thread Nader Henein
You can do both at the same time, it's thread safe. You will face different issues depending on the frequency of your indexing and the load on the search, but that shouldn't come into play till your index gets nice and heavy. So basically code on. Nader Henein Miro Max wrote: hi,

Re: Atomicity in Lucene operations

2004-10-17 Thread Nader Henein
It's pretty integrated into our system at this point. I'm working on packaging it and cleaning up my documentation and then I'll make it available. I can give you the documents, and if you still want the code I'll slap together a rough copy for you and ship it across. Nader H

Re: Atomicity in Lucene operations

2004-10-15 Thread Nader Henein
We use Lucene over 4 replicated indices and we have to maintain atomicity on deletion and updates with multiple fallback points. I'll send you the write-up, it's too big to CC the entire board. nader henein Christian Rodriguez wrote: Hello guys, I need additions and deletions of do

Re: Encrypted indexes

2004-10-13 Thread Nader Henein
Well, are you "storing" any data for retrieval from the index? Because you could encrypt the actual data and then encrypt the search string public key style. Nader Henein Weir, Michael wrote: We need to have index files that can't be reverse engineered, etc. An obvious appro

Re: sorting and score ordering

2004-10-12 Thread Nader Henein
As far as my testing showed, the sort will take priority, because it's basically an opt-in sort as opposed to the defaulted score sort. So you're basically displaying a sorted set over all your results as opposed to sorting the most relevant results. Hope this helps Nader He

Re: Arabic analyzer

2004-10-07 Thread Nader Henein
I'd be happy to help anyone test this out, my Arabic is pretty good. Nader Andrzej Bialecki wrote: Dawid Weiss wrote: nothing to do with each other. Furthermore, Arabic uses phonetic indicators on each letter called diacritics that change the way you pronounce the word which in turn changes the w

Re: Arabic analyzer

2004-10-07 Thread Nader Henein
(averaging out at 5 searches per second) the Arabic stemming options would not be able to manage user expectations, which is what it comes down to; sometimes theory does not translate well to practice. Nader Henein Dawid Weiss wrote: nothing to do with each other. Furthermore, Arabic uses phonetic

Re: Arabic analyzer

2004-10-06 Thread Nader Henein
paper from Berkeley that outlines the work and the challenges, http://metadata.sims.berkeley.edu/papers/trec2002.pdf, hope it helps. Nader Henein Scott Smith wrote: Is anyone aware of an open source (non-GPL; i.e., free for commercial use) Arabic analyzer for Lucene? Does Arabic really require a

Re: Moving from a single server to a cluster

2004-09-08 Thread Nader Henein
be a pleasure, just didn't want to mislead someone down the wrong way. Give me a few days and I'll have the new version up. Nader

Re: Moving from a single server to a cluster

2004-09-08 Thread Nader Henein
l send you both the early document and the newer version that deals squarely with Lucene in a distributed environment with high volume index. Regards. Nader Henein Ben Sinclair wrote: My application currently uses Lucene with an index living on the filesystem, and it works fine. I'm moving t

Re: Most efficient way to index 14M documents (out of memory/file handles)

2004-07-06 Thread Nader Henein
Here's the thread you want: http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1722573 Nader Henein Kevin A. Burton wrote: I'm trying to burn an index of 14M documents. I have two problems. 1. I have to run optimize() every 50k documents or I run out of file handles. t

Re: incrementally indexing a million documents

2004-06-15 Thread Nader Henein
's say 5000 files at a time and then deleting them or moving them into another location. If you get 100 million files, simply up the precision on the directory to a 3 digit setup or a 4 digit setup (once you automate it, sky's the limit). Hope this helps Nader Henein Michael Wechner wrote:

Re: Devnagari Search?

2004-06-10 Thread Nader Henein
a read it helps clear out some myths. Nader Henein

RE: Disappearing segments

2004-04-30 Thread Nader Henein
Could you share your indexing code? And just to make sure, is there anything running on your machine that could delete these files, like a cron job that'll back up the index? You could go by process of elimination and shut down your server and see if the files disappear, coz if the problem is

RE: read only file system

2004-04-30 Thread Nader Henein
I hate to speak after Otis, but the way we deal with this is by clearing locks on server restart, in case a server crash occurs mid-indexing, and we also optimize on server restart. It doesn't happen often (God bless Resin) but when it has we faced no problems from Lucene. Just for the record we h

Re: Multi-Threading

2003-08-19 Thread Nader Henein
Why do you have concurrency problems? Are you trying to have each user initiate the indexing himself? Because that will create issues. How about you put all the new files you want to index in a directory and then have a scheduled procedure on the webserver run the Lucene indexer on that directory, ou