details?
Yousef Ourabi wrote:
Saad,
Here is what I got. I will post again, and be more
specific.
-Y
--- Nader Henein <[EMAIL PROTECTED]> wrote:
We'll need a little more detail to help you, what
are the sizes of your
updates and how often are they updated.
1) No, just re-open the i
2) It all comes down to your needs, more detail would help us help you.
Nader Henein
Yousef Ourabi wrote:
Hey,
We are using Lucene to index a moderately changing database, and I have a couple of questions on a performance strategy.
1) Should we just have one index writer open until the system comes
Boost Phrase Terms as in the example:
"jakarta apache"^4 "jakarta lucene"
By default, the boost factor is 1. Although the boost factor must be
positive, it can be less than 1 (e.g. 0.2)
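The boost syntax described above can be sketched with a small helper that builds the query string; the helper and its name are illustrative, not part of Lucene's API:

```java
// Sketch: building a boosted phrase for Lucene's QueryParser syntax.
// A phrase is wrapped in quotes; appending ^N multiplies its score weight.
public class BoostedQuery {
    // Quote the phrase and append ^boost when it differs from the default of 1
    static String boostPhrase(String phrase, float boost) {
        String quoted = "\"" + phrase + "\"";
        return boost == 1.0f ? quoted : quoted + "^" + boost;
    }
}
```

So `boostPhrase("jakarta apache", 4f) + " " + boostPhrase("jakarta lucene", 1f)` yields the example query from the thread, with the first phrase weighted four times as heavily.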
Regards.
Nader Henein
Karthik N S w
it's all been
indexed and then continue on with incremental updates / deletes.
Nader Henein
[EMAIL PROTECTED] wrote:
Hi
I'm working on integrating Lucene with a CMS. All the data is stored in a database. I'm looking at about 2 million records. Any advice on an effective techniqu
Download Luke; it makes life easy when you inspect the index, so you can actually look at what you've indexed, as opposed to what you may think you indexed.
Nader
Daniel Cortes wrote:
Hi to everybody, and Merry Christmas to all (and especially people who, like me today, are "working" instead of st
this:
stored / indexed
stored / un-indexed
stored / un-indexed
stored / indexed
indexed / un-stored
Enjoy
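The stored/indexed combinations listed above can be modeled as plain flags; in the Lucene 1.x API these roughly correspond to the Field.Keyword, Field.UnIndexed, and Field.UnStored constructors, though the class below is purely illustrative:

```java
// Sketch: the three useful stored/indexed combinations for a field.
public class FieldFlags {
    final boolean stored, indexed;
    FieldFlags(boolean stored, boolean indexed) {
        this.stored = stored;
        this.indexed = indexed;
    }

    // stored + indexed: retrievable and searchable (e.g. a title)
    static FieldFlags keyword()   { return new FieldFlags(true, true); }
    // stored + un-indexed: retrievable only (e.g. a display URL)
    static FieldFlags unIndexed() { return new FieldFlags(true, false); }
    // indexed + un-stored: searchable only (e.g. the full body text)
    static FieldFlags unStored()  { return new FieldFlags(false, true); }
}
```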
Nader Henein
Daniel Cortes wrote:
thks nader
I need a general search of documents; it's for this that I ask your recommendations, because fields are only for info in the searc
It comes down to your searching needs: do you need to have your documents searchable by these fields, or do you need a general search of the whole document? Your decisions will impact the size of the index and the speed of indexing and searching, so give it due thought; start from your GUI requi
As obvious as it may seem, you could always store the ID of the index in which you are indexing the document in the document itself and have that fetched with the search results, or is there something stopping you from doing that?
Nader Henein
Karthik N S wrote:
Hi Guys
Apologies...
I
This is an OS file system error, not a Lucene issue (not for this board). Google it for Gentoo specifically and you get a whole bunch of results, one of which is this thread on the Gentoo Forums:
http://forums.gentoo.org/viewtopic.php?t=9620
Good Luck
Nader Henein
Karthik N S wrote:
Hi Guys
But, as you mentioned, it comes down to what you need the Thin DB for. Lucene is a wonderful search engine, but if I were looking at a fast and dirty relational DB, MySQL wins hands down; put them both together and you've really got something.
My 2 cents
Nader Henein
Kevin L. Cobb wrote:
I
Dude, and I say this with love, it's open source, you've got the code,
take the initiative, DIY, be creative and share your findings with the
rest of us.
Personally I would be interested to see how you do this, keep your
changes documented and share.
Nader Henein
Karthik N S wrot
depending on your needs.
My two galleons
Nader Henein
Karthik N S wrote:
Hi Guys
Apologies.
On Yahoo and AltaVista, a search for a word like 'kid' returns suggestions similar to the ones below.
Also try: kid rock, kid games, star wars kid, karate kid More...
How to obtain the s
You may singe your fingers if you touch the keyboard during indexing
Nader
Miguel Angel wrote:
What are the disadvantages of Lucene?
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
The down and dirty answer is that it's like defragmenting your hard drive: you're basically compacting and sorting out index references. What you need to know is that it makes searching much faster after you've updated the index.
Nader Henein
Miguel Angel wrote:
What does Opt
Well, if the document ID is a number (even if it isn't stored as one) you could use a range query, or just rebuild your index using that specific field as a sorted field; but if it is numeric, be aware that using an integer limits how high your numbers can get.
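One wrinkle with range queries over numeric IDs is that Lucene compares terms lexicographically, so "9" sorts after "10" unless the IDs are zero-padded to a fixed width before indexing. A minimal sketch, where the width and the field name "id" are assumptions:

```java
// Sketch: zero-pad numeric IDs so lexicographic term order matches
// numeric order, then build a QueryParser-syntax range over them.
public class PaddedId {
    static final int WIDTH = 10; // assumed fixed width; caps the max ID

    static String pad(long id) {
        return String.format("%0" + WIDTH + "d", id); // 42 -> "0000000042"
    }

    static String rangeQuery(long lo, long hi) {
        return "id:[" + pad(lo) + " TO " + pad(hi) + "]";
    }
}
```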
nader
Edwin Tang wrote:
Hello,
I have been
That's it: you need to batch your updates. It comes down to whether you need to give your users search accuracy to the second. Take your database, put an is_dirty column on the master table of the object you're indexing, run a scheduled task every x minutes, and have your process read the objects
ng to overwhelm Lucene.
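The batching loop described above can be sketched in plain Java; Record stands in for a row of the master table, and the comment marks where the real Lucene delete/add calls would go:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: one pass of a scheduled task that reindexes dirty rows
// and clears the flag so the next pass skips them.
public class DirtyBatch {
    static class Record {
        final String id;
        boolean dirty;
        Record(String id, boolean dirty) { this.id = id; this.dirty = dirty; }
    }

    // Returns the ids that were reindexed on this pass
    static List<String> flushDirty(List<Record> table) {
        List<String> done = new ArrayList<>();
        for (Record r : table) {
            if (!r.dirty) continue;
            // here: delete the stale Lucene document, add the fresh one
            done.add(r.id);
            r.dirty = false; // mark clean
        }
        return done;
    }
}
```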
What's your update schedule, how big is the index, and after how many updates
does the system crash?
Nader Henein
Luke Shannon wrote:
It consistently breaks when I run more than 10 concurrent incremental updates.
I can post the code on Bugzilla (hopefully when I get to the si
We've recently implemented something similar, with the backup process creating a file (much like the lock files during indexing) that the IndexWriter recognizes (a tweak) so it doesn't attempt to start an indexing run or a delete while it's there; it wasn't that much work, actually.
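The guard-file idea above can be sketched with plain java.io; the marker name "backup.lock" is an assumption, and in the real version the writer would poll safeToWrite() before opening the index:

```java
import java.io.File;
import java.io.IOException;

// Sketch: the backup process drops a marker file next to the index,
// and the indexing code refuses to start while it exists.
public class BackupGuard {
    static boolean safeToWrite(File indexDir) {
        return !new File(indexDir, "backup.lock").exists();
    }

    static void beginBackup(File indexDir) {
        try {
            new File(indexDir, "backup.lock").createNewFile();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    static void endBackup(File indexDir) {
        new File(indexDir, "backup.lock").delete();
    }
}
```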
Nader
Doug Cutting wrot
boost up speed using RAMDirectory if you need more speed from the search, but whichever approach you choose, I would recommend that you sit down and do some number crunching to figure out which way to go.
Hope this helps
Nader Henein
Chris Lamprecht wrote:
I'd like to implement a searc
cents
Nader Henein
Karthik N S wrote:
Hi Guys
Apologies.
a)
1) SEARCH FOR SUBINDEX IN A OPTIMISED MERGED INDEX
2) DELETE THE FOUND SUBINDEX FROM THE OPTIMISED MERGERINDEX
3) OPTIMISE THE MERGERINDEX
4) ADD A NEW VERSION OF THE SUBINDEX TO THE MERGER INDEX
5) OPTIMISE THE MERGERINDEX
Graceful, no. I started a discussion on this about two years ago. What I'm doing is batched indexing, so if a crash occurs, the next time the application starts up a LuceneInit class goes and ensures that all indices have no locks on them, by simply deleting the lock file and opti
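A minimal sketch of that startup cleanup, assuming the lock files sit in the index directory with a ".lock" suffix (as in older Lucene versions, e.g. write.lock and commit.lock); the class name follows the LuceneInit idea from the message:

```java
import java.io.File;

// Sketch: on application startup, remove stale lock files left behind
// by a crash before reopening (and optionally optimizing) each index.
public class LuceneInit {
    // Returns how many stale lock files were removed from one index dir
    static int clearStaleLocks(File indexDir) {
        int cleared = 0;
        File[] files = indexDir.listFiles();
        if (files == null) return 0;
        for (File f : files) {
            if (f.getName().endsWith(".lock") && f.delete()) cleared++;
        }
        return cleared;
    }
}
```

Note this is only safe at startup, before any writer thread is running; deleting a lock held by a live IndexWriter would corrupt the index.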
eciate it if you could CC me on
the docs or the code.
Thanks!
Yonik
--- Nader Henein <[EMAIL PROTECTED]> wrote:
It's pretty integrated into our system at this point. I'm working on packaging it and cleaning up my documentation, and then I'll make it available. I can give you th
you can do both at the same time; it's thread safe. You will face different issues depending on the frequency of your indexing and the load on the search, but that shouldn't come into play till your index gets nice and heavy. So basically, code on.
Nader Henein
Miro Max wrote:
hi,
It's pretty integrated into our system at this point. I'm working on packaging it and cleaning up my documentation, and then I'll make it available. I can give you the documents, and if you still want the code I'll slap together a rough copy for you and ship it across.
Nader H
We use Lucene over 4 replicated indices, and we have to maintain atomicity on deletion and updates with multiple fallback points. I'll send you the write-up; it's too big to CC the entire board.
nader henein
Christian Rodriguez wrote:
Hello guys,
I need additions and deletions of do
Well, are you "storing" any data for retrieval from the index? Because you could encrypt the actual data and then encrypt the search string, public-key style.
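A minimal sketch of encrypting a stored field's value before it goes into the index and decrypting it on retrieval. The thread suggests public-key style; a symmetric AES key is used here purely to keep the sketch short, and the class name is illustrative:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Sketch: encrypt the bytes of a stored field value so the raw index
// files can't simply be read back; decrypt when displaying results.
public class StoredFieldCrypto {
    static SecretKey newKey() {
        try { return KeyGenerator.getInstance("AES").generateKey(); }
        catch (Exception e) { throw new RuntimeException(e); }
    }

    static byte[] encrypt(SecretKey key, byte[] plain) {
        try {
            Cipher c = Cipher.getInstance("AES");
            c.init(Cipher.ENCRYPT_MODE, key);
            return c.doFinal(plain);
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    static byte[] decrypt(SecretKey key, byte[] enc) {
        try {
            Cipher c = Cipher.getInstance("AES");
            c.init(Cipher.DECRYPT_MODE, key);
            return c.doFinal(enc);
        } catch (Exception e) { throw new RuntimeException(e); }
    }
}
```

Note this only protects stored values; indexed terms would still be readable unless you index encrypted (or hashed) terms and encrypt the search string the same way.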
Nader Henein
Weir, Michael wrote:
We need to have index files that can't be reverse engineered, etc. An
obvious appro
As far as my testing showed, the sort will take priority, because it's basically an opt-in sort as opposed to the default score sort. So you're basically displaying a sorted set over all your results, as opposed to sorting the most relevant results.
Hope this helps
Nader He
I'd be happy to help anyone test this out, my Arabic is pretty good.
Nader
Andrzej Bialecki wrote:
Dawid Weiss wrote:
nothing to do with each other; furthermore, Arabic uses phonetic indicators on each letter, called diacritics, that change the way you pronounce the word, which in turn changes the w
(averaging out at 5 searches per second) the Arabic stemming
options would not be able to manage user expectations, which is what it
comes down to, sometimes theory does not translate well to practice.
Nader Henein
Dawid Weiss wrote:
nothing to do with each other furthermore, Arabic uses phonetic
paper from Berkeley that
outlines the work and the challenges,
http://metadata.sims.berkeley.edu/papers/trec2002.pdf, hope it helps.
Nader Henein
Scott Smith wrote:
Is anyone aware of an open source (non-GPL, i.e. free for commercial use) Arabic analyzer for Lucene? Does Arabic really require a
be a pleasure; just didn't want to lead someone down the wrong path. Give me a few days and I'll have the new version up.
Nader
I'll send you both the early document and the newer version that deals squarely with Lucene in a distributed environment with a high-volume index.
Regards.
Nader Henein
Ben Sinclair wrote:
My application currently uses Lucene with an index living on the
filesystem, and it works fine. I'm moving t
Here's the thread you want :
http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1722573
Nader Henein
Kevin A. Burton wrote:
I'm trying to burn an index of 14M documents.
I have two problems.
1. I have to run optimize() every 50k documents or I run out of file
handles. t
's say 5000 files at a time and then deleting them or moving them into another location; if you get 100 million files, simply up the precision on the directory to a 3-digit or a 4-digit setup (once you automate it, the sky's the limit)
Hope this helps
Nader Henein
Michael Wechner wrote:
a read; it helps clear out some myths.
Nader Henein
Could you share your indexing code? And, just to make sure, is there anything running on your machine that could delete these files, like a cron job that backs up the index?
You could go by process of elimination: shut down your server and see if the files disappear, because if the problem is
I hate to speak after Otis, but the way we deal with this is by clearing locks on server restart, in case a server crash occurs mid-indexing; we also optimize on server restart. It doesn't happen often (God bless Resin), but when it has, we have faced no problems from Lucene.
Just for the record, we h
Why do you have concurrency problems? Are you trying to have each user initiate the indexing himself? Because that will create issues. How about you put all the new files you want to index in a directory and then have a scheduled procedure on the webserver run the Lucene indexer on that directory, ou
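The spool-directory idea above can be sketched with a single scheduled thread; sweep() just counts files here, where the real version would hand each file to the Lucene indexer and then move it out of the directory. All names are illustrative:

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: new documents are dropped into a spool directory and one
// scheduled task sweeps it, so only a single thread ever writes.
public class SpoolIndexer {
    // One sweep over the spool; returns how many files it saw
    static int sweep(File spoolDir) {
        File[] files = spoolDir.listFiles();
        // real version: index each file with Lucene, then move it away
        return files == null ? 0 : files.length;
    }

    static ScheduledExecutorService start(File spoolDir, long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        // a single writer thread sidesteps the concurrency problems above
        ses.scheduleWithFixedDelay(() -> sweep(spoolDir), 0, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```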