date:20140516

A work around to get matching terms from document - Stemmed and Synonyms

2014-05-16 Thread venkatesham.gu...@igate.com

I am looking for a feature in SOLR that will give me all matched terms in the document when I search with a query term. My SOLR field uses Stemming and Synonym filters; as a result, I am unable to find which term in the document matched from the index. As a workaround I have written a progra

Re: How to add machine learning to Apache lucene

2014-05-16 Thread Diego Fernandez

I've actually been wondering about this as well. More specifically, I've been wondering if there's any kind of framework to integrate some sort of learn to rank approach (http://en.wikipedia.org/wiki/Learning_to_rank) to Lucene/Solr. Although a similar result can be accomplished by using boost

Re: writer.updateDocument() not working (possible bug?)

2014-05-16 Thread Michael McCandless

You can retrieve the raw content for each field (assuming you stored it). But then you must re-generate a Document from the raw content yourself, as you did originally. I.e., you cannot rely on Lucene to remember schema-like things such as boost, the FieldType (how the postings were indexed, whether te

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-16 Thread teko

Wow, man! Forget what I said before! I tried your method. Generating the index is still a bit slower (1/2 minutes more), but querying works very well, and it is very fast! Really, it is so fast here that what generates the bottleneck is the write insi

Re: Can RAMDirectory work for gigabyte data which needs refreshing of the index all the time?

2014-05-16 Thread Toke Eskildsen

On Wed, 2014-05-07 at 15:46 +0200, Cheng wrote:
> I have an index of multiple gigabytes which serves 5-10 threads and needs
> refreshing very often. I wonder if RAMDirectory is the good candidate for
> this purpose. If not, what kind of directory is better?
RAMDirectory will probably give you poor

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-16 Thread teko

Emanuel Buzek, I tried the 'ShingleFilter' method first and thought it worked well, but in the end it still did not work the way I want. So I tried NGram: I created a new analyzer that uses it and ran a test. It works, but I still need to do some manual validation to e

AW: best choice for ramBufferSizeMB

2014-05-16 Thread Gudrun Siedersleben

Thanks for your answer. At the moment we use one single thread for indexing. Working with several threads is a possibility we should try. Testing with different values for ramBufferSizeMB between 16 MB and 256 MB showed that from 128 MB upward there was no improvement, as you already mentioned. Du

search time & number of segments

2014-05-16 Thread De Simone, Alessandro

Hello everyone, We have had a performance issue ever since we stopped optimizing the index. We are using Lucene 4.8 (32-bit JVM for searching, 64-bit JVM for indexing) on Windows 2008R2. Now we are letting Lucene handle the merges using the default merge policy (TieredMergePolicy). We have narrowed do

Re: Best practice to map Lucene docids to real ids

2014-05-16 Thread Michael McCandless

On Tue, May 13, 2014 at 1:34 AM, Sven Teichmann wrote:
> Hi,
>
> I also found this response very useful and right now I am playing around
> with DocValues.
>
>> If the default DocValuesFormat isn't fast enough, you can always
>> switch to e.g. DirectDocValuesFormat (uses lots of RAM but it just an

RE: best choice for ramBufferSizeMB

2014-05-16 Thread Baldwin, David

Is this true as well for 2.9.2?
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, May 14, 2014 8:54 AM
To: Lucene Users
Subject: Re: best choice for ramBufferSizeMB
Generally larger is better, as long as JVM's heap is big enough to allow IW t

Re: Merger performance degradation on 3.6.1

2014-05-16 Thread Robert Muir

addIndexes doesn't call maybeMerge, so I think you are just getting into a situation with too many segments, so applying deletes is slow. Can you try calling IndexWriter.maybeMerge() after you call addIndexes? (It won't have immediate impact; you have to do some merges to get your index healthy again

Re: writer.updateDocument() not working (possible bug?)

2014-05-16 Thread Jamie

Michael
How do you update a document that resides in the index without having the original document?
Jamie
On 2014/05/13, 3:30 PM, Michael McCandless wrote:
How did you produce the document that you are sending to updateDocument? Are you loading it from IndexReader.document() or IndexSearch

Re: Can RAMDirectory work for gigabyte data which needs refreshing of the index all the time?

2014-05-16 Thread Steven Schlansker

On May 7, 2014, at 6:46 AM, Cheng wrote:
>
> I have an index of multiple gigabytes which serves 5-10 threads and needs
> refreshing very often. I wonder if RAMDirectory is the good candidate for
> this purpose. If not, what kind of directory is better?
We found that loading and unloading RAMDir

Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-16 Thread Michael McCandless

delGen=-1 means there are no deletions, but the exception makes no sense because up above SegmentReader.java calls si.hasDeletions(), which returns delGen != -1, which should have meant Lucene40LiveDocsFormat.readLiveDocs was not called. It seems impossible :) What java version? Mike M

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-16 Thread Jack Krupansky

True, for the first two use cases, but as I indicated, the third use case is problematic since the token needs to be split. The n-gram solution does seem to cover it though, sort of. The n-gram solution doesn't cover "good morning, john" or "good morning - john", but that could be handled by h
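
A minimal sketch of the character n-gram idea discussed in this thread, written in plain Python rather than against Lucene. The names `char_ngrams` and `NGramIndex` are hypothetical and are not Lucene or Elasticsearch APIs. It assumes lowercase trigram matching followed by a verification pass over the stored text, roughly the kind of manual validation teko mentions. It does no punctuation normalization, so a query such as "morning john" does not match "good morning, john".

```python
from collections import defaultdict


def char_ngrams(text, n=3):
    """Return the set of lowercase character n-grams in text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NGramIndex:
    """Toy in-memory index: n-gram -> ids of documents containing it."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in char_ngrams(text, self.n):
            self.postings[gram].add(doc_id)

    def search(self, phrase):
        grams = char_ngrams(phrase, self.n)
        if grams:
            # Candidates must contain every n-gram of the query.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Query shorter than n: no n-grams, so scan all documents.
            candidates = set(self.docs)
        # Sharing all n-grams does not guarantee a contiguous match,
        # so verify each candidate against the stored text.
        needle = phrase.lower()
        return sorted(d for d in candidates if needle in self.docs[d].lower())
```

For example, after adding "Good morning John" as document 1 and "good morning, john" as document 2, `search("ood mor")` returns `[1, 2]`, while `search("morning john")` returns only `[1]` because of the comma in document 2.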

Re: writer.updateDocument() not working (possible bug?)

2014-05-16 Thread Michael McCandless

reader.document(i) and searcher.doc(i) do the same thing: retrieve the stored fields. But neither method fully preserves indexing information; e.g., boosts are lost, details about how the field was indexed (e.g., DOCS_ONLY, etc.) are lost, etc. You can use the returned document to provide the val

Re: Merger performance degradation on 3.6.1

2014-05-16 Thread Michael McCandless

Hmm, try calling maybeMerge after each .addIndexes? Robert opened this issue to fix addIndexes: https://issues.apache.org/jira/browse/LUCENE-5672
Mike McCandless
http://blog.mikemccandless.com
On Wed, May 14, 2014 at 11:46 AM, danielv wrote:
> Hi,
>
> We have about 550M records index (~800GB)

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-16 Thread Emanuel Buzek

Hi Teko, sure - I use Lucene through elasticsearch, but I suppose that doesn't make a difference in this situation. I needed something like what you were trying to accomplish - basically to search any substring... wildcarded queries worked but were kind of slow. This is my analyzer that works for me

RE: Issue with Lucene 3.6.1 and MMapDirectory

2014-05-16 Thread Uwe Schindler

Hi,
> Now if I don't
> close the old index reader I am noticing increases of virtual memory with
> every new reindex reopen (this should not be an issue on 64 bit Linux
> correct - this is the configuration I am using and the indexes are on a
> shared mount NTFS file system ). This always bri

Re: How to add machine learning to Apache lucene

2014-05-16 Thread Koji Sekiguchi

Hi Priyanka,
> How can I add Maching Learning Part in Apache Lucene .
I think your question is too broad to answer because machine learning covers a lot of things... Lucene already has a text categorization function, which is a well-known NLP task, and NLP is a part of machine learning. I'v


20 matches
