Re: Search Score percentage, Should not be relative to the highest score

2011-01-03 Thread Ahmet Arslan
Converting scores to percentages is generally not recommended. See http://wiki.apache.org/lucene-java/ScoresAsPercentages > When using lucene to search documents, the results have a > score based on their relativity to the search term. Inside > lucene, the score > percentage is calculated as a p

Re: Search Score percentage, Should not be relative to the highest score

2011-01-03 Thread Amr ElAdawy
Hi iorixxx, Thanks a lot for your reply. I have read the link and I understand the concern; however, the normalization happens inside Lucene, where the normalizing value is the inverse of the maxScore. I can alter the code to leave the original score; however, it is a business requirement to
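The relative normalization described in this message can be illustrated with plain arithmetic. The sketch below is not Lucene code: the class and method names (RelativeScores, normalize) and the sample scores are hypothetical, chosen only for illustration. It multiplies each raw score by the inverse of the highest score, which shows why the top hit always reads as 100% regardless of how well it actually matches the query.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Lucene.
final class RelativeScores {

    // Scale every raw score by 1 / maxScore, so the best hit becomes 1.0.
    static double[] normalize(double[] rawScores) {
        double max = 0.0;
        for (double s : rawScores) {
            max = Math.max(max, s);
        }
        double[] out = new double[rawScores.length];
        if (max == 0.0) {
            return out; // nothing to scale; all results stay at 0.0
        }
        double norm = 1.0 / max; // the "inverse of the maxScore"
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up raw scores; the first one is the highest.
        double[] raw = {2.0, 1.0, 0.5};
        System.out.println(Arrays.toString(normalize(raw)));
    }
}
```

With this scheme a weak best match and a perfect best match both end up at 1.0, which is the behaviour the thread title objects to.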

Re: IOException in updateDocument(term, document) method of IndexWriter

2011-01-03 Thread Michael McCandless
Can you post the full exception that you hit? And maybe a standalone test case showing the problem? Mike On Mon, Jan 3, 2011 at 1:03 AM, Atul Prajapati wrote: > Hi, > > > > we are calling updateDocument(term, document) method on IndexWriter and > after that we are calling close() method of inde

index files naming

2011-01-03 Thread Bernd Fehling
Dear list, some questions about the names of the index files. With an older Lucene/Solr 4.x version from trunk my index looks like: _2t1.fdt _2t1.fdx _2t1.fnm _2t1.frq _2t1.nrm _2t1.prx _2t1.tii _2t1.tis segments_2 segments.gen With a most recent version from trunk it looks like: _3a9.fdt _3a9.fd

Re: Search Score percentage, Should not be relative to the highest score

2011-01-03 Thread Ahmet Arslan
> I had read the link and I understand the concern, however, > the normalization > is happening inside lucene. Where the normalizing value is > the inverse of > the maxScore. > > I can alter the code to leave the original score, however > it is a business > requirements to view the matching percen

Re: index files naming

2011-01-03 Thread Simon Willnauer
Hey Bernd, On Mon, Jan 3, 2011 at 1:35 PM, Bernd Fehling wrote: > Dear list, > > some questions about the names of the index files. > With an older Lucene/Solr 4.x version from trunk my index looks like: > _2t1.fdt > _2t1.fdx > _2t1.fnm > _2t1.frq > _2t1.nrm > _2t1.prx > _2t1.tii > _2t1.tis > seg

Re: Search Score percentage, Should not be relative to the highest score

2011-01-03 Thread Amr ElAdawy
Consider the following. Query: term1 term2 Doc1: term1 term2 Doc2: term1 term2 term3 term4 Doc3: term1 term1 term3 Doc4: term3 term4 For the above documents, Doc1 and Doc2 will be exact matches (as they contain all the terms in the search query). Doc3 is a partial match as it contains term1 only

Re: index files naming

2011-01-03 Thread Bernd Fehling
Hi Simon, thanks a lot for your good explanation. Best wishes, Bernd Am 03.01.2011 13:51, schrieb Simon Willnauer: > Hey Bernd, > > On Mon, Jan 3, 2011 at 1:35 PM, Bernd Fehling > wrote: >> Dear list, >> >> some questions about the names of the index files. >> With an older Lucene/Solr 4.x ve

RE: IOException in updateDocument(term, document) method of IndexWriter

2011-01-03 Thread Atul Prajapati
Hi, Right now we don't have the full stack trace for this exception, and the issue is not easily reproducible. We have updated our code to log the full stack trace, and once we reproduce it I will post the full stack trace here. If anyone has any idea about this then please let us know so we can inv

Re: Using Lucene to search live, being-edited documents

2011-01-03 Thread Grant Ingersoll
There is also the MemoryIndex, which is in contrib and is designed for one document at a time. That being said, basic grep/regex is probably fast enough. -Grant On Dec 29, 2010, at 9:27 PM, Lance Norskog wrote: > Check out the Instantiated contrib for Lucene. This is an alternative > in-memory

Re: Using Lucene to search live, being-edited documents

2011-01-03 Thread Robert Muir
On Mon, Jan 3, 2011 at 10:16 AM, Grant Ingersoll wrote: > There is also the MemoryIndex, which is in contrib and is designed for one > document at a time.  That being said, basic grep/regex is probably fast > enough. > In cases where you are doing a 'find' in a document similar to what a wordpr

Re: Search Score percentage, Should not be relative to the highest score

2011-01-03 Thread Ahmet Arslan
So, if you had something that gives you the "how many query terms matched" info, would that satisfy your requirement? Query: term1 term2 Doc1: term1 term2 => n=2 => 100% Doc2: term1 term2 term3 term4 => n=2 => 100% Doc3: term1 term1 term3 => n=1 => 50% Doc4: term2 term3 ter
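The "how many query terms matched" percentage from this message can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Lucene API: the class and method names (TermCoverage, percentMatched) are hypothetical, and terms are split on whitespace, whereas a real index would use whatever tokens the analyzer produces. It counts the distinct query terms that appear in a document and divides by the number of distinct query terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not part of Lucene.
final class TermCoverage {

    // Percentage of distinct query terms that occur at least once in the document.
    // Assumes whitespace-separated terms; repeated terms are counted once.
    static double percentMatched(String query, String doc) {
        Set<String> queryTerms = new HashSet<>(Arrays.asList(query.trim().split("\\s+")));
        Set<String> docTerms = new HashSet<>(Arrays.asList(doc.trim().split("\\s+")));
        int matched = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return 100.0 * matched / queryTerms.size();
    }

    public static void main(String[] args) {
        String query = "term1 term2";
        // Doc1 to Doc3 as listed in the message above.
        System.out.println(percentMatched(query, "term1 term2"));
        System.out.println(percentMatched(query, "term1 term2 term3 term4"));
        System.out.println(percentMatched(query, "term1 term1 term3"));
    }
}
```

Unlike a score scaled by the top hit, this value does not depend on the other documents in the result set, so a document matching only half of the query terms stays at 50% even when it is the best hit.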

Indexing large XML dumps

2011-01-03 Thread Alex vB
Hello everybody, I am currently indexing Wikipedia dumps and creating an index for versioned document collections. So far everything is working fine, but I never thought that single Wikipedia articles would reach a size of around 2 GB! One article for example has 2 versions with an avera

Re: parsing Java log file with Lucene 3.0.3

2011-01-03 Thread Benzion G
Thank you guys! Looks like SimpleAnalyzer is OK for my application. I'm still testing but meanwhile it looks good.