Re: Understanding Lucene's File Format

2010-09-17 Thread Michael McCandless
9 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Understanding Lucene's File Format
>
> Yes.
>
> They are decoded from the deltas in the tii file into absolutes in memory, on load.
>
> Note that trunk (w/ flex indexing) has changed this substantially: we store only
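A rough sketch of the "decode on load" idea, for illustration only. It assumes the index entries arrive as (term, FreqDelta, ProxDelta) tuples with made-up values, and that a lookup then binary-searches the in-memory terms to find where to start scanning. The function names `load_index` and `nearest_index_entry` are hypothetical and are not Lucene APIs.

```python
from bisect import bisect_right


def load_index(delta_entries):
    """Decode delta-coded index entries into absolute pointers, once, at load time."""
    terms = []
    pointers = []
    freq_pointer = 0
    prox_pointer = 0
    for term, freq_delta, prox_delta in delta_entries:
        # Each delta is relative to the previous entry, so keep a running sum.
        freq_pointer += freq_delta
        prox_pointer += prox_delta
        terms.append(term)
        pointers.append((freq_pointer, prox_pointer))
    return terms, pointers


def nearest_index_entry(terms, pointers, target):
    """Return the last index entry whose term is <= target, or None if there is none."""
    i = bisect_right(terms, target) - 1
    if i < 0:
        return None
    return terms[i], pointers[i]


# Hypothetical, sorted index entries: (term, freq_delta, prox_delta).
index_terms, index_pointers = load_index([
    ("apple", 0, 0),
    ("mango", 300, 900),
    ("zebra", 250, 700),
])

# A lookup for "peach" lands on the "mango" entry, whose pointers are already absolute.
start = nearest_index_entry(index_terms, index_pointers, "peach")
```

Because the sums are computed once at load, a lookup needs no further arithmetic on the index entries themselves.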

RE: Understanding Lucene's File Format

2010-09-17 Thread Giovanni Fernandez-Kincade
Interesting. Thanks for your help Mike!

-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Friday, September 17, 2010 10:29 AM
To: solr-user@lucene.apache.org
Subject: Re: Understanding Lucene's File Format

Yes. They are decoded from the deltas i

Re: Understanding Lucene's File Format

2010-09-17 Thread Michael McCandless
> he FreqDelta, ProxDelta, and SkipDelta stored with each TermInfo are actually absolute?
>
> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Friday, September 17, 2010 5:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: U

RE: Understanding Lucene's File Format

2010-09-17 Thread Giovanni Fernandez-Kincade
day, September 17, 2010 5:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Understanding Lucene's File Format

The entry for each term in the terms dict stores a long file offset pointer into the .frq file, and another long for the .prx file. But these longs are delta-coded, so as you scan you

Re: Understanding Lucene's File Format

2010-09-17 Thread Michael McCandless
The entry for each term in the terms dict stores a long file offset pointer into the .frq file, and another long for the .prx file. But these longs are delta-coded, so as you scan you have to sum up these deltas to get the absolute file pointers. The terms index (once loaded into RAM) has absol
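To make the delta-coding concrete, here is a minimal sketch of summing deltas while scanning. The entries and their values are invented for illustration, and `absolute_pointers` is a hypothetical helper rather than anything in Lucene; it only shows the running-sum step described above.

```python
# Hypothetical delta-coded entries, in the order a scan would encounter them:
# (term, delta into the .frq file, delta into the .prx file).
entries = [
    ("apple", 0, 0),
    ("banana", 12, 40),
    ("cherry", 7, 25),
]


def absolute_pointers(entries):
    """Yield (term, frq_pointer, prx_pointer), summing the deltas in scan order."""
    frq_pointer = 0
    prx_pointer = 0
    for term, frq_delta, prx_delta in entries:
        # Each stored value is the distance from the previous term's position.
        frq_pointer += frq_delta
        prx_pointer += prx_delta
        yield term, frq_pointer, prx_pointer


pointers = list(absolute_pointers(entries))
# "cherry" ends up at .frq offset 0 + 12 + 7 = 19 and .prx offset 0 + 40 + 25 = 65.
```

The catch this illustrates is that a pointer for a given term is only meaningful if every earlier delta in the scan has been added in first.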

Understanding Lucene's File Format

2010-09-16 Thread Giovanni Fernandez-Kincade
Hi, I've been trying to understand Lucene's file format and I keep getting hung up on one detail - how can Lucene quickly find the frequency data (or proximity data) for a particular term? According to the file formats page on the Lucene website