Thanks! I still can't see what was wrong with my original code--must
have been a dumb typo somewhere--but starting over from that example
now works on indices generated from my real indexing code. I will try
to blog about it next week so there is some sample code up on the web
for anyone else searc
To get stats from the whole index I think you need to come at this
from a different direction. See the 4.0 migration guide for some
details.
With a variation on your code and 2 docs
doc1: foobar qux quote
doc2: foobar qux qux quorum
this code snippet
Fields fields = MultiFields.getFiel
D'oh! Thanks!
Does TermsEnum.totalTermFreq() return the per-doc frequencies? It
looks like it empirically, but the documentation refers to corpus
usage, not document.field usage.
Jon
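For anyone following along, here is a minimal plain-Java sketch of the distinction being asked about, using the two example documents from earlier in the thread. It does not call Lucene at all: the class name TermFreqSketch and both methods are made up for illustration, and whitespace splitting stands in for a real analyzer. As far as I understand it, a term vector behaves like a tiny single-document index, which would explain why a "total" frequency read from a term vector looks like a per-document count, but treat that as an assumption and check the javadocs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TermFreqSketch {
    // Count how often each whitespace-separated term occurs in one document.
    // (Hypothetical helper; a real index would use an Analyzer instead.)
    public static Map<String, Long> perDocFreq(String text) {
        Map<String, Long> freq = new TreeMap<>();
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                freq.merge(term, 1L, Long::sum);
            }
        }
        return freq;
    }

    // Sum the per-document counts into corpus-wide totals.
    public static Map<String, Long> corpusFreq(List<String> docs) {
        Map<String, Long> total = new TreeMap<>();
        for (String doc : docs) {
            perDocFreq(doc).forEach((term, n) -> total.merge(term, n, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        String doc1 = "foobar qux quote";
        String doc2 = "foobar qux qux quorum";
        System.out.println("doc2:   " + perDocFreq(doc2));
        System.out.println("corpus: " + corpusFreq(Arrays.asList(doc1, doc2)));
    }
}
```

With these two documents, "qux" counts 2 within doc2 but 3 across the corpus, so the two kinds of statistic are easy to tell apart when testing.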
On Thu, Jan 17, 2013 at 10:00 AM, Ian Lea wrote:
typo time. You need doc2.add(...) not 2 doc.add(...) statements.
--
Ian.
On Thu, Jan 17, 2013 at 2:49 PM, Jon Stewart wrote:
On Thu, Jan 17, 2013 at 9:08 AM, Robert Muir wrote:
> Which statistics in particular (which methods)?
I'd like to know the frequency of each term in each document. Those
term counts for the most frequent terms in the corpus will make it
into the document vectors for clustering.
Looking at Terms
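The document vectors described above (per-document counts, restricted to the most frequent terms in the corpus) could be sketched roughly as follows. This is plain Java rather than Lucene code: the class DocVectorSketch, its method names, the whitespace tokenization, and the tie-breaking rule (alphabetical order among equally frequent terms) are all assumptions made for the example, not anything taken from the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DocVectorSketch {
    // Per-document term counts from whitespace-separated text.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> freq = new HashMap<>();
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                freq.merge(term, 1, Integer::sum);
            }
        }
        return freq;
    }

    // The k most frequent terms across all documents.
    // Ties are broken alphabetically so the result is deterministic.
    public static List<String> topTerms(List<String> docs, int k) {
        Map<String, Integer> totals = new HashMap<>();
        for (String doc : docs) {
            count(doc).forEach((term, n) -> totals.merge(term, n, Integer::sum));
        }
        List<String> terms = new ArrayList<>(totals.keySet());
        terms.sort((a, b) -> {
            int byCount = Integer.compare(totals.get(b), totals.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }

    // One vector component per vocabulary term: that term's count in the document.
    public static int[] vector(String doc, List<String> vocab) {
        Map<String, Integer> freq = count(doc);
        int[] v = new int[vocab.size()];
        for (int i = 0; i < v.length; i++) {
            v[i] = freq.getOrDefault(vocab.get(i), 0);
        }
        return v;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("foobar qux quote", "foobar qux qux quorum");
        List<String> vocab = topTerms(docs, 2);
        for (String doc : docs) {
            System.out.println(vocab + " -> " + Arrays.toString(vector(doc, vocab)));
        }
    }
}
```

In a real index the per-document counts would come from the stored term vectors and the vocabulary from corpus-level statistics; the sketch only shows how the two are combined into a fixed-length vector per document.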
Which statistics in particular (which methods)?
On Thu, Jan 17, 2013 at 5:10 AM, Jon Stewart wrote:
Thanks very much for your reply, Ian.
I am using SlowCompositeReaderWrapper because I am also retrieving the
term frequency statistics for the corpus (at the end of the day, I am
doing some machine learning/document clustering). Despite its name and
warning documentation not to use it, SlowComposi
When I run your code, as is except for using RAMDirectory and setting
up an IndexWriter using StandardAnalyzer
RAMDirectory dir = new RAMDirectory();
Analyzer anl = new StandardAnalyzer(Version.LUCENE_40);
IndexWriterConfig iwcfg = new IndexWriterConfig(Version.LUCENE_40, a
Hello,
I cannot extract document term vectors from an index, and have not
turned up much in some determined googling. In short, when I call
IndexReader.getTermVector(docID, field) or
IndexReader.getTermVectors(docID) and then navigate down to the Terms
for the specified field, I get a null result.