subject:"Document term vectors in Lucene 4"

Re: Document term vectors in Lucene 4

2013-01-18 Thread Jon Stewart

Thanks! I still can't see what was wrong with my original code--must have been a dumb typo somewhere--but starting over from that example now works on indices generated from my real indexing code. I will try to blog about it next week so there is some sample code up on the web for anyone else searc

Re: Document term vectors in Lucene 4

2013-01-18 Thread Ian Lea

To get stats from the whole index I think you need to come at this from a different direction. See the 4.0 migration guide for some details. With a variation on your code and 2 docs
    doc1: foobar qux quote
    doc2: foobar qux qux quorum
this code snippet
    Fields fields = MultiFields.getFiel
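For anyone following the distinction between per-document counts and whole-index counts, here is a plain-Java sketch using the two sample documents from the message above. It does not call Lucene at all: the class and method names (TermCountSketch, perDocFreq, corpusFreq) are made up for illustration, and splitting on whitespace is an assumption standing in for a real analyzer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names are Lucene APIs.
class TermCountSketch {

    // Count how often each whitespace-separated term occurs in one document.
    static Map<String, Integer> perDocFreq(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return counts; // no terms in an empty document
        }
        for (String term : trimmed.split("\\s+")) {
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    // Sum the per-document counts to get totals across the whole collection.
    static Map<String, Integer> corpusFreq(List<String> docs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String doc : docs) {
            perDocFreq(doc).forEach((term, n) -> totals.merge(term, n, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("foobar qux quote", "foobar qux qux quorum");
        for (int i = 0; i < docs.size(); i++) {
            System.out.println("doc" + (i + 1) + ": " + perDocFreq(docs.get(i)));
        }
        System.out.println("corpus: " + corpusFreq(docs));
    }
}
```

On these two documents, qux occurs once in doc1 and twice in doc2, so its corpus-wide total is 3. That is the difference between a per-document frequency and a whole-index statistic.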

Re: Document term vectors in Lucene 4

2013-01-17 Thread Jon Stewart

D'oh. Thanks! Does TermsEnum.totalTermFreq() return the per-doc frequencies? Empirically it looks like it does, but the documentation refers to corpus usage, not document.field usage.
Jon
On Thu, Jan 17, 2013 at 10:00 AM, Ian Lea wrote:
> typo time. You need doc2.add(...) not 2 doc.add(...) stat

Re: Document term vectors in Lucene 4

2013-01-17 Thread Ian Lea

typo time. You need doc2.add(...) not 2 doc.add(...) statements.
--
Ian.
On Thu, Jan 17, 2013 at 2:49 PM, Jon Stewart wrote:
> On Thu, Jan 17, 2013 at 9:08 AM, Robert Muir wrote:
>> Which statistics in particular (which methods)?
>
> I'd like to know the frequency of each term in each docume

Re: Document term vectors in Lucene 4

2013-01-17 Thread Jon Stewart

On Thu, Jan 17, 2013 at 9:08 AM, Robert Muir wrote:
> Which statistics in particular (which methods)?
I'd like to know the frequency of each term in each document. Those term counts for the most frequent terms in the corpus will make it into the document vectors for clustering. Looking at Terms
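To make the goal described here concrete, the following plain-Java sketch builds count vectors over the most frequent corpus terms. It is not Jon's code and uses no Lucene calls: DocVectorSketch, count, topTerms and vector are hypothetical names, whitespace splitting stands in for an analyzer, and breaking frequency ties alphabetically is an arbitrary choice made only to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names are Lucene APIs.
class DocVectorSketch {

    // Per-document term counts, splitting on whitespace.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return counts;
        }
        for (String term : trimmed.split("\\s+")) {
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    // The k terms with the highest total count across all documents.
    // Ties are broken alphabetically (an assumption, for determinism).
    static List<String> topTerms(List<String> docs, int k) {
        Map<String, Integer> totals = new HashMap<>();
        for (String doc : docs) {
            count(doc).forEach((term, n) -> totals.merge(term, n, Integer::sum));
        }
        List<String> terms = new ArrayList<>(totals.keySet());
        terms.sort((a, b) -> {
            int byFreq = Integer.compare(totals.get(b), totals.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }

    // One vector entry per vocabulary term: that term's count in this document.
    static int[] vector(String doc, List<String> vocabulary) {
        Map<String, Integer> counts = count(doc);
        int[] v = new int[vocabulary.size()];
        for (int i = 0; i < v.length; i++) {
            v[i] = counts.getOrDefault(vocabulary.get(i), 0);
        }
        return v;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("foobar qux quote", "foobar qux qux quorum");
        List<String> vocabulary = topTerms(docs, 2);
        System.out.println("vocabulary: " + vocabulary);
        for (String doc : docs) {
            System.out.println(Arrays.toString(vector(doc, vocabulary)));
        }
    }
}
```

The vocabulary is chosen from corpus-wide totals, while each vector entry is a per-document count, so both kinds of statistic are needed for this use case.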

Re: Document term vectors in Lucene 4

2013-01-17 Thread Robert Muir

Which statistics in particular (which methods)?
On Thu, Jan 17, 2013 at 5:10 AM, Jon Stewart wrote:
> Thanks very much for your reply, Ian.
>
> I am using SlowCompositeReaderWrapper because I am also retrieving the
> term frequency statistics for the corpus (at the end of the day, I am
> doing so

Re: Document term vectors in Lucene 4

2013-01-17 Thread Jon Stewart

Thanks very much for your reply, Ian. I am using SlowCompositeReaderWrapper because I am also retrieving the term frequency statistics for the corpus (at the end of the day, I am doing some machine learning/document clustering). Despite its name and warning documentation not to use it, SlowComposi

Re: Document term vectors in Lucene 4

2013-01-17 Thread Ian Lea

When I run your code, as is except for using RAMDirectory and setting up an IndexWriter using StandardAnalyzer
    RAMDirectory dir = new RAMDirectory();
    Analyzer anl = new StandardAnalyzer(Version.LUCENE_40);
    IndexWriterConfig iwcfg = new IndexWriterConfig(Version.LUCENE_40, a

Document term vectors in Lucene 4

2013-01-16 Thread Jon Stewart

Hello, I cannot extract document term vectors from an index, and have not turned up much in some determined googling. In short, when I call IndexReader.getTermVector(docID, field) or IndexReader.getTermVectors(docID) and then navigate down to the Terms for the specified field, I get a null result.
