___
> From: Grant Ingersoll
> To: user@mahout.apache.org; James Forth
> Sent: Wednesday, June 5, 2013 10:46 AM
> Subject: Re: Dictionary file format in Lucene-Mahout integration
>
>
{code}
File dictOutFile = new File(dictOut);
log.info("Dictionary Output file: {}", dictOutFile);
Writer writer = Files.newWriter(dictOutFile, Charsets.UTF_8);
DelimitedTermInfoWriter tiWriter = new DelimitedTermInfoWriter(writer, delimiter, field);
try {
  tiWriter.write(termI
Subject: RE: Dictionary file format in Lucene-Mahout integration
Hi James,
The seq2sparse class generates the dictionary in SequenceFile format, with the
key as Text and the value as IntWritable. You might need to generate your
dictionary file in this format.
Thanks
Stuti
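
To make the suggestion above concrete, here is a minimal sketch of the first half of such a conversion: reading the delimited dictionary text into (term, index) pairs. The class name DictionaryParser and the assumed line layout (term, document frequency, index, separated by the delimiter, possibly preceded by a count line and a header line) are illustrative assumptions, not something confirmed in this thread. Writing each pair out as a Text key and an IntWritable value with Hadoop's SequenceFile.Writer is deliberately left out so the sketch has no Hadoop dependency.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout.
public class DictionaryParser {

  /**
   * Parses delimited dictionary lines into term -> index pairs.
   * Assumed layout (an assumption, check your own dict.out):
   * "term<delimiter>docFreq<delimiter>index", one term per line.
   * Lines that do not match (count line, header line, blanks) are skipped.
   */
  public static Map<String, Integer> parse(Iterable<String> lines, String delimiter) {
    Map<String, Integer> termToIndex = new LinkedHashMap<>();
    String splitRegex = Pattern.quote(delimiter); // treat the delimiter literally
    for (String line : lines) {
      String[] parts = line.split(splitRegex);
      if (parts.length != 3) {
        continue; // not a term line, e.g. a leading term-count line
      }
      try {
        termToIndex.put(parts[0], Integer.parseInt(parts[2].trim()));
      } catch (NumberFormatException e) {
        // Last field is not a number, e.g. a header row: skip it.
      }
    }
    return termToIndex;
  }
}
```

The lines could come from java.nio.file.Files.readAllLines on the dictionary file. Each resulting entry would then map to one (Text, IntWritable) record in the output SequenceFile.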
-
...@yahoo.com]
Sent: Wednesday, June 05, 2013 9:55 AM
To: user@mahout.apache.org; James Forth
Subject: Re: Dictionary file format in Lucene-Mahout integration
I have never used lucene.vector myself, so I am thinking out loud here. Assuming
that dict.out is in plain-text format, you could use 'seqdirectory' to convert
the dictionary to SequenceFile format.
This can then be fed into cvb.
From: James Forth
To: "user@mahout.apache.org"
Sent: Tuesda