Re: Dictionary file format in Lucene-Mahout integration

2013-06-06 Thread Grant Ingersoll
> From: Grant Ingersoll
> To: user@mahout.apache.org; James Forth
> Sent: Wednesday, June 5, 2013 10:46 AM
> Subject: Re: Dictionary file format in Lucene-Mahout integration
>
> {code}
> File dictOutFile = new File(dictOut);
> log.info("Dictionary Output f

Re: Dictionary file format in Lucene-Mahout integration

2013-06-06 Thread Suneel Marthi
rsoll
To: user@mahout.apache.org; James Forth
Sent: Wednesday, June 5, 2013 10:46 AM
Subject: Re: Dictionary file format in Lucene-Mahout integration
{code}
File dictOutFile = new File(dictOut);
log.info("Dictionary Output file: {}", dictOutFile);
Writer writer = Files.ne

Re: Dictionary file format in Lucene-Mahout integration

2013-06-05 Thread Grant Ingersoll
{code}
File dictOutFile = new File(dictOut);
log.info("Dictionary Output file: {}", dictOutFile);
Writer writer = Files.newWriter(dictOutFile, Charsets.UTF_8);
DelimitedTermInfoWriter tiWriter = new DelimitedTermInfoWriter(writer, delimiter, field);
try {
  tiWriter.write(termI

Re: Dictionary file format in Lucene-Mahout integration

2013-06-05 Thread Suneel Marthi
AM
Subject: RE: Dictionary file format in Lucene-Mahout integration
Hi James,
The seq2sparse class generates the dictionary in sequence file format, with the key as Text and the value as IntWritable. You might need to generate the dictionary file in this format.
Thanks
Stuti -
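To make the suggestion above more concrete, here is a minimal sketch of the first half of such a conversion: reading a delimited text dictionary into the term/index pairs that would become the Text keys and IntWritable values. It is only an illustration under stated assumptions. The class `DictionaryLines` and its `parse` method are hypothetical names, not part of Mahout or Lucene, and the three-field line layout (term, document frequency, index), the leading count line, and the `#` header line are assumptions about what the dictionary file contains, so check them against an actual dict.out before relying on this.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Mahout or Lucene.
final class DictionaryLines {

    // Parses delimited dictionary lines into term -> index pairs, keeping file order.
    // ASSUMPTION: each data line is term<delimiter>docFreq<delimiter>index.
    static Map<String, Integer> parse(Iterable<String> lines, String delimiter) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        String splitRegex = Pattern.quote(delimiter); // treat the delimiter literally
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and an assumed '#' header line
            }
            String[] fields = line.split(splitRegex, -1);
            if (fields.length != 3) {
                continue; // skip anything that is not a three-field entry, such as a count line
            }
            dictionary.put(fields[0], Integer.parseInt(fields[2].trim()));
        }
        return dictionary;
    }

    public static void main(String[] args) {
        // Sample input in the assumed layout.
        List<String> sample = Arrays.asList(
            "2",
            "#term\tdoc freq\tidx",
            "apple\t3\t0",
            "banana\t1\t1");
        for (Map.Entry<String, Integer> entry : parse(sample, "\t").entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Each resulting pair could then be appended to a Hadoop SequenceFile as a Text key and an IntWritable value. That step is left out here because it needs the Hadoop libraries on the classpath.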

RE: Dictionary file format in Lucene-Mahout integration

2013-06-05 Thread Stuti Awasthi
...@yahoo.com]
Sent: Wednesday, June 05, 2013 9:55 AM
To: user@mahout.apache.org; James Forth
Subject: Re: Dictionary file format in Lucene-Mahout integration
I have never used lucene.vector myself, so I am thinking out loud here. Assuming that dict.out is in text format, you could use 'seqdirectory' to conver

Re: Dictionary file format in Lucene-Mahout integration

2013-06-04 Thread Suneel Marthi
I have never used lucene.vector myself, so I am thinking out loud here. Assuming that dict.out is in text format, you could use 'seqdirectory' to convert the dictionary to sequence file format. This can then be fed into cvb.
From: James Forth
To: "user@mahout.apache.org"
Sent: Tuesda