Stuti, seq2sparse is not the right tool when the input is a Lucene index; 
for that input one has to use lucene.vector.




________________________________
From: Stuti Awasthi <stutiawas...@hcl.com>
To: "user@mahout.apache.org" <user@mahout.apache.org>; James Forth 
<jjamesfo...@yahoo.com> 
Sent: Wednesday, June 5, 2013 5:30 AM
Subject: RE: Dictionary file format in Lucene-Mahout integration
 

Hi James,
The seq2sparse class generates the dictionary in sequence file format, with the 
key as Text and the value as IntWritable. You might need to generate the 
dictionary file in this format.
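
As a rough sketch of the first half of such a conversion, the snippet below 
reads a plain-text dictionary into (term, index) pairs, which are the values 
that would become the Text key and IntWritable value. The dict.out layout 
assumed here (an optional count line, a "#term / doc freq / idx" header, then 
tab-separated term, document frequency and index) is an assumption and is not 
confirmed anywhere in this thread; parse_text_dictionary is a hypothetical 
helper name. Writing the pairs out as a real sequence file would still need 
Hadoop's SequenceFile writer, which is not shown.

```python
from typing import Dict, Iterable


def parse_text_dictionary(lines: Iterable[str], delimiter: str = "\t") -> Dict[str, int]:
    """Map each term to its integer index from a delimited text dictionary.

    ASSUMPTION: each data row has three fields (term, doc freq, index).
    Rows that do not match that shape, such as a count line or a header
    row, are skipped rather than treated as errors.
    """
    term_to_index: Dict[str, int] = {}
    for raw in lines:
        fields = raw.rstrip("\r\n").split(delimiter)
        # Keep only rows with three fields whose last field is an integer index.
        if len(fields) != 3 or not fields[2].isdigit():
            continue
        term, _doc_freq, index = fields
        term_to_index[term] = int(index)
    return term_to_index


# Made-up sample in the assumed layout, for illustration only.
sample = [
    "3\n",
    "#term\tdoc freq\tidx\n",
    "apple\t12\t0\n",
    "banana\t7\t1\n",
    "cherry\t3\t2\n",
]
pairs = parse_text_dictionary(sample)
```

Each entry of pairs corresponds to one record of the target dictionary: the 
term as the key and the index as the value.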

Thanks
Stuti

-----Original Message-----
From: Suneel Marthi [mailto:suneel_mar...@yahoo.com] 
Sent: Wednesday, June 05, 2013 9:55 AM
To: user@mahout.apache.org; James Forth
Subject: Re: Dictionary file format in Lucene-Mahout integration

I have never used lucene.vector myself, so I am thinking out loud here. Assuming 
that dict.out is in plain text format, you could use 'seqdirectory' to convert 
the dictionary to sequence file format.

This can then be fed into cvb.




________________________________
From: James Forth <jjamesfo...@yahoo.com>
To: "user@mahout.apache.org" <user@mahout.apache.org> 
Sent: Tuesday, June 4, 2013 8:00 PM
Subject: Dictionary file format in Lucene-Mahout integration


Hello,


I’m wondering if anyone can help with a question about the dictionary format in
lucene.vector-cvb integration.  I’ve previously used the pathway from text
files:  seqdirectory > seq2sparse > rowid > cvb  and it works fine.  The
dictionary created by seq2sparse is in sequence file format, and this is
accepted by cvb.

But when using a pathway from a lucene index:  lucene.vector > cvb  there is a 
problem with cvb throwing the error “dict.out not a SequenceFile”. 
Lucene.vector appears to generate a dictionary in plain text format, but cvb
requires it in sequence file format.

Does anyone know how to use lucene.vector with cvb, which I assume means
obtaining a dictionary as a sequence file from lucene.vector?

Thanks for your help.

James

