Cardinality should be set to the logical dimension of the vector --
it shouldn't be arbitrary. It's not like the "initial size" of a
list. If you're dealing with vectors that have a potentially
unbounded maximum dimension, use Integer.MAX_VALUE.

As the name suggests, the implementation you use is for sparse
vectors, meaning dimensions without a value have no representation
in memory. It would be a pretty poor sparse implementation if that
were not true. So, no, the cardinality has no direct effect on memory.
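
To make that concrete, here is a minimal sketch in plain Java. It is
a toy class of my own (ToySparseVector and all of its method names
are hypothetical), not Mahout's RandomAccessSparseVector and not a
description of how Mahout implements it. It only illustrates the
general idea: the cardinality is a single bound that gets checked,
while storage grows with the number of non-zero entries.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy sparse vector for illustration only; NOT Mahout's implementation. */
final class ToySparseVector {
    // Logical dimension of the vector: one int, whatever its value.
    private final int cardinality;
    // Only non-zero entries are stored, keyed by index.
    private final Map<Integer, Double> values = new HashMap<>();

    ToySparseVector(int cardinality) {
        this.cardinality = cardinality;
    }

    void set(int index, double value) {
        if (index < 0 || index >= cardinality) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (value == 0.0) {
            values.remove(index); // zeros have no representation
        } else {
            values.put(index, value);
        }
    }

    double get(int index) {
        return values.getOrDefault(index, 0.0);
    }

    /** Number of entries actually held in memory. */
    int storedEntries() {
        return values.size();
    }

    /** Element-wise sum; both vectors must share the same cardinality. */
    ToySparseVector plus(ToySparseVector other) {
        if (cardinality != other.cardinality) {
            throw new IllegalArgumentException("cardinality mismatch");
        }
        ToySparseVector result = new ToySparseVector(cardinality);
        result.values.putAll(values);
        for (Map.Entry<Integer, Double> e : other.values.entrySet()) {
            result.set(e.getKey(), result.get(e.getKey()) + e.getValue());
        }
        return result;
    }
}

final class ToySparseVectorDemo {
    public static void main(String[] args) {
        ToySparseVector a = new ToySparseVector(Integer.MAX_VALUE);
        a.set(3, 1.0);
        a.set(2_000_000_000, 2.0);

        ToySparseVector b = new ToySparseVector(Integer.MAX_VALUE);
        b.set(3, 4.0);

        ToySparseVector sum = a.plus(b);
        System.out.println("sum[3] = " + sum.get(3));
        System.out.println("stored entries = " + sum.storedEntries());
    }
}
```

Both vectors are declared with Integer.MAX_VALUE as their cardinality,
so the sum is allowed, yet each holds only the one or two entries that
were actually set.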

On Fri, Jul 15, 2011 at 1:00 PM, marco turchi <marco.tur...@gmail.com> wrote:
> Hi
> thanks a lot
>
> I have also another problem ( :-) ). As I wrote in the previous email, I'm
> using the RandomAccessSparseVector representation to store sparse vectors. I
> need to sum some of them together, so I use the method plus but it seems
> that it requires the same vector cardinality. I set the initial cardinality
> of each vector to a big value, but I was wondering if it is a huge waste of
> memory or everything is optimized inside the RandomAccessSparseVector
> class. In case, is there an optimal way to set the cardinality?
>
> Thanks again
> Marco
>
> On Fri, Jul 15, 2011 at 1:50 PM, Sean Owen <sro...@gmail.com> wrote:
>
>> This is simply Euclidean distance squared. Take the square root if you
>> need the simple Euclidean distance.
>>
>> On Fri, Jul 15, 2011 at 12:36 PM, marco turchi <marco.tur...@gmail.com>
>> wrote:
>> > Dear All,
>> > I'm a newcomer to Mahout and I'm trying to compute the cosine similarity
>> > between two sparse vectors.
>> > I have loaded them using the class RandomAccessSparseVector. I notice that
>> > there is a method called: getDistanceSquared. Which kind of vector distance
>> > is implemented? Is there a method to compute directly this distance?
>> >
>> > Thanks a lot in advance for your help
>> > Marco
>> >
>>
>
