MemoryDiffStorage is the only class in Mahout that uses
CompactRunningAverage. If all of the compact averages are in two
arrays, the benefits of hardware caching go way up. This seems a more
compelling reason to recode MemoryDiffStorage :)
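
A minimal sketch of the two-array layout, for illustration only. The class name ParallelRunningAverages and its methods are hypothetical and not part of Mahout, and the saturating counter is an assumption about the desired behavior. It keeps the 16-bit counts in one char[] and the averages in one float[], so neighboring counters share cache lines:

```java
// Hypothetical sketch: running averages stored in two parallel arrays
// instead of one small object per average. Not Mahout code.
final class ParallelRunningAverages {

    private final char[] counts;     // unsigned 16-bit counters, 0..65535
    private final float[] averages;  // one 4-byte average per slot

    ParallelRunningAverages(int size) {
        counts = new char[size];
        averages = new float[size];
    }

    // Folds one datum into the average at the given slot.
    // Assumption: once a counter reaches 65535 it saturates and
    // further data for that slot is ignored.
    void addDatum(int index, double datum) {
        int count = counts[index];
        if (count == Character.MAX_VALUE) {
            return;
        }
        int newCount = count + 1;
        // Incremental mean update: avg += (x - avg) / n
        averages[index] =
            (float) (averages[index] + (datum - averages[index]) / newCount);
        counts[index] = (char) newCount;
    }

    int getCount(int index) {
        return counts[index];
    }

    float getAverage(int index) {
        return averages[index];
    }
}
```

The sketch leaves open how a sparse item pair maps to a slot index, which is the objection raised in the quoted reply below.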

Lance

On Fri, Mar 25, 2011 at 9:22 PM, Lance Norskog <goks...@gmail.com> wrote:
>> I think that's the bit you're missing.
> 8 bits I'm missing, actually. BA-ZING!
>
> 64-bit implementations of Java align on 8-byte boundaries. A 2-byte
> value plus a 4-byte value will be packed into 8 bytes. An 8-byte
> double and a 4-byte int will add 4 useless bytes with alignment,
> giving 16 bytes.
>
> Including the 16 bytes of per-object overhead, Compact vs. Full is
> 24 bytes vs. 32 bytes. Since there are 2 unused bytes in the Compact
> object, the 16-bit 'char' counter might as well be a 32-bit int,
> which has 31 bits of positive range. Similarly, the integer counter
> in FullRunningAverage could be a long instead of an int.
>
> For comparison, an empty String is 40 bytes, or 5 8-byte words.
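
The size arithmetic quoted above can be sketched as follows. The 16-byte header and 8-byte alignment are the figures assumed in this thread, not measurements; actual layouts vary by JVM and settings. The class and method names are hypothetical:

```java
// Hypothetical sketch of the per-object size estimate discussed above.
final class ObjectSizeEstimate {

    // Assumptions taken from the thread, not measured values.
    static final int HEADER_BYTES = 16;
    static final int ALIGNMENT = 8;

    // Rounds a byte count up to the next multiple of ALIGNMENT.
    static int alignUp(int bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Header plus payload, with the payload padded to the alignment.
    static int estimate(int payloadBytes) {
        return HEADER_BYTES + alignUp(payloadBytes);
    }
}
```

Under these assumptions, estimate(2 + 4) gives 24 for the char-plus-float payload and estimate(4 + 8) gives 32 for the int-plus-double payload.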
>
> On Fri, Mar 25, 2011 at 2:33 AM, Sean Owen <sro...@gmail.com> wrote:
>> A char is a 2-byte unsigned value in Java, not 1-byte signed -- I
>> think that's the bit you're missing.
>>
>> Ignoring object overhead, there are 2 (char) + 4 (float) = 6 bytes of
>> payload in CompactRunningAverage. There are 4 (int) + 8 (double) = 12
>> bytes of payload in FullRunningAverage.
>>
>> I don't know what you mean that 'count' should be an int.
>>
>> I do agree that it would be more memory-efficient to make a sort of
>> array of these things, in cases where you have a dense sequence of
>> them. However the use case you cite isn't one of those -- those diffs
>> are sparse. You can add into the object the row/col position but then
>> that just eats up the space savings.
>>
>> On Fri, Mar 25, 2011 at 9:19 AM, Lance Norskog <goks...@gmail.com> wrote:
>>> The CompactRunningAverage class is
>>> 1) not compact, and
>>> 2) not coded right.
>>>
>>> The code using 'count' seems to think it is a short. The code assumes
>>> that shorts top out at 65535,
>>> but shorts are signed and thus top out at 32767.
>>> 'count' is declared as a 'char' so it will never reach 128, let alone 65536.
>>> The 'count' field uses 4 bytes so it should be an int.
>>> Any object has 16 bytes. The CompactRunningAverage has (at least) 24
>>> bytes. FullRunningAverage has (at least) 32 bytes.
>>>
>>> Something that supplies compact running averages should directly
>>> allocate and manage an array of shorts and an array of floats.
>>> MemoryDiffStorage is the only use of CompactRunningAverage in the code
>>> base. It can use "hundreds of thousands" of these counters.
>>> It also needs 8 bytes per counter for each object reference, vs. 4
>>> bytes per array index.
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com
