On Aug 4, 2013, at 5:54 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> On Sun, Aug 4, 2013 at 5:34 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
>> Actually this brings up another point that I've harped on before. It sure
>> would be nice to have a vector representation where you could attach
>> arbitrary data to items or vectors. Not so memory efficient but it makes
>> things like ID translation and timestamping actions trivial. If these could
>> be attached and survive all the Mahout jobs there would be no need for the
>> in-memory hashmap I'm using to translate IDs and the actions could be
>> timestamped or other metadata could be attached. At present I guess
>> everyone knows that only weights are attached to actions/matrix values and
>> in some cases names to rows/vectors in DRMs.
>>
>
> This is where we started, actually. The memory cost was fairly massive for
> arbitrary objects being attached to sparse matrices. The problem is that
> the cost of the annotations isn't amortized very far in long-tail
> situations.
>

No doubt, but they are optional, so as long as people understand the cost… But maybe you are talking about the cost of merely allowing arbitrary attachments.

> If we restrict our attention to text annotations, then a heavily compressed
> form might well be feasible.
>

That would be fine with me. If we could do ID strings alone that would be super helpful.
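
For anyone following along, the in-memory ID translation mentioned above might look roughly like the sketch below. It is only an illustration in Python, not Mahout code: the `IdDictionary` class, the sample IDs, and the dict-of-dicts "matrix" are all hypothetical stand-ins. String IDs are mapped to dense integer indices before the job runs, and mapped back afterwards.

```python
class IdDictionary:
    """Hypothetical bidirectional map: external string ID <-> dense int index."""

    def __init__(self):
        self._to_index = {}  # external string ID -> integer index
        self._to_id = []     # integer index -> external string ID

    def index_of(self, external_id):
        """Return the index for external_id, assigning the next free one if new."""
        index = self._to_index.get(external_id)
        if index is None:
            index = len(self._to_id)
            self._to_index[external_id] = index
            self._to_id.append(external_id)
        return index

    def id_of(self, index):
        """Translate an integer index back to its external string ID."""
        return self._to_id[index]


users = IdDictionary()
items = IdDictionary()

# Made-up (user, item, weight) actions keyed by string IDs.
actions = [
    ("u-ann", "i-book", 1.0),
    ("u-bob", "i-film", 1.0),
    ("u-ann", "i-film", 1.0),
]

# Sparse rows keyed by integer index, standing in for what a job would consume.
rows = {}
for user, item, weight in actions:
    rows.setdefault(users.index_of(user), {})[items.index_of(item)] = weight

# After the job, translate the integer keys back to the original string IDs.
readable = {
    users.id_of(r): {items.id_of(c): w for c, w in cols.items()}
    for r, cols in rows.items()
}
```

The dictionaries have to be kept outside the matrix and carried alongside it through every step, which is the bookkeeping that ID strings attached to the vectors themselves would remove.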