Re: Hash-coded Vectorization and bogus information

2012-02-13 Thread Ted Dunning
On Tue, Feb 14, 2012 at 2:25 AM, Lance Norskog wrote:
> ...
> OnlineLogisticRegression allocates DenseVector/DenseMatrix objects- if
> it used RandomSparse Vector/Matrix could it operate on million-term
> sparse arrays?

Not likely. The feature vectors that come in are sparse and the updates t…
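Ted's reply is truncated above, but the sparse-input point it opens with can be illustrated: in logistic-regression SGD, the per-example work is proportional to the number of *nonzero* features, even when the weight storage itself is dense. A minimal sketch in plain Python (not Mahout's actual OnlineLogisticRegression code; the function name and rate are made up for illustration):

```python
import math

def sgd_update(weights, sparse_x, target, learning_rate=0.01):
    """One binary logistic-regression SGD step.

    The input vector is sparse ({index: value}), so only the weights at
    its nonzero indices are read and written -- the weight array can
    stay dense without making the per-example cost depend on the full
    dimensionality.
    """
    # Dot product over the nonzero entries only.
    score = sum(weights[i] * v for i, v in sparse_x.items())
    p = 1.0 / (1.0 + math.exp(-score))
    grad = p - target
    for i, v in sparse_x.items():
        weights[i] -= learning_rate * grad * v
    return weights
```

This is why allocating a sparse weight matrix would not help much: the weights fill in over training anyway, while the per-example cost is already governed by input sparsity.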

Re: Hash-coded Vectorization and bogus information

2012-02-13 Thread Lance Norskog
This is in the context of playing with the example classification scripts in mahout/examples/bin. OnlineLogisticRegression allocates DenseVector/DenseMatrix objects; if it used RandomSparse Vector/Matrix could it operate on million-term sparse arrays? The problem is that seq2sparse has several te…

Re: Hash-coded Vectorization and bogus information

2012-02-12 Thread Ted Dunning
If you don't use hashed encoding you lose the single-pass nature of the example. Also, many real applications require huge vocabularies, which make non-hashed representations infeasible due to memory use in the logistic regression models.

Sent from my iPhone

On Feb 12, 2012, at 20:53, Lance No…
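Ted's memory argument can be made concrete with back-of-the-envelope arithmetic (the vocabulary and category counts below are illustrative, not from the thread): a dense logistic-regression model stores one weight per (feature, category) pair, so an unhashed vocabulary drives model size linearly, while hashing caps it at the chosen table size.

```python
def model_bytes(num_features, num_categories, bytes_per_weight=8):
    """Size of a dense weight matrix for multinomial logistic
    regression: one double per (feature, category) pair."""
    return num_features * num_categories * bytes_per_weight

# Unhashed: a 10M-term vocabulary with 20 categories needs ~1.6 GB of
# weights.  Hashed to 2**20 dimensions it needs ~160 MB, and the size
# stays fixed no matter how large the vocabulary grows.
unhashed = model_bytes(10_000_000, 20)
hashed = model_bytes(2**20, 20)
```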

Re: Hash-coded Vectorization and bogus information

2012-02-12 Thread Lance Norskog
Ah! Ok. The SGD examples in examples/bin/asf-examples.sh and examples/bin/classify-twentynewsgroups.sh both use hash vectorization. Should they use the sparse term vectors instead? The "new" Bayes examples (nbtrain and nbtest) in asf-examples.sh use sparse.

On Sun, Feb 12, 2012 at 7:00 AM, Ted Dun…

Re: Hash-coded Vectorization and bogus information

2012-02-12 Thread Ted Dunning
Hash-coded vectorization *is* a random projection. It is just one that preserves some degree of sparsity. It definitely loses information when you use it to decrease the dimension of the input. It does not "add bogus information". SGD doesn't like dense vectors, actually. In fact, one of the nice…
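Ted's point (collisions can *lose* information, but nothing spurious is *added*) can be sketched in a few lines of plain Python. This is an illustration of the hashing trick, not Mahout's actual encoder classes; real systems use a stable hash such as MurmurHash, whereas Python's built-in `hash` is salted per process for strings, so only structural properties are stable here.

```python
from collections import defaultdict

def hashed_encode(tokens, num_features=2**20):
    """Hash each token into a fixed-size feature vector (the hashing trick).

    A collision merges the counts of two distinct tokens, so information
    can be LOST when num_features is small -- but nothing spurious is
    ADDED: every nonzero entry is a sum of real token counts.  The
    result is still sparse: only indices that were actually hit are
    stored.
    """
    vec = defaultdict(float)
    for tok in tokens:
        vec[hash(tok) % num_features] += 1.0
    return dict(vec)
```

Note that the output is a single pass over the tokens with no dictionary to build first, which is the single-pass property Ted mentions earlier in the thread.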

Hash-coded Vectorization and bogus information

2012-02-11 Thread Lance Norskog
Does hash-coded vectorization add bogus information compared to sparse term vectors? A more concrete question: would a random projection on the sparse vector give a "better quality" dense vector? (This is in the context of SGD classification, which "likes" dense vectors.) -- Lance Norskog goks...
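For contrast with hashed encoding, the dense Gaussian random projection Lance asks about can be sketched as follows (illustrative Python, not a Mahout API; the function name and seeding scheme are made up). The key trade-off is visible in the shape of the output: it is fully dense, so downstream SGD pays `out_dim` work per example regardless of how sparse the input was.

```python
import random

def dense_random_projection(sparse_vec, out_dim, seed=42):
    """Project a sparse vector {index: value} to a dense vector of
    out_dim entries using a Gaussian random matrix.  Each input index
    deterministically seeds its own RNG, so the full projection matrix
    is never materialized in memory."""
    out = [0.0] * out_dim
    for idx, val in sparse_vec.items():
        rng = random.Random(seed * 1_000_003 + idx)
        for j in range(out_dim):
            out[j] += val * rng.gauss(0.0, 1.0)
    return out
```

Because the per-index RNG is deterministic, the map is linear: scaling the input scales the output, just as with any matrix projection.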