Re: Feature reduction for LibLinear weights

2013-04-24 Thread Ken Krugler
Hi Ted, On Apr 13, 2013, at 8:46pm, Ted Dunning wrote: On Sat, Apr 13, 2013 at 7:05 AM, Ken Krugler kkrugler_li...@transpac.com wrote: On Apr 12, 2013, at 11:55pm, Ted Dunning wrote: The first thing to try is feature hashing to reduce your feature vector size. Unfortunately

Re: Feature reduction for LibLinear weights

2013-04-24 Thread Ted Dunning
Glad to be able to help. Double hashing would probably allow you to preserve full accuracy at higher compression, but if you are happy, then you might as well be done. On Wed, Apr 24, 2013 at 1:56 PM, Ken Krugler kkrugler_li...@transpac.com wrote: Hi Ted, On Apr 13, 2013, at 8:46pm, Ted
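
One possible reading of the "double hashing" suggestion is to give each feature two probe buckets, computed from independently seeded hashes, and to split its value between them, so a single collision only touches half of that feature's signal. A minimal Java sketch under that assumption (the class name, seed strings, and 50/50 split are mine, not from the thread):

    import java.util.Map;

    // One possible reading of "double hashing": each feature gets two
    // probe buckets from independently seeded hashes and contributes
    // half of its value to each, so a collision in one bucket only
    // corrupts half of that feature's signal.
    public class TwoProbeVectorizer {
        private final int numBuckets;

        public TwoProbeVectorizer(int numBuckets) {
            this.numBuckets = numBuckets;
        }

        private int probe(String name, int seed) {
            return Math.floorMod((name + "#" + seed).hashCode(), numBuckets);
        }

        public double[] vectorize(Map<String, Double> rawFeatures) {
            double[] v = new double[numBuckets];
            for (Map.Entry<String, Double> e : rawFeatures.entrySet()) {
                v[probe(e.getKey(), 1)] += 0.5 * e.getValue();
                v[probe(e.getKey(), 2)] += 0.5 * e.getValue();
            }
            return v;
        }
    }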

Re: Feature reduction for LibLinear weights

2013-04-18 Thread Ted Dunning
On Wed, Apr 17, 2013 at 2:29 PM, Ken Krugler kkrugler_li...@transpac.com wrote: Though I haven't yet found a good write-up on the value of generating more than one hash - seems like multiple hash values would increase the odds of collisions. It does. But it also increases the chances of
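
The preview is cut off here, but the usual argument for multiple probes is that the model's effective weight for a raw feature is read back from all of its probe buckets, so a collision in any one bucket is diluted by the others. A hedged sketch of that read-back under the two-probe scheme above (all names are mine):

    // With two probes, the weight the trained model effectively assigns
    // to a raw feature is the weighted sum of the learned weights at
    // its probe buckets; a collision corrupts one term, not both.
    public class ProbeReadback {
        static int probe(String name, int seed, int numBuckets) {
            return Math.floorMod((name + "#" + seed).hashCode(), numBuckets);
        }

        /** Effective per-feature weight when values were split 50/50 across two probes. */
        static double effectiveWeight(String feature, double[] learnedWeights) {
            int m = learnedWeights.length;
            return 0.5 * learnedWeights[probe(feature, 1, m)]
                 + 0.5 * learnedWeights[probe(feature, 2, m)];
        }
    }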

Re: Feature reduction for LibLinear weights

2013-04-17 Thread Ken Krugler
Hi Ted, On Apr 13, 2013, at 8:46pm, Ted Dunning wrote: On Sat, Apr 13, 2013 at 7:05 AM, Ken Krugler kkrugler_li...@transpac.com wrote: On Apr 12, 2013, at 11:55pm, Ted Dunning wrote: The first thing to try is feature hashing to reduce your feature vector size. Unfortunately

Re: Feature reduction for LibLinear weights

2013-04-13 Thread Ted Dunning
The first thing to try is feature hashing to reduce your feature vector size. With multiple probes and possibly with random weights you might be able to drop the size by 10x. Sent from my iPhone On Apr 12, 2013, at 18:30, Ken Krugler kkrugler_li...@transpac.com wrote: Hi all, We're
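
A minimal Java sketch of the hashing trick being suggested here. The bucket count, the seed strings, and the reading of "random weights" as a ±1 sign hash are my assumptions, not something stated in the thread:

    import java.util.HashMap;
    import java.util.Map;

    // Hash arbitrary feature names into a fixed number of buckets.
    // A second, independently seeded hash picks a +/-1 sign so that
    // colliding features tend to cancel rather than pile up (one
    // possible reading of "random weights").
    public class HashedVectorizer {
        private final int numBuckets;

        public HashedVectorizer(int numBuckets) {
            this.numBuckets = numBuckets;
        }

        private int bucketOf(String name) {
            return Math.floorMod((name + "#bucket").hashCode(), numBuckets);
        }

        private int signOf(String name) {
            return Math.floorMod((name + "#sign").hashCode(), 2) == 0 ? 1 : -1;
        }

        /** Collapse raw (name, value) features into numBuckets hashed slots. */
        public double[] vectorize(Map<String, Double> rawFeatures) {
            double[] v = new double[numBuckets];
            for (Map.Entry<String, Double> e : rawFeatures.entrySet()) {
                v[bucketOf(e.getKey())] += signOf(e.getKey()) * e.getValue();
            }
            return v;
        }

        public static void main(String[] args) {
            Map<String, Double> raw = new HashMap<>();
            raw.put("token=liblinear", 1.0);
            raw.put("token=svm", 2.0);
            double[] v = new HashedVectorizer(40_000).vectorize(raw);
            System.out.println("hashed vector length = " + v.length);
        }
    }

With 40K buckets in place of 400K raw features, the weight matrix shrinks by roughly the 10x mentioned above, at the cost of some collisions.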

Re: Feature reduction for LibLinear weights

2013-04-13 Thread Ken Krugler
On Apr 12, 2013, at 11:55pm, Ted Dunning wrote: The first thing to try is feature hashing to reduce your feature vector size. Unfortunately LibLinear takes feature indices directly (assumes they're sequential ints from 0..n-1), so I don't think feature hashing will help here. If I
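
Worth noting: the hashed representation already satisfies that constraint, since bucket indices are by construction sequential ints in 0..m-1, so the fixed bucket count m simply becomes LibLinear's n. A sketch of turning a hashed vector into the sparse (index, value) pairs a LibLinear-style trainer consumes; the pair class here is my stand-in, not LibLinear's API:

    import java.util.ArrayList;
    import java.util.List;

    // Hashed bucket indices are already "sequential ints from 0..m-1",
    // so they can stand in directly for LibLinear's 0..n-1 feature ids.
    public class HashedToSparse {
        /** Simple (index, value) pair; a stand-in for a LibLinear feature node. */
        public static final class IndexValue {
            public final int index;
            public final double value;
            public IndexValue(int index, double value) {
                this.index = index;
                this.value = value;
            }
        }

        /** Keep only non-zero buckets, in ascending index order. */
        public static List<IndexValue> toSparse(double[] hashedVector) {
            List<IndexValue> sparse = new ArrayList<>();
            for (int i = 0; i < hashedVector.length; i++) {
                if (hashedVector[i] != 0.0) {
                    sparse.add(new IndexValue(i, hashedVector[i]));
                }
            }
            return sparse;
        }
    }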

Re: Feature reduction for LibLinear weights

2013-04-13 Thread Ted Dunning
On Sat, Apr 13, 2013 at 7:05 AM, Ken Krugler kkrugler_li...@transpac.com wrote: On Apr 12, 2013, at 11:55pm, Ted Dunning wrote: The first thing to try is feature hashing to reduce your feature vector size. Unfortunately LibLinear takes feature indices directly (assumes they're sequential

Feature reduction for LibLinear weights

2013-04-12 Thread Ken Krugler
Hi all, We're (ab)using LibLinear (linear SVM) as a multi-class classifier, with 200+ labels and 400K features. This results in a model that's 800MB, which is a bit unwieldy. Unfortunately LibLinear uses a full array of weights (nothing sparse), being a port from the C version. I could do
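
A quick sanity check on those numbers, assuming double-precision weights and roughly 250 labels (the post only says "200+"):

    // Back-of-the-envelope size of a dense label-by-feature weight matrix.
    // The ~250 label count and 8-byte doubles are assumptions; the post
    // only gives "200+" labels, 400K features, and an ~800MB model.
    public class ModelSizeEstimate {
        public static void main(String[] args) {
            long labels = 250;
            long features = 400_000;
            long bytesPerWeight = 8; // double precision
            long dense = labels * features * bytesPerWeight;
            System.out.printf("dense model: %.0f MB%n", dense / 1e6);          // ~800 MB
            // Hashing 400K features down to ~40K buckets (a 10x reduction)
            // would shrink the same dense layout to roughly 80 MB.
            long hashed = labels * 40_000 * bytesPerWeight;
            System.out.printf("hashed to 40K buckets: %.0f MB%n", hashed / 1e6);
        }
    }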