Yes, the "bigram" in that demo only has two characters, which could
separate different character sets. -Xiangrui
On Wed, Oct 1, 2014 at 2:54 PM, Liquan Pei wrote:
The program computes hashed bigram frequencies, normalized by the total
number of bigrams, then filters out zero values. Hashing is an effective
trick for vectorizing features. Take a look at
http://en.wikipedia.org/wiki/Feature_hashing
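
To make the steps above concrete, here is a minimal sketch in plain Scala
(no Spark dependency). It is not the demo's actual code: the object name,
the numBuckets parameter, the lowercasing step, and the Map return type are
all assumptions made for illustration.

```scala
// Hypothetical sketch of hashed character-bigram frequencies.
// Not the demo's code; names and defaults here are assumptions.
object BigramHashingSketch {
  // numBuckets is an assumed size for the hashed feature space.
  def featurize(s: String, numBuckets: Int = 1000): Map[Int, Double] = {
    val counts = new Array[Double](numBuckets)
    // All overlapping two-character windows; drop a short trailing window.
    val bigrams = s.toLowerCase.sliding(2).filter(_.length == 2).toSeq
    bigrams.foreach { bg =>
      // Hash the bigram into a bucket; floorMod keeps the index non-negative.
      val idx = java.lang.Math.floorMod(bg.hashCode, numBuckets)
      counts(idx) += 1.0
    }
    val total = bigrams.size.toDouble
    if (total == 0) Map.empty[Int, Double]
    else
      // Normalize by the number of bigrams and keep only non-zero buckets.
      counts.zipWithIndex.collect { case (c, i) if c > 0 => i -> c / total }.toMap
  }
}
```

The non-zero values sum to 1 for any input with at least two characters.
Two different bigrams can land in the same bucket, which is the usual
trade-off of feature hashing.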
Liquan
On Wed, Oct 1, 2014 at 2:18 PM, Soumya Simanta
wrote:
I'm trying to understand the intuition behind the features method that
Aaron used in one of his demos. I believe this feature will only work for
detecting the character set (i.e., the language used).
Can someone help?
def featurize(s: String): Vector = {
val n = 1000
val result = new Array[Doub