On Thu, 21 Jan 2016 08:53:10 -0800 (PST)
John Hardin wrote:

> There was an improvement in FP and FN from two tokens. The marginal 
> improvement from three doesn't seem worth it.

The improvement from 2 to 3 is more substantial than from 1 to 2:

 287/160 = 1.79   (1-word to 2-word)

 160/69  = 2.32   (2-word to 3-word)
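
For anyone who wants to redo the arithmetic, here is a minimal sketch in
Python. The helper name improvement_factor is made up for illustration,
and the three counts are simply the figures quoted above.

```python
def improvement_factor(before, after):
    """Ratio of the count before a change to the count after it."""
    return before / after

# Counts as quoted in this thread for 1-, 2- and 3-word tokens.
one_to_two = improvement_factor(287, 160)
two_to_three = improvement_factor(160, 69)

print(f"1 -> 2 words: {one_to_two:.2f}")
print(f"2 -> 3 words: {two_to_three:.2f}")
```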

Whether any of this is worth it depends on a lot of things. I don't
think it's even obvious whether 3-word tokenization is more resource
intensive than 2-word. Clearly, in the limit where ntokens goes to
infinity, 3-word will outperform 2-word at the same database size,
which means that it can achieve the same level of performance with a
smaller database. I have no feel for the value of ntokens at which
that switches around.
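
To make the database-size side of this concrete, here is a rough sketch
of multi-word tokenization in Python. It assumes a scheme that emits
every run of 1 to max_n consecutive words, which is not necessarily what
the Bayes code under discussion does; word_ngrams and tokenize are
hypothetical names.

```python
def word_ngrams(words, n):
    """Return every run of n consecutive words, joined by a space."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def tokenize(text, max_n):
    """Emit all 1..max_n word tokens (assumed scheme, for illustration)."""
    words = text.lower().split()
    tokens = []
    for n in range(1, max_n + 1):
        tokens.extend(word_ngrams(words, n))
    return tokens

sample = "buy cheap pills now"
for max_n in (1, 2, 3):
    print(max_n, len(tokenize(sample, max_n)))
```

Under that assumption each message yields more tokens as max_n grows, so
a database capped at a fixed ntokens holds the tokens of fewer messages.
That is the trade-off against the better accuracy per token.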

