On Tue, Jan 27, 2009 at 1:36 AM, Hannes Carl Meyer <m...@hcmeyer.com> wrote:
> Yeah, I know it; the challenge with this method is calculating the score
> and parameterizing the thresholds.

I'm less worried about the score itself than about the score thresholds
for predicting whether something is in or out of a category.
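
To make the threshold question concrete, here is a minimal sketch in
Python. It assumes each candidate category comes back with a relevance
score and that an item counts as "in" when the score meets a
per-category cutoff. The function name, category names, scores, and
cutoff values are all hypothetical, chosen only for illustration.

```python
def predict_categories(scores, thresholds, default_threshold=1.0):
    """Return the categories whose score meets its cutoff, sorted by name.

    scores:     hypothetical mapping of category -> relevance score
    thresholds: hypothetical mapping of category -> minimum score for "in"
    """
    predicted = []
    for category, score in scores.items():
        # Fall back to a global cutoff when a category has none of its own.
        cutoff = thresholds.get(category, default_threshold)
        if score >= cutoff:
            predicted.append(category)
    return sorted(predicted)


# Made-up scores and cutoffs, not taken from any real index.
scores = {"sports": 2.4, "politics": 0.7, "tech": 1.1}
thresholds = {"sports": 2.0, "politics": 1.5}
print(predict_categories(scores, thresholds))  # ['sports', 'tech']
```

The hard part is picking the cutoffs, since raw relevance scores are
generally not comparable across different queries; the sketch only
shows where the cutoffs plug in.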

> Is it really necessary to use Solr for it? Things go much faster with
> the Lucene low-level API, and faster still if you load the classification
> corpus into RAM.

Good points.  At the moment I'd rather have a daemon with a service
API, as well as the filtering/tokenization capabilities Solr has
built in.  I'll probably try to keep the corpus index in memory
by giving it a large memory allocation.

If it doesn't scale then I'll either go to the Lucene API or implement
a custom inverted index via memcached.
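
As a rough sketch of what a custom inverted index amounts to, the
Python below maps each term to the set of document ids containing it
and answers AND queries by intersecting those sets. It is an
assumption-laden illustration: a plain in-process dict stands in for
memcached, tokenization is a bare lowercase whitespace split, and the
class and method names are hypothetical.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index: term -> set of document ids."""

    def __init__(self):
        # A plain dict stands in for an external key-value store here.
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Naive tokenization: lowercase, split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        # Copy the first posting set so intersections don't mutate the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Solr search server")
index.add(2, "Lucene search library")
print(index.search("lucene search"))  # {2}
```

With a key-value store behind it, each term would become a key and the
posting set a serialized value, so every lookup costs a network round
trip that the in-process version avoids.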

One other note: /at the moment/ it's not going to be a deeply
hierarchical taxonomy, much less a full indexing of an RDF/OWL
schema; there are some gotchas with that.

Thanks - Neal
