Hi Neal,

This sounds quite familiar to me. I did a lot of projects like that some
years ago (with the low-level Lucene API)!

One thing I didn't understand: is the corpus of documents you want to
classify against fixed?

>>previously suggested procedure of 1) store document 2) execute
>>more-like-this and 3) delete document would be too slow.
Do you mean the document to classify?
Why would you want to put it into the index at all (that's very expensive)?
You only need its contents to build a query!
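
For example, something along these lines (a rough, untested sketch using the
Lucene 2.x contrib MoreLikeThis; the exact package and method signatures vary
a bit between versions, and the field names "tags"/"label" are just my guess
at how your taxonomy index looks):

    import java.io.StringReader;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.similar.MoreLikeThis; // contrib-queries jar in 2.x

    public class TaxonomyClassifier {

        // Classify raw document text against the already-indexed taxonomy.
        // Nothing is ever added to or deleted from the index.
        public static void classify(IndexReader taxonomyIndex, String docText)
                throws Exception {
            MoreLikeThis mlt = new MoreLikeThis(taxonomyIndex);
            mlt.setAnalyzer(new StandardAnalyzer());
            mlt.setFieldNames(new String[] { "tags" }); // field(s) holding the taxonomy tags
            mlt.setMinTermFreq(1);  // the document to classify is not in the index,
            mlt.setMinDocFreq(1);   // so keep the term-selection thresholds low

            // Build the query straight from the document contents.
            Query query = mlt.like(new StringReader(docText));

            IndexSearcher searcher = new IndexSearcher(taxonomyIndex);
            for (ScoreDoc hit : searcher.search(query, null, 10).scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("label") + "  " + hit.score);
            }
        }
    }

If you want to stay on the Solr side, I think the MoreLikeThis handler can
also take the raw text as a content stream instead of pointing it at an
indexed document, which amounts to the same idea.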

Regards

Hannes

On Mon, Jan 26, 2009 at 6:29 PM, Neal Richter <nrich...@gmail.com> wrote:

> Hey all,
>
>  I'm in the process of implementing a system to do 'text
> classification' with Solr.  The basic idea is to take an
> ontology/taxonomy like dmoz of {label: "X", tags: "a,b,c,d,e"}, index
> it and then classify documents into the taxonomy by pushing the parsed
> document into the Solr search API.  Why?  Lucene/Solr's ability to do
> weighted term boosting at both search and index time has lots of
> obvious uses here.
>
>  Has anyone worked on this or a similar project yet?  I've seen some
> talk on the list about this area but it's pretty thin... December
> thread "Taxonomy Support on Solr".  I'm assuming Grant Ingersoll is
> looking at similar things with his 'taming text' project.
>
> I store the 'documents' in another repository and they are far too
> dynamic (write intensive) for direct indexing in Solr... so the
> previously suggested procedure of 1) store document 2) execute
> more-like-this and 3) delete document would be too slow.
>
> If people are interested I could start a JIRA issue on this (I do not
> see anything there at the moment).
>
> Thanks - Neal Richter
> http://aicoder.blogspot.com
>
