Re: Item catagorization problem.

Dennis Gearon Thu, 23 Dec 2010 10:16:32 -0800

Doesn't index-time analysis do this to some degree anyway?

I'm not sure of the algorithm, but something like: how often a term occurs, how 
near the top it is, how many different forms it takes, and whether it is the 
subject or object of a sentence. That has to have some relevance to what 
category something is in.

The simplest extension to that would be something like a 'sub-vocabulary' cross 
listing: if such-and-such words have high relevance, then the item is about 
this or that.
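
Here is a rough sketch of that sub-vocabulary idea in Python, just to make it 
concrete. The two category names come from the original question; the word 
lists, the weights, and the categorize() function are all made up for 
illustration and are not anything Solr provides. Accessory words are weighted 
higher on the assumption that an accessory's name usually also mentions the 
device it belongs to.

```python
import re

# Hypothetical sub-vocabularies: words that hint at a category, with
# made-up weights. Real lists would have to come from your own data.
SUB_VOCABULARY = {
    "Computers": {"laptop": 1.0, "laptops": 1.0, "notebook": 1.0, "desktop": 1.0},
    "Accessories": {"bag": 3.0, "bags": 3.0, "case": 3.0, "charger": 3.0},
}


def categorize(name, default="Uncategorized"):
    """Return the category whose vocabulary scores highest for this name."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    scores = {
        category: sum(vocab.get(token, 0.0) for token in tokens)
        for category, vocab in SUB_VOCABULARY.items()
    }
    best = max(scores, key=scores.get)
    # If no vocabulary word matched at all, fall back to the default.
    return best if scores[best] > 0 else default


print(categorize("Sony Laptop"))
print(categorize("Laptop bags for laptops"))
```

With these made-up lists, "Sony Laptop" lands in Computers and "Laptop bags for 
laptops" lands in Accessories. The result would be written to the Category 
field before the document is sent to Solr.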

The smartest categorizer is your users, though. So the best way to build that 
list is to keep track of how close to the top of the search results the user 
clicked, what the search words were, and how many search attempts it took. 
That's what Netflix does. Their goal is to have users get something in the top 
three on the first search attempt.
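
A minimal sketch of that kind of tracking, assuming your application can log 
each click itself. The Click record, its fields, and both function names are 
hypothetical; this is not a Solr feature and not a description of how Netflix 
implements it.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Click:
    query: str      # what the user typed
    rank: int       # 1-based position of the clicked result
    attempt: int    # 1 = first search in the session, 2 = a retry, ...
    category: str   # category of the item that was clicked


def top3_first_try_rate(clicks):
    """Fraction of clicks that landed in the top three on the first attempt."""
    if not clicks:
        return 0.0
    hits = sum(1 for c in clicks if c.rank <= 3 and c.attempt == 1)
    return hits / len(clicks)


def word_category_counts(clicks):
    """Count, per query word, which categories users actually clicked."""
    counts = defaultdict(Counter)
    for c in clicks:
        for word in c.query.lower().split():
            counts[word][c.category] += 1
    return counts


log = [
    Click("laptop", 1, 1, "Computers"),
    Click("Laptop", 2, 1, "Computers"),
    Click("laptop bag", 5, 2, "Accessories"),
]
print(top3_first_try_rate(log))
print(word_category_counts(log)["laptop"].most_common(1))
```

The per-word counts are one way to grow the word list over time: if most 
people who search for "laptop" click items in Computers, that word belongs in 
the Computers vocabulary.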

 Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
otherwise we all die.

----- Original Message ----
From: Erick Erickson <erickerick...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Thu, December 23, 2010 10:00:05 AM
Subject: Re: Item catagorization problem.

What you're asking for appears to me to be "auto-categorization", and
there's nothing built into Solr to do this. Somehow you need to analyze
the documents at index time and add the proper categories, but I have
no clue how. This is especially hard with short fields since most
auto-categorization algorithms try to do some statistical analysis
of the document to figure this out.

Best
Erick

On Thu, Dec 23, 2010 at 8:12 AM, Hasnain <hasn...@hotmail.com> wrote:

>
> Hi all,
>
>       I am using Solr in my web application for search purposes. However, I
> am having a problem with the default behaviour of the Solr search.
>
> From my understanding, if I query for a keyword, let's say "Laptop",
> preference is given to result rows having more occurrences of the search
> keyword "Laptop" in the field "name". This, however, is producing
> undesirable scenarios, for example:
>
> 1. I index an item A with "name" value "Sony Laptop".
> 2. I index another item B with "name" value: "Laptop bags for laptops".
> 3. I search for the keyword "Laptop"
>
> According to the default behaviour, precedence would be given to item B
> since the keyword appears more times in the "name" field for that item.
>
> In my schema, I have another field by the name of "Category" and, for
> example's sake, let's assume that my application supports only two
> categories: Computers and Accessories. Now, what I require is a mechanism
> to assign correct categories to the items during item indexing so that
> this field can be used to better filter the search results: item A would
> belong to the "Computers" category and item B would belong to the
> "Accessories" category. Then, searching for "Laptop" would only look for
> items in the "Computers" category and return item A only.
>
> I would like to point out here that setting the category field manually is
> not an option since the data might be in the vicinity of thousands of
> records. I am not asking for an in-depth algorithm. Just a high level
> design would be sufficient to set me in the right direction.
>
> thanks.
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Item-catagorization-problem-tp2136415p2136415.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
