Gora, thanks for the quick reply.

Yes, I'm aware of the differences between Solr and a DBMS. We've actually
written a C++ analytical engine that can process a billion tweets with
multi-facet drill-down. We may end up rolling our own in the end, but
so far Solr suits our needs quite well. The multilingual tokenizer and
Tika integration are all too addictive.

What you're suggesting is exactly what I'm doing: using dynamic fields
and copyField to get all the information into one field, then running the
search over that.
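
As a rough illustration only, the flattening step could be sketched in
Python like this. The key name link_guid (standing in for link.guid), the
catch-all field name all_text, and the assumption that tags arrive as a
list of strings are all made up for the sketch; in Solr itself the
catch-all field would be filled by copyField rules in the schema rather
than by client code.

```python
from collections import defaultdict


def flatten(urls, bookmarks):
    """Build one flat document per url, folding in all of its bookmarks."""
    # Group bookmarks by the guid of the url they point at.
    by_guid = defaultdict(list)
    for b in bookmarks:
        by_guid[b["link_guid"]].append(b)

    docs = []
    for u in urls:
        marks = by_guid.get(u["guid"], [])
        doc = dict(u)  # guid, title, text, url stay single-valued
        doc["bookmark_user"] = [b["user"] for b in marks]
        doc["bookmark_tags"] = [t for b in marks for t in b["tags"]]
        doc["bookmark_comment"] = [b["comment"] for b in marks if b["comment"]]
        # Hypothetical catch-all field: everything searchable in one place.
        doc["all_text"] = " ".join(
            [u["title"], u["text"]]
            + doc["bookmark_tags"]
            + doc["bookmark_comment"]
        )
        docs.append(doc)
    return docs
```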

However, this is not good enough. Let me elaborate using the same Paris
example. Say there are two urls: the first has been bookmarked by 10
people and the second by 100. Suppose the two get roughly the same score
once everything is squeezed into a single field. In that case I'd like to
rank the one with more users higher.
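
One possible way to express that ranking, sketched in Python outside of
Solr: scale the text score by a damped function of the bookmark count.
The log damping, the weight parameter, and the hit keys score and
matching_bookmarks are assumptions chosen for illustration, not anything
Solr prescribes.

```python
import math


def combined_score(text_score, matching_bookmarks, weight=0.5):
    """Boost a text relevance score by a log-damped bookmark count."""
    # log1p keeps the boost finite at zero bookmarks and flattens it
    # for very popular urls, so popularity cannot swamp relevance.
    return text_score * (1.0 + weight * math.log1p(matching_bookmarks))


def rank(hits, weight=0.5):
    """Sort hits (dicts with 'score' and 'matching_bookmarks') best first."""
    return sorted(
        hits,
        key=lambda h: combined_score(h["score"], h["matching_bookmarks"], weight),
        reverse=True,
    )
```

With two hits of nearly equal text score, the one with 100 matching
bookmarks sorts ahead of the one with 10; with weight=0 the order falls
back to the plain text score.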

Another way to look at this: PageRank relies on the number and anchor
text of incoming links; we're trying to use the number of people and
their keywords/comments as a weight for the link.

Alex


On Fri, Mar 4, 2011 at 6:29 PM, Gora Mohanty <g...@mimirtech.com> wrote:

> On Fri, Mar 4, 2011 at 10:24 AM, Alex Dong <a...@trunk.ly> wrote:
> > Hi there,  I need some advice on how to implement this using solr:
> >
> > We have two tables: urls and bookmarks.
> > - Each url has four fields:  {guid, title, text, url}
> > - One url will have one or more bookmarks associated with it. Each
> > bookmark has these: {link.guid, user, tags, comment}
> >
> > I'd like to return matched urls based on not only the "title, text" from
> > the url schema, but also some kind of aggregated popularity score based
> > on all "bookmarks" for the same url. The popularity score should base on
> > number/frequency of bookmarks that match the query.
> [...]
>
> It is best not to think of Solr as a RDBMS, and not to try to graft
> RDBMS practices on to it. Instead, you should flatten your data,
> e.g., in the above, you could have:
> * Four single-valued fields: guid, title, text, url
> * Four multi-valued fields: bookmark_guid, bookmark_user,
>  bookmark_tags, bookmark_comment
> Your index would contain one record per guid of the URL,
> and you would need to populate the multi-valued bookmark
> fields from all bookmark instances associated with that URL.
>
> Then one could either copy the relevant search fields to a full-text
> search field, and search only on that, or, e.g., search on bookmark_tags
> and bookmark_comment in addition to searching on title, and text.
>
> Regards,
> Gora
>
