Alternatively, have you considered storing (or, I should say, indexing) the search logs with Solr?

This lets you run text searches across your search queries. You can perform time range queries with Solr as well.
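
As a rough illustration, here is a minimal Python sketch of the request parameters such a query might use. The field names (`query_s`, `hits_i`, `timestamp_dt`), the core name and the helper functions are assumptions made up for this example; the `fq` range filter with date math and the `facet.*` parameters are standard Solr request parameters. It only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def popular_queries_params(window="1HOUR", top_n=10, zero_hits_only=False):
    """Build Solr /select parameters for the top-N queries in a recent window.

    Assumed (hypothetical) schema: `query_s` holds the raw user query as an
    untokenized string, `hits_i` the hit count, `timestamp_dt` the log time.
    """
    # Solr date math: restrict to documents logged within the last `window`.
    filters = [f"timestamp_dt:[NOW-{window} TO NOW]"]
    if zero_hits_only:
        filters.append("hits_i:0")
    params = [
        ("q", "*:*"),                 # match every log entry
        ("rows", "0"),                # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", "query_s"),   # count entries per distinct query
        ("facet.limit", str(top_n)),
        ("facet.mincount", "1"),
    ]
    params += [("fq", f) for f in filters]
    return params


def select_url(base, params):
    """Join a core URL (hypothetical) with the encoded parameters."""
    return f"{base}/select?{urlencode(params)}"
```

For example, `select_url("http://localhost:8983/solr/searchlogs", popular_queries_params("7DAYS", 5, zero_hits_only=True))` would ask for the five most frequent zero-hit queries of the last seven days, assuming a core with that name and schema exists.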

Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests:

On 7/26/10 4:43 PM, Mark wrote:
We are thinking about using Cassandra to store our search logs. Can someone point me in the right direction/lend some guidance on design? I am new to Cassandra and I am having trouble wrapping my head around some of these new concepts. My brain keeps wanting to go back to an RDBMS design.

We will be storing the user query, # of hits returned and their session id. We would like to be able to answer the following questions.

- What are the n most popular queries and their counts within the last x (mins/hours/days/etc)? Basically the most popular searches within a given time range.
- What is the most popular query within the last x where hits = 0? Same as above but with an extra "where" clause
- For session id x give me all their other queries
- What are all the session ids that searched for 'foos'
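
To pin down what the questions above ask for, independent of any particular store, here is a small in-memory Python sketch. The tuple layout `(timestamp, session_id, query, hits)` and the function names are assumptions for illustration only; this is not a Cassandra or Hive design.

```python
from collections import Counter
from datetime import datetime, timedelta

# Each log entry is assumed to be a tuple: (timestamp, session_id, query, hits).


def top_queries(log, now, window, n, zero_hits_only=False):
    """Return the n most frequent queries logged within `window` before `now`."""
    cutoff = now - window
    counts = Counter(
        query
        for ts, _sid, query, hits in log
        if ts >= cutoff and (not zero_hits_only or hits == 0)
    )
    return counts.most_common(n)


def queries_for_session(log, session_id):
    """Return every query issued by one session, in log order."""
    return [query for _ts, sid, query, _hits in log if sid == session_id]


def sessions_for_query(log, query):
    """Return the set of session ids that searched for exactly `query`."""
    return {sid for _ts, sid, q, _hits in log if q == query}
```

Each function scans the whole log, which is fine for a sketch but is exactly the cost that a real design would avoid with precomputed counts or per-key indexes.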

We accomplish the above functionality w/ MySQL using 2 tables. One for the raw search log information and the other to keep the aggregate/running counts of queries.

Would this sort of ad-hoc querying be better implemented using Hadoop + Hive? If so, should I be storing all this information in Cassandra then using Hadoop to retrieve it?

Thanks for your suggestions
