Really, a database is the way to go. Any modern database lets you make
the tweet id unique (a primary key or unique constraint) and insert a
row only if that id isn't already there, so the same tweet won't be
stored twice even if two requests try to save it at the same time.
Additionally, not every user search needs up-to-the-minute results.
A search can be answered with results up to 30 seconds or so old, so
that when several people search for the same thing they share one batch
from Twitter instead of each triggering a new request.
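
Here is a minimal sketch of both ideas. It assumes Python's built-in sqlite3 module, but any database with primary keys works the same way. The table layout and the names init_db(), search() and fetch_fn are hypothetical and only for illustration. fetch_fn stands in for whatever code calls the Twitter search with since_id. The tweet id is the primary key and rows are written with INSERT OR IGNORE, so a tweet that is already stored is skipped rather than duplicated. A search refreshed less than 30 seconds ago is answered from the database alone.

```python
import sqlite3
import time
from typing import Callable, Iterable, List, Optional, Tuple

# Serve a search from the database alone if it was refreshed this recently.
MAX_AGE_SECONDS = 30

Tweet = Tuple[int, str]  # (Twitter tweet id, text)
# Hypothetical stand-in for a Twitter search call: (query, since_id) -> tweets
FetchFn = Callable[[str, Optional[int]], Iterable[Tweet]]


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS tweets (
            id   INTEGER PRIMARY KEY,  -- the Twitter id: one row per tweet
            text TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS search_results (
            query    TEXT NOT NULL,
            tweet_id INTEGER NOT NULL REFERENCES tweets(id),
            PRIMARY KEY (query, tweet_id)
        );
        CREATE TABLE IF NOT EXISTS searches (
            query        TEXT PRIMARY KEY,
            last_fetched REAL NOT NULL  -- Unix time of the last refresh
        );
        """
    )


def search(conn: sqlite3.Connection, query: str, fetch_fn: FetchFn,
           now: Optional[float] = None) -> List[str]:
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT last_fetched FROM searches WHERE query = ?", (query,)
    ).fetchone()

    if row is None or now - row[0] >= MAX_AGE_SECONDS:
        # Newest tweet already stored for this query (None if none yet).
        (since_id,) = conn.execute(
            "SELECT MAX(tweet_id) FROM search_results WHERE query = ?", (query,)
        ).fetchone()
        with conn:  # one transaction: commit on success, roll back on error
            for tweet_id, text in fetch_fn(query, since_id):
                # A tweet that is already stored is skipped, not duplicated.
                conn.execute("INSERT OR IGNORE INTO tweets (id, text) VALUES (?, ?)",
                             (tweet_id, text))
                conn.execute("INSERT OR IGNORE INTO search_results (query, tweet_id) "
                             "VALUES (?, ?)", (query, tweet_id))
            conn.execute("INSERT OR REPLACE INTO searches (query, last_fetched) "
                         "VALUES (?, ?)", (query, now))

    # Newest first, served entirely from the local database.
    return [text for (text,) in conn.execute(
        "SELECT t.text FROM tweets t JOIN search_results r ON r.tweet_id = t.id "
        "WHERE r.query = ? ORDER BY t.id DESC", (query,))]
```

Two requests that both see a stale cache may still both query Twitter. The primary key turns the second set of writes into no-ops, so nothing is stored twice. With sqlite3, give each thread its own connection rather than sharing one.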

On Oct 2, 7:36 am, Bjoern <bjoer...@googlemail.com> wrote:
> Hi,
>
> Just wondering about a best-practice thing. Suppose I show the results
> of specific Twitter searches on a web site. How would I go about
> caching the searches?
>
> The naive approach seems to be to first check my own database, then
> do a Twitter search with the since_id parameter to get only the
> results I don't already have. Then store the results from Twitter in
> the database, too, and return the merged results to the web site.
>
> The problem I see is that if multiple users run the same search on my
> web site, threading issues might occur (as each user starts a separate
> thread on my server). Not only could multiple Twitter searches with
> the same since_id be executed (maybe forgivable), but trouble starts
> when those results are inserted into my local database: different
> threads could attempt to insert the same messages into my database.
>
> One simple solution I could imagine: just use the message ids from
> Twitter as the primary key in my local database. That way, multiple
> threads saving the same message would just overwrite the message with
> itself. I wonder whether that is a common solution: using the Twitter
> ids as primary keys (also for users, direct messages, ...). I have
> more or less come to the conclusion that this would be the path of
> least resistance, although I feel a bit uneasy about it.
>
> An alternative that came to mind would be to have single-threaded
> background jobs copy the search results from Twitter to my database,
> and to show the web site only the results from my cache. This would
> introduce some lag before search results appeared, but that would not
> be too bad. However, if I have a lot of different searches, it would
> become infeasible to update all of them periodically. It would become
> necessary to trigger an update only when a user runs the search. At
> that point things might get overly complicated: presumably I would
> need some kind of Ajax solution to trigger the "caching" with the
> first request, show a spinner while my local db is being updated, and
> then show the results from my local cache/db. The trickiest part would
> be preventing multiple update tasks from starting for the same search.
>
> All in all, might the simple solution be the better way to go?
>
> Would be interested in hearing your opinions, experiences and
> solutions!
>
> Thanks!
>
> Björn
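
On the last point, keeping several update tasks for the same search from starting at once, one common in-process technique is a per-key "single-flight" guard. This technique does not come from this thread. The first request for a search runs the refresh, and concurrent requests for the same search wait and reuse its result. SingleFlight and do() are hypothetical names in this Python sketch. It only coordinates threads inside one process, so several server processes would still rely on the database rejecting duplicate ids.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared by all threads waiting on one in-flight refresh."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Hypothetical helper: at most one refresh per key runs at a time
    within this process; concurrent callers share its result."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects _inflight
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._guard:
            call = self._inflight.get(key)
            is_leader = call is None
            if is_leader:
                call = _Call()
                self._inflight[key] = call

        if not is_leader:
            # Another thread is already refreshing this key: wait for it.
            call.done.wait()
            if call.error is not None:
                raise call.error
            return call.result

        try:
            call.result = fn()
            return call.result
        except BaseException as exc:
            call.error = exc  # let waiting threads see the failure too
            raise
        finally:
            # Forget the key before waking waiters, so the next request
            # that arrives after this refresh starts a new one.
            with self._guard:
                del self._inflight[key]
            call.done.set()
```

In a request handler you would wrap your own refresh function, for example flights.do(query, lambda: refresh(query)), where flights is one shared SingleFlight instance and refresh is whatever updates the local database.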
