[SiliconBeach] Re: Niche Search problems & finding Help

Samuel Bishop Mon, 01 Jun 2009 01:21:49 -0700

Interesting, but it didn't take long to notice they have the same
limitations I wanted to work around as well.
You can't search for something like '25%'; you just get results for '25',
which is a major limitation really.
Why on earth Twitter bothered supporting Unicode I have no idea. You
can't search for anything besides alphanumerics and probably half a
dozen or fewer symbols, which makes tweets containing anything else
nearly impossible to find again without wasting chars on hashtags you
shouldn't need.
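
To make the '25%' point concrete, here is a minimal sketch of an index whose tokenizer keeps a few symbols attached to the term instead of throwing them away. Everything in it (the regex, the function names, the sample tweets) is made up for illustration; it is not how search.twitter or Topsy actually work.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer rule: an optional leading '#', '@' or '$',
# a run of word characters, and an optional trailing '%'.
TOKEN_RE = re.compile(r"[#@$]?\w+%?")

def tokenize(text):
    """Lowercase the text and return tokens, keeping attached symbols."""
    return TOKEN_RE.findall(text.lower())

def build_index(tweets):
    """Map each token to the set of tweet ids that contain it."""
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        for token in tokenize(text):
            index[token].add(tweet_id)
    return index

def search(index, query):
    """Return ids of tweets that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results

# Made-up sample data.
tweets = {
    1: "Sale on now, 25% off everything",
    2: "Only 25 tickets left",
}
index = build_index(tweets)
```

With this rule `search(index, "25%")` matches only the first tweet and `search(index, "25")` only the second. Whether a bare '25' should also match '25%' is a design choice; indexing both forms of the token would cover that.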


I'd love to dig deeper into the concept of building a Twitter index.
But time, money and the lack of both do not give me much chance to do
10% of the things I think of.
That's despite having a very clean concept of what the index would
enable, and possibly make bucket-loads of money off.

I suspect that Topsy is seeking to deal with it the same way Twitter
is. The irony here is that ranking by followers has been done before;
all these guys are doing is weighting the search results using the same
method... and someone gave them 15 million to work on it... I come up
with crap like that and think it would make a good feature. What is
with these people building whole companies out of stuff that should
simply be a damn feature... I'll cut myself off there before I get into
a rant about stupid people with too much money & their lucky buddies
cashing in.
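
For what it's worth, "weighting the search results by followers" really is feature-sized. A rough sketch, assuming each match already carries some text relevance score; the field names, the log damping and the sample numbers are all my own assumptions, not anything Topsy or Twitter has published:

```python
import math

def weighted_score(tweet):
    """Text relevance boosted by a log-damped follower count (assumed formula)."""
    return tweet["text_score"] * (1 + math.log10(1 + tweet["followers"]))

def rank(matches):
    """Sort matching tweets by weighted score, highest first."""
    return sorted(matches, key=weighted_score, reverse=True)

# Made-up matches for a single query.
matches = [
    {"id": 1, "text_score": 1.0, "followers": 120},
    {"id": 2, "text_score": 0.5, "followers": 45000},
    {"id": 3, "text_score": 1.0, "followers": 9},
]
ranked_ids = [tweet["id"] for tweet in rank(matches)]
```

The log is only there so that one account with a huge following does not drown out every better-matching tweet.
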
On 01/06/2009, at 2:12 PM, David Jones wrote:

> BTW - this seems related - one of the founders is out of the  
> security/reputation space (cloudmark/Vipul's razor)
> http://blogs.wsj.com/venturecapital/2009/05/27/topsy-bets-on-real-time-twitter-search-with-15m-backing/
>
>
> On Sun, May 31, 2009 at 6:00 PM, Andrew J <ajes...@gmail.com> wrote:
>
> Hi Samuel,
>
> Excuse what may seem like a bit of a ramble, but have been thinking
> about these sort of issues myself of late. My thoughts:
>
> If you really need a complete index, and can't do a low-pass filter
> to keep the volume of data manageable, then you are talking about a
> fairly serious amount of CPU, bandwidth, and storage. I rather doubt
> you'll find this anywhere for free, even within the fairly generous
> free provisions that GAE provides. Presumably your index will need
> some pre-processing as well (indexing, stemming, scoring etc.?), in
> real-time, so you can add some CPU cost there too. You aren't building
> a web crawler, but you'll be facing many of the same kind of
> challenges.
>
> I'd tentatively suggest looking at something built on Hadoop (incl.
> the Solr/Lucene/Nutch family of projects). If you were prepared to
> abandon GAE (at least for the indexing and search query functions)
> then you could probably build a Nutch cluster on EC2 without much
> expense, and (although I haven't) I'm sure many people have trodden
> this ground already. Certainly several people have set up Hadoop
> clusters on EC2. Hadoop's scalability is unparalleled (Yahoo! and
> Facebook both run 10,000+ node clusters, across multiple data
> centres), and as well as offering a map/reduce infrastructure for
> running your Lucene indexing operations on, it also offers a DB-esque
> platform similar to Google Data or Amazon SDB for running your search
> queries on too.
>
> Alternatively, you could look at farming out your index processing to
> Amazon's new Hadoop API (which saves you having to set up a Hadoop
> cluster yourself) and then store the index with GAE datastore and
> query it natively inside your application. This may prove a little
> less effort, and would offload a lot of CPU to Amazon (which,
> admittedly you'd still have to pay for). I doubt this would be
> feasible in the long term though because of the volume of data you'd
> have to shift between Amazon and GAE and the costs associated with it.
> Plus, you'd have to cache and bulk-transform your data to make it
> feasibly work in API format, and this probably wouldn't fit well with
> real-time query goals.
>
> And incidentally, while all this might not be trivial to set up, it
> would make a great project, and potentially a valuable one for many
> people. Having an alternative twitter index as an open platform with
> deeper capability than search.twitter opens up a bunch of
> possibilities.
>
> Cheers, AJ
>
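
The map/reduce indexing step AJ mentions in the quoted message can be illustrated without Hadoop at all. This is a toy, single-process sketch in plain Python, not the Hadoop or Lucene API; the function names and sample tweets are invented, and a real job would add stemming and scoring and run the phases across many nodes.

```python
from collections import defaultdict

def map_phase(tweet_id, text):
    """Emit one (term, tweet_id) pair per distinct term in a tweet."""
    for term in set(text.lower().split()):
        yield term, tweet_id

def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for term, tweet_id in pairs:
        grouped[term].append(tweet_id)
    return grouped

def reduce_phase(term, tweet_ids):
    """Collapse one term's tweet ids into a sorted posting list."""
    return term, sorted(set(tweet_ids))

def build_index(tweets):
    """Run map, shuffle and reduce in sequence over a dict of tweets."""
    pairs = (pair for tid, text in tweets.items()
             for pair in map_phase(tid, text))
    return dict(reduce_phase(term, ids)
                for term, ids in shuffle(pairs).items())

# Made-up sample data.
tweets = {1: "hadoop on ec2", 2: "lucene on hadoop", 3: "nutch and lucene"}
index = build_index(tweets)
```

Here `index["hadoop"]` comes out as the posting list `[1, 2]`. The point of the split is that the map and reduce functions never share state, which is what lets a framework spread them over a cluster.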


