[SiliconBeach] Re: Niche Search problems & finding Help

Nathan de Vries Mon, 01 Jun 2009 02:44:37 -0700

Hi Samuel,

I know you said up-front that your email was going to be a rant, but I  
think you need to take a deep breath and consider for a moment that  
not everyone thinks your ideas are as valuable as you do. Ideas are a
dime a dozen unless you can bring them to life, and you've said
yourself that you don't have the technical expertise to create your
all-powerful Twitter index. It's also not fair to take pot-shots at
other startups on a startup-focused mailing list, especially when
you're not giving them the opportunity to defend themselves against
your "suspicions".


Oh, and Twitter probably supports Unicode because they understand that
their service might appeal to non-English-speaking countries, which
account for more people than those who want to search for "20%".
Adding to this, your email implies that Unicode content cannot be
searched, but that's clearly not true:

   http://search.twitter.com/search?q=脳カベ
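
When a URL like this is built programmatically, the query text has to be
percent-encoded as UTF-8 first, and a literal "%" in the query must itself
be encoded as "%25". A minimal Python sketch follows; the helper name
build_search_url is made up for illustration, and only the base URL comes
from this thread:

```python
from urllib.parse import quote, unquote

# Base URL taken from the example in this thread.
SEARCH_BASE = "http://search.twitter.com/search?q="


def build_search_url(query: str) -> str:
    """Return a search URL with the query percent-encoded as UTF-8.

    Hypothetical helper for illustration; it only builds the string and
    makes no network request.
    """
    # safe="" forces every reserved character, including "/" and "%",
    # to be percent-encoded.
    return SEARCH_BASE + quote(query, safe="")


if __name__ == "__main__":
    for q in ("脳カベ", "20%"):
        url = build_search_url(q)
        # Decoding the encoded part gives back the original query text.
        assert unquote(url[len(SEARCH_BASE):]) == q
        print(url)
```

This only shows that such queries can be expressed in a URL; how the
search backend tokenizes them is a separate matter.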

If you think you can provide a more flexible search service with a  
more powerful search index, including the infrastructure to keep it  
running, Twitter allows you to do so via their publicly available  
"spritzer" stream API. If your service is deemed worthy, you'll get  
access to the "firehose" API. This is precisely how Summize built  
their product, which was subsequently acquired and made part of  
Twitter's main offering.
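
To make the "more flexible index" idea concrete, here is a rough Python
sketch of an inverted index whose tokenizer keeps a trailing "%" and
treats Unicode letters as word characters. Everything in it is an
assumption for illustration: the tweets dict stands in for whatever a
stream would deliver, and the function names are invented. It is not how
Twitter or Summize actually index.

```python
import re
from collections import defaultdict

# A run of word characters, optionally followed by "%", so that "25%"
# stays a single token. In Python 3, \w is Unicode-aware for str patterns.
TOKEN_RE = re.compile(r"\w+%?")


def tokenize(text):
    """Split text into lowercase tokens, keeping a trailing percent sign."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def build_index(tweets):
    """Map each token to the set of tweet ids that contain it."""
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        for token in tokenize(text):
            index[token].add(tweet_id)
    return index


def search(index, query):
    """Return ids of tweets containing every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


if __name__ == "__main__":
    # Made-up sample data standing in for a stream of tweets.
    tweets = {
        1: "Sale: 25% off today",
        2: "Only 25 seats left",
        3: "脳カベ is on tonight",
    }
    index = build_index(tweets)
    print(search(index, "25%"))
    print(search(index, "25"))
    print(search(index, "脳カベ"))
```

With this tokenizer "25%" and "25" are distinct terms, so a search for
one does not return matches for the other. Text written without spaces,
such as Japanese, ends up as one long token here; a real index would need
a proper word segmenter for that.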

In short: if you think you've got such a great idea, go do it.


Cheers,

Nathan de Vries



On 01/06/2009, at 6:21 PM, Samuel Bishop wrote:

> Interesting, but it didn't take long to notice they have the same
> limitations I wanted to work around as well.
> You can't search for something like '25%'; you just get results for
> '25', which is a major limitation really.
> Why on earth Twitter bothered supporting Unicode I have no idea. You
> can't search for anything besides alphanumerics and probably half a
> dozen or fewer symbols, which makes tweets nearly impossible to find
> again without wasting chars on hashtags you shouldn't need.
>
> I'd love to dig deeper into the concept of building a Twitter index,
> but time, money and the lack of both do not give me much chance to
> do 10% of the things I think of, despite having a very clean concept
> of what the index would enable and how it could possibly make
> bucket-loads of money.
>
> I suspect that Topsy is seeking to deal with it the same way Twitter
> is. The irony here is that ranking by followers has been done before;
> all these guys are doing is weighting the search results using the
> same method... and someone gave them 15 million to work on it... I
> come up with crap like that and think it would make a good feature.
> What is with these people building whole companies out of stuff that
> should simply be a damn feature... I'll cut myself off there before I
> get into a rant about stupid people with too much money & their
> lucky buddies cashing in.
> On 01/06/2009, at 2:12 PM, David Jones wrote:
>
>> BTW - this seems related - one of the founders is out of the  
>> security/reputation space (cloudmark/Vipul's razor)
>> http://blogs.wsj.com/venturecapital/2009/05/27/topsy-bets-on-real-time-twitter-search-with-15m-backing/
>>
>>
>> On Sun, May 31, 2009 at 6:00 PM, Andrew J <ajes...@gmail.com> wrote:
>>
>> Hi Samuel,
>>
>> Excuse what may seem like a bit of a ramble, but I have been thinking
>> about these sorts of issues myself of late. My thoughts:
>>
>> If you really need a complete index, and can't do a low-pass filter
>> to keep the volume of data manageable, then you are talking about a
>> fairly serious amount of CPU, bandwidth, and storage. I rather doubt
>> you'll find this anywhere for free, even within the fairly generous
>> free provisions that GAE provides. Presumably your index will need
>> some pre-processing as well (indexing, stemming, scoring etc.?), in
>> real-time, so you can add some CPU cost there too. You aren't
>> building a web crawler, but you'll be facing many of the same kind
>> of challenges.
>>
>> I'd tentatively suggest looking at something built on Hadoop (incl.
>> the Solr/Lucene/Nutch family of projects). If you were prepared to
>> abandon GAE (at least for the indexing and search query functions)
>> then you could probably build a Nutch cluster on EC2 without much
>> expense, and (although I haven't) I'm sure many people have trodden
>> this ground already. Certainly several people have set up Hadoop
>> clusters on EC2. Hadoop's scalability is unparalleled (Yahoo! and
>> Facebook both run 10,000+ node clusters, across multiple data
>> centres), and as well as offering a map/reduce infrastructure for
>> running your Lucene indexing operations on, it also offers a DB-esque
>> platform similar to Google Data or Amazon SDB for running your
>> search queries on too.
>>
>> Alternatively, you could look at farming out your index processing to
>> Amazon's new Hadoop API (which saves you having to set up a Hadoop
>> cluster yourself) and then store the index with GAE datastore and
>> query it natively inside your application. This may prove a little
>> less effort, and would offload a lot of CPU to Amazon (which,
>> admittedly you'd still have to pay for). I doubt this would be
>> feasible in the long term though because of the volume of data you'd
>> have to shift between Amazon and GAE and the costs associated with it.
>> Plus, you'd have to cache and bulk-transform your data to make it
>> feasibly work in API format, and this probably wouldn't fit well with
>> real-time query goals.
>>
>> And incidentally, while all this might not be trivial to set up, it
>> would make a great project, and potentially a valuable one for many
>> people. Having an alternative twitter index as an open platform with
>> deeper capability than search.twitter opens up a bunch of
>> possibilities.
>>
>> Cheers, AJ
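
AJ's suggestion of running the indexing as map/reduce jobs can be shown
in miniature. The sketch below is plain single-process Python, not Hadoop
or Lucene code; the function names and the sample documents are made up
for illustration, and a real job would spread the map and reduce steps
across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term."""
    return [(term, doc_id) for term in set(text.lower().split())]


def shuffle(pairs):
    """Group emitted pairs by term, as the framework would between phases."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: collapse one term's ids into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle and reduce in sequence over a dict of documents."""
    pairs = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in docs.items()
    )
    return dict(
        reduce_phase(term, doc_ids) for term, doc_ids in shuffle(pairs).items()
    )


if __name__ == "__main__":
    # Hypothetical sample documents.
    docs = {1: "hadoop on ec2", 2: "lucene on hadoop"}
    for term, postings in sorted(build_index(docs).items()):
        print(term, postings)
```

Each map call looks at only one document and each reduce call at only one
term, which is what lets a framework run them in parallel.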

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Silicon Beach 
Australia mailing list.

No lurkers! It is expected that you introduce yourself: 
http://groups.google.com/group/silicon-beach-australia/browse_thread/thread/99938a0fbc691eeb

To post to this group, send email to
silicon-beach-australia@googlegroups.com
To unsubscribe from this group, send email to
silicon-beach-australia+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/silicon-beach-australia?hl=en
-~----------~----~----~----~------~----~------~--~---
