Re: HTTP or RMI, Jini, JavaSpaces for distributed search

Walter Underwood Fri, 21 Sep 2007 11:32:24 -0700

Please don't switch to RMI. We've spent the past year converting
our entire middle tier from RMI to HTTP. We are so glad that we
no longer have any RMI servers.


The big advantage of HTTP is that there are hundreds, maybe
thousands, of engineers working on making it fast, on tools for it,
on caches, etc.

If you really need more compact responses, I would recommend
encoding the same data as the JSON output in Python marshal format.
That is compact, fast, and easy to parse. We used that for a Java
client in Ultraseek.
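
For anyone who hasn't used marshal, here is a minimal sketch of the idea in present-day Python 3. The response dictionary and its field names are made up for illustration, and the helper names are hypothetical; nothing here reflects how Ultraseek or Solr actually encoded responses. Keep in mind that the marshal format is Python-specific and can change between Python versions, so both ends need to agree on which version they speak.

```python
import json
import marshal

# Hypothetical search response; field names are illustrative only.
SAMPLE_RESPONSE = {
    "numFound": 3,
    "start": 0,
    "docs": [
        {"id": "doc1", "score": 1.5},
        {"id": "doc2", "score": 1.25},
        {"id": "doc3", "score": 0.75},
    ],
}


def encode_response(response):
    """Serialize a dict/list/str/number structure to marshal bytes."""
    return marshal.dumps(response)


def decode_response(payload):
    """Rebuild the original structure from marshal bytes."""
    return marshal.loads(payload)


if __name__ == "__main__":
    as_json = json.dumps(SAMPLE_RESPONSE).encode("utf-8")
    as_marshal = encode_response(SAMPLE_RESPONSE)
    # Compare sizes on your own data; the ratio depends on the payload.
    print("json bytes:", len(as_json), "marshal bytes:", len(as_marshal))
    print(decode_response(as_marshal) == SAMPLE_RESPONSE)
```

The same structure goes in and comes out, with no text parsing on the receiving side. Whether it is actually smaller than JSON depends on the data, so measure with real responses.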

wunder

On 9/21/07 11:08 AM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:

> I wanted to take a step back for a second and consider whether HTTP is
> really the right choice of transport for distributed search.
> 
> I think the high-level approach in SOLR-303 is the right way to go
> about it, but I'm unsure if HTTP is the right transport.
> 
> Pro HTTP:
>   - using HTTP allows one to use an http load-balancer to distribute
> load across multiple copies of the same shard by assigning a VIP
> (virtual IP) to each shard.
>   - because you do pretty much everything by hand, you know that there
> isn't some hidden limitation that will jump out and bite you later.
> 
> Cons HTTP:
>   - you end up doing everything by hand... connection handling, request
> serialization, response parsing, etc...
>   - goes through normal servlet channels... every sub-request will be
> logged to the access logs, slowing things down.
>   - more network bandwidth used unless we come up with a new
> BinaryResponseWriter and Parser
> 
> Currently, SOLR-303 uses and parses the XML response format, which has
> some serious downsides:
> - response size limits scalability and how deep in responses you can go...
>   If you want to retrieve documents 5000 through 5009, even though the
> user only requested 10 documents, the top-level searcher needs to get
> the top 5009 documents from *each* shard... and that can quickly
> exhaust the network bandwidth of the NIC.  XML parsing on the order of
> nShards*5009 entries won't be any picnic either.
> 
> I'm thinking the load-balancing of HTTP is overrated also, because
> it's inflexible.  Adding another shard requires adding another VIP in
> the load-balancer, and changing which servers have which shards or
> adding new copies of a shard also requires load-balancer
> configuration.  Everything points to Solr being able to do the
> load-balancing itself in the future, and there wouldn't seem to be
> much benefit to using a load-balancer with a VIP for each shard vs.
> having Solr do it.
> 
> So even if we stuck with HTTP, Solr would need
>  - a binary protocol to minimize network bandwidth use
>  - load balancing across shard copies itself
> 
> Given that, would it make sense to just go with RMI instead?
> And perhaps leverage some other higher level services (Jini? JavaSpaces?)
> 
> I'd like to hear from people with more experience with RMI & friends,
> and what the potential downsides are to using these technologies.
> 
> -Yonik
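
The deep-paging cost described in the quoted message can be sketched as follows. This is only an illustration of the general merge step, assuming each shard returns its own top (start + rows) hits already sorted by descending score; the function names and the (score, doc_id) tuple layout are hypothetical and are not taken from SOLR-303.

```python
import heapq
from itertools import islice


def merge_page(shard_results, start, rows):
    """Merge per-shard hit lists and return one page of results.

    shard_results: list of lists of (score, doc_id) tuples, each list
    sorted by score descending and holding that shard's top
    (start + rows) hits. start is 0-based.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, start, start + rows))


def entries_transferred(n_shards, start, rows):
    """Hits the top-level searcher must fetch and parse for one page."""
    return n_shards * (start + rows)


if __name__ == "__main__":
    shards = [
        [(0.9, "a"), (0.5, "c")],
        [(0.8, "b"), (0.4, "d")],
    ]
    print(merge_page(shards, start=1, rows=2))
    # Documents 5000 through 5009 (1-based) means start=4999, rows=10.
    print(entries_transferred(n_shards=4, start=4999, rows=10))
```

The page itself is only ten hits, but the number of entries moved over the network grows with both the page depth and the shard count, which is the argument for a compact response format regardless of transport.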
