Re: HTTP transport?

2009-11-12 Patrick Hunt
One additional benefit of using HTTP is that people are always working to improve performance, and not only optimizing servers -- Google's SPDY: http://www.readwriteweb.com/archives/spdy_google_wants_to_speed_up_the_web.php Multiplexed requests, compressed headers, etc... Patrick Doug Cutting

Re: HTTP transport?

2009-10-14 Kan Zhang
On 10/14/09 9:37 AM, "Doug Cutting" wrote: > Kan Zhang wrote: >> One problem I see with using HTTP is that it's expensive to provide data >> encryption. We're currently adding 2 authentication mechanisms (Kerberos and >> DIGEST-MD5) to our existing RPC. Both of them can provide data encryption

Re: HTTP transport?

2009-10-14 Doug Cutting
Kan Zhang wrote: One problem I see with using HTTP is that it's expensive to provide data encryption. We're currently adding 2 authentication mechanisms (Kerberos and DIGEST-MD5) to our existing RPC. Both of them can provide data encryption for subsequent communication over the authenticated chan
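
Kan's point is that the SASL mechanisms being added (Kerberos via GSSAPI, and DIGEST-MD5) can also protect traffic after authentication. In Java's javax.security.sasl API this is requested through the Sasl.QOP property: "auth-conf" asks for confidentiality (encryption), "auth-int" for integrity only, and "auth" for authentication only. The following is a minimal sketch, assuming the standard JDK SASL API; the class and helper names are hypothetical, and it only prepares the properties and interprets a negotiated result rather than running a handshake.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;

class SaslQopSketch {
    // Properties passed to Sasl.createSaslClient/createSaslServer (hypothetical helper).
    // Preference order: encryption, then integrity-only, then authentication-only.
    static Map<String, String> requestedProps() {
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, "auth-conf,auth-int,auth");
        return props;
    }

    // After the handshake, getNegotiatedProperty(Sasl.QOP) reports the chosen level.
    // Only "auth-conf" means later messages are encrypted via wrap()/unwrap().
    static boolean isEncrypted(String negotiatedQop) {
        return "auth-conf".equals(negotiatedQop);
    }

    public static void main(String[] args) {
        System.out.println(requestedProps());
        System.out.println(isEncrypted("auth-int"));
    }
}
```

On the HTTP side the usual counterpart is TLS, which is negotiated independently of whichever authentication scheme runs over it.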

Re: HTTP transport?

2009-10-13 Kan Zhang
On 10/9/09 12:56 PM, "Doug Cutting" wrote: > Sanjay Radia wrote: >> Will the RPC over HTTP be transparent so that we can replace with a >> different layer if needed? > > Yes. > >> My worry was the separation of data and checksums; someone had mentioned >> that one could do this over 2 R

Re: HTTP transport?

2009-10-10 Scott Carey
On 10/9/09 10:49 AM, "Doug Cutting" wrote: > >> It is an interesting question how much we >> depend on being able to answer queries out of order. There are some >> parts of the code where overlapping requests from the same client >> matter. In particular, the terasort scheduler uses threads to

Re: HTTP transport?

2009-10-09 Doug Cutting
Sanjay Radia wrote: Will the RPC over HTTP be transparent so that we can replace with a different layer if needed? Yes. My worry was the separation of data and checksums; someone had mentioned that one could do this over 2 RPCs - that is not transparent. That was suggested as a possibi
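
Doug's "Yes" amounts to keeping the RPC layer ignorant of what carries its bytes. A minimal sketch of that separation, assuming a hypothetical Transport interface (not Avro's actual API) with an in-process loopback implementation; an HTTP or socket implementation would slot in behind the same interface.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical abstraction: the RPC layer only exchanges serialized bytes, so an HTTP,
// raw-socket, or tunneled transport can be swapped in without changing callers.
// (A real transport would also surface I/O failures, e.g. as UncheckedIOException.)
interface Transport {
    byte[] call(byte[] request);
}

// In-process transport that hands the request bytes straight to a server-side handler.
final class LoopbackTransport implements Transport {
    private final UnaryOperator<byte[]> handler;

    LoopbackTransport(UnaryOperator<byte[]> handler) {
        this.handler = handler;
    }

    @Override
    public byte[] call(byte[] request) {
        return handler.apply(request);
    }
}

// Client stub: knows about messages, not about sockets or HTTP.
final class EchoClient {
    private final Transport transport;

    EchoClient(Transport transport) {
        this.transport = transport;
    }

    String echo(String text) {
        byte[] reply = transport.call(text.getBytes(StandardCharsets.UTF_8));
        return new String(reply, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Echo "server": returns whatever bytes it receives.
        Transport loopback = new LoopbackTransport(bytes -> bytes);
        System.out.println(new EchoClient(loopback).echo("ping"));
    }
}
```

Splitting one logical call into two RPCs (data and checksums), as Sanjay notes, would leak through such an interface, since the caller would have to know about both exchanges.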

Re: HTTP transport?

2009-10-09 Sanjay Radia
> iThreadedHttpConnectionManager.html#setMaxConnectionsPerHost%28int%29 > > Connections are not free of course, but Jetty has been benchmarked at > 20,000 concurrent connections: > > http://cometdaily.com/2008/01/07/2-reasons-that-comet-scales/ > >> In short, I think

Re: HTTP transport?

2009-10-09 Doug Cutting
omet-scales/ In short, I think that an HTTP transport is great for playing with, but I don't think you can assume it will work as the primary transport. I agree, we cannot assume it. But it's easy to try it and see how it fares. Any investment in getting it working is perhaps not

Re: HTTP transport?

2009-10-08 Owen O'Malley
icular, the terasort scheduler uses threads to access the namenode. That would stop providing any pipelining, which I believe would be significant. In short, I think that an HTTP transport is great for playing with, but I don't think you can assume it will work as the primary transport. -- Owen

Re: HTTP transport?

2009-10-05 Scott Carey
>> With respect to Avro/Hadoop, I suspect requests from clients to be time >> clustered. > > That was my thought as well. The thing that gets me is that in the case > of Hadoop (and the related subprojects) the clients utilizing this > particular HTTP connection are probably going to be pretty sm

Re: HTTP transport?

2009-10-05 Scott Carey
On 10/5/09 1:47 PM, "Ryan Rawson" wrote: > I have a question about these headers... will they impact the ability to do > many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000 > rpcs/second. Would this help or hinder? > As long as the HTTP response and request fit in one networ
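
Scott's condition can be checked with simple arithmetic. This sketch assumes a 1500-byte Ethernet MTU with 20-byte IPv4 and 20-byte TCP headers and no TCP options, which leaves 1460 bytes of segment payload; paths with TCP options, tunnels, or IPv6 leave less. The class name is hypothetical.

```java
class SegmentFit {
    // Assumption: 1500-byte Ethernet MTU minus 20-byte IPv4 and 20-byte TCP headers
    // (no TCP options). Real paths may have a smaller maximum segment size.
    static final int ASSUMED_MSS = 1500 - 20 - 20; // 1460 bytes

    // True if the HTTP header block plus the RPC payload fit in one TCP segment.
    static boolean fitsInOneSegment(int httpHeaderBytes, int payloadBytes) {
        return httpHeaderBytes + payloadBytes <= ASSUMED_MSS;
    }

    // Largest payload that still fits alongside a header block of the given size.
    static int maxPayload(int httpHeaderBytes) {
        return Math.max(0, ASSUMED_MSS - httpHeaderBytes);
    }

    public static void main(String[] args) {
        System.out.println(fitsInOneSegment(100, 8)); // true
        System.out.println(maxPayload(100));          // 1360
    }
}
```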

Re: HTTP transport?

2009-10-05 Eric Sammer
Scott Carey wrote: > Even in the beacon case, if the browser is likely to send another request > shortly, it cuts the effective network latency in half. Which is generally not the case in the beacon / ad server use case. That was the only point I was making. That's beside the point, though. I thi

Re: HTTP transport?

2009-10-05 Scott Carey
On 10/5/09 1:53 PM, "Eric Sammer" wrote: > Ryan: > > Certainly keep-alive will help in this case, if that's what you're > referring to. The server holds the socket for N seconds or M requests, > whichever comes first. What you're saving with KA is the connection > setup / tear down. If you hav

Re: HTTP transport?

2009-10-05 Doug Cutting
Sanjay Radia wrote: What about out of order exchange. Will we be able to support that with http transport? Out-of-order exchange was originally added to Hadoop's RPC when it was a part of Nutch. It's an important optimization for distributed search, but it's not clear how
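
Out-of-order exchange works by tagging each request with a call id, so one connection can carry several outstanding calls and the reader can hand each response to its caller whenever it arrives. HTTP/1.1 pipelining does not allow this: RFC 2616 requires a server to return pipelined responses in the order the requests were received. A minimal client-side sketch with hypothetical names, using String as a stand-in for a decoded response:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side demultiplexer: every call carries an id, so the server may
// answer calls on a single connection in whatever order they finish.
class CallDemux {
    private final AtomicInteger nextId = new AtomicInteger();
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Register an outstanding call; the returned id is sent along with the request.
    int start(CompletableFuture<String> result) {
        int id = nextId.getAndIncrement();
        pending.put(id, result);
        return id;
    }

    // Called by the connection's reader thread for each response, in arrival order.
    void onResponse(int callId, String value) {
        CompletableFuture<String> f = pending.remove(callId);
        if (f != null) {
            f.complete(value);
        }
    }

    int outstanding() {
        return pending.size();
    }

    public static void main(String[] args) {
        CallDemux demux = new CallDemux();
        CompletableFuture<String> slow = new CompletableFuture<>();
        CompletableFuture<String> fast = new CompletableFuture<>();
        int a = demux.start(slow);
        int b = demux.start(fast);
        demux.onResponse(b, "fast result"); // second call answered first
        System.out.println(fast.join() + ", slow done=" + slow.isDone());
        demux.onResponse(a, "slow result");
        System.out.println(slow.join());
    }
}
```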

Re: HTTP transport?

2009-10-05 Eric Sammer
Ryan Rawson wrote: > That's good to know. I thought KA would help... but I was also talking about > the overhead of a header where the payload is smaller than the framing. E.g.: > 8-byte requests, excluding which rpc. This seems like we could be hurt since > the headers are potentially 5x the size of

Re: HTTP transport?

2009-10-05 Ryan Rawson
That's good to know. I thought KA would help... but I was also talking about the overhead of a header where the payload is smaller than the framing. E.g.: 8-byte requests, excluding which rpc. This seems like we could be hurt since the headers are potentially 5x the size of our payload/request params
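
To put Ryan's concern in numbers, the sketch below frames a request with what is arguably the minimum HTTP/1.1 needs: the request line, the mandatory Host header, and Content-Length. The host name and class are hypothetical, and it uses POST rather than the GET in the thread's example because the request carries a body.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class HeaderOverhead {
    // Smallest plausible HTTP/1.1 request framing: request line, the mandatory Host
    // header, and Content-Length so the server knows where the body ends.
    static String requestHeader(String path, String host, int bodyLength) {
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Length: " + bodyLength + "\r\n"
                + "\r\n";
    }

    static int headerBytes(String path, String host, int bodyLength) {
        return requestHeader(path, host, bodyLength).getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        int payload = 8; // e.g. one small RPC's serialized parameters
        int header = headerBytes("/avro/org.apache.hadoop.hdfs.NameNode", "namenode.example.com", payload);
        System.out.printf(Locale.ROOT, "header=%d bytes, payload=%d bytes, ratio=%.2f%n",
                header, payload, (double) header / payload);
    }
}
```

With these inputs the header block is 102 bytes, about 12.75 times an 8-byte payload; longer host names or paths only add to that.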

Re: HTTP transport?

2009-10-05 Eric Sammer
Ryan: Certainly keep-alive will help in this case, if that's what you're referring to. The server holds the socket for N seconds or M requests, whichever comes first. What you're saving with KA is the connection setup / tear down. If you have a lot of cases where the client makes a single request
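
Eric's description of keep-alive (hold the socket for N seconds or M requests, whichever comes first) can be written as a small policy object. This is a sketch with hypothetical names; it treats the N seconds as an idle timeout between requests, in the spirit of Apache httpd's KeepAliveTimeout and MaxKeepAliveRequests directives.

```java
import java.time.Duration;

// Hypothetical server-side keep-alive rule: after serving a request, keep the socket
// only if fewer than maxRequests have been served on it, and close it if the next
// request does not arrive within maxIdle, whichever limit is reached first.
class KeepAlivePolicy {
    private final Duration maxIdle;
    private final int maxRequests;

    KeepAlivePolicy(Duration maxIdle, int maxRequests) {
        this.maxIdle = maxIdle;
        this.maxRequests = maxRequests;
    }

    /** True while the connection may still be reused for another request. */
    boolean keepOpen(int requestsServed, Duration idleSoFar) {
        return requestsServed < maxRequests && idleSoFar.compareTo(maxIdle) < 0;
    }

    public static void main(String[] args) {
        KeepAlivePolicy policy = new KeepAlivePolicy(Duration.ofSeconds(5), 100);
        System.out.println(policy.keepOpen(1, Duration.ofSeconds(2)));  // reuse
        System.out.println(policy.keepOpen(100, Duration.ZERO));        // request cap hit
        System.out.println(policy.keepOpen(1, Duration.ofSeconds(5)));  // idle timeout hit
    }
}
```

A client that makes a single request per connection gains nothing from this; one that sends a burst avoids a TCP connection setup for every request after the first.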

Re: HTTP transport?

2009-10-05 Ryan Rawson
I have a question about these headers... will they impact the ability to do many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000 rpcs/second. Would this help or hinder? On Oct 5, 2009 4:44 PM, "Eric Sammer" wrote: Doug Cutting wrote: > More or less. Except we can probably arrange

Re: HTTP transport?

2009-10-05 Eric Sammer
Doug Cutting wrote: > More or less. Except we can probably arrange to omit most of those > response headers except Content-Length. Are any others strictly required? Content-Type and Server are probably unavoidable. Some of the others are extremely helpful during development / debugging / etc. It
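
On Doug's question of which response headers are strictly required: as far as RFC 2616 goes, origin servers that have a clock must send Date, messages with a body should carry Content-Type, and Server is optional. Content-Length (or chunked encoding) is what lets the client find the end of the body on a persistent connection. A minimal sketch under those assumptions; the class is hypothetical and the avro/binary content type is an assumed convention, not something settled here.

```java
import java.nio.charset.StandardCharsets;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class MinimalResponse {
    // Assumed content type for Avro binary bodies; not settled in this thread.
    static final String CONTENT_TYPE = "avro/binary";

    // Frame a response body with a small header set: Date (required from origin
    // servers with a clock), Content-Type (recommended with a body), and
    // Content-Length (lets the connection be reused after the body ends).
    static byte[] frame(byte[] body, ZonedDateTime now) {
        String date = DateTimeFormatter.RFC_1123_DATE_TIME.format(now.withZoneSameInstant(ZoneOffset.UTC));
        String header = "HTTP/1.1 200 OK\r\n"
                + "Date: " + date + "\r\n"
                + "Content-Type: " + CONTENT_TYPE + "\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n";
        byte[] head = header.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[head.length + body.length];
        System.arraycopy(head, 0, out, 0, head.length);
        System.arraycopy(body, 0, out, head.length, body.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] framed = frame(new byte[8], ZonedDateTime.now(ZoneOffset.UTC));
        System.out.println("response bytes for an 8-byte body: " + framed.length);
    }
}
```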

Re: HTTP transport?

2009-10-05 Sanjay Radia
e transport or choosing a server. Agreed. Hence the main advantages that remain for http transport are 1) language-independent spec for the protocol. The message headers will be in avro so that is easy and the message exchange should be fairly straightforward. I see this as a minor advantage

Re: HTTP transport?

2009-09-30 Sanjay Radia
On Sep 29, 2009, at 2:08 PM, Doug Cutting wrote: ... Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull stuff out of Avro's payload into HTTP headers. The downside of that would be that, if we still wish to support non-HTTP transports, we'd end up with duplicated logic.

Re: HTTP transport?

2009-09-29 Ryan Rawson
I wanted to chime in on a few things, since avro is a candidate for the HBase RPC. I am not sure that "browser compatibility" is a legitimate requirement for this kind of thing. It is at odds with high performance in a number of areas, and isn't the driving factor for using HTTP anyways. Security

Re: HTTP transport?

2009-09-29 Scott Carey
On 9/29/09 2:57 PM, "stack" wrote: > On Tue, Sep 29, 2009 at 2:08 PM, Doug Cutting wrote: > >> >> Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull >> stuff out of Avro's payload into HTTP headers. The downside of that would >> be that, if we still wish to support n

Re: HTTP transport?

2009-09-29 Scott Carey
BTW, java.net.URLConnection is the likely bottleneck there - it stinks performance-wise. The Apache Commons HttpClient is much faster. Try out using JMeter and switch from one connector to the other for an example. On 9/29/09 4:17 PM, "Doug Cutting" wrote: stack wrote: > So, are we're talk

Re: HTTP transport?

2009-09-29 Devaraj Das
Out of curiosity, do we have such numbers for the current hadoop RPC? On 9/29/09 4:17 PM, "Doug Cutting" wrote: stack wrote: > So, are we talking about doing something like the following for a > request/response: > > GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1 > Host: www.example.c

Re: HTTP transport?

2009-09-29 Doug Cutting
Raghu Angadi wrote: Does this mean the current Avro RPC transport (an improved version of Hadoop RPC) can still exist as long as it is supported by developers? Sure, folks can create new transports for Avro. There is, for example, in Hadoop Common some code that tunnels Avro RPCs inside Hadoop RPCs.

Re: HTTP transport?

2009-09-29 Doug Cutting
stack wrote: So, are we talking about doing something like the following for a request/response: GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1 Host: www.example.com HTTP/1.1 200 OK Date: Mon, 23 May 2005 22:38:34 GMT Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux) Last-Modified:

Re: HTTP transport?

2009-09-29 Raghu Angadi
-129 and it seems like a great example of using HTTP transport. Does this mean the current Avro RPC transport (an improved version of Hadoop RPC) can still exist as long as it is supported by developers? Where does security lie: Avro or the transport layer? If it is part of Avro: transport layer does

Re: HTTP transport?

2009-09-29 stack
On Tue, Sep 29, 2009 at 2:08 PM, Doug Cutting wrote: > > Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull > stuff out of Avro's payload into HTTP headers. The downside of that would > be that, if we still wish to support non-HTTP transports, we'd end up with > duplicated

Re: HTTP transport?

2009-09-29 Doug Cutting
stack wrote: What do you think the path on the first line will look like? Will it be a method name or will it be customizable? Avro RPC currently includes the message name in the payload, so, unless that changes, for Avro RPC, we'd probably use a different URL per protocol. As a convention we mig
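
Since the message name travels inside the Avro payload, the URL only needs to identify the protocol. A sketch of server-side dispatch keyed on a path such as /avro/org.apache.hadoop.hdfs.NameNode (the form quoted elsewhere in the thread); the router class, its methods, and the /avro/ prefix handling are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.UnaryOperator;

// Hypothetical server-side router: one URL per protocol. The message name is not in
// the URL; the protocol's handler reads it from the Avro payload itself.
class ProtocolRouter {
    private static final String PREFIX = "/avro/";
    private final Map<String, UnaryOperator<byte[]>> handlers = new HashMap<>();

    static String pathFor(String protocolFullName) {
        return PREFIX + protocolFullName;
    }

    void register(String protocolFullName, UnaryOperator<byte[]> handler) {
        handlers.put(pathFor(protocolFullName), handler);
    }

    // Empty when no protocol is mounted at the path (an HTTP server would answer 404).
    Optional<byte[]> dispatch(String path, byte[] requestBody) {
        UnaryOperator<byte[]> h = handlers.get(path);
        return h == null ? Optional.empty() : Optional.of(h.apply(requestBody));
    }

    public static void main(String[] args) {
        ProtocolRouter router = new ProtocolRouter();
        router.register("org.apache.hadoop.hdfs.NameNode", body -> body); // stand-in handler
        System.out.println(router.dispatch("/avro/org.apache.hadoop.hdfs.NameNode", new byte[] {1}).isPresent());
        System.out.println(router.dispatch("/avro/org.example.Unknown", new byte[] {1}).isPresent());
    }
}
```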

Re: HTTP transport?

2009-09-29 stack
On Tue, Sep 29, 2009 at 12:43 PM, Doug Cutting wrote: > > The question I'm asking now is about the wire format, whether we wish to > precede each RPC request with something like "GET > /avro/org.apache.hadoop.hdfs.NameNode HTTP/1.1\n" and each response with > "HTTP/1.1 200 OK\n", plus a couple of

Re: HTTP transport?

2009-09-29 Doug Cutting
Sanjay Radia wrote: Wrt connection pooling/async servers: Can't we use the same libraries that Jetty and Tomcat use? Grizzly? Grizzly also supports HTTP. Choosing Grizzly is independent of choosing HTTP as a wire transport or choosing a server. The question I'm asking now is about the wi

Re: HTTP transport?

2009-09-29 Sanjay Radia
On Sep 28, 2009, at 3:42 PM, Doug Cutting wrote: Owen O'Malley wrote: > I've got concerns about this. Both tactical and strategic. The tactical > problem is that I need to get security (both Kerberos and token) into > 0.22. I'd really like to get Avro RPC into 0.22. I'd like both to be >

Re: HTTP transport?

2009-09-28 Doug Cutting
Owen O'Malley wrote: I've got concerns about this. Both tactical and strategic. The tactical problem is that I need to get security (both Kerberos and token) into 0.22. I'd really like to get Avro RPC into 0.22. I'd like both to be done roughly in 5 months. If you switch off of the current RPC

Re: HTTP transport?

2009-09-28 Sanjay Radia
On Sep 11, 2009, at 2:41 PM, Doug Cutting wrote: I'm considering an HTTP-based transport for Avro as the preferred, high-performance option. HTTP has lots of advantages. In particular, it already has - lots of authentication, authorization and encryption support; - highly optimized server

Re: HTTP transport?

2009-09-28 Owen O'Malley
On Sep 11, 2009, at 2:41 PM, Doug Cutting wrote: I'm considering an HTTP-based transport for Avro as the preferred, high-performance option. I've got concerns about this. Both tactical and strategic. The tactical problem is that I need to get security (both Kerberos and token) into 0.22

Re: HTTP transport?

2009-09-28 Doug Cutting
Scott Carey wrote: HTTP is very useful and typically performs very well. It has lots of things built-in too. In addition to what you mention, it has a caching mechanism built-in, range queries, and all sorts of ways to tag along state if needed. To top it off there are a lot of testing and de

Re: HTTP transport?

2009-09-25 Scott Carey
Ok, I have some thoughts on this. I might be misinterpreting some use cases here however. HTTP is very useful and typically performs very well. It has lots of things built-in too. In addition to what you mention, it has a caching mechanism built-in, range queries, and all sorts of ways to t

HTTP transport?

2009-09-11 Doug Cutting
I'm considering an HTTP-based transport for Avro as the preferred, high-performance option. HTTP has lots of advantages. In particular, it already has - lots of authentication, authorization and encryption support; - highly optimized servers; - monitoring, logging, etc. Tomcat and other ser