Hadoop HDFS supports an HTTP API, WebHDFS. The protocol [1] is pretty basic. It supports most of the operations exposed through the Hadoop FileSystem API [2]. There is a Java implementation [3] that returns an OutputStream matching the FSDataOutputStream contract [4] from FileSystem. The important point is that the stream supports hflush() and hsync() from the Syncable interface [5]:

* hflush() commits buffers to all replicas and informs the name service of the updated length (visible to other clients)
* hsync() also ensures that all replicas are persistent on disk.
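For concreteness, here is a minimal sketch of a client exercising that contract; the path and payloads are illustrative:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class SyncableSketch {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      // hypothetical WAL file; fs.create() returns FSDataOutputStream,
      // which implements Syncable
      try (FSDataOutputStream out = fs.create(new Path("/tmp/wal"))) {
        out.write("record-1\n".getBytes("UTF-8"));
        out.hflush(); // new length visible to other readers, all replicas
        out.write("record-2\n".getBytes("UTF-8"));
        out.hsync();  // additionally persists replicas to disk
      }
    }
  }

The same code runs against any FileSystem implementation; whether the two calls actually do anything is up to the implementation, which is the crux of what follows.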
The current implementation uses straightforward chunked encoding, writing to a servlet that creates an HDFS client to write the replicas using the binary HDFS protocol. Both hflush() and hsync() are currently no-ops in WebHDFS.

Looking at the RFC, a natural place to add hflush/hsync to a chunked stream being PUT to WebHDFS would be the chunk-extension field [6]. Simply adding ;flush or ;sync after the chunk size could signal the HDFS client on the other side to perform the corresponding operation on the HDFS stream (see the sketch at the end of this note). The scheme is also backwards- and forwards-compatible with existing clients and servers. There is an unfortunate constraint that the stream could not call hflush() or hsync() without data to send (since a zero-length chunk marks the end of the stream, unless I've misread the RFC), but I think we can live with that or find another workaround.

The goal is to extend the existing REST protocol to support flush(), i.e., to make the data written during a PUT/POST visible to other clients at intermediate points in the stream. These points are not known when the stream is created. I'm very open to being disabused of the soundness of the chunk-extension approach.

Unfortunately, every client I've found fails to expose chunk extensions, and most servers discard chunk extensions before calling any handler. So even if it is an appropriate change to the protocol, employing chunk-extensions could require a significant rewrite. We've looked at some of the other solutions for bidirectional pipes (WebSockets, BOSH [13], etc.), but all of these seem like overkill for such a humble case. Moreover, each of them seems designed for low-latency operations, not a long-lived stream that publishes intermediate updates (e.g., HDFS streams used as a WAL).

I looked at Apache HttpClient, netty, and the bits used by java.net.HttpURLConnection, and couldn't find an API to this part of the RFC [6] in any of them. I found a workaround for HttpClient that treats data returned from a chunked encoding as identity on GET [7], which (presumably) would allow a handler to process the chunk-extension data before it is discarded. The ChunkEncoder [8] does not seem to have any hooks for adding chunk-extensions. WebHDFS currently uses java.net.HttpURLConnection and its own custom logic for composing requests, retries, etc. The HTTP protocol has been in HDFS since 2011 [9], so I hoped there was a way to make the existing bits work, but from what I found [12] there is no hook for adding data to the chunk-extension field during the write.

There is also work to support HTTP/2 as a transfer protocol in HDFS [10,11]. Should we press to transition WebHDFS to that instead of trying to force flush/sync into HTTP/1.1? Is there a canonical approach?

Thanks for considering this.
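To make the proposed framing concrete, here is the hand-rolled sketch referenced above. Since none of the stock clients expose chunk extensions, a proof of concept has to write the chunked framing itself over a raw socket. The host, port, path, and the ;flush / ;sync extension names are all assumptions from the proposal (and the WebHDFS create handshake is elided), not an existing API:

  import java.io.OutputStream;
  import java.net.Socket;
  import java.nio.charset.StandardCharsets;

  public class ChunkExtensionSketch {
    // chunk = chunk-size [ chunk-extension ] CRLF chunk-data CRLF  [6]
    static void writeChunk(OutputStream out, byte[] data, String ext)
        throws Exception {
      String head = Integer.toHexString(data.length)
          + (ext == null ? "" : ";" + ext) + "\r\n";
      out.write(head.getBytes(StandardCharsets.US_ASCII));
      out.write(data);
      out.write("\r\n".getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws Exception {
      // hypothetical datanode endpoint; redirect from the name
      // service, auth, and the full WebHDFS query string elided
      try (Socket s = new Socket("datanode.example.com", 50075)) {
        OutputStream out = s.getOutputStream();
        out.write(("PUT /webhdfs/v1/tmp/wal?op=CREATE HTTP/1.1\r\n"
            + "Host: datanode.example.com\r\n"
            + "Transfer-Encoding: chunked\r\n\r\n")
            .getBytes(StandardCharsets.US_ASCII));
        writeChunk(out, "record-1".getBytes(StandardCharsets.UTF_8),
            "flush");
        writeChunk(out, "record-2".getBytes(StandardCharsets.UTF_8),
            "sync");
        // last-chunk: zero length ends the stream, hence the
        // constraint that flush/sync cannot ride on an empty chunk
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
      }
    }
  }

Under the proposal, a server-side handler that preserved the extensions would map ;flush to hflush() and ;sync to hsync() on the backing HDFS stream.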
-C

References
==========
[1] http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
[2] https://git1-us-west.apache.org/repos/asf?p=hadoop.git;f=hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java;hb=HEAD
[3] https://git1-us-west.apache.org/repos/asf?p=hadoop.git;f=hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java;hb=HEAD
[4] https://git1-us-west.apache.org/repos/asf?p=hadoop.git;f=hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSDataOutputStream.java;hb=HEAD
[5] https://git1-us-west.apache.org/repos/asf?p=hadoop.git;f=hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Syncable.java;hb=HEAD
[6] http://tools.ietf.org/html/rfc2616#section-3.6.1

      Chunked-Body   = *chunk
                       last-chunk
                       trailer
                       CRLF
      chunk          = chunk-size [ chunk-extension ] CRLF
                       chunk-data CRLF
      chunk-size     = 1*HEX
      last-chunk     = 1*("0") [ chunk-extension ] CRLF
      chunk-extension= *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
      chunk-ext-name = token
      chunk-ext-val  = token | quoted-string
      chunk-data     = chunk-size(OCTET)
      trailer        = *(entity-header CRLF)

[7] http://mail-archives.apache.org/mod_mbox/hc-httpclient-users/201406.mbox/%3C1402132048.19574.0.camel@ubuntu%3E
[8] http://svn.apache.org/viewvc/httpcomponents/httpcore/tags/4.4.1/httpcore-nio/src/main/java/org/apache/http/impl/nio/codecs/ChunkEncoder.java?view=markup#l112
[9] https://issues.apache.org/jira/browse/HDFS-2284
[10] https://issues.apache.org/jira/browse/HDFS-7966
[11] https://issues.apache.org/jira/browse/HDFS-8671
[12] http://opensourcejavaphp.net/java/openjdk/sun/net/www/http/ChunkedOutputStream.java.html#70
[13] http://xmpp.org/extensions/xep-0124.html#technique