Hadoop HDFS supports an HTTP API, WebHDFS. The protocol [1] is pretty
basic, and it supports most of the operations exposed through the
Hadoop FileSystem API [2]. There is a Java implementation [3] that
returns an OutputStream matching the FSDataOutputStream contract [4]
from FileSystem. The important point is that the stream supports
hflush() and hsync() from the Syncable interface [5]:

* hflush() commits buffers to all replicas and informs the name
  service of the updated length (visible to other clients)
* hsync() also ensures that the data is persisted to disk on all
  replicas.
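
To make the semantics concrete, here is a minimal sketch of the
Syncable contract over the native HDFS client (the path is just a
placeholder):

    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SyncableExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            try (FSDataOutputStream out =
                     fs.create(new Path("/tmp/wal.log"))) {
                out.write("record-1\n".getBytes(StandardCharsets.UTF_8));
                out.hflush(); // new length visible to concurrent readers
                out.write("record-2\n".getBytes(StandardCharsets.UTF_8));
                out.hsync();  // additionally persisted to disk per replica
            }
        }
    }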

The current implementation writes a straightforward chunked encoding
to a servlet, which creates an HDFS client to write the replicas
using the binary HDFS protocol. Both hflush() and hsync() are
currently no-ops in WebHDFS.
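
For orientation, the shape of that server-side data path is roughly
the following (a sketch only; the class name is invented and the real
datanode code is considerably more involved):

    import java.io.IOException;
    import java.io.InputStream;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CreateServlet extends HttpServlet {
        @Override
        protected void doPut(HttpServletRequest req,
                             HttpServletResponse resp) throws IOException {
            // The container decodes the chunked encoding before we see
            // the stream, so chunk boundaries (and any extensions) are
            // already gone at this point.
            try (FileSystem fs = FileSystem.get(new Configuration());
                 FSDataOutputStream out =
                     fs.create(new Path(req.getPathInfo()));
                 InputStream in = req.getInputStream()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n); // replicas via binary protocol
                }
            }
            resp.setStatus(HttpServletResponse.SC_CREATED);
        }
    }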

Looking at the RFC, a natural place to add hflush/hsync to a chunked
stream being PUT to WebHDFS is the chunk-extension field [6]. Simply
adding ;flush or ;sync after the chunk size could signal the HDFS
client on the other side to perform the corresponding operation on
the HDFS stream. The approach is also backwards- and
forwards-compatible with existing clients and servers, since
implementations must ignore extensions they don't understand. There
is an unfortunate constraint that the stream couldn't call hflush()
or hsync() without data to send (a zero-length chunk would be the end
of the stream, unless I've misread the RFC), but I think we can live
with that or find another workaround.
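
To illustrate, a chunked PUT carrying the proposed extensions might
look like this on the wire (hex chunk sizes; request headers and the
namenode redirect elided; no deployed server understands ;flush or
;sync today):

    PUT /webhdfs/v1/tmp/wal.log?op=CREATE HTTP/1.1
    Transfer-Encoding: chunked

    8;flush
    record-1
    8;sync
    record-2
    0

After receiving the first chunk, the server would write the eight
bytes to the HDFS stream and then call hflush(); a server that does
not recognize the extension would simply stream the data as it does
now.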

The goal is to extend the existing REST protocol to support flush(),
i.e., make the data written during a PUT/POST visible to other
clients at intermediate points in the stream. These points are not
known when the stream is created. I'm very open to being disabused of
the notion that the chunk-extension approach is sound.

Unfortunately, none of the clients I've examined exposes chunk
extensions, and most servers discard them before calling any handler.
So even if this is an appropriate change to the protocol, employing
chunk extensions could require a significant rewrite. We've looked at
some of the other solutions for bidirectional pipes (WebSockets, BOSH
[13], etc.), but all of these seem like overkill for such a humble
case. Moreover, each of them seems to be designed for low-latency
operations, not a long-lived stream that publishes intermediate
updates (e.g., HDFS streams used as a WAL).
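
For the receiving side, the decode a handler would need is small.
Here is a minimal sketch, assuming the server stack exposed the raw
chunk header line (which, as noted, stock containers do not);
ChunkHeader is an invented helper:

    import java.util.Locale;

    final class ChunkHeader {
        final int size;       // chunk-size, decoded from hex
        final boolean flush;  // ";flush" extension present
        final boolean sync;   // ";sync" extension present

        private ChunkHeader(int size, boolean flush, boolean sync) {
            this.size = size;
            this.flush = flush;
            this.sync = sync;
        }

        static ChunkHeader parse(String line) {
            String[] parts = line.trim().split(";");
            int size = Integer.parseInt(parts[0].trim(), 16);
            boolean flush = false, sync = false;
            for (int i = 1; i < parts.length; i++) {
                String ext = parts[i].trim().toLowerCase(Locale.ROOT);
                if (ext.equals("flush")) flush = true;
                else if (ext.equals("sync")) sync = true;
                // other extensions are ignored, per RFC 2616 3.6.1
            }
            return new ChunkHeader(size, flush, sync);
        }
    }

Once the chunk's data has been copied to the HDFS stream, the servlet
would call hflush() or hsync() as flagged.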

I looked at Apache HttpClient, Netty, and the bits used by
java.net.HttpURLConnection. I couldn't find an API for this part of
the RFC [6] in any of them.

I found a workaround that lets HttpClient treat data returned with a
chunked encoding as identity-encoded on GET [7], which (presumably)
would allow the handler to process the chunk-extension data before it
is discarded. The ChunkEncoder [8] does not seem to have any hooks
for adding chunk extensions.

WebHDFS currently uses java.net.HttpURLConnection and its own custom
logic for composing requests, retries, etc. The HTTP protocol has
been in HDFS since 2011 [9], so I hoped there was a way to make the
existing bits work. From what I found [12], there doesn't appear to
be a hook for adding data to the chunk-extension field during the
write.
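
The framing itself is trivial if we bypass HttpURLConnection and
write chunks over a raw stream ourselves; a hedged sketch, with the
class and method names invented for illustration:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;

    final class ExtensionChunkedWriter {
        private final OutputStream out; // raw stream, headers already sent

        ExtensionChunkedWriter(OutputStream out) { this.out = out; }

        // Writes one chunk, optionally tagged with an extension such as
        // "flush" or "sync".
        void writeChunk(byte[] data, String extension) throws IOException {
            if (data.length == 0) {
                // a zero-length chunk is the last-chunk (the constraint
                // noted earlier)
                throw new IOException("zero-length chunk would end stream");
            }
            String header = Integer.toHexString(data.length)
                    + (extension == null ? "" : ";" + extension) + "\r\n";
            out.write(header.getBytes(StandardCharsets.US_ASCII));
            out.write(data);
            out.write("\r\n".getBytes(StandardCharsets.US_ASCII));
        }

        // Emits the last-chunk; RFC 2616 allows a trailer here, omitted.
        void finish() throws IOException {
            out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
    }

The hard part would not be the framing but reproducing the connection
setup, authentication, and retry behavior that HttpURLConnection
currently handles for us.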

There is also work to support HTTP/2 as a transfer protocol in HDFS
[10,11]. Should we press to transition WebHDFS to HTTP/2, instead of
trying to force flush/sync into HTTP/1.1? Is there a canonical
approach?

Thanks for considering this. -C


References
==========
[1] http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
[2] https://git1-us-west.apache.org/repos/asf?p=hadoop.git;f=hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java;hb=HEAD
[3] https://git1-us-west.apache.org/repos/asf?p=hadoop.git;f=hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java;hb=HEAD
[4] https://git1-us-west.apache.org/repos/asf?p=hadoop.git;f=hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSDataOutputStream.java;hb=HEAD
[5] https://git1-us-west.apache.org/repos/asf?p=hadoop.git;f=hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Syncable.java;hb=HEAD
[6] http://tools.ietf.org/html/rfc2616#section-3.6.1
      Chunked-Body   = *chunk
                       last-chunk
                       trailer
                       CRLF

      chunk          = chunk-size [ chunk-extension ] CRLF
                       chunk-data CRLF
      chunk-size     = 1*HEX
      last-chunk     = 1*("0") [ chunk-extension ] CRLF

      chunk-extension= *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
      chunk-ext-name = token
      chunk-ext-val  = token | quoted-string
      chunk-data     = chunk-size(OCTET)
      trailer        = *(entity-header CRLF)

[7] http://mail-archives.apache.org/mod_mbox/hc-httpclient-users/201406.mbox/%3C1402132048.19574.0.camel@ubuntu%3E
[8] http://svn.apache.org/viewvc/httpcomponents/httpcore/tags/4.4.1/httpcore-nio/src/main/java/org/apache/http/impl/nio/codecs/ChunkEncoder.java?view=markup#l112
[9] https://issues.apache.org/jira/browse/HDFS-2284
[10] https://issues.apache.org/jira/browse/HDFS-7966
[11] https://issues.apache.org/jira/browse/HDFS-8671
[12] http://opensourcejavaphp.net/java/openjdk/sun/net/www/http/ChunkedOutputStream.java.html#70
[13] http://xmpp.org/extensions/xep-0124.html#technique
