Daniel,

On 6/27/23 12:56, Daniel Andres Pelaez Lopez wrote:
Christopher,

On Tue, Jun 27, 2023 at 9:33, Christopher Schultz (<
ch...@christopherschultz.net>) wrote:

Daniel,

On 6/26/23 16:15, Daniel Andres Pelaez Lopez wrote:
On Mon, Jun 26, 2023 at 14:53, Mark Thomas (<ma...@apache.org>)
wrote:

On 26/06/2023 20:34, Christopher Schultz wrote:
Daniel,

On 6/26/23 12:47, Daniel Andres Pelaez Lopez wrote:
Hi Tomcat community,

I have a requirement where we want to manually decode a Chunked Transfer
Encoding (CTE) stream using CoyoteInputStream to have access to the chunk
size. This means I want to use the CoyoteInputStream.read method and get the
whole CTE bytes. Put another way: we want to decode the CTE by hand,
bypassing Tomcat's defaults.

Dumb question: why?

Not a dumb question at all. It is the key question. I'm curious as to
what the answer is.

Mark


Not a dumb question at all. Let me expand on the use case: we are working on
an HTTP origin (Tomcat) for video streaming using HLS and DASH. Our video
packager generates video segments of X size, and each segment is also
divided into fragments (CMAF). The segment size is fixed, but the fragment
size is variable. Our packager transfers the segment while generating it,
one fragment at a time (one chunk each), in a CTE, to the HTTP origin
(Tomcat). Now, video players want to download the segment, but per the
HLS spec, we are required to transfer the segment to the video player the
way we received it, one fragment at a time. To be able to send one fragment
at a time, we need to know its size, which is implicit inside the CTE (each
chunk declares the chunk size).

Our current implementation sends the segment using CTE to the video
players, but we cannot guarantee we are sending one fragment per chunk.

This is why having access to each chunk and its size will help us.

Thanks for the details. I think I've got it, but I want to clarify a
little bit.

Is your video-chunk-generator producing anything HTTP-related? It almost
sounds like Tomcat is a reverse-proxy and your video-generator is the
origin. Maybe you are just generating byte[] from the video-generator?

Or maybe your video-generator is UPLOADING the chunks to the HTTP
server? It's not entirely clear to me, and the details matter.


Thanks for staying in the conversation.

The packager (video-chunk-generator) sends an HTTP PUT with
Transfer-Encoding: chunked header, the content is a video segment, where
each chunk is a fragment, so, yes, the video-chunk-generator uploads the
segment in chunks to the Tomcat server (origin)

Sorry for the confusion regarding the word "origin", that is a video
streaming term that doesn't matter for the question.

Yes, that's important information to have: in HTTP, the "origin" is the web server which actually has the desired resource. Contrast that with a reverse proxy, etc.

It sounds like you are trying to optimize things such that video-chunk
size ends up being equal to the HTTP-chunk size. Is that the real goal?


The video-chunk-generator does that for us: it sends each video fragment as
an HTTP chunk. What we want to optimize is not the transfer from the
video-chunk-generator to the server, but from the server to its clients.
Clients will do an HTTP GET against the server to grab the segment, and it is
that GET we want to optimize so that we keep the fragment-per-chunk
strategy, using Transfer-Encoding: chunked. This is why we want to access
each chunk size when the video-chunk-generator does the PUT and save that
info in the server: we can then use it when clients do a GET, to ensure we
transfer the segment the same way we received it.

Is there no way to observe the video-chunk-size by looking at the raw bytes of the video file itself? Take the MP3 audio format, with which I'm more familiar. MP3 frame lengths can be computed based upon some information at the start of each frame including the version number, bit rate, sample rate, etc. So by reading a few bytes into the file, you know how big each chunk would need to be. Then you can push the bytes and go to the next chunk, etc.

If you can do that with your files, there is no reason to record the chunk-sizes that you got at the time of upload unless you just want the download to be as absolutely screaming-fast as possible and you don't want to perform any mathematical operations at all during the download (though you will presumably have to read a file from storage, which has a much higher cost than a little bit of math IMHO).
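To make the MP3 example concrete, here is a rough sketch of that computation. It only handles MPEG-1 Layer III, where the frame length in bytes is 144 * bitrate / sample-rate plus one optional padding byte. The class and method names are made up for illustration and are not from any library; your video format will need its own equivalent.

```java
final class Mp3FrameLength {
    // Bit rates in kbit/s for MPEG-1 Layer III, indexed by the 4-bit bitrate field.
    private static final int[] BITRATE_KBPS =
        {0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320};
    // Sample rates in Hz for MPEG-1, indexed by the 2-bit sample-rate field.
    private static final int[] SAMPLE_RATE_HZ = {44100, 48000, 32000};

    /** Returns the total frame length in bytes, given the 4-byte frame header. */
    static int frameLength(byte[] header) {
        if (header.length < 4) {
            throw new IllegalArgumentException("need at least 4 header bytes");
        }
        int b0 = header[0] & 0xFF;
        int b1 = header[1] & 0xFF;
        int b2 = header[2] & 0xFF;
        // The header starts with 11 set bits (frame sync).
        if (b0 != 0xFF || (b1 & 0xE0) != 0xE0) {
            throw new IllegalArgumentException("no frame sync");
        }
        int version = (b1 >> 3) & 0x03; // 3 means MPEG-1
        int layer = (b1 >> 1) & 0x03;   // 1 means Layer III
        if (version != 3 || layer != 1) {
            throw new IllegalArgumentException("only MPEG-1 Layer III is handled here");
        }
        int bitrateIndex = (b2 >> 4) & 0x0F;
        int sampleRateIndex = (b2 >> 2) & 0x03;
        int padding = (b2 >> 1) & 0x01;
        if (bitrateIndex == 0 || bitrateIndex == 15 || sampleRateIndex == 3) {
            throw new IllegalArgumentException("free-format or reserved header values");
        }
        return 144 * BITRATE_KBPS[bitrateIndex] * 1000 / SAMPLE_RATE_HZ[sampleRateIndex] + padding;
    }
}
```

For a 128 kbit/s, 44.1 kHz frame without padding (header bytes FF FB 90 00), this gives 417 bytes.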

Let's assume you CAN determine chunk-size from your source file. You can get Tomcat to chunk your file the same way just like this:

public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {

  response.setHeader("Transfer-Encoding", "chunked");
  response.setBufferSize(MAXIMUM_VIDEO_FRAME_SIZE); // This is important

  InputStream video = ...; // You figure this out
  OutputStream out = response.getOutputStream();

  boolean eof = false;

  byte[] buffer = new byte[1024]; // Or something appropriate

  while(!eof) {
    int c = video.read(buffer);

    if(-1 == c) {
      eof = true;
    } else {
      // getChunkSize is yours to write: it computes the size of the
      // video-chunk from the bytes at the start of the chunk
      int chunkSize = getChunkSize(buffer);

      chunkSize -= c; // We have already read c bytes from video

      out.write(buffer, 0, c);

      // Copy the rest of this video-chunk, one byte at a time
      for(int i=0; i<chunkSize && !eof; ++i) { // TODO: Optimize this copy operation
        int b = video.read();

        if(-1 == b) {
          eof = true; // The video ended in the middle of a chunk
        } else {
          out.write(b);
        }
      }

      out.flush(); // This triggers Tomcat to generate a chunked
                   // response
    }
  }
}

There are lots of ways the above code can fail, etc. and so it needs to be much more robust, but I just wanted you to get the general idea.

There are two very important things in the code:

1. The line which sets the output buffer size. If you use the default buffer size, Tomcat may (okay, WILL) "chunk" the response in the middle of your video-chunk if a video-chunk can get bigger than the current buffer size. So you need to make sure that doesn't happen.

Or, maybe it's okay if that happens, but you want to minimize the number of times that happens or you waste bytes, cycles, etc.

2. You must call ServletOutputStream.flush, which is how Tomcat knows to actually chunk the response.
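If you do end up recording the sizes at upload time instead, the same two rules apply on the way out. Below is a rough sketch that copies a stream in pieces of known sizes and flushes once per piece. FragmentReplayer and the list of sizes are hypothetical names for illustration; in a servlet, "out" would be the response's output stream with its buffer size already set to at least the largest fragment.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

final class FragmentReplayer {
    /** Writes the video in pieces of the given sizes, flushing once after each piece. */
    static void replay(InputStream video, OutputStream out, List<Integer> fragmentSizes)
            throws IOException {
        for (int size : fragmentSizes) {
            byte[] fragment = new byte[size];
            int off = 0;
            // read() may return fewer bytes than requested, so loop until the fragment is full
            while (off < size) {
                int n = video.read(fragment, off, size - off);
                if (n < 0) {
                    throw new EOFException("video ended inside a fragment");
                }
                off += n;
            }
            out.write(fragment, 0, size);
            out.flush(); // one flush per fragment
        }
    }
}
```

Reading a whole fragment into memory keeps the sketch short; it assumes fragments are small enough for that to be acceptable.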

In that case, you want to force the chunk size to something specific,
rather than just trying to see what the chunk size is.

How you do that depends on whether your video-generator is sending data
in the *request entity* in e.g. PUT or POST or if you are fetching the
data in a *response entity*.

I *think* you want to inspect chunk-size of an upload-to-Tomcat, but I
want to be sure. Might this be easier to do on the client to force a
certain chunk-size?

You are right, we want to inspect the chunk-size of an upload to Tomcat. We
have no control over the video-chunk-generator, so, the only way to know
the fragment/chunk size they are sending is by inspecting the CTE.

The only way to know the chunk size THEY are sending is to inspect and record it. But you don't really care what they send; instead you care what chunk-size to use for your Tomcat responses. They *should* be the same thing, but I wanted to re-frame (hah!) the problem to be more accurate, because I think you are trying to solve problem X (how to observe inbound chunk size) when you really want to solve problem Y (optimize outbound chunk size).

Finally... for video, perhaps a Websocket connection would be better
since there is less protocol-overhead once the ws connection is
established?

True, but the video-chunk-generator only offers two ways of transfer: HTTP
PUT or writing to disk. The second option was discarded as we will need to
listen to file system events and do some magic there, which we don't need
to do for the HTTP PUT, as the protocol/Tomcat guarantee when the transfer
starts and ends.

Sounds good to me. Plus, if you use HTTP then you can de-couple the services easily at any time.

The current flow from the point of view of CoyoteInputStream is:
CoyoteInputStream.read -> Request.read -> ChunkedInputFilter.read.

ChunkedInputFilter handles the CTE decoding and the read method only
returns the chunks, with no other information, like chunk size.

I found that the method Request.setInputBuffer might allow setting a
different InputBuffer implementation, for instance, the
IdentityInputFilter, which I understand returns all the stream bytes, with
no decoding. However, I am not sure if this is the right way or which
consequences it might have.
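For reference, if the raw, still-encoded bytes were available, decoding them by hand while keeping the chunk sizes could look roughly like the sketch below. This is only an illustration of the chunked wire format (hex size line, data, CRLF, terminated by a zero-size chunk); it is not Tomcat's ChunkedInputFilter, the class name ChunkSizeRecorder is made up, and trailer fields are skipped rather than parsed.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

final class ChunkSizeRecorder {
    /** Decodes a raw chunked body, returns the payload, and appends each chunk size to sizes. */
    static byte[] decode(InputStream raw, List<Integer> sizes) throws IOException {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        while (true) {
            // Size line: hex digits, optionally followed by ";extension"
            String line = readLine(raw);
            int semi = line.indexOf(';');
            String hex = (semi >= 0 ? line.substring(0, semi) : line).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                break; // last-chunk marker
            }
            byte[] data = new byte[size];
            int off = 0;
            while (off < size) {
                int n = raw.read(data, off, size - off);
                if (n < 0) {
                    throw new EOFException("stream ended inside a chunk");
                }
                off += n;
            }
            payload.write(data, 0, size);
            sizes.add(size);
            readLine(raw); // consume the CRLF that follows the chunk data
        }
        // Skip any trailer fields up to the final empty line
        while (!readLine(raw).isEmpty()) {
            // ignored in this sketch
        }
        return payload.toByteArray();
    }

    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                continue;
            }
            if (b == '\n') {
                return sb.toString();
            }
            sb.append((char) b);
        }
        throw new EOFException("unexpected end of stream");
    }
}
```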

I would like to know if there are other ways to override the CTE behavior,
any help would be appreciated.

A problem I can see is that you are working with a blocking streaming
interface e.g. read(byte[]) and you also want to get the chunk size.
When? The chunk-size can change for every chunk, so if you call
getChunkSize() before the read() and after the read(), they may be
different if the read() returns data from multiple chunks. It may have
changed multiple times between when read() was called and when it completed.

If you want to always size the byte[] to read full-chunks at once ... I
guess I would again ask "why?"

Would it be sufficient for ChunkedInputFilter to maybe send an
event-notification each time a chunk boundary was crossed? For example:

public interface ChunkListener {
     public void chunkStarted(ChunkedInputFilter source, long offset, long length);
     public void chunkFinished(ChunkedInputFilter source, long offset, long length);
}

Then, every time the Filter begins or ends a chunk it could notify your
code and you can do whatever you want with that information.
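As a rough idea of what "whatever you want" might be, a listener could simply collect the chunk lengths so they can be stored alongside the uploaded data. The sketch below is self-contained, so it declares its own copy of the interface with a plain Object as the source; that stand-in type and the RecordingChunkListener name are assumptions for illustration, and nothing like this exists in Tomcat today. One instance per request would be needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for the listener idea; "source" is a plain Object here so the
// sketch compiles without any Tomcat classes.
interface ChunkListener {
    void chunkStarted(Object source, long offset, long length);
    void chunkFinished(Object source, long offset, long length);
}

final class RecordingChunkListener implements ChunkListener {
    private final List<Long> lengths = new ArrayList<>();

    @Override
    public void chunkStarted(Object source, long offset, long length) {
        // Nothing to do at the start of a chunk in this sketch
    }

    @Override
    public void chunkFinished(Object source, long offset, long length) {
        lengths.add(length); // remember the size of every completed chunk, in order
    }

    /** Chunk lengths in the order the chunks were completed. */
    List<Long> lengths() {
        return Collections.unmodifiableList(lengths);
    }
}
```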

You might be able to subclass the (somewhat confusingly-named)
ChunkedInputFilter and bolt-on your own logic like what I have above.



Yes, a listener like that looks great. Any more clues on how to inject my
own ChunkedInputFilter implementation in the Tomcat configuration? It seems
quite hard to do well. Also, the listener must be linked per HTTP request.

I think doing so would require some internal support for messing-around
with the chain of objects that handle the requests. I don't think you
can do this "on your own". One option would be for us to add the ability
to register a "ChunkListener" with the ChunkedInputFilter but honestly
this is a pretty odd use-case and having that code running on every
server worldwide seems like a waste. The other option would be to allow
you to specify your own ChunkedInputFilter class at some point during
server initialization, which seems like a much better option.


I totally agree Tomcat shouldn't add anything specific for this
uncommon use case; I am happy with a workaround. Specifying my own
ChunkedInputFilter seems the way to go. I have access to the Request object
(which Spring Boot can inject), so would using Request.setInputBuffer be
enough? I am a little concerned about playing with Tomcat defaults, but I
don't have many options on my plate.

One more frame-challenge (a bit of an intentional joke, there) for you: why bother "optimizing" the HTTP chunk-size? Most networking components and software work with buffers of fixed sizes and end up naturally filling and emptying those buffers on a schedule that is pretty regular. By introducing an artificial "chunk size" which likely doesn't match any of those, you are definitely making things more complicated... but is it actually *improving* anything?

If you have a 1MB video (small, I know) and it's video-chunked into segments of weird sizes like 1243, 6873, 2341, 7654, and 8790 bytes, does it matter to the client/recipient if they get HTTP-chunks of those exact same sizes or if they get HTTP-chunks which are all, say, 4096 bytes in size (except the final chunk, which will be short)?

Most media-players download several frames in advance of actually starting playing and continue to buffer throughout the playback. Additionally, any decent player will not just do something naive like this:

HTTP GET /movies/guardians_of_the_galaxy.h264

And download the entire file. Instead, the player will most likely make a range-request like this:

HTTP GET /movies/guardians_of_the_galaxy.h264
Range: bytes=0-1023

Then the server sends the first 1k of data and the client decides what to do next. The client makes many of these requests as playback continues. This allows the user to pause, scrub-around the timeline, rewind, etc. without ever downloading the entire file each time.
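If you ever need to interpret such a header yourself, a rough sketch follows. It only understands a single "bytes=first-last" or "bytes=first-" range; suffix ranges like "bytes=-500" and multi-range lists are rejected. ByteRange is a made-up name, and a real server or framework will usually handle Range requests for you.

```java
final class ByteRange {
    final long start;
    final long end; // inclusive, as in the Range header

    ByteRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /** Parses one "bytes=first-last" or "bytes=first-" range, clamped to the resource length. */
    static ByteRange parse(String header, long resourceLength) {
        if (header == null || !header.startsWith("bytes=")) {
            throw new IllegalArgumentException("not a bytes range");
        }
        String spec = header.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash <= 0 || spec.indexOf(',') >= 0) {
            throw new IllegalArgumentException("only a single first-last range is handled");
        }
        long first = Long.parseLong(spec.substring(0, dash).trim());
        String lastText = spec.substring(dash + 1).trim();
        // An open-ended range runs to the last byte of the resource
        long last = lastText.isEmpty()
            ? resourceLength - 1
            : Math.min(Long.parseLong(lastText), resourceLength - 1);
        if (first > last) {
            throw new IllegalArgumentException("unsatisfiable range");
        }
        return new ByteRange(first, last);
    }

    /** Number of bytes covered by the range. */
    long length() {
        return end - start + 1;
    }
}
```

The range in the request above, bytes=0-1023, covers 1024 bytes.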

I'm making a lot of assumptions about your usage of this service, but I think you may be trying to solve a problem that doesn't need to be solved... at least not the way you think it needs to be solved.

-chris
