Re: Tomcat 10.1.x: Using CoyoteInputStream to read a Chunked Transfer Encoding (CTE) stream, manually, skiping ChunkedInputFilter

Daniel Andres Pelaez Lopez Tue, 27 Jun 2023 09:57:04 -0700

Christopher,

On Tue, 27 Jun 2023 at 9:33, Christopher Schultz (<ch...@christopherschultz.net>) wrote:


> Daniel,
>
> On 6/26/23 16:15, Daniel Andres Pelaez Lopez wrote:
> > On Mon, 26 Jun 2023 at 14:53, Mark Thomas (<ma...@apache.org>) wrote:
> >
> >> On 26/06/2023 20:34, Christopher Schultz wrote:
> >>> Daniel,
> >>>
> >>> On 6/26/23 12:47, Daniel Andres Pelaez Lopez wrote:
> >>>> Hi Tomcat community,
> >>>>
> >>>> I have a requirement where we want to manually decode a Chunked
> >>>> Transfer Encoding (CTE) stream using CoyoteInputStream, so that we
> >>>> have access to the chunk size. This means I want to call
> >>>> CoyoteInputStream.read and get the raw CTE bytes, framing included.
> >>>> In other words: we want to decode the CTE by hand, bypassing Tomcat's
> >>>> default handling.
> >>>
> >>> Dumb question: why?
> >>
> >> Not a dumb question at all. It is the key question. I'm curious as to
> >> what the answer is.
> >>
> >> Mark
> >>
> >>
> > Not a dumb question at all. Let me expand on the use case: we are working
> > on an HTTP origin (Tomcat) for video streaming using HLS and DASH. Our
> > video packager generates video segments of X size, and each segment is
> > also divided into fragments (CMAF). The segment size is fixed, but the
> > fragment size is variable. Our packager transfers the segment while it is
> > generating it, one fragment at a time (one chunk each), using CTE, to the
> > HTTP origin (Tomcat). Video players then want to download the segment,
> > and per the HLS spec we are required to transfer the segment to the
> > video player as we received it, one fragment at a time. To be able to
> > send one fragment at a time, we need to know its size, which is carried
> > in the CTE itself (each chunk declares its size).
> >
> > Our current implementation sends the segment to the video players using
> > CTE, but we cannot guarantee that each chunk carries exactly one fragment.
> >
> > This is why having access to each chunk and its size would help us.
>
> Thanks for the details. I think I've got it, but I want to clarify a
> little bit.
>
> Is your video-chunk-generator producing anything HTTP-related? It almost
> sounds like Tomcat is a reverse-proxy and your video-generator is the
> origin. Maybe you are just generating byte[] from the video-generator?
>
> Or maybe your video-generator is UPLOADING the chunks to the HTTP
> server? It's not entirely clear to me, and the details matter.
>

Thanks for staying in the conversation.

The packager (video-chunk-generator) sends an HTTP PUT with a
Transfer-Encoding: chunked header. The content is a video segment, and
each chunk is a fragment. So yes, the video-chunk-generator uploads the
segment in chunks to the Tomcat server (the origin).

Sorry for the confusion around the word "origin"; it is a video
streaming term that doesn't matter for this question.


> It sounds like you are trying to optimize things such that video-chunk
> size ends up being equal to the HTTP-chunk size. Is that the real goal?
>

The video-chunk-generator already does that for us: it sends each video
fragment as one HTTP chunk. What we want to optimize is not the transfer
from the video-chunk-generator to the server, but the transfer from the
server to its clients. Clients will do an HTTP GET against the server to
fetch the segment, and on that GET we want to keep the fragment-per-chunk
strategy, using Transfer-Encoding: chunked. That is why we want to read
each chunk size when the video-chunk-generator does the PUT and save that
information on the server: we can then use it when clients do a GET, to
ensure we transfer the segment the same way we received it.
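
To make the GET side concrete, here is a minimal sketch of replaying a
stored segment with one chunk per recorded fragment size. It writes the
HTTP/1.1 chunk framing by hand onto a plain OutputStream purely to show
the intended wire format. The class and method names (FragmentReplay,
writeChunked) are made up for illustration. In a servlet the container
does the framing itself, and whether a flush per fragment yields exactly
one chunk per fragment is something that would need to be verified.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class FragmentReplay {

    private static final byte[] CRLF = {'\r', '\n'};

    /**
     * Writes the segment as one HTTP/1.1 chunk per recorded fragment size,
     * followed by the terminating zero-length chunk (no trailers).
     */
    static void writeChunked(byte[] segment, List<Integer> fragmentSizes, OutputStream out)
            throws IOException {
        // Validate up front so nothing is written for inconsistent input.
        long total = 0;
        for (int size : fragmentSizes) {
            if (size <= 0) {
                throw new IllegalArgumentException("fragment size must be positive");
            }
            total += size;
        }
        if (total != segment.length) {
            throw new IllegalArgumentException("sizes must add up to the segment length");
        }

        int offset = 0;
        for (int size : fragmentSizes) {
            // chunk = hex size, CRLF, data, CRLF
            out.write(Integer.toHexString(size).getBytes(StandardCharsets.US_ASCII));
            out.write(CRLF);
            out.write(segment, offset, size);
            out.write(CRLF);
            out.flush(); // push each fragment out as soon as it is complete
            offset += size;
        }
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] segment = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeChunked(segment, List.of(4, 5), out);
        String wire = new String(out.toByteArray(), StandardCharsets.US_ASCII);
        // Make the CRLFs visible: prints 4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n
        System.out.println(wire.replace("\r\n", "\\r\\n"));
    }
}
```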


>
> In that case, you want to force the chunk size to something specific,
> rather than just trying to see what the chunk size is.
>
> How you do that depends on whether your video-generator is sending data
> in the *request entity* in e.g. PUT or POST or if you are fetching the
> data in a *response entity*.
>
> I *think* you want to inspect chunk-size of an upload-to-Tomcat, but I
> want to be sure. Might this be easier to do on the client to force a
> certain chunk-size?
>

You are right, we want to inspect the chunk size of an upload to Tomcat. We
have no control over the video-chunk-generator, so the only way to know
the fragment/chunk sizes it is sending is to inspect the CTE.
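
As an illustration of what "inspecting the CTE" means at the byte level,
the sketch below walks a raw (still encoded) chunked body and collects
the declared size of each chunk. It uses only the JDK and assumes the
raw bytes are somehow available as an InputStream, which is exactly the
part that Tomcat does not expose by default. ChunkSizeReader and its
methods are hypothetical names, and chunk extensions and trailers are
skipped rather than interpreted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkSizeReader {

    /** Reads one CRLF-terminated line and returns it without the CRLF. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                if (in.read() != '\n') {
                    throw new IOException("CR not followed by LF");
                }
                return sb.toString();
            }
            sb.append((char) b);
        }
        throw new IOException("Unexpected end of stream");
    }

    /** Walks a raw chunked body and returns the declared size of each data chunk. */
    static List<Integer> chunkSizes(InputStream raw) throws IOException {
        List<Integer> sizes = new ArrayList<>();
        while (true) {
            String line = readLine(raw);
            int semi = line.indexOf(';'); // ignore chunk extensions, if any
            String hex = (semi >= 0 ? line.substring(0, semi) : line).trim();
            int size = Integer.parseInt(hex, 16);
            if (size < 0) {
                throw new IOException("Negative chunk size");
            }
            if (size == 0) {
                break; // last-chunk
            }
            sizes.add(size);
            // Consume the chunk data and the CRLF that follows it.
            if (raw.readNBytes(size).length != size) {
                throw new IOException("Truncated chunk");
            }
            if (!readLine(raw).isEmpty()) {
                throw new IOException("Missing CRLF after chunk data");
            }
        }
        // Skip optional trailer fields up to the final empty line.
        while (!readLine(raw).isEmpty()) {
            // trailer field ignored
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(chunkSizes(new ByteArrayInputStream(body))); // prints [4, 5]
    }
}
```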


>
> Finally... for video, perhaps a Websocket connection would be better
> since there is less protocol-overhead once the ws connection is
> established?
>

True, but the video-chunk-generator only offers two ways of transferring
data: HTTP PUT or writing to disk. We discarded the second option because we
would need to listen to file system events and do some magic there, which we
don't need to do for the HTTP PUT, since the protocol and Tomcat tell us when
the transfer starts and ends.


>
> >>>> The current flow from the point of view of CoyoteInputStream is:
> >>>> CoyoteInputStream.read -> Request.read -> ChunkedInputFilter.read.
> >>>>
> >>>> ChunkedInputFilter handles the CTE decoding and the read method only
> >>>> returns the chunks, with no other information, like chunk size.
> >>>>
> >>>> I found that the method Request.setInputBuffer might allow setting a
> >>>> different InputBuffer implementation, for instance the
> >>>> IdentityInputFilter, which I understand returns all the stream bytes
> >>>> with no decoding. However, I am not sure whether this is the right way
> >>>> or what consequences it might have.
> >>>>
> >>>> I would like to know if there are other ways to override the CTE
> >>>> behavior. Any help would be appreciated.
> >>>
> >>> A problem I can see is that you are working with a blocking streaming
> >>> interface, e.g. read(byte[]), and you also want to get the chunk size.
> >>> When? The chunk size can change for every chunk, so if you call
> >>> getChunkSize() before the read() and after the read(), they may be
> >>> different if the read() returns data from multiple chunks. It may have
> >>> changed multiple times between when read() was called and when it
> >>> completed.
> >>>
> >>> If you want to always size the byte[] to read full chunks at once ... I
> >>> guess I would again ask "why?"
> >>>
> >>> Would it be sufficient for ChunkedInputFilter to maybe send an
> >>> event-notification each time a chunk boundary was crossed? For example:
> >>>
> >>> public interface ChunkListener {
> >>>     public void chunkStarted(ChunkedInputFilter source, long offset, long length);
> >>>     public void chunkFinished(ChunkedInputFilter source, long offset, long length);
> >>> }
> >>>
> >>> Then, every time the Filter begins or ends a chunk it could notify your
> >>> code and you can do whatever you want with that information.
> >>>
> >>> You might be able to subclass the (somewhat confusingly-named)
> >>> ChunkedInputFilter and bolt-on your own logic like what I have above.
> >>>
> >
> >
> > Yes, a listener like that looks great. Any more clues on how to inject my
> > own ChunkedInputFilter implementation through Tomcat configuration? It
> > seems quite hard to do well. Also, the listener must be tied to a single
> > HTTP request.
>
> I think doing so would require some internal support for messing-around
> with the chain of objects that handle the requests. I don't think you
> can do this "on your own". One option would be for us to add the ability
> to register a "ChunkListener" with the ChunkedInputFilter but honestly
> this is a pretty odd use-case and having that code running on every
> server worldwide seems like a waste. The other option would be to allow
> you to specify your own ChunkedInputFilter class at some point during
> server initialization, which seems like a much better option.
>

I totally agree that Tomcat shouldn't add anything specific for this
uncommon use case; I am happy with a workaround. Specifying my own
ChunkedInputFilter seems the way to go. I have access to the Request object
(which Spring Boot can inject), so would using Request.setInputBuffer be
enough? I am a little concerned about playing with Tomcat defaults, but I
don't have many options on my plate.
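
For reference, the listener idea from earlier in the thread could be
consumed per request roughly as sketched below. This is not a Tomcat
API: ChunkListener here is a simplified, hypothetical version of the
proposed interface (without the ChunkedInputFilter parameter, so that the
sketch compiles with the JDK alone), and the events are simulated in
main rather than produced by a real filter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ChunkListenerSketch {

    /** Hypothetical callback modeled on the proposal in this thread. */
    interface ChunkListener {
        void chunkStarted(long offset, long length);
        void chunkFinished(long offset, long length);
    }

    /** One instance per HTTP request: remembers the size of every finished chunk. */
    static final class RecordingChunkListener implements ChunkListener {
        private final List<Long> sizes = new ArrayList<>();

        @Override
        public void chunkStarted(long offset, long length) {
            // Nothing to record until the chunk has been fully read.
        }

        @Override
        public void chunkFinished(long offset, long length) {
            sizes.add(length);
        }

        List<Long> sizes() {
            return Collections.unmodifiableList(sizes);
        }
    }

    public static void main(String[] args) {
        RecordingChunkListener listener = new RecordingChunkListener();
        // Simulate what a filter would report for two chunks of 4 and 5 bytes.
        long offset = 0;
        for (long length : new long[] {4, 5}) {
            listener.chunkStarted(offset, length);
            listener.chunkFinished(offset, length);
            offset += length;
        }
        System.out.println(listener.sizes()); // prints [4, 5]
    }
}
```

Keeping one recorder per request avoids sharing state between concurrent
uploads; the recorded list is what would be stored next to the segment.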


> -chris
>


-- 
Daniel Andrés Pelaez López
Master’s Degree in IT Architectures, Universidad de los Andes.
Software Construction Specialist, Universidad de los Andes.
Bachelor's Degree in Computer Sciences, Universidad del Quindio.
e. estigm...@gmail.com
