Re: Question about state processor data outputs

Robert Metzger Thu, 15 Apr 2021 23:54:00 -0700

Hi,
I assumed you are using the DataStream API, because you mentioned the
streaming sink. But you also mentioned the state processor API (which I
ignored a bit).


I wonder why you are using the state processor API. Can't you use the
streaming job that created the state also for writing it to files using the
StreamingFileSink?

If you want to stick to the DataSet API, then I guess you have to implement
a custom (File)OutputFormat.


On Fri, Apr 16, 2021 at 5:37 AM Chen-Che Huang <acmic...@gmail.com> wrote:

> Hi Robert,
>
> Thanks for your code. It's really helpful!
>
> However, with the readKeyedState api of state processor, we get dataset
> for our data instead of datastream and it seems the dataset doesn't support
> streamfilesink (not addSink method like datastream). If not, I need to
> transform the dataset to a datastream. I'm not sure it's doable based on
> https://www.alibabacloud.com/blog/deep-insights-into-flink-sql-flink-advanced-tutorials_596628.
> If it's doable, then I'll be able to solve our problem with applying
> streamfilesink to the transformed dataset.
>
> Best wishes,
> Chen-Che Huang
>
> On 2021/04/15 19:23:43, Robert Metzger <rmetz...@apache.org> wrote:
> > Hey Chen-Che Huang,
> >
> > I guess the StreamingFileSink is what you are looking for. It is
> documented
> > here:
> >
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html
> > I drafted a short example (that is not production ready), which does
> > roughly what you are asking for:
> > https://gist.github.com/rmetzger/7d5dbdaa118c63f5875c8c9520cc311d
> >
> > Hope this helps!
> >
> > Best,
> > Robert
> >
> >
> > On Thu, Apr 15, 2021 at 11:33 AM Chen-Che Huang <acmic...@gmail.com>
> wrote:
> >
> > > Hi all,
> > >
> > > We're going to use state processor to make our keyedstate data to be
> > > written to different files based on the keys. More specifically, we
> want
> > > our data to be written to files key1.txt, key2.txt, ..., and keyn.txt
> where
> > > the value with the same key is stored in the same file. In each file,
> the
> > > data may be stored as follows. As far as I know, I need to implement
> my own
> > > Sink (org.apache.flink.streaming.api.functions.sink.RichSinkFunction)
> to
> > > meet the requirement. However, I wonder is there a native way to
> achieve
> > > this without implementing my own Sink because using official solution
> is
> > > usually more efficient and reliable than doing it by myself.  Many
> thanks
> > > for any comment.
> > >
> > > key1.txt
> > > key1 value11
> > > key1 value21
> > > key1 value31
> > >
> > > key2.txt
> > > key2 value21
> > > key2 value22
> > > key2 value23
> > >
> > > Best wishes,
> > > Chen-Che Huang
> > >
> >
>

Re: Question about state processor data outputs

Reply via email to