Re: [DISCUSS] Structuring Java based DSLs

2018-12-12 Thread Xinyu Liu
Agree with Kenn on this. From our SamzaRunner point of view, we would like Beam SQL to be self-contained and flexible enough for our users to use it in different scenarios, e.g. pure SQL and embeded in different SDKs. We are also extremely interested in the DataFrame-like API mentioned above. To di

[DISCUSSION] ParDo Async Java API

2019-01-22 Thread Xinyu Liu
Hi, guys, As more users try out Beam running on the SamzaRunner, we got a lot of asks for an asynchronous processing API. There are a few reasons for these asks: - The users here are experienced in asynchronous programming. With async frameworks such as Netty and ParSeq and libs like async

Re: [DISCUSSION] ParDo Async Java API

2019-01-22 Thread Xinyu Liu
en more robust against poor future >> use could be process(@Element InputT element, @Output >> OutputReceiver>). In this way, the process method >> itself will be async chained, rather than counting on the user to do the >> right thing. >> >> We should see how

Re: [DISCUSSION] ParDo Async Java API

2019-01-22 Thread Xinyu Liu
s/flink/flink-docs-release-1.7/dev/stream/operators/asyncio.html On Tue, Jan 22, 2019 at 5:23 PM Reuven Lax wrote: > > > On Tue, Jan 22, 2019 at 5:08 PM Xinyu Liu wrote: > >> @Steve: it's good to see that this is going to be useful in your use >> cases as wel

Re: [DISCUSSION] ParDo Async Java API

2019-01-22 Thread Xinyu Liu
42 PM Kenneth Knowles wrote: > > > On Tue, Jan 22, 2019, 17:23 Reuven Lax >> >> >> On Tue, Jan 22, 2019 at 5:08 PM Xinyu Liu wrote: >> >>> @Steve: it's good to see that this is going to be useful in your use >>> cases as well. Thanks for sharing

Re: [DISCUSSION] ParDo Async Java API

2019-01-30 Thread Xinyu Liu
simply register tasks. > Providing a future/promise API is even more disciplined. > >>>>> >> > >>>>> >> Kenn > >>>>> >> > >>>>> >> On Wed, Jan 23, 2019 at 10:35 AM Scott Wegner > wrote: > &g

Re: New contributor

2019-01-30 Thread Xinyu Liu
Welcome and glad to see you here, Tao! Xinyu On Wed, Jan 30, 2019 at 12:00 PM Kenneth Knowles wrote: > Done. Welcome! > > Kenn > > On Wed, Jan 30, 2019 at 11:53 AM Tao Feng wrote: > >> Hi, >> >> I would like to contribute to beam and work on some tickets in my spare >> time. Could you please a

Re: Integration of python/portable runner tests for Samza runner

2019-04-24 Thread Xinyu Liu
Thanks for the useful pointers! We are looking forward to integrating both Portable and Python-specific tests for Samza runner. A few questions: - For portable running tests: by looking at the portableValidatesRunnerTask in flink_job_server.gradle, it seems it's the same set of Java tests but usin

Re: Integration of python/portable runner tests for Samza runner

2019-04-25 Thread Xinyu Liu
ber > of issues with Docker on Jenkins and wanted to lower build time for > PreCommit tests. Loopback means that an embedded Python environment will > be started which listens on localhost. It's comparable to Java's > EmbeddedSdkHarness. > > -Max > > On 24.04.19 20:10, X

Re: Pointers on Contributing to Structured Streaming Spark Runner

2019-09-13 Thread Xinyu Liu
Hi, Etienne, The slides are very informative! Thanks for sharing the details about how the Beam API are mapped into Spark Structural Streaming. We (LinkedIn) are also interested in trying the new SparkRunner to run Beam pipeine in batch, and contribute to it too. From my understanding, seems the f

Re: Pointers on Contributing to Structured Streaming Spark Runner

2019-09-18 Thread Xinyu Liu
! My comments are inline: > > Le vendredi 13 septembre 2019 à 12:16 -0700, Xinyu Liu a écrit : > > Hi, Etienne, > > The slides are very informative! Thanks for sharing the details about how > the Beam API are mapped into Spark Structural Streaming. > > > Thanks ! >

Re: [spark structured streaming runner] merge to master?

2019-10-10 Thread Xinyu Liu
+1 for merging to master. It's going to help a lot for us to try it out, and also contribute back for the missing features. Thanks, Xinyu On Thu, Oct 10, 2019 at 6:40 AM Alexey Romanenko wrote: > +1 for merging this new runner too (even if it’s not 100% ready for the > moment) in case if it doe

Re: Strict timer ordering in Samza and Portable Flink Runners

2019-10-23 Thread Xinyu Liu
Hi, Jan, Thanks for reporting this. I assigned BEAM-8459 to myself and will take a look soon. Thanks, Xinyu On Wed, Oct 23, 2019 at 2:54 AM Jan Lukavský wrote: > Hi, > > as part of [1] a new set of validatesRunner tests has been introduced. > T

Re: ***UNCHECKED*** Re: Samza Runner

2018-01-31 Thread xinyu liu
Thanks Kenneth to merge the Samza BEAM runner to the feature branch! We will work on the other items (docs, example, capability matrix ..) to get it to the master. Thanks, Xinyu On Fri, Jan 26, 2018 at 9:28 AM, Kenneth Knowles wrote: > Regarding merging directly to master, I agree that the code

Support non-keyed stateful ParDo

2018-04-25 Thread Xinyu Liu
Hi, I am working on adding the stateful ParDo to the upcoming BEAM Samza runner, and realized that the state for each ParDo processElement() is not only associated with the window of the element, but also the key of the element. Chatted with Kenneth over email about this design decision, which has

Re: Support non-keyed stateful ParDo

2018-04-25 Thread Xinyu Liu
tates, and look them up later and do some computation. The elements will be in the same window, but doesn't need to be of the same key. Thanks, Xinyu On Wed, Apr 25, 2018 at 6:02 PM, Robert Bradshaw wrote: > On Wed, Apr 25, 2018 at 5:45 PM Xinyu Liu wrote: > > > Hi, > >

Re: Support non-keyed stateful ParDo

2018-04-25 Thread Xinyu Liu
, 2018 at 6:31 PM, Kenneth Knowles wrote: > #2 could be accomplished with a convenience composite, yes? > > On Wed, Apr 25, 2018, 18:28 Xinyu Liu wrote: > >> @Robert: for your questions: >> >> 1) Side input won't work for us since it returns the whole collection

Support close of the iterator/iterable created from MapState/SetState

2018-05-10 Thread Xinyu Liu
Hi, folks, I'm in the middle of implementing the MapState and SetState in our Samza runner. We noticed that the state returns the Java Iterable for reading entries, keys, etc. For state backed by file-based kv store like rocksDb, we need to be able to let users explicitly close iterator/iterable t

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-10 Thread Xinyu Liu
ter a > certain amount of inactivity or uses weak references. > > On Thu, May 10, 2018 at 3:07 PM Xinyu Liu wrote: > >> Hi, folks, >> >> I'm in the middle of implementing the MapState and SetState in our Samza >> runner. We noticed that the state returns the Ja

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-10 Thread Xinyu Liu
er then having 100s or 1000s of users suffering > through a more complicated API. > > On Thu, May 10, 2018 at 3:44 PM Xinyu Liu wrote: > >> Load/evict blocks will help reduce the cache memory footprint, but we >> still won't be able to release the underlying resources. We

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-11 Thread Xinyu Liu
> release, and potentially masked errors that are hard to debug. It is less > error-prone than WeakReference, which is asking for trouble when objects > are collected en masse. Anecdotally I have heard that performance of this > kind of approach is poor, but I haven't experienc

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-12 Thread Xinyu Liu
g in Java will force a bunch of uninitialized declarations >> outside the try-with-resources block, kind of a lot of boilerplate LOC >> >> One thing that is good about your proposal is that the iterable could >> have some transparent caches that all free up together. >>

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-13 Thread Xinyu Liu
re a snapshot? Doesn't a native RocksDb iterator > require a snapshot to have well-defined contents? As you can tell, I don't > know enough about RocksDb details to be sure of my suggestions. > > Kenn > > [1] https://issues.apache.org/jira/browse/BEAM-2980 > [2] https://

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-14 Thread Xinyu Liu
Element CompletionStage element, ...) { >>> element.thenApply(...) >>> } >>> } >>> >>> The framework will automatically create the CompletionStage, and the >>> process method can specify a pipeline of asynchronous operations to perfor

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-14 Thread Xinyu Liu
unner's >> implementation path will be phased out. So should we expand this discussion >> to how the portability APIs enable the SDK and runner to collaborate to >> achieve this use case? It seems like the interaction you need is that the >> runner can tell that the SD

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-15 Thread Xinyu Liu
gt;>>>> threaded and scoped to a single DoFn. There's no one else who can write >>>>>> the >>>>>> state. If a BagState is read and written and read again, the user-facing >>>>>> logic should be unaware of the resources and not ha

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-15 Thread Xinyu Liu
I am happy to chat about it over hangout or slack too. Let's talk offline to set it up if needed. Thanks, Xinyu On Tue, May 15, 2018 at 10:51 AM, Xinyu Liu wrote: > For Samza runner, it's always processes key+window pairs serially. To > answer Luke's question: > >

[PROPOSAL] Merge samza-runner to master

2018-06-18 Thread Xinyu Liu
Hi, Folks, On behalf of the Samza team, I would like to propose to merge the samza-runner branch into master. The branch was created on Jan when we first introduced the Samza Runner [1], and we've been adding features and refining it afterwards. Now the runner satisfies the criteria outlined in [2

Re: [PROPOSAL] Merge samza-runner to master

2018-06-18 Thread Xinyu Liu
s wrote: >>> >>>> One thing that will be necessary is porting the build to Gradle. >>>> >>>> Kenn >>>> >>>> On Mon, Jun 18, 2018 at 11:57 AM Xinyu Liu >>>> wrote: >>>> >>>>> Hi, Folks, >

Re: [PROPOSAL] Merge samza-runner to master

2018-06-21 Thread Xinyu Liu
I updated the merge PR with the gradle integration (there was some Jenkins Java tests failure with google cloud quota issues. It seems not related to this patch). Please feel free to ping me if anything else is needed. Thanks, Xinyu On Mon, Jun 18, 2018 at 5:44 PM, Xinyu Liu wrote: > @Kenn

Re: [PROPOSAL] Merge samza-runner to master

2018-06-22 Thread Xinyu Liu
portability API, but given that it's still somewhat of a moving >>> target, and you have ongoing work in this direction, that may not be a >>> hard requirement. >>> >>> I'm a bit concerned that there is are only two contributors (but the >>> gi

Re: [PROPOSAL] Merge samza-runner to master

2018-06-25 Thread Xinyu Liu
ho triggers the green button so this happens? >>> >>> >>> >>> >>> On Sat, Jun 23, 2018 at 6:43 AM Jean-Baptiste Onofré >>> wrote: >>> >>>> +1 >>>> >>>> As the build is fine, it makes sense to merge pre

Re: Going on leave for a bit

2018-06-26 Thread Xinyu Liu
Congrats! Enjoy the time without sleep. Thanks, Xinyu On Tue, Jun 26, 2018 at 10:12 AM, Griselda Cuevas wrote: > Enjoy the time off Kenn! > > > On Tue, 26 Jun 2018 at 12:14, Kai Jiang wrote: > >> Congrats! Enjoy your family time. >> >> Best, >> Kai >> >> On Tue, Jun 26, 2018, 09:11 Alan Myrvol

Samza runner committer support

2018-06-28 Thread Xinyu Liu
Hi, All, Our Samza runner has recently been merged to master, and Kenn has been extremely instrumental during the whole process, e.g. design decisions, feature requests and code reviews. We would like thank him for all the support he has been given to us! Given Kenn is going to be on leave soon,

Re: Samza runner committer support

2018-06-30 Thread Xinyu Liu
elp but I may not be >>> the best person. >>> >>> On Thu, Jun 28, 2018 at 12:33 PM, Xinyu Liu >>> wrote: >>> >>>> Hi, All, >>>> >>>> Our Samza runner has recently been merged to master, and Kenn has been >>>

Re: Process JobBundleFactory for portable runner

2018-08-22 Thread Xinyu Liu
We are also interested in this Process JobBundleFactory as we are planning to fork a process to run python sdk in Samza runner, instead of using docker container. So this change will be helpful to us too. On the same note, we are trying out portable_runner.py to submit a python job. Seems it will c

Re: Status of IntelliJ with Gradle

2018-08-22 Thread Xinyu Liu
We experienced the same issues too in intellij after switching to latest version. I did the trick Luke mentioned before to include the beam-model-fn-execution and beam-model-job-management jars in the dependent modules to get around compilation. But I cannot get the vendored protobuf working. Seems

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Xinyu Liu
Big +1 (non-googler). >From Samza Runner's perspective, we are very happy to see dataflow worker code so we can learn and compete :). Thanks, Xinyu On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi wrote: > +1 (non-googler) > > This is a great 👍 move > > Sent from my iPhone > > On Sep 13, 2018, a

Update state after firing

2018-10-09 Thread Xinyu Liu
Hi, guys, Current triggering allows us to either discard the state or accumulate the state after a window pane is fired. We use the extractOutput() in CombinFn to return the output value after the firing. All these have been working well for us. We do have a use case which seems not handled here:

Re: Update state after firing

2018-10-09 Thread Xinyu Liu
: > >> Have you considered using Beam's state API for this? >> >> On Tue, Oct 9, 2018 at 11:03 AM Xinyu Liu wrote: >> >>> Hi, guys, >>> >>> Current triggering allows us to either discard the state or accumulate >>> the state after a win

Re: Update state after firing

2018-10-09 Thread Xinyu Liu
e old data. Thanks, Xinyu On Tue, Oct 9, 2018 at 11:54 AM Reuven Lax wrote: > 2) is simply a bug that nobody has ever gotten around to fixing. Stateful > ParDo should support merging windows such as sessions. > > On Tue, Oct 9, 2018 at 11:40 AM Xinyu Liu wrote: > >> We do

Beam Samza Runner status update

2018-10-10 Thread Xinyu Liu
Hi, All, It's been over four months since we added the Samza Runner to Beam, and we've been making a lot of progress after that. Here I would like to update your guys and share some really good news happening here at LinkedIn: 1) First Beam job in production @LInkedIn! After a few rounds of testi

Re: Beam Samza Runner status update

2018-10-12 Thread Xinyu Liu
the future. > > > > -Rui > > > > On Wed, Oct 10, 2018 at 11:10 AM Jean-Baptiste Onofré > > mailto:j...@nanthrax.net>> wrote: > > > > Thanks for sharing and congra

Re: Performance of BeamFnData between Python and Java

2018-11-08 Thread Xinyu Liu
By looking at the gRPC dashboard published by the benchmark[1], it seems the streaming ping-pong operations per second for gRPC in python is around 2k ~ 3k qps. This seems quite low compared to gRPC performance in other languages, e.g. 600k qps for Java and Go. Is it expected to run multiple sdk_wo

Re: Performance of BeamFnData between Python and Java

2018-11-08 Thread Xinyu Liu
ce Python is limited to a single CPU core. > > [1] > https://performance-dot-grpc-testing.appspot.com/explore?dashboard=5652536396611584&widget=490377658&container=1286539696 > > > > On Wed, Nov 7, 2018 at 5:24 PM Xinyu Liu wrote: > >> By looking at the gRPC

Re: Performance of BeamFnData between Python and Java

2018-11-09 Thread Xinyu Liu
larger > bundles (we started with single element bundles). Default in the Flink > runner now is to cap bundles at 1000 elements or 1 second, whatever comes > first. With that, I have seen decent throughput for the pipeline above (~ > 5000k elements per second with single SDK worker). >

Re: Bay Area Apache Beam Kickoff!

2018-11-21 Thread Xinyu Liu
This is awesome! Glad to learn the latest Beam SQL and meet you guys there. Thanks, Xinyu On Tue, Nov 20, 2018 at 9:07 PM Jean-Baptiste Onofré wrote: > Nice !! > > Unfortunately I won't be able to be there. But good luck and I'm sure it > will be a great meetup ! > > Regards > JB > > On 20/11/2

Re: [Discuss] Propose Calcite Vendor Release (1.22.0)

2020-03-05 Thread Xinyu Liu
Thanks, Rui! We've been waiting for the new version of Calcite which has the fix to unflatten the fields. Seems this version will come with it. Thanks, Xinyu On Thu, Mar 5, 2020 at 12:41 AM Ismaël Mejía wrote: > The calcite vote already passed so this is good to go, thanks for > volunteering Ru

Running Beam python pipeline on Spark

2020-06-03 Thread Xinyu Liu
Hi, folks, I am trying to do some experiment to run a simple "hello world" python pipeline on a remote Spark cluster on Hadoop. So far I ran the SparkJobServerDriver on the Yarn application master and managed to submit a python pipeline to it. SparkPipelineRunner was able to run the portable pipel

Re: Running Beam python pipeline on Spark

2020-06-03 Thread Xinyu Liu
> > https://github.com/lyft/flinkk8soperator/blob/bb8834d69e8621d636ef2085fdc167a9d2c2bfa3/examples/beam-python/src/beam_example/pipeline.py#L16-L17 > > Thomas > > > On Wed, Jun 3, 2020 at 5:48 PM Xinyu Liu wrote: > >> Hi, folks, >> >> I am trying to do s

Re: Running Beam pipeline using Spark on YARN

2020-06-23 Thread Xinyu Liu
I am doing some prototyping on this too. I used spark-submit script instead of the rest api. In my simple setup, I ran SparkJobServerDriver.main() directly in the AM as a spark job, which will submit the python job to the default spark master url pointing to "local". I also use --files in the spark

DoFn @Setup with PipelineOptions

2021-03-01 Thread Xinyu Liu
Hi, all, Currently the @Setup method signature in DoFn does not support any arguments. This is a bit cumbersome to use for use cases such as creating a db connection, rest client or fetch some resources, where we would like to read the configs from the PipelineOptions during setup. Shall we suppor

Re: DoFn @Setup with PipelineOptions

2021-03-01 Thread Xinyu Liu
oFn constructor or as a > variable in the containing scope? Do you only know the option after the > pipeline is completely constructed so you need to make the switch at > runtime? Makes sense. I think passing options to @Setup is useful and > harmless. > > Kenn > > On Mon, Mar 1

Re: DoFn @Setup with PipelineOptions

2021-03-02 Thread Xinyu Liu
I created a ticket to track this: https://issues.apache.org/jira/browse/BEAM-11914. Thanks everyone for the comments! Thanks, Xinyu On Mon, Mar 1, 2021 at 4:45 PM Xinyu Liu wrote: > The reason for not passing it in directly is that we have a large amount > of configs here at LinkedIn so

Python Dataframe API issue

2021-03-25 Thread Xinyu Liu
Hi, folks, I am playing around with the Python Dataframe API, and seemly got an schema issue when converting pcollection to dataframe. I wrote the following code for a simple test: import apache_beam as beam from apache_beam.dataframe.convert import to_dataframe from apache_beam.dataframe.convert

Re: Python Dataframe API issue

2021-03-25 Thread Xinyu Liu
gt;>> >>>> This could be https://issues.apache.org/jira/browse/BEAM-11929 >>>> >>>> On Thu, Mar 25, 2021 at 4:26 PM Robert Bradshaw >>>> wrote: >>>> >>>>> This is definitely wrong. Looking into what's going on he

Re: Question about transformOverride

2021-04-21 Thread Xinyu Liu
@Chamikara: Yuhong and I are working on Samza Runner, and we are looking for a way to swap the transform for ease of use in testing. @Reuven: Your approach will work for this case, but we do have quite a few read transforms here and we have to plug this code in each of time with some testing logic

Re: Question about transformOverride

2021-04-21 Thread Xinyu Liu
> > Runners providing features to make it easier to test the way you describe, > though does sound very useful, but it does require the runner be aware of > each transform to be overridden, possibly increasing the runners dependency > surface. > > On Wed, Apr 21, 2021, 9:31 AM

Re: [PROPOSAL] Projection pushdown in Beam Java

2021-08-06 Thread Xinyu Liu
Very happy to see we will have pushdown optimizations for java pipelines! Thanks for sharing the proposal. Thanks, XInyu On Fri, Aug 6, 2021 at 9:26 AM Alexey Romanenko wrote: > Thanks Kyle, very promising. I left some comments. > > — > Alexey > > On 5 Aug 2021, at 19:59, Luke Cwik wrote: > >

Re: [ANNOUNCE] New committer: Ke Wu

2022-05-31 Thread Xinyu Liu
Congrats! Xinyu On Mon, May 30, 2022 at 7:46 AM Evan Galpin wrote: > Congrats Ke! > > - Evan > > > On Mon, May 30, 2022 at 4:11 AM Jan Lukavský wrote: > >> Congrats Ke! >> >> Jan >> On 5/29/22 04:12, Yi Pan wrote: >> >> Congrats, Ke! >> >> -Yi >> >> On Sat, May 28, 2022 at 6:57 PM Robert Burk