Re: Regarding the over window query support from Beam SQL

2021-03-05 Thread Rui Wang

Re: Regarding the over window query support from Beam SQL

2021-03-02 Thread Rui Wang
On Tue, Mar 2, 2021 at 9:24 AM Tao Li wrote: > + Rui Wang. Looks like Rui has been working on this jira.

Re: Apache Beam SQL and UDF

2021-02-10 Thread Rui Wang
The problem that I can think of is that the UDF life cycle may have reached its end before the async call is completed. -Rui On Wed, Feb 10, 2021 at 12:34 PM Talat Uyarer wrote: > Hi, > > We plan to use a UDF in our SQL. We want to achieve some kind of > filtering based on internal states. We

Re: Beam SQL UDF with variable arguments list?

2021-01-26 Thread Rui Wang
Yes, I think Calcite does not support varargs for scalar functions (and so not in UDFs). Please check this JIRA: https://issues.apache.org/jira/browse/CALCITE-2772 -Rui On Tue, Jan 26, 2021 at 2:04 AM Niels Basjes wrote: > Hi, > > I want to define a Beam SQL user defined function that accepts a
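Since Calcite binds a Beam SQL UDF to a single static eval method, a common workaround is a fixed-arity UDF (or several UDFs registered under different names). A minimal sketch, assuming the Beam SQL Java extension; the function name concat2 and the query field names are illustrative:

    import org.apache.beam.sdk.extensions.sql.BeamSqlUdf;
    import org.apache.beam.sdk.extensions.sql.SqlTransform;

    // Fixed-arity scalar UDF: Calcite resolves the static eval method.
    public class ConcatTwoFn implements BeamSqlUdf {
      public static String eval(String a, String b) {
        return a + b;
      }
    }

    // Illustrative usage (field names are placeholders):
    // rows.apply(
    //     SqlTransform.query("SELECT concat2(name, id) AS label FROM PCOLLECTION")
    //         .registerUdf("concat2", ConcatTwoFn.class));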

Re: [ANNOUNCE] Beam 2.25.0 Released

2020-10-26 Thread Rui Wang
Thank you Robin! -Rui On Mon, Oct 26, 2020 at 11:44 AM Pablo Estrada wrote: > Thanks Robin! > > On Mon, Oct 26, 2020 at 11:06 AM Robin Qiu wrote: > >> The Apache Beam team is pleased to announce the release of version 2.25.0. >> >> Apache Beam is an open source unified programming model to

Re: How to integrate Beam SQL windowing query with KafkaIO?

2020-08-27 Thread Rui Wang
an correctly materialize the window output as > expected. Thank you so much for your help! > > Thanks & Regards, > Minreng > > > On Mon, Aug 24, 2020 at 1:58 PM Rui Wang wrote: > >> Hi, >> >> I checked the query in your SO question and I think th

Re: How to integrate Beam SQL windowing query with KafkaIO?

2020-08-24 Thread Rui Wang
Hi, I checked the query in your SO question and I think the SQL usage is correct. My current guess is that the problem is how the watermark is generated and advanced in KafkaIO. It could be either that the watermark didn't pass the end of your SQL window for aggregation, or that the data was lagging behind the
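For reference, a minimal sketch (broker, topic, and types are illustrative) of configuring KafkaIO so that element timestamps and the watermark come from Kafka's log-append time, which lets the watermark advance past the end of the SQL aggregation window; whether this resolves the issue depends on how the data actually arrives:

    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.io.kafka.KafkaRecord;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.kafka.common.serialization.StringDeserializer;

    PCollection<KafkaRecord<String, String>> records =
        pipeline.apply(
            KafkaIO.<String, String>read()
                .withBootstrapServers("broker:9092")
                .withTopic("events")
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                // Use log-append time for timestamps and the watermark;
                // withCreateTime(...) or a custom withTimestampPolicyFactory(...)
                // are alternatives depending on where the event time lives.
                .withLogAppendTime());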

Re: Out-of-orderness of window results when testing stateful operators with TextIO

2020-08-23 Thread Rui Wang
The current Beam model does not guarantee an ordering after a GBK (i.e. the Combine.perKey() in your pipeline). So you cannot expect that the C step sees elements in a specific order. As I recall, on the Dataflow runner there is very limited ordering support. Hi +Reuven Lax, can you share your insights about it? -Rui

Re: Error when using @DefaultCoder(AvroCoder.class)

2020-07-08 Thread Rui Wang
I tried some code search in the Beam repo but I didn't find the exact line of code that throws your exception. However, I believe that Java classes used in primitives (ParDo, CombineFn) and coders very likely need to be made serializable (i.e. implement Serializable). -Rui On Wed,
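A minimal sketch of that fix, with a hypothetical MyConfig class standing in for whatever object the ParDo captures:

    import java.io.Serializable;
    import org.apache.beam.sdk.transforms.DoFn;

    // Anything referenced from a DoFn/CombineFn is serialized with it, so it
    // must implement Serializable (or be transient and rebuilt in @Setup).
    class MyConfig implements Serializable {
      String prefix = "item-";
    }

    class TagFn extends DoFn<String, String> {
      private final MyConfig config;  // shipped to workers along with the DoFn

      TagFn(MyConfig config) {
        this.config = config;
      }

      @ProcessElement
      public void processElement(@Element String in, OutputReceiver<String> out) {
        out.output(config.prefix + in);
      }
    }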

Re: [ANNOUNCE] Beam 2.20.0 Released

2020-04-16 Thread Rui Wang
Note that due to an infrastructure bug, the website change failed to publish. But the 2.20.0 artifacts are available to use right now. -Rui On Thu, Apr 16, 2020 at 11:45 AM Rui Wang wrote: > The Apache Beam team is pleased to announce the release of version 2.20.0. > > Apache Beam i

[ANNOUNCE] Beam 2.20.0 Released

2020-04-16 Thread Rui Wang
. -- Rui Wang, on behalf of The Apache Beam team

Re: Unbounded input join Unbounded input then write to Bounded Sink

2020-02-24 Thread Rui Wang
-to-stream > join will do a CoGroupByKey so it will wait. But SQL may in the future > adopt a better join for this case that can output records with lower > latency. > > It may be a bigger question whether HCatalogIO.write() has all the knobs > you would like. > > Kenn > &g

Re: Unbounded input join Unbounded input then write to Bounded Sink

2020-02-24 Thread Rui Wang
Sorry, please remove " .apply(Window.into(FixedWindows.of(1 minute))" from the query above. -Rui On Mon, Feb 24, 2020 at 5:26 PM Rui Wang wrote: > I see. So I guess I didn't fully understand the requirement: > > Do you want to have a 1-min window join on two unbounde

Re: Beam SQL Nested Rows are flatten by Calcite

2020-02-24 Thread Rui Wang
for it. > > Talat > > On Sat, Feb 15, 2020 at 6:26 PM Rui Wang wrote: > >> Because Calcite flattened Row so BeamSQL didn't need to deal with nested >> Row structure (as they were flattened in LogicalPlan). >> >> Depends on how that patch works. Nested row

Re: Unbounded input join Unbounded input then write to Bounded Sink

2020-02-24 Thread Rui Wang
SQL does not support such joins with your requirement: writing to the sink every 1 min after the window closes. You might be able to use the state and timer API to achieve your goal. -Rui On Mon, Feb 24, 2020 at 9:50 AM shanta chakpram wrote: > Hi, > > I am trying to join inputs from Unbounded Sources
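A rough sketch of what the state-and-timer alternative could look like, assuming the joined records arrive as keyed, windowed KVs; the names and the exact flush time are illustrative, and the downstream write to the sink (e.g. HCatalogIO) is omitted:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.beam.sdk.state.BagState;
    import org.apache.beam.sdk.state.StateSpec;
    import org.apache.beam.sdk.state.StateSpecs;
    import org.apache.beam.sdk.state.TimeDomain;
    import org.apache.beam.sdk.state.Timer;
    import org.apache.beam.sdk.state.TimerSpec;
    import org.apache.beam.sdk.state.TimerSpecs;
    import org.apache.beam.sdk.state.ValueState;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.Row;

    class BufferUntilWindowEndFn extends DoFn<KV<String, Row>, KV<String, Iterable<Row>>> {
      @StateId("buffer")
      private final StateSpec<BagState<Row>> bufferSpec = StateSpecs.bag();

      @StateId("key")
      private final StateSpec<ValueState<String>> keySpec = StateSpecs.value();

      @TimerId("flush")
      private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

      @ProcessElement
      public void process(
          @Element KV<String, Row> element,
          @StateId("buffer") BagState<Row> buffer,
          @StateId("key") ValueState<String> key,
          @TimerId("flush") Timer flush,
          BoundedWindow window) {
        buffer.add(element.getValue());
        key.write(element.getKey());
        // Fire when the watermark reaches the end of the (1-minute) window.
        flush.set(window.maxTimestamp());
      }

      @OnTimer("flush")
      public void onFlush(
          OnTimerContext ctx,
          @StateId("buffer") BagState<Row> buffer,
          @StateId("key") ValueState<String> key) {
        List<Row> rows = new ArrayList<>();
        buffer.read().forEach(rows::add);
        ctx.output(KV.<String, Iterable<Row>>of(key.read(), rows));
        buffer.clear();
      }
    }

Lateness and retraction handling are left out of this sketch; the flushed bundles can then be written to the sink by a downstream transform.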

Re: Beam SQL Nested Rows are flatten by Calcite

2020-02-15 Thread Rui Wang
you mean they were flattened before by calcite or Does beam flatten > them too ? > > > > On Fri, Feb 14, 2020 at 1:21 PM Rui Wang wrote: > >> Nested row types might be less well supported (r.g. Row) because they >> were flattened before anyway. >> >> &

Re: Beam SQL Nested Rows are flatten by Calcite

2020-02-14 Thread Rui Wang
che/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java#L167 > > On Fri, Feb 14, 2020 at 10:33 AM Rui Wang wrote: > >> Calcite has improved to reconstruct ROW back in the output. See [1]. Beam >> need to update Calcite dependency to > 1.21 to adopt that. >> >> >&g

Re: Beam SQL Nested Rows are flatten by Calcite

2020-02-14 Thread Rui Wang
Calcite has improved to reconstruct ROW back in the output. See [1]. Beam needs to update its Calcite dependency to > 1.21 to adopt that. [1]: https://jira.apache.org/jira/browse/CALCITE-3138 -Rui On Thu, Feb 13, 2020 at 9:05 PM Talat Uyarer wrote: > Hi, > > I am trying Beam SQL. But

Re: [ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread Rui Wang
Thank you Udi for taking care of Beam 2.18.0 release! -Rui On Tue, Jan 28, 2020 at 10:59 AM Udi Meiri wrote: > The Apache Beam team is pleased to announce the release of version 2.18.0. > > Apache Beam is an open source unified programming model to define and > execute data processing

Re: apache beam 2.16.0 ?

2019-09-18 Thread Rui Wang
Hi, You can search (or ask) on dev@ for the progress of 2.16.0. The answer is that the 2.16.0 release is ongoing and it will be published once the blockers are solved. -Rui On Wed, Sep 18, 2019 at 9:34 PM Yu Watanabe wrote: > Hello. > > I would like to use 2.16.0 to diagnose a container problem,

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread Rui Wang
Another approach is to let BeamSQL support it natively, as the title of this thread says: "as a Table in BeamSQL". We might be able to define a table with properties saying that the table returns a PCollectionView. By doing so we would have a trigger-based PCollectionView available in SQL rel nodes,
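Until something like that exists natively, a sketch of the non-SQL version of the same idea: a periodically refreshed, trigger-based PCollectionView used as a side input. The 5-minute refresh interval and the ExternalLookup helper are hypothetical placeholders for the slowly changing source:

    import java.util.Map;
    import org.apache.beam.sdk.io.GenerateSequence;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.View;
    import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
    import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
    import org.apache.beam.sdk.transforms.windowing.Repeatedly;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollectionView;
    import org.joda.time.Duration;

    PCollectionView<Map<String, String>> lookup =
        pipeline
            // A "tick" source that fires periodically to refresh the view.
            .apply(GenerateSequence.from(0).withRate(1, Duration.standardMinutes(5)))
            .apply(
                Window.<Long>into(new GlobalWindows())
                    .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()))
                    .discardingFiredPanes()
                    .withAllowedLateness(Duration.ZERO))
            .apply(
                ParDo.of(
                    new DoFn<Long, KV<String, String>>() {
                      @ProcessElement
                      public void process(OutputReceiver<KV<String, String>> out) {
                        // ExternalLookup is a hypothetical client for the slowly
                        // changing source (DB table, file, service, ...).
                        for (Map.Entry<String, String> e : ExternalLookup.readAll().entrySet()) {
                          out.output(KV.of(e.getKey(), e.getValue()));
                        }
                      }
                    }))
            .apply(View.asMap());

The resulting view can then be attached to the main pipeline via ParDo.of(...).withSideInputs(lookup).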

Re: pubsub -> IO

2019-07-15 Thread Rui Wang
+user@beam.apache.org -Rui On Mon, Jul 15, 2019 at 6:55 AM Chaim Turkel wrote: > Hi, > I am looking to write a pipeline that read from a mongo collection. > I would like to listen to a pubsub that will have a object that will > tell me which collection and which time frame. > Is there

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Rui Wang
t; > - Shannon > > On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan > wrote: > >> Will do. Thanks. A new coder for deterministic Maps would be great in the >> future. Thank you! >> >> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang wrote: >> >>> I think

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Rui Wang
:55 PM Shannon Duncan wrote: > So ArrayList doesn't work either, so just a standard List? > > On Thu, Jul 11, 2019 at 4:53 PM Rui Wang wrote: > >> Shannon, I agree with Mike on List is a good workaround if your element >> within list is deterministic and you are eager to mak

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Rui Wang
together I > might be able to figure that out. I can duplicate MapCoder and try to make > changes, but how will beam know to pick up that coder for a groupByKey? > > Thanks! > Shannon > > On Thu, Jul 11, 2019 at 4:42 PM Rui Wang wrote: > >> It could be just straightforward

Re: [Java] Using a complex datastructure as Key for KV

2019-07-11 Thread Rui Wang
It could be straightforward to create a SortedMapCoder for TreeMap. Just add checks on the map instances and then change verifyDeterministic. If this is a common need we could submit it into the Beam repo. [1]:
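A minimal sketch of that idea (this coder is not part of Beam itself): delegate the byte format to MapCoder but restrict the type to SortedMap and relax verifyDeterministic accordingly:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.SortedMap;
    import java.util.TreeMap;
    import org.apache.beam.sdk.coders.Coder;
    import org.apache.beam.sdk.coders.CustomCoder;
    import org.apache.beam.sdk.coders.MapCoder;

    class SortedMapCoder<K, V> extends CustomCoder<SortedMap<K, V>> {
      private final Coder<K> keyCoder;
      private final Coder<V> valueCoder;
      private final MapCoder<K, V> delegate;

      SortedMapCoder(Coder<K> keyCoder, Coder<V> valueCoder) {
        this.keyCoder = keyCoder;
        this.valueCoder = valueCoder;
        this.delegate = MapCoder.of(keyCoder, valueCoder);
      }

      @Override
      public void encode(SortedMap<K, V> value, OutputStream out) throws IOException {
        delegate.encode(value, out);  // entries are iterated in sorted-key order
      }

      @Override
      public SortedMap<K, V> decode(InputStream in) throws IOException {
        // Assumes naturally ordered keys; pass a Comparator if that doesn't hold.
        return new TreeMap<>(delegate.decode(in));
      }

      @Override
      public void verifyDeterministic() throws Coder.NonDeterministicException {
        // Unlike an arbitrary Map, a sorted map has a stable iteration order,
        // so the encoding is deterministic iff the key and value coders are.
        keyCoder.verifyDeterministic();
        valueCoder.verifyDeterministic();
      }
    }

The coder would still have to be set explicitly (setCoder(...)) on the keyed PCollection before the GroupByKey picks it up.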

Re:

2019-06-24 Thread Rui Wang
imitive types. In my case the value is of type Row. > So can you let me know whether this is supported or this is a bug ? > > > *Thanks & Regards,* > > *Vishwas * > > > On Thu, Jun 20, 2019 at 11:25 PM Rui Wang wrote: > >> Oops I made a mistake, I didn't work

Re: AvroUtils converting generic record to Beam Row causes class cast exception

2019-06-19 Thread Rui Wang
PR/8376 is merged and it should be in 2.14.0 release. -Rui On Mon, Apr 22, 2019 at 10:49 AM Rui Wang wrote: > I see. I created this PR [1] to ask feedback from the reviewer who knows > better on Avro in Beam. > > -Rui > > > [1]: https://github.com/apache/beam/pull/8376

Re: SQL massively more resource-intensive? Memory leak?

2019-06-04 Thread Rui Wang
unning it, but by the end the memory usage was high and > the CPU about 100% so it seems to be the same problem. > > Worth noting perhaps that when I use the DirectRunner I have to turn > enforceImmutability off because of > https://issues.apache.org/jira/browse/BEAM-1714 > > On 20

Re: SQL massively more resource-intensive? Memory leak?

2019-06-03 Thread Rui Wang
Ah, sorry, I was only reading the screenshots and ignored your other comments. So the Count fn indeed worked. Can I ask whether your SQL pipeline works on the direct runner? -Rui On Mon, Jun 3, 2019 at 10:39 AM Rui Wang wrote: > BeamSQL actually only converts SELECT COUNT(*) query to the Java pipel

Re: SQL massively more resource-intensive? Memory leak?

2019-06-03 Thread Rui Wang
BeamSQL actually only converts a SELECT COUNT(*) query into a Java pipeline that calls Java's built-in Count[1] transform. Could you implement your pipeline with the Count transform to see whether this memory issue still exists? By doing so we could narrow down the problem a bit. If using Java directly
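For comparison, a minimal sketch of the Java-only equivalent; "rows" is a placeholder for whatever PCollection feeds the SQL query:

    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.values.PCollection;

    // "rows" stands for the same input that the SELECT COUNT(*) query reads.
    PCollection<Long> total = rows.apply("CountAll", Count.globally());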

Re: Question about unbounded in-memory PCollection

2019-05-07 Thread Rui Wang
Does TestStream.java [1] satisfy your need? -Rui [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestStream.java On Tue,
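For reference, a minimal sketch of building an unbounded-style in-memory input with TestStream, including explicit watermark advancement (element values and timestamps are illustrative):

    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.testing.TestStream;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.TimestampedValue;
    import org.joda.time.Instant;

    TestStream<String> events =
        TestStream.create(StringUtf8Coder.of())
            .addElements(
                TimestampedValue.of("a", new Instant(0)),
                TimestampedValue.of("b", new Instant(1000)))
            // Advance the watermark so downstream windows/triggers can fire.
            .advanceWatermarkTo(new Instant(2000))
            .addElements(TimestampedValue.of("c", new Instant(2500)))
            .advanceWatermarkToInfinity();

    PCollection<String> input = pipeline.apply(events);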

Re: AvroUtils converting generic record to Beam Row causes class cast exception

2019-04-22 Thread Rui Wang
to > Long* > } > > private static Object convertDateTimeStrict (Long value, Schema.FieldType > fieldType) { > checkTypeName(fieldType.getTypeName(), TypeName.DATETIME, " > dateTime"); > return new Instant(value); <-- *Creates a JodaTime > In

Re: AvroUtils converting generic record to Beam Row causes class cast exception

2019-04-15 Thread Rui Wang
onnect.version": 1, > "connect.name": > "org.apache.kafka.connect.data.Timestamp" > } > } > > *Attribute type in generated class:* > private org.joda.time.DateTime timeOfRelease; > > > So not sure why this type cas

Re: AvroUtils converting generic record to Beam Row causes class cast exception

2019-04-15 Thread Rui Wang
Reading the code, it seems that, as the logical type "timestamp-millis" implies, it expects millis as Long values under this logical type. So if you convert the joda-time to millis before calling "AvroUtils.toBeamRowStrict(genericRecord, this.beamSchema)", your exception will be gone. -Rui
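A minimal sketch of that conversion; timeOfRelease is the field quoted earlier in the thread and stands for whichever timestamp-millis field currently holds a joda DateTime, with genericRecord and beamSchema assumed to be in scope:

    import org.apache.avro.generic.GenericRecord;
    import org.apache.beam.sdk.schemas.utils.AvroUtils;
    import org.apache.beam.sdk.values.Row;
    import org.joda.time.DateTime;

    Object raw = genericRecord.get("timeOfRelease");
    if (raw instanceof DateTime) {
      // timestamp-millis expects epoch millis as a Long, not a joda DateTime.
      genericRecord.put("timeOfRelease", ((DateTime) raw).getMillis());
    }
    Row row = AvroUtils.toBeamRowStrict(genericRecord, beamSchema);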

Re: Pipeline manager/scheduler frameworks

2019-02-08 Thread Rui Wang
Apache Airflow is a scheduling system that can help manage data pipelines. I have seen Airflow used to manage a few thousand hive/spark/presto pipelines. -Rui On Fri, Feb 8, 2019 at 4:08 PM Sridevi Nookala < snook...@parallelwireless.com> wrote: > Hi, > > > Our analytics app has many data

Re: How to call Oracle stored proc in JdbcIO

2019-02-05 Thread Rui Wang
ing like this: > > {call procedure_name(?, ?, ?)} > > But then question is what do you expect from it? > > BTW JdbcIO is just a very simple ParDo which you can create your own when > dealing with anything special from oracle. > > Best regards > > JC > > > >
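A minimal sketch of the workaround quoted above, using the JDBC CALL escape as the statement; MyRecord and the procedure/parameter names are hypothetical, and whether this works end-to-end depends on the Oracle driver:

    import java.sql.PreparedStatement;
    import org.apache.beam.sdk.io.jdbc.JdbcIO;

    rows.apply(
        JdbcIO.<MyRecord>write()
            .withDataSourceConfiguration(
                JdbcIO.DataSourceConfiguration.create(
                    "oracle.jdbc.OracleDriver", "jdbc:oracle:thin:@//host:1521/service"))
            // JDBC escape syntax for invoking a stored procedure.
            .withStatement("{call my_procedure(?, ?, ?)}")
            .withPreparedStatementSetter(
                (MyRecord rec, PreparedStatement stmt) -> {
                  stmt.setString(1, rec.id);
                  stmt.setString(2, rec.name);
                  stmt.setLong(3, rec.amount);
                }));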

Re: How to call Oracle stored proc in JdbcIO

2019-02-05 Thread Rui Wang
Assuming this is a missing feature, I created https://jira.apache.org/jira/browse/BEAM-6525 to track it. -Rui On Fri, Jan 25, 2019 at 10:35 AM Rui Wang wrote: > Hi Community, > > There is a stackoverflow question [1] asking how to call an Oracle stored > proc in Beam via JdbcIO. I kn

Re: First steps: Beam SQL over ProtoBuf

2019-02-01 Thread Rui Wang
It's awesome! Proto to/from Row will be a great utility! -Rui On Fri, Feb 1, 2019 at 9:59 AM Alex Van Boxel wrote: > Hi all, > > I got my first test pipeline running with *Beam SQL over ProtoBuf*. I was > so excited I need to shout this to the world ;-) > > >

How to call Oracle stored proc in JdbcIO

2019-01-25 Thread Rui Wang
Hi Community, There is a stackoverflow question [1] asking how to call an Oracle stored proc in Beam via JdbcIO. I know very little about JdbcIO and Oracle, so I'm just asking here: does anyone know whether JdbcIO supports calling a stored proc? If there is no such support, I will create a JIRA for it and

Re: Suggestion or Alternative simples to read file from FTP

2019-01-03 Thread Rui Wang
For calling an external service, it's described in [1] as a pattern that comes with a small code sample. However, why not write a script to prepare the data first and then write a pipeline to process it? 1.

Re: BigQueryIO failure handling for writes

2018-11-16 Thread Rui Wang
On the first issue you are facing: in BeamSQL, we tried to solve a similar requirement. BeamSQL supports reading JSON-format messages from Pubsub, writing to BigQuery, and writing messages that fail to parse to another Pubsub topic. BeamSQL uses a pre-processing transform to parse JSON
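A minimal sketch of that pre-processing shape outside of SQL, routing unparseable messages to a dead-letter output; MyEvent, the tag names, and the use of Gson as the JSON parser are all illustrative choices:

    import com.google.gson.Gson;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionTuple;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.beam.sdk.values.TupleTagList;

    final TupleTag<MyEvent> parsedTag = new TupleTag<MyEvent>() {};
    final TupleTag<String> deadLetterTag = new TupleTag<String>() {};

    PCollectionTuple parsed =
        messages.apply(  // messages: PCollection<String> read from Pubsub
            "ParseJson",
            ParDo.of(
                    new DoFn<String, MyEvent>() {
                      @ProcessElement
                      public void process(@Element String json, MultiOutputReceiver out) {
                        try {
                          out.get(parsedTag).output(new Gson().fromJson(json, MyEvent.class));
                        } catch (RuntimeException e) {
                          out.get(deadLetterTag).output(json);  // route to the failure output
                        }
                      }
                    })
                .withOutputTags(parsedTag, TupleTagList.of(deadLetterTag)));

    PCollection<MyEvent> good = parsed.get(parsedTag);      // continues to BigQuery
    PCollection<String> failed = parsed.get(deadLetterTag); // written back to a Pubsub topic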

Re: coder issue?

2018-08-17 Thread Rui Wang
you got the >> exception. >> >> By initializing the Accum class you should be able to fix this problem. >> (e.g. String line = "";, instead of only String line;) >> >> Hope this helps, >> Robin >> >> On Thu, Aug 16, 2018 at 1

Re: coder issue?

2018-08-16 Thread Rui Wang
Sorry I forgot to attach the PR link: https://github.com/apache/beam/pull/6154/files#diff-7358f3f0511940ea565e6584f652ed02R342 -Rui On Thu, Aug 16, 2018 at 12:13 PM Rui Wang wrote: > Hi Mahesh, > > I think I had the same NPE when I explored self defined combineFn. I think > yo

Re: coder issue?

2018-08-16 Thread Rui Wang
Hi Mahesh, I think I had the same NPE when I explored a self-defined combineFn. I think your combineFn might still need to define a coder to help Beam run it in a distributed environment. Beam tries to invoke the coder somewhere and then throws an NPE because none is defined. Here is a PR I wrote that
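For illustration, a minimal sketch of a string-concatenating CombineFn (the Accum shape mirrors the "String line" accumulator discussed in this thread) that both initializes its accumulator field and supplies an explicit accumulator coder:

    import java.io.Serializable;
    import org.apache.beam.sdk.coders.Coder;
    import org.apache.beam.sdk.coders.CoderRegistry;
    import org.apache.beam.sdk.coders.SerializableCoder;
    import org.apache.beam.sdk.transforms.Combine;

    class ConcatFn extends Combine.CombineFn<String, ConcatFn.Accum, String> {
      static class Accum implements Serializable {
        String line = "";  // initialized, so merging never sees null
      }

      @Override
      public Accum createAccumulator() { return new Accum(); }

      @Override
      public Accum addInput(Accum acc, String input) {
        acc.line += input;
        return acc;
      }

      @Override
      public Accum mergeAccumulators(Iterable<Accum> accs) {
        Accum merged = createAccumulator();
        for (Accum acc : accs) {
          merged.line += acc.line;
        }
        return merged;
      }

      @Override
      public String extractOutput(Accum acc) { return acc.line; }

      @Override
      public Coder<Accum> getAccumulatorCoder(CoderRegistry registry, Coder<String> inputCoder) {
        // Explicit accumulator coder so the runner can serialize partial results.
        return SerializableCoder.of(Accum.class);
      }
    }

Depending on where the NPE is thrown, either fix alone (initializing the field, as Robin suggested, or providing the accumulator coder) may be enough.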