Re: Schema Aware PCollection stabilization

2020-02-05 Thread Alex Van Boxel
Oh, I thought it was. I've set it so that anyone with the link can edit... as long as there is no spam detected. Added you explicitly. Alex Van Boxel On Thu, Feb 6, 2020 at 2:01 AM Brian Hulette wrote: > Can you open up the document for commenting or editing? > > On Wed, Feb 5, 2020 at 4:48 AM Alex

Re: SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-05 Thread Tomo Suzuki
(My understanding) The test ensures that the CSV data stored in GCS is readable through Datacatalog. It fails because an Integer value in the CSV was read as Long as per Datacatalog. > setting up from scratch is a good idea. I agree. Furthermore, it would be nice if it could test different type-

Re: Deterministic field ordering in derived schemas

2020-02-05 Thread Reuven Lax
Let's understand the use case first. My concern was with making SchemaCoder compatible between different invocations of a pipeline, and that's why I introduced encoding_position. This allows the field id to change while we preserve the same encoding_position. However, this is internal to a pipel

Re: Deterministic field ordering in derived schemas

2020-02-05 Thread Kenneth Knowles
Are we in danger of reinventing protobuf's practice of giving fields numbers? (This practice itself was almost certainly in use decades before protobuf's creation.) Could we just use the same practice? Schema fields already have integer IDs and "encoding_position" (see https://github.com/apache/beam/blob

Re: SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-05 Thread Kenneth Knowles
I think that was me... sorry! Is this a test where it is important that the data is pre-existing? Otherwise I would say that setting up from scratch is a good idea. Does anyone have context on it? I am happy to take on the small bit of coding, since I broke it. Kenn On Wed, Feb 5, 2020 at 1:22 P

Re: Schema Aware PCollection stabilization

2020-02-05 Thread Brian Hulette
Can you open up the document for commenting or editing? On Wed, Feb 5, 2020 at 4:48 AM Alex Van Boxel wrote: > I have the feeling people want to bring Schema Aware PCollection out of > experimental. I've started a document to follow up this track, please could > all interested parties add the op

[PROPOSAL] Add licenses and notices to SDK docker images

2020-02-05 Thread Hannah Jiang
Hello, I wrote a design document about adding licenses and notices for third-party dependencies to SDK docker images. I reviewed several tools for this purpose; please recommend other tools if you have any in mind, and I am happy to review those as well. Link: https://s.apache.org/eauq6 Any kind of c

Re: [DISCUSS] Autoformat python code with Black

2020-02-05 Thread Robert Bradshaw
No, perhaps not. I agree there's consensus, just wondering what the next steps should be to get this in. (The presubmits look like they're all passing, with the exception of some breakage in java that should be completely unrelated. Of course there's already merge conflicts...) On Wed, Feb 5, 2020

Re: Jenkins jobs not running for my PR 10438

2020-02-05 Thread Rui Wang
Tried to trigger tests in the PR. Let's continue following up there. -Rui On Wed, Feb 5, 2020 at 4:07 PM Tomo Suzuki wrote: > Hi Beam Committers, > > Would you run the precommit checks for > https://github.com/apache/beam/pull/10769 > with the following 6 additional commands (one command per

Re: Jenkins jobs not running for my PR 10438

2020-02-05 Thread Tomo Suzuki
Hi Beam Committers, Would you run the precommit checks for https://github.com/apache/beam/pull/10769 with the following 6 additional commands (one command per comment) ? Run Java PostCommit Run Java HadoopFormatIO Performance Test Run BigQueryIO Streaming Performance Test Java Run Dataflow Valida

Re: [DISCUSS] Autoformat python code with Black

2020-02-05 Thread Ahmet Altay
Do we need a formal vote? There is consensus on this thread and on the PR. On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw wrote: > The PR is looking good. Should we call a vote? > > On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw > wrote: > > > > Thanks. I commented on the PR. I think if we're

Re: [DISCUSS] Autoformat python code with Black

2020-02-05 Thread Robert Bradshaw
The PR is looking good. Should we call a vote? On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw wrote: > > Thanks. I commented on the PR. I think if we're going this route we > should add a pre-commit, plus instructions on how to run the tool > (similar to spotless). > > On Mon, Jan 27, 2020 at 1

Re: Enabling a new Jenkins job

2020-02-05 Thread Heejong Lee
Fixed. Seed job was overridden by another scheduled seed job. Thanks, Udi! On Wed, Feb 5, 2020 at 2:04 PM Heejong Lee wrote: > I created a new Jenkins job in my PR[1] and the new job shows "This > project is currently disabled"[2]. Does anybody know how to enable the > new job? > > [1]: https:/

Enabling a new Jenkins job

2020-02-05 Thread Heejong Lee
I created a new Jenkins job in my PR[1] and the new job shows "This project is currently disabled"[2]. Does anybody know how to enable the new job? [1]: https://github.com/apache/beam/pull/10758 [2]: https://builds.apache.org/job/beam_PostCommit_XVR_Spark/

Re: SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-05 Thread Brian Hulette
So it looks like the schema for `integ_test_small_csv_test_1` was updated yesterday around the same time that PR#10563 went in, and it no longer matches the schema we expect in the test. I'm just going to change it back for now. I am curious who changed it and why, if the perpetrator is on this li

Re: SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-05 Thread Tomo Suzuki
Brian, Thank you! (I don't need access as long as it's resolved) On Wed, Feb 5, 2020 at 4:05 PM Brian Hulette wrote: > > I can take a look at this. > > I think the access issue (and your problem running locally) is because you > need access to apache-beam-testing. I'm not sure if we have any f

Re: SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-05 Thread Brian Hulette
I can take a look at this. I think the access issue (and your problem running locally) is because you need access to apache-beam-testing. I'm not sure if we have any formal process for that. Brian On Wed, Feb 5, 2020 at 5:31 AM Tomo Suzuki wrote: > HI Beam developers, > > Can somebody help thi

Re: A new reworked Elasticsearch 7+ IO module

2020-02-05 Thread Chamikara Jayalath
On Wed, Feb 5, 2020 at 6:35 AM Etienne Chauchot wrote: > Still, one thing I don't agree with is that IOs can be tested on > mocks. We don't really test IO behavior with mocks: there are always special > behaviors that cannot be reproduced in mocks (split, load, corner > cases, etc.).

Re: Deterministic field ordering in derived schemas

2020-02-05 Thread Luke Cwik
The Java compiler doesn't know whether a field was added or removed when compiling source to class, so there is no way for it to provide an ordering that puts "new" fields at the end, and the source specification doesn't allow users to state the field ordering that should be used. You can a

Re: Deterministic field ordering in derived schemas

2020-02-05 Thread Reuven Lax
I have yet to figure out a way to make Schema inference deterministically ordered, because Java reflection provides no guaranteed ordering (I suspect that the JVM returns functions by iterating over a hash map, or something of that form). Ideas such as "sort all the fields" actually make things wo

Re: Deterministic field ordering in derived schemas

2020-02-05 Thread Robert Bradshaw
+1 to standardizing on a deterministic ordering for inference if none is imposed by the structure. On Wed, Feb 5, 2020, 8:55 AM Gleb Kanterov wrote: > There are Beam schema providers that use Java reflection to get fields for > classes with fields and auto-value classes. It isn't relevant for PO

Deterministic field ordering in derived schemas

2020-02-05 Thread Gleb Kanterov
There are Beam schema providers that use Java reflection to get fields for classes with fields and auto-value classes. It isn't relevant for POJOs with "creators", because function arguments are ordered. We cache instances of schema coders, but there is no guarantee that it's deterministic between
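
Illustrative note on the point above: the Javadoc for Class.getDeclaredFields() states that the returned elements are not sorted and are not in any particular order. The sketch below is a minimal, hypothetical example (SortedFieldOrder and Purchase are made-up names, not Beam classes) of one way to derive a reproducible order, by sorting the reflected field names. It is not how Beam infers schemas, and whether sorting is the right fix is exactly what this thread debates; note that a name-sorted order still shifts when a field is added or renamed.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedFieldOrder {
    // Hypothetical POJO used only for this illustration.
    static class Purchase {
        String userId;
        long amountCents;
        String country;
    }

    // Returns the instance field names of clazz in lexicographic order.
    // Reflection gives no ordering guarantee, so the names are sorted
    // explicitly to make the result independent of the JVM.
    static List<String> fieldNamesSorted(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Field f : clazz.getDeclaredFields()) {
            // Skip compiler-generated and static fields.
            if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            names.add(f.getName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // prints [amountCents, country, userId]
        System.out.println(fieldNamesSorted(Purchase.class));
    }
}
```

An explicit per-field position, as with encoding_position or protobuf-style field numbers mentioned elsewhere in this thread, avoids the reordering problem that sorting by name has.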

Re: A new reworked Elasticsearch 7+ IO module

2020-02-05 Thread Etienne Chauchot
Still, one thing I don't agree with is that IOs can be tested on mocks. We don't really test IO behavior with mocks: there are always special behaviors that cannot be reproduced in mocks (split, load, corner cases, etc.). In the past there were IOs that were tested using mocks and th

SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-05 Thread Tomo Suzuki
Hi Beam developers, Can somebody help with this SQL PostCommit integration test failure? https://issues.apache.org/jira/browse/BEAM-9253 (since https://github.com/apache/beam/pull/10563) SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long (Why not?) The dif
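
Illustrative note on the "(Why not?)" above: java.lang.Integer and java.lang.Long are sibling subclasses of java.lang.Number, so a reference cast from one to the other fails at runtime. Primitive widening (int to long) does not apply to boxed objects held as Object. The sketch below is a minimal, hypothetical reproduction (BoxedCastDemo and its helpers are made-up names, not the Beam test code), along with a conversion through Number that accepts either boxed type.

```java
class BoxedCastDemo {
    // Hypothetical helper: converts any boxed numeric value to a long
    // by going through the common supertype Number.
    static long toLong(Object value) {
        return ((Number) value).longValue();
    }

    // Returns true if a direct (Long) reference cast of the value fails.
    static boolean directCastFails(Object value) {
        try {
            Long ignored = (Long) value;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Stands in for a value parsed as Integer where Long is expected.
        Object parsed = Integer.valueOf(42);
        System.out.println(directCastFails(parsed)); // prints true
        System.out.println(toLong(parsed));          // prints 42
    }
}
```

This only explains the exception message; the actual cause discussed in the thread is a mismatch between the table schema and the schema the test expects.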

Schema Aware PCollection stabilization

2020-02-05 Thread Alex Van Boxel
I have the feeling people want to bring Schema Aware PCollection out of experimental. I've started a document to follow up this track, please could all interested parties add the open issues they have to the document. We can discuss the open issues in the doc and resolve and/or create JIRA tickets

Re: [PROPOSAL] Beam Schema Options

2020-02-05 Thread Alex Van Boxel
I would appreciate it if someone would look at the following PR and get it into master: https://github.com/apache/beam/pull/10413# A lot of work needs to follow, but if we have the base already on master the next layers can follow. As a reminder, this is the base proposal: https://docs.google.com/docu

Re: A new reworked Elasticsearch 7+ IO module

2020-02-05 Thread Jean-Baptiste Onofre
Hi, We talked in the past about multiple/single module. IMHO the preferred goal is always to have a single module. However, it’s tricky when we have such differences, including in the user-facing API. So, I would go with a module per version, or use a specified version for a target Beam release.

Re: A new reworked Elasticsearch 7+ IO module

2020-02-05 Thread Etienne Chauchot
Hi all, We had a long discussion with Ludovic about this IO. I'd like to share it with you to keep you informed and also to gather your opinions. 1. Regarding version support: ES v2 has not been maintained by Elastic since 2018/02, so we plan to remove it from the IO. In the past we already retired ve