On Fri, May 15, 2020 at 8:10 PM Kenneth Knowles wrote:
>
>
> On Fri, May 15, 2020 at 5:25 PM Brian Hulette wrote:
>
>> After thinking about this more I've softened on it some, but I'm still a
>> little wary. I like Kenn's suggestion:
>>
>> > - it is OK to convert known logical types like MillisI
On Fri, May 15, 2020 at 5:25 PM Brian Hulette wrote:
> After thinking about this more I've softened on it some, but I'm still a
> little wary. I like Kenn's suggestion:
>
> > - it is OK to convert known logical types like MillisInstant or
> NanosInstant to a ZetaSQL "TIMESTAMP" or Calcite SQL "TI
Thanks everyone. I was able to collect a lot of good feedback from everyone
who contributed. I am going to wrap it up for now and label the design as
"Design Finalized (Unimplemented)".
I really believe we have made a much better design than I initially wrote
up. I couldn't have done it without th
Sounds like a good plan to me, but I haven't been the one monitoring this
spreadsheet (or Twitter, for that matter). Spam is a concern, but
everything is moderated, so I think we should try it out and see if the
volume is really high enough to be an issue.
On Fri, May 15, 2020 at 4:46 PM Kenneth Knowles w
After thinking about this more I've softened on it some, but I'm still a
little wary. I like Kenn's suggestion:
> - it is OK to convert known logical types like MillisInstant or
NanosInstant to a ZetaSQL "TIMESTAMP" or Calcite SQL "TIMESTAMP WITH LOCAL
TIME ZONE"
> - it is OK to convert unknown lo
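As a rough sketch of the rule being proposed (the identifiers and helper below are illustrative, not necessarily Beam's actual URNs or API): known logical types map to a concrete SQL type, while unknown ones would fall back to their base type:

```python
# Hypothetical illustration of the conversion rule discussed above:
# known logical types map to a concrete SQL type; anything unknown
# is handled via its base (representation) type instead.

KNOWN_LOGICAL_TO_SQL = {
    "beam:logical_type:millis_instant:v1": "TIMESTAMP WITH LOCAL TIME ZONE",
    "beam:logical_type:nanos_instant:v1": "TIMESTAMP WITH LOCAL TIME ZONE",
}

def sql_type_for(urn: str, base_type: str) -> str:
    """Return the SQL type for a logical type URN, falling back to the base type."""
    return KNOWN_LOGICAL_TO_SQL.get(urn, base_type)

print(sql_type_for("beam:logical_type:millis_instant:v1", "INT64"))
print(sql_type_for("urn:example:unknown", "INT64"))  # unknown -> base type
```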
I like having easier notifications. It would be great if the notifications
had the content also. I get notifications on the spreadsheet, but since I
have to click through to look at them there is a little bit of friction.
1. Is it still easy to add the other columns to record LGTM and when they
ha
Hi Steve,
Yes that's correct.
On Fri, May 15, 2020 at 2:11 PM Steve Niemitz wrote:
> ah! ok awesome, I think that was the piece I was misunderstanding. So I
> _can_ use a SDF to split the work initially (like I was manually doing in
> #1), but it just won't be further split dynamically on data
@all
We are receiving a few emails from interested applicants - feel no pressure
to respond to them. I will be monitoring the dev list and respond
accordingly. If they have specific questions regarding either of the
projects, I will direct them to the mentors.
Stay safe,
Aizhamal
On Mon, May 11,
Hi all,
I wanted to propose some improvements to the existing process of proposing
tweets for the Apache Beam Twitter account.
Currently the process requires people to request edit access, and then add
tweets on the spreadsheet [1]. I use it a lot because I know the process
well, but I think this
I would not describe the base type as the "wire type." If that were true,
then the only base type we should support would be byte array.
My simple point is that this is no different than normal schema fields. You
will find many normal schemas containing data encoded into other field
types. You wi
My understanding is that the base type is effectively the wire format of
the type, while the logical type is the in-memory representation for Java.
For org.joda.time.Instant, this is just a wrapper around the underlying
Long. However for the Date logical type, the LocalDate type has struct as
the i
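As a toy illustration of that distinction (plain Python, not Beam's actual LogicalType API): a Date-like logical type whose in-memory value is a `datetime.date` and whose base representation is an integer count of days since the epoch:

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

class DateLogicalType:
    """Illustrative only: the in-memory value is datetime.date, while the
    base ("wire") representation is an integer count of epoch days."""

    def to_base(self, value: date) -> int:
        return (value - EPOCH).days

    def to_input(self, base: int) -> date:
        return EPOCH + timedelta(days=base)

lt = DateLogicalType()
print(lt.to_base(date(1970, 1, 2)))         # 1
print(lt.to_input(1))                       # 1970-01-02
```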
ah! ok awesome, I think that was the piece I was misunderstanding. So I
_can_ use a SDF to split the work initially (like I was manually doing in
#1), but it just won't be further split dynamically on dataflow v1 right
now. Is my understanding there correct?
On Fri, May 15, 2020 at 5:03 PM Luke
On Fri, Apr 24, 2020 at 11:56 AM Brian Hulette wrote:
> When we created the portable representation of schemas last summer we
> intentionally did not include DATETIME as a primitive type [1], even though
> it already existed as a primitive type in Java [2]. There seemed to be
> consensus around a
#3 is the best when you implement @SplitRestriction on the SDF.
The size of each restriction is used to better balance the splits within
Dataflow Runner v2, so it is less susceptible to the too-many-splits or
unbalanced-splits problem.
For example, if you have 4 workers and make 20 splits, the splits will
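As a rough sketch of the size-aware balancing idea (plain Python, not the Dataflow runner's actual algorithm): with more splits than workers, a least-loaded greedy assignment evens out the work:

```python
import heapq

def assign_splits(split_sizes, num_workers):
    """Greedy size-aware assignment: each split goes to the currently
    least-loaded worker (a rough stand-in for runner-side balancing)."""
    heap = [(0, w) for w in range(num_workers)]  # (total load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    for size in sorted(split_sizes, reverse=True):  # largest first
        load, w = heapq.heappop(heap)
        assignment[w].append(size)
        heapq.heappush(heap, (load + size, w))
    return assignment

# 20 equal-size splits over 4 workers: each worker ends up with 5 splits.
balanced = assign_splits([10] * 20, 4)
print({w: sum(sizes) for w, sizes in balanced.items()})
```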
Thanks for the replies so far. I should have specifically mentioned above,
I am building a bounded source.
While I was thinking this through, I realized that I might not actually
need any fancy splitting, since I can calculate all my split points up
front. I think this goes well with Ismaël's su
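A minimal sketch of pre-computing split points up front for a bounded source (a hypothetical helper, not Beam API): divide the total size into near-equal half-open ranges, much as an @SplitRestriction implementation might:

```python
def split_points(total_size: int, desired_splits: int):
    """Divide [0, total_size) into near-equal half-open ranges, the way a
    bounded SDF might pre-compute its restrictions up front."""
    base, extra = divmod(total_size, desired_splits)
    ranges, start = [], 0
    for i in range(desired_splits):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

print(split_points(100, 4))  # [(0, 25), (25, 50), (50, 75), (75, 100)]
```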
This seems like a good idea. This stuff is all still marked "experimental"
for exactly this reason. This is a case where the name fits perfectly. Both
SQL and schemas are new and still working towards a form that can be
supported indefinitely without layers of workarounds that will never
quiesce. I
For the ones without the label, someone would need to use blame and track
back to why it was sickbayed.
On Fri, May 15, 2020 at 1:08 PM Kenneth Knowles wrote:
> There are 101 instances of @Ignore, and I've listed them below. A few
> takeaways:
>
> - highly concentrated in ZetaSQL, and then seco
There are 101 instances of @Ignore, and I've listed them below. A few
takeaways:
- highly concentrated in ZetaSQL, and then second tier in various state
tests specific to a runner
- there are not that many overall, so I'm not sure a report will add much
- they do not all have Jiras
- they do n
I just started having an issue that looks similar this morning. I'm trying
out running the Python SqlTransform tests with fn_runner (currently they
only execute continuously on Flink and Spark), but I'm running into
occasional failures. The errors always come from either Python or Java
attempting t
Hi Luke and Beam committers,
Would you check this PR to use Linkage Checker's exclusion file?
https://github.com/apache/beam/pull/11674
This script used to use the "diff" command to identify new linkage errors
by comparing line by line. With this PR, it identifies new linkage errors
in an appropriate
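To illustrate the difference (a sketch, not the actual Linkage Checker code): a line-by-line diff flags reordered or reformatted output as "new", while a structural comparison treats the errors as a set, so only genuinely new errors surface:

```python
def new_linkage_errors(baseline: set, current: set) -> set:
    """Structural comparison: an error is 'new' only if absent from the
    baseline, regardless of ordering or surrounding text (unlike `diff`)."""
    return current - baseline

baseline = {"ClassA.methodX", "ClassB.methodY"}
current = {"ClassB.methodY", "ClassC.methodZ"}  # reordered + one new error
print(new_linkage_errors(baseline, current))    # {'ClassC.methodZ'}
```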
Respected Sir/Ma'am,
I came across the projects proposed by Apache Beam for Season of Docs 2020.
I am a newbie to the organisation, but I really liked the project ideas and
would love to start contributing and prepare my proposal for Season of Docs.
Please guide me through. Where should I start and th
For the bounded case, if you do not have a straightforward way to split at
fractions, or simply do not care about Dynamic Work Rebalancing, you can
get away with a simple DoFn-based implementation (without Restrictions)
and evolve from it. More and more IOs in Beam are becoming DoFn
Lateness should never be introduced inside a pipeline - generally late data
can only come from a source. If data was not dropped as late earlier in
the pipeline, it should not be dropped after the file write. I suspect that
this is a bug in how the Flink runner handles the Reshuffle transform, but
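As a rough illustration of the rule above (a simplification, not Beam's actual implementation, which reasons about window expiry rather than bare timestamps): an element becomes droppably late when its timestamp trails the watermark by more than the allowed lateness:

```python
def is_late(element_ts: int, watermark: int, allowed_lateness: int) -> bool:
    """Simplified model: an element is droppably late when its timestamp is
    behind the watermark by more than the allowed lateness (same time unit)."""
    return element_ts < watermark - allowed_lateness

# With zero allowed lateness, anything behind the watermark is late:
print(is_late(element_ts=100, watermark=101, allowed_lateness=0))  # True
print(is_late(element_ts=100, watermark=100, allowed_lateness=0))  # False
```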
Dear Apache,
My name is Amr Maghraby. I am a new graduate from AAST college; I ranked
first in my class with a CGPA of 3.92, joined the international competition
in the US called ROV, placing second worldwide, and last summer I was
involved in Google Summer of Code 2019 and did good work. Also, I
par
Thanks to Kyle we captured some additional logging for
https://issues.apache.org/jira/browse/BEAM-9975. I spent a little time
looking at it and found two different issues (see details in the comments):
https://issues.apache.org/jira/browse/BEAM-10006 - PipelineOptions can pick
up definitions from u
If it is an unbounded source then SDF is a winner since you are not giving
up anything with it when compared to the legacy UnboundedSource API since
Dataflow doesn't support dynamic splitting of unbounded SDFs or
UnboundedSources (only initial splitting). You gain the ability to compose
sources and
I'm going to be writing a new IO (in Java) for reading files in a custom
format, and want to make it splittable. It seems like I have a choice
between the "legacy" Source API and the newer, experimental SDF API. Is
there any guidance on which I should use? I can likely tolerate some API
churn as wel
Hey,
I created a transform method in Java and now I want to use it in Python
using Cross-language.
I got pretty stuck with the following problem:
p
| GenerateSequence(...)
| ExternalTransform(...)
=> is working like a charm

p
| Create(...)
| ExternalTransform(...)
=> getting assert pardo_payl
Fixed, thanks for spotting that! One of the regexes wasn't properly
interpreted in the latest version of Grafana, but now it should be OK.
On Thu, May 14, 2020 at 11:58 PM Pablo Estrada wrote:
> I noticed that postcommit status dashboard shows 0/1 values - I remember
> it used to show green/red fo
Hi Jose,
thank you for putting in the effort to create an example that demonstrates
your problem.
You are using a streaming pipeline, and it seems that the watermark
downstream has already advanced, so when your File pane arrives, it is
already late. Since you define that lateness is not tolerated, it i