Re: [Proposal] Kaskada DSL and FnHarness for Temporal Queries

2023-06-12 Thread Ben Chambers
the Generative AI world at the >> "Generative AI Meetup" Wednesday afternoon - if the doc Ben linked to (or >> GenAI) is interesting to you and you'll be at the conference I'd love to >> touch base in person! >> >> -Ryan >> >> On

[Proposal] Kaskada DSL and FnHarness for Temporal Queries

2023-06-12 Thread Ben Chambers
Hello Beam! Kaskada has created a query language for expressing temporal queries, making it easy to work with multiple streams and perform temporally correct joins. We’re looking at taking our native, columnar execution engine and making it available as a PTransform and FnHarness for use with Apac

Re: Java SDK Extensions

2018-10-03 Thread Ben Chambers
On Wed, Oct 3, 2018 at 12:16 PM Jean-Baptiste Onofré wrote: > Hi Anton, > > jackson is the json extension as we have XML. Agree that it should be > documented. > > Agree about join-library. > > sketching is some statistic extensions providing ready to use stats > CombineFn. > > Regards > JB > > O

Re: [portablility] metrics interrogations

2018-09-12 Thread Ben Chambers
I think there is a confusion of terminology here. Let me attempt to clarify as a (mostly) outsider. I think that Etienne is correct, in that the SDK harness only reports the difference associated with a bundle. So, if we have a metric that measures execution time, the SDK harness reports the time

Re: Existing transactionality inconsistency in the Beam Java State API

2018-05-24 Thread Ben Chambers
I think Kenn's second option accurately reflects my memory of the original intentions: 1. I remember we we considered either using the Future interface or calling the ReadableState interface a future, and explicitly said "no, future implies asynchrony and that the value returned by `get` won't cha

Re: What is the future of Reshuffle?

2018-05-21 Thread Ben Chambers
+1 to introducing alternative transforms even if they wrap Reshuffle The benefits of making them distinct is that we can put appropriate Javadoc in place and runners can figure out what the user is intending and whether Reshuffle or some other implementation is appropriate. We can also see which o

Re: Beam parent SNAPSHOTs are not being built

2018-04-25 Thread Ben Chambers
It looks like the problem is that your project uses beam-examples-parent as its parent. This was the suggested way of creating a Maven project because we published a POM and this provided an easy way to make sure that derived projects inherited the same options and versions as Beam used. With the

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-17 Thread Ben Chambers
That sounds like a very reasonable choice -- given the discussion seemed to be focusing on the differences between these two categories, separating them will allow the proposal (and implementation) to address each category in the best way possible without needing to make compromises. Looking forwa

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-12 Thread Ben Chambers
Le 12 avr. 2018 08:06, "Robert Bradshaw" a écrit : > >> By "type" of metric, I mean both the data types (including their >> encoding) and accumulator strategy. So sumint would be a type, as would >> double-distribution. >> >> On Wed, Apr 11, 2018 a

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-11 Thread Ben Chambers
t; feature > that I was proposing we kick down the road (and may never get to). What I'm > suggesting is that we support custom metrics of standard type. > > On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers wrote: > >> The metric api is designed to prevent user defined metric

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-11 Thread Ben Chambers
The metric api is designed to prevent user defined metric types based on the fact they just weren't used enough to justify support. Is there a reason we are bringing that complexity back? Shouldn't we just need the ability for the standard set plus any special system metrivs? On Wed, Apr 11, 2018

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Ben Chambers
I believe it doesn't need to be stable across refactoring, only across all workers executing a specific version of the code. Specifically, it is used as follows: 1. Create a pipeline on the user's machine. It walks the stack until the static initializer block, which provides an ID. 2. Send the pip

Re: About the Gauge metric API

2018-04-07 Thread Ben Chambers
tric? What > parameter combos are supported? > 2. Should the SDK support different Metric variables contributing to the > same Metric value? If so, how does the SDK communicate that? > 3. Should runtime-only scopes be exposed to the user? Useful for > Distributions? > 4. What sho

Re: About the Gauge metric API

2018-04-06 Thread Ben Chambers
Beam should support some sort of label / tag / worker-id for >> aggregation of Gauges (maybe other metrics?) >> >> -P. >> >> On Fri, Apr 6, 2018 at 11:21 AM Ben Chambers >> wrote: >> >>> Gauges are incredibly useful for exposing the current st

Re: About the Gauge metric API

2018-04-06 Thread Ben Chambers
th are same to the user. >>>> >>>> On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada >>>> wrote: >>>> >>>>> Hi Ben : D >>>>> >>>>> Sure, that's reasonable. And perhaps I started the discussion in the &

Re: About the Gauge metric API

2018-04-06 Thread Ben Chambers
See for instance how gauge metrics are handled in Prometheus, Datadog and Stackdriver monitoring. Gauges are perfect for use in distributed systems, they just need to be properly labeled. Perhaps we should apply a default tag or allow users to specify one. On Fri, Apr 6, 2018, 9:14 AM Ben

Re: About the Gauge metric API

2018-04-06 Thread Ben Chambers
Some metrics backend label the value, for instance with the worker that sent it. Then the aggregation is latest per label. This makes it useful for holding values such as "memory usage" that need to hold current value. On Fri, Apr 6, 2018, 9:00 AM Scott Wegner wrote: > +1 on the proposal to supp

Re: (java) stream & beam?

2018-03-13 Thread Ben Chambers
e need a very custom > thing and we are done for me. > > Le 13 mars 2018 19:26, "Ben Chambers" a écrit : > >> I think the existing rationale (not introducing lots of special fluent >> methods) makes sense. However, if we look at the Java Stream API, we >> p

Re: (java) stream & beam?

2018-03-13 Thread Ben Chambers
I think the existing rationale (not introducing lots of special fluent methods) makes sense. However, if we look at the Java Stream API, we probably wouldn't need to introduce *a lot* of fluent builders to get most of the functionality. Specifically, if we focus on map, flatMap, and collect from th

Re: @TearDown guarantees

2018-02-18 Thread Ben Chambers
s. > > Le 18 févr. 2018 20:07, "Reuven Lax" a écrit : > >> >> >> On Sun, Feb 18, 2018 at 10:50 AM, Romain Manni-Bucau < >> rmannibu...@gmail.com> wrote: >> >>> >>> >>> Le 18 févr. 2018 19:28, "Ben Chambers" a éc

Re: @TearDown guarantees

2018-02-18 Thread Ben Chambers
It feels like his thread may be a bit off-track. Rather than focusing on the semantics of the existing methods -- which have been noted to be meet many existing use cases -- it would be helpful to focus on more on the reason you are looking for something with different semantics. Some possibilitie

Re: untyped pipeline API?

2018-01-30 Thread Ben Chambers
It sounds like in your specific case you're saying that the same encoding can be viewed by the Java type system two different ways. For instance, if you have an object Person that is convertible to JSON using Jackson, than that JSON encoding can be viewed as either a Person or a Map looking at the

Re: [DISCUSS] State of the project: Feature roadmap for 2018

2018-01-30 Thread Ben Chambers
hould focus on is making >>>> simple things simple. Beam is very powerful, but it doesn't always >>>> make easy things easy. Features like schema'd PCollections could go a >>>> long way here. Also fully fleshing out/smoothing our runner >>>> p

Re: [DISCUSS] State of the project: Feature roadmap for 2018

2018-01-30 Thread Ben Chambers
ity story is part of this too. >>> >>> For beam 3.x we could also reason about if there's any complexity that >>> doesn't hold its weight (e.g. side inputs on CombineFns). >>> >>> On Mon, Jan 22, 2018 at 9:20 PM, Jean-Baptiste Onofré >>>

Re: [PROPOSAL] Add a blog post for every new release

2018-01-29 Thread Ben Chambers
+1. Release notes from jira may be hard to digest for people outside the project. A short summary for may be really helpful for those considering Beam or loosely following the project to get an understanding of what is happening. It also presents a good opportunity share progress externally, which

[DISCUSS] State of the project: Feature roadmap for 2018

2018-01-22 Thread Ben Chambers
Thanks Davor for starting the state of the project discussions [1]. In this fork of the state of the project discussion, I’d like to start the discussion of the feature roadmap for 2018 (and beyond). To kick off the discussion, I think the features could be divided into several areas, as follows:

Re: Callbacks/other functions run after a PDone/output transform

2017-12-15 Thread Ben Chambers
en its last pane has fired. I could see this be a property on the View > transform itself. In terms of implementation - I tried to figure out how > side input readiness is determined, in the direct runner and Dataflow > runner, and I'm completely lost and would appreciate some help. &g

Re: Callbacks/other functions run after a PDone/output transform

2017-12-04 Thread Ben Chambers
This would be absolutely great! It seems somewhat similar to the changes that were made to the BigQuery sink to support WriteResult ( https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.java ). I find it helpfu

Re: makes bundle concept usable?

2017-11-30 Thread Ben Chambers
log | Github | LinkedIn > > > 2017-11-30 18:43 GMT+01:00 Ben Chambers : > > Beam includes a GroupIntoBatches transform (see > > > https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java > ) > > which I be

Re: makes bundle concept usable?

2017-11-30 Thread Ben Chambers
Beam includes a GroupIntoBatches transform (see https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java) which I believe was intended to be used as part of such a portable IO. It can be used to request that elements are divided in

Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread Ben Chambers
One risk to "squash and merge" is that it may lead to commits that don't have clean descriptions -- for instance, commits like "Fixing review comments" will show up. If we use (a) these would also show up as separate commits. It seems like there are two cases of multiple commits in a PR: 1. Multip

Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-28 Thread Ben Chambers
Strong +1 to both increasing the frequency of minor releases and also putting together a road map for the next major release or two. I think it would be great to communicate to the community the direction Beam is taking in the future -- what things will users be able to do with 3.0 or 4.0 that the

Re: [DISCUSSION] Runner agnostic metrics extractor?

2017-11-27 Thread Ben Chambers
I think discussing a runner agnostic way of configuring how metrics are extracted is a great idea -- thanks for bringing it up Etienne! Using a thread that polls the pipeline result relies on the program that created and submitted the pipeline continuing to run (eg., no machine faults, network pro

Re: [VOTE] Release 2.2.0, release candidate #2

2017-11-08 Thread Ben Chambers
This seems to be the second thread entitled "[VOTE] Release 2.2.0, release candidate #2". The subject and description refer to release candidate #2, however the artifacts mention v2.2.0-RC3. Which release candidate is this vote thread for? On Wed, Nov 8, 2017 at 12:52 PM Jean-Baptiste Onofré wrot

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-10-30 Thread Ben Chambers
I think both Gradle and Bazel are worth exploring. Gradle is definitely more common in the wild, but Bazel may be a better fit for the large mixture of languages being developed in one codebase within Beam. It might be helpful for us to list what functionality we want from such a tool, and then hav

Re: Java pre/post commit test suite breakage

2017-10-23 Thread Ben Chambers
I believe license errors are detected by Apache RAT, which are part of the release profile. I believe you need to run "mvn clean verify -Prelease" to enable anything associated with that profile. On Mon, Oct 23, 2017 at 11:46 AM Valentyn Tymofieiev wrote: > Hi Beam-Dev, > > It's been >5 days sin

Re: TikaIO Refactoring

2017-10-04 Thread Ben Chambers
It looks like ReadableFile#open does currently decompress the stream, but it seems like we could add a ReadableFile#openRaw(...) or something like that which didn't implicitly decompress. Then libraries such as Tika which want the *actual* file content could use that method. Would that address your

Re: TikaIO concerns

2017-09-22 Thread Ben Chambers
PCollection that is then processed in some other way. On Fri, Sep 22, 2017 at 10:18 AM Allison, Timothy B. wrote: > Do tell... > > Interesting. Any pointers? > > -Original Message----- > From: Ben Chambers [mailto:bchamb...@google.com.INVALID] > Sent: Friday, September 2

Re: TikaIO concerns

2017-09-22 Thread Ben Chambers
Regarding specifically elements that are failing -- I believe some other IO has used the concept of a "Dead Letter" side-output,, where documents that failed to process are side-output so the user can handle them appropriately. On Fri, Sep 22, 2017 at 9:47 AM Eugene Kirpichov wrote: > Hi Tim, >

Re: Proposal: Unbreak Beam Python 2.1.0 with 2.1.1 bugfix release

2017-09-19 Thread Ben Chambers
Any elaboration or jira issues describing what is broken? Any proposal for what changes need to happen to fix it? On Tue, Sep 19, 2017, 5:49 PM Chamikara Jayalath wrote: > +1 for cutting 2.1.1 for Python SDK only. > > Thanks, > Cham > > On Tue, Sep 19, 2017 at 5:43 PM Robert Bradshaw > > wrote:

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Ben Chambers
One possible issue with this is that updating a thread local is likely to be much more expensive than passing an additional argument. Also, not all code called from within the DoFn will necessarily be in the same thread (eg., sometimes we create a pool of threads for doing work). It may be *more* c

Re: Adding Serializable implements to SDK classes like WindowedValue etc...

2017-08-24 Thread Ben Chambers
If the user classes are not Serializable, how does adding Serializable to the WindowedValue help? The user class, which is stored in a field in the WindowedValue will still be non-serializable, and thus cause problems. On Thu, Aug 24, 2017 at 11:59 AM Kobi Salant wrote: > right, if it is not kry

Re: Adding Serializable implements to SDK classes like WindowedValue etc...

2017-08-24 Thread Ben Chambers
I think as Luke pointed out, one problem with Serializable is that it isn't guaranteed to be stable across different JVM versions. The only drawback I see here from the user perspective is that if I have already written a custom coder to handle my type in some reasonable way, this is going to igno

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Ben Chambers
d make sure > we > > > > >> document that it is explicitly the input elements that will be > > > replayed, > > > > >> and bundles and other operational are still arbitrary. > > > > >> > > > > >> > > > &g

Re: [PROPOSAL] "Requires deterministic input"

2017-08-09 Thread Ben Chambers
an annotation or change > a type to indicate that the input PCollection should be > well-defined/deterministically-replayable. I don't have a really strong > opinion. > > Kenn > > On Tue, Mar 21, 2017 at 4:53 PM, Ben Chambers > > wrote: > > > Allowing an

Re: [DISCUSS] Beam pipeline logical and physical DAGs visualization.

2017-08-04 Thread Ben Chambers
I think with GraphViz you can put a label on the "subgraph" that represents each composite. This would allow each leaf transform to be just the name of that leaf within the composite, rather than full name. Is that what you were suggesting Robert? On Thu, Aug 3, 2017 at 10:06 PM Robert Bradshaw w

Re: How to test a transform against an inaccessible ValueProvider?

2017-07-19 Thread Ben Chambers
I also reported something similar to this as https://issues.apache.org/jira/browse/BEAM-2577. That issue was reported because we don't have any tests that use a runner and attempt to pass ValueProviders in. This means that we've found bugs such as NestedValueProviders used with non-serializable ano

Re: Proposal and plan: new TextIO features based on SDF

2017-07-12 Thread Ben Chambers
Regarding changing the coder -- keep in mind that there may be persisted state somewhere, so we can't just change the coder once this is used. If the processing of scanning for modified and new files reported the last-modified-time, could we use that and have the SDF report KV with the last-modifi

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-23 Thread Ben Chambers
I think the distinction between metrics being reported (the API we give users to create Metrics) from getting metrics out of (currently either using the PipelineResult to query results or connecting to an external metric store) is important. There is an additional distinction to be made in data pr

Re: How to create a custom trigger?

2017-06-07 Thread Ben Chambers
There hasn't been a need for user defined triggers and we've found it is really hard to create triggers with proper behavior. Can you elaborate on why you are trying to use triggers to understand watermarks in this way? It's not clear how this trigger would be useful to that understanding. On Wed

Re: [DISCUSS] Source Watermark Metrics

2017-06-06 Thread Ben Chambers
n not express it. > > Best,JingsongLee > -- > From:Ben Chambers > Time:2017 Jun 2 (Fri) 21:46 > To:dev ; JingsongLee > Cc:Aviem Zur ; Ben Chambers > > Subject:Re: [DISCUSS] Source Watermark Metrics > I think havi

Re: [DISCUSS] Source Watermark Metrics

2017-06-02 Thread Ben Chambers
2. Jun 2017, at 03:52, JingsongLee wrote: > > > > @Aviem Zur @Ben Chambers What do you think about the value of > METRIC_MAX_SPLITS? > > > > > --From:JingsongLee > Time:2017 May 11 (Thu) > 16:37To:dev@beam.apache.

Re: Behavior of Top.Largest

2017-05-14 Thread Ben Chambers
Exposing the CombineFn is necessary for use with composed combine or combining value state. There may be other reasons we made this visible, but these continue to justify it. On Sun, May 14, 2017, 1:00 PM Reuven Lax wrote: > I believe the reason why this is called Top.largest, is that originally

Re: How I hit a roadblock with AutoValue and AvroCoder

2017-04-05 Thread Ben Chambers
Correction autovalue coder. On Wed, Apr 5, 2017, 2:24 PM Ben Chambers wrote: > Serializable coder had a separate set of issues - often larger and less > efficient. Ideally, we would have an avrocoder. > > On Wed, Apr 5, 2017, 2:15 PM Pablo Estrada > wrote: > > As

Re: How I hit a roadblock with AutoValue and AvroCoder

2017-04-05 Thread Ben Chambers
Serializable coder had a separate set of issues - often larger and less efficient. Ideally, we would have an avrocoder. On Wed, Apr 5, 2017, 2:15 PM Pablo Estrada wrote: > As a note, it seems that SerializableCoder does the trick in this case, as > it does not require a no-arg constructor for th

Re: Proposal on porting PAssert away from aggregators

2017-03-30 Thread Ben Chambers
When you say instance do you mean a different named PTransform class? Or just a separate instance of whatever "HandleAssertResult" PTransform is introduced? If the latter, I believe that makes sense. It also addresses the problems with committed vs. attempted metrics. Specifically: Since we want t

Re: [PROPOSAL] "Requires deterministic input"

2017-03-21 Thread Ben Chambers
Allowing an annotation on DoFn's that produce deterministic output could be added in the future, but doesn't seem like a great option. 1. It is a correctness issue to assume a DoFn is deterministic and be wrong, so we would need to assume all transform outputs are non-deterministic unless annotate

Re: [DISCUSS] Per-Key Watermark Maintenance

2017-02-28 Thread Ben Chambers
Following up on what Kenn said, it seems like the idea of a per-key watermark is logically consistent with Beam model. Each runner already chooses how to approximate the model-level concept of the per-key watermark. The list Kenn had could be extended with things ilke: 4. A runner that assumes the

Re: Metrics for Beam IOs.

2017-02-18 Thread Ben Chambers
r using a "pull-like" would use an adapter ? > >>> > >>> On Sat, Feb 18, 2017 at 6:27 PM Jean-Baptiste Onofré > >>> wrote: > >>> > >>>> Hi Ben, > >>>> > >>>> ok it's what I thought. Thanks for the clarification.

Re: Metrics for Beam IOs.

2017-02-18 Thread Ben Chambers
gt;> their >>> own metrics reporting system - but that's for the runner author to >> decide. >>> Stas did this for the Spark runner because Spark doesn't report back >>> user-defined Accumulators (Spark's Aggregators) to it's Metrics system. &

Re: Metrics for Beam IOs.

2017-02-17 Thread Ben Chambers
cs I think we should be fine, though this could also change between >> runners as well. >> >> Finally, considering "split-metrics" is interesting because on one hand it >> affects the pipeline directly (unbalanced partitions in Kafka that may >> cause backlog) b

Re: Metrics for Beam IOs.

2017-02-14 Thread Ben Chambers
On Tue, Feb 14, 2017 at 9:07 AM Stephen Sisk wrote: > hi! > > (ben just sent his mail and he covered some similar topics to me, but I'll > keep my comments intact since they are slightly different) > > * I think there are a lot of metrics that should be exposed for all > transforms - everything f

Re: Metrics for Beam IOs.

2017-02-14 Thread Ben Chambers
Thanks for starting this conversation Ismael! I too have been thinking we'll need some general approach to metrics for IO in the near future. Two general thoughts: 1. Before making the metrics configurable, I think it would be worthwhile to see if we can find the right set of metrics that provide

Re: Should you always have a separate PTransform class for a new transform?

2017-02-07 Thread Ben Chambers
Going back to the beginning, why not "Combine.perKey(Count.fn())" or some such? We have a lot of boilerplate already to support "Count.perKey()" and that will only become worse when we try to have a Count.PerKey class that has the functionality of Combine.PerKey, why not just get rid of the boiler

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Ben Chambers
ing are: > > - A transform that takes elements and produces batches, like Robert said > > - A simple Beam-agnostic library that takes Java objects and produces > > batches of Java objects, with an API that makes it convenient to use in a > > typical batching DoFn > >

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Ben Chambers
The third option for batching: - Functionality within the DoFn and DoFnRunner built as part of the SDK. I haven't thought through Batching, but at least for the IntraBundleParallelization use case this actually does make sense to expose as a part of the model. Knowing that a DoFn supports paralle

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Ben Chambers
> >> >>> meaningful and potentially less expensive to implement in the > absence > >> of > >> >>> state (this is why it needs a design discussion at all, really). > >> >>> > >> >>> Caveat: these APIs are new and not supported in e

Re: Committed vs. attempted metrics results

2017-01-26 Thread Ben Chambers
t; A) Should we create a util for filtering MetricsResults based on a query, > using the suggested filtering implementation in this thread (Substring > match) ? > B) Should direct runner be changed to use such a util, and conform with the > results filtering suggested. > > On Tue, Jan

Re: Committed vs. attempted metrics results

2017-01-23 Thread Ben Chambers
eeds to be discussed/nailed down to help with your PR? On Thu, Jan 19, 2017 at 3:57 PM Ben Chambers wrote: > On Thu, Jan 19, 2017 at 3:28 PM Amit Sela wrote: > > On Fri, Jan 20, 2017 at 1:18 AM Ben Chambers > > wrote: > > > Part of the problem here is that whether attempted

Re: Committed vs. attempted metrics results

2017-01-19 Thread Ben Chambers
On Thu, Jan 19, 2017 at 3:28 PM Amit Sela wrote: > On Fri, Jan 20, 2017 at 1:18 AM Ben Chambers > > wrote: > > > Part of the problem here is that whether attempted or committed is "what > > matters" depends on the metric. If you're counting the number of

Re: Committed vs. attempted metrics results

2017-01-19 Thread Ben Chambers
t > tricky since a runner is expected to keep unique naming/ids for steps, but > users are supposed to be aware of this here and I'd suspect users might not > follow and if they use the same ParDo in a couple of places they'll query > it and it might be confusing for them to see &qu

Re: Committed vs. attempted metrics results

2017-01-19 Thread Ben Chambers
Thanks for starting the discussion! I'm going to hold off saying what I think and instead just provide some background and additional questions, because I want to see where the discussion goes. When I first suggested the API for querying metrics I was adding it for parity with aggregators. A good

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-17 Thread Ben Chambers
We should start by understanding the goals. If elements are in different windows can they be out in the same batch? If they have different timestamps what timestamp should the batch have? As a composite transform this will likely require a group by key which may affect performance. Maybe within a

Re: @ProcessElement and private methods

2017-01-17 Thread Ben Chambers
We thought about it when this was added. We decided against it because these are overrides-in-spirit. Letting them be private would be misleading because they are called from outside the class and should be thought of in that way. Also, this seems similar to others in the Java ecosystem: JUnit tes

Re: Testing Metrics

2017-01-03 Thread Ben Chambers
The Metrics API in Beam is proposed to support both committed metrics (only from the successfully committed bundles) and attempted metrics (from all attempts at processing bundles). I think any mechanism based on the workers reporting metrics to a monitoring system will only be able to support atte

Re: PCollection to PCollection Conversion

2016-12-29 Thread Ben Chambers
Dan's proposal to move forward with a simple (future-proofed) version of the ToString transform and Javadoc, and add specific features via follow-up PRs. On Thu, Dec 29, 2016 at 3:53 PM Jesse Anderson wrote: > @Ben which idea do you like? > > On Thu, Dec 29, 2016 at 3:20 P

Re: PCollection to PCollection Conversion

2016-12-29 Thread Ben Chambers
I like that idea, with the caveat that we should probably come up with a better name. Perhaps "ToString.elements()" and ToString.Elements or something? Calling one the "default" and using "create" for it seems moderately non-future proof. On Thu, Dec 29, 2016 at 3:17 PM Dan Halperin wrote: > On

Re: PCollection to PCollection Conversion

2016-12-29 Thread Ben Chambers
+1 to Dan's point that MapElements.via is not that hard to use and going too far down this path leads to significant complexity. Pipelines should generally prefer to deal with structured data as much as possible. As has been discussed on this thread, though, sometimes it is necessary to convert th

Re: Creating Sum.[*]Fn instances

2016-12-22 Thread Ben Chambers
Don't they need to be visible for use with composed combine and combining value state? On Thu, Dec 22, 2016, 9:45 AM Lukasz Cwik wrote: > Those are used internally within Sum and its expected that users instead > call Sum.integersPerKey, or Sum.doublesPerKey, or Sum.integersGlobally, or > ... >