Using Prometheus Client Metrics in Beam with FlinkRunner

2021-02-22 Thread Rion Williams
Is anyone aware of a way to get Prometheus metrics to output via Beam (i.e. using them over the Beam abstractions)? I seem to be able to define them as expected, however I don’t see them being emitted with the rest of my Prometheus metrics: private class RouteEvent(): DoFn<...>() {

Defining Custom Labels / Label Support in Beam Metrics

2021-02-15 Thread Rion Williams
Hey all, I've been working extensively on the observability stories surrounding some of the pipelines that I've been building recently in Beam and while the existing Metrics have been extremely helpful, I don't see any easy or perhaps just well documented way of adding/updating labels for these

Pull Request Review / Feature Feedback for KafkaIO

2021-02-11 Thread Rion Williams
Hi all, Recently, I encountered a bit of functionality for a pipeline that I was working that seemed to be slightly lacking (specifically the recognition of explicitly defined partitioning in the KafkaIO.WriteRecords transform) so I put together a JIRA related to it [1] as well as a more detailed

Re: Unit Testing Kafka in Apache Beam

2021-02-09 Thread Rion Williams
sages after it has > started? We use this approach for some tests against Cloud PubSub. > Note if using the DirectRunner you need to set the blockOnRun pipeline option > to False to do this. > > Brian > >> On Mon, Feb 8, 2021 at 2:10 PM Rion Williams wrote: >> Hey

Unit Testing Kafka in Apache Beam

2021-02-08 Thread Rion Williams
Hey folks, I’ve been working on fleshing out a proof-of-concept pipeline that deals with some out of order data (I.e. mismatching processing times / event-times) and does quite a bit of windowing depending on the data. Most of the work I‘ve done in a lot of streaming systems relies heavily on

Re: Separating Data from Kafka by Keying Strategy in a Kafka Splittable DoFn

2021-02-01 Thread Rion Williams
f having a watermark over these 3 partitions. Do I > understand this correctly? > > If so, I think you need separated pipelines. If you only want to know > which records come from which partitions, ReadFromKafkaDoFn emits a KV pair > where the KafkaSourceDescripto

Re: Separating Data from Kafka by Keying Strategy in a Kafka Splittable DoFn

2021-02-01 Thread Rion Williams
enerates KafkaSourceDescriptor) > .apply(ParDo.of(ReadFromKafkaDoFn)) > .apply(other parts) > >> On Mon, Feb 1, 2021 at 8:06 AM Rion Williams wrote: >> Hey all, >> >> I'm currently in a situation where I have a single Kafka topic with data >> across

Separating Data from Kafka by Keying Strategy in a Kafka Splittable DoFn

2021-02-01 Thread Rion Williams
Hey all, I'm currently in a situation where I have a single Kafka topic with data across multiple partitions and covers data from multiple sources. I'm trying to see if there's a way that I'd be able to accomplish reading from these different sources as different pipelines and if a Splittable

Handling Out-of-Order Windowing Event Times from Kafka to GCS

2021-01-29 Thread Rion Williams
Hey folks, I’ve been mulling over how to solve a given problem in Beam and thought I’d reach out to a larger audience for some advice. At present things seem to be working sparsely and I was curious if someone could provide a sounding-board to see if this workflow makes sense. The primary

Accessing Custom Beam Metrics in Dataproc

2021-01-12 Thread Rion Williams
Hi all, I'm currently in the process of adding some metrics to an existing pipeline that runs on Google Dataproc via Spark and I'm trying to determine how to access these metrics and eventually expose them to Stackdriver (to be used downstream in Grafana dashboards). The metrics themselves

Re: Help measuring upcoming performance increase in flink runner on production systems

2020-12-14 Thread Rion Williams
Hi Teodor, Although I’m sure you’ve come across it, this might have some valuable resources or methodologies to consider as you explore this a bit more: https://arxiv.org/pdf/1907.08302.pdf I’m looking forward to reading about your finding, especially using a more recent iteration of Beam!

Transform Logging Issues with Spark/Dataproc in GCP

2020-10-27 Thread Rion Williams
Hi all, Recently, I deployed a very simple Apache Beam pipeline to get some insights into how it behaved executing in Dataproc as opposed to on my local machine. I quickly realized that after executing that any DoFn or transform-level logging didn't appear within the job logs within the Google

Re: Issue with Maintaining State in LocalRunner

2020-06-12 Thread Rion Williams
Jun 12, 2020 at 3:53 PM Rion Williams wrote: >> Hi Luke, >> >> I think that’s likely my mistake. I had forgotten that was tied to a given >> key-window. In this example use case, all of the data is keyed differently >> (and thus not associated to a window or a key)

Re: Issue with Maintaining State in LocalRunner

2020-06-12 Thread Rion Williams
ote: > >  > Simple question, you are expecting to see prior results under the same window > and key which you are not seeing (since state is per key and window)? > >> On Fri, Jun 12, 2020 at 3:09 PM Rion Williams wrote: >> Hi all, >> >> I've been

Issue with Maintaining State in LocalRunner

2020-06-12 Thread Rion Williams
Hi all, I've been toying around with stateful DoFns recently and was attempting some approaches involving buffering when I realized that it seemed that my existing state was being ignored in the following DoFn: ``` class ExampleStatefulDoFn(): DoFn, KV>() { @StateId("count") private

Re: Multiple Tenant Consolidation via ElasticSearch in Beam Pipeline

2020-06-02 Thread Rion Williams
o occur within a DoFn and would happen when the pipeline > executes. > > On Mon, Jun 1, 2020 at 5:17 PM Rion Williams wrote: > > > Hi folks, > > > > I was talking with a colleague about a scenario he was facing and we were > > exploring the idea of Beam as

Multiple Tenant Consolidation via ElasticSearch in Beam Pipeline

2020-06-01 Thread Rion Williams
Hi folks, I was talking with a colleague about a scenario he was facing and we were exploring the idea of Beam as a possible solution. I thought I’d reach out to the audience hear to get their opinions. Basically, we have a series of single tenant Elasticsearch indices that we are attempting

Re: Kotlin Type Inference Issue for Primitives in DoFn

2020-05-28 Thread Rion Williams
the > TypeToken framework we use for analyzing Java types isn't working properly > on these Kotlin types. > > On Wed, May 27, 2020 at 1:27 PM Reuven Lax wrote: > > > Do you have the full stack trace from that exception? > > > > On Wed, May 27, 2020 at 1:13 PM Rion W

Re: Kotlin Type Inference Issue for Primitives in DoFn

2020-05-27 Thread Rion Williams
expect, however for simple transforms the @Element approach can be a bit easier to grok. > On May 27, 2020, at 3:01 PM, Reuven Lax wrote: > >  > I'm assuming that Kotlin has its own type for Int, which is not the same as > Java's Integer type. > >> On Fri, May

Re: Try Beam Katas Today

2020-05-24 Thread Rion Williams
gt; the GoLang katas, and kept behind the scenes. Is there reasoning for > breaking from the convention of the other Katas? > https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#Create > > Thanks, > Austin > > >> On Thu, May 21, 2020 at 8:00 PM Rion Williams w

Kotlin Type Inference Issue for Primitives in DoFn

2020-05-22 Thread Rion Williams
Hi all, I was writing a very simple transform in Kotlin as follows that takes in a series of integers and applies a simply DoFn against them: pipeline .apply(Create.of(1, 2, 3)) .apply(ParDo.of(object: DoFn(){ @ProcessElement fun

Re: Try Beam Katas Today

2020-05-21 Thread Rion Williams
from html to >>> md. >>> Please also help to remove all the *-remote-info.yaml files. >>> I assume that you've adjusted the answer placeholders in all tasks as well. >>> Afterwards, you can create a pull request and assign me as reviewer. >>> >>> P

Re: Try Beam Katas Today

2020-05-19 Thread Rion Williams
the *-remote-info.yaml files. > I assume that you've adjusted the answer placeholders in all tasks as well. > Afterwards, you can create a pull request and assign me as reviewer. > > Please reach out to me if you have any questions. > > > Regards, > Henry > > > &

Re: Try Beam Katas Today

2020-05-19 Thread Rion Williams
our branch? If > so, I can give them a try, and use that as a review... > Henry, any other ideas? > > On Tue, May 19, 2020 at 12:04 PM Rion Williams > wrote: > > > Hi all, > > > > I was recently added as a contributor and created a JIRA ticket related to >

Re: Try Beam Katas Today

2020-05-19 Thread Rion Williams
might need to be taken before trying to get it merged in. Any feedback would be welcome! Thanks, Rion On 2020/05/14 23:40:45, Rion Williams wrote: > +1 on the contributions front. My team and I have been working with Beam > primarily with Kotlin and I recently added the appro

Re: Try Beam Katas Today

2020-05-14 Thread Rion Williams
+1 on the contributions front. My team and I have been working with Beam primarily with Kotlin and I recently added the appropriate dependencies to Gradle and performed a bit of conversions and have it working as expected against the existing Java course. I don’t know how many others are

Re: Jacek's new Apache Beam Internals Project

2020-05-03 Thread Rion Williams
Yeah, I’m in hopes that gets fleshed out a bit more. I just opened it up and didn’t realize it was a work in progress :) > On May 3, 2020, at 5:17 PM, Rion Williams wrote: > > Thanks so much Holden! I’ve been looking for something that dives a bit > deeper into the internals

Re: Jacek's new Apache Beam Internals Project

2020-05-03 Thread Rion Williams
Thanks so much Holden! I’ve been looking for something that dives a bit deeper into the internals of Beam, so this is perfect! Just purchased - looking forward to digging into it! > On Apr 28, 2020, at 1:04 PM, Holden Karau wrote: > >  > Hi Folks, > > I just saw Jacek's tweet about his new