I have some side inputs that I would like to add to my pipeline. Some of
them are based on a file pattern, so I found that I can collect the
contents of those files using a pattern like the following:
val genotypes =
  p.apply(FileIO.`match`.filepattern(opts.getGenotypesFilePattern()))
    .ap
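The snippet above is cut off in the archive. For reference, here is a minimal Java sketch of the same FileIO pattern, reading each matched file whole and exposing the result as a side input; the readMatches/View.asList steps and the variable names are my assumptions, not the original code:

// Assumed imports: java.io.IOException, java.util.List,
//   org.apache.beam.sdk.io.FileIO, org.apache.beam.sdk.transforms.{DoFn, ParDo, View},
//   org.apache.beam.sdk.values.{KV, PCollection, PCollectionView}
PCollection<KV<String, String>> genotypes =
    p.apply(FileIO.match().filepattern(opts.getGenotypesFilePattern()))
     .apply(FileIO.readMatches())
     .apply(ParDo.of(new DoFn<FileIO.ReadableFile, KV<String, String>>() {
       @ProcessElement
       public void processElement(ProcessContext c) throws IOException {
         FileIO.ReadableFile f = c.element();
         // key each record by file name; the value is the whole file's contents
         c.output(KV.of(f.getMetadata().resourceId().toString(),
             f.readFullyAsUTF8String()));
       }
     }));

// Materialize as a side input, to be attached to a later ParDo via withSideInputs(...)
PCollectionView<List<KV<String, String>>> genotypesView = genotypes.apply(View.asList());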
> ...ype: text/csv) to
> the output files and, as a consequence, GZIP files were automatically
> decompressed when downloading them (as explained in the previous link).
>
> Best,
>
>
> On Fri, Oct 12, 2018 at 23:40, Randal Moore wrote:
>
>> Using Beam Java SDK 2.6.
>>
Using Beam Java SDK 2.6.
I have a batch pipeline that has run successfully in its current form several
times. Suddenly I am getting strange errors complaining about the format of
the input. As far as I know, the pipeline didn't change at all since the
last successful run. The error:
java.util.zip.ZipException
I cannot find a way to control which namespace I'm writing to when saving
to Datastore from a Beam/Dataflow job. I am using
org.apache.beam.sdk.io.gcp.datastore.DatastoreV1.
I can find a way to control the namespace on the reader but not on the writer.
Am I missing something obvious?
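Not an authoritative answer, but the workaround I'm aware of is that the Datastore writer takes the namespace from each entity's key, so you set the PartitionId when building the key rather than on the Write transform itself. A hedged Java sketch (the kind, property, project, namespace, and the MyRecord element type are all placeholders of mine):

// Assumed imports: com.google.datastore.v1.{Entity, Key, PartitionId},
//   com.google.datastore.v1.client.DatastoreHelper,
//   org.apache.beam.sdk.io.gcp.datastore.DatastoreIO,
//   org.apache.beam.sdk.transforms.{DoFn, ParDo}
PCollection<Entity> entities = records.apply(ParDo.of(new DoFn<MyRecord, Entity>() {
  @ProcessElement
  public void processElement(ProcessContext c) {
    MyRecord rec = c.element();
    Key key = DatastoreHelper.makeKey("MyKind", rec.getId())
        .setPartitionId(PartitionId.newBuilder()
            .setProjectId("my-project")        // placeholder project
            .setNamespaceId("my-namespace")    // the namespace to write into
            .build())
        .build();
    c.output(Entity.newBuilder()
        .setKey(key)
        .putProperties("payload", DatastoreHelper.makeValue(rec.getPayload()).build())
        .build());
  }
}));

entities.apply(DatastoreIO.v1().write().withProjectId("my-project"));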
ed client.
>
> MonitoringUtil.toState() converts that string to the set of enums you're
> familiar with:
>
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/MonitoringUtil.java#L222
>
> On Fri, Feb 2, 2018 at 8:40 AM,
I'm using Dataflow and found what seems to me to be a "usage problem" with the
available APIs.
When I submit a job to the Dataflow runner, I get back a
DataflowPipelineJob (which implements PipelineResult), and it provides me the
status of the job as an enumerated type. But if I use a DataflowClient to
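The message is cut off here, but to make the mismatch concrete, a hedged sketch of the round trip being discussed: DataflowClient hands back the job state as a raw string, and MonitoringUtil.toState() maps it back onto the PipelineResult.State enum (the job id is a placeholder):

// Assumed imports: org.apache.beam.runners.dataflow.DataflowClient,
//   org.apache.beam.runners.dataflow.options.DataflowPipelineOptions,
//   org.apache.beam.runners.dataflow.util.MonitoringUtil,
//   org.apache.beam.sdk.PipelineResult,
//   org.apache.beam.sdk.options.PipelineOptionsFactory,
//   com.google.api.services.dataflow.model.Job
DataflowPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
DataflowClient client = DataflowClient.create(options);

Job job = client.getJob("<job-id>");                  // getJob throws IOException
String rawState = job.getCurrentState();              // e.g. "JOB_STATE_RUNNING"
PipelineResult.State state = MonitoringUtil.toState(rawState);  // back to the enum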
+1
On Tue, Oct 17, 2017 at 5:21 PM Raghu Angadi wrote:
> +1.
>
> On Tue, Oct 17, 2017 at 2:11 PM, David McNeill
> wrote:
>
>> The final version of Beam that supports Java 7 should be clearly stated
>> in the docs, so those stuck on old production infrastructure for other Java
>> app dependencies
I have a batch pipeline that runs well with small inputs but fails with a
larger dataset.
Looking at Stackdriver, I find a fair number of the following:
Request failed with code 400, will NOT retry:
https://dataflow.googleapis.com/v1b3/projects/cgs-nonprod/locations/us-central1/jobs/2017-08-03_13_0
master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java#L441
> [3] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>
> On Sun, Jul 9, 2017 at 2:54 PM, Randal Moore wrote:
>
>> Is this part of the Beam API or
Is this part of the Beam API, or something I should look at the Google docs
for? Assume a job is running in Dataflow - how can an interested
third-party app query the status if it knows the job id?
rdm
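One way an external application (with no Beam classes on its classpath) could do this is with the Dataflow REST client library, com.google.apis:google-api-services-dataflow. This is a hedged sketch under that assumption; the project, region, and job id are placeholders, and checked exceptions are omitted:

// Assumed imports: java.util.Collections,
//   com.google.api.client.googleapis.auth.oauth2.GoogleCredential,
//   com.google.api.client.googleapis.javanet.GoogleNetHttpTransport,
//   com.google.api.client.json.jackson2.JacksonFactory,
//   com.google.api.services.dataflow.Dataflow,
//   com.google.api.services.dataflow.model.Job
GoogleCredential credential = GoogleCredential.getApplicationDefault()
    .createScoped(Collections.singleton("https://www.googleapis.com/auth/cloud-platform"));

Dataflow dataflow = new Dataflow.Builder(
        GoogleNetHttpTransport.newTrustedTransport(),
        JacksonFactory.getDefaultInstance(),
        credential)
    .setApplicationName("job-status-poller")   // arbitrary name
    .build();

Job job = dataflow.projects().locations().jobs()
    .get("my-project", "us-central1", "<job-id>")
    .execute();

System.out.println(job.getCurrentState());      // e.g. "JOB_STATE_DONE"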
Maybe this is more of a question for Dataflow - but I'm submitting a
pipeline that needs to access a REST service running in a GKE (Kubernetes)
cluster. I need to pass in creds. I started with pipeline options, which
work, but all options get exposed on the Dataflow web pages.
Is there a way to pass
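For context, a minimal sketch of the pipeline-options approach described above (the interface and option names are mine); note that anything passed this way shows up under the job's pipeline options in the Dataflow UI, which is exactly the exposure being asked about:

// Assumed imports: org.apache.beam.sdk.options.{Description, PipelineOptions, PipelineOptionsFactory}
public interface RestServiceOptions extends PipelineOptions {
  @Description("Base URL of the REST service running in GKE")
  String getRestServiceUrl();
  void setRestServiceUrl(String value);

  @Description("Credential/token used to call the REST service")
  String getRestServiceToken();
  void setRestServiceToken(String value);
}

// At submission time:
RestServiceOptions options =
    PipelineOptionsFactory.fromArgs(args).withValidation().as(RestServiceOptions.class);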
> ... Keeping Y small will improve caching, larger
> Y helps with hot keys.
>
> On Fri, Jul 7, 2017 at 8:26 AM, Randal Moore wrote:
>
>> Sorry for being confusing - I am still grasping at the correct semantics
>> to use to refer to some of the things. I think that made a mess of
e index but it would be 100 times larger?
> ** A map-based side input whose values are 4 bytes vs 400 bytes
> isn't going to change much in lookup cost
>
>
>
> On Wed, Jul 5, 2017 at 6:22 PM, Randal Moore wrote:
>
>> Based on my understanding so far, I
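For reference on the "map based side input" mentioned above, a minimal Java sketch of that shape (all names are mine): each element of the main input does a keyed lookup into the side-input map inside the DoFn.

// Assumed imports: java.util.Map, org.apache.beam.sdk.transforms.{DoFn, ParDo, View},
//   org.apache.beam.sdk.values.{KV, PCollection, PCollectionView}
PCollectionView<Map<String, String>> lookupView =
    lookupKvs.apply(View.asMap());   // lookupKvs: PCollection<KV<String, String>>

PCollection<String> joined = mainInput.apply(
    ParDo.of(new DoFn<String, String>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        Map<String, String> lookup = c.sideInput(lookupView);
        String value = lookup.get(c.element());   // per-element lookup in the side input
        c.output(c.element() + "," + (value == null ? "" : value));
      }
    }).withSideInputs(lookupView));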
>> ... can contain methods marked with @Setup/@Teardown which only get invoked
>> once per DoFn instance (which is relatively infrequently) and you could
>> store an instance per DoFn instead of a singleton if the REST library was
>> not thread safe.
>>
>> On Wed, Jul 5, 2
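To illustrate the @Setup/@Teardown pattern described above, a hedged sketch; "RestClient" stands in for whatever non-serializable client library is actually being used:

// Assumed import: org.apache.beam.sdk.transforms.DoFn
class CallRestServiceFn extends DoFn<String, String> {
  // transient: the client is created per DoFn instance and never serialized with it
  private transient RestClient client;   // RestClient is a hypothetical, non-serializable client

  @Setup
  public void setup() {
    // invoked once per DoFn instance, so a relatively expensive client is fine here
    client = new RestClient("https://example.internal/api");
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(client.fetch(c.element()));   // hypothetical per-element call
  }

  @Teardown
  public void teardown() {
    if (client != null) {
      client.close();
    }
  }
}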
I have a step in my Beam pipeline that needs some data from a REST service.
The data acquired from the REST service depends on the context of the
data being processed and is relatively large. The REST client I am using isn't
serializable - nor is it likely possible to make it so (background threads
Just started looking at Beam this week as a candidate for executing some
fairly CPU-intensive work. I am curious if the stream-oriented features of
Beam are a match for my use case. My user will submit a large number of
computations to the system (as a "job"). Those computations can be
expressed