Re: Use different S3 access key for different S3 bucket

2024-01-18 Thread Josh Mahonin via user
Oops my syntax was a bit off there, as shown in the Hadoop docs, it looks like: fs.s3a.bucket.. Josh >

Re: Use different S3 access key for different S3 bucket

2024-01-18 Thread Josh Mahonin via user
.html#Configuring_different_S3_buckets_with_Per-Bucket_Configuration I'm not certain if the 's3..' syntax will work, you may need to set the 'fs.s3a..' values directly. Josh On Thu, Jan 18, 2024 at 6:02 AM Qing Lim wrote: > Hi Jun > > > > I am indeed talking about processin
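For reference, the per-bucket pattern from the Hadoop S3A docs referenced above looks roughly like the following. `example-bucket` is a placeholder, and exact property names should be checked against the Hadoop version in use:

```properties
# Credentials used for all buckets by default
fs.s3a.access.key=DEFAULT_ACCESS_KEY
fs.s3a.secret.key=DEFAULT_SECRET_KEY

# Per-bucket override: applies only to s3a://example-bucket/
fs.s3a.bucket.example-bucket.access.key=OTHER_ACCESS_KEY
fs.s3a.bucket.example-bucket.secret.key=OTHER_SECRET_KEY
```

Any `fs.s3a.` option (not just credentials) can be overridden this way by inserting `bucket.BUCKETNAME.` after the `fs.s3a.` prefix.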

Re: Filter push-down not working for a custom BatchTableSource

2019-05-03 Thread Josh Bradt
Hi Fabian, Thanks for taking a look. I've filed this ticket: https://issues.apache.org/jira/browse/FLINK-12399 Thanks, Josh On Fri, May 3, 2019 at 3:41 AM Fabian Hueske wrote: > Hi Josh, > > The code looks good to me. > This seems to be a bug then. > It's stra

Re: Filter push-down not working for a custom BatchTableSource

2019-05-02 Thread Josh Bradt
n TableSchema.fromTypeInfo(getReturnType()); } } Thanks, Josh On Thu, May 2, 2019 at 3:42 AM Fabian Hueske wrote: > Hi Josh, > > Does your TableSource also implement ProjectableTableSource? > If yes, you need to make sure that the filter information is also > forwarded if

Filter push-down not working for a custom BatchTableSource

2019-04-30 Thread Josh Bradt
cs? Or does anyone have any tips for this? Thanks, Josh -- Josh Bradt Software Engineer 225 Franklin St, Boston, MA 02110 klaviyo.com

Get file name when reading from files? Or assign timestamps from file creation time?

2018-04-06 Thread Josh Lemer
Hey there, is it possible to somehow read the filename of elements that are read from `env.readFile`? In our case, the date of creation is encoded in the file name. Otherwise, maybe it is possible to assign timestamps somehow by the file's creation time directly? Thanks!

Bucketing Sink does not complete files, when source is from a collection

2018-04-04 Thread Josh Lemer
Hello, I was wondering if I could get some pointers on what I'm doing wrong here. I posted this on Stack Overflow, but I thought I'd also ask here. I'm trying to generate s

Re: Using Flink with Accumulo

2016-11-07 Thread Josh Elser
Oliver Swoboda wrote: Hi Josh, thank you for your quick answer! 2016-11-03 17:03 GMT+01:00 Josh Elser <els...@apache.org>: Hi Oliver, Cool stuff. I wish I knew more about Flink to make some better suggestions. Some points inline, and sorry in advance if I s

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-05 Thread Josh
s state is not a long term solution, so if anyone has any ideas I can restore the large state and investigate further. Btw sorry for going a bit off topic on this thread! On Fri, Nov 4, 2016 at 11:19 AM, Josh wrote: > Hi Scott & Stephan, > > The problem has happened a couple mor

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Josh
plication. You > can do this via -z flag > (https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/cli.html). > > Does this work? > > On Fri, Nov 4, 2016 at 3:28 PM, Josh wrote: > > Hi Ufuk, > > > > I see, but in my case the failure caused YARN app

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Josh
, Josh On Fri, Nov 4, 2016 at 1:52 PM, Ufuk Celebi wrote: > No you don't need to manually trigger a savepoint. With HA checkpoints > are persisted externally and store a pointer in ZooKeeper to recover > them after a JobManager failure. > > On Fri, Nov 4, 2016 at 2:27 PM, Josh

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Josh
the most recent checkpoint? I can use `flink run -m yarn-cluster -s s3://my-savepoints/id .` to restore from a savepoint, but what if I haven't manually taken a savepoint recently? Thanks, Josh On Fri, Nov 4, 2016 at 10:06 AM, Maximilian Michels wrote: > Hi Anchit, > > The

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-04 Thread Josh
investigate this, but let me know if you have any more ideas! Thanks, Josh On Thu, Nov 3, 2016 at 6:27 PM, Stephan Ewen wrote: > Is it possible that you have stalls in your topology? > > Reasons could be: > > - The data sink blocks or becomes slow for some periods (where are yo

Re: Using Flink with Accumulo

2016-11-03 Thread Josh Elser
Hi Oliver, Cool stuff. I wish I knew more about Flink to make some better suggestions. Some points inline, and sorry in advance if I suggest something outright wrong. Hopefully someone from the Flink side can help give context where necessary :) Oliver Swoboda wrote: Hello, I'm using Flink

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Josh
ately I can't find anything useful in the logs so not sure what happened! Josh On Thu, Nov 3, 2016 at 12:44 PM, Tzu-Li (Gordon) Tai wrote: > Hi Josh, > > That warning message was added as part of FLINK-4514. It pops out whenever > a shard iterator was used after 5 minutes it wa

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Josh
59472198219567464450,}}'}; refreshing the iterator ... Having restarted the job from my last savepoint, it's consuming the stream fine again with no problems. Do you have any idea what might be causing this, or anything I should do to investigate further? Cheers, Josh On Wed, Oct 5, 2016 a

Re: Checkpointing large RocksDB state to S3 - tips?

2016-10-25 Thread Josh
ee it crash again, and for now will just add more nodes whenever we need to speed up the checkpointing. Thanks, Josh On Tue, Oct 25, 2016 at 3:12 PM, Aljoscha Krettek wrote: > Hi Josh, > Checkpoints that take longer than the checkpoint interval should not be an > issue (if you u

Checkpointing large RocksDB state to S3 - tips?

2016-10-24 Thread Josh
te equally among the task managers? Also just wondering - is there any chance the incremental checkpoints work will be complete any time soon? Thanks, Josh

Re: Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Josh
Ah ok great, thanks! I will try upgrading sometime this week then. Cheers, Josh On Tue, Oct 11, 2016 at 5:37 PM, Stephan Ewen wrote: > Hi Josh! > > I think the master has gotten more stable with respect to that. The issue > you mentioned should be fixed. > > Another big s

Re: Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Josh
k-job-fails-to-restore-RocksDB-state-after-upgrading-to-1-2-SNAPSHOT-td9110.html Sorry to jump around but do you know if that's fixed in the latest 1.2-SNAPSHOT? Was it resolved by FLINK-4788? Thanks, Josh On Tue, Oct 11, 2016 at 4:13 PM, Stephan Ewen wrote: > Hi Josh! > > Th

Re: Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Josh
Hi Aljoscha, Yeah I'm using S3. Is this a known problem when using S3? Do you have any ideas on how to restore my job from this state, or prevent it from happening again? Thanks, Josh On Tue, Oct 11, 2016 at 1:58 PM, Aljoscha Krettek wrote: > Hi, > you are using S3 to store the

Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Josh
checkpoint (e.g. the second-last checkpoint)? The version of Flink I'm using is 1.1-SNAPSHOT, from mid-June. Thanks, Josh [*]The exception when restoring state: java.lang.Exception: Could not restore checkpointed state to operators and functions

Re: Flink job fails to restore RocksDB state after upgrading to 1.2-SNAPSHOT

2016-10-03 Thread Josh
8) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition Any ideas what's going on here? Is the Kafka consumer state management broken right now in Flink master? Thanks, Josh On Thu, Sep 22, 2016

Re: Flink job fails to restore RocksDB state after upgrading to 1.2-SNAPSHOT

2016-09-21 Thread Josh
gs working with an older version of Flink - but it would be good to know what's changed recently that's causing the restore to break and if my job is not going to be compatible with future releases of Flink. Best, Josh On Wed, Sep 21, 2016 at 8:21 PM, Stephan Ewen wrote: > Hi Josh

Flink job fails to restore RocksDB state after upgrading to 1.2-SNAPSHOT

2016-09-21 Thread Josh
than rewriting the Flink job in Java. Thanks, Josh

Re: Kinesis connector - Iterator expired exception

2016-08-26 Thread Josh
this happened around 5 times the job fully caught up to the head of the stream and started running smoothly again. Thanks for looking into this! Best, Josh On Fri, Aug 26, 2016 at 1:57 PM, Tzu-Li (Gordon) Tai wrote: > Hi Josh, > > Thank you for reporting this, I’m looking into it.

Kinesis connector - Iterator expired exception

2016-08-26 Thread Josh
can't restore the job from the existing state). Any ideas what's causing this? It's possible that it's been fixed in recent commits, as the version of the Kinesis connector I'm using is behind master - I'm not sure exactly what commit I'm using (doh!) but it was built around mid June. Thanks, Josh

Re: Reprocessing data in Flink / rebuilding Flink state

2016-08-01 Thread Josh
mmy-2") I guess it would be nice if Flink could recover from removed tasks/operators without needing to add dummy operators like this. Josh On Fri, Jul 29, 2016 at 5:46 PM, Aljoscha Krettek wrote: > Hi, > I have to try this to verify but I think the approach works if you give >

Re: Reprocessing data in Flink / rebuilding Flink state

2016-07-29 Thread Josh
arting job' - as you're rebuilding state as part of the main job when it comes across events that require historical data. Actually I think we'll need to do something very similar in the future but right now I can probably get away with something simpler! Thanks for the replies! Josh

Re: Reprocessing data in Flink / rebuilding Flink state

2016-07-29 Thread Josh
when the job first starts and there's a period of time where it's rebuilding its state from the Kafka log). Since I would have everything needed to rebuild the state persisted in a Kafka topic, I don't think I would need a second Flink job for this? Thanks, Josh On Thu, Jul 28, 20

Reprocessing data in Flink / rebuilding Flink state

2016-07-28 Thread Josh
ing like this already with Flink? If so are there any examples of how to do this replay & switchover (rebuild state by consuming from a historical log, then switch over to processing the live stream)? Thanks for any insights, Josh

Re: State in external db (dynamodb)

2016-07-25 Thread Josh
oring state in the external store with a tuple: (state, previous_state, checkpoint_id). Then when reading from the store, if checkpoint_id is in the future, read the previous_state, otherwise take the current_state. On Mon, Jul 25, 2016 at 3:20 PM, Josh wrote: > Hi Chen, > > Can you ex
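The read-side resolution described here can be sketched in plain Java. All names are illustrative, not part of any Flink or DynamoDB API; the point is only the visibility rule for an in-flight checkpoint:

```java
// Sketch of the scheme above: each row in the external store carries
// (state, previous_state, checkpoint_id). A reader that knows the ID of the
// last *completed* checkpoint can ignore a value written by a checkpoint
// that is still in flight (and might fail and be rolled back).
public class VersionedValue {
    final String state;          // value written during checkpoint `checkpointId`
    final String previousState;  // value as of the checkpoint before that
    final long checkpointId;     // checkpoint that produced `state`

    VersionedValue(String state, String previousState, long checkpointId) {
        this.state = state;
        this.previousState = previousState;
        this.checkpointId = checkpointId;
    }

    /** Value visible to a reader whose last known completed checkpoint is `lastCompleted`. */
    String read(long lastCompleted) {
        // "If checkpoint_id is in the future, read the previous_state,
        // otherwise take the current state."
        return checkpointId > lastCompleted ? previousState : state;
    }
}
```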

Re: State in external db (dynamodb)

2016-07-25 Thread Josh
ry between checkpoints. Then it would implement the Checkpointed interface and write to the external store in snapshotState(...)? Thanks, Josh On Sun, Jul 24, 2016 at 6:00 PM, Chen Qin wrote: > > > On Jul 22, 2016, at 2:54 AM, Josh wrote: > > Hi all, > > >(1) Only wri
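The buffering idea in this thread — collect records between checkpoints, flush to the external store only when state is snapshotted — can be shown in plain Java. This deliberately uses illustrative method names rather than Flink's actual Checkpointed interface, so the pattern stands on its own:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Buffer-on-invoke, flush-on-snapshot: nothing reaches the external store
// until a checkpoint is taken, so a failure between checkpoints never leaves
// half-written results behind. The map stands in for the external database.
public class BufferingSink {
    private final List<String> buffer = new ArrayList<>();
    private final Map<Integer, List<String>> store = new HashMap<>();

    // Called once per record between checkpoints; no external write yet.
    public void invoke(String record) {
        buffer.add(record);
    }

    // Called when a checkpoint is taken: flush everything buffered so far.
    public void snapshotState(int checkpointId) {
        store.put(checkpointId, new ArrayList<>(buffer));
        buffer.clear();
    }

    public List<String> flushed(int checkpointId) {
        return store.get(checkpointId);
    }
}
```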

Re: State in external db (dynamodb)

2016-07-22 Thread Josh
The operator emits a new state (and updates its in-memory cache with the new state) 3. The sink batches up all the new states and upon checkpoint flushes them to the external store Could anyone point me at the work that's already been done on this? Has it already been merged into Flink? Than

Re: Dynamic partitioning for stream output

2016-07-11 Thread Josh
Hi guys, I've been working on this feature as I needed something similar. Have a look at my issue here https://issues.apache.org/jira/browse/FLINK-4190 and changes here https://github.com/joshfg/flink/tree/flink-4190 The changes follow Kostas's suggestion in this thread. Thanks, Jos

Re: How to avoid breaking states when upgrading Flink job?

2016-07-01 Thread Josh
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290) at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) On Fri, Jul 1, 2016 at 10:21 AM, Josh wrote: >

Re: How to avoid breaking states when upgrading Flink job?

2016-07-01 Thread Josh
Thanks guys, that's very helpful info! @Aljoscha I thought I saw this exception on a job that was using the RocksDB state backend, but I'm not sure. I will do some more tests today to double check. If it's still a problem I'll try the explicit class definitions solution.

Re: Flink on YARN - how to resize a running cluster?

2016-06-30 Thread Josh
cale the job in the future if I need to? Josh On Thu, Jun 30, 2016 at 11:17 AM, Till Rohrmann wrote: > Hi Josh, > > at the moment it is not possible to dynamically increase the parallelism > of your job. The same holds true for a restarting a job from a savepoint. > But we'r

Flink on YARN - how to resize a running cluster?

2016-06-29 Thread Josh
4 task managers (not 6!) and restored my job from the last checkpoint. Can anyone point me in the right direction? Thanks, Josh

How to avoid breaking states when upgrading Flink job?

2016-06-29 Thread Josh
eed to look for a specific class, like com.me.flink.MyJob$$anon$1$$anon$7$$anon$4 ? Thanks for any advice, Josh

Re: Kinesis connector classpath issue when running Flink 1.1-SNAPSHOT on YARN

2016-06-23 Thread Josh
Hi Aljoscha, I opened an issue here https://issues.apache.org/jira/browse/FLINK-4115 and submitted a pull request. I'm not sure if my fix is the best way to resolve this, or if it's better to just remove the verification checks completely. Thanks, Josh On Thu, Jun 23, 2016 at 9:41 AM

Re: Kinesis connector classpath issue when running Flink 1.1-SNAPSHOT on YARN

2016-06-17 Thread Josh
itialisation of filesystem in the constructor commented out (not sure why this is initialised in the constructor, since it seems to get initialised later anyway) Josh On Fri, Jun 17, 2016 at 2:53 PM, Aljoscha Krettek wrote: > Hi, > I think the problem w

Re: Kinesis connector classpath issue when running Flink 1.1-SNAPSHOT on YARN

2016-06-17 Thread Josh
t the RocksDBStateBackend which is erroring with the s3 URI. I'm wondering if it could be an issue with RocksDBStateBackend? On Fri, Jun 17, 2016 at 12:09 PM, Josh wrote: > Hi Gordon/Fabian, > > Thanks for helping with this! Downgrading the Maven version I was using to > build Flink appears to h

Re: Kinesis connector classpath issue when running Flink 1.1-SNAPSHOT on YARN

2016-06-17 Thread Josh
d that error in the first place? By the way, I built Flink with: mvn clean install -Pinclude-kinesis,vendor-repos -DskipTests -Dhadoop.version=2.7.1 Josh On Fri, Jun 17, 2016 at 9:56 AM, Fabian Hueske wrote: > Hi Josh, > > I assume that you build the SNAPSHOT version yourself. I had sim

Re: Moving from single-node, Maven examples to cluster execution

2016-06-16 Thread Josh
the jar to a remote JobManager with the -m flag. Although I don't do this at the moment because it doesn't work so easily if you're running Flink on AWS/EMR. Josh On Thu, Jun 16, 2016 at 10:51 PM, Prez Cannady wrote: > Having a hard time trying to get my head around how to dep

Kinesis connector classpath issue when running Flink 1.1-SNAPSHOT on YARN

2016-06-16 Thread Josh
--- Any ideas what's going on? The job I'm deploying has httpclient 4.3.6 and httpcore 4.3.3 which I believe are the libraries with the HttpConnectionParams class. Thanks, Josh

Re: Migrating from one state backend to another

2016-06-15 Thread Josh
Hi Aljoscha, Thanks, that makes sense. I will start using RocksDB right away then. Josh On Wed, Jun 15, 2016 at 1:01 PM, Aljoscha Krettek wrote: > Hi, > right now migrating from one state backend to another is not possible. I > have it in the back of my head, however, that we should

Migrating from one state backend to another

2016-06-14 Thread Josh
d. I was just wondering if it's possible/easy to use savepoints to migrate existing state from the filesystem backend to the RocksDB backend? As I would not want to lose any job state when switching to RocksDB. If there's a way to do it then I can worry about RocksDB later. Thanks! Josh

Re: Accessing StateBackend snapshots outside of Flink

2016-06-13 Thread Josh
his approach wouldn't work, or anything to be careful of? Thanks, Josh On Mon, Apr 18, 2016 at 8:23 AM, Aljoscha Krettek wrote: > Hi, > key refers to the key extracted by your KeySelector. Right now, for every > named state (i.e. the name in the StateDescriptor) there is a an isolate

Re: Using Flink watermarks and a large window state for scheduling

2016-06-09 Thread Josh
, and does the final transformation. I just thought there might be a nicer way to do it using Flink! On Thu, Jun 9, 2016 at 2:23 PM, Aljoscha Krettek wrote: > Hi Josh, > I'll have to think a bit about that one. Once I have something I'll get > back to you. > > Best, >

Using Flink watermarks and a large window state for scheduling

2016-06-08 Thread Josh
ks to execute the scheduled transformations. If anyone has any views on how this could be done, (or whether it's even possible/a good idea to do) with Flink then it would be great to hear! Thanks, Josh

Re: ClassCastException when redeploying Flink job on running cluster

2016-06-08 Thread Josh
Thanks Till, your suggestion worked! I actually just created a new SpecificData for each AvroDeserializationSchema instance, so I think it's still just as efficient. Josh On Wed, Jun 8, 2016 at 4:41 PM, Till Rohrmann wrote: > The only thing I could think of is to not use the Spec

Re: ClassCastException when redeploying Flink job on running cluster

2016-06-08 Thread Josh
(TimestampsAndPeriodicWatermarksOperator.java:63) at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:351) ... 3 more On Wed, Jun 8, 2016 at 3:19 PM, Josh wrote: > Hi Till, > > Thanks for the reply! I see - yes it does sound very much like FLINK-1390. > >

Re: ClassCastException when redeploying Flink job on running cluster

2016-06-08 Thread Josh
].runtimeClass) Looking at SpecificData, it contains a classCache which is a map of strings to classes, similar to what's described in FLINK-1390. I'm not sure how to change my AvroDeserializationSchema to prevent this from happening though! Do you have any ideas? Josh On Wed, Jun 8, 2016 a

ClassCastException when redeploying Flink job on running cluster

2016-06-08 Thread Josh
(see below), and I'm wondering if that's causing the problem when the Flink job creates an AvroDeserializationSchema[MyAvroType]. Does anyone have any ideas? Thanks, Josh class AvroDeserializationSchema[T <: SpecificRecordBase :ClassTag] extends DeserializationSchema[T] {

Re: Submit Flink Jobs to YARN running on AWS

2016-06-06 Thread Josh
Hi Abhi, I'm also looking to deploy Flink jobs remotely to YARN, and eventually automate it - just wondering if you found a way to do it? Thanks, Josh On Wed, May 25, 2016 at 12:36 AM, Bajaj, Abhinav wrote: > Hi, > > Has anyone tried to submit a Flink Job remotely to Yarn

Re: Combining streams with static data and using REST API as a sink

2016-05-25 Thread Josh
here will be a delay in receiving updates since the updates aren't being continuously ingested by Flink. But it certainly sounds like it would be a nice feature to have! Thanks, Josh On Tue, May 24, 2016 at 1:48 PM, Aljoscha Krettek wrote: > Hi Josh, > for the first part of your ques

Re: Combining streams with static data and using REST API as a sink

2016-05-23 Thread Josh
but how would I handle occasional updates in that case, since I guess the open() function is only called once? Do I need to periodically restart the job, or periodically trigger tasks to restart and refresh their data? Ideally I would want this job to be running constantly. Josh On Mon, May 23,

Re: Combining streams with static data and using REST API as a sink

2016-05-23 Thread Josh
What's wrong with doing that update in the Flink job via an HTTP REST call (updating the customer resource), rather than writing directly to a database? The reason I'd like to do it this way is to decouple the underlying database from Flink. Josh On Mon, May 23,

Combining streams with static data and using REST API as a sink

2016-05-23 Thread Josh
haven't found much evidence that anyone else has used a REST API as a Flink sink - is there a reason why this might be a bad idea? Thanks for any advice on these, Josh

Re: External DB as sink - with processing guarantees

2016-03-12 Thread Josh
Hi Fabian, Thanks, that's very helpful. Actually most of my writes will be idempotent so I guess that means I'll get the exactly-once guarantee using the Hadoop output format! Thanks, Josh > On 12 Mar 2016, at 09:14, Fabian Hueske wrote: > > Hi Josh, > > Flink c
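The idempotence argument above is worth making concrete: under at-least-once delivery, a replayed keyed upsert leaves the store in the same end state, which is why it behaves as effectively exactly-once. The in-memory map below is a hypothetical stand-in for DynamoDB, not a connector:

```java
import java.util.HashMap;
import java.util.Map;

// An upsert keyed on a deterministic primary key is idempotent: applying it
// once or N times (e.g. after a recovery replays part of the stream) yields
// the same stored value, so duplicates are harmless.
public class IdempotentSink {
    private final Map<String, String> store = new HashMap<>();

    public void upsert(String key, String value) {
        store.put(key, value);
    }

    public String get(String key) {
        return store.get(key);
    }

    public int size() {
        return store.size();
    }
}
```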

Re: External DB as sink - with processing guarantees

2016-03-12 Thread Josh
-once? Thanks, Josh > On 12 Mar 2016, at 02:46, Nick Dimiduk wrote: > > Pretty much anything you can write to from a Hadoop MapReduce program can be > a Flink destination. Just plug in the OutputFormat and go. > > Re: output semantics, your mileage may vary. Flink should do y

External DB as sink - with processing guarantees

2016-03-11 Thread Josh
able to have Flink's processing guarantees? I.e. Can I be sure that every tuple has contributed to the DynamoDB state either at-least-once or exactly-once? Thanks for any advice, Josh