Re: Monitoring S3 Bucket with Spark Streaming

2016-04-09 Thread Nezih Yigitbasi
Natu, Benjamin, with this mechanism you can configure notifications for *buckets* (if you only care about certain key prefixes, take a look at object key name filtering; see the docs) for various event types, and these events can then be published to SNS, SQS, or Lambda. I think using SQS as
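For reference, a bucket notification configuration of the kind described above might look like the sketch below. The queue ARN and the `incoming/` prefix are hypothetical placeholders; the overall shape follows the S3 NotificationConfiguration API (QueueConfigurations with an event list and an optional key filter):

```json
{
  "QueueConfigurations": [
    {
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:s3-events",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "incoming/" }
          ]
        }
      }
    }
  ]
}
```

With this in place, every object created under the `incoming/` prefix produces one event message on the SQS queue.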

Re: Monitoring S3 Bucket with Spark Streaming

2016-04-09 Thread Nezih Yigitbasi
While it is doable in Spark, S3 also supports notifications: http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html

On Fri, Apr 8, 2016 at 9:15 PM Natu Lauchande wrote:
> Hi Benjamin,
>
> I have done it. The critical configuration items are the ones below
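If the notifications land on SQS, a consumer only needs to parse the S3 event JSON out of each message body. A minimal sketch, assuming the standard S3 event payload shape (a top-level `Records` list where each record carries `s3.bucket.name` and `s3.object.key`):

```python
import json

def parse_s3_events(message_body):
    """Extract (bucket, key) pairs from an S3 event notification payload.

    Assumes the standard S3 event JSON shape; messages that carry no
    "Records" list yield an empty result.
    """
    event = json.loads(message_body)
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event.get("Records", [])
    ]

# A minimal event, shaped like what S3 publishes to SQS:
sample = json.dumps({
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"},
                "object": {"key": "incoming/part-0001.gz"}}}
    ]
})
print(parse_s3_events(sample))  # [('my-bucket', 'incoming/part-0001.gz')]
```

The returned keys can then be fed to Spark as the list of new files to read in each batch.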

Re: Amazon S3 Access Error

2016-04-06 Thread Nezih Yigitbasi
Did you take a look at this jira?

On Wed, Apr 6, 2016 at 6:44 PM Joice Joy wrote:
> I am facing an S3 access error when using Spark 1.6.1 pre-built for Hadoop 2.6 or later.
> But if I use Spark 1.6.1 pre-built for
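A common cause of S3 access errors with the Hadoop 2.6+ builds is that the `hadoop-aws` module (and its AWS SDK dependency) is not on the classpath, producing errors like "No FileSystem for scheme: s3n". One hedged workaround, assuming the `s3a://` scheme and a `hadoop-aws` version matching the cluster's Hadoop build (2.7.2 here is an assumption), is to pull the module in at submit time:

```
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:2.7.2 \
  --conf spark.hadoop.fs.s3a.access.key=... \
  --conf spark.hadoop.fs.s3a.secret.key=... \
  my_job.py
```

The credential placeholders are intentionally elided; on EC2 an instance profile can supply them instead.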

Re: SparkContext.stop() takes too long to complete

2016-03-18 Thread Nezih Yigitbasi
version of Hadoop do you use?
>
> bq. Requesting to kill executor(s) 1136
>
> Can you find more information on executor 1136?
>
> Thanks
>
> On Fri, Mar 18, 2016 at 4:16 PM, Nezih Yigitbasi <nyigitb...@netflix.com.invalid> wrote:
>> Hi Spark experts

SparkContext.stop() takes too long to complete

2016-03-18 Thread Nezih Yigitbasi
Hi Spark experts, I am using Spark 1.5.2 on YARN with dynamic allocation enabled. I see in the driver/application master logs that the app is marked as SUCCEEDED and then SparkContext stop is called. However, this stop sequence takes more than 10 minutes to complete, and the YARN ResourceManager kills the
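To quantify where the shutdown time goes, one can measure the gap between two driver-log lines. A small sketch, assuming the default `yy/MM/dd HH:mm:ss` log timestamp format; the two marker strings are assumptions about the messages emitted around `SparkContext.stop()`, so adjust them to match the actual driver log:

```python
import re
from datetime import datetime

LOG_TS = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")

def stop_duration(log_lines,
                  start_marker="Invoking stop() from shutdown hook",
                  end_marker="Successfully stopped SparkContext"):
    """Return seconds between two driver-log markers, or None if either
    marker (with a parseable timestamp) is absent."""
    start = end = None
    for line in log_lines:
        m = LOG_TS.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%y/%m/%d %H:%M:%S")
        if start_marker in line and start is None:
            start = ts
        if end_marker in line:
            end = ts
    if start is not None and end is not None:
        return (end - start).total_seconds()
    return None

# Synthetic example: a stop sequence spanning 12.5 minutes.
lines = [
    "16/03/18 16:16:00 INFO SparkContext: Invoking stop() from shutdown hook",
    "16/03/18 16:28:30 INFO SparkContext: Successfully stopped SparkContext",
]
print(stop_duration(lines))  # 750.0
```

Timing the individual stop steps the same way (executor kill requests, event-log flush, and so on) narrows down which phase eats the 10+ minutes.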

Re: question about combining small parquet files

2015-11-30 Thread Nezih Yigitbasi
interesting compaction approach for small files was discussed recently:
> http://blog.cloudera.com/blog/2015/11/how-to-ingest-and-query-fast-data-with-impala-without-kudu/
>
> AFAIK Spark supports views too.
>
> --
> Ruslan Dautkhanov
>
> On Thu, Nov 26, 201

question about combining small parquet files

2015-11-26 Thread Nezih Yigitbasi
Hi Spark people, I have a Hive table that has a lot of small parquet files, and I am creating a data frame from it to do some processing. Since I have a large number of splits/files, my job creates a lot of tasks, which I don't want. Basically what I want is the same functionality that Hive
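A common way to compact in Spark itself is to pick a target output file count from the total data size and rewrite with that many partitions. A minimal sketch of the sizing arithmetic; the 128 MB target per file is an assumption (pick something near your HDFS block size):

```python
import math

def target_partitions(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """How many output files to aim for when compacting small files.

    total_bytes: combined size of the small parquet files.
    target_file_bytes: desired size per output file.
    """
    return max(1, math.ceil(total_bytes / target_file_bytes))

# e.g. 10 GB of tiny files -> 80 files of ~128 MB each
n = target_partitions(10 * 1024**3)
print(n)  # 80
```

The resulting count can then be used with Spark's `coalesce`, e.g. `df.coalesce(n).write.parquet(out_path)`, to rewrite the table into `n` larger files instead of one file per original split.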