Re: how about a custom coalesce() policy?

2016-04-03 Thread Nezih Yigitbasi
Sure, here is the jira and this is the PR.

Nezih

On Sat, Apr 2, 2016 at 10:40 PM Hemant Bhanawat wrote:
> correcting email id for Nezih
>
> Hemant Bhanawat
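For readers landing on this archived thread: below is a minimal sketch of what a pluggable policy can look like against the PartitionCoalescer hook this work introduced in Spark 2.x. The signatures are recalled from Spark's developer API and should be double-checked against your version; the FixedFanInCoalescer name and its fixed fan-in policy are hypothetical illustrations, not the policy from the PR.

    import org.apache.spark.rdd.{PartitionCoalescer, PartitionGroup, RDD}

    // Hypothetical policy: pack a fixed number of parent partitions into each
    // output group. A real small-files policy would group by estimated bytes.
    class FixedFanInCoalescer(fanIn: Int) extends PartitionCoalescer with Serializable {
      require(fanIn >= 1, "fanIn must be positive")

      override def coalesce(maxPartitions: Int, parent: RDD[_]): Array[PartitionGroup] = {
        parent.partitions.grouped(fanIn).map { parents =>
          val group = new PartitionGroup()  // no locality preference
          group.partitions ++= parents
          group
        }.toArray
      }
    }

    // Usage (assumed shape of the extended RDD.coalesce):
    //   rdd.coalesce(targetPartitions, shuffle = false,
    //                partitionCoalescer = Some(new FixedFanInCoalescer(8)))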

Re: how about a custom coalesce() policy?

2016-04-02 Thread Hemant Bhanawat
correcting email id for Nezih

Hemant Bhanawat
www.snappydata.io

On Sun, Apr 3, 2016 at 11:09 AM, Hemant Bhanawat wrote:
> Hi Nezih,
>
> Can you share JIRA and PR numbers?
>
> This partial de-coupling of data partitioning strategy and Spark parallelism
> would be a useful feature for any data store.

Re: how about a custom coalesce() policy?

2016-04-02 Thread Hemant Bhanawat
Hi Nezih,

Can you share JIRA and PR numbers?

This partial de-coupling of data partitioning strategy and Spark parallelism would be a useful feature for any data store.

Hemant

Hemant Bhanawat
www.snappydata.io

On Fri, Apr 1, 2016 at 10:33

Re: how about a custom coalesce() policy?

2016-04-01 Thread Nezih Yigitbasi
Hey Reynold,

Created an issue (and a PR) for this change to get discussions started.

Thanks,
Nezih

On Fri, Feb 26, 2016 at 12:03 AM Reynold Xin wrote:
> Using the right email for Nezih
>
> On Fri, Feb 26, 2016 at 12:01 AM, Reynold Xin wrote:
>> I think this can be useful.

Re: how about a custom coalesce() policy?

2016-02-26 Thread Reynold Xin
Using the right email for Nezih

On Fri, Feb 26, 2016 at 12:01 AM, Reynold Xin wrote:
> I think this can be useful.
>
> The only thing is that we are slowly migrating to the Dataset/DataFrame API
> and leaving RDD mostly as-is as a lower-level API. Maybe we should do both?

Re: how about a custom coalesce() policy?

2016-02-26 Thread Reynold Xin
I think this can be useful.

The only thing is that we are slowly migrating to the Dataset/DataFrame API and leaving RDD mostly as-is as a lower-level API. Maybe we should do both? In either case it would be great to discuss the API on a pull request.

Cheers.

On Wed, Feb 24, 2016 at 2:08 PM, Nezih Yigitbasi wrote:
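For context, both API surfaces mentioned here already expose coalesce(), but only as a target partition count, with no hook for a custom merging policy. An illustrative sketch with Spark 1.6-era names (app name and counts are arbitrary):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("coalesce-surfaces"))
    val sqlContext = new SQLContext(sc)

    // RDD API: target count plus a shuffle flag, nothing else.
    val fewerRdd = sc.parallelize(1 to 1000, numSlices = 100).coalesce(10, shuffle = false)

    // DataFrame API: target count only.
    val fewerDf = sqlContext.range(1000).coalesce(10)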

how about a custom coalesce() policy?

2016-02-24 Thread Nezih Yigitbasi
Hi Spark devs,

I sent an email about my problem some time ago: I want to merge a large number of small files with Spark. Currently I am using Hive with the CombineHiveInputFormat, and I can control the size of the output files with the max split size parameter (which is used for
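The archive preview cuts the message off here. For readers hitting the same small-files problem, the Hive-side behaviour described above can be approximated in plain Spark; a minimal sketch, assuming Hadoop 2.x's CombineTextInputFormat, with illustrative paths and sizes:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("small-file-merge"))

    // Upper bound on each combined split, analogous to Hive's max split size.
    sc.hadoopConfiguration.setLong(
      "mapreduce.input.fileinputformat.split.maxsize", 256L * 1024 * 1024)

    // CombineTextInputFormat packs many small files into few, size-bounded
    // splits, so the merged output has correspondingly few, larger files.
    sc.newAPIHadoopFile[LongWritable, Text, CombineTextInputFormat]("hdfs:///warehouse/small_files")
      .map(_._2.toString)
      .saveAsTextFile("hdfs:///warehouse/merged")

The custom coalesce() policy proposed in this thread generalizes this idea: the grouping policy becomes a parameter of coalesce() instead of being baked into the input format.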