carbondata partitioned by date generate many small files

2018-06-04 Thread 陈星宇
Hi CarbonData team, I am using CarbonData 1.3.1 to create a table and import data. The load generated many small files and the Spark job is very slow. I suspect the number of files is related to the number of Spark jobs, but if I decrease the number of jobs, the job fails with an out-of-memory error. See my DDL below:

carbondata partitioned by data generate many small file

2018-06-04 Thread 陈星宇
hi

Re: [Discussion] Carbon Local Dictionary Support

2018-06-04 Thread xm_zzc
Hi: +1. This is an exciting feature, hope to have it in version 1.5. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Carbon Local Dictionary Support

2018-06-04 Thread manish gupta
+1. It is a good feature to have. Once the design document is uploaded we will get a better idea of how it will be implemented. Regards Manish Gupta On Tue, Jun 5, 2018 at 11:18 AM, Kumar Vishal wrote: > Hi Xuchuanyin, > > I am working on the design document, and all the points you have mentioned I

Re: [Discussion] Carbon Local Dictionary Support

2018-06-04 Thread Kumar Vishal
Hi Xuchuanyin, I am working on the design document, and I have already captured all the points you mentioned. I will share it once it is finished. -Regards Kumar Vishal On Tue, Jun 5, 2018 at 9:22 AM, xuchuanyin wrote: > Hi, Kumar: > Local dictionary will be a nice feature and other formats like

Re: [Discussion] Carbon Local Dictionary Support

2018-06-04 Thread Ravindra Pesala
Hi Vishal, +1. Thank you for starting a discussion on it. It will be a very helpful feature to improve query performance and reduce the memory footprint. Please add the design document for the same. Regards, Ravindra. On 5 June 2018 at 09:22, xuchuanyin wrote: > Hi, Kumar: > Local dictionar

Re: [Discussion] Carbon Local Dictionary Support

2018-06-04 Thread xuchuanyin
Hi, Kumar: Local dictionary will be a nice feature, and other formats like Parquet all support this. My concern is: how will you implement this feature? 1. What's the scope of the `local`? Page level (for all containing rows), Blocklet level (for all containing pages), Block level (for all

[Discussion] Carbon Local Dictionary Support

2018-06-04 Thread Kumar Vishal
Hi Community, Currently CarbonData supports global dictionary or No-Dictionary (Plain-Text stored in LV format) for storing dimension column data. *Bottleneck with Global Dictionary* 1. As the dictionary file is a mutable file, it is not possible to support global dictionary in storage env

Re: Support updating/deleting data for stream table

2018-06-04 Thread xm_zzc
Hi: OK, I will create a parent Jira to track this issue. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Support updating/deleting data for stream table

2018-06-04 Thread Raghunandan S
Hi, Those are 2 steps in the same solution, not different solutions. We can create Jira issues covering all of it and implement only part of it for now. The parent Jira would be closed when all the child Jiras are implemented. Regards Raghu On Sun, 3 Jun 2018, 1:07 pm Liang Chen, wrote: > Hi > > +1 for first consid