Re: Carbon over-use cluster resources

2020-04-20 Thread Manhua Jiang
services change size? (like when local-sort is done, most threads work on writing and none on input reading and converting) BTW, do you know why the configuration "carbon.number.of.cores.while.loading" was born? On 2020/04/15 13:54:50, Ajantha Bhat wrote: > Hi Manhua, >
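The property named in the question, `carbon.number.of.cores.while.loading`, caps the thread pool each data-loading task spins up. A minimal `carbon.properties` sketch (the value shown is illustrative, not necessarily the shipped default):

```properties
# carbon.properties — limit the number of threads a single
# data-loading task uses, so one task does not occupy a whole node
carbon.number.of.cores.while.loading=2
```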

Re: Carbon over-use cluster resources

2020-04-20 Thread Manhua Jiang
Hi Vishal, what you said, "1 core launching 2 threads", could be the view from the system level, right? In yarn mode, what the application gets is a vCore, so carbon should not treat that as a physical core. On 2020/04/16 16:15:23, Kumar Vishal wrote: > Hi Manhua, > In addition to what A

Carbon over-use cluster resources

2020-04-02 Thread Manhua Jiang
Hi All, Recently I found that carbon over-uses cluster resources. Generally, the design of the carbon workflow does not act like a common Spark task, which does only one small piece of work in one thread; instead the task has its own mind/logic. For example, 1. launch carbon with --num-executors=1 but set

Re: [ANNOUNCE] Kunal Kapoor as new PMC for Apache CarbonData

2020-03-31 Thread Manhua Jiang
Congratulations Kunal ! Regards, Manhua On 2020/03/30 09:31:33, Indhumathi wrote: > Congratulations Kunal! > > Regards, > Indhumathi > > > > -- > Sent from: > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ >

Re: Apply to open 'Issues' tab in Apache CarbonData github

2019-12-18 Thread Manhua Jiang
+1 Issues tab is easier to reach than JIRA too On 2019/12/19 03:06:58, "恩爸" <441586...@qq.com> wrote: > Hi community: >   I suggest community to open 'Issues' tab in carbondata github page, we can > use this feature to collect the information of carbondata users, like this: >

Re: [DISCUSSION] Page Level Bloom Filter

2019-11-26 Thread Manhua
Hi Vishal, I want to ask a question. For supporting huge binary/varchar/complex data, will the number of rows in a page be larger or smaller than 32000? Thanks. On 2019/11/26 11:49:54, Kumar Vishal wrote: > Hi Manhua, > > I agree with Ravindra and Vimal adding page le

Re: [DISCUSSION] Page Level Bloom Filter

2019-11-26 Thread Manhua
rs(Bloom > size and number of hash functions) that would work effectively for a file > level bloom filter. > > Regards, > Vimal > > On Tue, Nov 26, 2019 at 12:30 PM ravipesala wrote: > > > Hi Manhua, > > > > Main problem with this approach is we cannot

Re: [DISCUSSION] Page Level Bloom Filter

2019-11-25 Thread Manhua
do an `OR` operation on the bitmaps of pages to get the blocklet level, and similarly to get the block level. On 2019/11/26 07:14:40, ravipesala wrote: > Hi Manhua, > > Main problem with this approach is we cannot save any IO as our IO unit is > blocklet not page. Once it is already to memory I
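The roll-up described here, OR-ing page-level bloom bitmaps upward to derive blocklet-level (and then block-level) bitmaps, can be sketched in plain Java with `java.util.BitSet`. This is illustrative only; Carbon's actual bloom representation and hashing differ.

```java
import java.util.BitSet;

public class BloomRollup {
    // OR together page-level bloom bitmaps to derive a blocklet-level bitmap.
    // A bit set in any page's bitmap is set in the blocklet's bitmap, so the
    // blocklet-level filter never produces a false negative.
    static BitSet rollUp(BitSet[] pageBitmaps, int size) {
        BitSet blocklet = new BitSet(size);
        for (BitSet page : pageBitmaps) {
            blocklet.or(page);  // union of all page bitmaps
        }
        return blocklet;
    }

    public static void main(String[] args) {
        BitSet p0 = new BitSet(8); p0.set(1); p0.set(3);
        BitSet p1 = new BitSet(8); p1.set(3); p1.set(6);
        BitSet blocklet = rollUp(new BitSet[]{p0, p1}, 8);
        System.out.println(blocklet);  // {1, 3, 6}
    }
}
```

The same `or` step applied once more over blocklet bitmaps yields the block level.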

Re: [DISCUSSION] Page Level Bloom Filter

2019-11-11 Thread Manhua
To my understanding, only the IO of the *filter columns*' column pages is saved if we do this, provided that minmax/pagebloom decides we can *skip* scanning these pages. On 2019/11/12 03:37:08, Jacky Li wrote: > > > On 2019/11/05 02:30:30, Manhua Jiang wrote: >
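The skip decision rests on the no-false-negative property of a bloom filter: if the filter says the value is absent, the page definitely does not contain it and its IO can be avoided. A minimal membership sketch (the hash functions here are made up for illustration and are not Carbon's):

```java
import java.util.BitSet;

public class PageBloomSketch {
    static final int BITS = 1 << 12;  // bitmap size, power of two for cheap masking

    // Insert a value using two simple hash probes (illustrative only).
    static void add(BitSet bloom, long v) {
        bloom.set(hash1(v));
        bloom.set(hash2(v));
    }

    // false => value is definitely absent, so the page scan (and its IO) is skipped;
    // true  => value *might* be present (false positives possible), page must be read.
    static boolean mightContain(BitSet bloom, long v) {
        return bloom.get(hash1(v)) && bloom.get(hash2(v));
    }

    static int hash1(long v) { return Long.hashCode(v) & (BITS - 1); }
    static int hash2(long v) { return Long.hashCode(v * 31 + 17) & (BITS - 1); }

    public static void main(String[] args) {
        BitSet pageBloom = new BitSet(BITS);
        for (long v = 0; v < 100; v++) add(pageBloom, v);
        System.out.println(mightContain(pageBloom, 42));    // true: page must be read
        System.out.println(mightContain(pageBloom, 9999));  // likely false: page skipped
    }
}
```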

Re: [DISCUSSION] Changing default spark dependency to 2.3

2019-11-11 Thread Manhua
+1 On 2019/11/12 05:07:05, "恩爸" <441586...@qq.com> wrote: > +1, and can upgrade spark version 2.3.2 to 2.3.4. > If it's ok, i can do this. > > > > > -- Original -- > From: "Jacky Li-3 [via Apache CarbonData Dev Mailing List > archive]"; > Date: Tue, Nov 12,

Re: [DISCUSSION] Page Level Bloom Filter

2019-11-11 Thread Manhua
Hi xuchuanyin, Thanks for your inputs. Table properties will not be used while querying. We will apply the page-level bloom filter only when both of the following conditions are met: 1. the query runs into IncludeFilterExecuter for a column 2. the page bloom of that column is set/stored in the carbondata file

Re: [DISCUSSION] Page Level Bloom Filter

2019-11-04 Thread Manhua Jiang
the datachunk3 and column pages. The IO for the column page is wasted. Should we change this first? Is it worth separating one IO operation into two? Anyone interested in this part is welcome to share your ideas as well. Thanks. Manhua On 2019/11/04 09:15:35, Jacky Li wrote: > Hi Man

[DISCUSSION] Page Level Bloom Filter

2019-10-31 Thread Manhua Jiang
or suggestion. Thanks & Regards, Manhua

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-07 Thread Manhua
UDFs might have a performance problem: Spark built-in UDFs vs Spark UDFs vs Hive UDFs have some differences. On 2019/10/07 10:26:07, Ravindra Pesala wrote: > Hi Akash, > > 1. It is better to make it simple and let user provide the udf he wants in > the query. So no need to rewrite the query and

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-24 Thread Manhua
Hi Akash, Can the user specify the granularity? Such as 5 minutes, 15 minutes. Is there any constraint on the timestamp_column's datatype? Including DATE, TIMESTAMP, BIGINT (Unix timestamp). On 2019/09/23 13:42:48, Akash Nilugal wrote: > Hi Community, > > Timeseries data are simply measurements or
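The granularities asked about (5-minute or 15-minute buckets over, say, a BIGINT Unix-timestamp column) amount to flooring each timestamp to the start of its window. A small sketch of that bucketing step, independent of any CarbonData API:

```java
import java.time.Instant;

public class GranularityFloor {
    // Floor an epoch-millis timestamp to the start of its granularity bucket,
    // e.g. 5-minute or 15-minute windows as asked about in the thread.
    static long floorTo(long epochMillis, long granularityMillis) {
        return epochMillis - Math.floorMod(epochMillis, granularityMillis);
    }

    public static void main(String[] args) {
        long t = Instant.parse("2019-09-24T10:07:42Z").toEpochMilli();
        long fiveMinutes = 5 * 60 * 1000L;
        // 10:07:42 falls into the bucket starting at 10:05:00
        System.out.println(Instant.ofEpochMilli(floorTo(t, fiveMinutes)));  // 2019-09-24T10:05:00Z
    }
}
```

`Math.floorMod` keeps the result correct even for pre-1970 (negative) epoch values.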

Re: [DISCUSSION] Cache Pre Priming

2019-08-19 Thread Manhua
, you are right, firing a count(*) can do that. On 2019/08/19 09:23:06, Akash Nilugal wrote: > Hi manhua, > > Thanks for the inputs. > > 1. No need to take care separately to invalidate the cache, i agree that it > will have limit. Since we already have eviction policy, when

Re: [DISCUSSION] Cache Pre Priming

2019-08-18 Thread Manhua
Hi, I come up with the following ideas: 1. Although the index server can provide more memory to hold the cache for index data, its space still has a limit. So cache management (especially cache invalidation) needs attention if we Pre-Prime during data load or at the start of the index server, which can easily fill

Re: [1.5.2] Gzip Compression Support

2018-10-24 Thread manhua
a default is BIG_ENDIAN. - Regards Manhua
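Assuming the default in question is Java NIO's byte order: heap `ByteBuffer`s default to big-endian regardless of the platform's native order, which matters when a compressed column format expects a specific layout on disk. A quick check:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDefault {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        // Java NIO buffers default to big-endian, independent of the CPU
        System.out.println(buf.order());    // BIG_ENDIAN
        buf.putInt(1);
        System.out.println(buf.array()[3]); // 1: most-significant byte stored first
        // Switch explicitly when the on-disk format is little-endian:
        buf.order(ByteOrder.LITTLE_ENDIAN);
        System.out.println(buf.order());    // LITTLE_ENDIAN
    }
}
```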

[Discussion] Wrapped Exception

2018-10-24 Thread manhua
"catch \{[^}]*throw new" in scala to deal with exceptions. Any good ideas to decide whether there is a problem with the wrapped exception and how to fix it? For this example, shall we change the error message to "e.getMessage()" to keep the original error? - Regards Manhua -
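The fix suggested here, keeping the original error via `e.getMessage()` (or, better, chaining the cause itself), can be sketched as follows; the method names and messages are invented for illustration:

```java
public class WrapExceptionDemo {
    static void lowLevel() {
        throw new IllegalStateException("disk quota exceeded");
    }

    // Anti-pattern discussed in the thread: rethrowing with a fixed
    // message discards the root cause entirely.
    static void wrapLossy() {
        try {
            lowLevel();
        } catch (Exception e) {
            throw new RuntimeException("data load failed");  // original detail is gone
        }
    }

    // Keep the original error: include e.getMessage() and chain the cause,
    // so both the message and the full stack trace survive.
    static void wrapWithCause() {
        try {
            lowLevel();
        } catch (Exception e) {
            throw new RuntimeException("data load failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            wrapWithCause();
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());            // data load failed: disk quota exceeded
            System.out.println(e.getCause().getMessage()); // disk quota exceeded
        }
    }
}
```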

[DISCUSSION] Remove BTree related code

2018-08-23 Thread manhua
Hi All, I read the latest code of carbon and found that the BTree-related code is only used by a test class called `BTreeBlockFinderTest`. So I tried deleting that code and the tests show everything works fine. But I wonder whether to delete that code now, or does anyone think it can be used for something else?