Or is it possible to use mandatory dimensions instead of joint/hierarchy ones? In that case, the cube won't explode as much. Please advise.
Can I use 60 mandatory dimensions and 20 hierarchy dimensions?

Regards,
Manoj

From: Kumar, Manoj H
Sent: Saturday, February 03, 2018 10:40 AM
To: 'user@kylin.apache.org' <user@kylin.apache.org>
Subject: RE: optimal parameters

Thanks for your inputs. Is there any other way to get 80+ dimensions into one cube? Can we split the cube into cubes of 20 dimensions each?

Cube 1: 20 dimensions
Cube 2: 20 dimensions

A query should then take the data from both cubes (Cube 1 + Cube 2), so that Tableau has 40 dimensions in one worksheet. Please advise.

Regards,
Manoj

From: ShaoFeng Shi [mailto:shaofeng...@apache.org]
Sent: Friday, February 02, 2018 4:09 PM
To: user <user@kylin.apache.org>
Subject: Re: optimal parameters

Hi Manoj,

450 million rows in one build is a common case for Kylin. But 80+ dimensions is too many, because by default the cube will have 2^N dimension combinations (N is the number of dimensions). I assume you have already optimized the aggregation groups, since by default Kylin only allows 2048 combinations in one cube.

If you see that the build is very slow, a possible reason is the cluster's capacity. Please try a smaller data set with a simpler cube first, and then scale up based on the performance.

2018-02-02 18:17 GMT+08:00 Kumar, Manoj H <manoj.h.ku...@jpmorgan.com>:

Any updates on this? How can we process 450 million records in one partition? The fact table has this much data for one COB.

Regards,
Manoj

From: Kumar, Manoj H
Sent: Friday, February 02, 2018 11:45 AM
To: 'user@kylin.apache.org' <user@kylin.apache.org>
Subject: optimal parameters
Importance: High

Hi Folks,

I need your inputs on optimizing the Kylin cube build process. We have approximately 450 million records in one partition and 80-90 dimensions to be picked up from the tables. Can you please advise on this? What would be the optimal way of running the jobs? We have a Cloudera cluster of 16 nodes, with 8 cores per node.
This process has been running for 60 minutes.

2018-02-01 23:54:16,257 INFO [pool-9-thread-1] threadpool.DefaultScheduler:116 : CubingJob{id=2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd, name=BUILD CUBE - Deposits - 20170929000000_20170930000000 - GMT+08:00 2018-02-02 12:37:11, state=READY} scheduled
2018-02-01 23:54:16,258 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.AbstractExecutable:111 : Executing AbstractExecutable (BUILD CUBE - Deposits - 20170929000000_20170930000000 - GMT+08:00 2018-02-02 12:37:11)
2018-02-01 23:54:16,263 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd from READY to RUNNING
2018-02-01 23:54:16,271 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.AbstractExecutable:111 : Executing AbstractExecutable (Extract Fact Table Distinct Columns)
2018-02-01 23:54:16,275 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-02 from READY to RUNNING
2018-02-01 23:54:16,358 INFO [pool-9-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 1 actual running, 0 stopped, 1 ready, 86 already succeed, 47 error, 0 discarded, 0 others
2018-02-01 23:54:16,371 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.MapReduceExecutable:115 : parameters of the MapReduceExecutable: -conf /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/conf/kylin_job_conf.xml -cubename Deposits -output hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns -segmentid da273eda-45ea-4c72-816c-709c8a61df16 -statisticsenabled true -statisticsoutput hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns/statistics -statisticssamplingpercent 100 -jobname Kylin_Fact_Distinct_Columns_Deposits_Step -cubingJobId 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd
2018-02-01 23:54:16,424 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] steps.FactDistinctColumnsJob:106 : Starting: Kylin_Fact_Distinct_Columns_Deposits_Step
2018-02-01 23:54:16,775 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:386 : Trying to connect to metastore with URI thrift://bdtpisr3n1.svr.us.jpmchase.net:9083
2018-02-01 23:54:16,784 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:431 : Opened a connection to metastore, current connections: 3
2018-02-01 23:54:16,784 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:483 : Connected to metastore.
2018-02-01 23:54:17,345 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.KylinConfigBase:162 : Kylin Config was updated with kylin.metadata.url : /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
2018-02-01 23:54:17,347 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] persistence.ResourceStore:79 : Using metadata url /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta for resource store
2018-02-01 23:54:17,354 DEBUG [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.AbstractHadoopJob:547 : Dump resources to /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta took 9 ms
2018-02-01 23:54:17,354 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.AbstractHadoopJob:505 : HDFS meta dir is: file:///apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
2018-02-01 23:54:17,470 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hdfs.DFSClient:1086 : Created token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo...@naeast.ad.JPMORGANCHASE.COM, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921 on ha-hdfs:sfpdev
2018-02-01 23:54:17,471 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] security.TokenCache:144 : Got dt for hdfs://sfpdev; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sfpdev, Ident: (token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo...@naeast.ad.jpmorganchase.com, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921)
2018-02-01 23:54:17,478 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] client.ConfiguredRMFailoverProxyProvider:100 : Failing over to rm76
2018-02-01 23:54:18,864 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapred.FileInputFormat:249 : Total input paths to process : 482
2018-02-01 23:54:19,518 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:202 : number of splits:482
2018-02-01 23:54:19,566 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:291 : Submitting tokens for job: job_1516848187601_12793
2018-02-01 23:54:19,566 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:293 : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sfpdev, Ident: (token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo...@naeast.ad.jpmorganchase.com, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921)
2018-02-01 23:54:19,821 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] impl.YarnClientImpl:260 : Submitted application application_1516848187601_12793
2018-02-01 23:54:19,825 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.Job:1311 : The url to track the job: http://bdtpisr3n2.svr.us.jpmchase.net:8088/proxy/applicatio

Please also advise on the Spark parameters.

kylin.engine.mr.reduce-input-mb=400
#kylin.engine.mr.max-reducer-number=300
kylin.engine.mr.mapper-input-rows=500000
#kylin.engine.mr.build-dict-in-reducer=true
kylin.engine.mr.uhc-reducer-count=2

#### CUBE | DICTIONARY ###
kylin.cube.algorithm=inmem
## A smaller threshold prefers layer, a larger threshold prefers in-mem
#kylin.cube.algorithm.layer-or-inmem-threshold=7
kylin.cube.aggrgroup.max-combination=61440
kylin.snapshot.max-mb=1500

kylin.engine.spark.rdd-partition-cut-mb=800
kylin.engine.spark.min-partition=1
## Max partition numbers of rdd
kylin.engine.spark.max-partition=500
kylin.engine.spark-conf.spark.yarn.queue=XXXX
kylin.engine.spark-conf.spark.executor.memory=8G
kylin.engine.spark-conf.spark.executor.cores=6
kylin.engine.spark-conf.spark.executor.instances=10
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.eventLog.dir=XXXX
kylin.engine.spark-conf.spark.history.fs.logDirectory=XXXX
kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false

Regards,
Manoj

This message is confidential and subject to terms at: http://www.jpmorgan.com/emaildisclaimer including on confidentiality, legal privilege, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.

--
Best regards,
Shaofeng Shi 史少锋
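
For the mandatory/hierarchy question at the top of this thread, the number of dimension combinations can be estimated with simple arithmetic. The sketch below is only a rough model of a single aggregation group, not Kylin's actual cuboid planner. It assumes that each normal dimension doubles the count, that mandatory dimensions add no factor because they are always present, that a hierarchy of k levels contributes k + 1 prefixes, and that each joint group behaves like one dimension. The function name cuboid_count and its parameters are made up for illustration; 61440 is the kylin.cube.aggrgroup.max-combination value from the configuration quoted in this thread.

```python
from math import prod


def cuboid_count(normal=0, mandatory=0, hierarchies=(), joints=()):
    """Estimate the dimension combinations of one aggregation group.

    normal      -- count of ordinary dimensions (each is present or absent)
    mandatory   -- count of mandatory dimensions (always present)
    hierarchies -- sizes of hierarchy chains, e.g. [3, 4]
    joints      -- sizes of joint groups, e.g. [5, 5]
    """
    if normal < 0 or mandatory < 0:
        raise ValueError("dimension counts must be non-negative")
    # Mandatory dimensions appear in every combination, so they
    # contribute a factor of 1 and are not multiplied in.
    count = 2 ** normal
    # A chain of k levels allows only its k prefixes, plus "none of them".
    count *= prod(k + 1 for k in hierarchies)
    # A joint group is either entirely in or entirely out.
    count *= 2 ** len(joints)
    return count


if __name__ == "__main__":
    limit = 61440  # kylin.cube.aggrgroup.max-combination from the thread
    scenarios = {
        "80 normal dimensions": cuboid_count(normal=80),
        "60 mandatory + one 20-level hierarchy": cuboid_count(
            mandatory=60, hierarchies=[20]
        ),
        "60 mandatory + 10 normal + two 5-level hierarchies": cuboid_count(
            normal=10, mandatory=60, hierarchies=[5, 5]
        ),
    }
    for name, count in scenarios.items():
        verdict = "within" if count <= limit else "over"
        print(f"{name}: {count} combinations ({verdict} the limit)")
```

Under these assumptions, making dimensions mandatory or grouping them into hierarchies shrinks the count dramatically compared with 2^N, but a mandatory dimension must then appear in every query that is expected to hit the cube, so this only helps if the query patterns allow it.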