his work has not been kicked off.)
>
> For the Spark cubing optimization, I uploaded the slides we presented at the
> Kylin Meetup in Shanghai; I hope they are helpful to you:
> https://www.slideshare.net/ShiShaoFeng1/spark-tunning-in-apache-kylin
>
> 2018-08-30 13:39 GMT+08:00 Sonny Heer :
-configure.html
Thanks
On Tue, Aug 28, 2018 at 8:17 PM Sonny Heer wrote:
> yeah seems that way. I did copy over the spark-defaults.conf from EMR to
> KYLIN_HOME/spark/conf
>
> e.g.
>
> spark.driver.extraClassPath
> :/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/u
our reference. As
> EMR version keeps changing, there might be other cases.
>
> Please let me know if it works. I can add this piece to the documentation
> once it is verified.
>
> 2018-08-29 6:04 GMT+08:00 Sonny Heer :
>
>> After fixing the above issue by updating spark_hom
)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
On Tue, Aug 28, 2018 at 8:11 AM Sonny Heer wrote:
> Unable
Unable to build cube at step "#6 Step Name: Build Cube with Spark"
Looks to be a classpath issue, with Spark unable to find some Amazon EMR
libs. When I look at the Spark defaults in /etc/spark/conf, I do see the
classpath being set correctly.
Any ideas?
-
Exception in thread "main" java
n.run(BlockingRpcConnection.java:334)
... 1 more
On Tue, Aug 28, 2018 at 7:59 AM Sonny Heer wrote:
> while using EMR and autoscaling during multiple cube builds.
>
> Some builds intermittently fail with the following exception and halt that
> cube build. Typically restarting th
while using EMR and autoscaling during multiple cube builds.
Some builds intermittently fail with the following exception and halt that
cube build. Typically restarting the build completes successfully.
---
5d388edae3b13f9f1d6a709720cc5378. is closing
at
org.apache.hadoop.hbase.reg
On Thu, Aug 9, 2018 at 7:12 PM ShaoFeng Shi wrote:
> Hi Sonny,
>
> We have a JDBC metadata store for Kylin (supporting MySQL/SQL Server); I think
> that can address your problem. If the community needs it, we can
> open-source it into Kylin.
>
> 2018-08-10 7:21 GMT+08:00 So
Has anyone done any work around moving Kylin metadata off of HBase?
We'd like to utilize the EMR HBase read replica option with Kylin, but Kylin
writes to HBase even from query nodes.
Thoughts?
>
> I'm afraid we don't have a video (even if there were one, it would be in
> Chinese, which I think wouldn't be helpful). Our Dockerfile hasn't been
> open-sourced yet. I will follow the progress and notify you if there is any news.
>
> On Aug 7, 2018, 11:12 PM +0800, Sonny
ShaoFeng,
Is Strikingly open to sharing their work? It appears our use case is
similar, and we would love to see which parts of their work match ours.
On Mon, Aug 6, 2018 at 7:01 AM Sonny Heer wrote:
> Does that require a HA cluster & kylin installed on its own instance? EMR
> doesn
ot tested with EMR, but I
> think they are similar.
>
>
> 2018-08-06 10:55 GMT+08:00 Sonny Heer :
>
>> Yea that would be great if Kylin can have a centralized metastore in RDS.
>>
>> The big problem for us now is this:
>>
>> 2 emr clusters each running kylin on
:
> Hi Sonny,
>
> EMR HBase read replica is a great feature, but we haven't tried it. Are you
> going to use this feature, or do you just want to deploy Kylin as a cluster?
>
> Would putting Kylin metadata into RDS make it easier for you?
>
> 2018-08-04 0:05 GMT+08:00 Sonny Heer :
>
>
e approach, but that is prone to
errors as EMR libs have to be copied around.
ref:
https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/
Anyone else have experience or can share their use case on emr?
Thanks!
On Thu, Aug 2, 2018 at 2:32 PM Sonny Heer wr
Is it possible in the new version of Kylin to have multiple EMR clusters,
each with Kylin installed on the master node but talking to the same S3
location? E.g. one write EMR cluster and one read EMR cluster.
d=713832, waitTime=60001, operationTimeout=6 expired.
at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1204)
... 22 more
On Tue, Apr 17, 2018 at 10:44 AM, Sonny Heer wrote:
> OK it does move to another RegionServer. we're doing
OK, it does move to another RegionServer. We're doing more testing, but it
appears the DN that hosts kylin_metadata goes down sometimes. Sometimes
the same job succeeds...
On Tue, Apr 17, 2018 at 10:36 AM, Sonny Heer wrote:
> Not sure if this is normal or not, but I see kylin metadata
Not sure if this is normal or not, but I see the Kylin metadata is on a single
region server (DN & RS on the same node).
If this datanode goes down, it appears Kylin isn't able to pull jobs for
monitoring or to complete jobs?
hbase requests:
kylin_metadata,,1467291011009.2cc83fc3fb51700a8a9884e5c5401e20. 553455
Yes we currently have 1.6.0 Hbase1x. Would like to test things on same
cluster with 2.0
On Fri, Apr 13, 2018 at 9:02 AM Ted Yu wrote:
> bq. Our stack is on spark 1.6
>
> Did you mean that you have Kylin 1.6 in your cluster ?
>
> Cheers
>
> On Fri, Apr 13, 2018 at 8:48
The latest version of kylin has various properties to override the prefix
for hbase table name and zookeeper locations. Our stack is on spark 1.6 -
will installing Kylin 2.0 on the same cluster as 1.6 cause any issues
(HBase metadata / tables etc.)?
Note: those prefix properties do not appear to
3 nodes are set to "query" and 1 is set to "all". kylin 1.6
kylin.server.mode=query
kylin.server.mode=all
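The question about multiple "job" instances can be checked mechanically. A minimal sketch, assuming the kylin.server.mode value of each node has been collected by hand into a list (the helper name and the list are hypothetical, not part of Kylin); it counts the nodes whose mode runs the job engine, treating "all" and "job" as the job-capable modes:

```python
# Hypothetical helper: count nodes whose kylin.server.mode runs the job engine.
JOB_CAPABLE_MODES = {"all", "job"}


def job_engine_count(modes):
    """Return how many of the given server modes include the job engine."""
    return sum(1 for mode in modes if mode in JOB_CAPABLE_MODES)


# The layout described in this message: 3 query nodes and 1 "all" node.
cluster_modes = ["query", "query", "query", "all"]
engines = job_engine_count(cluster_modes)
print(f"job engines in cluster: {engines}")
```

With the layout described here the count is one, so this check alone would not point to a second job engine.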
On Wed, Apr 11, 2018 at 10:13 PM, ShaoFeng Shi
wrote:
> Do you have multiple Kylin "job" instances?
>
> 2018-04-12 12:35 GMT+08:00 Sonny Heer
we have a daily job that builds cubes. It works fine for some number of
days, but at times fails with this:
Error Log: org.apache.kylin.job.exception.ExecuteException:
java.lang.IllegalStateException: Overwriting conflict
/execute_output/a1052507-e3bc-4302-ac73-bbc169a597ff-07, expect old TS
152
Does changing only the count distinct measure from HLL to precise make
the query slower?
With HLL some of our queries were sub-second, but after moving to precise,
the same queries are slow. Is this expected? How can we fix it?
Thanks
Any reason why fast cubing was removed in 1.6? I see the following
public enum AlgorithmEnum {
    LAYER, INMEM
}
n 14 March 2018 at 16:54, Sonny Heer wrote:
>
>> 8 YARN nodes with 11 slots each. Each slot is configured to ~2 GB. Step
>> #3 in Kylin is launching 19 mappers and 5 reducers. Only 5 reducers when
>> there are 88 slots.
>>
>> btw: kylin version is 1.6
>>
>>
8 YARN nodes with 11 slots each. Each slot is configured to ~2 GB. Step #3
in Kylin is launching 19 mappers and 5 reducers. Only 5 reducers when there
are 88 slots.
btw: kylin version is 1.6
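To put the numbers above in proportion, here is a small worked calculation. It uses only the figures from this message (8 nodes, 11 slots each, 19 mappers, 5 reducers); the variable names are illustrative:

```python
# Figures quoted in the message above.
nodes = 8
slots_per_node = 11
mappers = 19
reducers = 5

# Total container slots available across the cluster.
total_slots = nodes * slots_per_node

# Share of the cluster each phase of step #3 can occupy at once.
map_utilization = mappers / total_slots
reduce_utilization = reducers / total_slots

print(f"total slots: {total_slots}")
print(f"map phase uses {map_utilization:.1%} of slots")
print(f"reduce phase uses {reduce_utilization:.1%} of slots")
```

The map phase occupies roughly a fifth of the cluster and the reduce phase under 6%, which is the under-utilization being reported.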
On Wed, Mar 14, 2018 at 9:48 AM, Sonny Heer wrote:
> YARN is properly configured. we use many other
YARN is properly configured. We use many other M/R and Spark programs that
utilize all the slots. The problem occurs only when building cubes.
On Wed, Mar 14, 2018 at 9:46 AM, Alberto Ramón
wrote:
> You need to check your YARN configuration first
>
> On Wed, 14 Mar 2018, 14:58 Sonny Heer, wr
Step 3 isn't using our full cluster. How can I increase the number of
mappers/reducers to use all the slots? Is there any config to look at in Kylin?
Thanks
>
>> Please use vendor's forum.
>>
>> Thanks
>>
>> Original message
>> From: Sonny Heer
>> Date: 2/28/18 2:35 PM (GMT-08:00)
>> To: user@kylin.apache.org
>> Subject: Re: running spark on kylin 2.2
>>
>>
text.table(Ljava/lang/String;)Lorg/apache/spark/sql/Dataset;
at
org.apache.kylin.engine.spark.SparkCubingByLayer.execute(SparkCubingByLayer.java:167
It appears it only supports spark 2.x? Please advise what we can do to
make this work on HDP 2.4...
Thanks
On Wed, Feb 28, 2018 at 2:07 PM, Sonny H
I don't see spark-libs.jar under $KYLIN_HOME/spark/jars
per this doc: http://kylin.apache.org/docs21/tutorial/cube_spark.html
On Wed, Feb 28, 2018 at 10:30 AM, Sonny Heer wrote:
> Hi Billy
> Looks like the current error is this:
>
> Error: Could not find o
>
>
> 2018-02-28 22:53 GMT+08:00 Sonny Heer :
> > Anyone know what I need to set in order for spark-submit to use the HDP
> > version of spark and not the internal one?
> >
> > currently i see:
> >
> > export HADOOP_CONF_DIR=/ebs/kylin/hadoop-conf &
ot of ram memory available on the machine.
>
> Regards,
>
> El 12/01/2018 a las 18:10, Sonny Heer escribió:
>
> Kylin users,
>
> Is it possible to run the new Kylin 2.x alongside the old 1.6.x
> version?
>
> Thanks
>
>
> --
>
> *Roberto Tardío Olm
Kylin users,
Is it possible to run the new Kylin 2.x alongside the old 1.6.x version?
Thanks
reducers; Do all mappers/reducers take a similar
> time, or did some specific ones take much longer than others?
>
> Furthermore, for a deep dive, please provide the cube definition; we need to
> know the number of dimensions, aggregation groups, encoding methods, as well as
> other possible facto
Has anyone used Kylin in EMR and pushed data to S3 and finally down to a
persistent cluster (e.g. ec2/ambari/HDFS)?
How would Kylin map HBase tables to a Kylin project/cube?
Thanks
Can someone explain what step 3 does?
Specifically, how it relates to dimensions, measures, and row keys. Our input
fact table is about 234 million records and this step is taking forever.
We have 450 GB of memory with 25 slots per node, which is about 225
concurrently running slots, and it's still taking
he vendor
> products.
>
> 2017-12-09 5:43 GMT+08:00 Sonny Heer :
>
>> We have been using Kylin 1.6 for a while now. One of the problems we
>> continue to run into is having to maintain the HBase backend. Typically
>> regionservers go down for different reasons.
>>
>> We pr
We have been using Kylin 1.6 for a while now. One of the problems we continue
to run into is having to maintain the HBase backend. Typically regionservers
go down for different reasons.
We prefer to move to a columnar storage backend. I heard there was a
version of kylin that replaced HBase? Any updates on
We have a table in hive which has a gender column (char(1)). The group by
shows the following:
M 8946041
8 9
F 14215364
215400
Kylin shows:
10 GENDER char(1) 274693
Looking at the HiveColumnCardinalityJob code I don't see anything obviously
wrong. Any idea why that value is wrong in th
Any updates on this issue? Has it been fixed in later versions? We are on
1.6
On Fri, Sep 22, 2017 at 5:34 PM, Li Yang wrote:
> The JIRA is good. Thanks Sonny!
>
> On Tue, Sep 19, 2017 at 8:46 AM, Sonny Heer wrote:
>
>> Here is the JIRA: https://issues.apache.org/jira
have system wide impact.
>
> Once KYLIN-2717 <https://issues.apache.org/jira/browse/KYLIN-2717> is
> done, tables are isolated by project, we will be ready to grant table
> permissions to project level admin.
>
> On Sun, Sep 17, 2017 at 6:23 AM, Sonny Heer wrote:
>
>>
Kylin version is 1.6
Is there a way to give full access to a project? Currently we are able to
give access to a project via a ROLE in LDAP, but that doesn't allow the user
to sync/load Hive tables (the blue buttons are missing). They are also unable to edit the
model. In order to give that permission we have to
What's the best way to move a Kylin cube from environment1 to environment2?
E.g. Hbase1 -> Hbase2. I know HBase has tools for the HBase part; I'm more
interested in Kylin. Also, I'm not talking about the cube desc, etc. I understand
Kylin has the REST call for loading the cube desc. Looking to avoid building
the c
deed.
>
> On Thu, Jun 22, 2017 at 10:49 PM, Sonny Heer wrote:
>
>> Hi users,
>>
>> I need some clarification on how to properly use aggregation groups.
>>
>> Assume I have report page 1 which has filters A, B, C, D. When user is
>> in page 2, these filter
nct
>> measure. then it will fall into the second for loop and it is a column
>> that needs a dictionary because its bitmap (getcolumnsNeedDictionary
>> method).
>>
>>
>> It is still running out of java heap. where is this actually running?
>> is it on the
12 PM, Sonny Heer wrote:
> If it is a fixed-length dimension and also a count distinct measure, and the
> Hive type is bigint, should it be building a dictionary or not?
>
> On Fri, Jun 23, 2017 at 11:08 AM, Sonny Heer wrote:
>
>> Another question. Is there any way to set p
If it is a fixed-length dimension and also a count distinct measure, and the
Hive type is bigint, should it be building a dictionary or not?
On Fri, Jun 23, 2017 at 11:08 AM, Sonny Heer wrote:
> Another question. Is there any way to set properties per step in cube
> building?
>
>
Another question. Is there any way to set properties per step in cube
building?
On Fri, Jun 23, 2017 at 6:56 AM, Sonny Heer wrote:
> Yeah...it is. Making it fixed length doesn't require a dict. I thought it
> was int in Hive, but yeah, it's bigint. It got past that, but is now stuck
tail metadata for troubleshooting; that is important for
> analysis. Otherwise we can only guess, and there are many possibilities
> that could cause a problem...
>
> 2017-06-23 14:31 GMT+08:00 Sonny Heer :
>
>> It's a dimension and count distinct measure. No GD
>>
>>
It's a dimension and count distinct measure. No GD
On Thu, Jun 22, 2017 at 11:27 PM ShaoFeng Shi
wrote:
> Does the "USER_ID" column appear in other measures?
>
> 2017-06-23 13:57 GMT+08:00 Sonny Heer :
>
>> It is set to this:
>>
>
"last_modified_time":1498183660303},"dictionary_class":null,"cardinality":0}
On Thu, Jun 22, 2017 at 10:47 PM, ShaoFeng Shi
wrote:
> Seems Kylin is still trying to build a dictionary for the UHC dimension. Could
> you double check the dimension encoding
"JSON(Cube)" tab.
>
> 2017-06-23 8:48 GMT+08:00 Sonny Heer :
>
>> The column has a count distinct measure as well. So it still doesn't need
>> GD? I tried, but it appears it ran out of memory.
>>
>> On Thu, Jun 22, 2017 at 5:36 PM, ShaoFeng Shi
>>
" as the encoding in the dimension,
> and leave blank for the global dictionary.
>
> 2017-06-23 6:30 GMT+08:00 Sonny Heer :
>
>> Thanks ShaoFeng.
>>
>> So to clarify: for the UHC dimension, it is an integer. So I can set encoding
>> to integer and then also include it
ct size will go beyond the Java heap size. In this case, please use
> fixed_length encoding; if that column is an integer or long type, you can use
> "integer" encoding. In the meantime, keep using GD for the count distinct
> measure.
>
> 2017-06-22 13:37 GMT+08:00 Sonny Heer :
>
Hi users,
I need some clarification on how to properly use aggregation groups.
Assume I have report page 1 which has filters A, B, C, D. When the user is on
page 2, these filters are passed along (drilldown). Page 2 has other
filterable fields (1,2,3), but each is independently connected only to
how best kylin can handle
this? Should I remove it as GD and add it as a dimension with fixed-length encoding?
On Wed, Jun 21, 2017 at 10:33 PM, Sonny Heer wrote:
> Hi,
>
> No, not as a dimension. Only for Count distinct measures.
>
>
> On Wed, Jun 21, 2017 at 10:25 PM, ShaoFeng Shi
> wrot
ionary, as it can only encode a
> String to an integer; it doesn't support decoding the String from an integer.
> The main use of GlobalDictionary is precise Count Distinct: the
> bitmap only accepts integers as input, so Kylin uses the GD to do the
> conversion.
>
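The one-way behaviour described in the reply above can be illustrated with a toy model. This is a sketch under stated assumptions, not Kylin's AppendTrieDictionary: the class and function names are hypothetical, a plain dict stands in for the trie, and a Python integer stands in for the bitmap. It shows why an encode-only dictionary is sufficient for precise count distinct but cannot serve a dimension, which needs the original value back at query time:

```python
class AppendOnlyDictionary:
    """Toy stand-in for a global dictionary: maps string -> int, never back."""

    def __init__(self):
        self._ids = {}

    def encode(self, value: str) -> int:
        # Assign the next free integer id the first time a value is seen.
        if value not in self._ids:
            self._ids[value] = len(self._ids)
        return self._ids[value]

    # Deliberately no decode(): the model only goes from string to integer.


def precise_count_distinct(values, dictionary):
    """Set one bit per encoded id, then count the bits that are set."""
    bitmap = 0
    for value in values:
        bitmap |= 1 << dictionary.encode(value)
    return bin(bitmap).count("1")


gd = AppendOnlyDictionary()
user_ids = ["alice", "bob", "alice", "carol", "bob"]
distinct_users = precise_count_distinct(user_ids, gd)
print(f"distinct users: {distinct_users}")
```

Counting only ever needs the integer ids, whereas returning a dimension value in a query result would require the missing reverse lookup.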
> 2017
After finally getting the global dictionary to work with building the cube
there are now exceptions during query.
ERROR in query:
"AppendTrieDictionary can't retrive value from id"
Here is where it ends up in the code:
@Override
final protected T getValueFromIdImpl(int id) {
:26 AM, ShaoFeng Shi wrote:
> Hi Sonny, I need more info:
> 1) Where do you see this error trace: in kylin.log or in the MapReduce log?
> 2) what's the configuration of "kylin.hdfs.working.dir" in
> conf/kylin.properties?
>
> 2017-06-05 23:58 GMT+08:00 Sonny Heer
is used when running Kylin. You can check whether
> your *-site.xml files are on the classpath and near the front.
>
> 2017-06-05 22:18 GMT+08:00 Sonny Heer :
>
>> okay - checking. Is he running on HDP 2.4 and 1.6.0 version of kylin?
>> Where does kylin add core-site to CP? in
t
> find problem. At Meituan.com, there are many Cubes using the
> GlobalDictionary to implement precise distinct count, and they run well. I
> still suggest you check the environment configurations.
>
> 2017-06-05 21:31 GMT+08:00 Sonny Heer :
>
>> Does KYLIN-2192 fix this? An
Does KYLIN-2192 fix this? Anyone run into this?
On Sun, Jun 4, 2017 at 10:33 PM, Sonny Heer wrote:
> Looks like a new hadoop conf is initialized and the hadoop FileSystem
> object is used after that:
>
> Configuration conf = new Configuration();
>
> (FileSystem.get(file
overs/adds core-site to the classpath.
I'm a little surprised no one has run into this yet... Am I missing
something?
On Sun, Jun 4, 2017 at 10:28 PM, ShaoFeng Shi
wrote:
> @kangkaisen, kaisen, any idea about this error?
>
> 2017-06-05 13:00 GMT+08:00 Sonny Heer :
>
>> wher
of HDFS. You need to
> check whether the proper core-site.xml (with HDFS as the default file system)
> is used by kylin. Or you can upgrade to kylin 2.0 to see whether it works.
>
> 2017-06-05 11:25 GMT+08:00 Sonny Heer :
>
>> Any ideas on this? Not sure ,but appears the dictionary i
Any ideas on this? Not sure, but it appears the dictionary is looking on the
local FS instead of HDFS? What am I missing here?
On Sat, Jun 3, 2017 at 9:05 PM, Sonny Heer wrote:
> Kylin version 1.6.0
>
> Our data has High cardinality columns that require count distinct
> measures. The
Kylin version 1.6.0
Our data has high-cardinality columns that require count distinct
measures, so we are using GlobalDictionary. The .index file exists on
HDFS, but kylin errors out with FileNotFound exception (see below). RowKey
is set to "dict". Any ideas if this is a known issue or somethi
Kylin users,
We use Hive views to feed Kylin. This worked fine until we added more
tables to the Hive join. Now Kylin never finishes Step #1. I
understand a really large view will take time, but which properties should
I look at in order to provide more resources for this step?
--
tical
queries) with more coverage (drill).(e.g. cases of no realization found
or cross cube join)...
On Thu, Mar 16, 2017 at 12:20 PM, Billy Liu wrote:
> Hi Heer,
>
> May I ask for more details on this proposal: the benefits or use case?
>
> 2017-03-16 7:58 GMT-07:00 Sonny Heer :
>
Is there any support for Apache Drill within Kylin? Any tickets for work
done around this yet?
Thanks
o update the existing cube because the underlying data
> changed, please click "Refresh" on the Cube.
>
> 2017-03-15 13:23 GMT-07:00 Sonny Heer :
>
>> Thanks ShaoFeng,
>>
>> Views are what we are using now. How does Kylin handle
>> new/updated/deleted records?
wrote:
> View might be slower, but it is flexible. This is a tradeoff.
>
> 2017-03-14 4:15 GMT+08:00 Sonny Heer :
>
>> How about Hive view fed into kylin vs materialized table...performance
>> impact?
>>
>> On Fri, Mar 10, 2017 at 1:34 AM, ShaoFeng Shi
>> wrot
How about Hive view fed into kylin vs materialized table...performance
impact?
On Fri, Mar 10, 2017 at 1:34 AM, ShaoFeng Shi
wrote:
> One Cube is one topic; all activities like build and ACL are managed by
> cube. One query should only hit one cube; cross-cube queries aren't
> recommended.
>
-
2.x, it supports multiple Fact tables in one Cube; you don't
> need to create an additional view or flat table, just use the original names.
>
> 2017-03-09 9:45 GMT+08:00 Sonny Heer :
>
>> to clarify in my use case the data can be organized to either have a
>> couple fact ta
To clarify: in my use case, the data can be organized to have either a couple
of fact tables or a single large one. Queries are open-ended at this point;
they may or may not cross facts.
On Wed, Mar 8, 2017 at 5:13 PM, Sonny Heer wrote:
> Let me put it another way. Assume a SALES table an
from
Kylin's perspective, which is better: having one-to-many in a single table or
some normalized form?
On Wed, Mar 8, 2017 at 4:24 PM, Billy Liu wrote:
> Please check the star schema first: https://en.wikipedia.org/wiki/Star_schema
>
> 2017-03-08 12:48 GMT-08:00 Sonny Heer :
>
>> Hi I
Hi, I'm somewhat new to Kylin. We have a relational DB schema imported into
Hive as-is at the moment. The schema is highly normalized with lots of
tables. I can see this database having multiple fact tables or a handful
of fact tables.
In Kylin I see when creating a model (star) you have the opt