Hi Gopal,

Thanks for your comment. Yes, Kylin generated the query. I'm using Kylin 1.5.3, but I'm still not sure how I can fix the problem. I'm a beginner with Hive and Kylin; can the problem be fixed by just changing the Hive or Kylin settings? The total data is about 1 billion rows. I'm trying to build a cube as the base and then deal with the daily increments. Should I split the 1 billion rows into hundreds of pieces and then build the cube?

Thanks,
Minghao Feng

________________________________
From: Gopal Vijayaraghavan <go...@hortonworks.com> on behalf of Gopal Vijayaraghavan <gop...@apache.org>
Sent: Wednesday, August 17, 2016 11:10:45 AM
To: user@hive.apache.org
Subject: Re: hive throws ConcurrentModificationException when executing insert overwrite table

> This problem has blocked me for a whole week; does anybody have any ideas?

There might be a race condition here:

<https://github.com/apache/hive/blob/master/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java#L68>

aclStatus.getEntries(); is being modified without being copied (oddly, with Kerberos it might be okay).

>> >= '1970-01-01 01:00:00' AND TBL_HIS_UWIP_SCAN_PROM.START_TIME <
>> '2010-01-01 01:00:00') DISTRIBUTE BY RAND();

Did Kylin generate this query? This pattern is known to cause data loss at runtime: DISTRIBUTE BY RAND() loses data when map tasks fail, because a retried task re-evaluates rand() and can route rows differently from the failed attempt.

> at org.apache.hadoop.hdfs.DFSClient.setAcl(DFSClient.java:3242)
...
> at org.apache.hadoop.hive.io.HdfsUtils.setFullFileStatus(HdfsUtils.java:126)

> An interesting thing is that if I narrow down the 'where' clause to make the
> select query return only about 300,000 rows, the insert SQL can be
> completed successfully.

Producing exactly one file will fix the issue.

Cheers,
Gopal
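The race Gopal points at can be reproduced outside Hive: structurally modifying a `java.util.ArrayList` while iterating it trips the iterator's fail-fast check and throws ConcurrentModificationException, while iterating a defensive copy is safe. A minimal standalone sketch follows; the class name and the ACL-entry strings are illustrative stand-ins, not Hive's actual HdfsUtils code:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class AclCopyDemo {
    public static void main(String[] args) {
        // Illustrative stand-ins for the list returned by aclStatus.getEntries().
        List<String> entries = new ArrayList<>(
                List.of("user:a:rwx", "group:b:r-x", "other::r--"));

        // Removing from the live list while iterating it triggers the
        // fail-fast modification check in ArrayList's iterator.
        boolean threw = false;
        try {
            for (String e : entries) {
                if (e.startsWith("user")) {
                    entries.remove(e); // structural modification mid-iteration
                }
            }
        } catch (ConcurrentModificationException ex) {
            threw = true;
        }
        System.out.println("live iteration threw: " + threw);

        // The usual fix: iterate over a defensive copy, so mutations to the
        // original list cannot invalidate the iterator being walked.
        for (String e : new ArrayList<>(entries)) {
            entries.remove(e);
        }
        System.out.println("entries left after copy-then-iterate: " + entries.size());
    }
}
```

The same copy-before-iterate pattern (or a synchronized/immutable snapshot) is the standard way to make a shared mutable list safe to walk when another code path may mutate it.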
Thanks for your comment. Yes, Kylin generated the query. I'm using Kylin 1.5.3. But I still not sure how can I fix the problem. I'm a beginner of Hive and Kylin, Can the problem be fixed by just change the hive or kylin settings? The total data is about 1 billion lines, I'm trying to build a cube as the base and then dealing with the increment everyday. Show I separate the 1 billion lines to hundreds of pieces and then build the cube? Thanks, Minghao Feng ________________________________ From: Gopal Vijayaraghavan <go...@hortonworks.com> on behalf of Gopal Vijayaraghavan <gop...@apache.org> Sent: Wednesday, August 17, 2016 11:10:45 AM To: user@hive.apache.org Subject: Re: hive throws ConcurrentModificationException when executing insert overwrite table > This problem has blocked me a whole week, anybodies have any ideas? This might be a race condition here. <https://github.com/apache/hive/blob/master/shims/common/src/main/java/org/ apache/hadoop/hive/io/HdfsUtils.java#L68> aclStatus.getEntries(); is being modified without being copied (oddly with Kerberos, it might be okay). >> >= '1970-01-01 01:00:00' AND TBL_HIS_UWIP_SCAN_PROM.START_TIME < >>'2010-01-01 01:00:00') DISTRIBUTE BY RAND(); Did Kylin generate this query? This pattern is known to cause data loss during runtime. Distribute BY RAND() loses data when map tasks fail. > at org.apache.hadoop.hdfs.DFSClient.setAcl(DFSClient.java:3242) ... > at >org.apache.hadoop.hive.io.HdfsUtils.setFullFileStatus(HdfsUtils.java:126) > An interesting thing is that if I narrow down the 'where' to make the >select query only return about 300,000 line, the insert SQL can be >completed successfully. Producing exactly 1 file will fix the issue. Cheers, Gopal