
After tweaking the configs, I found out that the
"hive.vectorized.execution.enabled" and "hive.auto.convert.join" configs
are the culprits.

I think vectorization on the map column data type is not supported in my
current Hive version. Also, the Map Join is having problems with the map data
type.

So, after setting at least one of these configurations to "false", the cross
product query runs successfully. Of course, there might be some performance
loss since we're turning off vectorization and/or Map Join.
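
For reference, a minimal sketch of the workaround described above. It assumes
the properties are overridden at the session level (for example from Beeline)
rather than in hive-site.xml, and that the cluster allows them to be changed
at runtime. Only one of the two SET statements should be needed; the query is
the failing cross product query from the quoted message below.

```sql
-- Assumption: session-level override; either setting alone was enough here.
-- Option 1: turn off vectorized execution.
SET hive.vectorized.execution.enabled=false;

-- Option 2: keep vectorization but turn off automatic Map Join conversion.
-- SET hive.auto.convert.join=false;

-- The cross product query that previously failed with
-- "Unexpected column vector type MAP".
SELECT a.*, b.* FROM test_table0 a, test_table1 b;
```

Setting the property back to "true" after the affected query should limit the
performance loss to the statements that select the map column.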

Jan Charles

On Thu, Jan 17, 2019 at 1:57 PM Jan Adona <jan.ad...@cheetahdigital.com>
wrote:

> Hi,
> Just a follow up: I think that the JOIN is not the problem here, since this
> error also occurs when I query the 2 tables even without a join, as long as
> the map column is included in the select statement.
> I'm going to rewrite the schema and queries that I sent before, because
> I mistakenly formatted the body, which is why it has random asterisks.
> *Schema:*
> CREATE TABLE test_table0(userid BIGINT, mapCol map<STRING, BIGINT>)
> COMMENT 'Test table 0';
> CREATE TABLE test_table1(userid BIGINT, col1 STRING, col2 STRING)
> COMMENT 'Test table 1';
> *Rows:*
> INSERT INTO TABLE test_table0 VALUES (1, map('a', 1, 'b', 2));
> INSERT INTO TABLE test_table0 VALUES (2, map('c', 3, 'd', 4));
> INSERT INTO TABLE test_table1 VALUES (1, 'mycol1', 'mycol2');
> *Query with a JOIN (fail):*
> SELECT a.*, b.* FROM test_table0 a INNER JOIN test_table1 b
> ON a.mapCol['a'] = b.userid;
> *Query without a JOIN (fail):*
> SELECT a.*, b.* FROM test_table0 a, test_table1 b;
> *Query without a JOIN, not including the column with the map data type
> (success):*
> SELECT a.userid, b.* FROM test_table0 a, test_table1 b;
> *Error message of the failed queries:*
> ERROR : Status: Failed
> ERROR : Vertex failed, vertexName=Map 1,
> vertexId=vertex_1546408189013_0179_7_01, diagnostics=[Task failed,
> taskId=task_1546408189013_0179_7_01_000000, diagnostics=[TaskAttempt 0
> failed, info=[Error: Error while running task ( failure ) :
> attempt_1546408189013_0179_7_01_000000_0:java.lang.RuntimeException:
> java.lang.RuntimeException: Map operator initialization failed
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected column vector type MAP
> at org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)
> at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)
> at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:572)
> at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:524)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)
> ... 17 more
> Also, I'm running this query on an HDP 3.0.1 cluster with Apache Hive
> 3.1.0.
> Thanks,
> Jan Charles
> On Wed, Jan 16, 2019 at 7:46 PM Jan Adona <jan.ad...@cheetahdigital.com>
> wrote:
>> Hi,
>> I'm trying out a JOIN query on 2 tables using a map value (which is a
>> BIGINT) on the 1st table and a BIGINT column on the 2nd table. So here is
>> the schema:
>> *CREATE TABLE test_table0(userid BIGINT, mapCol map<STRING, BIGINT>)*
>> *COMMENT 'Test table 0'*
>> *CREATE TABLE test_table1(userid BIGINT, col1 STRING, col2 STRING)*
>> *COMMENT 'Test table 1'*
>> with these rows:
>> *INSERT INTO TABLE test_table0 VALUES (1, map('a', 1, 'b', 2));*
>> *INSERT INTO TABLE test_table0 VALUES (2, map('c', 3, 'd', 4));*
>> *INSERT INTO TABLE test_table1 VALUES (1, 'mycol1', 'mycol2');*
>> and here is the query:
>> *SELECT a.*, b.* FROM test_table0 a INNER JOIN test_table1 b ON
>> a.mapCol['a'] = b.userid;*
>> which results in this error:
>> *ERROR : Status: Failed*
>> *ERROR : Vertex failed, vertexName=Map 1,
>> vertexId=vertex_1546408189013_0167_9_01, diagnostics=[Task failed,
>> taskId=task_1546408189013_0167_9_01_000000, diagnostics=[TaskAttempt 0
>> failed, info=[Error: Error while running task ( failure ) :
>> attempt_1546408189013_0167_9_01_000000_0:java.lang.RuntimeException:
>> java.lang.RuntimeException: Map operator initialization failed*
>> * at
>> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)*
>> * at
>> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)*
>> * at
>> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)*
>> * at
>> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)*
>> * at
>> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)*
>> * at java.security.AccessController.doPrivileged(Native Method)*
>> * at javax.security.auth.Subject.doAs(Subject.java:422)*
>> * at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)*
>> * at
>> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)*
>> * at
>> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)*
>> * at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)*
>> * at
>> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)*
>> * at
>> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)*
>> * at
>> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)*
>> * at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)*
>> * at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)*
>> * at java.lang.Thread.run(Thread.java:745)*
>> *Caused by: java.lang.RuntimeException: Map operator initialization
>> failed*
>> * at
>> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)*
>> * at
>> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)*
>> * ... 16 more*
>> *Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected
>> column vector type MAP*
>> * at
>> org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)*
>> * at
>> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)*
>> * at
>> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)*
>> * at
>> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)*
>> * at
>> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:572)*
>> * at
>> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:524)*
>> * at
>> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)*
>> * at
>> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)*
>> * ... 17 more*
>> But running this query is successful (I just discarded the map column
>> "mapCol"):
>> *SELECT a.userid, b.* FROM test_table0 a INNER JOIN test_table1 b ON
>> a.mapCol['a'] = b.userid;*
>> Btw, I'm running this query on an HDP 3.0.1 cluster with Apache Hive 3.1.0.
>> Thanks,
>> Jan Charles
