Hi all,

I am trying to use the HiveOutputModule to insert ingested data into a Hive
external table. The table has already been created with the same location as
the dt.application.<app_name>.operator.hiveOutput.prop.filePath property, and
the partition column is accessdate. With the configuration below in the
properties file, the HDFS file structure I expect is:
/common/data/test/accessCounts
|
|----- accessdate=2017-05-15
|      |------- <fil1>
|      |------- <fil2>
|
|----- accessdate=2017-05-16
       |------- <fil1>
       |------- <fil2>
but the actual structure looks like:
/common/data/test/accessCounts/<yarn_application_id_for_apex_ingest_appl>/10
|
|----- 2017-05-15
|      |------- <fil1>
|      |------- <fil2>
|
|----- 2017-05-16
       |------- <fil1>
       |------- <fil2>
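
To make the difference between the two layouts concrete, here is a minimal
sketch in plain Java. It only builds path strings and does not use any Apex or
Hive API; the class and method names are hypothetical and exist purely for
illustration. It assumes the usual Hive partition directory convention of
<column>=<value>, which is what the expected layout above follows.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Apex or Hive: it only formats path strings.
public class PartitionPathSketch {

    // Expected layout: <base>/<partitionColumn>=<partitionValue>
    public static String expectedDir(String base, String column, String value) {
        return base + "/" + column + "=" + value;
    }

    // Observed layout: <base>/<extra segments...>/<partitionValue>
    // (no "<column>=" prefix, plus directories that were never configured)
    public static String observedDir(String base, List<String> extraSegments, String value) {
        StringBuilder sb = new StringBuilder(base);
        for (String segment : extraSegments) {
            sb.append('/').append(segment);
        }
        return sb.append('/').append(value).toString();
    }

    public static void main(String[] args) {
        String base = "/common/data/test/accessCounts";
        System.out.println(expectedDir(base, "accessdate", "2017-05-15"));
        // The placeholder stands for the YARN application id seen in the actual output.
        System.out.println(observedDir(base,
                Arrays.asList("<yarn_application_id>", "10"), "2017-05-15"));
    }
}
```

The two printed paths differ in exactly the two ways the questions below ask
about: the extra directory levels, and the missing accessdate= prefix on the
partition directory.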
Questions:
1. Why are the yarn_application_id directory and the other extra directories
created when they are not specified anywhere in the config?
2. What other configuration do I need to set to get the structure I want?
HiveOutputModule Configs
==================
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.filePath</name>
  <value>/common/data/test/accessCounts</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.databaseUrl</name>
  <value><jdbc_url></value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.databaseDriver</name>
  <value>org.apache.hive.jdbc.HiveDriver</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.tablename</name>
  <value><hive table name where records needs to be inserted></value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.hivePartitionColumns</name>
  <value>{accessdate}</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.password</name>
  <value><hive connection password></value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.userName</name>
  <value><hive connection user></value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.hiveColumns</name>
  <value>{col1,col2,col3,col4}</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.hiveColumnDataTypes</name>
  <value>{STRING,STRING,STRING,STRING}</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.hivePartitionColumns</name>
  <value>{accessdate}</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.hivePartitionColumnDataTypes</name>
  <value>{STRING}</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.expressionsForHiveColumns</name>
  <value>{"getCol1()","getCol2()","getCol3()","getCol4()"}</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.expressionsForHivePartitionColumns</name>
  <value>{"getAccessdate()"}</value>
</property>