Hi
Thanks for your answer.
---
At first I create an empty hdfs directory (if directory is empty I did
not have problems before too).
[margusja@hadoopnn1 ~]$ hdfs dfs -mkdir /user/margusja/trips
[margusja@hadoopnn1 ~]$ beeline -f create_externat_table_trips.hql -u
"jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com"
--verbose
WARNING: Use "yarn jar" to launch YARN applications.
issuing: !connect
jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com
'' [passwd stripped]
Connecting to
jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com
Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> use default;
Getting log thread is interrupted, since query is done!
No rows affected (1.225 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> drop table if exists trips;
Getting log thread is interrupted, since query is done!
No rows affected (0.159 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> CREATE EXTERNAL TABLE
`TRIPS`(
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `bike_nr` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `duration` int,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `start_date` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `start_station` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `end_station` string)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> PARTITIONED BY (
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `year` int,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `month` string)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> ROW FORMAT DELIMITED
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> FIELDS TERMINATED BY ','
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> LINES TERMINATED BY '\n'
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> STORED AS INPUTFORMAT
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>
'org.apache.hadoop.mapred.TextInputFormat'
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> OUTPUTFORMAT
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> LOCATION
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> '/user/margusja/trips';
Getting log thread is interrupted, since query is done!
No rows affected (0.067 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> desc trips;
Getting log thread is interrupted, since query is done!
+--------------------------+-----------------------+-----------------------+--+
| col_name | data_type | comment |
+--------------------------+-----------------------+-----------------------+--+
| bike_nr | string | |
| duration | int | |
| start_date | string | |
| start_station | string | |
| end_station | string | |
| year | int | |
| month | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| | NULL | NULL |
| year | int | |
| month | string | |
+--------------------------+-----------------------+-----------------------+--+
13 rows selected (0.46 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> !exit
Closing: 0:
jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.host.ee:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com
But In case I try:
CREATE EXTERNAL TABLE default.st1_test_margusja (
original STRING,
rsyslog_timestamp STRING,
rsyslog_url STRING,
rsyslog_appname STRING,
rsyslog_pos5 STRING,
ts STRING,
url STRING,
username STRING,
ip_address STRING,
log_level STRING,
content STRING
)
COMMENT 'Dealgate logs raw data'
PARTITIONED BY (year STRING, month STRING, day STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex'='([^\\s]+ ([^\\s]+) ([^\\s]+)
([^\\s]+) ([^\\s]+) [^\\s]+
[^\\s]+[\\s]{1,2}([^\\|]+)\\|([^\\|]+)\\|([^\\+]*)\\+([^\\|]*)\\|([^\\|]+)\\|(.+))'
)
STORED AS TEXTFILE
LOCATION '/datasource/test/';
Now I am starting to see loads of GC warnings in hiveserver2 log and
sometomes beeline ends: Error:
org.apache.thrift.transport.TTransportException (state=08S01,code=0)
[hdfs@hadoopnn1 ~]$ hdfs dfs -count -h /datasource/test/
53 7.2 K 8.4 G /datasource/test
And now I use hive:
[margusja@hadoopnn1 ~]$ hive
WARNING: Use "yarn jar" to launch YARN applications.
log4j:WARN No such property [maxBackupIndex] in
org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in
file:/etc/hive/2.3.4.0-3485/0/hive-log4j.properties
hive> CREATE EXTERNAL TABLE default.st1_test_margusja (
> original STRING,
> rsyslog_timestamp STRING,
> rsyslog_url STRING,
> rsyslog_appname STRING,
> rsyslog_pos5 STRING,
> ts STRING,
> url STRING,
> username STRING,
> ip_address STRING,
> log_level STRING,
> content STRING
> )
> COMMENT 'Dealgate logs raw data'
> PARTITIONED BY (year STRING, month STRING, day STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> WITH SERDEPROPERTIES ('input.regex'='([^\\s]+ ([^\\s]+)
([^\\s]+) ([^\\s]+) ([^\\s]+) [^\\s]+
[^\\s]+[\\s]{1,2}([^\\|]+)\\|([^\\|]+)\\|([^\\+]*)\\+([^\\|]*)\\|([^\\|]+)\\|(.+))'
)
> STORED AS TEXTFILE
> LOCATION '/datasource/test/';
OK
Time taken: 1.254 seconds
hive>
Any hint is welcome
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780
On 11/05/16 02:16, Mich Talebzadeh wrote:
*use default;
drop table if exists trips;
CREATE EXTERNAL TABLE `TRIPS`(
`bike_nr` string,
`duration` int,
`start_date` string,
`start_station` string,
`end_station` string)
PARTITIONED BY (
`year` int,
`month` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/test/text/trips';*