Re: Create external table

Margus Roo Tue, 10 May 2016 22:16:40 -0700

Hi

Thanks for your answer.

---

At first I create an empty hdfs directory (if directory is empty I didnot have problems before too).


[margusja@hadoopnn1 ~]$ hdfs dfs -mkdir /user/margusja/trips

[margusja@hadoopnn1 ~]$ beeline -f create_externat_table_trips.hql -u"jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/[email protected]"--verbose

WARNING: Use "yarn jar" to launch YARN applications.

issuing: !connectjdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/[email protected]'' [passwd stripped]Connecting tojdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/[email protected]

Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> use default;
Getting log thread is interrupted, since query is done!
No rows affected (1.225 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> drop table if exists trips;
Getting log thread is interrupted, since query is done!
No rows affected (0.159 seconds)

0: jdbc:hive2://hadoopnn1.example.com:2181,hado> CREATE EXTERNAL TABLE`TRIPS`(

0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `bike_nr` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `duration` int,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `start_date` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `start_station` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `end_station` string)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> PARTITIONED BY (
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `year` int,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `month` string)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> ROW FORMAT DELIMITED
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   FIELDS TERMINATED BY ','
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   LINES TERMINATED BY '\n'
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> STORED AS INPUTFORMAT

0: jdbc:hive2://hadoopnn1.example.com:2181,hado>'org.apache.hadoop.mapred.TextInputFormat'

0: jdbc:hive2://hadoopnn1.example.com:2181,hado> OUTPUTFORMAT

0: jdbc:hive2://hadoopnn1.example.com:2181,hado>'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

0: jdbc:hive2://hadoopnn1.example.com:2181,hado> LOCATION
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> '/user/margusja/trips';
Getting log thread is interrupted, since query is done!
No rows affected (0.067 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> desc trips;
Getting log thread is interrupted, since query is done!
+--------------------------+-----------------------+-----------------------+--+
|         col_name         |       data_type       | comment        |
+--------------------------+-----------------------+-----------------------+--+
| bike_nr                  | string |                       |
| duration                 | int |                       |
| start_date               | string |                       |
| start_station            | string |                       |
| end_station              | string |                       |
| year                     | int |                       |
| month                    | string |                       |
|                          | NULL                  | NULL                  |
| # Partition Information  | NULL                  | NULL                  |
| # col_name               | data_type             | comment               |
|                          | NULL                  | NULL                  |
| year                     | int |                       |
| month                    | string |                       |
+--------------------------+-----------------------+-----------------------+--+
13 rows selected (0.46 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> !exit

Closing: 0:jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.host.ee:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/[email protected]



But In case I try:

 CREATE EXTERNAL TABLE default.st1_test_margusja (
  original STRING,
  rsyslog_timestamp STRING,
  rsyslog_url STRING,
  rsyslog_appname STRING,
  rsyslog_pos5 STRING,
  ts STRING,
  url STRING,
  username STRING,
    ip_address STRING,
    log_level STRING,
    content STRING
  )
  COMMENT 'Dealgate logs raw data'
  PARTITIONED BY (year STRING, month STRING, day STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'

WITH SERDEPROPERTIES ('input.regex'='([^\\s]+ ([^\\s]+) ([^\\s]+)([^\\s]+) ([^\\s]+) [^\\s]+[^\\s]+[\\s]{1,2}([^\\|]+)\\|([^\\|]+)\\|([^\\+]*)\\+([^\\|]*)\\|([^\\|]+)\\|(.+))')

  STORED AS TEXTFILE
  LOCATION '/datasource/test/';

Now I am starting to see loads of GC warnings in hiveserver2 log andsometomes beeline ends: Error:org.apache.thrift.transport.TTransportException (state=08S01,code=0)


[hdfs@hadoopnn1 ~]$ hdfs dfs -count -h /datasource/test/
          53        7.2 K              8.4 G /datasource/test

And now I use hive:

[margusja@hadoopnn1 ~]$ hive
WARNING: Use "yarn jar" to launch YARN applications.

log4j:WARN No such property [maxBackupIndex] inorg.apache.log4j.DailyRollingFileAppender.

Logging initialized using configuration infile:/etc/hive/2.3.4.0-3485/0/hive-log4j.properties

hive>  CREATE EXTERNAL TABLE default.st1_test_margusja (
    >   original STRING,
    >   rsyslog_timestamp STRING,
    >   rsyslog_url STRING,
    >   rsyslog_appname STRING,
    >   rsyslog_pos5 STRING,
    >   ts STRING,
    >   url STRING,
    >   username STRING,
    >     ip_address STRING,
    >     log_level STRING,
    >     content STRING
    >   )
    >   COMMENT 'Dealgate logs raw data'
    >   PARTITIONED BY (year STRING, month STRING, day STRING)
    >   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'

> WITH SERDEPROPERTIES ('input.regex'='([^\\s]+ ([^\\s]+)([^\\s]+) ([^\\s]+) ([^\\s]+) [^\\s]+[^\\s]+[\\s]{1,2}([^\\|]+)\\|([^\\|]+)\\|([^\\+]*)\\+([^\\|]*)\\|([^\\|]+)\\|(.+))')

    >   STORED AS TEXTFILE
    >   LOCATION '/datasource/test/';
OK
Time taken: 1.254 seconds
hive>


Any hint is welcome



Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780

On 11/05/16 02:16, Mich Talebzadeh wrote:

*use default;
drop table if exists trips;
CREATE EXTERNAL TABLE `TRIPS`(
  `bike_nr` string,
  `duration` int,
  `start_date` string,
  `start_station` string,
  `end_station` string)
PARTITIONED BY (
  `year` int,
  `month` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/test/text/trips';*

Re: Create external table

Reply via email to