Hi, I have created the table as you said, but the job fails with the error below:
2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1430691855979_0477_m_000000_1:
Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
        at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
        at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
        ... 8 more
Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ...
14 more

Thanks
Jay

On Fri, May 1, 2015 at 6:08 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> As Alex suggested, please use a row format in your query, like
>
> CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>
> and give it a try.
>
> On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kjayapa...@gmail.com> wrote:
>
>> 106,"2003-02-03",20,2,"A","2","2","037"
>> 106,"2003-02-03",20,3,"A","2","2","037"
>> 106,"2003-02-03",8,2,"A","2","2","037"
>>
>> Thanks
>> Jay
>>
>> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>>
>>> Jay, can you give the first 3 lines of your gz file?
>>>
>>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kjayapa...@gmail.com> wrote:
>>>
>>>> Alex,
>>>>
>>>> I followed the same steps as mentioned on the site. I created the
>>>> table below:
>>>>
>>>> CREATE TABLE raw (line STRING)
>>>> PARTITIONED BY (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
>>>> STORED AS TEXTFILE;
>>>>
>>>> and loaded it with data:
>>>>
>>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;
>>>>
>>>> When I run SELECT * FROM raw it shows all NULL values:
>>>>
>>>> NULL NULL NULL NULL NULL NULL NULL NULL
>>>> NULL NULL NULL NULL NULL NULL NULL NULL
>>>> NULL NULL NULL NULL NULL NULL NULL NULL
>>>> NULL NULL NULL NULL NULL NULL NULL NULL
>>>>
>>>> Why is it not showing the actual data in the file? Will it show once
>>>> I load it into the Parquet table?
>>>>
>>>> Please let me know if I am doing anything wrong. I appreciate your help.
>>>>
>>>> Thanks
>>>> Jay
>>>>
>>>> Thank you very much for your help, Alex.
>>>>
>>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <apivova...@gmail.com> wrote:
>>>>
>>>>> 1.
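For the NoClassDefFoundError at the top of this thread: the task JVMs cannot find au.com.bytecode.opencsv.CSVReader, the opencsv class that OpenCSVSerde loads at deserialize time, so the opencsv jar needs to be on the job classpath. One common fix (a sketch only; the jar path and the table name `dbcloc_csv` are placeholders for your environment) is to ADD JAR in the Hive session before running the query, or to drop the jar into Hive's aux-lib directory:

```sql
-- Placeholder path: point at wherever the opencsv jar actually lives
-- on your cluster. ADD JAR ships it to the MapReduce tasks so
-- OpenCSVSerde can resolve au.com.bytecode.opencsv.CSVReader.
ADD JAR /path/to/opencsv-2.3.jar;

-- Then re-run the failing query, e.g. (table name assumed):
SELECT * FROM dbcloc_csv LIMIT 3;
```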
Create an external textfile Hive table pointing to /extract/DBCLOC
>>>>> and specify the CSVSerde.
>>>>>
>>>>> If using Hive 0.14 or newer, use this:
>>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>>> If using Hive 0.13 or older, use https://github.com/ogrodnek/csv-serde
>>>>>
>>>>> You do not even need to gunzip the file. Hive automatically
>>>>> decompresses the data on SELECT.
>>>>>
>>>>> 2. Run a simple query to load the data:
>>>>>
>>>>> insert overwrite table <orc_table>
>>>>> select * from <csv_table>
>>>>>
>>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kjayapa...@gmail.com> wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> I have this table:
>>>>>>
>>>>>> CREATE TABLE DBCLOC(
>>>>>>   BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>>   BLsdat string COMMENT 'DATE Sales Date',
>>>>>>   BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>>   BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>>   BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>>   BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>>   BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>>   BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>>>>> STORED AS PARQUET;
>>>>>>
>>>>>> I have to load data from the HDFS location
>>>>>> /extract/DBCLOC/DBCL0301P.csv.gz into the table above.
>>>>>>
>>>>>> Can anyone tell me the most efficient way of doing it?
>>>>>>
>>>>>> Thanks
>>>>>> Jay
>>>
>>> --
>>> Nitin Pawar
>
> --
> Nitin Pawar
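Putting Alex's two steps together for the DBCLOC case, a sketch might look like the following. The staging table name `dbcloc_csv` and the partition values are assumptions, not from the thread. Note that OpenCSVSerde returns every column as STRING (and strips the double quotes seen in the sample rows), so casts are needed when loading into the typed Parquet table:

```sql
-- Step 1 (sketch): external staging table over the gzipped CSV;
-- Hive decompresses .gz files transparently on read (Hive 0.14+ syntax).
CREATE EXTERNAL TABLE dbcloc_csv (
  blwhse STRING, blsdat STRING, blreg_num STRING, bltrn_num STRING,
  blscnr STRING, blareq STRING, blatak STRING, blmsgc STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';

-- Step 2 (sketch): load into the partitioned Parquet table.
-- FSCAL_YEAR=2015, FSCAL_PERIOD=5 are placeholder partition values.
INSERT OVERWRITE TABLE DBCLOC PARTITION (FSCAL_YEAR=2015, FSCAL_PERIOD=5)
SELECT CAST(blwhse AS INT), blsdat, CAST(blreg_num AS SMALLINT),
       CAST(bltrn_num AS INT), blscnr, blareq, blatak, blmsgc
FROM dbcloc_csv;
```

The earlier all-NULL result is consistent with selecting from a table whose row format does not match the data; a staging table with the right SerDe (or Nitin's ROW FORMAT DELIMITED variant, if the quotes are acceptable in the values) avoids that.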