Hi,

I was experimenting with external tables in Hive and got confused, because the concept of an external table as explained in the documentation does not match what I see in practice.

Hive version: 0.7

    CREATE EXTERNAL TABLE IF NOT EXISTS learn.crime_external_native (
      Orig_State String,
      TypeofCrime String,
      Crime String,
      Year int,
      Count int)
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n'
    STORED AS TEXTFILE
    LOCATION 'hdfs://localhost:8020/user/srcdata';

    CREATE EXTERNAL TABLE IF NOT EXISTS learn.crime_external_native_1
    LIKE learn.crime_external_native
    LOCATION '/user/crime_external_native_1';

    LOAD DATA INPATH '/user/CrimeHDFS2.csv'
    INTO TABLE learn.crime_external_native_1;

The LOAD statement gives this error:

    "Path is not legal '/user/CrimeHDFS2.csv': Move from hdfs://0.0.0.0/user/CrimeHDFS2.csv to hdfs://localhost:8020/user/crime_external_native_1 is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehosue.dir" do not conflict"

What am I doing wrong here?

In contrast, loading the data from a local file into the same external table works:

    LOAD DATA LOCAL INPATH '/home/cloudera/CrimeHDFS2.csv'
    INTO TABLE learn.crime_external_native_1;

From the above I am making the following assumptions. Are they correct?

* When creating an EXTERNAL table in Hive, you have to specify a directory on HDFS (not a data file name) that contains the source data files.
* The ROW FORMAT specified must match the data files in the external table's directory, so that when a new file is added to that directory you can query it directly, without using the LOAD command.
* When you create an external table and load data into it from a local file, Hive copies the file to the external table's location, and when you drop the table it removes the directory and the data file (which I feel contradicts the external table concept).

Am I correct?

Thanks,
Kuldeep
