Re: Best way to load CSV file into Hive

2015-11-01 Thread Furcy Pin
Hi Vijaya, If you need some nice ETL capabilities, you may want to try https://github.com/databricks/spark-csv. Among other things, spark-csv lets you read the CSV as is and create and insert a copy of the data into a Hive table in any format you like (Parquet, ORC, etc.). If you have a header ro

Re: Best way to load CSV file into Hive

2015-10-31 Thread Jörn Franke
You clearly need to escape those characters, as with any other tool. You may want to use Avro instead of CSV, XML, JSON, etc. > On 30 Oct 2015, at 19:16, Vijaya Narayana Reddy Bhoomi Reddy > wrote: > > Hi, > > I have a CSV file which contains a hundred thousand rows and about 200+ > columns. So
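
To illustrate the escaping advice above, here is a minimal sketch using Python's standard csv module. It is not something the posters proposed; the function names write_escaped_csv and read_escaped_csv are made up for this example. It quotes every field and doubles embedded quotes, so free-text columns containing commas, colons, or quotes survive a round trip.

```python
import csv
import io


def write_escaped_csv(rows):
    """Serialize rows to CSV text, quoting every field.

    Embedded double quotes are doubled by the csv module, so commas,
    quotes and line breaks inside a field cannot be mistaken for
    delimiters.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def read_escaped_csv(text):
    """Parse CSV text produced by write_escaped_csv back into rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical sample row with a free-text column.
sample = [["1", 'He said "hi", then left: done', "ok"]]
escaped = write_escaped_csv(sample)
```

Whatever reads the file on the Hive side has to understand the same quoting convention; plain delimited text tables split on every comma regardless of quotes.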

Re: Best way to load CSV file into Hive

2015-10-31 Thread Peyman Mohajerian
If you find a way to escape the characters, say in a pre-processing step, then you may find this useful: https://cwiki.apache.org/confluence/display/Hive/CSV+Serde On Fri, Oct 30, 2015 at 11:36 AM, Martin Menzel wrote: > Hi > Do you have access to the data source? > If not you have first to find out if t

Re: Best way to load CSV file into Hive

2015-10-30 Thread Martin Menzel
Hi, Do you have access to the data source? If not, you first have to find out if the data can be mapped to the columns in a unique way and for all rows. If yes, maybe Bindy can be an option to convert the data in a first step to TSV. I hope this helps. Regards, Martin On 30.10.2015 19:16, "Vijaya Nar
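
The "convert to TSV in a first step" idea above can be sketched without Bindy. The following uses Python's standard csv module as a stand-in, not Bindy itself. It assumes the input is properly quoted CSV; csv_to_tsv is a hypothetical helper name. Tabs and line breaks inside fields are replaced with spaces so that every record ends up on exactly one tab-delimited line.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert quoted CSV text to TSV.

    Tabs, carriage returns and newlines inside a field are replaced
    with a single space each, so each record stays on one line and
    tabs only ever appear as column separators.
    """
    out = io.StringIO()
    for row in csv.reader(io.StringIO(csv_text, newline="")):
        cleaned = [
            field.replace("\t", " ").replace("\r", " ").replace("\n", " ")
            for field in row
        ]
        out.write("\t".join(cleaned) + "\n")
    return out.getvalue()


# Hypothetical input: one field holds a comma, another a line break.
tsv = csv_to_tsv('id,note\n1,"a, b"\n2,"line1\nline2"\n')
```

This only works if the fields can be mapped to columns unambiguously in the first place, which is the check suggested above.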

Re: Best way to load CSV file into Hive

2015-10-30 Thread Daniel Lopes
Hello, If you have a file with different types of data, it's preferable to use another file type such as TSV, ORC, or Parquet. Best, On Fri, Oct 30, 2015 at 4:16 PM, Vijaya Narayana Reddy Bhoomi Reddy < vijaya.bhoomire...@whishworks.com> wrote: > Hi, > > I have a CSV file which contains a hundred thousa

Best way to load CSV file into Hive

2015-10-30 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi, I have a CSV file which contains a hundred thousand rows and about 200+ columns. Some of the columns have free-text information, which means they might contain characters like commas, colons, quotes, etc. within the column content. What is the best way to load such a CSV file into Hive? Another serio