Hi Vijaya,
If you need some nice ETL capabilities, you may want to try
https://github.com/databricks/spark-csv
Among other things, spark-csv lets you read the CSV as-is, then create and
insert a copy of the data into a Hive table in any format you like
(Parquet, ORC, etc.).
If you have a header row, spark-csv can read it and use it for the column names.
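As a sketch of that flow (assuming a Spark 1.x cluster with the com.databricks:spark-csv package on the classpath; the input path and table name below are hypothetical):

```python
# Sketch only: needs a running Spark 1.x context with the
# com.databricks:spark-csv package available.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="csv-to-parquet")
sqlContext = SQLContext(sc)

# Read the CSV as-is; quoted fields may contain commas, colons,
# quotes, etc.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")   # use the header row for column names
      .option("quote", '"')       # fields may be wrapped in quotes
      .option("escape", "\\")     # escape character inside quoted fields
      .load("/path/to/input.csv"))  # hypothetical path

# Insert a copy of the data into a Hive table stored as Parquet.
df.write.format("parquet").saveAsTable("my_db.my_table")
```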
You clearly need to escape those characters, as with any other tool. You may
want to use Avro instead of CSV, XML, JSON, etc.
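To illustrate the escaping rules in question (a minimal stdlib sketch, independent of the tools above), Python's csv module shows how quoting lets fields with embedded commas and quotes round-trip safely:

```python
import csv
import io

# A row whose free-text column contains a colon, commas, and quotes.
row = ["id-1", 'He said: "hello, world"', "plain"]

# With quoting enabled, fields containing the delimiter or the quote
# character are wrapped in quotes, and embedded quotes are doubled.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(row)
encoded = buf.getvalue()
print(encoded.strip())  # id-1,"He said: ""hello, world""",plain

# Reading it back recovers the original fields intact.
decoded = next(csv.reader(io.StringIO(encoded)))
assert decoded == row
```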
> On 30 Oct 2015, at 19:16, Vijaya Narayana Reddy Bhoomi Reddy
> wrote:
>
> Hi,
>
> I have a CSV file which contains a hundred thousand rows and about 200+
> columns. …
if you find a way to escape the characters in a pre-processing step, then
you may find this useful:
https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
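For reference, the linked SerDe can be attached to a table like this (a sketch assuming Hive 0.14+'s built-in OpenCSVSerde; the column names and location are hypothetical, and note that OpenCSVSerde reads every column as STRING):

```sql
-- Sketch: column names and path are hypothetical.
-- OpenCSVSerde parses quoted fields containing commas, quotes, etc.
CREATE EXTERNAL TABLE my_csv_table (
  id        STRING,
  free_text STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE
LOCATION '/path/to/csv';
```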
On Fri, Oct 30, 2015 at 11:36 AM, Martin Menzel
wrote:
Hi
Do you have access to the data source?
If not, you first have to find out whether the data can be mapped to the
columns in a unique way for all rows. If so, maybe Bindy can be an option to
convert the data to TSV as a first step.
I hope this helps.
Regards
Martin
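The conversion step Martin suggests could be sketched in plain Python rather than Bindy (the tab/newline replacement policy below is a hypothetical choice, needed so the TSV stays one record per line):

```python
import csv
import io

def csv_to_tsv(csv_text):
    """Parse CSV properly, then emit TSV.

    Embedded tabs and newlines inside fields are replaced with spaces
    (a hypothetical policy) so each output record stays on one line.
    """
    out = io.StringIO()
    for row in csv.reader(io.StringIO(csv_text)):
        cleaned = [f.replace("\t", " ").replace("\n", " ") for f in row]
        out.write("\t".join(cleaned) + "\n")
    return out.getvalue()

sample = 'id,comment\n1,"contains, a comma and ""quotes"""\n'
tsv = csv_to_tsv(sample)
# tsv now holds one tab-separated record per line, with the embedded
# comma and quotes preserved inside the field.
```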
On 30.10.2015 at 19:16, "Vijaya Narayana Reddy Bhoomi Reddy" wrote:
Hello,
If you have a file with different types of data, it is preferable to use
another file format such as TSV, ORC, or Parquet.
Best,
On Fri, Oct 30, 2015 at 4:16 PM, Vijaya Narayana Reddy Bhoomi Reddy <
vijaya.bhoomire...@whishworks.com> wrote:
Hi,
I have a CSV file which contains a hundred thousand rows and about 200+
columns. Some of the columns contain free-text information, which means they
may contain characters like commas, colons, quotes, etc. within the column
content.
What is the best way to load such a CSV file into Hive?
Another serio