Hi Furcy,

That's a lot of information. Thanks a lot.

On Feb 13, 2015 3:40 PM, "Furcy Pin" <furcy....@flaminem.com> wrote:
> Hi Sreeman,
>
> Unfortunately, I don't think that Hive's built-in format can currently read
> csv files with fields enclosed in double quotes.
> More generally, having ingested quite a lot of messy csv files myself,
> I would recommend that you write a MapReduce (or Spark) job
> to clean your csv before giving it to Hive. This is what I did.
> The other kinds of issues I've met include:
>
>    - Files not encoded in utf-8, making special characters unreadable for
>    Hive
>    - Some lines with missing or too many columns, which could shift your
>    columns and ruin your stats
>    - Some lines with unreadable characters (probably data corruption)
>    - I even got some lines with Java stack traces in them
>
> I hope your csv is cleaner than that, and would recommend that, if you have
> control over how it is generated, you replace your current separator with tab
> (and replace inline tabs with \t) or something like that.
>
> There might be some open source tools for data cleaning already out there.
> I plan to release mine one day, once I've migrated it to Spark maybe, and
> if my company agrees.
>
> If you're lazy, I heard that Dataiku Studio (which has a free version) can
> do such a thing, though I have never used it myself.
>
> Hope this helps,
>
> Furcy
>
>
> 2015-02-13 7:30 GMT+01:00 Slava Markeyev <slava.marke...@upsight.com>:
>
>> You can use lazy simple serde with ROW FORMAT DELIMITED FIELDS TERMINATED
>> BY ',' ESCAPED BY '\'. Check the DDL for details
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
>>
>> On Thu, Feb 12, 2015 at 8:19 PM, Sreeman <sreebalin...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> How are all of you creating a Hive/Impala table when the CSV file has some
>>> values with a comma in between?
>>> It is like:
>>>
>>> sree,12345,"payment made,but it is not successful"
>>>
>>> I know the OpenCSV serde is there, but it is not available in versions
>>> of Hive lower than 0.14.0
>>>
>>
>> --
>>
>> Slava Markeyev | Engineering | Upsight
>> Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>
>
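
For anyone finding this thread later, here is a minimal sketch of the kind of cleaning step Furcy describes, written as a plain Python script instead of a MapReduce or Spark job. It is not Furcy's tool. The function name `clean_csv` and the `expected_columns` parameter are made up for illustration, and escaping embedded newlines as \n is an extra assumption on top of the "replace inline tabs with \t" advice.

```python
import csv
import io


def clean_csv(src, dst, expected_columns):
    """Rewrite quoted CSV from `src` as tab-separated lines in `dst`.

    Hypothetical helper (not from the thread). Rows whose column count
    differs from `expected_columns` are dropped so they cannot shift
    columns downstream. Returns a (kept, rejected) pair of row counts.
    """
    kept = rejected = 0
    for row in csv.reader(src):  # csv.reader handles "quoted, fields"
        if len(row) != expected_columns:
            rejected += 1
            continue
        # Replace inline tabs (and, as an assumption, newlines) with
        # literal escape sequences so each record stays on one line.
        fields = [f.replace("\t", "\\t").replace("\n", "\\n") for f in row]
        dst.write("\t".join(fields) + "\n")
        kept += 1
    return kept, rejected


if __name__ == "__main__":
    # The sample row from the original question, plus one malformed row.
    sample = 'sree,12345,"payment made,but it is not successful"\nbad,line\n'
    out = io.StringIO()
    print(clean_csv(io.StringIO(sample), out, expected_columns=3))
    print(out.getvalue(), end="")
```

The tab-separated output can then be loaded into a table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'. For real files you would probably open the input with something like open(path, encoding="utf-8", errors="replace", newline="") so that badly encoded bytes do not abort the run; whether to replace or reject such lines is a choice you have to make for your own data.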