Thanks Yanbo. My doubt is clarified now.
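
For the archive, here is a minimal sketch of how the per-day loads quoted below could be driven in parallel, one statement per partition. It is only an illustration under assumptions: Python 3 and a `hive` client on the PATH. The helper names (`load_statement`, `days`, `run_hive`, `load_all`) and the worker count of 4 are hypothetical choices of mine; only the table name, partition column, and path layout come from the thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

TABLE = "processedlogs"        # table name from the thread
BASE_PATH = "/logs/processed"  # HDFS path layout from the thread


def load_statement(day):
    """Build the LOAD DATA statement for one day's partition."""
    d = day.isoformat()  # e.g. '2013-04-01'
    return (
        f"LOAD DATA INPATH '{BASE_PATH}/{d}' OVERWRITE INTO TABLE "
        f"{TABLE} PARTITION(logdate='{d}');"
    )


def days(start, end):
    """Return every date from start to end, inclusive."""
    count = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(count)]


def run_hive(statement):
    """Run one statement through the Hive CLI and return its exit code."""
    return subprocess.run(["hive", "-e", statement]).returncode


def load_all(start, end, runner=run_hive, workers=4):
    """Run one load per day in parallel; each statement targets its own partition.

    `runner` is injectable so the statements can be checked without a cluster.
    Returns a dict mapping each statement to the runner's result.
    """
    statements = [load_statement(d) for d in days(start, end)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(runner, statements))
    return dict(zip(statements, results))
```

Calling `load_all(date(2013, 4, 1), date(2013, 4, 30))` would submit the thirty April loads, four at a time. Any statement with a non-zero result should be retried on its own.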

On Fri, May 3, 2013 at 2:38 PM, Yanbo Liang <yanboha...@gmail.com> wrote:

> Loading data into different partitions in parallel is OK, because it is
> equivalent to writing to different files on HDFS.
>
> 2013/5/3 selva <selvai...@gmail.com>
>
>> Hi All,
>>
>> I need to load a month's worth of processed data into a Hive table. The
>> table has 10 partitions. Each day has many files to load, each file takes
>> a constant two seconds, and I have ~3000 files. So it will take days to
>> complete for 30 days' worth of data.
>>
>> I plan to load each day's data in parallel into its respective partition
>> so that I can complete it in a short time.
>>
>> But I need clarification before proceeding.
>>
>> Question:
>>
>> 1. Will loading in parallel into different partitions of the same Hive
>> table cause data loss or corruption?
>>
>> For example, assume I am doing the following:
>>
>> Table : processedlogs
>> Partition : logdate
>>
>> Running the commands below in parallel:
>> LOAD DATA INPATH '/logs/processed/2013-04-01' OVERWRITE INTO TABLE
>> processedlogs PARTITION(logdate='2013-04-01');
>> LOAD DATA INPATH '/logs/processed/2013-04-02' OVERWRITE INTO TABLE
>> processedlogs PARTITION(logdate='2013-04-02');
>> LOAD DATA INPATH '/logs/processed/2013-04-03' OVERWRITE INTO TABLE
>> processedlogs PARTITION(logdate='2013-04-03');
>> LOAD DATA INPATH '/logs/processed/2013-04-04' OVERWRITE INTO TABLE
>> processedlogs PARTITION(logdate='2013-04-04');
>> .....
>> LOAD DATA INPATH '/logs/processed/2013-04-30' OVERWRITE INTO TABLE
>> processedlogs PARTITION(logdate='2013-04-30');
>>
>> Thanks
>> Selva

--
-- selva