Hi, yes — I saw that some SQL scripts enabled bucketing ad hoc via set commands: "hive.enforce.bucketing" and "hive.optimize.bucketmapjoin". So that metadata information is required? I can't just delete those 43-byte files?
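Before deleting anything, it may help to confirm how many of the bucket files are exactly 43 bytes versus holding real data. A minimal local sketch of that check (the directory and file names are fabricated stand-ins; on the real cluster you would keep using `hdfs dfs -du` as shown below in the thread):

```shell
#!/bin/sh
# Count files that are exactly 43 bytes -- the size of the empty,
# header-only ORC buckets described in this thread -- versus all others.
# The files created here are fake placeholders for illustration only.
dir=$(mktemp -d)
head -c 43 /dev/zero > "$dir/bucket_00000"    # stand-in: empty ORC bucket
head -c 43 /dev/zero > "$dir/bucket_00001"    # stand-in: empty ORC bucket
head -c 1024 /dev/zero > "$dir/bucket_00002"  # stand-in: bucket with data

small=$(find "$dir" -type f -size 43c | wc -l | tr -d ' ')
other=$(find "$dir" -type f ! -size 43c | wc -l | tr -d ' ')
echo "43-byte files: $small, other files: $other"
rm -rf "$dir"
```

`find -size 43c` matches files of exactly 43 bytes (`c` = bytes), mirroring the `grep "^43 "` filter on the `hdfs dfs -du` output.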
JV

On Tue, Aug 18, 2015 at 5:35 PM, Prasanth Jayachandran <[email protected]> wrote:

> Are you using bucketing? If so, those are empty ORC files without any data,
> containing only metadata information.
>
> _____________________________
> From: Juraj jiv <[email protected]>
> Sent: Tuesday, August 18, 2015 8:28 AM
> Subject: Hive 12 - CDH 5.0.1 - many small files when using ORC table
> To: <[email protected]>
>
> Hello all,
>
> I have a question about the ORC table format. We use it for our datastore
> tables, but during maintenance I noticed there are many small files inside
> the tables which I presume don't contain any data. They are only 43 bytes
> in size and they make up around 70% of all files inside the table folder.
>
> For example (grep for 43-byte files vs. all others):
>
> hadoop@hadoopnn:~$ hdfs dfs -du -h
> /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 |
> grep "^43 " | wc -l
> 7448
> hadoop@hadoopnn:~$ hdfs dfs -du -h
> /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 |
> grep -v "^43 " | wc -l
> 4712
>
> Why is that? Why are there so many 43-byte files?
>
> The ASCII content of the files, which I guess is just the ORC header:
> 0@▒▒▒"
> ▒▒ORC
>
> Hive version:
> 0.12.0+cdh5.0.1+315 1.cdh5.0.1.p0.31 CDH 5
>
> Thanks
> JV
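The "ORC" string visible in the ASCII dump is consistent with the 3-byte magic that ORC files carry (at the start of the file, and again in the postscript at the end). A quick sanity check that a tiny file really is a header-only ORC file is to look for that magic. A minimal local sketch (the file is a fabricated stand-in; checking a real HDFS copy, e.g. via `hdfs dfs -cat <path> | head -c 3`, is an assumption about your setup):

```shell
#!/bin/sh
# Verify the 3-byte "ORC" magic at the start of a file. The file below is
# a fake placeholder for a 43-byte header-only ORC bucket, for illustration.
f=$(mktemp)
printf 'ORC\001\002' > "$f"   # fabricated stand-in for an empty ORC file
magic=$(head -c 3 "$f")
if [ "$magic" = "ORC" ]; then
    echo "ORC magic present"
else
    echo "not an ORC file"
fi
rm -f "$f"
```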
