Hi Deepak,
Thanks for your response. The table is not bucketed or clustered, as the DDL
below shows.
DROP TABLE IF EXISTS ${SCHEMA_NM}.daily_summary;
CREATE EXTERNAL TABLE ${SCHEMA_NM}.daily_summary
(
bouncer VARCHAR(12),
device_type VARCHAR(52),
visitor_type VARCHAR(10),
visit_origination_type VARCHAR(65),
visit_origination_name VARCHAR(260),
pg_domain_name VARCHAR(215),
class1_id VARCHAR(650),
class2_id VARCHAR(650),
bouncers INT,
rv_revenue DECIMAL(17,2),
visits INT,
active_page_view_time INT,
total_page_view_time BIGINT,
average_visit_duration INT,
co_conversions INT,
page_views INT,
landing_page_url VARCHAR(1332),
dt DATE
)
PARTITIONED BY (datelocal DATE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
LOCATION '${OUTPUT_PATH}/daily_summary/'
TBLPROPERTIES ('serialization.null.format'='');
MSCK REPAIR TABLE ${SCHEMA_NM}.daily_summary;
Regards,
Sujeet Singh Pardeshi
Software Specialist
SAS Research and Development (India) Pvt. Ltd.
Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar Pune, Maharashtra, 411 013
off: +91-20-30418810
"When the solution is simple, God is answering…"
-----Original Message-----
From: Deepak Jaiswal <[email protected]>
Sent: 07 August 2018 PM 11:19
To: [email protected]
Subject: Re: Hive output file 000000_0
Hi Sujeet,
I assume the table is bucketed? If so, the file name indicates which bucket the
file belongs to, because Hive creates one file per bucket for each operation.
In this case, the file 000003_0 belongs to bucket 3.
To always get files named 000000_0, the table must be unbucketed.
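If the table turns out to be unbucketed and names such as 000003_0 still
appear, the likely cause is that more than one task wrote output, since the
numeric prefix is the ID of the writing task. A minimal sketch of one way to
force a single writer is below. It assumes Hive on MapReduce with dynamic
partitioning, and that source has a column named datekey (an assumption, it is
not stated in the question). I have not verified the behaviour on Tez or on S3.

```sql
-- Sketch only, under the assumptions stated above.
-- tab_content, datekey and source are the names from the original question.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Force a single reducer so that only one task (task 0) writes output.
SET mapreduce.job.reduces=1;

-- DISTRIBUTE BY adds a reduce stage; with one reducer, each partition
-- should end up with a single file named 000000_0.
INSERT OVERWRITE TABLE tab_content
PARTITION (datekey)
SELECT *
FROM source
DISTRIBUTE BY datekey;
```

A single reducer serializes the whole write, so this only suits data volumes
that one task can handle.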
I hope it helps.
Regards,
Deepak
On 8/7/18, 1:33 AM, "Sujeet Pardeshi" <[email protected]> wrote:
Hi All,
I am doing an INSERT OVERWRITE operation through a Hive external table onto
AWS S3. Hive creates an output file named 000000_0 on S3. However, at times I
notice that it creates files with other names, such as 000003_0. I always
need to overwrite the existing file, but with inconsistent file names I am
unable to do so. How do I force Hive to always create a consistent file name
like 000000_0? Below is an example of what my code looks like, where tab_content
is a Hive external table.
INSERT OVERWRITE TABLE tab_content
PARTITION(datekey)
select * from source
Regards,
Sujeet Singh Pardeshi