Hi Alex,

 

Am I correct that the source data resides in a relational table, and that this 
table (the golden source) already holds all the data that is sent to both 
instances of Hive? Is the data in Hive added incrementally each day, with an 
“operation timestamp” for each record? Also, do you have a unique identifier 
for each row in each table? 

 

HTH

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", 
ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 
978-0-9759693-0-4

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

 


 

From: Alexander Pivovarov [mailto:apivova...@gmail.com] 
Sent: 27 April 2015 21:27
To: user@hive.apache.org
Subject: How to compare data in two tables?

 

Hi Everyone

Let's say I have a Hive table in two datacenters. The table format can be 
textfile or ORC.

There is a Sqoop job running every day which adds data to the table.

Each datacenter has its own instance of the Sqoop job.

In the ideal case, the data in these two tables should be the same.

"The same" means that the row count is the same and the tables contain the 
same rows.

However, the row order can be different. The number of files and their sizes 
can also be different.

 

Is there a way to scan the table and get some hashcode which can be used to 
compare tables?

Thank you

Alex
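
One possible way to get an order-independent "hashcode" of the kind asked 
about above is to hash each row separately and add the per-row hashes 
together, since addition does not depend on row order or on how rows are 
split across files. The following is only a minimal sketch in Python, not a 
Hive feature: the names row_digest and table_fingerprint are hypothetical, 
rows are assumed to arrive as tuples of field values, and every field is 
compared by its string form, so column types are ignored.

```python
import hashlib

# Per-row hashes are summed modulo 2**128 so the total stays a fixed size.
MOD = 1 << 128


def row_digest(row):
    """Hash one row (a tuple of field values) to a 128-bit integer."""
    h = hashlib.sha256()
    for field in row:
        # Mark NULLs separately so None and the empty string differ.
        if field is None:
            data = b"\x00"
        else:
            data = b"\x01" + str(field).encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return int.from_bytes(h.digest()[:16], "big")


def table_fingerprint(rows):
    """Return (row_count, sum of row hashes mod 2**128) for an iterable of rows.

    Addition is commutative, so row order does not affect the result.
    Unlike XOR, a sum does not cancel out rows that appear twice.
    """
    count = 0
    total = 0
    for row in rows:
        count += 1
        total = (total + row_digest(row)) % MOD
    return count, total


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two datacenters.
    dc1 = [(1, "alice", None), (2, "bob", 3.5)]
    dc2 = [(2, "bob", 3.5), (1, "alice", None)]  # same rows, different order
    print(table_fingerprint(dc1) == table_fingerprint(dc2))
```

Equal fingerprints strongly suggest, but do not prove, that the two tables 
hold the same rows; a mismatch shows that they differ but not where. If the 
rows carry a load date or operation timestamp, computing one fingerprint per 
day would narrow down which load diverged.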
