*To:* user@hadoop.apache.org
*Subject:* Re: Data cleansing in modern data architecture
Hi Bob,
The answer to your original question depends entirely on the procedures and
conventions set forth for your data warehouse, so only you can answer it.
If you're asking for best practices, it still depends:
- How large are your files?
- Do you have enough free space for recoding?
- Are you
*From:* Sriram Ramachandrasekaran sri.ram...@gmail.com
*Sent:* Saturday, August 09, 2014 11:55 PM
*To:* user@hadoop.apache.org
*Subject:* Re: Data cleansing in modern data architecture
While I may not have enough context on your entire processing pipeline,
here are my thoughts.
1. It's always useful to have the raw data, irrespective of whether it was
right or wrong
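Sriram's first point, keeping the raw data whether it is right or wrong, is often implemented by splitting input into a cleansed set and a reject set rather than discarding anything. A minimal sketch, where `validate` is a hypothetical stand-in for whatever checks your pipeline actually needs:

```python
def split_raw_records(raw_records, validate):
    # Never mutate the raw input; derive a cleansed view from it and
    # keep the rejects so the "wrong" records stay available for
    # later inspection or reprocessing.
    clean, rejects = [], []
    for record in raw_records:
        (clean if validate(record) else rejects).append(record)
    return clean, rejects
```

In a real pipeline the two lists would typically be two output locations (a raw zone and a clean zone), but the split itself is this simple.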
To: user@hadoop.apache.org
Subject: Re: Data cleansing in modern data architecture
Well, keeping bad data has its uses too. I assume you know about temporal
databases.
Back to your use case, if you only need to remove a few records from HDFS
files, the easiest might be during the reading
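The suggestion above, dropping a few bad records at read time instead of editing HDFS files in place, can be sketched as a Hadoop Streaming-style filter. The three-field check here is a hypothetical example of a validity rule, not anything from the thread:

```python
def is_bad(record):
    # Hypothetical validity rule: a record is bad unless it has
    # exactly three tab-separated fields.
    return len(record.rstrip("\n").split("\t")) != 3

def read_filtering(lines):
    # Filter while reading: bad records never reach the job's logic,
    # and the file on HDFS is left untouched.
    for line in lines:
        if not is_bad(line):
            yield line
```

Wired into a Hadoop Streaming mapper, `lines` would be `sys.stdin`; the same predicate could equally serve as a filter in Pig, Hive, or a plain MapReduce job.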
Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData
From: Adaryl Bob Wakefield, MBA
Sent: Saturday, August 09, 2014 8:55 PM
To: user@hadoop.apache.org
Subject: Re: Data cleansing in modern data architecture
Answer: No, we can't get rid of bad records. We have to go back and
rebuild the entire file. We can't edit records, but we can get rid of
entire files, right? This would suggest that appending data to files isn't
that great an idea. It sounds like it would
In the old world, data cleaning used to be a large part of the data
warehouse load. Now that we're working in a schemaless environment, I'm not
sure where data cleansing is supposed to take place. NoSQL sounds fun
because theoretically you just drop everything in, but transactional
systems that
From: Shahab Yunus
Sent: Sunday, July 20, 2014 4:20 PM
To: user@hadoop.apache.org
Subject: Re: Data cleansing in modern data architecture
I am assuming you meant the batch jobs that are/were used in the old world
for data cleansing.
As far as I understand, there is no hard and fast rule for it; it depends
on the functional and system requirements of the use case.
It is also dependent on the technology being used and how it manages
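Bob's "rebuild the entire file" point above is the standard pattern for immutable files: write a cleansed copy and swap it for the original, rather than editing records in place. A local-filesystem sketch of that pattern (on HDFS the equivalent would be a job writing to a new path followed by a rename; `is_bad` is whatever predicate identifies your bad records):

```python
import os
import tempfile

def rebuild_without_bad_records(path, is_bad):
    # Files are rebuilt, not edited: stream the old file, drop bad
    # records, and atomically replace the original with the new copy.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as out, open(path) as src:
        for record in src:
            if not is_bad(record):
                out.write(record)
    # One atomic swap; readers never observe a half-written file.
    os.replace(tmp_path, path)
```

This is also why appending everything to a few huge files is painful, as Bob suspects: the unit of rework is the whole file, so smaller, partitioned files keep each rebuild bounded.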