Re: Data cleansing in modern data architecture

2014-08-09 Thread Sriram Ramachandrasekaran
While I may not have enough context on your entire processing pipeline, here are my thoughts. 1. It's always useful to keep the raw data, irrespective of whether it was right or wrong. The way to look at it is, it's the source of truth at timestamp t. 2. Note that you only know that the data at timestamp

Re: Data cleansing in modern data architecture

2014-08-09 Thread Adaryl "Bob" Wakefield, MBA
Or...as an alternative, since HBase uses HDFS to store its data, can we get around the no-editing-files rule by dropping structured data into HBase? That way, we have data in HDFS that can be deleted. Any real problem with that idea? Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 91
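A minimal sketch of the idea, using the HBase Java client (newer Connection/Table style); the table name and row key here are hypothetical and only illustrate that HBase gives row-level deletes even though the HFiles it keeps in HDFS are immutable:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteBadRow {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // "transactions" and "txn-12345" are hypothetical names for this example.
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("transactions"))) {
            // HBase records the delete as a tombstone and physically drops the
            // cells at the next major compaction, so the application sees a
            // mutable table even though the underlying HDFS files never change.
            Delete del = new Delete(Bytes.toBytes("txn-12345"));
            table.delete(del);
        }
    }
}
```

The trade-off is that the data now lives behind the HBase API rather than as plain files, so anything that expects to scan raw HDFS files (e.g. Hive external tables over directories) has to go through an HBase storage handler instead.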

Re: Data cleansing in modern data architecture

2014-08-09 Thread Adaryl "Bob" Wakefield, MBA
Answer: No, we can’t get rid of bad records. We have to go back and rebuild the entire file. We can’t edit records, but we can get rid of entire files, right? This would suggest that appending data to files isn’t that great an idea. It sounds like it would be more appropriate to cut a hadoop dat
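A minimal sketch of that "replace the whole file, not the record" approach, assuming the data is laid out with one directory per load date under the Hive warehouse (the paths and table name are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DropBadPartition {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical layout: one partition directory per load date.
        // HDFS cannot edit a file in place, but it can delete a whole
        // directory, so the unit of correction is the partition, not the record.
        Path badPartition = new Path("/user/hive/warehouse/transactions/load_date=2014-08-08");
        if (fs.exists(badPartition)) {
            fs.delete(badPartition, true); // recursive delete of the partition directory
        }
        // The corrected data is then re-ingested into the same path from the
        // upstream source of truth.
    }
}
```

This is also why day (or hour) partitioning is usually preferred over appending to one ever-growing file: the smaller the partition, the cheaper it is to throw away and rebuild when bad records are discovered.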

Top K words problem

2014-08-09 Thread Zhige Xin
I have a question about Hadoop: how do I modify the WordCount program to output the top K words by number of occurrences? The naive method is to count and then sort, but it needs too many lines of code and is not elegant. Another approach uses a data structure called TreeMap to solve this problem, wh
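One common pattern along those lines (sketched here with hypothetical class names) is to keep a TreeMap bounded at K entries inside the reducer and emit the survivors in cleanup(); with the standard WordCount mapper and a single reduce task, the output is the global top K:

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: assumes the standard WordCount mapper emitting
// (word, 1) pairs and a job configured with a single reduce task.
public class TopKReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private static final int K = 10;
    // Keyed by count; in this simple sketch, two words with the same count
    // overwrite each other, so ties are not fully handled.
    private final TreeMap<Integer, String> topK = new TreeMap<>();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        topK.put(sum, word.toString());
        if (topK.size() > K) {
            topK.remove(topK.firstKey()); // evict the current smallest count
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Emit the retained words in descending order of count.
        for (Map.Entry<Integer, String> e : topK.descendingMap().entrySet()) {
            context.write(new Text(e.getValue()), new IntWritable(e.getKey()));
        }
    }
}
```

The same bounded-TreeMap trick can also be applied in each mapper's cleanup() to cut down the data shuffled to the single reducer on large corpora.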

Can anyone help me resolve this Error: unable to create new native thread

2014-08-09 Thread Chris MacKenzie
Hi, I've scrabbled around looking for a fix for a while and have set the soft ulimit size to 13172. I'm using Hadoop 2.4.1. Thanks in advance, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk

Re: Data cleansing in modern data architecture

2014-08-09 Thread Adaryl "Bob" Wakefield, MBA
I’m sorry, but I have to revisit this again. Going through the reply below, I realized that I didn’t quite get my question answered. Let me be more explicit with the scenario. There is a bug in the transactional system. The data gets written to HDFS, where it winds up in Hive. Somebody notices that