Re: hadoop not using whole disk for HDFS

2015-11-07 Thread Adaryl "Bob" Wakefield, MBA
No, it’s flat out saying that the config cannot be set to anything starting with /home. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Naganarasimha G R (Naga) Sent: Thursday, November 0

Re: hadoop not using whole disk for HDFS

2015-11-07 Thread Adaryl "Bob" Wakefield, MBA
So when you say remount, what exactly am I remounting? /dev/mapper/centos-home? Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Naganarasimha G R (Naga) Sent: Thursday, November 05, 2015 1
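
For context, a rough sketch of what that remount could look like on CentOS 7, assuming the goal is to point the large centos-home logical volume at the DataNode directory instead of /home; the target path follows the /hdfs/data directory mentioned elsewhere in the thread, the filesystem type is a guess, and HDFS should be stopped and anything under /home backed up first:

# run as root
umount /home
mkdir -p /hdfs/data
mount /dev/mapper/centos-home /hdfs/data
# make it permanent by updating the matching line in /etc/fstab, e.g.
# /dev/mapper/centos-home  /hdfs/data  xfs  defaults  0 0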

Re: hadoop not using whole disk for HDFS

2015-11-07 Thread Adaryl "Bob" Wakefield, MBA
By manually you mean actually going in with nano and editing the config file? I could do that but if Ambari won’t let you do it through the interface, isn’t it possible that trying to add the directory in home might break something? Adaryl "Bob" Wakefield, MBA Principal Mass Street

Re: hadoop not using whole disk for HDFS

2015-11-07 Thread Adaryl "Bob" Wakefield, MBA
tmpfs                     16G   97M   16G   1% /run
tmpfs                     16G     0   16G   0% /sys/fs/cgroup
/dev/sda2                494M  124M  370M  26% /boot
/dev/mapper/centos-home  2.7T   33M  2.7T   1% /home

Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 913.938.6685 www.linkedin.com/in/bobwakefieldm
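
The point the df output illustrates is that HDFS capacity is counted per mount point: the DataNode only sees the filesystem its configured data directories actually live on. A quick way to check which mount backs a directory (the /hadoop/hdfs/data path is the one mentioned elsewhere in the thread):

df -h /hadoop/hdfs/data   # likely the small root filesystem
df -h /home               # the 2.7T centos-home volume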

Re: hadoop not using whole disk for HDFS

2015-11-07 Thread Adaryl "Bob" Wakefield, MBA
though it’s in root that it’s still somehow pointing to /home? So confused. It’s the part about mounting a drive to another folder... on the same disk. Is it kind of like how on Windows you can have more than one “drive” on a disk? Adaryl "Bob" Wakefield, MBA Principal Mass Street Anal

Re: hadoop not using whole disk for HDFS

2015-11-05 Thread Adaryl "Bob" Wakefield, MBA
Is there a maximum amount of disk space that HDFS will use? Is 100GB the max? When we’re supposed to be dealing with “big data”, why is the amount of data to be held on any one box such a small number when you’ve got terabytes available? Adaryl "Bob" Wakefield, MBA Principal M

Re: hadoop not using whole disk for HDFS

2015-11-04 Thread Adaryl "Bob" Wakefield, MBA
/home directory. So I made it /hdfs/data. 2. When I restarted, the space available increased by a whopping 100GB. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Naganarasimha G R (N

Re: hadoop not using whole disk for HDFS

2015-11-04 Thread Adaryl "Bob" Wakefield, MBA
So like I can just create a new folder in the home directory like: home/hdfs/data and then set dfs.datanode.data.dir to: /hadoop/hdfs/data,home/hdfs/data Restart the node and that should do it correct? Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 913.938.6685 www.li
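
For reference, dfs.datanode.data.dir takes a comma-separated list of absolute paths in hdfs-site.xml, each of which must exist and be writable by the HDFS user before the DataNode restart; a minimal sketch (the second path uses the /hdfs/data directory mentioned elsewhere in the thread, and note that each entry needs a leading slash):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hadoop/hdfs/data,/hdfs/data</value>
</property>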

Re: hadoop not using whole disk for HDFS

2015-11-04 Thread Adaryl "Bob" Wakefield, MBA
/hadoop/hdfs/data Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: P lva Sent: Wednesday, November 04, 2015 3:41 PM To: user@hadoop.apache.org Subject: Re: hadoop not using whole disk for

Re: hadoop not using whole disk for HDFS

2015-11-04 Thread Adaryl "Bob" Wakefield, MBA
/mapper/centos-home 2.7T 33M 2.7T 1% /home That’s from one datanode. The second one is nearly identical. I discovered that 50GB is actually a default. That seems really weird. Disk space is cheap. Why would you not just use most of the disk, and why is it so hard to reset the default? Adaryl

Re: hadoop not using whole disk for HDFS

2015-11-03 Thread Adaryl "Bob" Wakefield, MBA
Yeah. It has the current value of 1073741824 which is like 1.07 gig. B. From: Chris Nauroth Sent: Tuesday, November 03, 2015 11:57 AM To: user@hadoop.apache.org Subject: Re: hadoop not using whole disk for HDFS Hi Bob, Does the hdfs-site.xml configuration file contain the property dfs.datanod
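
A quick way to confirm what a dfs.datanode.* setting actually resolves to on a node, rather than reading the raw file (the property name below is only an example, since the one in this message is cut off, and the config path assumes an HDP-style layout):

hdfs getconf -confKey dfs.datanode.data.dir
grep -B1 -A2 "dfs.datanode" /etc/hadoop/conf/hdfs-site.xml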

hadoop not using whole disk for HDFS

2015-11-03 Thread Adaryl "Bob" Wakefield, MBA
I’ve got the Hortonworks distro running on a three node cluster. For some reason the disk available for HDFS is MUCH less than the total disk space. Both of my data nodes have 3TB hard drives. Only 100GB of that is being used for HDFS. Is it possible that I have a setting wrong somewhere? B.
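
A useful first check for this kind of mismatch is to compare what the NameNode reports for each DataNode against what the operating system reports for its disks; a minimal sketch:

hdfs dfsadmin -report   # Configured Capacity and DFS Remaining per DataNode
df -h                   # size and usage of every mounted filesystem on the node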

Re: hdfs commands tutorial

2015-08-13 Thread Adaryl "Bob" Wakefield, MBA
@hadoop.apache.org Subject: Re: hdfs commands tutorial I am confused. The link posted above tells you exactly how you interact with hdfs to do various tasks and features, with examples. What else are you looking for? Regards, Shahab On Aug 14, 2015 12:14 AM, "Adaryl "Bob"

Re: hdfs commands tutorial

2015-08-13 Thread Adaryl "Bob" Wakefield, MBA
: Re: hdfs commands tutorial Did you try this? I referred to this when I was learning. http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html Thanks and Regards, Ashish Kumar From: "Adaryl \"Bob\" Wakefield, MBA" To:

hdfs commands tutorial

2015-08-13 Thread Adaryl "Bob" Wakefield, MBA
Does anybody know of a good place to learn and practice HDFS commands? B.
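
Beyond the FileSystemShell documentation linked in the replies above, the commands are easy to practice against any sandbox or running cluster; a minimal warm-up sequence with made-up paths and file names:

hdfs dfs -mkdir -p /user/bob/practice
hdfs dfs -put notes.txt /user/bob/practice/
hdfs dfs -ls /user/bob/practice
hdfs dfs -cat /user/bob/practice/notes.txt
hdfs dfs -rm -r /user/bob/practice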

Re: installation with Ambari

2015-02-12 Thread Adaryl "Bob" Wakefield, MBA
This is turning into less about Ambari and more general computing. I’m trying to set up Hadoop on a home network. Not work, not on EC2; just a simple three node cluster in my personal computer lab. My machines don’t belong to a domain. Everything I read says that in this situation, the computer

installation with Ambari

2015-02-12 Thread Adaryl "Bob" Wakefield, MBA
I’m trying to set up a Hadoop cluster but Ambari is giving me issues. At the screen where it asks me to confirm hosts, I get: 1. A warning that I’m not inputting a fully qualified domain name. 2. The host that the Ambari instance is actually sitting on is not even registering. When I run hostname –fqd
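
On a home network with no DNS, the usual workaround is to fake fully qualified names in /etc/hosts on every node and set each machine's hostname to match, so that hostname -f returns the FQDN Ambari expects; a sketch with made-up addresses and names:

# /etc/hosts, identical on all three machines
192.168.1.101  ambari01.hadoop.local  ambari01
192.168.1.102  data01.hadoop.local    data01
192.168.1.103  data02.hadoop.local    data02

# on each machine (CentOS 7 style), then verify
hostnamectl set-hostname ambari01.hadoop.local
hostname -f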

Re: Spark vs Tez

2014-10-24 Thread Adaryl "Bob" Wakefield, MBA
should be able to call a method in java from scala but can not figure out how to turn a Comparator into a Comparator[_: wrote: Yeah, compared to something as performant as java... On 10/20/2014 10:16 PM, Adaryl "Bob" Wakefield, MBA wrote: Using an interpreted scripting lang

Re: Spark vs Tez

2014-10-20 Thread Adaryl "Bob" Wakefield, MBA
Friday, October 17, 2014, Adaryl "Bob" Wakefield, MBA wrote: “The only problem with Spark adoption is the steep learning curve of Scala , and understanding the API properly.” This is why I’m looking for reasons to avoid Spark. In my mind, it’s one more thing to have to master a

Re: Spark vs Tez

2014-10-17 Thread Adaryl "Bob" Wakefield, MBA
On Fri, Oct 17, 2014 at 11:06 AM, Adaryl "Bob" Wakefield, MBA wrote: Does anybody have any performance figures on how Spark stacks up against Tez? If you don’t have figures, does anybody have an opinion? Spark seems so popular but I’m not really seeing why. B.

Re: Spark vs Tez

2014-10-17 Thread Adaryl "Bob" Wakefield, MBA
: Spark vs Tez What aspects of Tez and Spark are you comparing? They have different purposes and thus are not directly comparable, as far as I understand. Regards, Shahab On Fri, Oct 17, 2014 at 2:06 PM, Adaryl "Bob" Wakefield, MBA wrote: Does anybody have any performance figures on

Spark vs Tez

2014-10-17 Thread Adaryl "Bob" Wakefield, MBA
Does anybody have any performance figures on how Spark stacks up against Tez? If you don’t have figures, does anybody have an opinion? Spark seems so popular but I’m not really seeing why. B.

Tez and MapReduce

2014-08-31 Thread Adaryl "Bob" Wakefield, MBA
Can Tez and MapReduce live together and get along in the same cluster? B.

what do you call it when you use Tez?

2014-08-25 Thread Adaryl "Bob" Wakefield, MBA
You've got MapReduce jobs right? What is it called if, instead, you're using Tez? A Tez job? Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData

multi tenancy with cassandra

2014-08-19 Thread Adaryl "Bob" Wakefield, MBA
? I've been thinking about it like DOS. Is that an incorrect analogy? Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData

Re: Data cleansing in modern data architecture

2014-08-18 Thread Adaryl "Bob" Wakefield, MBA
messing up every SUM() function result. In the old world, it was simply a matter of going into the warehouse and blowing away those records. I think the solution we came up with is, instead of dropping that data into a file, drop it into HBASE where you can do row level deletes. Adaryl

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-13 Thread Adaryl "Bob" Wakefield, MBA
and what does what and how. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Kilaru, Sambaiah Sent: Wednesday, August 13, 2014 1:10 PM To: user@hadoop.apache.org Subject: Re: Started learning Had

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-12 Thread Adaryl "Bob" Wakefield, MBA
Is this up to date? http://www.mapr.com/products/product-overview/overview Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Aaron Eng Sent: Tuesday, August 12, 2014 4:31 PM To: user@hadoop.

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-12 Thread Adaryl "Bob" Wakefield, MBA
You fell into my trap sir. I was hoping someone would clear that up. :) Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Kai Voigt Sent: Tuesday, August 12, 2014 4:10 PM To: user@hadoop.apache.or

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-12 Thread Adaryl "Bob" Wakefield, MBA
community around it. 5. Who the heck is BigInsights? (Which should tell you something.) Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: mani kandan Sent: Tuesday, August 12, 2014 3:12 P

Re: Data cleansing in modern data architecture

2014-08-10 Thread Adaryl "Bob" Wakefield, MBA
Hadoop and the individual projects, but there is very little on how to actually manage data in Hadoop. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Bertrand Dechoux Sent: Sunday, August 10,

Re: Data cleansing in modern data architecture

2014-08-10 Thread Adaryl "Bob" Wakefield, MBA
. It’s better to just blow these records away, I’m just not certain what the best way to accomplish that is in the new world. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Sriram Ramachandrasek

Re: Data cleansing in modern data architecture

2014-08-09 Thread Adaryl "Bob" Wakefield, MBA
Or...as an alternative, since HBASE uses HDFS to store its data, can we get around the no-editing-files rule by dropping structured data into HBASE? That way, we have data in HDFS that can be deleted. Any real problem with that idea? Adaryl "Bob" Wakefield, MBA Principal Mass Street
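
For what it's worth, the row-level delete being described is straightforward from the HBase shell; a minimal sketch with a made-up table and row key:

hbase shell
deleteall 'transactions', 'order-20140809-0042'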

Re: Data cleansing in modern data architecture

2014-08-09 Thread Adaryl "Bob" Wakefield, MBA
data load up into periodic files (days, months, etc.) that can easily be rebuilt should errors occur Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Adaryl "Bob" Wakefield, MBA Sent:

Re: Data cleansing in modern data architecture

2014-08-09 Thread Adaryl "Bob" Wakefield, MBA
that their report is off/the numbers don’t look right. We investigate and find the bug in the transactional system. Question: Can we then go back into HDFS and rid ourselves of the bad records? If not, what is the recommended course of action? Adaryl "Bob" Wakefield, MBA Principal M

Re: Ecplise luna 4.4.0 and Hadoop 2.4.1

2014-08-07 Thread Adaryl "Bob" Wakefield, MBA
http://hortonworks.com/hdp/downloads/ Use the Sandbox with YouTube and lots of Google. That’s what I’m doing at least. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba From: thejas prasad Sent: Wednesday, August 06, 2014

Re: Can anyone tell me the current typical memory specification, switch size and disk space

2014-08-01 Thread Adaryl "Bob" Wakefield, MBA
The book Hadoop Operations by Eric Sammer helped answer a lot of these questions for me. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba -Original Message- From: Chris MacKenzie Sent: Friday, August 01, 2014

Re: PCA in HAdoop MapReduce

2014-07-30 Thread Adaryl "Bob" Wakefield, MBA
This may not be the right place to ask this question. I asked a more generic question about how to do predictive modeling on hadoop and nobody answered. It perplexes me as well how to take these machine learning concepts and implement them in a Map Reduce paradigm. Adaryl "Bob" Wake

How do you create predictive models in Hadoop?

2014-07-28 Thread Adaryl "Bob" Wakefield, MBA
I’ve been working with predictive models for three years now. My models have been single threaded and written against data in a non distributed environment. I’m not certain how to translate my skills to Hadoop. Mahout yes but I don’t know Java as I tend to work with Python (as do a lot of my col

Re: planning a cluster

2014-07-22 Thread Adaryl "Bob" Wakefield, MBA
Someone contacted me directly and suggested the book Hadoop Operations by Eric Sammer. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba From: YIMEN YIMGA Gael Sent: Tuesday, July 22, 2014 9:48 AM To: user@hadoop.apache.or

planning a cluster

2014-07-21 Thread Adaryl "Bob" Wakefield, MBA
What is the rule for determining how many nodes should be in your initial cluster? B.

Re: Merging small files

2014-07-20 Thread Adaryl "Bob" Wakefield, MBA
like a single line item in an invoice. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba From: Mark Kerzner Sent: Sunday, July 20, 2014 2:08 PM To: Hadoop User Subject: Re: Merging small files Bob, you don’t have to

Data cleansing in modern data architecture

2014-07-20 Thread Adaryl "Bob" Wakefield, MBA
In the old world, data cleaning used to be a large part of the data warehouse load. Now that we’re working in a schemaless environment, I’m not sure where data cleansing is supposed to take place. NoSQL sounds fun because theoretically you just drop everything in but transactional systems that

Re: Merging small files

2014-07-20 Thread Adaryl "Bob" Wakefield, MBA
Hadoop in one big file, process them, then store the results of the processing in Oracle. Source file -> Oracle -> Hadoop -> Oracle Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba From: Shashidhar Rao Sent: Sunday,

Re: Merging small files

2014-07-20 Thread Adaryl "Bob" Wakefield, MBA
“Even if we kept the discussion to the mailing list's technical Hadoop usage focus, any company/organization looking to use a distro is going to have to consider the costs, support, platform, partner ecosystem, market share, company strategy, etc.” Yeah, good point. Adaryl "Bob"

Re: Merging small files

2014-07-20 Thread Adaryl "Bob" Wakefield, MBA
=-N9i-YXoQBE&index=77&list=WL Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba From: Kilaru, Sambaiah Sent: Sunday, July 20, 2014 3:47 AM To: user@hadoop.apache.org Subject: Re: Merging small files This is not place to

what exactly does data in HDFS look like?

2014-07-18 Thread Adaryl "Bob" Wakefield, MBA
And by that I mean: is there an HDFS file type? I feel like I’m missing something. Let’s say I have a HUGE JSON file that I import into HDFS. Does it retain its JSON format in HDFS? What if it’s just random tweets I’m streaming? Is it kind of like a normal disk where there are all kinds of files
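
HDFS stores whatever bytes you hand it; there is no special HDFS file type, so a JSON file comes back exactly as it went in. A quick way to see that for yourself (the file name is made up):

hdfs dfs -put tweets.json /data/raw/
hdfs dfs -cat /data/raw/tweets.json | head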

Re: clarification on HBASE functionality

2014-07-14 Thread Adaryl "Bob" Wakefield, MBA
/book.html#arch.hdfs On Mon, Jul 14, 2014 at 2:52 PM, Adaryl "Bob" Wakefield, MBA wrote: HBASE uses HDFS to store its data, correct? B.

clarification on HBASE functionality

2014-07-14 Thread Adaryl "Bob" Wakefield, MBA
HBASE uses HDFS to store its data, correct? B.

Re: Huge text file for Hadoop Mapreduce

2014-07-07 Thread Adaryl "Bob" Wakefield, MBA
http://www.cs.cmu.edu/~./enron/ Not sure of the uncompressed size, but pretty sure it’s over a Gig. B. From: navaz Sent: Monday, July 07, 2014 6:22 PM To: user@hadoop.apache.org Subject: Huge text file for Hadoop Mapreduce Hi I am running basic word count Mapreduce code. I have downloaded a fi

define "node"

2014-07-06 Thread Adaryl "Bob" Wakefield, MBA
If you have a server with more than one hard drive is that one node or n nodes where n = the number of hard drives? B.

Re: Streaming data - Avaiable tools

2014-07-04 Thread Adaryl "Bob" Wakefield, MBA
project sponsored by ASF. Look here: http://storm.apache.org On 04/07/14 12:28, Adaryl "Bob" Wakefield, MBA wrote: Storm. It’s not a part of the Apache project but it seems to be what people are using to process event data. B. From: santosh.viswanat...@accenture.com Sent: Frida

Re: Streaming data - Avaiable tools

2014-07-04 Thread Adaryl "Bob" Wakefield, MBA
Storm. It’s not a part of the Apache project but it seems to be what people are using to process event data. B. From: santosh.viswanat...@accenture.com Sent: Friday, July 04, 2014 11:25 AM To: user@hadoop.apache.org Subject: Streaming data - Avaiable tools Hello Experts, Wanted to explore

Re: Big Data tech stack (was Spark vs. Storm)

2014-07-02 Thread Adaryl "Bob" Wakefield, MBA
at a generic stack without oversimplifying to the point of serious deficiencies. There are as you say a multitude of options. You are attempting to boil them down to A vs B as opposed to A may work better under the following conditions .. 2014-07-02 13:25 GMT-07:00 Adaryl "Bob" W

Big Data tech stack (was Spark vs. Storm)

2014-07-02 Thread Adaryl "Bob" Wakefield, MBA
constructs.) So given this, you can pick the framework which is more attuned to your needs. On Wed, Jul 2, 2014 at 3:31 PM, Adaryl "Bob" Wakefield, MBA wrote: Do these two projects do essentially the same thing? Is one better than the other?

Spark vs. Storm

2014-07-02 Thread Adaryl "Bob" Wakefield, MBA
Do these two projects do essentially the same thing? Is one better than the other?

Re: The future of MapReduce

2014-07-01 Thread Adaryl "Bob" Wakefield, MBA
fies things... Just like you can evaluate all kinds of Apache ecosystem products to meet your needs, MapReduce is no longer the only kid on the block. On Tue, Jul 1, 2014 at 3:07 PM, Adaryl "Bob" Wakefield, MBA wrote: From your answer, it sounds like you need to be able to d

Re: The future of MapReduce

2014-07-01 Thread Adaryl "Bob" Wakefield, MBA
right now". Most are looking for *real-time* fraud detection or recommendations, for example, which MapReduce is not ideal for. Marco On Tue, Jul 1, 2014 at 12:00 PM, Adaryl "Bob" Wakefield, MBA wrote: “The Mahout community decided to move its codebase onto modern data proce

The future of MapReduce

2014-07-01 Thread Adaryl "Bob" Wakefield, MBA
“The Mahout community decided to move its codebase onto modern data processing systems that offer a richer programming model and more efficient execution than Hadoop MapReduce.” Does this mean that learning MapReduce is a waste of time? Is Storm the future or are both technologies necessary? B