Re: SVM (Support Vector Machine) for Hadoop

2013-12-12 Thread Marcos Luis Ortiz Valmaseda
Regards, Felipe. Apache Mahout has included SVM as one of its core algorithms: https://cwiki.apache.org/confluence/display/MAHOUT/Support+Vector+Machines https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms 2013/12/12 Felipe Gutierrez felipe.o.gutier...@gmail.com Does anybody know

Re: Hosting Hadoop

2013-08-08 Thread Marcos Luis Ortiz Valmaseda
Well, it all depends, because many companies use cloud computing platforms like Amazon EMR, VMware, and Rackspace Cloud for Hadoop hosting: http://aws.amazon.com/elasticmapreduce http://www.vmware.com/company/news/releases/vmw-mapr-hadoop-062013.html http://bitrefinery.com/services/hadoop-hosting

Re: does RM require a lot of memory?

2013-08-08 Thread Marcos Luis Ortiz Valmaseda
Remember that in YARN, the two main responsibilities of the JobTracker are divided into two different components: - Resource management by the ResourceManager (this is a global component) - Job scheduling and monitoring by the ApplicationMaster (this is a per-application component), with the NodeManager acting as the per-node agent - Resource negotiation and task
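
As a concrete illustration of that split, here is a minimal sketch (class name and setup are mine, assuming hadoop-yarn-client is on the classpath) that uses the YARN client API to ask the global ResourceManager for the reports it aggregates from every per-node NodeManager:

    import java.util.List;

    import org.apache.hadoop.yarn.api.records.NodeReport;
    import org.apache.hadoop.yarn.api.records.NodeState;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class ClusterNodes {
      public static void main(String[] args) throws Exception {
        // The client only talks to the global ResourceManager; the RM in turn
        // aggregates the heartbeats of every per-node NodeManager.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();
        try {
          List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
          for (NodeReport node : nodes) {
            System.out.println(node.getNodeId()
                + " capacity=" + node.getCapability()
                + " used=" + node.getUsed());
          }
        } finally {
          yarnClient.stop();
        }
      }
    }

Per-job scheduling and monitoring live in the ApplicationMaster that the ResourceManager launches for each submitted application, not in this client.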

Re: Why my tests show Yarn is worse than MRv1 for terasort?

2013-06-06 Thread Marcos Luis Ortiz Valmaseda
Why not tune the configurations? Both frameworks have many areas to tune: - Combiners, shuffle optimization, block size, etc. 2013/6/6 sam liu samliuhad...@gmail.com Hi Experts, We are thinking about whether to use Yarn or not in the near future, and I ran teragen/terasort on Yarn and
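
To make those knobs concrete, here is a hedged sketch of a word-count style MR2 job with a combiner and a few shuffle/block-size settings (the values are illustrative starting points, not recommendations):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TunedWordCount {

      public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
              word.set(token);
              context.write(word, ONE);
            }
          }
        }
      }

      public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Shuffle tuning: larger map-side sort buffer, more parallel reduce-side fetchers.
        conf.setInt("mapreduce.task.io.sort.mb", 256);
        conf.setInt("mapreduce.reduce.shuffle.parallelcopies", 10);
        // Larger blocks for the files this job writes: fewer, bigger map tasks downstream.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

        Job job = Job.getInstance(conf, "tuned-wordcount");
        job.setJarByClass(TunedWordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);   // a combiner cuts the volume shuffled to reducers
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Note that terasort itself cannot benefit from a combiner; for that benchmark the shuffle buffers, parallel copies, and block size are the relevant knobs.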

Re: Why my tests show Yarn is worse than MRv1 for terasort?

2013-06-06 Thread Marcos Luis Ortiz Valmaseda
(teragen test), but worse in the reduce phase (terasort test). And any detailed suggestions/comments/materials on Yarn performance tuning? Thanks! 2013/6/7 Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com Why not tune the configurations? Both frameworks have many areas to tune

Re: Hardware Selection for Hadoop

2013-04-29 Thread Marcos Luis Ortiz Valmaseda
Regards, Raj. Knowing the data that you want to process with Hadoop is critical for this, or at least an approximation of it. I think that Hadoop Operations is an invaluable resource for this: - Hadoop uses RAM heavily, so the first resource that you have to consider is to use all available

Re: YARN - container networking and ports

2013-04-22 Thread Marcos Luis Ortiz Valmaseda
You can find a great overview of MR2 on Cloudera's blog: http://blog.cloudera.com/blog/2011/11/building-and-deploying-mr2/ http://blog.cloudera.com/blog/2012/02/mapreduce-2-0-in-hadoop-0-23/ 2013/4/22 Brian C. Huffman bhuff...@etinternational.com All, I'm working on writing code for

Re: rack awareness in hadoop

2013-04-20 Thread Marcos Luis Ortiz Valmaseda
As Aaron says, this problem is related to the Linux memory manager. You can tune it using vm.overcommit_memory=1. Before making any change, read all the resources first: http://www.thegeekstuff.com/2012/02/linux-memory-swap-cache-shared-vm/

Re: Best way to collect Hadoop logs across cluster

2013-04-18 Thread Marcos Luis Ortiz Valmaseda
When you destroy an EC2 instance, the correct behavior is to erase all data. Why don't you create a service to collect the logs directly into an S3 bucket, either in real time or in 5-minute batches? 2013/4/18 Mark Kerzner mark.kerz...@shmsoft.com Hi, my clusters are on EC2, and they disappear after
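
A minimal sketch of the batch variant (bucket name and local log path are hypothetical, and it assumes the S3 connector of that era, s3n, is configured with credentials in core-site.xml); it could be run from cron every few minutes on each node:

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LogShipper {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Credentials are expected in core-site.xml
        // (fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey).
        FileSystem s3 = FileSystem.get(URI.create("s3n://my-log-bucket/"), conf);

        Path localLogs = new Path("/var/log/hadoop");                 // log directory on this node
        Path dest = new Path("s3n://my-log-bucket/logs/" + args[0]);  // pass e.g. the instance id as a prefix
        s3.copyFromLocalFile(false, true, localLogs, dest);           // keep the local copy, overwrite remote
        s3.close();
      }
    }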

Re: Best way to collect Hadoop logs across cluster

2013-04-18 Thread Marcos Luis Ortiz Valmaseda
all logs in one place? Thank you, Mark On Thu, Apr 18, 2013 at 11:51 PM, Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com wrote: When you destroy an EC2 instance, the correct behavior is to erase all data. Why don't you create a service to collect the logs directly into an S3 bucket

Re: Mapreduce jobs to download job input from across the internet

2013-04-17 Thread Marcos Luis Ortiz Valmaseda
You can find it here: http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/ 2013/4/17 Peyman Mohajerian mohaj...@gmail.com Apache Flume may help you with this use case. I read an article on Cloudera's site about using Flume to pull tweets, and the same idea may apply here.

Re: Which hadoop installation should I use on ubuntu server?

2013-03-28 Thread Marcos Luis Ortiz Valmaseda
In Bigtop's wiki, you can find this: https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.5.0#HowtoinstallHadoopdistributionfromBigtop0.5.0-Ubuntu%2864bit%2Clucid%2Cprecise%2Cquantal%29 2013/3/28 Ted Dunning tdunn...@maprtech.com Also, Canonical

Re: ProtocolProvider errors On MRv2 Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider

2011-11-28 Thread Marcos Luis Ortiz Valmaseda
2011/11/28 Stephen Boesch java...@gmail.com: Hi, I set up a pseudo-cluster according to the instructions here http://www.cloudera.com/blog/2011/11/building-and-deploying-mr2/. Initially the randomwriter example worked. But after a crash on the machine and restarting the services I am

Re: Reduce tasks timeouts

2011-11-28 Thread Marcos Luis Ortiz Valmaseda
Can you post the output logs of your NN and DN servers here? Regards 2011/11/28 Radim Kolar h...@sendmail.cz: I have a problem with reduce tasks ending like this: Task attempt_20250441_0009_r_01_1 failed to report status for 602 seconds. Killing! Every reduce task running on a particular
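
Independently of what the logs show, the usual remedy for the "failed to report status" symptom is to report progress from inside long-running reduce() calls, so the framework does not kill the attempt after mapred.task.timeout (600000 ms by default). A hedged sketch with a placeholder computation:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SlowButAliveReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
      @Override
      protected void reduce(Text key, Iterable<LongWritable> values, Context context)
          throws IOException, InterruptedException {
        long sum = 0;
        long seen = 0;
        for (LongWritable value : values) {
          sum += value.get();            // stand-in for the real (slow) per-record work
          if (++seen % 10000 == 0) {
            // Tell the framework this task is still alive so it is not killed
            // after the task timeout expires.
            context.progress();
            context.setStatus("processed " + seen + " values for key " + key);
          }
        }
        context.write(key, new LongWritable(sum));
      }
    }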

Re: heap size problem during mapreduce

2011-11-28 Thread Marcos Luis Ortiz Valmaseda
Of course, a 32-bit OS can only address 4 GB of memory (2^32 bytes), and for that reason it's getting that error. I personally recommend: - Decrease the value of HADOOP_HEAPSIZE - Use a 64-bit Unix/Linux OS Regards 2011/11/28 Wayne Wan bestirw...@gmail.com: oh! seems your setting is out of the size
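
A quick way to confirm, from a JVM started the same way as the daemons, whether it is 32-bit and how much heap it was actually given (sun.arch.data.model is HotSpot-specific; this is a standalone diagnostic sketch, not part of Hadoop):

    public class HeapCheck {
      public static void main(String[] args) {
        // Prints "32" or "64" on HotSpot JVMs; may be null on other JVMs.
        System.out.println("JVM data model: " + System.getProperty("sun.arch.data.model") + "-bit");
        System.out.println("os.arch       : " + System.getProperty("os.arch"));
        System.out.println("max heap      : " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
      }
    }

Run it with the same -Xmx that HADOOP_HEAPSIZE translates into; on a 32-bit JVM the heap you can actually get stays well below 4 GB.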

Re: ways to expand hadoop.tmp.dir capacity?

2011-10-10 Thread Marcos Luis Ortiz Valmaseda
2011/10/9 Harsh J ha...@cloudera.com Hello Meng, On Wed, Oct 5, 2011 at 11:02 AM, Meng Mao meng...@gmail.com wrote: Currently, we've got defined: <property> <name>hadoop.tmp.dir</name> <value>/hadoop/hadoop-metadata/cache/</value> </property> In our experiments with SOLR, the
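
Because several other properties default to subdirectories of hadoop.tmp.dir, a useful first step is to check where they actually resolve on a node before spreading dfs.data.dir and mapred.local.dir over comma-separated lists of disks. A small diagnostic sketch using the 1.x-era property names from this thread:

    import org.apache.hadoop.conf.Configuration;

    public class TmpDirCheck {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource("hdfs-site.xml");    // site files, if present on the classpath
        conf.addResource("mapred-site.xml");
        // hadoop.tmp.dir is a single base path; dfs.data.dir and mapred.local.dir
        // default to subdirectories under it, so its disk fills up first.
        System.out.println("hadoop.tmp.dir   = " + conf.get("hadoop.tmp.dir"));
        System.out.println("dfs.data.dir     = " + conf.get("dfs.data.dir"));      // may be a comma-separated list
        System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));  // may be a comma-separated list
      }
    }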

Re: output from one map reduce job as the input to another map reduce job?

2011-09-27 Thread Marcos Luis Ortiz Valmaseda
Have you considered Oozie for this? It's a workflow engine developed by the Yahoo! engineers. Yahoo/oozie at GitHub https://github.com/yahoo/oozie Oozie at InfoQ http://www.infoq.com/articles/introductionOozie Oozie's examples: http://www.infoq.com/articles/oozieexample
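
Oozie is the full workflow answer; for just two jobs, the bare-bones alternative is a driver that feeds the first job's output directory to the second job as its input. A minimal sketch (mapper/reducer classes are omitted, so as written both stages run the identity map and reduce):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TwoStageDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]);   // output of stage 1, input of stage 2
        Path output = new Path(args[2]);

        Job first = Job.getInstance(conf, "stage-1");
        first.setJarByClass(TwoStageDriver.class);
        // first.setMapperClass(...); first.setReducerClass(...);  // job-specific classes go here
        FileInputFormat.addInputPath(first, input);
        FileOutputFormat.setOutputPath(first, intermediate);
        if (!first.waitForCompletion(true)) {
          System.exit(1);                        // stop the chain if stage 1 failed
        }

        Job second = Job.getInstance(conf, "stage-2");
        second.setJarByClass(TwoStageDriver.class);
        // second.setMapperClass(...); second.setReducerClass(...);
        FileInputFormat.addInputPath(second, intermediate);   // stage 1's output feeds stage 2
        FileOutputFormat.setOutputPath(second, output);
        System.exit(second.waitForCompletion(true) ? 0 : 1);
      }
    }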

Re: NoSQL to NoSQL

2011-09-26 Thread Marcos Luis Ortiz Valmaseda
Regards, Jignesh. You can start your research here: 1235 Joe Cunningham – Visa – Large scale transaction analysis; Cross Data Center Log Processing – Stu Hood, Rackspace; Data Processing for Financial Services – Peter Krey and Sin Lee, JP Morgan Chase; http://atbrox.com/tag/finance/ Next, at

Re: Error while trying to start hadoop on ubuntu lucene for the first time.

2011-08-25 Thread Marcos Luis Ortiz Valmaseda
Hello, Sean. Can you provide information on how you initialized the HDFS service? 2011/8/25, Harsh J ha...@cloudera.com: Hello Sean, Welcome to the hadoop mailing lists, and thanks for asking your question supplied with good data! Moving this to the cdh-u...@cloudera.org list as you're using