Regards, Felipe. Apache Mahout includes SVM as one of its core
algorithms.
https://cwiki.apache.org/confluence/display/MAHOUT/Support+Vector+Machines
https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
2013/12/12 Felipe Gutierrez felipe.o.gutier...@gmail.com
Does anybody know
Well, it all depends, because many companies use cloud computing
platforms like Amazon EMR, VMware, and Rackspace Cloud for Hadoop
hosting:
http://aws.amazon.com/elasticmapreduce
http://www.vmware.com/company/news/releases/vmw-mapr-hadoop-062013.html
http://bitrefinery.com/services/hadoop-hosting
Remember that in YARN, the two main responsibilities of the JobTracker are
divided between two different components:
- Resource management by the ResourceManager (this is a global component)
- Job scheduling and monitoring by the ApplicationMaster (this is a
per-application component, not the per-node NodeManager)
- Resource negotiation (done by the ApplicationMaster) and task execution
in containers (done by the NodeManager, a per-node component); see the
sketch after this list
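A small sketch of what that split means for a job client, which only needs
to reach the global ResourceManager; the hostname and port here are
assumptions, not values from this thread:

import org.apache.hadoop.conf.Configuration;

public class YarnClientSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Run MapReduce jobs on YARN rather than the classic JobTracker
        conf.set("mapreduce.framework.name", "yarn");
        // Address of the single, global ResourceManager ("rm-host" is assumed)
        conf.set("yarn.resourcemanager.address", "rm-host:8032");
        // Nothing per-node to configure here: NodeManagers register with the
        // ResourceManager themselves, and the per-application
        // ApplicationMaster is launched by the framework on demand.
        System.out.println(conf.get("yarn.resourcemanager.address"));
    }
}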
Why not tune the configurations?
Both frameworks have many areas to tune:
- Combiners, shuffle optimization, block size, etc. (for instance, the
sketch below)
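A hedged sketch of a job driver that touches all three areas at once; the
property values are illustrative, not recommendations:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TunedWordCount {
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                ctx.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : vals) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Shuffle tuning: map-side sort buffer and reduce-side input buffer
        conf.setInt("mapreduce.task.io.sort.mb", 256);
        conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.70f);
        // Block size for output files: 128 MB
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);

        Job job = Job.getInstance(conf, "tuned wordcount");
        job.setJarByClass(TunedWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // Combiner pre-aggregates map output, shrinking the shuffle
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}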
2013/6/6 sam liu samliuhad...@gmail.com
Hi Experts,
We are thinking about whether to use Yarn or not in the near future. I
ran teragen/terasort on Yarn, and it did better on the Map phase
(teragen test), but worse on the Reduce phase (terasort test).
Any detailed suggestions/comments/materials on Yarn performance
tuning?
Thanks!
Regards, Raj. Knowing the data that you want to process with Hadoop is
critical for this, at least an approximation of the data. I think that
Hadoop Operations is an invaluable resource for this:
- Hadoop uses RAM heavily, so the first resource that you have to consider
is to use all available RAM (a sketch of the memory-related knobs follows)
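A minimal sketch, assuming MR2-era property names; the sizes are
illustrative and should be fitted to your nodes, not copied:

import org.apache.hadoop.conf.Configuration;

public class MemorySizing {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Per-task container sizes in MB (illustrative values)
        conf.setInt("mapreduce.map.memory.mb", 1536);
        conf.setInt("mapreduce.reduce.memory.mb", 3072);
        // Keep the JVM heap below the container size to leave headroom
        conf.set("mapreduce.map.java.opts", "-Xmx1280m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx2560m");
        System.out.println("map container MB = "
                + conf.getInt("mapreduce.map.memory.mb", -1));
    }
}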
You can find a great overview of MR2 on Cloudera's blog:
http://blog.cloudera.com/blog/2011/11/building-and-deploying-mr2/
http://blog.cloudera.com/blog/2012/02/mapreduce-2-0-in-hadoop-0-23/
2013/4/22 Brian C. Huffman bhuff...@etinternational.com
All,
I'm working on writing code for
Like Aaron says, this problem is related to the Linux memory manager.
You can tune it by setting vm.overcommit_memory=1.
Before making any change, read all the resources first (a small check
sketch follows the link):
http://www.thegeekstuff.com/2012/02/linux-memory-swap-cache-shared-vm/
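A tiny check sketch, assuming a Linux /proc filesystem; it only reads the
knob, since changing it requires root (e.g. sysctl -w vm.overcommit_memory=1):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OvercommitCheck {
    public static void main(String[] args) throws Exception {
        // 0 = heuristic (default), 1 = always overcommit, 2 = strict accounting
        Path knob = Paths.get("/proc/sys/vm/overcommit_memory");
        String current = new String(Files.readAllBytes(knob)).trim();
        System.out.println("vm.overcommit_memory = " + current);
    }
}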
When you destroy an EC2 instance, the correct behavior is to erase all
data.
Why don't you create a service that collects the logs directly into an S3
bucket, in real time or in 5-minute batches?
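One way to sketch that collector with the AWS SDK for Java; the bucket
name and log path below are hypothetical:

import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class LogShipper {
    public static void main(String[] args) {
        // Credentials come from the default provider chain
        // (environment variables, instance profile, etc.)
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        File log = new File("/var/log/hadoop/hadoop-namenode.log");
        // Key each upload by timestamp so 5-minute batches don't collide
        String key = "logs/" + System.currentTimeMillis() + "/" + log.getName();
        s3.putObject("my-hadoop-logs", key, log);
    }
}

Run it from cron every five minutes and the logs survive instance
termination.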
2013/4/18 Mark Kerzner mark.kerz...@shmsoft.com
Hi,
my clusters are on EC2, and they disappear after
all logs in one place?
Thank you,
Mark
You can find it here:
http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/
2013/4/17 Peyman Mohajerian mohaj...@gmail.com
Apache Flume may help you with this use case. I read an article on
Cloudera's site about using Flume to pull tweets, and the same idea may
apply here.
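If you go the Flume route, data is pushed into an agent as events; a
hedged sketch using Flume's RPC client API, where the host, port, and the
agent's Avro source are assumptions:

import java.nio.charset.StandardCharsets;
import org.apache.flume.Event;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeSender {
    public static void main(String[] args) throws Exception {
        // Assumes a Flume agent with an Avro source on flume-host:41414
        RpcClient client = RpcClientFactory.getDefaultInstance("flume-host", 41414);
        Event event = EventBuilder.withBody("one collected record",
                StandardCharsets.UTF_8);
        client.append(event);  // throws EventDeliveryException on failure
        client.close();
    }
}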
In BigTop's wiki, you can find this:
https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.5.0#HowtoinstallHadoopdistributionfromBigtop0.5.0-Ubuntu%2864bit%2Clucid%2Cprecise%2Cquantal%29
2013/3/28 Ted Dunning tdunn...@maprtech.com
Also, Canonical
2011/11/28 Stephen Boesch java...@gmail.com:
Hi
I set up a pseudo cluster according to the instructions here
http://www.cloudera.com/blog/2011/11/building-and-deploying-mr2/.
Initially the randomwriter example worked. But after a crash on the machine
and restarting the services I am
Can you post the output logs of your NN and DN servers here?
Regards
2011/11/28 Radim Kolar h...@sendmail.cz:
I have a problem with reduce tasks ending like this:
Task attempt_20250441_0009_r_01_1 failed to report status for 602
seconds. Killing!
Every reduce task running on a particular
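That log line usually means the reduce attempt sent no heartbeat and did
no I/O for longer than mapred.task.timeout (600 s by default, hence the
"602 seconds"). A hedged sketch of the usual remedy, reporting progress
from inside long-running reduce work; alternatively, raise
mapred.task.timeout (in milliseconds) if the silence is legitimate:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ProgressReportingReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
            // Heartbeat: without this (or regular writes), an attempt doing
            // long silent work is killed once the task timeout elapses
            context.progress();
        }
        context.write(key, new IntWritable((int) sum));
    }
}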
Of course, a 32-bit OS can handle only 4 GB of memory (2^32 bytes), and
for that reason it's getting that error.
I personally recommend:
- Decrease the value of HADOOP_HEAPSIZE (set in conf/hadoop-env.sh)
- Use a 64-bit Unix/Linux OS
Regards
2011/11/28 Wayne Wan bestirw...@gmail.com:
Oh! It seems your setting is beyond the addressable size.
2011/10/9 Harsh J ha...@cloudera.com
Hello Meng,
On Wed, Oct 5, 2011 at 11:02 AM, Meng Mao meng...@gmail.com wrote:
Currently, we've got defined:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop/hadoop-metadata/cache/</value>
</property>
In our experiments with SOLR, the
Have you considered Oozie for this? It's a workflow engine developed by
Yahoo! engineers (a small client sketch follows the links below):
Yahoo/oozie at GitHub
https://github.com/yahoo/oozie
Oozie at InfoQ
http://www.infoq.com/articles/introductionOozie
Oozie´s examples:
http://www.infoq.com/articles/oozieexample
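A minimal sketch of driving Oozie from Java with its client API; the
server URL and the HDFS application path are hypothetical:

import java.util.Properties;
import org.apache.oozie.client.OozieClient;

public class SubmitWorkflow {
    public static void main(String[] args) throws Exception {
        // Oozie server URL is assumed; 11000 is the usual default port
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");
        Properties conf = oozie.createConfiguration();
        // HDFS path holding workflow.xml (hypothetical)
        conf.setProperty(OozieClient.APP_PATH, "hdfs://nn:8020/user/me/wf-app");
        conf.setProperty("nameNode", "hdfs://nn:8020");
        String jobId = oozie.run(conf);  // submits and starts the workflow
        System.out.println("Workflow id: " + jobId);
    }
}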
Regards, Jignesh
You can start your research here:
- 1235: Joe Cunningham – Visa – Large scale transaction analysis
- Cross Data Center Log Processing – Stu Hood, Rackspace
- Data Processing for Financial Services – Peter Krey and Sin Lee, JP Morgan
Chase
- http://atbrox.com/tag/finance/
Next, at
Hello Sean, can you provide information on how you initialize the
HDFS service?
2011/8/25, Harsh J ha...@cloudera.com:
Hello Sean,
Welcome to the hadoop mailing lists, and thanks for asking your
question supplied with good data!
Moving this to cdh-u...@cloudera.org list as you're using