Re: Missing records from HDFS

2013-11-22 Thread ZORAIDA HIDALGO SANCHEZ
Thanks for your response, Azuryy. My Hadoop version: 2.0.0-cdh4.3.0. InputFormat: a custom class that extends FileInputFormat (CSV input format). The files are under the same directory (different files). My input path is configured using Oozie through the property mapred.input.dir. Same

RE: Any reference for upgrade hadoop from 1.x to 2.2

2013-11-22 Thread Nirmal Kumar
Hi All, I am also looking into migrating/upgrading from Apache Hadoop 1.x to Apache Hadoop 2.x. I didn't find any docs, guides, or blogs on this, although there are guides/docs for the CDH and HDP migration/upgrade from Hadoop 1.x to Hadoop 2.x. Would referring to those be of some use? I am

Re: Any reference for upgrade hadoop from 1.x to 2.2

2013-11-22 Thread Sandy Ryza
For MapReduce and YARN, we recently published a couple of blog posts on migrating: http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-users/ http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/ Hope that helps, Sandy On Fri, Nov 22, 2013 at

Unsubscribe

2013-11-22 Thread Thomas Bailet

Re: Difference between clustering and classification in hadoop

2013-11-22 Thread unmesha sreeveni
Thank you Mirko On Fri, Nov 22, 2013 at 2:11 PM, Mirko Kämpf mirko.kae...@gmail.com wrote: ... it depends on the implementation. ;-) Mahout offers both: Mahout in action http://manning.com/owen/ And is more ... http://en.wikipedia.org/wiki/Cluster_analysis

Re: HDFS upgrade problem of fsImage

2013-11-22 Thread Joshi, Rekha
Yes, I realized that and I see your point :-) However, it seems some fs inconsistency is present; did you attempt rollback/finalizeUpgrade and check? For that error, the FSImage.java code finds a previous fs state: // Upgrade is allowed only if there are // no previous fs states in any of the
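The quoted FSImage comment says an upgrade is allowed only when no storage directory still holds a previous file-system state. A simplified sketch of that rule in plain Java, assuming the usual HDFS layout in which an upgrade that was never finalized or rolled back leaves a `previous` directory next to `current` in each storage directory. The class and method names are hypothetical; this illustrates the rule and is not the FSImage implementation.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PreviousStateCheck {

    // Hypothetical helper: return the storage directories that still contain a
    // "previous" subdirectory, i.e. an upgrade that was never finalized or rolled back.
    static List<Path> dirsWithPreviousState(List<Path> storageDirs) {
        List<Path> blocking = new ArrayList<>();
        for (Path dir : storageDirs) {
            if (Files.isDirectory(dir.resolve("previous"))) {
                blocking.add(dir);
            }
        }
        return blocking;
    }

    // Refuse a new upgrade while any storage directory still holds a previous state.
    static void checkUpgradeAllowed(List<Path> storageDirs) {
        List<Path> blocking = dirsWithPreviousState(storageDirs);
        if (!blocking.isEmpty()) {
            throw new IllegalStateException(
                "previous fs state found in " + blocking + "; finalize or roll back first");
        }
    }

    public static void main(String[] args) {
        // Pass the directories to inspect as command-line arguments; nothing is modified.
        List<Path> dirs = new ArrayList<>();
        for (String arg : args) {
            dirs.add(Paths.get(arg));
        }
        System.out.println("Directories blocking an upgrade: " + dirsWithPreviousState(dirs));
    }
}
```

Pointed (read-only) at the name directories, this lists which ones would block an upgrade; the remedy is still the one suggested in the thread, rollback or finalizeUpgrade.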

Re: HDFS upgrade problem of fsImage

2013-11-22 Thread Azuryy Yu
Thanks Joshi, maybe I pasted the wrong log messages. Please look here for the real story: https://issues.apache.org/jira/browse/HDFS-5550 On Fri, Nov 22, 2013 at 6:25 PM, Joshi, Rekha rekha_jo...@intuit.com wrote: Yes realized that and I see your point :-) However seems like some fs

Re: Missing records from HDFS

2013-11-22 Thread ZORAIDA HIDALGO SANCHEZ
One more thing: if we split the files, all the records are processed. The files are 70.5 MB. Thanks, Zoraida. From: zoraida zora...@tid.es Date: Friday, 22 November 2013 08:59 To: user@hadoop.apache.org

Re: Missing records from HDFS

2013-11-22 Thread Azuryy Yu
I do think this is because of your RecordReader. Can you paste your code here and give a sample of the data? Please use pastebin if you want. On Fri, Nov 22, 2013 at 7:16 PM, ZORAIDA HIDALGO SANCHEZ zora...@tid.es wrote: One more thing, if we split the files then all the records are

Re: Missing records from HDFS

2013-11-22 Thread ZORAIDA HIDALGO SANCHEZ
Sure, our FileInputFormat implementation:

    public class CVSInputFormat extends FileInputFormat<FileValidatorDescriptor, Text> {
        /*
         * (non-Javadoc)
         *
         * @see
         * org.apache.hadoop.mapreduce.InputFormat#createRecordReader(org.apache
         *

Re: Problem sending metrics to multiple targets

2013-11-22 Thread Ivan Tretyakov
We investigated the problem and found the root cause. The Metrics2 framework uses a different config parser from the first version (Metrics2 uses Apache Commons, Metrics uses Hadoop's own). org.apache.hadoop.metrics2.sink.ganglia.AbstractGangliaSink uses commas as separators by default. So when we provide a list of

Re: Any reference for upgrade hadoop from 1.x to 2.2

2013-11-22 Thread Robert Dyer
Thanks Sandy! These seem helpful! MapReduce cluster configuration options have been split into YARN configuration options, which go in yarn-site.xml; and MapReduce configuration options, which go in mapred-site.xml. Many have been given new names to reflect the shift. ... *We’ll follow up with a
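The quoted post notes that many configuration options were renamed. A sketch of a one-off cleanup step, assuming you want to rewrite old names in your own config files: it maps a small, hand-picked subset of 1.x names to their 2.x names. The class and method names are hypothetical and the table is deliberately incomplete (the official deprecated-properties table is the authority). Hadoop 2.x generally still accepts deprecated names and logs a deprecation warning, so this is tidying rather than a requirement.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class PropertyRenames {

    // Hand-picked subset of Hadoop 1.x -> 2.x configuration renames (illustrative, not complete).
    static final Map<String, String> RENAMES = new HashMap<>();
    static {
        RENAMES.put("fs.default.name", "fs.defaultFS");
        RENAMES.put("mapred.input.dir", "mapreduce.input.fileinputformat.inputdir");
        RENAMES.put("mapred.output.dir", "mapreduce.output.fileoutputformat.outputdir");
        RENAMES.put("mapred.map.tasks", "mapreduce.job.maps");
        RENAMES.put("mapred.reduce.tasks", "mapreduce.job.reduces");
        RENAMES.put("io.sort.mb", "mapreduce.task.io.sort.mb");
    }

    // Rewrite old property names to new ones. If a config sets both the old and
    // the new name, the explicitly set new name wins.
    static Map<String, String> migrate(Map<String, String> oldConf) {
        Map<String, String> migrated = new TreeMap<>();
        // First copy every property that is not in the rename table.
        for (Map.Entry<String, String> e : oldConf.entrySet()) {
            if (!RENAMES.containsKey(e.getKey())) {
                migrated.put(e.getKey(), e.getValue());
            }
        }
        // Then add renamed properties, without overwriting explicit new-style settings.
        for (Map.Entry<String, String> e : oldConf.entrySet()) {
            String newName = RENAMES.get(e.getKey());
            if (newName != null) {
                migrated.putIfAbsent(newName, e.getValue());
            }
        }
        return migrated;
    }

    public static void main(String[] args) {
        Map<String, String> old = new HashMap<>();
        old.put("mapred.input.dir", "/user/example/input");
        old.put("mapred.reduce.tasks", "10");
        // prints {mapreduce.input.fileinputformat.inputdir=/user/example/input, mapreduce.job.reduces=10}
        System.out.println(migrate(old));
    }
}
```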

Windows - Separating etc (config) from bin

2013-11-22 Thread Ian Jackson
It would be nice if HADOOP_CONF_DIR could be set in the environment, like YARN_CONF_DIR. This could be done in lib-exec\hadoop_config.cmd by setting HADOOP_CONF_DIR conditionally:

    if not defined HADOOP_CONF_DIR (
      set HADOOP_CONF_DIR=%HADOOP_HOME%\etc\hadoop
    )

A similar change might be done in

Heterogeneous Cluster

2013-11-22 Thread Ian Jackson
Has anyone set up a heterogeneous cluster, with some Windows nodes and some Linux nodes?

Re: Difference between clustering and classification in hadoop

2013-11-22 Thread unmesha sreeveni
Thanks Devin :) That was a nice explanation. On Fri, Nov 22, 2013 at 6:20 PM, Devin Suiter RDX dsui...@rdx.com wrote: They are both for machine learning. Classification is known as supervised learning where you feed the engine data of known patterns and instruct it what are the key nodes.
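To make the supervised/unsupervised distinction concrete, a minimal 1-D sketch in plain Java (not Mahout): a nearest-centroid classifier, which needs labeled training data, next to k-means, which only gets the points. Both are simple stand-ins chosen for illustration; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class SupervisedVsUnsupervised {

    // Classification (supervised): training labels are known, so compute each
    // class's mean and assign a new point to the class with the nearest mean.
    static int nearestCentroidClassify(double[] trainX, int[] trainLabels, int numClasses, double x) {
        double[] sum = new double[numClasses];
        int[] count = new int[numClasses];
        for (int i = 0; i < trainX.length; i++) {
            sum[trainLabels[i]] += trainX[i];
            count[trainLabels[i]]++;
        }
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < numClasses; c++) {
            if (count[c] == 0) {
                continue; // no training examples for this class
            }
            double dist = Math.abs(x - sum[c] / count[c]);
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Clustering (unsupervised): no labels; k-means alternates between assigning each
    // point to its nearest centroid and moving each centroid to the mean of its points.
    static int[] kMeans(double[] xs, double[] initialCentroids, int iterations) {
        double[] centroids = initialCentroids.clone();
        int[] assignment = new int[xs.length];
        for (int it = 0; it < iterations; it++) {
            for (int i = 0; i < xs.length; i++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(xs[i] - centroids[c]) < Math.abs(xs[i] - centroids[best])) {
                        best = c;
                    }
                }
                assignment[i] = best;
            }
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            for (int i = 0; i < xs.length; i++) {
                sum[assignment[i]] += xs[i];
                count[assignment[i]]++;
            }
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] > 0) {
                    centroids[c] = sum[c] / count[c]; // empty clusters keep their old centroid
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[] xs = {1.0, 1.2, 0.8, 5.0, 5.2, 4.8};
        int[] labels = {0, 0, 0, 1, 1, 1}; // only the classifier gets to see these
        System.out.println(nearestCentroidClassify(xs, labels, 2, 4.9)); // prints 1
        System.out.println(Arrays.toString(kMeans(xs, new double[] {0.0, 6.0}, 10))); // prints [0, 0, 0, 1, 1, 1]
    }
}
```

The k-means cluster numbers are arbitrary IDs; they line up with the class labels here only because of the chosen initial centroids.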

Re: Difference between clustering and classification in hadoop

2013-11-22 Thread unmesha sreeveni
When I went through different repositories of spam data, I only found files of a few MB. To test on Hadoop we need a large file, right? I need to test my Hadoop SVM implementation. I went through http://archive.ics.uci.edu/ml/machine-learning-databases/spambase/, but the dataset is only 700KB or

Re: Missing records from HDFS

2013-11-22 Thread Azuryy Yu
There is a problem in 'initialize'. In general, we cannot treat split.start as the real start of a record, because a FileSplit does not necessarily end exactly at the end of a line. So if start is not 0, you need to adjust the start in 'initialize' to the beginning of the next line. Also, end = start +
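The split-boundary issue can be shown without Hadoop. A minimal sketch, assuming newline-terminated records and byte-offset splits: each split skips its first (possibly partial) line unless it starts at offset 0, and keeps reading while the next line starts at or before `end`, even if that line runs past `end`. Hadoop's own LineRecordReader follows roughly this convention; the class and method names here are hypothetical, and the sketch does not handle `\r\n` endings or quoted CSV fields containing newlines.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitAwareLineReader {

    // Return the lines owned by the byte range [start, end) of 'data':
    //  - a split not starting at offset 0 skips its first, possibly partial, line,
    //    because the previous split is responsible for it;
    //  - a split keeps reading while the next line *starts* at or before 'end',
    //    even if that line runs past 'end'.
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // move past the newline that ends the skipped fragment
        }
        while (pos <= end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            records.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // move past the newline
        }
        return records;
    }

    // Cut 'data' into fixed-size byte splits (boundaries ignore line breaks, as
    // FileSplits do) and concatenate what each split's reader returns.
    static List<String> readAllSplits(byte[] data, int splitSize) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            all.addAll(readSplit(data, start, end));
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] data = "1,alpha\n2,beta\n3,gamma\n".getBytes(StandardCharsets.UTF_8);
        // A 10-byte split size puts boundaries in the middle of records.
        System.out.println(readAllSplits(data, 10)); // prints [1,alpha, 2,beta, 3,gamma]
    }
}
```

Dropping either rule loses or duplicates records at split boundaries, which would fit the symptom reported in this thread: records go missing with the large files but not once the files are split into smaller ones.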

Decision Tree - Help

2013-11-22 Thread unmesha sreeveni
Can we implement a decision tree as a MapReduce job? Which algorithms can be converted into MapReduce jobs? Thanks, Unmesha
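One common way to frame the decision-tree question (a sketch, not the only answer): scoring candidate splits mostly needs counts of (attribute, value, class) triples, and collecting those counts is a word-count-style MapReduce job; a driver then picks the best split and repeats for each tree level. Below, the map and reduce steps are simulated in plain Java without Hadoop, assuming categorical attributes, information gain (ID3-style) as the split criterion, and feature values without tab characters. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DecisionTreeSplitStats {

    // "Map" step: for one labeled row, emit one key per attribute of the form
    // "attributeIndex \t attributeValue \t label" (each key implicitly carries a count of 1).
    static List<String> map(String[] features, String label) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < features.length; i++) {
            keys.add(i + "\t" + features[i] + "\t" + label);
        }
        return keys;
    }

    // "Reduce" step: sum the counts per key, as a word-count reducer would.
    static Map<String, Integer> reduce(List<String> keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Shannon entropy (base 2) of a label distribution.
    static double entropy(Map<String, Integer> labelCounts) {
        int total = 0;
        for (int c : labelCounts.values()) {
            total += c;
        }
        double h = 0.0;
        for (int c : labelCounts.values()) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Information gain of splitting on one attribute, computed only from the reduced counts.
    static double informationGain(Map<String, Integer> counts, int attribute) {
        Map<String, Map<String, Integer>> byValue = new HashMap<>();
        Map<String, Integer> parent = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String[] parts = e.getKey().split("\t");
            if (Integer.parseInt(parts[0]) != attribute) {
                continue;
            }
            byValue.computeIfAbsent(parts[1], v -> new HashMap<>()).merge(parts[2], e.getValue(), Integer::sum);
            parent.merge(parts[2], e.getValue(), Integer::sum);
        }
        int total = 0;
        for (int c : parent.values()) {
            total += c;
        }
        double weightedChildEntropy = 0.0;
        for (Map<String, Integer> child : byValue.values()) {
            int size = 0;
            for (int c : child.values()) {
                size += c;
            }
            weightedChildEntropy += (double) size / total * entropy(child);
        }
        return entropy(parent) - weightedChildEntropy;
    }

    public static void main(String[] args) {
        String[][] rows = {{"sunny", "hot"}, {"sunny", "cold"}, {"rainy", "hot"}, {"rainy", "cold"}};
        String[] labels = {"no", "no", "yes", "yes"};
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            emitted.addAll(map(rows[i], labels[i])); // in a real job, mappers run per input split
        }
        Map<String, Integer> counts = reduce(emitted);
        System.out.println("gain(attr 0) = " + informationGain(counts, 0));
        System.out.println("gain(attr 1) = " + informationGain(counts, 1));
    }
}
```

In the toy data, attribute 0 perfectly separates the labels and attribute 1 carries no information, so the driver would split on attribute 0 first.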