Re: Force one mapper per machine (not core)?

2014-01-28 Thread Amr Shahin
(In theory this should work.) Find the part of the Hadoop code that calculates the number of cores and patch it to always return one. On Wed, Jan 29, 2014 at 3:41 AM, Keith Wiley wrote: > Yeah, it isn't, not even remotely, but thanks. > > On Jan 28, 2014, at 14:06 , Bryan Beaudreault wrote: > >

How to install from downloaded tarball from hadoop.2.2.0.tar.gz

2014-01-28 Thread Bill Bruns
Hello, I downloaded the latest stable hadoop release from the mirrors as a tarball: hadoop.2.2.0.tar.gz Then extracted the files with Archive Manager (on Ubuntu 12.10) There are no install docs in the top level and no documentation directory. Then, the "Getting Started" links on http://hadoop.a

Reducers are launched after jobClient is exited.

2014-01-28 Thread Rohith Sharma K S
Hi All, I ran a job with 1 map and 1 reducer ( mapreduce.job.reduce.slowstart.completedmaps=1 ). The map failed ( because of an error in the Mapper implementation), but the reducers are still launched by the ApplicationMaster. These reducers are killed by the ApplicationMaster while stopping RMCommunicato

Re: performance of "hadoop fs -put"

2014-01-28 Thread Harsh J
Are you calling one command per file? That's bound to be slow as it invokes a new JVM each time. On Jan 29, 2014 7:15 AM, "Jay Vyas" wrote: > I'm finding that "hadoop fs -put" on a cluster is quite slow for me when I > have large amounts of small files... much slower than native file ops. > Note t
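Batching sources into a single invocation amortizes the JVM startup cost Harsh describes. A minimal sketch (paths are hypothetical; `-put` accepts multiple sources when the destination is a directory):

```shell
# Slow: launches a fresh JVM for every file
for f in /data/incoming/*.log; do
  hadoop fs -put "$f" /archive/
done

# Faster: one JVM launch for all the files
hadoop fs -put /data/incoming/*.log /archive/

# Or upload the whole directory in one invocation
hadoop fs -put /data/incoming /archive/incoming
```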

Re: BlockMissingException reading HDFS file, but the block exists and fsck shows OK

2014-01-28 Thread Peyman Mohajerian
maybe it's inode exhaustion: 'df -i' command can tell you more. On Mon, Jan 27, 2014 at 12:00 PM, John Lilley wrote: > I've found that the error occurs right around a threshold where 20 tasks > attempt to open 220 files each. This is ... slightly over 4k total files > open. > > But that's the t
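The `df -i` check Peyman suggests reports inode usage rather than block usage; a filesystem can refuse to create new files once IUse% hits 100% even though `df -h` still shows free space:

```shell
# Inode usage for the root filesystem; exhaustion shows up as
# IUse% near 100% even when block space is free
df -i /
```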

performance of "hadoop fs -put"

2014-01-28 Thread Jay Vyas
I'm finding that "hadoop fs -put" on a cluster is quite slow for me when I have large amounts of small files... much slower than native file ops. Note that I'm using the RawLocalFileSystem as the underlying backing filesystem that is being written to in this case, so HDFS isn't the issue. I see that

Re: Force one mapper per machine (not core)?

2014-01-28 Thread Keith Wiley
Yeah, it isn't, not even remotely, but thanks. On Jan 28, 2014, at 14:06 , Bryan Beaudreault wrote: > If this cluster is being used exclusively for this goal, you could just set > the mapred.tasktracker.map.tasks.maximum to 1. > > > On Tue, Jan 28, 2014 at 5:00 PM, Keith Wiley wrote: > I'm ru

Re: HDFS question

2014-01-28 Thread Ognen Duzlevski
OK - I set up a ResourceManager node with a bunch of NodeManager slaves. The setup is as follows: HDFS: machine X is a Name node, it has 16 slaves (IPs: x.x.x.200-215) Resources: machine Y is a Resource manager, it has 16 of the same slaves (IPs: x.x.x.200-215) as Node manager slaves. If I sta

Re: Force one mapper per machine (not core)?

2014-01-28 Thread Bryan Beaudreault
If this cluster is being used exclusively for this goal, you could just set the mapred.tasktracker.map.tasks.maximum to 1. On Tue, Jan 28, 2014 at 5:00 PM, Keith Wiley wrote: > I'm running a program which in the streaming layer automatically > multithreads and does so by automatically detecting
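On an MRv1 cluster, Bryan's suggestion would be a `mapred-site.xml` entry on each worker (a sketch; the TaskTrackers need a restart to pick it up):

```xml
<!-- mapred-site.xml on each TaskTracker:
     allow at most one concurrent map task per machine -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```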

Force one mapper per machine (not core)?

2014-01-28 Thread Keith Wiley
I'm running a program which in the streaming layer automatically multithreads and does so by automatically detecting the number of cores on the machine. I realize this model is somewhat in conflict with Hadoop, but nonetheless, that's what I'm doing. Thus, for even resource utilization, it wou

Re: Hadoop-2.2.0 and Pig-0.12.0 - error "IBM_JAVA"

2014-01-28 Thread Viswanathan J
Hi Serge, I'm using Apache hadoop distribution. On Jan 29, 2014 12:54 AM, "Serge Blazhievsky" wrote: > Which hadoop distribution are you using? > > > On Tue, Jan 28, 2014 at 10:04 AM, Viswanathan J < > jayamviswanat...@gmail.com> wrote: > >> Hi Guys, >> >> I'm running hadoop 2.2.0 version with p

Re: HDFS Federation address performance issue

2014-01-28 Thread Suresh Srinivas
Response inline... On Tue, Jan 28, 2014 at 10:04 AM, Anfernee Xu wrote: > Hi, > > Based on > http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html#Key_Benefits, > the overall performance can be improved by federation, but I'm not sure > federation address my userc

Re: Hadoop-2.2.0 and Pig-0.12.0 - error "IBM_JAVA"

2014-01-28 Thread Jay Vyas
Thanks for sharing this, as we had the same problem and we are playing with similar errors. Starting to think that there is something overly difficult about pig/hadoop 2.x deployment, related to which version of pig you use. Cheolsoo has helped us resolve our issue by pointing us to https://iss

Re: Configuring hadoop 2.2.0

2014-01-28 Thread Ognen Duzlevski
Furthermore, what is the difference between a ResourceManager node and a NodeManager node? Ognen On Tue, Jan 28, 2014 at 1:22 PM, Ognen Duzlevski wrote: > Hello, > > I have set up an HDFS cluster by running a name node and a bunch of data > nodes. I ran into a problem where the files are only st

Re: Hadoop-2.2.0 and Pig-0.12.0 - error "IBM_JAVA"

2014-01-28 Thread Serge Blazhievsky
Which hadoop distribution are you using? On Tue, Jan 28, 2014 at 10:04 AM, Viswanathan J wrote: > Hi Guys, > > I'm running hadoop 2.2.0 version with pig-0.12.0, when I'm trying to run > any job getting the error as below, > > *java.lang.NoSuchFieldError: IBM_JAVA* > > Is this because of Java ver

Configuring hadoop 2.2.0

2014-01-28 Thread Ognen Duzlevski
Hello, I have set up an HDFS cluster by running a name node and a bunch of data nodes. I ran into a problem where the files are only stored on the node that uses the hdfs command and was told that this is because I do not have a job tracker and task nodes set up. However, the documentation for 2.
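For Hadoop 2.2.0 specifically, there is no JobTracker to set up: YARN's ResourceManager and per-node NodeManagers take over that role. A minimal `yarn-site.xml` sketch (the hostname is a placeholder) that lets MapReduce jobs run on the cluster:

```xml
<!-- yarn-site.xml on every node: where the ResourceManager lives -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm.example.com</value> <!-- placeholder hostname -->
</property>
<!-- auxiliary service the MapReduce shuffle needs under YARN -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
```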

Re: HDFS Federation address performance issue

2014-01-28 Thread Anfernee Xu
Thanks Daryn, I just want to confirm I can get performance improvement if I go with federation before I start the effort(I have to re-design my data schema so that they can have different namespace). On Tue, Jan 28, 2014 at 10:53 AM, Daryn Sharp wrote: > Hi Anfernee, > > You will achieve imp

Re: Starting... -help needed

2014-01-28 Thread Thomas Bentsen
Anything on this? I am pretty stuck here. _Not_ possible to install and run Hadoop 2.2.0 with the instructions on the website. I am sure this is not how it's supposed to be with SW from the Apache Software Foundation: frustrating! Where is the 'MapReduce tarball' in the binary download? It's me

Re: HDFS Federation address performance issue

2014-01-28 Thread Daryn Sharp
Hi Anfernee, You will achieve improved performance with federation only if you stripe files across the multiple NNs. Federation basically shares DN storage with multiple NNs with the expectation the namespace load will be distributed across the multiple NNs. If everything writes to the exact
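One common way to do the striping Daryn describes is a client-side mount table (ViewFs), so different top-level directories land on different NameNodes. A sketch with hypothetical hostnames and paths:

```xml
<!-- core-site.xml on clients: namespace striped across two NNs -->
<property>
  <name>fs.defaultFS</name>
  <value>viewfs:///</value>
</property>
<property>
  <name>fs.viewfs.mounttable.default.link./logs</name>
  <value>hdfs://nn1.example.com:8020/logs</value>
</property>
<property>
  <name>fs.viewfs.mounttable.default.link./tmp</name>
  <value>hdfs://nn2.example.com:8020/tmp</value>
</property>
```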

HDFS copyToLocal and get crc option

2014-01-28 Thread Tom Brown
I am archiving a large amount of data out of my HDFS file system to a separate shared storage solution (There is not much HDFS space left in my cluster, and upgrading it is not an option right now). I understand that HDFS internally manages checksums and won't succeed if the data doesn't match the
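HDFS's own CRCs only cover data while it lives in HDFS; once files land on the external storage, a separate integrity check is needed. A generic post-copy verification sketch (plain Python, assuming the files have already been pulled to local disk, e.g. with `hadoop fs -get`):

```python
import hashlib
import shutil

def md5sum(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives don't load into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_verified(src, dst):
    """Copy src to dst and fail loudly if the bytes differ."""
    shutil.copyfile(src, dst)
    if md5sum(src) != md5sum(dst):
        raise IOError("checksum mismatch copying %s -> %s" % (src, dst))
    return dst
```

Storing the digest alongside each archived file also allows re-verification later without the original.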

Re: Suggestion technology/design on this usecase

2014-01-28 Thread Peyman Mohajerian
This is what a friend of mine that knows elastic search had to say about this: o Their tagcombinations are no different than say a category or similar grouping for data o A search can then be executed on the index using a mixture of search functions § Search on index for the tags category

Hadoop-2.2.0 and Pig-0.12.0 - error "IBM_JAVA"

2014-01-28 Thread Viswanathan J
Hi Guys, I'm running hadoop 2.2.0 version with pig-0.12.0, when I'm trying to run any job I'm getting the error below, *java.lang.NoSuchFieldError: IBM_JAVA* Is this because of the Java version, or a compatibility issue between hadoop and pig? I'm using Java version - *1.6.0_31* Please help me out. -- R
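The `IBM_JAVA` field lives in Hadoop 2's `org.apache.hadoop.util.PlatformName`, so this error usually means Pig is running with classes compiled against Hadoop 1.x. A commonly suggested fix (untested here) is rebuilding Pig from source against the Hadoop 2 line:

```shell
# From the Pig 0.12.0 source tree: build against Hadoop 2.x APIs
ant clean jar-withouthadoop -Dhadoopversion=23
# then put the resulting pig-0.12.0-withouthadoop.jar on the job classpath
```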

HDFS Federation address performance issue

2014-01-28 Thread Anfernee Xu
Hi, Based on http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html#Key_Benefits, the overall performance can be improved by federation, but I'm not sure federation addresses my use case, could someone elaborate? My use case is I have one single NM and several DN, a

Re: Suggestion technology/design on this usecase

2014-01-28 Thread Naresh Yadav
I had tried Cassandra; that attempt was not convincing, but I did not use distributed counters... I actually needed tagcombination ids in the output, not the number of matches, for the given set of tags. Please illustrate your thought a little by taking my tagcombination table design. On Tue, Jan 28, 2

Re: Suggestion technology/design on this usecase

2014-01-28 Thread Peyman Mohajerian
No-sql solution with real-time counters would work, e.g. Cassandra or hbase. But I think elastic search or Solr would be simpler and can do the counting on access. There are solutions that are the combination of both these approaches. On Tue, Jan 28, 2014 at 8:51 AM, Naresh Yadav wrote: > pleas

Re: HDFS question

2014-01-28 Thread Ognen Duzlevski
There is a lesson in this by the way, I just realized I pasted my access/secret access key to the bucket in the public email. DOH, changed ;) Ognen On Tue, Jan 28, 2014 at 10:55 AM, Ognen Duzlevski wrote: > Ahh. No, I do not have a job tracker. OK - I guess I need to set one up :) > > Thanks! >

Re: HDFS question

2014-01-28 Thread Ognen Duzlevski
Ahh. No, I do not have a job tracker. OK - I guess I need to set one up :) Thanks! Ognen On Tue, Jan 28, 2014 at 10:51 AM, Bryan Beaudreault < bbeaudrea...@hubspot.com> wrote: > Do you have a jobtracker? Without a jobtracker and tasktrackers, distcp > is running in LocalRunner mode. I.E. it i

Re: Suggestion technology/design on this usecase

2014-01-28 Thread Naresh Yadav
please give suggestions on this... On Tue, Jan 28, 2014 at 3:18 PM, Naresh Yadav wrote: > Hi all, > > I am new to big data technologies and design so looking for help from java > world. > > I have concept of tags and tagcombinations. > For example U.S.A and Pen are two tags AND if they come tog

Re: HDFS question

2014-01-28 Thread Bryan Beaudreault
Do you have a jobtracker? Without a jobtracker and tasktrackers, distcp is running in LocalRunner mode, i.e. it is running a single-threaded process on the local machine. The default behavior of the DFSClient is to write data locally first, with replicas being placed off-rack then on-rack. This
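On Hadoop 2.x the equivalent of "having a jobtracker" is running MapReduce (and therefore distcp) on YARN instead of the LocalJobRunner. The relevant `mapred-site.xml` setting, assuming a ResourceManager is already up:

```xml
<!-- mapred-site.xml: submit jobs to YARN rather than running them
     as a single-threaded local process -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```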

HDFS question

2014-01-28 Thread Ognen Duzlevski
Hello, I am new to Hadoop and HDFS so maybe I am not understanding things properly but I have the following issue: I have set up a name node and a bunch of data nodes for HDFS. Each node contributes 1.6TB of space so the total space shown on the hdfs web front end is about 25TB. I have set the re

Re: memory management module of Namenode

2014-01-28 Thread Harsh J
Hi, The central class is FSNamesystem.java downwards. I'd advise drilling down from the NameNodeRpcServer sources at https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java, for any client operat

memory management module of Namenode

2014-01-28 Thread Chetan Agrawal
I read that the namenode keeps all the metadata records in main memory for fast access. I want to study the code where the namenode is instructed to do so, i.e. to keep metadata records in main memory. Where can I find the source code file for this namenode memory management? I am using githu

Suggestion technology/design on this usecase

2014-01-28 Thread Naresh Yadav
Hi all, I am new to big data technologies and design so looking for help from java world. I have concept of tags and tagcombinations. For example U.S.A and Pen are two tags AND if they come together in some definition then register a tagcombination(U.S.A-Pen) for that.. *tags *(U.S.A, Pen, Penci
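For the lookup side of this use case (given a set of tags, find the tagcombination ids containing them all), a plain inverted index is often enough before reaching for a big-data stack. A small Python sketch with sample data modeled on the thread (ids and tags are hypothetical):

```python
from collections import defaultdict

# Hypothetical sample data: tagcombination id -> member tags
TAGCOMBINATIONS = {
    "TC1": {"U.S.A", "Pen"},
    "TC2": {"U.S.A", "Pencil"},
    "TC3": {"India", "Pen"},
}

def build_index(combinations):
    """Inverted index: tag -> set of tagcombination ids containing it."""
    index = defaultdict(set)
    for combo_id, tags in combinations.items():
        for tag in tags:
            index[tag].add(combo_id)
    return index

def combos_with_all(index, tags):
    """Ids of tagcombinations that contain every tag in `tags`."""
    sets = [index.get(t, set()) for t in tags]
    return set.intersection(*sets) if sets else set()
```

The same shape maps naturally onto a search engine later: each tagcombination becomes a document and the query is a boolean AND over tag terms.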