Re: distcp fails with "source and target differ in block-size"

2016-05-24 Thread Dmitry Sivachenko
> On 24 May 2016, at 19:53, Chris Nauroth wrote: > Hello Dmitry, > To clarify, the intent of MAPREDUCE-5065 was to message the user that using different block sizes on source and destination might cause a failure due to a checksum mismatch. The message to the user
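
When block sizes differ between source and destination, distcp itself offers the two usual workarounds: preserve the source block size, or skip the CRC comparison. A minimal sketch, assuming placeholder cluster addresses and paths:

    # Preserve the source files' block size on the destination:
    hadoop distcp -pb hdfs://src-nn:8020/data hdfs://dst-nn:8020/data
    # Or keep a custom block size and skip the checksum comparison
    # (-skipcrccheck must be combined with -update):
    hadoop distcp -D dfs.blocksize=268435456 -update -skipcrccheck \
        hdfs://src-nn:8020/data hdfs://dst-nn:8020/data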

Re: distcp fails with "source and target differ in block-size"

2016-05-22 Thread Dmitry Sivachenko
> On 21 May 2016, at 09:34, Dmitry Sivachenko <trtrmi...@gmail.com> wrote: >> On 21 May 2016, at 02:15, Chris Nauroth <cnaur...@hortonworks.com> wrote: >> Hello Dmitry, >> MAPREDUCE-5065 has been included in these branches f

Re: distcp fails with "source and target differ in block-size"

2016-05-21 Thread Dmitry Sivachenko
> On 21 May 2016, at 02:15, Chris Nauroth wrote: > Hello Dmitry, > MAPREDUCE-5065 has been included in these branches for a long time. Are you certain that you passed a dfs.blocksize equal to what was used in the source files? Did all source files use the

distcp fails with "source and target differ in block-size"

2016-05-20 Thread Dmitry Sivachenko
Hello, When I copy files with distcp and -D dfs.blocksize=XXX (hadoop-2.7.2), it fails with a "Source and target differ in block-size" error even though MAPREDUCE-5065 was committed 3 years ago. Is it possible to merge this change into the 2.7 / 2.8 branches? Thanks.
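
A minimal sketch of the kind of invocation being described, with placeholder block size and paths:

    # Copy with an explicit block size; per the report this fails with
    # "Source and target differ in block-size":
    hadoop distcp -D dfs.blocksize=536870912 \
        hdfs://src-nn:8020/user/mitya/data hdfs://dst-nn:8020/user/mitya/data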

nodemanager receives signal 15

2016-05-15 Thread Dmitry Sivachenko
Hello, I set up a hadoop 2.7.2 cluster on Ubuntu 16.04 with OpenJDK8. After running TeraGen from the examples jar: hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar teragen 100 /user/mitya/terasort-input I see that many NodeManagers are not running
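
A quick way to see which NodeManagers dropped out after such a run (log location assumes the default layout; signal 15 is SIGTERM, so whatever sent it usually shows up near the end of the NodeManager log):

    # Nodes as the ResourceManager sees them, including lost/unhealthy ones:
    yarn node -list -all
    # Tail of a NodeManager log on an affected host:
    tail -n 50 $HADOOP_HOME/logs/yarn-*-nodemanager-*.log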

Re: node remains unused after reboot

2015-09-25 Thread Dmitry Sivachenko
> On 23 Sep 2015, at 22:08, Naganarasimha Garla wrote: > Sorry for the late reply, I thought of providing you some search strings for blacklisting, hence got a little delayed. > As Varun mentioned, it looks more like an app blacklisting case.

Re: node remains unused after reboot

2015-09-23 Thread Dmitry Sivachenko
is 1000 What do these mean? > Regards, > + Naga > From: Dmitry Sivachenko [trtrmi...@gmail.com] > Sent: Wednesday, September 23, 2015 03:57 > To: user@hadoop.apache.org > Subject: node remains unused after reboot > Hello! > I am using h

node remains unused after reboot

2015-09-22 Thread Dmitry Sivachenko
Hello! I am using hadoop-2.7.1. I have a large map job running (about 3000 cores available on the cluster in total, 35000 tasks in total). In the middle of this process one server reboots. After the reboot, the nodemanager starts successfully and registers with the resource manager: 2015-09-23 01:06:24,656 INFO
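
One way to check how the ResourceManager classified the rebooted node (the RM host and node id are placeholders; 8088 is the default web UI port):

    # All nodes with their states, via the RM REST API:
    curl http://rm-host:8088/ws/v1/cluster/nodes
    # Or a single node's report via the CLI:
    yarn node -status worker-13.example.com:45454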

rolling upgrade without downtime

2015-09-02 Thread Dmitry Sivachenko
Hello! I am trying to perform a rolling upgrade of a cluster running hadoop-2.4.1 without downtime, following the procedure described at http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html The first command to execute is: % hdfs dfsadmin -rollingUpgrade
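
Per the referenced documentation, the upgrade is bracketed by prepare/finalize steps; a condensed sketch:

    hdfs dfsadmin -rollingUpgrade prepare    # create a rollback fsimage first
    hdfs dfsadmin -rollingUpgrade query      # repeat until the rollback image is ready
    # ...upgrade and restart the NameNodes one at a time, then the
    # DataNodes in small batches...
    hdfs dfsadmin -rollingUpgrade finalize   # once the new version is accepted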

Unable to build hadoop with --offline option

2015-08-25 Thread Dmitry Sivachenko
Hello! I am using the following procedure to build hadoop from sources: First, run mvn -Dmaven.repo.local=/path/to/m2 to populate the /path/to/m2 directory with the required artifacts; then I always run mvn -Dmaven.repo.local=/path/to/m2 --offline so it does not download anything during the build.
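
A minimal sketch of the two-pass build; the goals and -DskipTests are assumptions, only the repository flags come from the report:

    # Pass 1, online: populate a private local repository
    mvn -Dmaven.repo.local=/path/to/m2 clean package -DskipTests
    # Later passes: same repository, network access forbidden
    mvn -Dmaven.repo.local=/path/to/m2 --offline clean package -DskipTests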

Log Aggregation

2015-02-13 Thread Dmitry Sivachenko
Hello! I am using hadoop-2.4.1 in distributed mode. After a job completes, logs are aggregated to hdfs and are available via the history server. Sometimes logs appear very quickly after the job completes (or fails), but sometimes it takes a long time (10 to 30 minutes). During that period the history server
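
Once aggregation has finished, the logs can also be fetched directly rather than through the history server (the application id below is hypothetical):

    yarn logs -applicationId application_1423822000000_0042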

Re: Writing output from streaming task without dealing with key/value

2014-09-11 Thread Dmitry Sivachenko
After a streaming job outputs some data to stdout, some hadoop code receives it and splits it into a key/value pair before it reaches TextOutputFormat. Can anyone point me to that piece of code please? Thanks! On 11 Sep 2014, at 0:37, Dmitry Sivachenko trtrmi...@gmail.com wrote: On 10 Sep

Re: Writing output from streaming task without dealing with key/value

2014-09-11 Thread Dmitry Sivachenko
Okay, FWIW I found the solution: https://issues.apache.org/jira/browse/MAPREDUCE-6085 Thanks to all who replied. On 11 Sep 2014, at 11:16, Dmitry Sivachenko trtrmi...@gmail.com wrote: After a streaming job outputs some data to stdout, some hadoop code receives it and splits into key

Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
Hello! Imagine the following common task: I want to process a big text file line-by-line using the streaming interface. Run the Unix grep command, for instance, or some other line-by-line processing, e.g. line.upper(). I copy the file to HDFS. Then I run a map task on this file which reads one line,
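
A minimal sketch of such a map-only streaming job, with placeholder jar location, paths, and pattern:

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -D mapreduce.job.reduces=0 \
        -input /user/mitya/bigfile.txt \
        -output /user/mitya/grep-out \
        -mapper 'grep some-pattern'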

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
it in Python. On 9/10/14, Dmitry Sivachenko trtrmi...@gmail.com wrote: Hello! Imagine the following common task: I want to process a big text file line-by-line using the streaming interface. Run the Unix grep command, for instance, or some other line-by-line processing, e.g. line.upper(). I copy

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
On 10 Sep 2014, at 22:19, Rich Haase rdha...@gmail.com wrote: You can write a custom output format Any clues how this can be done? , or you can write your mapreduce job in Java and use a NullWritable as Susheel recommended. grep (and every other *nix text processing
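
Streaming does accept a pluggable format via its -outputformat option, so one route is a custom class shipped on the job classpath; the jar and class names here are hypothetical:

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -libjars raw-output.jar \
        -outputformat com.example.RawTextOutputFormat \
        -input /user/mitya/in -output /user/mitya/out \
        -mapper 'grep some-pattern'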

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
in the original line) or not. What is the proper way to work around that issue? Regards, Shahab On Wed, Sep 10, 2014 at 2:28 PM, Dmitry Sivachenko trtrmi...@gmail.com wrote: On 10 Sep 2014, at 22:19, Rich Haase rdha...@gmail.com wrote: You can write a custom output format Any

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
) { writeObject(value); } out.write(newline); } On Sep 10, 2014, at 1:37 PM, Dmitry Sivachenko trtrmi...@gmail.com wrote: On 10 Sep 2014, at 22:33, Felix Chern idry...@gmail.com wrote: Use ‘tr -s’ to strip out tabs? $ echo -e a\t\t\tb

org.apache.hadoop.io.nativeio.NativeIO: Unable to initialize NativeIO libraries

2014-06-29 Thread Dmitry Sivachenko
I am trying hadoop-1.2.1 on FreeBSD-10 (installed from ports). I see the following exception in the datanode's and tasktracker's logs: 2014-06-29 10:13:17,105 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library 2014-06-29 10:13:17,106 ERROR

hadoop-2.2: build error on FreeBSD

2014-06-17 Thread Dmitry Sivachenko
Hello! FreeBSD does not need -ldl when linking programs that use dlopen() (dlopen is in libc). Now I am getting the following error trying to compile hadoop-2.2.0 on FreeBSD: [exec] /usr/bin/cc -fPIC -g -Wall -O2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
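
The claim about dlopen() is easy to verify with a throwaway test program (any shared object will do):

    cat > dltest.c <<'EOF'
    #include <dlfcn.h>
    int main(void) { return dlopen("libm.so", RTLD_NOW) == NULL; }
    EOF
    cc dltest.c -o dltest && ./dltest   # on FreeBSD this links and runs without -ldl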

Idle tasktracker eats CPU

2014-06-09 Thread Dmitry Sivachenko
Hello! I set up hadoop-1.2.1 on FreeBSD-10/stable with openjdk version 1.7.0_60. At first glance it is doing well, except for one annoying thing: after executing some tasks, the tasktracker process starts to eat CPU when idle. Sometimes it is 10-20% (numbers from top(1) output), sometimes it is
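
One way to pin down what the idle process is spinning on (the pid is a placeholder):

    jstack 12345 > tt-threads.txt   # Java-side thread dump of the TaskTracker
    top -H                          # FreeBSD: per-thread view, to spot the busy thread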

Re: Idle tasktracker eats CPU

2014-06-09 Thread Dmitry Sivachenko
, at 2:15, Dmitry Sivachenko trtrmi...@gmail.com wrote: Hello! I set up hadoop-1.2.1 on FreeBSD-10/stable with openjdk version 1.7.0_60. At first glance it is doing well, except for one annoying thing: after executing some tasks, the tasktracker process starts to eat CPU when idle. Sometimes