RE: Folder not created using Hadoop Mapreduce code

2013-11-14 Thread java8964 java8964
Maybe just a silly guess, did you close your Writer? Yong Date: Thu, 14 Nov 2013 12:47:13 +0530 Subject: Re: Folder not created using Hadoop Mapreduce code From: unmeshab...@gmail.com To: user@hadoop.apache.org @rab ra: yes, using filesystem's mkdir() we can create folders and we can also create i
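The reply above hints that an unclosed writer is a common cause of "missing" output, and the quoted question mentions creating folders with FileSystem's mkdir(). A minimal sketch of both points, assuming a hypothetical output path:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MkdirExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Create the target folder; mkdirs() also creates missing parents.
        Path dir = new Path("/tmp/my-output-folder");   // hypothetical path
        fs.mkdirs(dir);

        // Write a file inside it and close the stream; an unclosed writer can
        // leave the file empty or missing.
        FSDataOutputStream out = fs.create(new Path(dir, "part-00000"));
        try {
            out.writeUTF("hello");
        } finally {
            out.close();   // the point of the reply: always close your writer
        }
    }
}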

RE: Why the reducer's input group count is higher than my GroupComparator implementation

2013-10-30 Thread java8964 java8964
. Date: Tue, 29 Oct 2013 08:57:32 +0100 Subject: Re: Why the reducer's input group count is higher than my GroupComparator implementation From: drdwi...@gmail.com To: user@hadoop.apache.org Did you overwrite the partitioner as well? 2013/10/29 java8964 java8964 Hi, I have a stran

RE: Why the reducer's input group count is higher than my GroupComparator implementation

2013-10-29 Thread java8964 java8964
than 11. Date: Tue, 29 Oct 2013 08:57:32 +0100 Subject: Re: Why the reducer's input group count is higher than my GroupComparator implementation From: drdwi...@gmail.com To: user@hadoop.apache.org Did you overwrite the partitioner as well? 2013/10/29 java8964 java8964 Hi, I have

Why the reducer's input group count is higher than my GroupComparator implementation

2013-10-28 Thread java8964 java8964
Hi, I have a strange question related to my secondary sort implementation in the MR job. Currently I need to support a 2nd sort in one of my MR jobs. I implemented my custom WritableComparable like the following: public class MyPartitionKey implements WritableComparable { String type; long id1;
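The preview cuts the key class off; here is a hedged reconstruction of what such a composite key typically looks like (only the fields type and id1 appear in the preview, the timestamp field and the ordering are assumptions):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Hypothetical composite key for secondary sort: partition/group on (type, id1),
// sort additionally on a timestamp.
public class MyPartitionKey implements WritableComparable<MyPartitionKey> {
    String type;
    long id1;
    long timestamp;   // assumed secondary-sort field, not shown in the preview

    public void write(DataOutput out) throws IOException {
        out.writeUTF(type);
        out.writeLong(id1);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        type = in.readUTF();
        id1 = in.readLong();
        timestamp = in.readLong();
    }

    public int compareTo(MyPartitionKey o) {
        int c = type.compareTo(o.type);
        if (c != 0) return c;
        c = Long.compare(id1, o.id1);
        if (c != 0) return c;
        return Long.compare(timestamp, o.timestamp);
    }
}

As the replies above point out, the key alone is not enough: the job also needs a Partitioner and a grouping comparator that look only at (type, id1), otherwise the reducer's input group count will not match what the grouping comparator considers a group.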

RE: Mapreduce outputs to a different cluster?

2013-10-26 Thread java8964 java8964
has url "hdfs://machine.domain:8080" and data folder "/tmp/myfolder", what should I specify as the output path for MR job? Thanks On Thursday, October 24, 2013 5:31 PM, java8964 java8964 wrote: Just specify the output location using the URI to another cluster. As long as the
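A minimal sketch of what "specify the output location using the URI to another cluster" looks like in a job driver; the host, port, and paths below are placeholders, and note that the URI should point at the remote cluster's NameNode RPC port (commonly 8020 or 9000 in that era), not an HTTP port:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CrossClusterOutput {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "cross-cluster-output");
        job.setJarByClass(CrossClusterOutput.class);
        // Input stays on the local cluster (the default fs).
        FileInputFormat.addInputPath(job, new Path("/user/me/input"));        // hypothetical
        // Output goes to the other cluster: use the full hdfs:// URI of its NameNode.
        FileOutputFormat.setOutputPath(job,
                new Path("hdfs://machine.domain:8020/tmp/myfolder/output"));  // placeholders
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}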

RE: Mapreduce outputs to a different cluster?

2013-10-24 Thread java8964 java8964
Just specify the output location using the URI to another cluster. As long as the network is accessible, you should be fine. Yong Date: Thu, 24 Oct 2013 15:28:27 -0700 From: myx...@yahoo.com Subject: Mapreduce outputs to a different cluster? To: user@hadoop.apache.org The scenario is: I run mapr

RE: enable snappy on hadoop 1.1.1

2013-10-07 Thread java8964 java8964
snappy on hadoop 1.1.1 what's the output of ldd on that lib? Does it link properly? You should compile natives for your platform as the packaged ones may not link properly. On Sat, Oct 5, 2013 at 2:37 AM, java8964 java8964 wrote: I kind of read the hadoop 1.1.1 source code for this,

RE: enable snappy on hadoop 1.1.1

2013-10-04 Thread java8964 java8964
I kind of read the hadoop 1.1.1 source code for this, it is very strange for me now. From the error, it looks like the runtime JVM cannot find the native method of org/apache/hadoop/io/compress/snappy/SnappyCompressor.compressBytesDirect()I, that is my guess from the error message, but from the log,

enable snappy on hadoop 1.1.1

2013-10-04 Thread java8964 java8964
Hi, I am using hadoop 1.1.1. I want to test snappy compression with hadoop, but I have some problems making it work in my Linux environment. I am using opensuse 12.3 x86_64. First, when I tried to enable snappy in hadoop 1.1.1 by: conf.setBoolean("mapred.compress.map.outp
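For reference, the full Hadoop 1.x property names involved in turning on snappy map-output compression (the preview truncates the property name); a minimal sketch, assuming the native snappy and hadoop libraries are already installed and loadable, which is what the rest of this thread is about:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;

public class SnappyMapOutput {
    public static Configuration configure(Configuration conf) {
        // Compress intermediate map output with Snappy (Hadoop 1.x property names).
        conf.setBoolean("mapred.compress.map.output", true);
        conf.setClass("mapred.map.output.compression.codec",
                SnappyCodec.class, CompressionCodec.class);
        return conf;
    }
}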

Will different files in HDFS trigger different mapper

2013-10-02 Thread java8964 java8964
Hi, I have a question related to how the mappers are generated for the input files from HDFS. I understand the split and block concepts in HDFS, but my original understanding is that one mapper will only process data from one file in HDFS, no matter how small this file is. Is that correct? T

RE: File formats in Hadoop: Sequence files vs AVRO vs RC vs ORC

2013-09-30 Thread java8964 java8964
I am also thinking about this for my current project, so here I share some of my thoughts, but maybe some of them are not correct. 1) In my previous projects years ago, we stored a lot of data as plain text, as at that time people thought Big Data could store all the data, no need to worry abou

RE: All datanodes are bad IOException when trying to implement multithreading serialization

2013-09-30 Thread java8964 java8964
Not exactly sure what you are trying to do, but it seems like memory is your bottleneck, and you think you have enough CPU resources, so you want to use multi-threading to utilize the CPU? You can start multiple threads in your mapper, if you think your mapper logic is very CPU intensive
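One built-in option along these lines is MultithreadedMapper, which runs several copies of map() in parallel threads inside a single task JVM; a hedged sketch (the mapper class, thread count, and key/value types are illustrative):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedJobSetup {

    // Stand-in for a CPU-intensive mapper; the real logic is whatever you run today.
    public static class MyCpuHeavyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // ...expensive per-record work would go here...
            context.write(new Text(value), new LongWritable(1));
        }
    }

    public static void configure(Job job) {
        // The task runs MultithreadedMapper, which fans records out to N copies
        // of the real mapper running in parallel threads within the same JVM.
        job.setMapperClass(MultithreadedMapper.class);
        MultithreadedMapper.setMapperClass(job, MyCpuHeavyMapper.class);
        MultithreadedMapper.setNumberOfThreads(job, 8);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Note: the mapper logic must be thread-safe for this to work.
    }
}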

RE: Extending DFSInputStream class

2013-09-26 Thread java8964 java8964
Just curious, any reason you don't want to use the DFSDataInputStream? Yong Date: Thu, 26 Sep 2013 16:46:00 +0200 Subject: Extending DFSInputStream class From: tmp5...@gmail.com To: user@hadoop.apache.org Hi I would like to wrap DFSInputStream by extension. However it seems that the DFSInputStr

Hadoop sequence file's benefits

2013-09-17 Thread java8964 java8964
Hi, I have a question related to sequence files. I wonder why, and under what kind of circumstances, I should use them. Let's say I have a csv file; I can store that directly in HDFS. But if I do know that the first 2 fields are some kind of key, and most MR jobs will query on that key, will it make
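A minimal sketch of one such use: converting a CSV file into a SequenceFile whose key is the first two fields, so key-centric MR jobs can read a typed key directly instead of reparsing text. The paths, field layout, and key/value types here are assumptions, and each line is assumed to have at least two comma-separated fields:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class CsvToSequenceFile {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/user/me/data.seq");   // hypothetical output path

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, Text.class);
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(",", 3);
                // Key = first two CSV fields, value = the rest of the record.
                writer.append(new Text(fields[0] + "," + fields[1]),
                              new Text(fields.length > 2 ? fields[2] : ""));
            }
        } finally {
            in.close();
            writer.close();
        }
    }
}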

RE: MAP_INPUT_RECORDS counter in the reducer

2013-09-17 Thread java8964 java8964
Or you do the calculation in the reducer close() method, even though I am not sure you can get the Mapper's count in the reducer. But even if you can't, here is what you can do: 1) Save the JobConf reference in your Mapper configure() method 2) Store the MAP_INPUT_RECORDS counter in the configuration object as

Looking for some advice

2013-09-14 Thread java8964 java8964
Hi, I currently have a project to process data using MR. I have some thoughts about it, and am looking for some advice if anyone has any feedback. Currently in this project, I have a lot of event data related to email tracking coming into HDFS. So the events are the data for email trackin

RE: help!!!,what is happened with my project?

2013-09-11 Thread java8964 java8964
Did you do a hadoop version upgrade before this error happened? Yong Date: Wed, 11 Sep 2013 16:57:54 +0800 From: heya...@jiandan100.cn To: user@hadoop.apache.org CC: user-unsubscr...@hadoop.apache.org Subject: help!!!,what is happened with my project? Hi: Today when I

RE: distcp failed "Copy failed: ENOENT: No such file or directory"

2013-09-06 Thread java8964 java8964
The error doesn't mean the file doesn't exist in HDFS; it refers to the local disk. If you read the error stack trace: at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:581) It indicates the error happened on the local file system. If you try to copy data from an existing

RE: secondary sort - number of reducers

2013-08-30 Thread java8964 java8964
Well, the reducer stage normally takes much longer than the mapper stage, because the copy/shuffle/sort all happen at this time, and they are the hard part. But before we simply say it is part of life, you need to dig into your MR jobs more to find out if you can make them faster. You are the

RE: secondary sort - number of reducers

2013-08-29 Thread java8964 java8964
The method getPartition() needs to return a non-negative number. Simply using the hashCode() method is not enough. See the Hadoop HashPartitioner implementation: return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; When I first read this code, I wondered: why not use Math.abs? Is ( & I
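A sketch of the same pattern in a custom partitioner (the key format and split logic are hypothetical), plus the reason the mask is preferred over Math.abs: Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, so it can still yield a negative partition, while clearing the sign bit cannot.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical custom partitioner that partitions on only part of the key,
// using the same sign-bit mask as Hadoop's HashPartitioner.
public class PrefixPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        String prefix = key.toString().split(",", 2)[0];
        // Mask off the sign bit: unlike Math.abs(), this can never stay negative,
        // even when hashCode() == Integer.MIN_VALUE.
        return (prefix.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}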

RE: copy files from hdfs to local fs

2013-08-29 Thread java8964 java8964
What's wrong with using an old Unix pipe? hadoop fs -cat /user/input/foo.txt | head -100 > local_file Date: Thu, 29 Aug 2013 13:50:37 -0700 Subject: Re: copy files from hdfs to local fs From: chengi.liu...@gmail.com To: user@hadoop.apache.org tail will work as well.. ??? but i want to extract just (sa

RE: Jar issue

2013-08-27 Thread java8964 java8964
I am not sure the original suggestion will work for your case. My understanding is that you want to use some API that only exists in slf4j version 1.6.4, but this library already exists with a different version in your hadoop environment, which is quite possible. To change the maven build of the appli

RE: Partitioner vs GroupComparator

2013-08-23 Thread java8964 java8964
As Harsh said, sometimes you want to do a 2nd sort, but MR can only sort by key, not by value. A lot of the time, you want the reducer output sorted by a field, but only within a group, kind of like a 'windowing sort' in relational DB SQL. For example, if you have data about
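A hedged sketch of the three pieces this "sort within a group" pattern needs, assuming the mapper emits Text keys of the form "group\tsortField" (the key layout and class names are illustrative, and the sort here is plain lexicographic):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Partitioner;

public class SecondarySortWiring {

    // 1) Partition on the group part only, so a group never spans reducers.
    public static class GroupPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numReduceTasks) {
            String group = key.toString().split("\t", 2)[0];
            return (group.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }

    // 2) Sort comparator: order by group, then by the value field
    //    (lexicographic here; plug in your real ordering).
    public static class FullKeyComparator extends WritableComparator {
        public FullKeyComparator() { super(Text.class, true); }
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return a.toString().compareTo(b.toString());
        }
    }

    // 3) Grouping comparator: treat keys with the same group part as equal,
    //    so one reduce() call receives the whole group, values already sorted.
    public static class GroupComparator extends WritableComparator {
        public GroupComparator() { super(Text.class, true); }
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            String ga = a.toString().split("\t", 2)[0];
            String gb = b.toString().split("\t", 2)[0];
            return ga.compareTo(gb);
        }
    }

    public static void configure(Job job) {
        job.setPartitionerClass(GroupPartitioner.class);
        job.setSortComparatorClass(FullKeyComparator.class);
        job.setGroupingComparatorClass(GroupComparator.class);
    }
}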

RE: running map tasks in remote node

2013-08-23 Thread java8964 java8964
slave nodes, it works fine. I am not able to figure out how to fix this or the reason for the error. I do not understand why it complains that the input directory is not present. As far as I know, slave nodes get a map, and the map method contains the contents of the input file. This should be fine f

RE: running map tasks in remote node

2013-08-22 Thread java8964 java8964
If you don't plan to use HDFS, what kind of shared file system are you going to use between the clusters? NFS? For what you want to do, even though it doesn't make too much sense, you first need to solve the shared file system problem. Second, if you want to process the files file by file, inste

java.io.IOException: Task process exit with nonzero status of -1

2013-08-15 Thread java8964 java8964
Hi, This is a 4 node hadoop cluster running on CentOS 6.3 with Oracle JDK (64bit) 1.6.0_43. Each node has 32G memory, with a max of 8 mapper tasks and 4 reducer tasks being set. The hadoop version is 1.0.4. This is set up on DataStax DSE 3.0.2, which uses Cassandra CFS as the underlying DFS, instead o

RE: Encryption in HDFS

2013-02-26 Thread java8964 java8964
I am also interested in your research. Can you share some insight on the following questions? 1) When you use a CompressionCodec, can the encrypted file be split? From my understanding, there is no encryption method that allows the file to be decrypted individually block by block, right? For example, if I have a 1G file

RE: Question related to Decompressor interface

2013-02-12 Thread java8964 java8964
Can someone share some idea of what the Hadoop source code of class org.apache.hadoop.io.compress.BlockDecompressorStream, method rawReadInt(), is trying to do here? There is a comment in the code that this method shouldn't return a negative number, but my testing file contains the following b

RE: Loader for small files

2013-02-12 Thread java8964 java8964
Hi, Davie: I am not sure I understand this suggestion. Why would a smaller block size help this performance issue? From what the original question was about, it looks like the performance problem is due to there being a lot of small files, and each file will run in its own mapper. As hadoop nee

RE: number input files to mapreduce job

2013-02-12 Thread java8964 java8964
I don't think you can get a list of all input files in the mapper, but what you can get is the current file's information. From the context object reference, you can get the InputSplit, which should give you all the information you want about the current input file. http://hadoop.apache.org/docs/r2.0
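A minimal sketch of that idea in a new-API mapper, assuming a FileInputFormat-based job so the split can be cast to FileSplit (key/value types are illustrative):

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class CurrentFileMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String fileName;

    @Override
    protected void setup(Context context) {
        // Works for FileInputFormat-based jobs, where the split is a FileSplit.
        FileSplit split = (FileSplit) context.getInputSplit();
        Path file = split.getPath();
        fileName = file.getName();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Tag each record with the file it came from.
        context.write(new Text(fileName), value);
    }
}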

RE: Confused about splitting

2013-02-10 Thread java8964 java8964
Hi, Chris: Here is my understanding of file splits and data blocks. HDFS will store your file in multiple data blocks; each block will be 64M or 128M depending on your setting. Of course, the file could contain multiple records, so a record's boundary won't match the block boundary (i

RE: Question related to Decompressor interface

2013-02-10 Thread java8964 java8964
e can convert any existing Writable into an encrypted form. Dave From: java8964 java8964 [mailto:java8...@hotmail.com] Sent: Sunday, February 10, 2013 3:50 AM To: user@hadoop.apache.org Subject: Question related to Decompressor interface Hi, Currently I am researching options for encry

RE: What to do/check/debug/root cause analysis when jobtracker hang

2013-02-06 Thread java8964 java8964
Our cluster on cdh3u4 has the same problem. I think it is caused by some bugs in the JobTracker. I believe Cloudera knows about this issue. After upgrading to cdh3u5, we haven't faced this issue yet, but I am not sure if it is confirmed to be fixed in CDH3U5. Yong > Date: Mon, 4 Feb 2013 15:21:18 -08

RE: Profiling the Mapper using hprof on Hadoop 0.20.205

2013-02-06 Thread java8964 java8964
What range did you give for mapred.task.profile.maps? And are you sure your mapper will invoke the methods you expect in the traces? Yong Date: Wed, 6 Feb 2013 23:50:08 +0200 Subject: Profiling the Mapper using hprof on Hadoop 0.20.205 From: yaron.go...@gmail.com To: user@hadoop.apache.org Hi, I wish
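For reference, the 0.20/1.x profiling knobs this question is about, set on the job configuration; the range shown is illustrative and must cover tasks that actually run the code paths you expect to see in the traces:

import org.apache.hadoop.conf.Configuration;

public class ProfilingConf {
    public static Configuration configure(Configuration conf) {
        // Turn on hprof profiling for selected tasks (Hadoop 0.20/1.x property names).
        conf.setBoolean("mapred.task.profile", true);
        // Profile map tasks 0-2 only.
        conf.set("mapred.task.profile.maps", "0-2");
        conf.set("mapred.task.profile.reduces", "");  // no reduce tasks profiled
        return conf;
    }
}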

RE: Cumulative value using mapreduce

2012-10-05 Thread java8964 java8964
Ted's comments on performance are spot on. Regards Bertrand On Thu, Oct 4, 2012 at 9:02 PM, java8964 java8964 wrote: I did the cumulative sum in the HIVE UDF, as one o

RE: Cumulative value using mapreduce

2012-10-04 Thread java8964 java8964
I did the cumulative sum in a HIVE UDF, as one of the projects for my employer. 1) You need to decide the grouping elements for your cumulative sum. For example, an account, a department etc. In the mapper, combine this information as your emit key. 2) If you don't have any grouping requirement, yo
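A sketch of how the reducer side of step 1 could look when done as a plain MR job rather than a Hive UDF: the mapper emits the grouping fields (account, department, ...) as the key, and the reducer keeps a running total per group. The key/value types, and the assumption that values arrive in the order you want to accumulate (e.g. via a secondary sort on date), are mine:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Assumes each group's values arrive already ordered the way you want to accumulate them.
public class CumulativeSumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long runningTotal = 0;
        for (LongWritable v : values) {
            runningTotal += v.get();
            // Emit the cumulative value seen so far for this group.
            context.write(key, new LongWritable(runningTotal));
        }
    }
}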

why hadoop does not provide a round robin partitioner

2012-09-20 Thread java8964 java8964
Hi, During my development of ETLs on the hadoop platform, there is one question I want to ask: why doesn't hadoop provide a round robin partitioner? From my experience, it is a very powerful option for the case of a small, limited set of distinct key values, and it balances the ETL resources. Here is what I want to say: 1
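A sketch of what such a partitioner could look like (this is not something Hadoop ships; key/value types are illustrative). The caveat is that equal keys may land on different reducers, which breaks the usual grouping contract, so it only fits jobs whose reduce logic does not depend on seeing all records of a key together:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical round-robin partitioner: spreads records evenly over reducers
// regardless of key, so the reduce-side load is balanced even when there are
// only a few distinct key values.
public class RoundRobinPartitioner extends Partitioner<Text, Text> {
    private int next = 0;

    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        int partition = next % numPartitions;
        next = partition + 1;   // cycle 0, 1, ..., numPartitions-1 without overflow
        return partition;
    }
}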