Does YARN shared cache use memory?

2019-07-03 Thread kevin su
specific data in memory; maybe it could make MapReduce much faster. Thanks in advance. BR, Kevin Su

yarn launch docker error

2019-06-09 Thread kevin su
is kevin Creating script paths... Creating local dirs... [2019-06-10 05:32:42.367]Container exited with a non-zero exit code 29. [2019-06-10 05:32:42.367]Container exited with a non-zero exit code 29. my container-executor.cfg 1 yarn.nodemanager.linux-container-executor.group=kevin 2
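
For readers hitting the same error: a minimal container-executor.cfg usually needs only a handful of keys. The sketch below is illustrative only; the [docker] block assumes the sectioned Hadoop 3.x format, and the group name and binary path are placeholders that will differ per install.

    # group shared by the NodeManager and the container-executor binary
    yarn.nodemanager.linux-container-executor.group=kevin
    # users that may never launch containers
    banned.users=hdfs,yarn,mapred,bin
    # refuse UIDs below this value (system accounts)
    min.user.id=1000
    allowed.system.users=

    # assumed Hadoop 3.x sectioned format for the Docker runtime
    [docker]
      module.enabled=true
      docker.binary=/usr/bin/docker

The container-executor binary itself generally has to be root-owned, group-owned by that same group and setuid (mode 6050), and container-executor --checksetup (mentioned in a later thread below) reports most layout problems.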

Can I start two RMs in one cluster?

2019-06-07 Thread kevin su
report, or is there metadata in the RM, so it will directly find one of the NMs to get resources? Thanks in advance. Kevin, Best regards

When workers >= 2 and ps == 0 worker should throw exception

2019-05-29 Thread kevin su
0) { return true; } else if (nWorkers <= 1 && nPS > 0) { throw new ParseException("Only specified one worker but non-zero PS, " + "please double check."); } return false; } should it throw an exception? Kevin, Best Regards
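
A hedged reconstruction of the whole check, pieced together from the quoted fragment (the method name and the exact first condition are guesses, and ParseException is assumed to be org.apache.commons.cli.ParseException):

    // Sketch only: reconstructed from the quoted fragment above.
    private boolean checkWorkerAndPSCounts(int nWorkers, int nPS) throws ParseException {
      if (nWorkers >= 2 && nPS > 0) {
        return true;                      // distributed training: workers plus parameter servers
      } else if (nWorkers <= 1 && nPS > 0) {
        throw new ParseException("Only specified one worker but non-zero PS, "
            + "please double check.");
      }
      return false;                       // single worker, no PS
    }

Read this way, nWorkers >= 2 with nPS == 0 silently falls through to return false, which is the asymmetry the question points at.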

Quick start on GitHub cannot be used

2019-04-01 Thread kevin su
Hi users, the quick start on GitHub cannot be used; maybe the website address should be updated. Best Regards, Kevin

Re: Hadoop 2.8.0: Job console output suggesting non-existent rmserver 8088:proxy URI

2017-09-12 Thread Kevin Buckley
doesn't take anyone anywhere. Thanks again for the insight: I think I know where I need to look now, Kevin - To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org For additional commands, e-mail: user-h...@hadoop.apache.org

Hadoop 2.8.0: Job console output suggesting non-existent rmserver 8088:proxy URI

2017-09-07 Thread Kevin Buckley
ssing from the configuration, that produces the "8088/proxy" path in the URIs that the console output presents ? Kevin --- Kevin M. Buckley eScience Consultant School of Engineering and Computer Science Victoria University of We

Hadoop 2.8.0: Use of container-executor.cfg to restrict access to MapReduce jobs

2017-08-06 Thread Kevin Buckley
the ability via the container-executor.cfg list seemed a simple way to achieve that. Any clues/insight welcome, Kevin --- Kevin M. Buckley eScience Consultant School of Engineering and Computer Science Victoria Univer

Re: Kerberised JobHistory Server not starting: User jhs trying to create the /mr-history/done directory

2017-08-06 Thread Kevin Buckley
On 25 July 2017 at 03:21, Erik Krogen <ekro...@linkedin.com> wrote: > Hey Kevin, > > Sorry, I missed your point about using auth_to_local. You're right that you > should be able to use that for what you're trying to achieve. I think it's > just that your rule is wrong
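
For anyone landing here with the same problem, an auth_to_local rule that maps a two-component jhs/_HOST@REALM principal onto a local user is typically written like this (the realm and target user are placeholders, not the values from this thread):

    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[2:$1@$0](jhs@MY.REALM.TLD)s/.*/mapred/
        DEFAULT
      </value>
    </property>

Rules can usually be tested offline with: hadoop org.apache.hadoop.security.HadoopKerberosName jhs/some.host@MY.REALM.TLD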

Re: Kerberised JobHistory Server not starting: User jhs trying to create the /mr-history/done directory

2017-07-23 Thread Kevin Buckley
On 21 July 2017 at 13:25, Kevin Buckley <kevin.buckley.ecs.vuw.ac...@gmail.com> wrote: > On 21 July 2017 at 04:04, Erik Krogen <ekro...@linkedin.com> wrote: >> Hi Kevin, >> >> Since you are using the "jhs" keytab with principal "jhs/_h...@realm.tld&

Re: Kerberised JobHistory Server not starting: User jhs trying to create the /mr-history/done directory

2017-07-20 Thread Kevin Buckley
On 21 July 2017 at 04:04, Erik Krogen <ekro...@linkedin.com> wrote: > Hi Kevin, > > Since you are using the "jhs" keytab with principal "jhs/_h...@realm.tld", > the JHS is authenticating itself as the jhs user (which is the actual > important part, ra

v2.8.0: Setting PID dir EnvVars: libexec/thing-config.sh or etc/hadoop/thing-env.sh ?

2017-06-14 Thread Kevin Buckley
a "to be preferred" file choice, between those two, within which to set certain classes of EnvVars ? Any info/pointers welcome, Kevin --- Kevin M. Buckley eScience Consultant School of Engineering and Computer Science Victoria University of Wellington New Zealand
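
For anyone with the same question, the usual advice is to leave the libexec/*-config.sh scripts alone and set these in the per-daemon env files; a sketch with placeholder paths:

    # etc/hadoop/hadoop-env.sh   (HDFS daemons)
    export HADOOP_PID_DIR=/var/run/hadoop-hdfs

    # etc/hadoop/yarn-env.sh     (YARN daemons)
    export YARN_PID_DIR=/var/run/hadoop-yarn

    # etc/hadoop/mapred-env.sh   (JobHistory server)
    export HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce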

Hive _SUCCESS flag

2017-02-14 Thread Kevin Lasenby
a/browse/HIVE-3700 <-- created 2012 with no updates Discussion about this behavior: http://stackoverflow.com/questions/13017433/override-hadoops-mapreduce-fileoutputcommitter-marksuccessfuljobs-in-oozie Thanks, Kevin Lasenby

Re: hdfs2.7.3 kerberos cannot start up

2016-09-21 Thread kevin
4:57:31 [hadoop@dmp1 ~]$ I have run kinit had...@example.com before . 2016-09-21 10:14 GMT+08:00 Wei-Chiu Chuang <weic...@cloudera.com>: > You need to run kinit command to authenticate before running hdfs dfs -ls > command. > > Wei-Chiu Chuang > > On Sep 20, 2016, at 6:

Re: hdfs2.7.3 kerberos cannot start up

2016-09-20 Thread kevin
This is probably due to some missing configuration. > > Could you please re-check the ssl-server.xml, keystore and truststore > properties: > > > > ssl.server.keystore.location > > ssl.server.keystore.keypassword > > ssl.client.truststore.location > > ssl.
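
The properties being re-checked live in ssl-server.xml next to the other Hadoop config files; a sketch with placeholder paths and passwords:

    <property>
      <name>ssl.server.keystore.location</name>
      <value>/etc/security/ssl/keystore.jks</value>
    </property>
    <property>
      <name>ssl.server.keystore.password</name>
      <value>changeit</value>
    </property>
    <property>
      <name>ssl.server.keystore.keypassword</name>
      <value>changeit</value>
    </property>
    <property>
      <name>ssl.server.truststore.location</name>
      <value>/etc/security/ssl/truststore.jks</value>
    </property>
    <property>
      <name>ssl.server.truststore.password</name>
      <value>changeit</value>
    </property>

If any of these resolve to null, HttpServer2 tends to fail in exactly the way quoted further down this thread.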

Re: hdfs2.7.3 kerberos cannot start up

2016-09-20 Thread kevin
https://goo.gl/M6l3vv may help you. > > > >>>>>>>2016-09-20 00:54:06,665 INFO org.apache.hadoop.http.HttpServer2: > HttpServer.start() threw a non Bind IOException > java.io.IOException: !JsseListener: java.lang.NullPointerException > > This is probably due to some mis

hdfs2.7.3 kerberos cannot start up

2016-09-19 Thread kevin
hi, all: My environment: CentOS 7.2, hadoop 2.7.3, jdk1.8. After I configured hdfs with kerberos, I can't start it up with sbin/start-dfs.sh. ::namenode log as below: STARTUP_MSG: build = Unknown -r Unknown; compiled by 'root' on 2016-09-18T09:05Z STARTUP_MSG: java = 1.8.0_102

Re: About Archival Storage

2016-07-20 Thread kevin
> and how to set storage policies accordingly. > > The effort was suspended somehow because the contributors are working on > HDFS erasure coding feature. It will be revived soon and feedback is > welcome! > > Regards, > Kai > > *From:* k

Re: About Archival Storage

2016-07-20 Thread kevin
-hdfs/ArchivalStorage.html > to know more about storage types, storage policies and hdfs commands. Hope > this helps. > > Rakesh > > On Wed, Jul 20, 2016 at 10:30 AM, kevin <kiss.kevin...@gmail.com> wrote: > >> Thanks again. "automatically" what I mean is

Re: About Archival Storage

2016-07-19 Thread kevin
use algorithm like LRU、LFU ? > It will simply iterating over the lists in the order of files/dirs given > to this tool as an argument. afaik, its just maintains the order mentioned > by the user. > > Regards, > Rakesh > > > On Wed, Jul 20, 2016 at 7:05 AM, kevin <ki

Re: About Archival Storage

2016-07-19 Thread kevin
N with low latency. HDFS will store block data > in memory and lazily save it to disk avoiding incurring disk write latency > on the hot path. By writing to local memory we can also avoid checksum > computation on the hot path. > > Regards, > Rakesh > > On Tue, Jul 19,

About Archival Storage

2016-07-19 Thread kevin
I don't quite understand :"Note that the Lazy_Persist policy is useful only for single replica blocks. For blocks with more than one replicas, all the replicas will be written to DISK since writing only one of the replicas to RAM_DISK does not improve the overall performance." Is that mean I
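
For readers following the thread, policies are assigned per path and existing blocks are then migrated by the mover; the usual commands look like this (paths are placeholders):

    hdfs storagepolicies -listPolicies
    hdfs storagepolicies -setStoragePolicy -path /data/cold -policy COLD
    hdfs storagepolicies -getStoragePolicy -path /data/cold
    hdfs mover -p /data/cold        # move already-written blocks to match the policy

LAZY_PERSIST (the policy discussed above) is set the same way, but as the quoted doc text says it only changes behaviour for single-replica writes.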

Re: Using YARN with native applications

2015-05-27 Thread Kevin
how exactly the memory usage is being calculated. -Varun From: Kevin Reply-To: user@hadoop.apache.org Date: Wednesday, May 27, 2015 at 6:22 PM To: user@hadoop.apache.org Subject: Re: Using YARN with native applications Varun, thank you for helping me understand this. You pointed out

Re: Using YARN with native applications

2015-05-27 Thread Kevin
then looks at the memory usage of the process tree and compares it to the limits for the container. -Varun From: Kevin Reply-To: user@hadoop.apache.org Date: Tuesday, May 26, 2015 at 7:22 AM To: user@hadoop.apache.org Subject: Re: Using YARN with native applications Thanks

Re: Using YARN with native applications

2015-05-27 Thread Kevin
Ah, okay. That makes sense. Thanks for all your help, Varun. -Kevin On Wed, May 27, 2015 at 9:53 AM Varun Vasudev vvasu...@hortonworks.com wrote: For CPU isolation, you have to use Cgroups with the LinuxContainerExecutor. We don’t enforce cpu limits with the DefaultContainerExecutor
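
Pulling the thread together in config form, a yarn-site.xml sketch of the memory checks described above plus the cgroups-based CPU enforcement Varun mentions (standard 2.x property and class names):

    <!-- memory monitoring of the whole container process tree -->
    <property>
      <name>yarn.nodemanager.pmem-check-enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>2.1</value>
    </property>

    <!-- CPU limits only take effect with the LinuxContainerExecutor plus cgroups -->
    <property>
      <name>yarn.nodemanager.container-executor.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
    </property>
    <property>
      <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
    </property>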

Re: Using YARN with native applications

2015-05-25 Thread Kevin
its eye on the JVM it spins up for the container (under the DefaultContainerExecutor). -Kevin On Mon, May 25, 2015 at 4:17 AM, Varun Vasudev vvasu...@hortonworks.com wrote: Hi Kevin, By default, the NodeManager monitors physical and virtual memory usage of containers. Containers that exceed

Using YARN with native applications

2015-05-21 Thread Kevin
cgroups. I'm interested to know how YARN reacts to non-Java applications running inside of it. Thanks, Kevin

Re: Lifetime of jhist files

2015-05-14 Thread Kevin
Thanks, Naga, that worked. I didn't catch that property in the mapred-default.xml. Sorry for such a late response. On Tue, Apr 28, 2015 at 11:01 PM, Naganarasimha G R (Naga) garlanaganarasi...@huawei.com wrote: Hi Kevin, Could check the below configuration for the job history server
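
The mapred-default.xml property is not quoted in the snippet; the retention settings that normally govern how long .jhist files stay under the done directory are the following (assumed to be what was pointed at):

    <property>
      <name>mapreduce.jobhistory.max-age-ms</name>
      <value>604800000</value>   <!-- keep history for 7 days -->
    </property>
    <property>
      <name>mapreduce.jobhistory.cleaner.interval-ms</name>
      <value>86400000</value>    <!-- run the cleaner once a day -->
    </property>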

Lifetime of jhist files

2015-04-28 Thread Kevin
in /user/history/done/year/month/day Any feedback would be great. Thanks, Kevin

Re: Copying many files to HDFS

2015-02-16 Thread Kevin
Johny, NiFi looks interesting but I can't really grasp how it will help me. If you could provide some example code or a more detailed explanation of how you set up a topology, then that would be great. On Fri, Feb 13, 2015 at 10:38 AM, johny casanova pcgamer2...@outlook.com wrote: Hi Kevin

Re: Copying many files to HDFS

2015-02-13 Thread Kevin
, 2015 at 9:03 AM, Alexander Alten-Lorenz wget.n...@gmail.com wrote: Kevin, Slurper can help here: https://github.com/alexholmes/hdfs-file-slurper BR, Alexander On 13 Feb 2015, at 14:28, Kevin kevin.macksa...@gmail.com wrote: Hi, I am setting up a Hadoop cluster (CDH5.1.3) and I need

Re: Run a c++ program using opencv libraries in hadoop

2014-12-18 Thread Kevin
You could run it as a shell action using Oozie. Write a shell script to run your application. Put all the application's dependencies (e.g., *.so) into a lib directory. Put the shell script in the parent directory of that lib directory that I just mentioned. Create a simple Oozie workflow that runs
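
A hedged sketch of such a workflow (file and action names are made up for illustration, and the shell-action schema version varies by Oozie release); the *.so dependencies sit in the lib/ directory next to workflow.xml as described above:

    <workflow-app name="opencv-native" xmlns="uri:oozie:workflow:0.4">
      <start to="run-native"/>
      <action name="run-native">
        <shell xmlns="uri:oozie:shell-action:0.2">
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <exec>run_opencv.sh</exec>
          <file>run_opencv.sh#run_opencv.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Native OpenCV job failed</message>
      </kill>
      <end name="end"/>
    </workflow-app>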

Permissions issue with launching MR job from Oozie shell

2014-11-17 Thread Kevin
needs to create the partition file, determine the number of reducers, etc. I chose the shell action as my solution. As user 'kevin', I submit and run my Oozie workflow (using the oozie client command). I understand that Oozie executes the shell as the yarn user, but it appears that the user 'kevin

Re: run arbitrary job (non-MR) on YARN ?

2014-10-29 Thread Kevin
=blots=psGuJYlY1Ysig=khp3b3hgzsZLZWFfz7GOe2yhgyYhl=ensa=Xei=0U5RVKzDLeTK8gGgoYGoDQved=0CFcQ6AEwCA#v=onepageqf=false Hopefully this helps, Kevin On Mon Oct 27 2014 at 2:21:18 AM Yang tedd...@gmail.com wrote: I happened to run into this interesting scenario: I had some mahout seq2sparse jobs

MapReduce data decompression using a custom codec

2014-09-10 Thread POUPON Kevin
Hello, I developed a custom compression codec for Hadoop. Of course Hadoop is set to use my codec when compressing data. For testing purposes, I use the following two commands: Compression test command: --- hadoop jar
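
For context, a custom codec is normally registered in core-site.xml and then selected by class name on the job; a sketch, with com.example.MyCodec standing in for the real class:

    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.DefaultCodec,com.example.MyCodec</value>
    </property>

and then selected at submit time (assuming the driver uses ToolRunner so -D options are honoured):

    hadoop jar my-job.jar MyJob \
      -D mapreduce.output.fileoutputformat.compress=true \
      -D mapreduce.output.fileoutputformat.compress.codec=com.example.MyCodec \
      input output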

2.4 / yarn pig jobs fail due to exit code 1 from container.

2014-05-23 Thread Kevin Burton
Trying to track down exactly what's happening. Right now I'm getting this (see below). The setup documentation for 2.4 could definitely be better. Probably with a sample/working config. Looks like too much of this is left up as an exercise to the user. 2014-05-23 21:20:30,652 INFO

debugging class path issues with containers.

2014-05-23 Thread Kevin Burton
What's the best way to debug yarn container issues? I was going to try to tweak the script but it gets deleted after the job fails. Looks like I'm having an issue with the classpath.. I'm getting a basic hadoop NCDFE on startup so I think it just has a broken class path. but of course I need to

The documentation for permissions of ./bin/container-executor should be more clear.

2014-05-23 Thread Kevin Burton
This just bit me… spent half a day figuring it out! :-( The only way I was able to debug it was with ./bin/container-executor --checksetup Once that stopped complaining my jobs were working ok. this shouldn't have taken that much time… initial setup documentation could be seriously improved.

Re: How to unsubscribe from this list?

2013-12-18 Thread Kevin O'dell
Hi Alex, http://hadoop.apache.org/mailing_lists.html Check out On Dec 18, 2013 7:37 AM, Alex Luya alexander.l...@gmail.com wrote: Hello Can anybody tell me a way to unsubscribe from this list?

Fwd: class not found on namenode/datanode startup

2013-11-19 Thread Kevin D'Elia
Hello, I have configured hadoop install according to instructions I found on the internet; when I start hadoop namenode/datanode, I get: java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/server/namenode/NameNode Caused by:

RE: Permission problem

2013-04-30 Thread Kevin Burton
I have relaxed it even further so now it is 775 kevin@devUbuntu05:/var/log/hadoop-0.20-mapreduce$ hadoop fs -ls -d / Found 1 items drwxrwxr-x - hdfs supergroup 0 2013-04-29 15:43 / But I still get this error: 2013-04-30 07:43:02,520 FATAL

RE: Permission problem

2013-04-30 Thread Kevin Burton
the permission to 775 so that the group would also have write permission but that didn't seem to help. From: Mohammad Tariq [mailto:donta...@gmail.com] Sent: Tuesday, April 30, 2013 8:20 AM To: Kevin Burton Subject: Re: Permission problem user?ls shows hdfs and the log says mapred.. Warm

RE: Permission problem

2013-04-30 Thread Kevin Burton
for hadoop hdfs and mr. Ideas? From: Kevin Burton [mailto:rkevinbur...@charter.net] Sent: Tuesday, April 30, 2013 8:31 AM To: user@hadoop.apache.org Cc: 'Mohammad Tariq' Subject: RE: Permission problem That is what I perceive as the problem. The hdfs file system was created with the user 'hdfs

RE: Permission problem

2013-04-30 Thread Kevin Burton
Thank you. mapred.system.dir is not set. I am guessing that it is whatever the default is. What should I set it to? /tmp is already 777 kevin@devUbuntu05:~$ hadoop fs -ls /tmp Found 1 items drwxr-xr-x - hdfs supergroup 0 2013-04-29 15:45 /tmp/mapred kevin@devUbuntu05

RE: Permission problem

2013-04-30 Thread Kevin Burton
<name>hadoop.tmp.dir</name> <value>/data/hadoop/tmp/hadoop-${user.name}</value> <description>Hadoop temporary folder</description> </property> From: Arpit Gupta [mailto:ar...@hortonworks.com] Sent: Tuesday, April 30, 2013 9:48 AM To: Kevin Burton Cc: user@hadoop.apache.org Subject: Re: Permission

RE: Permission problem

2013-04-30 Thread Kevin Burton
on HDFS. I already have this created. Found 1 items drwxr-xr-x - mapred supergroup 0 2013-04-29 15:45 /tmp/mapred kevin@devUbuntu05:/etc/hadoop/conf$ hadoop fs -ls -d /tmp Found 1 items drwxrwxrwt - hdfs supergroup 0 2013-04-29 15:45 /tmp When you suggest that I 'chmod

RE: Permission problem

2013-04-30 Thread Kevin Burton
[mailto:ar...@hortonworks.com] Sent: Tuesday, April 30, 2013 10:48 AM To: Kevin Burton Cc: user@hadoop.apache.org Subject: Re: Permission problem It looks like hadoop.tmp.dir is being used both for local and hdfs directories. Can you create a jira for this? What i recommended is that you create

Can't initialize cluster

2013-04-30 Thread Kevin Burton
I have a simple MapReduce job that I am trying to get to run on my cluster. When I run it I get: 13/04/30 11:27:45 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid mapreduce.jobtracker.address configuration value for

RE: Can't initialize cluster

2013-04-30 Thread Kevin Burton
To be clear when this code is run with 'java -jar' it runs without exception. The exception occurs when I run with 'hadoop jar'. From: Kevin Burton [mailto:rkevinbur...@charter.net] Sent: Tuesday, April 30, 2013 11:36 AM To: user@hadoop.apache.org Subject: Can't initialize cluster I have

RE: Can't initialize cluster

2013-04-30 Thread Kevin Burton
/hadoop/tmp/hadoop-mapred/mapred/staging/kevin/. staging/job_201304301251_0003 13/04/30 12:59:40 ERROR security.UserGroupInformation: PriviledgedActionException as:kevin (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://devubuntu05:9000

Re: Warnings?

2013-04-29 Thread Kevin Burton
If it doesn't work what are my options? Is there source that I can download and compile? On Apr 29, 2013, at 10:31 AM, Ted Xu t...@gopivotal.com wrote: Hi Kevin, Native libraries are those implemented using C/C++, which only provide code level portability (instead of binary level

Re: Incompatible clusterIDs

2013-04-29 Thread Kevin Burton
PM, Mohammad Tariq donta...@gmail.com wrote: Hello Kevin, Have you reformatted the NN(unsuccessfully)?Was your NN serving some other cluster earlier or your DNs were part of some other cluster?Datanodes bind themselves to namenode through namespaceID and in your case the IDs

Re: Incompatible clusterIDs

2013-04-29 Thread Kevin Burton
It is '/'? On Apr 29, 2013, at 5:09 PM, Mohammad Tariq donta...@gmail.com wrote: make it 755. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Tue, Apr 30, 2013 at 3:30 AM, Kevin Burton rkevinbur...@charter.net wrote: Thank you the HDFS system seems to be up

Re: M/R job to a cluster?

2013-04-28 Thread Kevin Burton
Shriparv On Sun, Apr 28, 2013 at 1:18 AM, sudhakara st sudhakara...@gmail.com wrote: Hello Kevin, In the case: JobClient client = new JobClient(); JobConf conf - new JobConf(WordCount.class); Job client(default in local system) picks configuration information by referring

RE: Warnings?

2013-04-28 Thread Kevin Burton
? Thanks again. Kevin From: Ted Xu [mailto:t...@gopivotal.com] Sent: Friday, April 26, 2013 10:49 PM To: user@hadoop.apache.org Subject: Re: Warnings? Hi Kevin, Please see my comments inline, On Sat, Apr 27, 2013 at 11:24 AM, Kevin Burton rkevinbur...@charter.net wrote

Re: M/R job to a cluster?

2013-04-26 Thread Kevin Burton
It is hdfs://devubuntu05:9000. Is this wrong? Devubuntu05 is the name of the host where the NameNode and JobTracker should be running. It is also the host where I am running the M/R client code. On Apr 26, 2013, at 4:06 PM, Rishi Yadav ri...@infoobjects.com wrote: check core-site.xml and see

RE: M/R Statistics

2013-04-26 Thread Kevin Burton
Answers below. From: Omkar Joshi [mailto:ojo...@hortonworks.com] Sent: Friday, April 26, 2013 7:15 PM To: user@hadoop.apache.org Subject: Re: M/R Statistics Have you enabled security? No. Can you share the output for your hdfs? bin/hadoop fs -ls / kevin@devUbuntu05:~$ hadoop

Re: Warnings?

2013-04-26 Thread Kevin Burton
Is the native library not available for Ubuntu? If so how do I load it? Can I tell which key is off? Since I am just starting I would want to be as up to date as possible. It is out of date probably because I copied my examples from books and tutorials. The main class does derive from Tool.

Comparison between JobClient/JobConf and Job/Configuration

2013-04-25 Thread Kevin Burton
I notice that in some beginning texts on starting a Hadoop MapReduce job sometimes JobClient/JobConf is used and sometimes Job/Configuration is used. I have yet to see anyone comment on the features/benefits of either set of methods. Could someone comment on their preferred method for starting a
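
For comparison, a minimal driver in the newer style (JobClient/JobConf belong to the older org.apache.hadoop.mapred package, Job/Configuration to org.apache.hadoop.mapreduce, which is what most current documentation uses); class and path names are placeholders:

    // imports from org.apache.hadoop.mapreduce and org.apache.hadoop.mapreduce.lib.{input,output}
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "my job");   // on older releases: new Job(conf, "my job")
    job.setJarByClass(MyDriver.class);
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);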

Import with Sqoop

2013-04-23 Thread Kevin Burton
to do anything. I executed 'hadoop fs -ls' and I didn't see anything. Any ideas what I have done wrong? Kevin

Re: HDFS using SAN

2012-10-17 Thread Kevin O'dell
-- Have a Nice Day! Lohit -- Kevin O'Dell Customer Operations Engineer, Cloudera

Re: cdh free manager install

2012-10-12 Thread Kevin O'dell
installation section. So I tried to install oracle jdk1.6.0_31 first and tried to install the cdh free manager 4 again, but it still failed on the same section. Is there something that I should do to make it run correctly? Thanks. -- Kevin O'Dell Customer Operations Engineer, Cloudera

TableReducer keyout

2012-06-18 Thread Kevin
specified? Can I just use null when writing the output key in the reducer class (e.g., context.write(null, MyPut))? It seems like in this usage of MapReduce the keyout would be only used when chaining jobs. -Kevin
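
A hedged sketch of the pattern being asked about; TableOutputFormat only looks at the Put, so a null output key is the common idiom (table, family and qualifier names are placeholders):

    // Assumed imports: java.io.IOException, org.apache.hadoop.hbase.client.Put,
    // org.apache.hadoop.hbase.io.ImmutableBytesWritable,
    // org.apache.hadoop.hbase.mapreduce.TableReducer, org.apache.hadoop.hbase.util.Bytes
    public class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        long sum = 0;
        for (IntWritable v : values) {
          sum += v.get();
        }
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));
        context.write(null, put);   // the key is ignored by TableOutputFormat
      }
    }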

Re: MapReduce jobs remotely

2012-05-03 Thread Kevin
this will be helpful to someone else. Thanks. On Thu, May 3, 2012 at 1:02 AM, Harsh J ha...@cloudera.com wrote: Kevin, What version of Pig are you using? Have you tried setting the right MR home directory to point Pig to the local MR configuration for YARN? $ HADOOP_MAPRED_HOME=/usr/lib

MapReduce jobs remotely

2012-05-02 Thread Kevin
to a remote Hadoop cluster to work in distributed mode? -Kevin

Re: Sharing data between maps

2012-04-04 Thread Kevin Savage
On 4 Apr 2012, at 22:07, John Armstrong j...@ccri.com wrote: On 04/04/2012 05:00 PM, Kevin Savage wrote: However, what we have is one big file of design data that needs to go to all the maps and many big files of climate data that need to go to one map each. I've not been able to work out

Re: A new map reduce framework for iterative/pipelined jobs.

2011-12-27 Thread Kevin Burton
Thanks for sharing. I'd love to play with it, do you have a README/user-guide for systat? Not a ton but I could write some up... Basically I modeled it after vmstat/iostat on Linux. http://sebastien.godard.pagesperso-orange.fr/documentation.html The theory is that most platforms have

A new map reduce framework for iterative/pipelined jobs.

2011-12-26 Thread Kevin Burton
One key point I wanted to mention for Hadoop developers (but then check out the announcement). I implemented a version of sysstat (iostat, vmstat, etc) in Peregrine and would be more than happy to move it out and put it in another dedicated project.

Re: Performance of direct vs indirect shuffling

2011-12-21 Thread Kevin Burton
, Kevin Burton burtona...@gmail.com wrote: We've discussed 'push' v/s 'pull' shuffle multiple times and each time turned away due to complexities in MR1. With MRv2 (YARN) this would be much more doable. Ah gotcha. This is what I expected as well. It would be interesting to see a list

Performance of direct vs indirect shuffling

2011-12-20 Thread Kevin Burton
The current hadoop implementation shuffles directly to disk and then those disk files are eventually requested by the target nodes which are responsible for doing the reduce() on the intermediate data. However, this requires more 2x IO than strictly necessary. If the data were instead shuffled

Re: Performance of direct vs indirect shuffling

2011-12-20 Thread Kevin Burton
On Tue, Dec 20, 2011 at 4:53 PM, Todd Lipcon t...@cloudera.com wrote: The advantages of the pull based shuffle is fault tolerance - if you shuffle to the reducer and then the reducer dies, you have to rerun *all* of the earlier maps in the push model. you would have the same situation if you

Re: Performance of direct vs indirect shuffling

2011-12-20 Thread Kevin Burton
We've discussed 'push' v/s 'pull' shuffle multiple times and each time turned away due to complexities in MR1. With MRv2 (YARN) this would be much more doable. Ah gotcha. This is what I expected as well. It would be interesting to see a list of changes like this in MR1 vs MR2 to see what

output from one map reduce job as the input to another map reduce job?

2011-09-27 Thread Kevin Burton
Is it possible to connect the output of one map reduce job so that it is the input to another map reduce job? Basically… then reduce() outputs a key that will be passed to another map() function without having to store intermediate data to the filesystem. Kevin -- Founder/CEO Spinn3r.com
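
With stock MapReduce the usual pattern is to chain the jobs through a shared HDFS path, so the intermediate data is still written out; a minimal driver sketch with placeholder classes and paths:

    // Run stage 1, then feed its output directory to stage 2 as input.
    Path intermediate = new Path("/tmp/stage1-out");

    Job job1 = new Job(conf, "stage 1");          // Job.getInstance(conf, ...) on newer releases
    job1.setJarByClass(Driver.class);
    job1.setMapperClass(FirstMapper.class);
    job1.setReducerClass(FirstReducer.class);
    FileInputFormat.addInputPath(job1, new Path(args[0]));
    FileOutputFormat.setOutputPath(job1, intermediate);
    if (!job1.waitForCompletion(true)) System.exit(1);

    Job job2 = new Job(conf, "stage 2");
    job2.setJarByClass(Driver.class);
    job2.setMapperClass(SecondMapper.class);
    job2.setReducerClass(SecondReducer.class);
    FileInputFormat.addInputPath(job2, intermediate);   // stage 1 output becomes stage 2 input
    FileOutputFormat.setOutputPath(job2, new Path(args[1]));
    System.exit(job2.waitForCompletion(true) ? 0 : 1);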

Re: Has anyone ever written a file system where the data is held in resources

2011-09-14 Thread Kevin Burton
You would probably have to implement your own Hadoop filesystem similar to S3 and KFS integrate. I looked at it a while back and it didn't seem insanely difficult … Kevin On Wed, Sep 14, 2011 at 9:47 AM, Steve Lewis lordjoe2...@gmail.com wrote: No - the issue is I want is I want Hadoop

RE: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of

2010-08-16 Thread Kevin .
From: awittena...@linkedin.com To: common-user@hadoop.apache.org Date: Mon, 16 Aug 2010 06:44:37 + On Aug 15, 2010, at 8:07 PM, Kevin . wrote: I tried your recommendation, absolute path, it worked, I was able to run the jobs successfully. Thank you! I was wondering why

RE: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of

2010-08-15 Thread Kevin .
Hi, Hemanth. Thanks for your reply! I tried your recommendation, absolute path, it worked, I was able to run the jobs successfully. Thank you! I was wondering why hadoop.tmp.dir (or mapred.local.dir?) with a relative path didn't work. Thanks. Date: Fri, 13 Aug 2010 16:35:24 +0530 Subject:

RE: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of

2010-08-13 Thread Kevin Chen
Thanks for your reply! 1. I login through SSH without password from master and slaves, it's all right :-) 2. <property> <name>hadoop.tmp.dir</name> <value>tmp</value> </property> In fact, 'tmp' is what I want :-) $HADOOP_HOME + tmp

Reduce 99.44% complete and job is marked as successful?

2010-06-21 Thread Kevin Tse
Hi, everyone Is there some mistakes in the display of the completeness percentage? or the job is not completed successfully? this is the job details page: *User:* root *Job Name:* MyMRJob *Job File: *hdfs://hadoop-master:9000 /hadoop/dfs/mapred/system/job_201006171158_0002/job.xml *Job Setup:*

Re: No KeyValueTextInputFormat in hadoop-0.20.2?

2010-06-20 Thread Kevin Tse
Does anybody know about this, please? On Mon, Jun 14, 2010 at 10:21 PM, Kevin Tse kevintse.on...@gmail.com wrote: Hi Ted, I mean the new API: org.apache.hadoop.mapreduce.Job.setInputFormatClass(org.apache.hadoop.mapreduce.InputFormat) Job.setInputFormatClass() only accepts

No KeyValueTextInputFormat in hadoop-0.20.2?

2010-06-14 Thread Kevin Tse
at the SecondarySort.java example code, it uses TextInputFormat and StringTokenizer to split each line, it is ok but kinda awkward to me. Do I have to implement a new InputFormat myself or there's a KeyValueTextInputFormat that exists somewhere I didn't notice? Thank you. Kevin Tse
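
For later readers: a new-API KeyValueTextInputFormat does exist under org.apache.hadoop.mapreduce.lib.input in releases after 0.20.2 (it is there in 2.x, for example), which removes the StringTokenizer step; a sketch:

    // import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
    Configuration conf = new Configuration();
    // separator between key and value on each input line (tab is the default)
    conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");
    Job job = Job.getInstance(conf, "kv input");
    job.setInputFormatClass(KeyValueTextInputFormat.class);   // mapper then receives Text key / Text value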

Re: No KeyValueTextInputFormat in hadoop-0.20.2?

2010-06-14 Thread Kevin Tse
parameter. On Mon, Jun 14, 2010 at 10:03 PM, Ted Yu yuzhih...@gmail.com wrote: Have you checked src/mapred/org/apache/hadoop/mapred/KeyValueTextInputFormat.java ? On Mon, Jun 14, 2010 at 6:51 AM, Kevin Tse kevintse.on...@gmail.com wrote: Hi, I am upgrading my code from hadoop-0.19.2

Is it possible to sort values before they are sent to the reduce function?

2010-06-13 Thread Kevin Tse
Hi, For each key, there might be millions of values(LongWritable), but I only want to emit top 20 of these values which I want to be sorted in descending order. So is it possible to sort these values before they enter the reduce phase? Thank you in advance! Kevin

Re: Is it possible to sort values before they are sent to the reduce function?

2010-06-13 Thread Kevin Tse
Hi Alex, I was reading Tom's book, but I have not reached chapter 6 yet. I just read it; it is really helpful. Thank you for mentioning it, and thanks also go to Tom. Kevin On Mon, Jun 14, 2010 at 10:22 AM, Alex Kozlov ale...@cloudera.com wrote: Hi Kevin, This is a very common technique
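
For reference, the technique in question (secondary sort) moves the value into a composite map-output key and splits the work across three pluggable pieces; a hedged outline of the job wiring, with the composite key and the three helper classes left as placeholders the reader writes:

    job.setMapOutputKeyClass(CompositeKey.class);                        // natural key + the value to sort by
    job.setPartitionerClass(NaturalKeyPartitioner.class);                // partition on the natural key only
    job.setSortComparatorClass(CompositeKeyComparator.class);            // order by key, then value descending
    job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);  // one reduce() call per natural key

The values for each natural key then arrive in reduce() already sorted, so emitting only the top 20 is just a counter.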

Questions about mapred.local.dir

2010-06-08 Thread Kevin Tse
directories I don't know whether this is harmless, but it seems so because my MR job completed successfully. And another question: is it possible to make the reduces start to run before all the maps complete? Thank you in advance. Kevin Tse
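
On the second question: reducers can be launched (and start copying map output) before every map finishes, controlled by one knob, shown here with its 0.20-era name; only the shuffle runs early, reduce() itself still waits for all map output:

    <property>
      <name>mapred.reduce.slowstart.completed.maps</name>
      <value>0.5</value>   <!-- launch reduces once 50% of maps are done; the default is 0.05 -->
    </property>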

Re: Why is the default packet size in HDFS 64k?

2010-06-06 Thread Kevin Tse
you mean data blocks in HDFS? Take a look at this and read the Data Block section. http://hadoop.apache.org/common/docs/r0.19.1/hdfs_design.html On Mon, Jun 7, 2010 at 8:59 AM, ChingShen chingshenc...@gmail.com wrote: Hi all, Why is the default packet size in HDFS 64k? How do we know 64k

Re: Is Hadoop applicable to this problem.

2010-05-31 Thread Kevin Tse
Aleksandar, Thank you so much, now I think I have all I need to give Hadoop a run in a cluster environment. - Kevin Tse On Mon, May 31, 2010 at 2:54 PM, Aleksandar Stupar stupar.aleksan...@yahoo.com wrote: Hi guys, this looks to me as a set self join problem. As I see it the easiest way

Hadoop sort

2010-05-30 Thread Kevin Tse
If the data generated by the Map function and Reduce function is far bigger than the available RAM on my server, is it possible to sort the data?

Re: Is Hadoop applicable to this problem.

2010-05-29 Thread Kevin Tse
. the format of the final result I wish to get from the original data is like the following: 111 222,333,444,888 (if there are more than 20 items here, I just want the top 20) 222 111,333,444,888 333 111,222,444 444 111,222,333 888 111 Your help will be greatly appreciated. - Kevin Tse On Sat, May

Is Hadoop applicable to this problem.

2010-05-28 Thread Kevin Tse
333 list1,list3 444 list2 888 list4 My question is: Is hadoop applicable to this problem, if so, would you please give me a clue on how to implement the Map function and the Reduce function. Thank you in advance. - Kevin Tse

Inverted word index...

2010-05-17 Thread Kevin Apte
say ro to ru? Or do I have to look up Bloom Filters for every tablet? Kevin

Re: Using HBase on other file systems

2010-05-11 Thread Kevin Apte
- storage bricks, striping files across multiple nodes and automatic self healing- I am assuming these features exist in all of the file systems- but Gluster seems to be low cost and professionally supported, as is Cloudera. Kevin On Wed, May 12, 2010 at 5:10 AM, Buttler, David buttl...@llnl.gov

Using HBase on other file systems

2010-05-09 Thread Kevin Apte
not work for Gluster. I have just started researching this, so I have not fact checked it adequately. Kevin

Re: How does HBase perform load balancing?

2010-05-08 Thread Kevin Apte
Are these the good links for the Yahoo Benchmarks? http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf http://research.yahoo.com/files/ycsb.pdf Kevin On Sat, May 8, 2010 at 3:00 PM, Ryan Rawson ryano...@gmail.com wrote: hey, HBase currently uses region count to load balance. Regions

Re: How is column timestamp useful?

2010-05-06 Thread Kevin Apte
files compressed using gZip, multiple versions of a row may compress very well. Kevin On Fri, May 7, 2010 at 10:14 AM, tsuna tsuna...@gmail.com wrote: In addition to what Ryan said, even if the default maximum number of versions for a cell is 3 doesn't mean that you end up wasting space

Re: How is column timestamp useful?

2010-05-06 Thread Kevin Apte
should be turned off. Kevin On Fri, May 7, 2010 at 10:49 AM, Takayuki Tsunakawa tsunakawa.ta...@jp.fujitsu.com wrote: Hello, Kevin-san Yes, Hadoop DFS maintains three copies of the same data (version) at the file system level. What I'm wondering about is the necessity of different versions

Re: Improving HBase scanner

2010-05-05 Thread Kevin Apte
If you add secondary indexing- where does the index get stored? Are there separate set of files for every index? For example, if I index on Fields A, B, C and D will there be a separate set of files for the 4 indices? Kevin On Wed, May 5, 2010 at 8:31 PM, Seraph Imalia ser...@eisp.co.za wrote

Re: Partially partitioned connectivity

2010-04-29 Thread Kevin Webb
On Thu, 29 Apr 2010 13:24:45 -0700 Mahadev Konar maha...@yahoo-inc.com wrote: Hi Kevin, I had the response set up but didn't hit send. Ted already answered your question, but to give you a more technical background assuming that you know a little bit more about transaction ids in ZooKeeper

Re: znode cversion decreasing?

2010-04-12 Thread Kevin Webb
On Mon, 12 Apr 2010 09:27:46 -0700 Mahadev Konar maha...@yahoo-inc.com wrote: HI Kevin, The cversion should be monotonically increasing for the the znode. It would be a bug if its not. Can you please elaborate in which cases you are seeing the cversion decreasing? If you can reproduce

Re: znode cversion decreasing?

2010-04-12 Thread Kevin Webb
On Mon, 12 Apr 2010 14:33:44 -0700 Mahadev Konar maha...@yahoo-inc.com wrote: Hi Kevin, Thanks for the info. Could you cut and paste the code you are using that prints the view info? That would help. We can then create a jira and follow up on that. Also, a zookeeper client can never go

znode cversion decreasing?

2010-04-11 Thread Kevin Webb
the way node version numbers work? Is there a better/recommended way to implement a monotonically increasing group number? Thanks! Kevin [1] http://hadoop.apache.org/zookeeper/docs/r3.2.2/recipes.html [2] http://eng.kaching.com/2010/01/actually-implementing-group-management.html
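
One common answer to the closing question is to derive the group number from something the server keeps monotonic for the life of a znode, e.g. the data version of a dedicated counter node; a sketch (the path is a placeholder, and it only holds as long as the counter znode is never deleted and recreated):

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public final class GroupCounter {
      // Touch the counter znode on every membership change; the returned
      // data version only ever grows for an existing znode.
      public static int bump(ZooKeeper zk, String counterPath) throws Exception {
        Stat stat = zk.setData(counterPath, new byte[0], -1);   // -1 = match any current version
        return stat.getVersion();
      }
    }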
