RE: Is FileSystem thread-safe?

2013-05-17 Thread John Lilley
aren't allowed for any given file. Thanks, +Vinod Kumar Vavilapalli On May 17, 2013, at 6:40 AM, John Lilley wrote: Thanks! Does this also imply that multiple clients may open the same HDFS file for append simultaneously, and expect append requests to be interleaved? john From: Arpit Agarwal
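A minimal sketch (not from the thread) of a single append caller using the FileSystem API; per the reply above, HDFS allows only one writer per file at a time, so a second concurrent append() from another client is expected to fail rather than interleave. The namenode host, port, and path are hypothetical.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
            // The appending client holds the file lease; only one writer at a time.
            FSDataOutputStream out = fs.append(new Path("/data/log.txt"));
            try {
                out.writeBytes("one more record\n");
            } finally {
                out.close();  // releasing the lease lets another client append next
            }
        }
    }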

RE: which hadoop version to use

2013-05-17 Thread John Lilley
Have you looked at HDP for Windows? http://hortonworks.com/download/ It is a 1.1-based distro and is designed for easier Windows install. I haven't used it myself. john From: Cheng, Yi [mailto:yi.ch...@hp.com] Sent: Friday, May 17, 2013 5:41 PM To: user@hadoop.apache.org Subject: which hadoop

YARN in 0.23 vs 2.0

2013-05-16 Thread John Lilley
We will be programming to the YARN resource manager and scheduler in an upcoming project, but I am unclear regarding its level of integration in each version. Searching for, say, ApplicationSubmissionContext in versions of Hadoop, I see it in 0.23 and 2.0, but not in 1.0 or 1.1. Does this
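For reference, a minimal sketch of where ApplicationSubmissionContext fits when submitting a YARN application, written against the Hadoop 2.x client API (the 0.23 client-side packaging differs); application name, AM command, and resource sizes are hypothetical.

    import java.util.Collections;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.util.Records;

    public class SubmitApp {
        public static void main(String[] args) throws Exception {
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(new Configuration());
            yarnClient.start();

            YarnClientApplication app = yarnClient.createApplication();
            ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
            ctx.setApplicationName("demo-app");

            // Describe how to launch the ApplicationMaster container.
            ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
            amContainer.setCommands(Collections.singletonList("./my_app_master 1>stdout 2>stderr"));
            ctx.setAMContainerSpec(amContainer);
            ctx.setResource(Resource.newInstance(1024, 1));  // 1 GB, 1 vcore for the AM

            yarnClient.submitApplication(ctx);  // the RM schedules and launches the AM
        }
    }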

Distribution of native executables and data for YARN-based execution

2013-05-16 Thread John Lilley
I am attempting to distribute the execution of a C-based program onto a Hadoop cluster, without using MapReduce. I read that YARN can be used to schedule non-MapReduce applications by programming to the ASM/RM interfaces. As I understand it, eventually I get down to specifying each sub-task
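As an illustration of the "specifying each sub-task" step, below is a hedged sketch (Hadoop 2.x records API) of the per-container launch spec an ApplicationMaster might build to localize a pre-built C binary from HDFS and run it in the container; the HDFS path, binary name, and arguments are hypothetical.

    import java.util.Collections;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.util.ConverterUtils;
    import org.apache.hadoop.yarn.util.Records;

    public class NativeTaskLaunch {
        static ContainerLaunchContext buildLaunchContext(FileSystem fs) throws Exception {
            // Ship the pre-built native binary from HDFS into the container's working dir.
            Path bin = new Path("hdfs:///apps/mytool/mytool");
            FileStatus st = fs.getFileStatus(bin);
            LocalResource exe = LocalResource.newInstance(
                ConverterUtils.getYarnUrlFromPath(bin),
                LocalResourceType.FILE, LocalResourceVisibility.APPLICATION,
                st.getLen(), st.getModificationTime());

            ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
            ctx.setLocalResources(Collections.singletonMap("mytool", exe));
            ctx.setCommands(Collections.singletonList("./mytool --input part-00001 1>stdout 2>stderr"));
            return ctx;  // handed to the NodeManager when a container is allocated
        }
    }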

Question about writing HDFS files

2013-05-16 Thread John Lilley
I seem to recall reading that when a MapReduce task writes a file, the blocks of the file are always written to local disk, and replicated to other nodes. If this is true, is this also true for non-MR applications writing to HDFS from Hadoop worker nodes? What about clients outside of the

RE: Is FileSystem thread-safe?

2013-03-31 Thread John Lilley
From: Ted Yu [mailto:yuzhih...@gmail.com] Subject: Re: Is FileSystem thread-safe? FileSystem is an abstract class, what concrete class are you using (DistributedFileSystem, etc) ? Good point. I am calling FileSystem.get(URI uri, Configuration conf) with a URI like hdfs://server:port/... on a
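A minimal sketch of the call in question; host, port, and path are hypothetical. Note that FileSystem.get() typically returns a cached instance shared across threads for the same URI scheme/authority, so every caller usually gets the same DistributedFileSystem object unless fs.hdfs.impl.disable.cache is set or FileSystem.newInstance() is used instead.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsGetExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://server:8020/"), conf);
            System.out.println(fs.getClass().getName());        // concrete class, e.g. DistributedFileSystem
            System.out.println(fs.exists(new Path("/user")));   // simple metadata call
        }
    }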

RE: Why big block size for HDFS.

2013-03-31 Thread John Lilley
From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com] Subject: Why big block size for HDFS. It has been written in many places that to avoid a huge number of disk seeks, we store big blocks in HDFS, so that once we seek to the location, there is only the data transfer rate, which would be
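A back-of-the-envelope illustration of that argument (numbers assumed, not from the thread): with a ~10 ms disk seek and a ~100 MB/s transfer rate,

    64 MB block:  transfer = 64 / 100      = 0.64 s      -> seek overhead = 0.010 / 0.64     ≈ 1.6%
    4 KB block:   transfer = 0.004 / 100   ≈ 0.00004 s   -> seek time     = 0.010 / 0.00004  ≈ 250x the transfer

so with large blocks the read time is dominated by sequential transfer rather than seeks, which is the point being made above.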

Recommendations for fast OS cloning on cluster

2013-01-16 Thread John Lilley
Are there standard approaches for setting up a Hadoop cluster quickly by cloning most of the data nodes?

RE: Query mongodb

2013-01-16 Thread John Lilley
How does one schedule mappers to read MongoDB or HBase in a data-locality-aware fashion? -john From: Mohammad Tariq [mailto:donta...@gmail.com] Sent: Wednesday, January 16, 2013 3:29 AM To: user@hadoop.apache.org Subject: Re: Query mongodb Yes. You can use MongoDB-Hadoop adapter to achieve
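The locality referred to in the reply comes from the InputFormat layer: each InputSplit reports the hosts holding its data via getLocations(), and the MapReduce scheduler tries to place the map task on one of those hosts (HBase's TableInputFormat does this with region-server hostnames, and the MongoDB adapter can do the equivalent with shard hosts). A sketch of a custom split, with class and host names hypothetical, using the new org.apache.hadoop.mapreduce API:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.InputSplit;

    public class ShardSplit extends InputSplit implements Writable {
        private String shardHost = "mongo-shard-3.example.com";  // node holding this shard's data
        private long length = 0;

        @Override
        public long getLength() { return length; }

        @Override
        public String[] getLocations() {
            // The scheduler prefers to run the map task on one of these hosts.
            return new String[] { shardHost };
        }

        // Writable plumbing so the split can be shipped to tasks.
        @Override
        public void write(DataOutput out) throws IOException { out.writeUTF(shardHost); out.writeLong(length); }
        @Override
        public void readFields(DataInput in) throws IOException { shardHost = in.readUTF(); length = in.readLong(); }
    }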

RE: Query mongodb

2013-01-16 Thread John Lilley
the jobs on the nodes where data is located. It is its fundamental nature. You don't have to do anything extra. I am sorry if I misunderstood the question. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Wed, Jan 16, 2013 at 8:10 PM, John

RE: Hadoop execution sequence

2013-01-15 Thread John Lilley
I think it will help for Ouch to clarify what is meant by "in order". If one JSON file must be completely processed before the next file starts, there is not much point in using MapReduce at all, since your problem cannot be partitioned. On the other hand, there may be ways around this, for

RE: request on behalf of newbies

2013-01-13 Thread John Lilley
I think there's quite a few people like me here asking basic questions on the user@ group. From: Monkey2Code [mailto:monkey2c...@gmail.com] Sent: Sunday, January 13, 2013 2:23 PM To: gene...@hadoop.apache.org; user@hadoop.apache.org Subject: request on behalf of newbies Hi all, Am a newbie in

Scheduling non-MR processes

2013-01-12 Thread John Lilley
I am trying to understand how one can make a side process cooperate with the Hadoop MapReduce task scheduler. Suppose that I have an application that is not directly integrated with MapReduce (i.e., it is not a MapReduce job at all; there are no mappers or reducers). This application could

RE: Scheduling non-MR processes

2013-01-12 Thread John Lilley
on JNI for that? John -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Saturday, January 12, 2013 9:41 AM To: user@hadoop.apache.org Subject: Re: Scheduling non-MR processes Hi, Inline. On Sat, Jan 12, 2013 at 9:39 PM, John Lilley john.lil...@redpoint.net wrote: I am

Getting started recommendations

2013-01-11 Thread John Lilley
We are somewhat new to Hadoop and are looking to run some experiments with HDFS, Pig, and HBase. With that in mind, I have a few questions: What is the easiest (preferably free) Hadoop distro to get started with? Cloudera? What host OS distro/release is recommended? What is the easiest

RE: Binary Search in map reduce

2013-01-07 Thread John Lilley
It depends. What data is going into the table, and what keys will drive the lookup? Let's suppose that you have a single JSON file that has some reasonable number of key/value tuples. You could easily load a Hashtable to associate the integer keys with the values (which appear to be lists of
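A sketch of the in-memory lookup described above, loading the small side table once per task in setup() and probing it per record; for brevity it assumes a plain tab-separated file ("lookup.tsv", shipped to the task working directory) rather than JSON, and the file name and value types are hypothetical.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<Integer, String> lookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // Load the small lookup table once per mapper.
            try (BufferedReader r = new BufferedReader(new FileReader("lookup.tsv"))) {
                String line;
                while ((line = r.readLine()) != null) {
                    String[] kv = line.split("\t", 2);            // key<TAB>value
                    lookup.put(Integer.parseInt(kv[0]), kv[1]);
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            int id = Integer.parseInt(value.toString().trim());   // input record is just an integer key
            String hit = lookup.get(id);                          // O(1) hash probe instead of a binary search
            if (hit != null) {
                context.write(new Text(Integer.toString(id)), new Text(hit));
            }
        }
    }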

RE: Hello and request some advice.

2013-01-04 Thread John Lilley
If you like Red Hat, consider CentOS also; it is a nearly complete clone of the RHEL distro. John From: Nitin Pawar [mailto:nitinpawar...@gmail.com] Sent: Friday, January 04, 2013 10:46 AM To: user@hadoop.apache.org Subject: Re: Hello and request some advice. - Is Ubuntu a good O.S. for running

RE: Hello and request some advice.

2013-01-04 Thread John Lilley
perspective of Hadoop? --- On Fri, 4/1/13, John Lilley john.lil...@redpoint.net wrote: From: John Lilley john.lil...@redpoint.net Subject: RE: Hello and request some advice. To: user@hadoop.apache.org user

RE: Hadoop throughput question

2013-01-03 Thread John Lilley
throughput question Hadoop is using OneFS, not HDFS in our configuration. Isilon NAS and the Hadoop nodes are in the same datacenter but as far as rack locations, I cannot tell. From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Thursday, January 03, 2013 5:15 PM To: user

RE: Hadoop throughput question

2013-01-03 Thread John Lilley
Perhaps if Artem posted the presumably-simple code we could get other users to benchmark other 4-node systems and compare. --John Lilley Artem Ervits are9...@nyp.org wrote: Setting the property to 64k made the throughput jump to 36mb/sec, 39mb for 128k. Thank you for the tip. From: Michael
