Re: error when startup with two dfs.name.dir

2010-03-03 Thread 方阳
Is "nfslock" already running? If not, these steps may help:
  sudo /etc/init.d/portmap start
  sudo /etc/init.d/nfs start
  sudo /etc/init.d/nfslock start
  sudo /sbin/chkconfig --level 12345 portmap on
  sudo /sbin/chkconfig --level 12345 nfs on
  sudo /sbin/chkconfig --level 12345 nfs

Re: Hadoop as master's thesis

2010-03-03 Thread Thomas Koch
Hi Tonci, > I'm thinking of using Hadoop as a subject in my master's thesis in > Computer Science. I'm supposed to solve some kind of a problem with > Hadoop, but can't think of any :)). I've a question, that could be a topic for a master thesis, although it's more a question about hadoop and

error when startup with two dfs.name.dir

2010-03-03 Thread Zheng Lv
Hello Everyone, I added an NFS mount point to the dfs.name.dir configuration option, but after that when I restarted the hadoop cluster I got these: / 2010-03-03 18:32:59,708 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing

Re: Unexpected termination of a job

2010-03-03 Thread Rakhi Khatwani
Hi, I tried running it in Eclipse; the job starts, but somehow it terminates, throwing an exception: Job Failed. That's why I wanted to run it on the jobtracker to check the logs, but the execution terminates even before the job starts (during the preprocessing). How do I ensure that the job runs in

Will interactive password authentication fail talk between namenode-datanode/jobtracker-tasktracker?

2010-03-03 Thread jiang licht
I set up a simple cluster with one master (namen...@50001 and jobtrac...@50002) and one slave. The problem is that although namenode/datanode and jobtracker/tasktracker are running, there is no datanode in the DFS! Both datanode and tasktracker report similar messages in their logs: INFO org

Re: "lost task" suspect Distributed Cache to blame

2010-03-03 Thread Edward Capriolo
On Tue, Mar 2, 2010 at 6:55 PM, Allen Wittenauer wrote: > > > > On 3/2/10 12:49 PM, "Edward Capriolo" wrote: > >> This job is somewhat special (for us) in that it involves shipping >> large files over distributed cache. My working theory is that >> something goes wrong with the distributed cache/

Re: dataset

2010-03-03 Thread Gang Luo
That is a good idea, but it doesn't work in my case. What I want to do is test whether my partitioner divides the workload properly. It is supposed to counteract skew, not generate it. I still need a skewed data source. Any ideas? Thanks, -Gang - Original Message - From: Aaron Kimball

Re: Separate mail list for streaming?

2010-03-03 Thread Michael Kintzer
I am, thanks. The community is great. But the noise level is sometimes a little high for me as a newbie. A list dedicated to Streaming would be easier to search and would be more focused. But I totally understand the concern. No worries. Was just curious if anyone else felt similarly. -

Re: Unexpected termination of a job

2010-03-03 Thread Aaron Kimball
If it's terminating before you even run a job, then you're in luck -- it's all still running on the local machine. Try running it in Eclipse and use the debugger to trace its execution. - Aaron On Wed, Mar 3, 2010 at 4:13 AM, Rakhi Khatwani wrote: > Hi, >I am running a job which has lot

Re: Separate mail list for streaming?

2010-03-03 Thread Aaron Kimball
We've already got a lot of mailing lists :) If you send questions to mapreduce-user, are you not getting enough feedback? - Aaron On Wed, Mar 3, 2010 at 12:09 PM, Michael Kintzer wrote: > Hi, > > Was curious if anyone else thought it would be useful to have a separate > mail list for discussion/

Re: dataset

2010-03-03 Thread Aaron Kimball
Look at writing your own Partitioner implementation to control which records are sent to which reduce shards. - Aaron On Wed, Mar 3, 2010 at 12:15 PM, Gang Luo wrote: > Hi all, > I want to generate some datasets with data skew to test my mapreduce jobs. > I am using TPC-DS but it seems I c
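
As a rough illustration of the suggestion above, the following Python sketch simulates how a partition function decides which reducer receives each key, and how a deliberately skewing one piles load onto a single reducer. It is not Hadoop code: the function names, the `hot_fraction` parameter, and the use of `zlib.crc32` as a stand-in for a key hash are all assumptions made for the example. In a real job the same logic would presumably live in a Partitioner subclass.

```python
import zlib
from collections import Counter


def hash_partition(key, num_reducers):
    # Stand-in for a plain hash partitioner: stable hash modulo reducer count.
    # crc32 is an assumption chosen because it is stable across Python runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def skewing_partition(key, num_reducers, hot_fraction=0.5):
    # Hypothetical skewing partitioner: roughly hot_fraction of the distinct
    # keys go to reducer 0; the rest are spread over reducers 1..n-1.
    # Requires num_reducers >= 2.
    bucket = zlib.crc32(key.encode("utf-8")) % 100
    if bucket < hot_fraction * 100:
        return 0
    return 1 + hash_partition(key, num_reducers - 1)


def reducer_load(keys, partition, num_reducers):
    # Count how many records each reducer would receive.
    counts = Counter(partition(key, num_reducers) for key in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(10000)]
    print("hash partition:   ", reducer_load(keys, hash_partition, 4))
    print("skewing partition:", reducer_load(keys, skewing_partition, 4))
```

Comparing the two load lists shows how much imbalance the partition function alone introduces on an otherwise uniform key set.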

Efficient implementation of "MapReduceReduce" in Hadoop

2010-03-03 Thread Jørn Schou-Rode
After mapping and reducing some data, I need to do an additional processing step. This additional step shares the contract of a reduce function, expecting its input data (the output from the original reduce) to be grouped by key. Currently, I achieve the above using two iterations: 1. MyMapper ->

dataset

2010-03-03 Thread Gang Luo
Hi all, I want to generate some datasets with data skew to test my mapreduce jobs. I am using TPC-DS but it seems I cannot control the data skew level. There is a suite from Microsoft that can generate skewed datasets based on TPC-D, but it only works on Windows. I haven't succeeded in making it comp
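
One possible way to get a skewed key set with a controllable skew level, independent of TPC-DS or the Microsoft suite mentioned above, is to draw keys from a Zipf-like distribution. The Python sketch below is only an illustration under that assumption: the function names, the key format, and the `skew` exponent are hypothetical, and it does not reproduce any TPC schema.

```python
import random
from collections import Counter


def zipf_weights(num_keys, skew):
    # Weight of the key at rank r is proportional to 1 / r**skew.
    # skew = 0 gives a uniform distribution; larger values give heavier skew.
    return [1.0 / (rank ** skew) for rank in range(1, num_keys + 1)]


def generate_skewed_keys(num_records, num_keys, skew, seed=0):
    # Draw num_records keys; low-numbered keys are the most frequent ones.
    rng = random.Random(seed)
    keys = [f"key{i:05d}" for i in range(num_keys)]
    weights = zipf_weights(num_keys, skew)
    return rng.choices(keys, weights=weights, k=num_records)


if __name__ == "__main__":
    records = generate_skewed_keys(100_000, 1_000, skew=1.2)
    for key, count in Counter(records).most_common(5):
        print(key, count)
```

Writing the generated keys out one per line (with whatever payload the job expects) would give an input file whose skew is set by a single parameter.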

Separate mail list for streaming?

2010-03-03 Thread Michael Kintzer
Hi, Was curious if anyone else thought it would be useful to have a separate mail list for discussion/issues specific to Hadoop Streaming? Thanks, Michael

RE: Hbase VS Hive

2010-03-03 Thread Michael Segel
> Date: Thu, 4 Mar 2010 00:42:11 +0700 > From: fitrah.fird...@gmail.com > To: common-user@hadoop.apache.org > Subject: Hbase VS Hive > > Hello Everyone > > I want to ask about Hbase and Hive. > > What is the difference between Hbase and Hive? And what is the > consideration for > choose

Re: Hbase VS Hive

2010-03-03 Thread Jean-Daniel Cryans
HBase is used to do random reads on files stored in Hadoop, among other things. It's really a database. Hive is a data warehousing infrastructure built on top of Hadoop and will even soon work on top of HBase too. J-D On Wed, Mar 3, 2010 at 9:42 AM, Fitrah Elly Firdaus wrote: > Hello Everyone >

Hadoop User Group (Bay Area) - March 24th at Yahoo!

2010-03-03 Thread Dekel Tankel
Hi all, RSVP is open for the next monthly Bay Area Hadoop user group at the Yahoo! Sunnyvale Campus, Wednesday, March 24th, 6PM Please note that due to the growing demand we are moving the meeting location to a larger facility *Building C, Second Floor, Classroom 5* (It's in the same campus, j

Re: Cluster Summary, Heap Size

2010-03-03 Thread Saptarshi Guha
Oh, I presume this is the value of HADOOP_HEAPSIZE specified in hadoop-env.sh and is the heap given to the daemon by Java. I guess this answers the question. On Tue, Mar 2, 2010 at 10:46 PM, Saptarshi Guha wrote: > Hello, > I often notice "Cluster Summary (Heap Size is 281/889)" where the 281

Hbase VS Hive

2010-03-03 Thread Fitrah Elly Firdaus
Hello Everyone, I want to ask about Hbase and Hive. What is the difference between Hbase and Hive? And what should I consider when choosing between Hbase and Hive? Kind regards

Hbase VS Hive

2010-03-03 Thread Fitrah Elly Firdaus
Hello Everyone, I want to ask about Hbase and Hive. What is the difference between Hbase and Hive? And what should be considered when choosing Hbase or Hive? Kind regards

Re: hadoop from apache or cloudera?

2010-03-03 Thread Todd Lipcon
On Wed, Mar 3, 2010 at 8:45 AM, Michael Segel wrote: > > Not trying to speak for Cloudera, but the last time I spoke with them, the > release shipped with Cloudera is the same as the latest release of HBase on > Apache. > > This is true for HBase, but for the case of Hadoop we apply 200+ patches o

Re: hadoop from apache or cloudera?

2010-03-03 Thread Fitrah Elly Firdaus
On 03/03/2010 11:48 PM, David Rosenstrauch wrote: On 03/03/2010 11:41 AM, Fitrah Elly Firdaus wrote: Dear all, I'm a newcomer to Hadoop and I want to ask about it. Which one should I install, Hadoop from Apache or Hadoop from Cloudera? And what is the difference between hadoop from apache and c

Re: Tools to automatically setup new slaves (PXE boot?)

2010-03-03 Thread Steve Loughran
sagar_shukla wrote: Hi Paul, Alternative to PXE boot is - do the installation on a single box, create a ghost image of that setup and then replicate ghost image on new servers. PXE boot is mainly used when you do not have physical access to the servers and installation needs to be done rem

Re: hadoop from apache or cloudera?

2010-03-03 Thread David Rosenstrauch
On 03/03/2010 11:41 AM, Fitrah Elly Firdaus wrote: Dear all, I'm a newcomer to Hadoop and I want to ask about it. Which one should I install, Hadoop from Apache or Hadoop from Cloudera? And what is the difference between Hadoop from Apache and Cloudera? Regards See: http://www.cloudera.com/h

RE: hadoop from apache or cloudera?

2010-03-03 Thread Michael Segel
Not trying to speak for Cloudera, but the last time I spoke with them, the release shipped with Cloudera is the same as the latest release of HBase on Apache. We decided to standardize on Cloudera for Hadoop, so we pull HBase from Cloudera because we want to deal with a single source. HTH -M

hadoop from apache or cloudera?

2010-03-03 Thread Fitrah Elly Firdaus
Dear all, I'm a newcomer to Hadoop and I want to ask about it. Which one should I install, Hadoop from Apache or Hadoop from Cloudera? And what is the difference between Hadoop from Apache and Cloudera? Regards

Re: Tools to automatically setup new slaves (PXE boot?)

2010-03-03 Thread Edward Capriolo
Also, you could use a PXE boot and serve a shared root / file system over NFS; swap can be a local or remote disk. I think that may be more of what you are asking about? On 3/2/10, sagar_shukla wrote: > Hi Paul, > Alternative to PXE boot is - do the installation on a single box, create

Re: Tools to automatically setup new slaves (PXE boot?)

2010-03-03 Thread Steve Loughran
Edward Capriolo wrote: In a Red Hat environment PXE + Kickstart is a great way to go. You can get nice, fast, consistent builds automatically. The post section of a kickstart allows you to run scripts or install RPMs. There are some tools, such as Cobbler, that give you some nice PXE install management, an

Unexpected termination of a job

2010-03-03 Thread Rakhi Khatwani
Hi, I am running a job which has a lot of preprocessing involved. When I run my class from a jar file, it somehow terminates after some time without giving any exception. I have tried running the same program several times, and every time it terminates at a different location in the code (during

Re: Hadoop as master's thesis

2010-03-03 Thread Tonci Buljan
Sounds great!!! Thank you. On 3 March 2010 09:05, Huy Phan wrote: > Hi Matteo, > it sounds good :) > We will wait for your work. > > > On 03/03/2010 02:28 PM, Matteo Nasi wrote: > >> hi guys, >> sorry for the delay, it's a busy week :-) There's no problem about sharing >> my work. >> However th

Re: Hadoop as master's thesis

2010-03-03 Thread Huy Phan
Hi Matteo, it sounds good :) We will wait for your work. On 03/03/2010 02:28 PM, Matteo Nasi wrote: hi guys, sorry for the delay, it's a busy week :-) There's no problem about sharing my work. However there are some issues to consider: - my final doc is a 135 page description of what I did, but i