RE: How to select random n records using mapreduce?

2011-06-27 Thread Habermaas, William
I did something similar. Basically I had a random sampling algorithm that I called from the mapper. If it returned true I would collect the data; otherwise I would discard it. Bill
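
A minimal sketch of that pattern, assuming the 0.20-era org.apache.hadoop.mapreduce API; the job property name "sample.fraction" is made up for illustration. Note this keeps roughly a fixed fraction of the input rather than exactly n records:

    import java.io.IOException;
    import java.util.Random;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Pass-through mapper that keeps each record with a configured probability.
    public class SampleMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        private final Random rand = new Random();
        private float fraction;

        @Override
        protected void setup(Context context) {
            // "sample.fraction" is a hypothetical property name for this sketch
            fraction = context.getConfiguration().getFloat("sample.fraction", 0.01f);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (rand.nextFloat() < fraction) {
                context.write(key, value);   // "collect" the sampled record
            } // otherwise discard it
        }
    }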

RE: Hadoop on Windows with bat and ant scripts

2011-06-10 Thread Habermaas, William
It is more than just getting away from shell script usage. Hadoop invokes the BASH shell internally to run commands like df, du, etc. to perform operating system functions. CYGWIN is not intended as a production platform, and its BASH shell doesn't always work, which becomes a problem for Hadoop.
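
For reference, the shell-out Bill describes is visible in Hadoop's own utility classes. A sketch using org.apache.hadoop.fs.DF (constructor signature as in the 0.20-era API), which forks the platform's df command under the hood, so a broken Cygwin BASH breaks it:

    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.DF;

    public class DfProbe {
        public static void main(String[] args) throws IOException {
            // DF shells out to `df`; on Cygwin that means BASH must work
            DF df = new DF(new File("/tmp"), new Configuration());
            System.out.println("available bytes: " + df.getAvailable());
        }
    }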

RE: remotely downloading file

2011-06-07 Thread Habermaas, William
…hadoop.apache.org/hdfs/docs/current/api/org/apache/hadoop/hdfs/DistributedFileSystem.html. But the examples I'm seeing are using the Configuration, and I don't see that being used in those classes. Thanks again, Joe

RE: remotely downloading file

2011-06-03 Thread Habermaas, William
You can access HDFS for reading and writing from other machines. The API works through the HDFS client, which can be anywhere on the network, not just on the namenode. You just have to have the Hadoop core jar with your application wherever it is going to run. Bill
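
A sketch of that remote-client pattern, assuming a 0.20-era cluster; the namenode host, port, and file path are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class RemoteRead {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the remote namenode (placeholder host/port)
            conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
            FileSystem fs = FileSystem.get(conf);
            FSDataInputStream in = fs.open(new Path("/data/remote-file.txt"));
            IOUtils.copyBytes(in, System.out, conf, true); // stream to stdout, then close
        }
    }

Only the hadoop-core jar (and its dependencies) needs to be on the client's classpath; the same Configuration-driven FileSystem.get call works whether the code runs on the namenode or on any machine that can reach it.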

RE: Bad connection to FS. command aborted

2011-05-11 Thread Habermaas, William
…a problem if I am running a single-node cluster. Version mismatch with whom?

RE: Bad connection to FS. command aborted

2011-05-11 Thread Habermaas, William
The Hadoop IPCs are version specific. That is done to prevent an older version from talking to a newer one. Even if nothing has changed in the internal protocols, the version check is enforced. Make sure the new hadoop-core.jar from your modification is on the classpath used by the hadoop shell.

RE: hi

2011-05-02 Thread Habermaas, William
Look at your namenode log. From the log info you supplied, one possibility is that HDFS doesn't think there is any space available on your system. Some CYGWIN installs do not work properly because BASH doesn't work when called internally by Hadoop. HDFS depends on running shell commands like df…

RE: Hadoop Developer Question

2011-03-04 Thread Habermaas, William
How come all the Hadoop jobs are in the Bay Area? Doesn't anybody use Hadoop in NY?

RE: hadoop installation problem (single-node)

2011-03-02 Thread Habermaas, William
If you are interested in a quick-start Hadoop and don't mind that HBase is included, take a look at the dashboard application at www.habermaas.com. It is a free packaged Hadoop setup. Just unzip it and run it. Bill

RE: Cluster setup

2010-11-09 Thread Habermaas, William
Visit the quickstart page and set up pseudo-distributed mode on a single machine. http://hadoop.apache.org/common/docs/r0.20.0/quickstart.html Bill
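
For reference, the pseudo-distributed setup in that r0.20.0 quickstart amounts to three small conf/*.xml edits (localhost ports as given in the docs; adjust to taste):

    <!-- conf/core-site.xml -->
    <configuration>
      <property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>
    </configuration>

    <!-- conf/hdfs-site.xml -->
    <configuration>
      <property><name>dfs.replication</name><value>1</value></property>
    </configuration>

    <!-- conf/mapred-site.xml -->
    <configuration>
      <property><name>mapred.job.tracker</name><value>localhost:9001</value></property>
    </configuration>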

RE: Cluster setup

2010-11-09 Thread Habermaas, William
Fabio, You don't need 4 machines. You can put everything on a single machine; that is the easiest way to get started. Once you have a cluster running on a single machine, you can spread out over multiple machines. Best, Bill

RE: Hadoop cluster setup

2010-02-03 Thread Habermaas, William
You can set up the machines and configure them without being connected over a network, but once you want to start up the services, all machines have to be active and reachable on the LAN. Bill

Why DrWho

2009-12-07 Thread Habermaas, William
I am running Hadoop-0.20.1 on a Solaris box with dfs.permissions set to false. There is a working version of whoami on the system. Folders and files created by my program show up with an owner of DrWho. Folders and files created by HBase-0.20.1 appear with the proper owner name. Do I need…
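
Context for the odd name: pre-security (0.20-era) Hadoop determines the client user by forking whoami, and DrWho is the fallback identity it uses when that call fails. A common 0.20-era workaround, sketched below, was to name the identity explicitly via the hadoop.job.ugi property (the "bill,staff" user,group values are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ExplicitUgi {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // 0.20-era escape hatch: supply "user,group" directly instead of
            // relying on a forked `whoami`. Values here are placeholders.
            conf.set("hadoop.job.ugi", "bill,staff");
            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.getWorkingDirectory());
        }
    }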

Bypassing SSH

2009-12-07 Thread Habermaas, William
Does anyone run Hadoop without SSH? Windows/Vista has a lot of problems with CYGWIN and SSHD. Unless the phase of the moon is just right and you have a magic rabbit's foot, it just doesn't work. I've spent much time trying to fix it just so I can do some Hadoop development. Since it doesn't work…

RE: Using Hadoop in non-typical large scale user-driven environment

2009-12-02 Thread Habermaas, William
Hadoop isn't going to like losing its datanodes when people shut down their computers. More importantly, when the datanodes are running, your users will be impacted by data replication. Unlike SETI@home, Hadoop doesn't know when the user's screensaver is running, so it will start doing things when it…

RE: Access Error

2009-11-19 Thread Habermaas, William
Hadoop will perform a 'whoami' to identify the user that is making the HDFS request. If you have not turned off file permissions in the Hadoop configuration, the user name will be matched to the permission settings related to the path you are going after. Think of it as a mechanism similar (but not…
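
A small sketch of inspecting what HDFS thinks the owner and permissions of a path are, using the standard FileStatus API (the path is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowPerms {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus st = fs.getFileStatus(new Path("/user/joe/data"));
            // The owner reported here is what gets matched against the
            // client-side `whoami` result when permissions are enforced.
            System.out.println(st.getPermission() + " " + st.getOwner() + ":" + st.getGroup());
        }
    }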

RE: indexing log files for adhoc queries - suggestions?

2009-10-01 Thread Habermaas, William
I had a similar requirement and wrote my reducer output to HBase. I used HBase versions to segregate the data by timestamps and formed the HBase keys to satisfy my retrieval requirements. Bill
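
A sketch of that reducer pattern against the 0.20-era HBase client API; the "log" column family, "line" qualifier, and key layout are placeholders, and the explicit timestamp on the Put is what lets HBase's versioning segregate entries in time:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;

    // Reduces (logKey, [entries]) pairs into HBase cells, one version per timestamp.
    public class LogToHBaseReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text v : values) {
                Put put = new Put(Bytes.toBytes(key.toString()));
                long ts = System.currentTimeMillis(); // placeholder; derive from the log record
                put.add(Bytes.toBytes("log"), Bytes.toBytes("line"), ts, Bytes.toBytes(v.toString()));
                context.write(null, put); // the row key comes from the Put itself
            }
        }
    }

In the 0.20-era HBase mapreduce package, TableMapReduceUtil.initTableReducerJob is what wires such a reducer to its target table in the job setup.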