Re: measuring iops

2012-10-23 Thread Brian Bockelman
IOPS are quite relevant. Just recall that they are not the end-all, be-all for HDFS performance measurement. It's not the primary number I would look for! Each install will have its own requirements. Brian On Oct 23, 2012, at 6:01 PM, Rita rmorgan...@gmail.com wrote: I was curious because

Re: libhdfs install dep

2012-09-25 Thread Brian Bockelman
hadoop hadoop = 1.0.3-1 Normally, you would expect to see something like this (using the CDH4 distribution as an example) as it contains a shared library: [bbockelm@brian-test ~]$ rpm -q --provides hadoop-libhdfs libhdfs.so.0()(64bit) hadoop-libhdfs = 2.0.0+88-1.cdh4.0.0.p0.30.osg.el5 libhdfs.so

Re: Hadoop-on-demand and torque

2012-05-20 Thread Brian Bockelman
Hi Ralph, I admit - I've only been half-following the OpenMPI progress. Do you have a technical write-up of what has been done? Thanks, Brian On May 20, 2012, at 9:31 AM, Ralph Castain wrote: FWIW: Open MPI now has an initial cut at MR+ that runs map-reduce under any HPC environment. We

Re: Similar frameworks like hadoop and taxonomy of distributed computing

2012-01-11 Thread Brian Bockelman
the same level of data-integration as Hadoop does, so tackles a much simpler problem (i.e., bring-your-own-data-management!). /condor-geek Brian

Re: More cores Vs More Nodes ?

2011-12-14 Thread Brian Bockelman
an estimated CPU-millennia per byte of data… they needed a general purpose cluster for a certain value of general purpose. Brian On Dec 14, 2011, at 7:29 AM, Michael Segel wrote: Aw Tommy, Actually no. You really don't want to do this. If you actually ran a cluster and worked in the real world

Re: heap size problem durning mapreduce

2011-11-28 Thread Brian Bockelman
Erm - actually, the heap size you are specifying for the child is 1TB if I'm counting numbers correctly. Is it possible that Java is bombing because Linux isn't allowing you to overcommit that much memory? Brian On Nov 28, 2011, at 6:21 AM, Harsh J wrote: Hoot, Your settings of 10 GB per
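The arithmetic behind Brian's point is easy to check in the shell: an -Xmx value written out in raw bytes with a few too many zeros lands on the order of a terabyte. The number below is illustrative, not taken from the original thread.

```shell
# A hypothetical child-heap setting with too many zeros, in raw bytes.
heap_bytes=1000000000000

# Convert to GiB: roughly 931 GiB, i.e. about 1 TB of heap requested.
echo $(( heap_bytes / 1024 / 1024 / 1024 ))
```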

Re: Adding a new platform support to Hadoop

2011-11-17 Thread Brian Bockelman
was changed, why it worked better on the target platform, and how the optimization will affect your target platform. Extremely hard difficulty. Brian On Nov 17, 2011, at 9:02 AM, Amir Sanjar wrote: Is there any specific development, build, and packaging guidelines to add support for a new

Re: FUSE CRASHING

2011-10-14 Thread Brian Bockelman
Hi Deepti, That appears to crash deep in pthread, which would scare me a bit. Are you using a strange/non-standard platform? What Java version? What HDFS version? Brian On Oct 14, 2011, at 3:59 AM, Banka, Deepti wrote: Hi, I am trying to run FUSE and it's crashing randomly

Re: performance normal?

2011-10-08 Thread Brian Bockelman
Normal operation is a function of hardware. Giving the version without the underlying hardware means I get to make up any answer I feel like. I can't imagine a rational set of hardware where 10 megabits (as in, one megabyte) a second is normal. Brian On Oct 8, 2011, at 3:04 AM, Bochun Bai

Re: Is SAN storage is a good option for Hadoop ?

2011-09-29 Thread Brian Bockelman
latency. 2) As Paul pointed out, you have to ask yourself whether the SAN is shared or dedicated. Many SANs don't have the ability to strongly partition workloads between users. Brian

Re: libhdfs: 32 bit jvm on 64 bit machine

2011-09-27 Thread Brian Bockelman
On Sep 27, 2011, at 9:24 AM, Vivek K wrote: Hi Brian Thanks for a prompt response. The machines on cluster didn't have libhdfs.so.0 file. So I copied my libhdfs.so (that came with cloudera vm - libhdfs0 and libhdfs0-dev) on the cluster machine. So it should be 32-bit. The wrong ELF

Re: libhdfs: 32 bit jvm on 64 bit machine

2011-09-27 Thread Brian Bockelman
be difficult to achieve. Brian On Sep 27, 2011, at 9:33 AM, Vivek K wrote: Thanks Brian. A quick question: can we have both 32bit and 64bit jvms on the cluster machines ? Vivek -- On Tue, Sep 27, 2011 at 10:28 AM, Brian Bockelman bbock...@cse.unl.eduwrote: On Sep 27, 2011, at 9:24

Re: risks of using Hadoop

2011-09-17 Thread Brian Bockelman
workflows, it doesn't measure up against specialized systems. You really want to make sure that Hadoop is the best tool for your job. Brian

Re: risks of using Hadoop

2011-09-17 Thread Brian Bockelman
definitely don't want to shrug off data loss / downtime. However, there's many people who simply don't need this. If I'm told that I can buy a 10% larger cluster by accepting up to 15 minutes of data loss, I'd do it in a heartbeat where I work. Brian On Sep 17, 2011, at 6:38 PM, Tom Deutsch wrote

Re: risks of using Hadoop

2011-09-17 Thread Brian Bockelman
:) I think we can agree to that point. Hopefully a plethora of viewpoints is good for the community! (And when we run into something that needs higher availability, I'll drop by and say hi!) On Sep 17, 2011, at 8:32 PM, Tom Deutsch wrote: Not trying to give you a hard time Brian - we just

Re: How big data and/or how many machines do I need to take advantage of Hadoop?

2011-08-31 Thread Brian Bockelman
Hi Kuro, A 100MB file should take 1 second to read; typically, MR jobs get scheduled on the order of seconds. So, it's unlikely you'll see any benefit. You'll probably want to have a look at Amdahl's law: http://en.wikipedia.org/wiki/Amdahl%27s_law Brian On Aug 31, 2011, at 3:48 AM
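Amdahl's law, which Brian points to, puts a hard ceiling on the speedup from parallelizing a job. A quick sketch with hypothetical numbers (the fraction and node count are not from the thread):

```shell
# Amdahl's law: speedup = 1 / ((1 - p) + p / n)
# p = parallelizable fraction of the job, n = number of workers.
amdahl() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}

amdahl 0.95 10    # even with 95% parallel work, 10 nodes give under 7x
```

For a job dominated by a few seconds of scheduling overhead and a 1-second read, the serial fraction swamps everything, which is exactly Brian's point.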

Re: error -2 (No such file or directory) when mounting fuse-dfs

2011-06-07 Thread Brian Bockelman
Hi Elena, FUSE-DFS is extremely picky about hostnames. All of the following should have the exact same string: - Output of hostname on the namenode. - fs.default.name - Primary reverse-DNS of the namenode's IP. localhost is almost certainly not what you want. Brian On Jun 7, 2011, at 9:47
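The three-way consistency check Brian describes can be scripted. The hostnames below are placeholders; substitute the real output of `hostname`, your fs.default.name, and the namenode's primary PTR record.

```shell
# All three must be the exact same string for fuse-dfs to work reliably:
nn_hostname="namenode.example.com"                  # `hostname` on the NN (placeholder)
fs_default_name="hdfs://namenode.example.com:8020"  # fs.default.name (placeholder)
reverse_dns="namenode.example.com"                  # primary reverse-DNS of NN IP (placeholder)

host_in_uri=${fs_default_name#hdfs://}   # strip the scheme
host_in_uri=${host_in_uri%%:*}           # strip the port

if [ "$nn_hostname" = "$host_in_uri" ] && [ "$nn_hostname" = "$reverse_dns" ]; then
    echo "hostnames consistent"
else
    echo "MISMATCH: fuse-dfs mounts will likely fail"
fi
```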

Re: Does Hadoop 0.20.2 and HBase 0.90.3 compatible ??

2011-06-03 Thread Brian Bockelman
doing it. In all likelihood, if you're going to be working with a piece of software (Hadoop-based or not!), you'll re-install it a few times. The install of HDFS should take roughly the same amount of time on 2, 20, or 200 nodes. Brian On Jun 3, 2011, at 6:47 AM, Andrew Purtell wrote

Re: Poor IO performance on a 10 node cluster.

2011-05-30 Thread Brian Bockelman
CFQ scheduler is inappropriate for batch workloads. Finally, if you don't have enough host-level monitoring to indicate the current bottleneck (CPU, memory, network, or I/O?), you likely won't ever be able to solve this riddle Brian

Re: Are hadoop fs commands serial or parallel

2011-05-20 Thread Brian Bockelman
multiple processes, but it beats any api-overcomplications, imho. Simple doesn't imply scalable, unfortunately. Brian Dieter On Wed, 18 May 2011 11:39:36 -0500 Patrick Angeles patr...@cloudera.com wrote: kinda clunky but you could do this via shell: for $FILE in $LIST_OF_FILES ; do

Re: measure throughput of cluster

2011-05-03 Thread Brian Bockelman
Hi Rita, An open file in HDFS doesn't take up any resources in the NN, so there is no corresponding close operation. Probably you want to increase the logging in the datanodes, which will print out activity per client. Brian On May 3, 2011, at 6:58 AM, Rita wrote: I am trying to acquire

Re: cannot run more than 6 jvms on one machine

2011-05-02 Thread Brian Bockelman
Check the overcommit VM settings on your kernel. These prevent swap from being used on older JVMs, and cause out-of-memory errors to be given by Java even when there is free memory. Brian On May 2, 2011, at 11:51 AM, Steve Loughran wrote: On 29/04/2011 03:37, stanley@emc.com wrote: Hi
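The overcommit settings Brian refers to are standard Linux sysctl knobs. An operational sketch (run as root on the affected node; the policy values are the kernel's documented semantics, not specific to this thread):

```shell
# 0 = heuristic overcommit (default), 1 = always allow, 2 = strict accounting.
# Strict accounting (2) can make the JVM's large up-front address-space
# reservations fail even while physical memory is still free.
sysctl vm.overcommit_memory vm.overcommit_ratio

# Relax to the heuristic policy if JVMs are dying at startup:
sysctl -w vm.overcommit_memory=0
```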

Re: Running C hdfs Code in Hadoop

2011-04-28 Thread Brian Bockelman
Hi Adarsh, It appears you don't have the JVM libraries in your LD_LIBRARY_PATH. Try this: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64:$JAVA_HOME/jre/lib/amd64/server Brian On Apr 27, 2011, at 11:31 PM, Adarsh Sharma wrote: Dear all, Today I am trying to run a simple

Re: Fixing a bad HD

2011-04-25 Thread Brian Bockelman
Much quicker, but less safe: data might become inaccessible between boots if you simultaneously lose another node. Probably not an issue at 3 replicas, but definitely an issue at 2. Brian On Apr 25, 2011, at 7:58 PM, James Seigel wrote: Quicker: Shut off power Throw hard drive out put

Re: DFSClient: Could not complete file

2011-03-29 Thread Brian Bockelman
Hi Chris, One thing we've found helping in ext3 is examining your I/O scheduler. Make sure it's set to deadline, not CFQ. This will help prevent nodes from being overloaded; when du -sk is performed and the node is already overloaded, things quickly roll downhill. Brian On Mar 29, 2011
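Checking which scheduler a disk is using is a one-liner; the active scheduler appears in brackets. The sample string below stands in for a real read of /sys on a datanode.

```shell
# On a real datanode: sched_line=$(cat /sys/block/sda/queue/scheduler)
sched_line="noop anticipatory deadline [cfq]"   # sample value for illustration

# The bracketed entry is the currently active scheduler.
active=$(printf '%s\n' "$sched_line" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p')
echo "$active"

# To switch (as root): echo deadline > /sys/block/sda/queue/scheduler
```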

Re: Is there useradd in Hadoop

2011-03-23 Thread Brian Bockelman
In .20 and later, user and group information is taken from the NN's OS. There is no useradd or groupadd. Brian On Mar 23, 2011, at 1:19 AM, springring wrote: Hi, There are chmod, chown, chgrp in HDFS, is there some command like useradd -g to add a user in a group? Even more

Re: hadoop fs -rmr /*?

2011-03-16 Thread Brian Bockelman
Hi W.P., Hadoop does apply permissions taken from the shell. So, if the directory is owned by user brian and user ted does a rmr /user/brian, then you get a permission denied error. By default, this is not safeguarded against malicious users. A malicious user will do whatever they want

Re: Anybody can help with MountableHDFS fuse-dfs please?

2011-03-08 Thread Brian Bockelman
Hi, Sounds like an issue with your Hadoop runtime environment. What does ldd /path/to/libhdfs.so say? What happens if you try one of the libhdfs test applications? Brian On Mar 8, 2011, at 12:50 PM, yxxtdc wrote: Hi, I am a new user to HDFS and am trying to following the instruction

Re: Anybody can help with MountableHDFS fuse-dfs please?

2011-03-08 Thread Brian Bockelman
On Mar 8, 2011, at 1:04 PM, yxxtdc wrote: Hi, xsn95:/ # ldd /root/build_src/hadoop-0.20.2/lib/libhdfs.so.0 linux-vdso.so.1 = (0x7413c000) libjvm.so = /root/build_src/hadoop-0.20.2/lib/libjvm.so (0x2adbd5442000) This is likely problematic. Why is libjvm.so

Re: Anybody can help with MountableHDFS fuse-dfs please?

2011-03-08 Thread Brian Bockelman
$JVM_LIB`:/usr/lib/ fi Brian On Mar 8, 2011, at 1:38 PM, yxxtdc wrote: I copied the libjvm.so from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so. I added /usr/java/jdk1.6.0_24/jre/lib/amd64/server to the LD_LIBRARY_PATH and it did not work so I made a manual copy to $HADOOP_HOME/lib

Re: Anybody can help with MountableHDFS fuse-dfs please?

2011-03-08 Thread Brian Bockelman
On Mar 8, 2011, at 3:41 PM, yxxtdc wrote: Thanks Brian. Got over that by fixing a typo in LD_LIBRARY_PATH and now fuse_dfs is mounting albeit very very very slowly, like one inode a minute. What do you mean by mounting? Do you mean listing? Anything on the order of a minute is out

Re: Hadoop Developer Question

2011-03-04 Thread Brian Bockelman
Try living in Nebraska... By the time the fun stuff gets here, it's COBOL. :) On Mar 4, 2011, at 7:59 AM, Habermaas, William wrote: How come all the Hadoop jobs are in the Bay area? Doesn't anybody use Hadoop in NY? -Original Message- From: Brady Banks

Re: HDFS file content restrictions

2011-03-04 Thread Brian Bockelman
, but the point is that MR will feed you whole records regardless of whether they are stored on one or two blocks. Brian On Mar 4, 2011, at 2:24 PM, Kelly Burkhart wrote: On Fri, Mar 4, 2011 at 1:42 PM, Harsh J qwertyman...@gmail.com wrote: HDFS does not operate with records in mind. So does

Re: EXT :Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Brian Bockelman
Hi, Check your kernel's overcommit settings. This will prevent the JVM from allocating memory even when there's free RAM. Brian On Mar 4, 2011, at 3:55 PM, Ratner, Alan S (IS) wrote: Aaron, Thanks for the rapid responses. * ulimit -u unlimited is in .bashrc

Re: Performance Test

2011-03-03 Thread Brian Bockelman
for. Brian On Mar 2, 2011, at 9:39 PM, Ted Dunning wrote: It will be very difficult to do. If you have n machines running 4 different things, you will probably get better results segregating tasks as much as possible. Interactions can be very subtle and can have major impact on performance

Re: Hadoop and image processing?

2011-03-03 Thread Brian Bockelman
be silly to not consider Hadoop. If you currently run a bag full of shell scripts and C++ code, it's a tougher decision to make. Brian

Re: Comparison between Gzip and LZO

2011-03-02 Thread Brian Bockelman
whatever you do for LZO and Gzip/Hadoop has a large startup overhead? Again, sounds like you'll be spending an hour or so with a profiler. Brian On Mar 2, 2011, at 2:16 PM, Niels Basjes wrote: Question: Are you 100% sure that nothing else was running on that system during the tests? No cron jobs

Re: start anyways with missing blocks

2011-01-21 Thread Brian Bockelman
Hi Mike, You want to take things out of safemode before you can make these changes. hadoop dfsadmin -safemode leave Then you can do the hadoop fsck / -delete Brian On Jan 21, 2011, at 2:12 PM, mike anderson wrote: Also, here's the output of dfsadmin -report. What seems weird is that it's
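The ordering here matters: while the NN is in safemode the namespace is read-only, so fsck's -delete cannot remove anything. The sequence from the message, as an operational sketch:

```shell
# 1. Leave safemode so the namespace becomes writable.
hadoop dfsadmin -safemode leave

# 2. Remove the files whose blocks are permanently missing.
hadoop fsck / -delete
```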

Re: Import data from mysql

2011-01-14 Thread Brian McSweeney
at my gmail address and I'd be very happy to help all I can. kind regards and best of luck with the book! Brian On Fri, Jan 14, 2011 at 6:02 AM, Mark Kerzner markkerz...@gmail.com wrote: Brian, I read with fascination your thread on MySQL and Hadoop. I enjoyed your polite answers to every

Re: Import data from mysql

2011-01-10 Thread Brian
or not using hadoop for this would make sense in order to parallelize the task if it gets too slow. Thanks again, Brian On 10 Jan 2011, at 13:21, Black, Michael (IS) michael.bla...@ngc.com wrote: I had no idea the kimono comment would be so applicable to your problem... Everything makes

Re: Import data from mysql

2011-01-10 Thread Brian McSweeney
Thanks Michael, As you say, I'll give your suggestion a try and see how it performs. thanks for all your help. I really appreciate it, Brian On Mon, Jan 10, 2011 at 8:46 PM, Black, Michael (IS) michael.bla...@ngc.com wrote: You need to stop looking at this as an all-or-nothing...and look

Re: Import data from mysql

2011-01-10 Thread Brian McSweeney
Thanks Ted, Good to know that hadoop can help. I'll look more into it also. really appreciate it. Brian On Mon, Jan 10, 2011 at 9:51 PM, Ted Dunning tdunn...@maprtech.com wrote: Yes. Hadoop can definitely help with this. On Mon, Jan 10, 2011 at 12:00 PM, Brian brian.mcswee...@gmail.com

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
thanks Sonal, I'll check it out On Sun, Jan 9, 2011 at 2:57 AM, Sonal Goyal sonalgoy...@gmail.com wrote: Hi Brian, You can check HIHO at https://github.com/sonalgoyal/hiho which can help you load data from any JDBC database to the Hadoop file system. If your table has a date or id field

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
that some of this functionality is perhaps now in the main api. I suppose any experience people have is welcome. I would want to run a batch job to export every day, perform my map reduce, and then import the results back into mysql afterwards. cheers, Brian On Sun, Jan 9, 2011 at 3:18 AM, Konstantin

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
of the values in the rows have to be multiplied together, some have to be compared, some have to have a function run against them etc. cheers, Brian On Sun, Jan 9, 2011 at 8:55 AM, Ted Dunning tdunn...@maprtech.com wrote: It is, of course, only quadratic, even if you compare all rows to all other rows

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
processing way. cheers, Brian On Sun, Jan 9, 2011 at 12:20 PM, Black, Michael (IS) michael.bla...@ngc.com wrote: What kind of compare do you have to do? You should be able to compute a checksum or such for each row when you insert them and only have to look at the subset that matches

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
and I hope I have opened up my kimono enough for you to get a sense of what I'm talking about :) thanks very much, Brian On Sun, Jan 9, 2011 at 1:51 PM, Black, Michael (IS) michael.bla...@ngc.comwrote: All you're doing is delaying the inevitable by going to hadoop. There's no magic to hadoop

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
Hi Ted, I agree about reducing the quadratic cost and hopefully my reply to Michael will show what my idea has been in this regard. I really appreciate the pointers on LSH and Mahout and I'll read up on it and see if it helps out. thanks very much for your help. cheers, Brian On Sun, Jan 9

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
Thanks Jeff, Great info and I really appreciate it. cheers, Brian On Mon, Jan 10, 2011 at 12:00 AM, Jeff Hammerbacher ham...@cloudera.comwrote: Hey Brian, One final point about Sqoop: it's a part of Cloudera's Distribution for Hadoop, so it's Apache 2.0 licensed and tightly integrated

Re: Import data from mysql

2011-01-09 Thread Brian McSweeney
Hi Arvind, thanks very much for that. Very good to know. Sounds like Sqoop is just what I'm looking for. cheers, Brian On Sun, Jan 9, 2011 at 9:37 PM, arv...@cloudera.com arv...@cloudera.comwrote: Hi Brian, Sqoop supports incremental imports that can be run against a live database system

Import data from mysql

2011-01-08 Thread Brian McSweeney
://architects.dzone.com/articles/tools-moving-sql-database any advice on what approach to use? cheers, Brian

Re: monit? daemontools? jsvc? something else?

2011-01-04 Thread Brian Bockelman
in and figure out what's wrong - or just keep that node dead. Brian On Jan 3, 2011, at 10:40 PM, Allen Wittenauer wrote: On Jan 3, 2011, at 2:22 AM, Otis Gospodnetic wrote: I see over on http://search-hadoop.com/?q=monit+daemontools that people *do* use tools like monit and daemontools

Re: Is it possible to specify different replication factors for different files?

2010-12-22 Thread Brian Bockelman
are using. Brian On Dec 22, 2010, at 2:40 AM, Zhenhua Guo wrote: I know there is a configuration parameter that can be used to specify number of replicas. I wonder whether I can specify different values for some files in my program by using HDFS APIs. Thanks Gerald

Re: Mounting HDFS as local file system

2010-12-02 Thread Brian Bockelman
, and I typically remount things after a month or two of *heavy* usage. Across all the nodes in our cluster, we probably do a few billion HDFS operations per day over FUSE. Brian

Re: Mounting HDFS as local file system

2010-12-02 Thread Brian Bockelman
On Dec 2, 2010, at 8:52 AM, Mark Kerzner wrote: Thank you, Brian. I found your paper Using Hadoop as grid storage, and it was very useful. One thing I did not understand in it is your file usage pattern - do you deal with small or large files, and do you delete them often enough? My

Re: Mounting HDFS as local file system

2010-12-02 Thread Brian Bockelman
On Dec 2, 2010, at 9:22 AM, Mark Kerzner wrote: Brian, that almost answers my question. Still, are you saying that the problem of Hadoop hates small files does not exist? Well, I'd say hates is too strong of a word. Several of the costs (NN memory, latency, efficiency) in HDFS
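The NN-memory cost Brian alludes to can be back-of-enveloped. The ~150 bytes per in-memory object is a community folklore figure, not a number from this thread, and the file count is hypothetical:

```shell
# Rough NN heap estimate: each file and each block is an in-memory object,
# commonly estimated at ~150 bytes apiece (folklore figure, not exact).
files=10000000        # ten million small files (hypothetical)
objects_per_file=2    # one inode + one block for a sub-block-size file
bytes_per_object=150

# Prints the estimated NN heap consumed, in MB.
echo $(( files * objects_per_file * bytes_per_object / 1024 / 1024 ))
```

The same ten million files packed into large archives would cost orders of magnitude fewer objects, which is why small files are "expensive" rather than "hated".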

Re: How does hadoop uses ssh

2010-11-16 Thread Brian Bockelman
To be clear, You only need to use SSH if you don't have any other way to start processes on your worker nodes. Lots of larger production sites have ways to manage this without SSH, but this really gets down to whatever the site prefers (and their security team allows). Brian On Nov 16, 2010

Re: Configure Ganglia with Hadoop

2010-11-08 Thread Brian Bockelman
Ganglia31Context. Brian On Nov 8, 2010, at 8:34 AM, Shuja Rehman wrote: Hi I have cluster of 4 machines and want to configure ganglia for monitoring purpose. I have read the wiki and add the following lines to hadoop-metrics.properties on each machine. dfs.class

Re: Running A Paralle Job On HDFS

2010-11-08 Thread Brian Bockelman
is very sensitive to latency, so HDFS is likely not ideal for your application. However, don't take my word for it, feel free to explore for yourself. Brian On Nov 8, 2010, at 6:29 AM, ranga_balim...@dell.com wrote: Hi, I've MPI-BLAST application to run on HDFS and evaluate Parallel I/O. Can

Re: is there any way to set the niceness of map/reduce jobs

2010-11-03 Thread Brian Bockelman
are the incorrect solution to separate the services if performance is the issue (they may be the solution if the issue is migration, future growth, complicated deployment, or security). You're just adding another layer that obfuscates what's happening on the hardware. Brian On Nov 3, 2010

Re: Large amount of corruption after balancer

2010-10-27 Thread Brian Bockelman
. However, let's say your cluster is corrupting data at a network level at a large scale. Then, why would you see it only with the balancer running? It's hard to see this as a plausible scenario, but, on the other hand, something happened. It's possible it's just an outright coincidence. Brian On Oct

Re: FUSE HDFS significantly slower

2010-10-26 Thread Brian Bockelman
are limited by the latency of spinning disk and random reads, we aren't particularly hurt by going only 60MB/s on our nodes. If we wanted to go faster, we'd use the native clients. Of course, if anyone wants to donate a lowly university 1.5PB of SSDs, I'm all ears :) Brian On Oct 26, 2010, at 12

Re: Namenode corruption: need help quickly please

2010-10-25 Thread Brian Bockelman
of the NN classes? Brian On Oct 25, 2010, at 6:12 PM, phil young wrote: Wow. I could use help quickly... My name node is reporting a null BV. All the data nodes report the same Build Version. We were not upgrading the DFS, but did stop, restart, after adding a jar to $HADOOP_HOME/lib. So, we

Re: Ganglia 3.1 on Hadoop 0.20.2 ...

2010-08-25 Thread Brian Bockelman
Hi Gautam, Yup - that's one possible way to configure Ganglia and is common at many sites. That's why I usually recommend the telnet trick to determine what IP address your configuration is using. Brian On Aug 25, 2010, at 5:53 AM, Gautam wrote: Brian, Works for me now.. one should

Re: Ganglia 3.1 on Hadoop 0.20.2 ...

2010-08-24 Thread Brian Bockelman
configuration, it is set up to listen on UDP and write on TCP of the same port. A third thing to test is to switch the hadoop-metrics back to the file output, and make sure something gets written to the log file. The issue might be upstream. Brian This is what most of my hadoop-metrics looks

Re: namenode crash in centos. can anybody recommend jdk ?

2010-08-16 Thread Brian Bockelman
for minutes while it spent an increasing amount of time in GC routines. Brian On Aug 16, 2010, at 4:49 AM, Steve Loughran wrote: On 13/08/10 22:24, Allen Wittenauer wrote: On Aug 13, 2010, at 11:41 AM, Jinsong Hu wrote: and run the namenode with the following jvm config -Xmx1000m -XX

Re: Changing hostnames of tasktracker/datanode nodes - any problems?

2010-08-10 Thread Brian Bockelman
. This is useful in sites like ours where we have 24/7 usage and try to avoid any unnecessary downtime. Brian On Aug 10, 2010, at 8:42 AM, Allen Wittenauer wrote: On Aug 10, 2010, at 3:51 AM, Erik Forsberg wrote: Hi! Due to network reconfigurations, I need to change the hostnames of some of my

Re: Best practices - Large Hadoop Cluster

2010-08-10 Thread Brian Bockelman
by your operating system or some other service management tool accepted by your organization (for example, SmartFrog from HP Labs goes above and beyond Linux's somewhat antiquated system). This statement does not change if X=Hadoop. Brian On Aug 10, 2010, at 1:13 PM, Gokulakannan M wrote: Hi

Re: Is it safe to set default/minimum replication to 2?

2010-07-21 Thread Brian Bockelman
sites that roughly follow the same rules. We haven't discovered any fatal software bugs that cause data loss since the various ones in 0.19 were ironed out. Brian On Jul 21, 2010, at 8:29 PM, Bobby Dennett wrote: The team that manages our Hadoop clusters is currently being pressured

Newbie question...is hadoop right for my app

2010-07-09 Thread Brian
? And is HBase a suitable storage mechanism for this type of data? I know it's a total newbie question so any help is greatly appreciated. Cheers, Brian Sent from my iPhone

Re: iozone is not working with HDFS over libfuse

2010-07-07 Thread Brian Bockelman
On Jul 7, 2010, at 2:56 AM, Christian Baun wrote: Hi Brian, I wanted to test HDFS against several distributed filesystems. Do you know any popular performance benchmarks that run with HDFS? I can't think of anything off the top of my head. Any ideas out there on the list? The issue

Re: iozone is not working with HDFS over libfuse

2010-07-06 Thread Brian Bockelman
of harddrives / network file systems / cluster file systems, you will find it doesn't capture well the performance aspects of a distributed file system. In other words, you are performance testing an apple with a test suite designed for oranges. Brian On Jul 6, 2010, at 1:14 PM, Christian

Re: Caching in HDFS C API Client

2010-06-14 Thread Brian Bockelman
outsmarts you (and I don't know about you, but it often outsmarts me...). Brian On Jun 14, 2010, at 9:35 AM, Owen O'Malley wrote: Indeed. On the terasort benchmark, I had to run intermediate jobs that were larger than ram on the cluster to ensure that the data was not coming from the file cache

Re: calling C programs from Hadoop

2010-05-30 Thread Brian Bockelman
Uh... So you want a batch system? Look up PBS (Torque/Maui), SGE, or Condor. Brian On May 29, 2010, at 8:17 PM, Michael Robinson wrote: Thanks for your answers. I have read hadoop streaming and I think it is great, however what I am trying to do is to run a C program that I have

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Brian Bockelman
Hey Pierre, These are not traditional filesystem blocks - if you save a file smaller than 64MB, you don't lose 64MB of file space. Hadoop will use 32KB to store a 32KB file (ok, plus a KB of metadata or so), not 64MB. Brian On May 18, 2010, at 7:06 AM, Pierre ANCELOT wrote: Hi, I'm

Re: Data node decommission doesn't seem to be working correctly

2010-05-18 Thread Brian Bockelman
dfsadmin -report to determine precisely which IP address your datanode is listening on. Brian On May 17, 2010, at 11:32 PM, Scott White wrote: I followed the steps mentioned here: http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission to decommission a data node. What I see from

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Brian Bockelman
reading the HDFS design document for background issues like this: http://hadoop.apache.org/common/docs/r0.20.0/hdfs_design.html Brian On Tue, May 18, 2010 at 2:34 PM, Brian Bockelman bbock...@cse.unl.eduwrote: Hey Pierre, These are not traditional filesystem blocks - if you save a file

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Brian Bockelman
heavily upon your implementation and hardware. Our HDFS routinely serves 5-10 Gbps. Brian On May 18, 2010, at 10:29 AM, Nyamul Hassan wrote: This is a very interesting thread to us, as we are thinking about deploying HDFS as a massive online storage for a on online university, and then serving

Re: Data node decommission doesn't seem to be working correctly

2010-05-18 Thread Brian Bockelman
. Brian On May 18, 2010, at 12:02 PM, Scott White wrote: Dfsadmin -report reports the hostname for that machine and not the ip. That machine happens to be the master node which is why I am trying to decommission the data node there since I only want the data node running on the slave nodes. Dfs

Re: NameNode deadlocked (help?)

2010-05-17 Thread Brian Bockelman
On May 17, 2010, at 5:25 AM, Steve Loughran wrote: Brian Bockelman wrote: On May 14, 2010, at 8:27 PM, Todd Lipcon wrote: Hey Brian, Yep, excessive GC definitely sounds like a likely culprit. I'm surprised you didn't see OOMEs in the log, though. We didn't until the third restart today

NameNode deadlocked (help?)

2010-05-14 Thread Brian Bockelman
restarted this cluster a few hours ago and made the following changes: 1) Increased the number of datanode handlers from 10 to 40. 2) Increased ipc.server.listen.queue.size from 128 to 256. If nothing else, I figure a deadlocked NN might be interesting to devs... Brian 2010-05-14 17:11:30 Full thread

Re: Try to mount HDFS

2010-04-23 Thread Brian Bockelman
the built-in utilities is that it will give you better terminal feedback. Alternately, I find myself mounting things in debug mode to see the Hadoop issues printed out to the terminal. Brian On Apr 23, 2010, at 8:30 AM, Christian Baun wrote: Brian, You got it!!! :-) It works (partly)! i

Re: Try to mount HDFS

2010-04-22 Thread Brian Bockelman
Hey Christian, I've run into this before. Make sure that the hostname/port you give to fuse is EXACTLY the same as listed in hadoop-site.xml. If these aren't the same text string (including the :8020), then you get those sort of issues. Brian On Apr 22, 2010, at 5:00 AM, Christian Baun

Re: Libhdfs - Client's retry behaviour

2010-04-21 Thread Brian Bockelman
an iptables firewall on it. Try to see if you can open the port manually (telnet server name-node-A 4600) from the client node and namenode to see if there's any difference. This will allow you to distinguish between two possible error cases. Brian

Re: Hadoop, C API, and fork

2010-04-06 Thread Brian Bockelman
everything. No idea if this would work. Brian On Apr 6, 2010, at 9:51 AM, Patrick Donnelly wrote: Hi, I have a distributed file server front end to Hadoop that uses the libhdfs C API to talk to Hadoop. Normally the file server will fork on a new client connection but this does not work

Re: Hadoop for accounting?

2010-03-23 Thread Brian Bockelman
what hardware or user we throw at it. Our scientists love it. However, there's a damn good reason that transactions were invented, especially for accounting/billing matters... Brian On Mar 23, 2010, at 11:30 AM, Allen Wittenauer wrote: On 3/23/10 4:04 AM, Marcos Medrado Rubinelli marc

Re: hadoop under cygwin issue

2010-03-17 Thread Brian Wolf
Alex Kozlov wrote: Hi Brian, Is your namenode running? Try 'hadoop fs -ls /'. Alex On Mar 12, 2010, at 5:20 PM, Brian Wolf brw...@gmail.com wrote: Hi Alex, I am back on this problem. Seems it works, but I have this issue with connecting to server. I can connect 'ssh localhost' ok

Re: hadoop under cygwin issue

2010-03-13 Thread Brian Wolf
Hi Alex, seems to: $ bin/hadoop fs -ls / Found 1 items drwxr-xr-x - brian supergroup 0 2010-03-13 10:45 /tmp However, I think this might be the source of the problems, whenever I invoke any of the scripts, I get always get these issues: localhost: /usr/bin/bash: /usr/local

Re: hadoop under cygwin issue

2010-03-12 Thread Brian Wolf
Hi Alex, I am back on this problem. Seems it works, but I have this issue with connecting to server. I can connect 'ssh localhost' ok. Thanks Brian $ bin/hadoop jar hadoop-*-examples.jar pi 2 2 Number of Maps = 2 Samples per Map = 2 10/03/12 17:16:17 INFO ipc.Client: Retrying connect

Re: bulk data transfer to HDFS remotely (e.g. via wan)

2010-03-02 Thread Brian Bockelman
was selected because it is common in our field, and we already have the certificate infrastructure well setup. GridFTP is fast too - many Gbps is not too hard. YMMV Brian On Mar 2, 2010, at 1:30 AM, jiang licht wrote: I am considering a basic task of loading data to hadoop cluster

This is not a DFS error starting secondarynamenode when using S3FileSystem

2010-03-02 Thread Brian Long
for that matter?) Thanks, Brian

Re: bulk data transfer to HDFS remotely (e.g. via wan)

2010-03-02 Thread Brian Bockelman
as a common protocol, (b) we have a long history with using GridFTP, and (c) we need to transfer many TB on a daily basis. Brian On Mar 2, 2010, at 12:10 PM, jiang licht wrote: Hi Brian, Thanks a lot for sharing your experience. Here I have some questions to bother you for more help :) So

Re: bulk data transfer to HDFS remotely (e.g. via wan)

2010-03-02 Thread Brian Bockelman
On Mar 2, 2010, at 3:51 PM, jiang licht wrote: Thanks, Brian. There is no certificate/grid infrastructure as of now yet for us. But I guess I can still use gridftp by noticing the following from its FAQ page: GridFTP can be run in a mode using standard SSH security credentials. It can

Re: basic hadoop job help

2010-02-18 Thread Brian Wolf
since i'm more or less in the same boat, this is the best I've seen, and the 2009 book is also very good: http://developer.yahoo.com/hadoop/ Brian On Thu, Feb 18, 2010 at 12:26 PM, Amogh Vasekar am...@yahoo-inc.com wrote: Hi, The hadoop meet last year has some very interesting business

Re: Inverse of a matrix using Map - Reduce

2010-02-04 Thread Brian Bockelman
Hey Abhishek, Why would you want to fully invert a matrix that large? How is it preconditioned? What is the condition number of the matrix? Why not just use ScaLAPACK? It's a hairy beast, but you should definitely consider it. Brian On Feb 3, 2010, at 9:57 PM, aa...@buffalo.edu wrote: Hi

Re: hadoop under cygwin issue

2010-02-03 Thread Brian Wolf
Alex Kozlov wrote: Live Nodes http://localhost:50070/dfshealth.jsp#LiveNodes : 0 You datanode is dead. Look at the logs in the $HADOOP_HOME/logs directory (or where your logs are) and check the errors. Alex K On Mon, Feb 1, 2010 at 1:59 PM, Brian Wolf brw...@gmail.com wrote

Re: aws

2010-02-02 Thread Brian Wolf
Now there's a deal! Thanks Sirota, Peter wrote: Hi Brian, AWS has Elastic MapReduce service where you can run Hadoop starting at 10 cents per hour. Check it out at http://aws.amazon.com/ank Disclaimer: I work at AWS Sent from my phone On Feb 2, 2010, at 11:09 PM, Brian Wolf brw

Re: hadoop under cygwin issue

2010-02-01 Thread Brian Wolf
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=brian,None,Administrators,Usersip=/127.0.0.1cmd=create src=/cygwin/tmp/hadoop-SYSTEM/mapred/system/job_201002011323_0001/job.jar dst=nullperm=brian:supergroup:rw-r--r-- 2010-02-01 13:26:30,045 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3

hadoop under cygwin issue

2010-01-30 Thread Brian Wolf
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext 2010-01-30 00:03:34,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=brian,None,Administrators,Users 2010-01-30 00:03

Re: Failed to install Hadoop on WinXP

2010-01-28 Thread brian
this interesting: Karmasphere Studio for Hadoop. http://www.hadoopstudio.org/ although I haven't fully tested it myself Brian
