hbase/hadoop cluster providers

2010-04-23 Thread Sujee Maniyam
Hi all, we have a small cluster on EC2. It has been good so far. - A c1.xlarge instance costs about $500/month (reserved instances cost about $150/month for a 1-yr term). - EBS volume I/O throughput is just OK, and I have seen fluctuations. I'd like to see if there are any other providers tha

Re: Several questions about running HBase on EC2

2010-04-21 Thread Sujee Maniyam
Nice presentation, Andy! Sean, I am experimenting with a small cluster on EC2 right now. Here is my experience. 1) It is a 5-node cluster (1 master + 4 slaves), all c1.xlarge instances. 2) I initially tried m1.large but ran into some stability issues, so I moved to c1.xlarge. The cluster is more s

can't run data-import and map-reduce-job on a Htable simultaneously

2010-04-14 Thread Sujee Maniyam
Scenario: - I am writing data into HBase - I am also kicking off an MR job that READS from the same table. When the MR job starts, data inserts pretty much halt, as if the table is 'locked out'. Is this behavior to be expected? My pseudo write code: HBaseConfiguration hbaseConfig = new HBaseConf
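A minimal sketch of what that write loop typically looks like, assuming the 0.20-era client API used elsewhere in these threads; the table, family, and qualifier names are placeholders, not taken from the original post. Turning off autoFlush batches the puts client-side, which is usually the first thing to check when an import crawls:

    // Sketch of a batched import loop (0.20-era HBase client API).
    // "mytable", "cf", "q" and the row values are placeholders.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ImportSketch {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration hbaseConfig = new HBaseConfiguration();
            HTable table = new HTable(hbaseConfig, "mytable");
            table.setAutoFlush(false);                  // buffer puts on the client
            table.setWriteBufferSize(8 * 1024 * 1024);  // flush roughly every 8 MB
            for (int i = 0; i < 1000000; i++) {
                Put put = new Put(Bytes.toBytes("row-" + i));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
                table.put(put);                         // buffered, not one RPC per call
            }
            table.flushCommits();                       // push any remaining buffered puts
        }
    }

For what it's worth, an MR scan does not take a table-level lock; on a small cluster the stall is more likely plain resource contention between the scanner and the inserts.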

Re: hitting xceiverCount limit (2047)

2010-04-13 Thread Sujee Maniyam
> How many regions do you have and how many families per region? Looks > like your datanodes have to keep a lot of xcievers opened. > > J-D > > On Tue, Apr 13, 2010 at 9:03 PM, Sujee Maniyam wrote: >> Thanks Stack. >> Do I also need to tweak timeouts? right now they ar

Re: hitting xceiverCount limit (2047)

2010-04-13 Thread Sujee Maniyam
13, 2010 at 11:37 AM, Sujee Maniyam wrote: >> Hi all, >> >> I have been importing a bunch of data into my hbase cluster, and I see >> the following error: >> >> Hbase error : >> hdfs.DFSClient: Exception in createBlockOutputStream >> java.io.IOExcep

hitting xceiverCount limit (2047)

2010-04-13 Thread Sujee Maniyam
Hi all, I have been importing a bunch of data into my HBase cluster, and I see the following error: HBase error: hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink A.B.C.D Hadoop datanode error: DataXceiver: java.io.IOException: xc

hbase map reduce tutorial ... seeking feedback

2010-04-10 Thread Sujee Maniyam
Hi All, I have a tutorial on Hbase MapReduce here : http://sujee.net/tech/articles/hbase-map-reduce-freq-counter/ It is rated PG-13 (i.e. for beginners). Uses v0.20+ mapreduce APIs. I'd appreciate any comments & feedback from this group. thanks Sujee http://sujee.net
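For readers who just want the shape of a 0.20-style table-input job without clicking through, here is a minimal sketch. It only illustrates the pattern (a map-only row counter), is not the tutorial's code, and the table name "access_logs" is a placeholder.

    // Sketch: scan an HBase table with the 0.20 mapreduce API and count rows via a job counter.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class RowCounterSketch {
        public static class RowCountMapper extends TableMapper<NullWritable, NullWritable> {
            public static enum Counters { ROWS }
            @Override
            protected void map(ImmutableBytesWritable rowKey, Result columns, Context context) {
                context.getCounter(Counters.ROWS).increment(1);  // one increment per scanned row
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new HBaseConfiguration();       // 0.20-era construction
            Job job = new Job(conf, "row counter sketch");
            job.setJarByClass(RowCounterSketch.class);
            Scan scan = new Scan();
            scan.setCaching(500);                                 // fetch rows in batches per RPC
            TableMapReduceUtil.initTableMapperJob("access_logs", scan, RowCountMapper.class,
                    NullWritable.class, NullWritable.class, job);
            job.setOutputFormatClass(NullOutputFormat.class);
            job.setNumReduceTasks(0);                             // map-only job
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }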

using sysbench to evaluate disk performance for hdfs

2010-04-08 Thread Sujee Maniyam
Wondering if I can use a tool like sysbench to get an _approximate idea_ of the performance of various disk setups (RAID-0, RAID-1, ext4, xfs ... etc.) that would be used by HDFS. (I do understand that the real performance of HDFS/HBase depends on the final overall system and workload.) For example, t

Re: hbase performance

2010-04-02 Thread Sujee Maniyam
check your ULIMIT config also: http://wiki.apache.org/hadoop/Hbase/FAQ#A6 http://sujee.net

Re: truncate shell command hanging

2010-02-26 Thread Sujee Maniyam
Replying to myself here: the exception I found was: NativeException: org.apache.hadoop.hbase.client.RegionOfflineException: region offline: impressions_users,,1267133399076 Is there a way to 'force' a disable/drop of a table? Thanks, Sujee http://sujee.net On Thu, Feb 25, 2010 at 4:04
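There is no "force" flag in the client API as far as I know; for reference, the normal disable/drop sequence goes through HBaseAdmin and looks like the sketch below. It will block on the same offline region that the shell does, so this is a reference point rather than a fix. The table name is taken from the error above; everything else is an assumption.

    // Sketch of the normal disable/drop path via HBaseAdmin (0.20-era API).
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class DropTableSketch {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
            String table = "impressions_users";
            if (admin.tableExists(table)) {
                if (admin.isTableEnabled(table)) {
                    admin.disableTable(table);   // blocks until all regions go offline
                }
                admin.deleteTable(table);        // removes the table and its regions from META
            }
        }
    }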

truncate shell command hanging

2010-02-25 Thread Sujee Maniyam
HBase version: 0.20.3, r902334. EC2 c1.xlarge, 5-machine cluster (1 + 4). I have a couple of tables with 300M rows. The truncate command hangs... hbase shell> truncate 'tablename' Truncating impressions_users; it may take a while Disabling table... < ^C at this point > ^CNativeException: java.io.IO

Re: ext3 or ext4 filesystem for Hadoop/Hbase?

2010-02-20 Thread Sujee Maniyam
On Sat, Feb 20, 2010 at 11:23 AM, Andrew Purtell wrote: > So use xfs... It's better than ext3 also. > I don't use Ubuntu on the server so can't say for sure if the support > is there. apt-get install xfsprogs and see if mkfs.xfs works? > Yes, xfs is supported in the kernel (cat /proc/filesystems

Re: ext3 or ext4 filesystem for Hadoop/Hbase?

2010-02-20 Thread Sujee Maniyam
10:24 AM, Andrew Purtell wrote: > ext4 is the clear winner over ext3. > > xfs if ext4 is not available (RHEL, CentOS, etc.) This is what our EC2 > scripts use. > > Both ext4 and xfs use extents and do lazy/group allocation. > > > > - Original Message >

ext3 or ext4 filesystem for Hadoop/Hbase?

2010-02-19 Thread Sujee Maniyam
Wondering if there is a compelling reason to go one way or another for a Hadoop/HBase cluster on an EC2 EBS volume. Host OS: Ubuntu 9.04 x64. Thanks, Sujee http://sujee.net

EC2 cluster setup, user account question...

2010-02-19 Thread Sujee Maniyam
Hi all, this is more of a logistical question... setting up a small HBase cluster (5-10 nodes) on EC2. Wondering if I should just set up Hadoop as the ROOT user or create another user account (say, hadoop). I do understand and follow the convention that the ROOT user is for admin only and not for running pro

Re: how to calculate top-xxx rowkeys

2010-02-14 Thread Sujee Maniyam
ave in your table?  Keeping a count in memory has it's > obvious problems but if it's a small table then I guess it would work... > > How fast do you need to get this information?  Maybe a map reduce job would > be a better way of doing it? > > Cheers, > Dan > >

how to calculate top-xxx rowkeys

2010-02-14 Thread Sujee Maniyam
Hi, I have a table whose rowkey is composed of userid + timestamp. I need to figure out the 'top-100' users. One approach is running a scanner and keeping a hashmap of user counts in memory. Wondering if there is an HBase trick I could use? Thanks, Sujee
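Absent a server-side trick, the scan-and-count approach looks roughly like the sketch below. It assumes "userid|timestamp" keys with a '|' delimiter and a user set small enough to hold in a HashMap; the table name and delimiter are assumptions, not from the post.

    // Sketch: count rows per user by scanning "userid|timestamp" keys, then sort for the top 100.
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TopUsersSketch {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(new HBaseConfiguration(), "access_logs");
            Scan scan = new Scan();
            scan.setCaching(1000);                       // fewer RPC round trips per next()
            Map<String, Long> counts = new HashMap<String, Long>();
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    String userId = Bytes.toString(r.getRow()).split("\\|")[0];
                    Long c = counts.get(userId);
                    counts.put(userId, c == null ? 1L : c + 1L);
                }
            } finally {
                scanner.close();
            }
            List<Map.Entry<String, Long>> sorted =
                new ArrayList<Map.Entry<String, Long>>(counts.entrySet());
            Collections.sort(sorted, new Comparator<Map.Entry<String, Long>>() {
                public int compare(Map.Entry<String, Long> a, Map.Entry<String, Long> b) {
                    return b.getValue().compareTo(a.getValue());   // descending by count
                }
            });
            for (Map.Entry<String, Long> e : sorted.subList(0, Math.min(100, sorted.size()))) {
                System.out.println(e.getKey() + "\t" + e.getValue());
            }
        }
    }

A MapReduce job over the same table, as Dan suggests in the reply above, scales better once the table stops fitting a single scanner pass.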

reading a row with lots of cells (wide-table) causing out-of-memory error

2009-12-01 Thread Sujee Maniyam
row data has to completely fit into memory? 3) I will want to iterate through all the cell values; wondering what is the best way to do that? 4) If this is the limitation for 'wide tables', then I will redesign the table to use composite keys (row = userid + timestamp). Thanks so much for your help. Sujee Maniyam -- http://sujee.net
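One way to iterate the cells of a very wide row without pulling the whole row into a single Result is intra-row batching on a scan; Scan.setBatch is the knob I have in mind in the sketch below, and the table, family, and row key are placeholders, not from the post.

    // Sketch: read one very wide row in fixed-size chunks of cells instead of one giant Result.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WideRowSketch {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(new HBaseConfiguration(), "user_events");
            byte[] row = Bytes.toBytes("some-user-id");
            Scan scan = new Scan(row, Bytes.add(row, new byte[] { 0 }));  // covers exactly this row
            scan.addFamily(Bytes.toBytes("events"));
            scan.setBatch(1000);                       // at most 1000 cells per returned Result
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result chunk : scanner) {         // the same row key appears across chunks
                    for (KeyValue kv : chunk.raw()) {
                        // process kv.getQualifier() / kv.getValue() here
                    }
                }
            } finally {
                scanner.close();
            }
        }
    }

If even one chunk is too big, the composite-key redesign from point 4 is probably the cleaner fix.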

Re: HBase 0.20.1 Distributed Install Problems

2009-11-09 Thread Sujee Maniyam
using short hostnames (crunch2, crunch3), do they all resolve correctly? If not, you may need to update /etc/hosts to map them to an IP address on all machines. Regards, Sujee Maniyam -- http://sujee.net

Re: long pauses during hbase.PerformanceEvaluation tests

2009-10-30 Thread Sujee Maniyam
EBUG? See > http://wiki.apache.org/hadoop/Hbase/FAQ#A5 > > Thx! > > J-D > > On Thu, Oct 29, 2009 at 3:23 PM, Sujee Maniyam wrote: >> http://pastebin.com/f37d75e1d >> This is what I previously sent out in the email as well. >> >> I don't have the logs a

Re: long pauses during hbase.PerformanceEvaluation tests

2009-10-29 Thread Sujee Maniyam
uld keep an eye out for? thanks Sujee On Thu, Oct 29, 2009 at 11:01 AM, Jean-Daniel Cryans wrote: > Yes, anything there? Care to paste some lines in a pastebin? > > Thx, > > J-D > > On Thu, Oct 29, 2009 at 10:19 AM, Sujee Maniyam wrote: >> Jean, >> that would be the

Re: long pauses during hbase.PerformanceEvaluation tests

2009-10-29 Thread Sujee Maniyam
Jean, that would be the logs under the '@hadoop4' section - I believe that was the region server hosting the region at the time. Thanks, Sujee On Thu, Oct 29, 2009 at 9:32 AM, Jean-Daniel Cryans wrote: > 14 minutes seems way too much, anything relevant in the region server > logs around the same time?

Re: long pauses during hbase.PerformanceEvaluation tests

2009-10-29 Thread Sujee Maniyam
Forgot to add that I am running HBase v0.20.1 and Hadoop v0.20.1, pretty much at default settings... On Thu, Oct 29, 2009 at 12:01 AM, Sujee Maniyam wrote: > Hi all > > I have been running 'bin/hbase > org.apache.hadoop.hbase.PerformanceEvaluation ' script to get an ide

long pauses during hbase.PerformanceEvaluation tests

2009-10-29 Thread Sujee Maniyam
Hi all, I have been running the 'bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation' script to get an idea of the cluster. I see long PAUSES during writes. Here is my setup: - 5 nodes on EC2 (m1.large): 1 HBase master + 4 region servers - Hadoop & HBase get 2G heap - I don't see any JVM heap gettin

Re: hadoop timeouts on EC2

2009-10-22 Thread Sujee Maniyam
ster can connect to hadoop-master (probably redundant) then things started working :-) Sujee On Thu, Oct 22, 2009 at 7:56 PM, Sujee Maniyam wrote: > HI all, > I just setup a 5 node (1 master + 4 datanodes) on EC2. > Hadoop v0.20.1 > hbase v0.20 > > I can go to the namenode s

hadoop timeouts on EC2

2009-10-22 Thread Sujee Maniyam
Hi all, I just set up a 5-node cluster (1 master + 4 datanodes) on EC2. Hadoop v0.20.1, HBase v0.20. I can go to the namenode status page and see it has 4 live nodes. To test that HDFS is working, I did the following: bin/hadoop dfs -copyFromLocal conf input-conf5 I see the following error (at the

Re: how to use hbase with eclipse?

2009-10-19 Thread Sujee Maniyam
I have the following JARs in the classpath of the Eclipse project: hbase-0.20.0.jar hadoop-0.20.0-plus4681-core.jar commons-logging-1.0.4.jar log4j-1.2.15.jar zookeeper-r785019-hbase-1329.jar regards SM
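With those jars (plus an hbase-site.xml that points at the cluster) on the project's build path, a tiny client is enough to confirm the wiring. A sketch, assuming a reachable cluster; nothing here is from the original reply:

    // Minimal classpath sanity check for an Eclipse project using the 0.20-era jars listed above.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class ClasspathCheck {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
            System.out.println("connected, tables: " + admin.listTables().length);
        }
    }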

Re: table design suggestions...

2009-09-29 Thread Sujee Maniyam
> You can either create 2 tables. One can have the user as the key and the > other can have the country as the key.. > > Or.. you can create a single table with user+country as the key. > > Third way is to have only one table with user as the key. For the country > query you can scan across the tab

table design suggestions...

2009-09-29 Thread Sujee Maniyam
Hi all, I am in the process of migrating a relational table to HBase. The current table records user access logs: id (PK), userId, url, timestamp, refer_url, ip_address, cc (country code of the IP address). My potential queries would be: - grab all pages visited by a user
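If the single-table route from the replies is taken, the composite-key layout boils down to the sketch below: key = userId + delimiter + zero-padded timestamp, so "all pages visited by a user" becomes a contiguous prefix scan. Table, family, and qualifier names are placeholders, and the delimiter choice assumes user ids never contain '|' or higher bytes.

    // Sketch of the composite-rowkey idea: one row per access-log record,
    // keyed by userId + "|" + zero-padded timestamp.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AccessLogSketch {
        static byte[] rowKey(String userId, long timestamp) {
            // zero-pad the timestamp so keys sort chronologically within a user
            return Bytes.toBytes(userId + "|" + String.format("%013d", timestamp));
        }

        static void record(HTable table, String userId, long ts, String url, String cc) throws Exception {
            Put put = new Put(rowKey(userId, ts));
            put.add(Bytes.toBytes("log"), Bytes.toBytes("url"), Bytes.toBytes(url));
            put.add(Bytes.toBytes("log"), Bytes.toBytes("cc"), Bytes.toBytes(cc));
            table.put(put);
        }

        static void pagesForUser(HTable table, String userId) throws Exception {
            // prefix scan: [userId|, userId~) covers every timestamp for this user
            Scan scan = new Scan(Bytes.toBytes(userId + "|"), Bytes.toBytes(userId + "~"));
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()) + " -> "
                            + Bytes.toString(r.getValue(Bytes.toBytes("log"), Bytes.toBytes("url"))));
                }
            } finally {
                scanner.close();
            }
        }

        public static void main(String[] args) throws Exception {
            HTable table = new HTable(new HBaseConfiguration(), "access_logs");
            record(table, "user42", System.currentTimeMillis(), "/index.html", "US");
            table.flushCommits();
            pagesForUser(table, "user42");
        }
    }

The country-code query would still need either a second table keyed by country or a full-scan MR job, as the reply above notes.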

can Hive & HBase co-exist on the same cluster?

2009-09-17 Thread Sujee Maniyam
Hi all, I am a newbie doing some research to put together a system to process a large number of log records. A) An HBase system with clients executing MR jobs on the data. B) There may be some instances where we need to run ad-hoc queries on the data. I am trying to see if this can be done without us