Hi all,
We have a small cluster on EC2. It has been good so far.
- A c1.xlarge instance costs about $500 / month (reserved instances
cost about $150 / month on a 1-year term).
- EBS volume IO throughput is just OK, and I have seen fluctuations.
I'd like to see if there are any other providers that …
Nice presentation Andy!
Sean,
I am experimenting with a small cluster on EC2 right now. Here is my
experience.
1) It is a 5-node cluster (1 master + 4 slaves), all c1.xlarge instances.
2) I initially tried m1.large, but ran into some stability issues, so I
moved to c1.xlarge. The cluster is more s…
scenario:
- I am writing data into Hbase
- I am also kicking off a MR job that READS from the same table
When the MR job starts, data-inserts pretty much halt, as if the table
is 'locked out'.
Is this behavior to be expected?
my pseudo write code:
HBaseConfiguration hbaseConfig = new HBaseConfiguration();
…
> How many regions do you have and how many families per region? Looks
> like your datanodes have to keep a lot of xcievers opened.
>
> J-D
>
> On Tue, Apr 13, 2010 at 9:03 PM, Sujee Maniyam wrote:
>> Thanks Stack.
>> Do I also need to tweak timeouts? Right now they are …
13, 2010 at 11:37 AM, Sujee Maniyam wrote:
>> Hi all,
>>
>> I have been importing a bunch of data into my hbase cluster, and I see
>> the following error:
>>
>> Hbase error :
>> hdfs.DFSClient: Exception in createBlockOutputStream
>> java.io.IOException …
Hi all,
I have been importing a bunch of data into my hbase cluster, and I see
the following error:
Hbase error :
hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink A.B.C.D
Hadoop data node error:
DataXceiver : java.io.IOException: xc…
Hi All,
I have a tutorial on Hbase MapReduce here :
http://sujee.net/tech/articles/hbase-map-reduce-freq-counter/
It is rated PG-13 (i.e., for beginners) and uses the v0.20+ MapReduce APIs.
I'd appreciate any comments & feedback from this group.
thanks
Sujee
http://sujee.net
Wondering if I can use a tool like sysbench to get an _approximate
idea_ of the performance of various disk setups (RAID-0, RAID-1, ext4,
xfs, etc.) that would be used by HDFS.
(I do understand that the real performance of HDFS/HBase depends on the
final overall system and workload.)
For example, t…
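Not from the thread, but before reaching for sysbench, a plain `dd` run gives a rough sequential-write number for whichever volume you point it at (the path and sizes below are illustrative):

```shell
# Rough sequential-write throughput of the volume backing /tmp.
# conv=fdatasync forces data to disk so the page cache doesn't
# inflate the number; point 'of=' at the mount you want to measure.
dd if=/dev/zero of=/tmp/ddtest.bin bs=1M count=64 conv=fdatasync
rm -f /tmp/ddtest.bin
```

sysbench's fileio mode (prepare / run / cleanup with a random read-write test) would be closer to an HBase-style workload, but dd is a quick first cut per RAID/filesystem combination.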
check your ULIMIT config also:
http://wiki.apache.org/hadoop/Hbase/FAQ#A6
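To spell out what that FAQ entry covers: the two settings usually raised together are the per-user open-file limit and the datanode xciever cap. The values below are the commonly suggested ones of that era, not taken from this thread:

```shell
# Check the current open-file limit for the user running Hadoop/HBase:
ulimit -n

# /etc/security/limits.conf -- raise the per-user fd limit, e.g.:
#   hadoop  soft  nofile  32768
#   hadoop  hard  nofile  32768

# hdfs-site.xml -- raise the datanode xciever cap (note the historical
# misspelling in the property name):
#   <property>
#     <name>dfs.datanode.max.xcievers</name>
#     <value>2047</value>
#   </property>
```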
http://sujee.net
replying to myself here:
The exception I found was :
NativeException:
org.apache.hadoop.hbase.client.RegionOfflineException: region offline:
impressions_users,,1267133399076
Is there a way to 'force' disable/drop a table?
thanks
Sujee
http://sujee.net
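For reference, a sketch of the normal disable-then-drop sequence in the 0.20-era hbase shell; I am not aware of a documented 'force' flag, so if a region stays offline the usual workaround was to restart the master and retry:

```shell
# Inside `hbase shell` (0.20.x):
#   disable 'impressions_users'   # must complete before drop works
#   drop 'impressions_users'
# If disable hangs on an offline region, bounce the master and retry.
# Note: truncate 'tablename' is shorthand for disable + drop + recreate,
# so it hits the same wall when disable cannot finish.
```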
On Thu, Feb 25, 2010 at 4:04
Hbase version : 0.20.3, r902334
EC2 c1.xlarge, 5-machine cluster (1 + 4).
I have a couple of tables with 300M rows.
Truncate command hangs...
hbase shell> truncate 'tablename'
Truncating impressions_users; it may take a while
Disabling table...
< ^C at this point >
^CNativeException: java.io.IO
On Sat, Feb 20, 2010 at 11:23 AM, Andrew Purtell wrote:
> So use xfs... It's better than ext3 also.
> I don't use Ubuntu on the server so can't say for sure if the support
> is there. apt-get install xfsprogs and see if mkfs.xfs works?
>
Yes, xfs is supported in the kernel (cat /proc/filesystems …
10:24 AM, Andrew Purtell wrote:
> ext4 is the clear winner over ext3.
>
> xfs if ext4 is not available (RHEL, CentOS, etc.) This is what our EC2
> scripts use.
>
> Both ext4 and xfs use extents and do lazy/group allocation.
Wondering if there is a compelling reason to go one way or another for a
Hadoop/HBase cluster on EC2 EBS volumes.
host OS : Ubuntu 9.04 x64
thanks
Sujee
http://sujee.net
Hi All,
this is more of a logistical question...
setting up a small HBase cluster (5-10 nodes) on EC2. Wondering if I
should just set up Hadoop as the ROOT user or create another user account
(say, hadoop).
I do understand and follow the convention that the ROOT user is for admin
only, not for running pro…
> …ave in your table? Keeping a count in memory has its
> obvious problems, but if it's a small table then I guess it would work…
>
> How fast do you need to get this information? Maybe a map reduce job would
> be a better way of doing it?
>
> Cheers,
> Dan
>
>
Hi,
I have a table whose rowkey is composed of userid + timestamp. I need
to figure out the 'top-100' users.
One approach is running a scanner and keeping a hashmap of user-counts in memory.
Wondering if there is an HBase trick I could use?
thanks
Sujee
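The scanner-plus-hashmap approach can be paired with a bounded min-heap so only the top N entries, not every user, are kept for the final ranking. A stdlib-only sketch (the `userid|timestamp` rowkey format and the in-memory key list standing in for a table scan are assumptions):

```java
import java.util.*;

public class TopUsers {
    // Count rows per user and return the top-n users by row count.
    // rowKeys stands in for what an HBase table scan would yield.
    static List<String> topUsers(Iterable<String> rowKeys, int n) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String key : rowKeys) {
            String user = key.substring(0, key.indexOf('|'));
            Integer c = counts.get(user);
            counts.put(user, c == null ? 1 : c + 1);
        }
        // Min-heap capped at n: the smallest count sits at the head
        // and is evicted whenever the heap grows past n.
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<Map.Entry<String, Integer>>(
                n, new Comparator<Map.Entry<String, Integer>>() {
                    public int compare(Map.Entry<String, Integer> a,
                                       Map.Entry<String, Integer> b) {
                        return a.getValue() - b.getValue();
                    }
                });
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.add(e);
            if (heap.size() > n) heap.poll();
        }
        List<String> result = new ArrayList<String>();
        while (!heap.isEmpty()) result.add(0, heap.poll().getKey());
        return result;  // highest count first
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
            "alice|1001", "bob|1002", "alice|1003",
            "carol|1004", "alice|1005", "bob|1006");
        System.out.println(topUsers(keys, 2));  // [alice, bob]
    }
}
```

The `counts` map still holds one entry per distinct user, so as Dan's reply suggests, a MapReduce job doing the counting first is the better fit once the user set itself no longer fits in memory.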
…row data has to fit completely into memory?
3) I will want to iterate through all the cell values; wondering what
is the best way to do that?
4) If this is the limitation for 'wide tables', then I will redesign
the table to use composite keys (row = userid + timestamp).
thanks so much for your help.
Sujee Maniyam
--
http://sujee.net
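One detail worth noting on the userid + timestamp redesign: HBase sorts row keys lexicographically as bytes, so the timestamp part needs a fixed-width (zero-padded) encoding, and subtracting it from Long.MAX_VALUE makes the newest rows for a user sort first. A sketch (the `#` separator and padding width are my assumptions, not from the thread):

```java
public class CompositeKey {
    // Build a row key that groups rows by user and sorts newest-first
    // within a user. Zero-padding keeps lexicographic order equal to
    // numeric order; Long.MAX_VALUE - ts reverses the time ordering.
    static String rowKey(String userId, long timestampMillis) {
        long reversed = Long.MAX_VALUE - timestampMillis;
        return userId + "#" + String.format("%019d", reversed);
    }

    public static void main(String[] args) {
        String older = rowKey("user42", 1000L);
        String newer = rowKey("user42", 2000L);
        // The newer event sorts before the older one for the same user.
        System.out.println(newer.compareTo(older) < 0);  // true
    }
}
```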
…using short hostnames (crunch2, crunch3): do they
all resolve correctly? Or do you need to update /etc/hosts to resolve
them to an IP address on all machines?
regards
Sujee Maniyam
--
http://sujee.net
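A quick way to verify what a short name resolves to on each box (the crunch hostnames and addresses below are illustrative; getent consults /etc/hosts as well as DNS):

```shell
# Empty output means neither /etc/hosts nor DNS knows the name;
# run this for each short hostname on every node in the cluster.
getent hosts localhost

# If names need pinning, /etc/hosts entries look like:
#   10.0.0.2  crunch2
#   10.0.0.3  crunch3
```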
> …DEBUG? See
> http://wiki.apache.org/hadoop/Hbase/FAQ#A5
>
> Thx!
>
> J-D
>
> On Thu, Oct 29, 2009 at 3:23 PM, Sujee Maniyam wrote:
>> http://pastebin.com/f37d75e1d
>> This is what I previously sent out in the email as well.
>>
>> I don't have the logs a
…ould keep an eye out for?
thanks
Sujee
On Thu, Oct 29, 2009 at 11:01 AM, Jean-Daniel Cryans
wrote:
> Yes, anything there? Care to paste some lines in a pastebin?
>
> Thx,
>
> J-D
>
> On Thu, Oct 29, 2009 at 10:19 AM, Sujee Maniyam wrote:
>> Jean,
>> that would be the
Jean,
that would be the logs under the '@hadoop4' section; I believe that was
the region server holding the region at the time.
thanks
Sujee
On Thu, Oct 29, 2009 at 9:32 AM, Jean-Daniel Cryans wrote:
> 14 minutes seems way too much, anything relevant in the region server
> logs around the same time?
forgot to add that I am running
hbase v0.20.1
hadoop v0.20.1
pretty much at default settings...
On Thu, Oct 29, 2009 at 12:01 AM, Sujee Maniyam wrote:
> Hi all
>
> I have been running 'bin/hbase
> org.apache.hadoop.hbase.PerformanceEvaluation' script to get an idea …
Hi all
I have been running 'bin/hbase
org.apache.hadoop.hbase.PerformanceEvaluation' script to get an idea
of the cluster's performance. I see long PAUSES during writes.
Here is my setup:
- 5 nodes on EC2 (m1.large): 1 HBase master + 4 region servers
- Hadoop & HBase each get a 2G heap
- I don't see the JVM heap gettin…
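Long write pauses in this era were very often stop-the-world GC or a blocked memstore flush. The commonly suggested hbase-env.sh settings (the values here are the usual period recommendations, not from this thread) were the CMS collector plus an explicit heap, with GC logging turned on to see whether the pauses line up with collections:

```shell
# conf/hbase-env.sh -- typical 0.20-era suggestions:
#   export HBASE_HEAPSIZE=2000   # MB
#   export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
# GC logging, to correlate pauses with collections:
#   export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
#       -XX:+PrintGCTimeStamps -Xloggc:/tmp/hbase-gc.log"
```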
…ster can connect to hadoop-master (probably redundant);
then things started working
:-)
Sujee
On Thu, Oct 22, 2009 at 7:56 PM, Sujee Maniyam wrote:
> HI all,
> I just setup a 5 node (1 master + 4 datanodes) on EC2.
> Hadoop v0.20.1
> hbase v0.20
>
> I can go to the namenode s
Hi all,
I just set up a 5-node cluster (1 master + 4 datanodes) on EC2.
Hadoop v0.20.1
HBase v0.20
I can go to the namenode status page and see it has 4 live nodes. To
test that HDFS is working, I did the following:
bin/hadoop dfs -copyFromLocal conf input-conf5
I see the following error (at the …
I have the following JARs in the classpath of the Eclipse project:
hbase-0.20.0.jar
hadoop-0.20.0-plus4681-core.jar
commons-logging-1.0.4.jar
log4j-1.2.15.jar
zookeeper-r785019-hbase-1329.jar
regards
SM
> You can either create 2 tables. One can have the user as the key and the
> other can have the country as the key..
>
> Or.. you can create a single table with user+country as the key.
>
> Third way is to have only one table with user as the key. For the country
> query you can scan across the tab…
Hi all,
I am in the process of migrating a relational table to HBase.
The current table records user access logs:
id : PK
userId
url
timestamp
refer_url
ip_address
cc : country code of ip address
My potential queries would be:
- grab all pages visited by a user …
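Since HBase reads are either a get by key or a sorted range scan, the "all pages visited by a user" query falls out naturally if userid leads the row key: one user's rows form a contiguous key range. A stdlib sketch using a TreeMap to stand in for HBase's sorted table (the key format and URLs are illustrative assumptions):

```java
import java.util.*;

public class AccessLogScan {
    public static void main(String[] args) {
        // TreeMap stands in for an HBase table: keys kept in sorted order.
        // Row key: userid + '#' + timestamp (fixed-width for ordering).
        TreeMap<String, String> table = new TreeMap<String, String>();
        table.put("u1#0001", "http://a.example/");
        table.put("u1#0002", "http://b.example/");
        table.put("u2#0001", "http://c.example/");

        // "All pages visited by u1" is a range scan from "u1#" up to
        // "u1$" ('$' is the byte after '#', so it closes the prefix range).
        SortedMap<String, String> u1 = table.subMap("u1#", "u1$");
        System.out.println(u1.values());  // [http://a.example/, http://b.example/]
    }
}
```

The country-code query then needs either a second table keyed by cc or a full scan, which is exactly the trade-off the quoted reply describes.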
Hi all,
I am a newbie doing some research to put together a system to process
a large number of log records.
A) An HBase system with clients executing MR jobs on the data.
B) There may be some instances where we need to run ad-hoc queries on
the data; I am trying to see if this can be done without us…