Re: storefileIndexSize

2009-09-30 Thread Billy Pearson
…, entries=48, length=6375, fileinfoOffset=5981, dataIndexOffset=6229, dataIndexCount=1, metaIndexOffset=0, metaIndexCount=0, totalBytes=5981, entryCount=48, version=1. On Sat, Sep 19, 2009 at 2:03 PM, Billy Pearson wrote: My main problem starts to come into play: I would like to store a lot of data…

Re: storefileIndexSize

2009-09-19 Thread Billy Pearson
My main problem starts to come into play: I would like to store a lot of data in HBase for its auto TTL / compaction features. The only operations I need for this table are insert/delete/scan; no need for random access, so if I could keep the index from getting too big, or keep it out of memory…

storefileIndexSize

2009-09-19 Thread Billy Pearson
I have been trying to reduce my storefileIndexSize. Default setting hfile.min.blocksize.size = 65536: storefileIndexSize = 323. hfile.min.blocksize.size = 67108864: storefileIndexSize = 322. I changed hfile.min.blocksize.size three times before setting it to 64MB, and always restarted and ran major_compact…

Re: Hbase Map-reduce Scheduler

2009-08-26 Thread Billy Pearson
I think you are looking for this JIRA for reduce assignment: https://issues.apache.org/jira/browse/HBASE-1199. Regions are assigned in the map to be run on the server hosting the region, but if there is an idle server it will pull the next map task. Billy. "bharath vissapragada" wrote in mess…

Re: HBase schema for crawling

2009-07-04 Thread Billy Pearson
I have stored the spider time in a column like stime: to keep from having to fetch the page's content in the map of the row just for the timestamp; then just scan over that one column to get the last spider time, etc. In my setup I did not spider from the map/reduce job; I built a spider list, then ran…

split test?

2009-06-14 Thread Billy Pearson
Do we have a split test that tests what the mid key is vs. what's…? I have some regions that are 1KB in size and others that are 1.3GB, saying the end key and mid key are the same, and they should not be; based on my import, data should be somewhat small per key, maybe a few KB per key. So I…

Re: HBase Write to Regionservers behavior

2009-06-11 Thread Billy Pearson
Once the table has split more, you might look into using org.apache.hadoop.hbase.mapred.HRegionPartitioner. It will split up the data and run only one reduce per region, so all of that region's rows will be sent to just one reducer, but it does not help much when the table is small and you have…

Re: Changing output class type

2009-06-11 Thread Billy Pearson
Have you tried something like this: public static class MapClass extends MapReduceBase implements TableMap<…, IntWritable> { }. As for the output of the maps, that "31 2e 66 69 72..." is the output from the org.apache.hadoop.hbase.io.ImmutableBytesWritable toString() method. If you want to…

Re: Help with Map/Reduce program

2009-06-11 Thread Billy Pearson
That might be a good idea, but you might be able to redesign your layout of the table using a different key than the current one; worth brainstorming. Billy. "llpind" wrote in message news:23975432.p...@talk.nabble.com... Sorry, I forgot to mention: the overflow then overflows into new row keys…

Re: scanner on a given column: whole table scan or just the rows that have values

2009-06-10 Thread Billy Pearson
…still have to scan EVERY row in that family no matter whether each cell on that column-label has a value or not? -Ric. On Wed, Jun 10, 2009 at 1:03 AM, Billy Pearson wrote: > It will not scan every row if there is more than one column family, only the > rows that have data for that column.

Re: Help with Map/Reduce program

2009-06-10 Thread Billy Pearson
Yes, that's what scanners are good for: they will return all the column:label combos for a row. What do the MR job stats say for rows processed for the maps and reduces? Billy Pearson. "llpind" wrote in message news:23967196.p...@talk.nabble.com... also, I think what we w…

Re: for one specific row: are the values of all columns of one family stored in one physical/grid node?

2009-06-10 Thread Billy Pearson
…because they are in two different families. Is that correct? Thanks in advance for your clarification. -Ric. On Tue, Jun 9, 2009 at 8:35 PM, Billy Pearson wrote: You should read over http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture. The data is sorted by row key, then column:label, timest…

Re: HBase Failing on Large Loads

2009-06-09 Thread Billy Pearson
I think most of your problems are coming from running too many map/reduce tasks at the same time with so little memory; with swapping, the regionservers/datanodes/tasktrackers do not have time to check in to tell their masters that they are still alive, and stuff starts failing. I would try 2 maps, 2 red…

Re: scanner on a given column: whole table scan or just the rows that have values

2009-06-09 Thread Billy Pearson
It will not scan every row if there is more than one column family, only the rows that have data for that column. You do have parallelism when scanning large tables: the MR job should be splitting the job into one mapper per region if set up correctly. New patches in dev set for 0.20 will a…
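The one-mapper-per-region splitting described above can be sketched roughly as follows (a hypothetical Python illustration of the idea, not the era's Java TableInputFormat; all names here are made up):

```python
# Hypothetical sketch: splitting a full-table scan into one input split
# per region. Region boundaries are the sorted region start keys; an
# empty string marks the open start of the first region and the open
# end of the last one, matching HBase's convention.

def region_splits(start_keys):
    """Turn sorted region start keys into (start, end) scan ranges,
    one per region, so each map task scans exactly one region."""
    splits = []
    for i, start in enumerate(start_keys):
        end = start_keys[i + 1] if i + 1 < len(start_keys) else ""
        splits.append((start, end))
    return splits

regions = ["", "key200", "key400"]  # a table with three regions
print(region_splits(regions))
```

Each (start, end) pair would become one map task's scan range, which is where the per-region parallelism comes from.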

Re: Help with Map/Reduce program

2009-06-09 Thread Billy Pearson
…I'm positive my table has that column family, but my output table still has nothing in it. I'm looking at the source code for rowcounter, and it doesn't even require :. Does it need to be passed in? I may be going about this wrong; I'm open to ideas. I need a way to iterate…

Re: for one specific row: are the values of all columns of one family stored in one physical/grid node?

2009-06-09 Thread Billy Pearson
You should read over http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture. The data is sorted by row key, then column:label, then timestamp, in that order; so if you have row key1, all the labels for columnval1 will be stored together in the same file. We do flush more than one file to disk as data is…

Re: Help with Map/Reduce program

2009-06-09 Thread Billy Pearson
Try without the * in the column: instead of "colFam1:*", try "colFam1:". I do not think the * works as an "all" option; just leave it blank, colFam1:, and it will give all results. Billy. "llpind" wrote in message news:23952252.p...@talk.nabble.com... Hi again, I need some help with a map/reduce program…

Re: Help needed - Adding HBase to architecture

2009-06-07 Thread Billy Pearson
If I were going to use an RDBMS to store the metadata, then I would just use Hadoop HDFS to store the images/video. I know that Hadoop has a Thrift API now: http://wiki.apache.org/hadoop/HDFS-APIs. HBase would be better suited to store the metadata than the images. The biggest benefit to HBas…

Re: Again, HBase Data Lost!!

2009-06-07 Thread Billy Pearson
If the region servers are dying, then their logs are more likely to be helpful than the master's. Billy. "Yabo-Arber Xu" wrote in message news:382e1efc0906072003i9ea9733h5ab51ce12a37...@mail.gmail.com... Hi J-D, thanks for your reply. We have a 10-node cluster installed with HBase/Hadoop 0.…

Re: Frequent changing rowkey - HBase insert

2009-06-06 Thread Billy Pearson
I agree; it seems in 0.20, which I am testing, my clients are the bottleneck now, not the db. I went from seeing 30K on 8 nodes to around 115K when I could keep all the clients writing at the same time for a few secs. Billy. "Ryan Rawson" wrote in message news:78568af10906061819g5949eae8ye0f30653540…

Re: blockcache always on?

2009-06-05 Thread Billy Pearson
…as it will cause problems if released this way. Just trying to find all the bugs I can before we release 0.20. Billy. "stack" wrote in message news:7c962aed0906051240v2c77df1ub0beccbe16059...@mail.gmail.com... On Fri, Jun 5, 2009 at 6:19 AM, Billy Pearson wrote: My question is sho…

Re: blockcache always on?

2009-06-05 Thread Billy Pearson
…to avoid block cache on very large blocks and otherwise let the LRU do its job. On Jun 4, 2009 11:25 PM, "Billy Pearson" wrote: I created a table with one column in my cluster and started putting data into it. I noticed that even with blockcache = false it still uses the block cach…

blockcache always on?

2009-06-04 Thread Billy Pearson
I created a table with one column in my cluster and started putting data into it. I noticed that even with blockcache = false it still uses the block cache. Is this a known problem, or are there plans to remove the option from the table create/alter commands? Billy

Re: HBase v0.19.3 with Hadoop v0.19.1?

2009-06-04 Thread Billy Pearson
HBase can be used on the same Hadoop release as its version, so HBase 0.19.x with Hadoop 0.19.x, HBase 0.20.x with Hadoop 0.20.x. I think production can be considered safe with version 1.0; we are not there yet. With that said, there are many using HBase in production; here are a few: http://wiki.ap…

Re: Question regarding MR for Hbase

2009-06-04 Thread Billy Pearson
Take a look also at TableMapReduceUtil; it's in the API docs for 0.19 and 0.20. Billy. "Vijay" wrote in message news:9b40bc2a0906041247j7d25f5a4y61351200ae2bb...@mail.gmail.com... Hello everyone, I wanted to write an MR for an HBase table; there are a million records and I wanted to write a map-red…

failed compaction clean up?

2009-06-03 Thread Billy Pearson
I found an issue, HBASE-1410, on my cluster after I upgraded and started to add data. Question: what is in charge of cleaning up a failed compaction? The file is left and never deleted, even after a restart. Should I open an issue about this, or does the master scan the compaction folders and clean u…

Re: trunk giveing my a new error I never seen before

2009-06-03 Thread Billy Pearson
Could be; I was testing to make sure not > 64K. Did the max row size change in trunk? Billy. "stack" wrote in message news:7c962aed0906030042s13ab89e6n6a4f48be2d92d...@mail.gmail.com... Is your row > 32k, Billy? St.Ack. On Tue, Jun 2, 2009 at 10:39 PM, Billy Pearson wrote:

trunk giveing my a new error I never seen before

2009-06-02 Thread Billy Pearson
This is my first attempt at loading data into trunk for testing, and now I get this and cannot figure out what's wrong. Some data is added to trunk. Tasktracker error: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.0.1.6:60020 for region master-i…

scaling and compactions

2009-06-02 Thread Billy Pearson
Would it be a good idea to be thinking about adding an option to run more than one compaction thread at a time, maybe configurable? Just thinking: once we get to being able to store TBs of data per node, we should be able to keep up on compactions. Billy

Re: State of HA

2009-06-01 Thread Billy Pearson
0.20.0 / trunk is the first release of HBase that will work with ZooKeeper, so it's likely not going to have the HA stuff in there; just getting it to work with ZooKeeper will likely be what's targeted in this release. I do know there is a whole lot of rework on HBase in this release that should make big impro…

Re: Urgent: HBase Data Lost

2009-05-30 Thread Billy Pearson
…St.Ack. On Sat, May 30, 2009 at 2:40 PM, Billy Pearson wrote: I had this happen to me. I created a region, then ran some import jobs; all the clients got stuck on "could not complete file" for the hlog, and I had to kill -9 them all and reload the data. The table was missing when I did that because the meta di…

Re: Urgent: HBase Data Lost

2009-05-30 Thread Billy Pearson
I had this happen to me. I created a region, then ran some import jobs; all the clients got stuck on "could not complete file" for the hlog, and I had to kill -9 them all and reload the data. The table was missing when I did that because the meta did not flush. Could we not add a manual force flush to meta…

Re: basic help required

2009-05-29 Thread Billy Pearson
There are also some good links here, http://wiki.apache.org/hadoop/Hbase, on how to connect to and query HBase from other languages; look about the middle: Thrift, REST, Ruby, HBase Shell, etc. - Original Message - From: Newsgroups: gmane.comp.java.hadoop.hbase.user To: Sent: Friday,…

Re: Decommission of nodes

2009-05-28 Thread Billy Pearson
…gmail.com... Hi, how does HBase react to Hadoop decommissioning nodes? Should I shut down the regionservers first, before decommissioning, or does it matter at all? Is it possible that a running HBase prevents decommissioning from finishing? Matthias. On Wed, May 27, 2009 at 7:44 PM, Billy Pea…

Re: Decommission of nodes

2009-05-27 Thread Billy Pearson
The decommissioning process in Hadoop takes a little while; I think the balancing bandwidth has a lot to do with it. But to stop a region server, you should be able to run bin/hbase-daemon.sh stop regionserver on the server you want to stop, as long as it's not the master; then you can d…

Re: Setting up another machine as secondary node

2009-05-19 Thread Billy Pearson
node started seperately on two machines?? > > On Fri, May 15, 2009 at 9:39 AM, jason hadoop > > >wrote: > > > I agree with billy. conf/masters is misleading as the place for secondary > > namenodes. > > > > On Thu, May 14, 2009 at 8:38 PM, Billy Pears

Re: Setting up another machine as secondary node

2009-05-14 Thread Billy Pearson
I think the secondary namenode is set in the masters file in the conf folder; misleading. Billy. "Rakhi Khatwani" wrote in message news:384813770905140603g4d552834gcef2db3028a00...@mail.gmail.com... Hi, I wanna set up a cluster of 5 nodes in such a way that node1 - master, node2 - secondar…

Re: HBase internal data structure ??

2009-05-07 Thread Billy Pearson
HBase data is only on HDFS and in memory; nothing is stored or processed on local disk unless it's in HDFS. The current 0.19 data format is Hadoop MapFile; you can look it up in their API docs. 0.20 will have a new data file format, not MapFile any longer; I think it's called HFile in HBase trunk. Bil…

Re: Using Yahoo Pig to to do adhoc querying on HBase

2009-04-21 Thread Billy Pearson
There are multiple ways to query HBase: there's the HBase shell, Thrift, REST, the Java API, and I think a few more. The easiest without having to write code or anything would be the HBase shell, if you just want to check manually whether something is there or the value of it. Billy. "Ninad Raut" wrote in message new…

Re: Replicating data into HBase

2009-04-17 Thread Billy Pearson
If your data is not too complex, with multiple fields etc., you could try to use MySQL binlogs: just use mysqlbinlog, http://dev.mysql.com/doc/refman/5.0/en/mysqlbinlog.html, to process the binlogs and generate a text version of the logs, then process them with a map and then reduce into the table. This woul…

Re: Basic questions

2009-04-16 Thread Billy Pearson
HBase and Hadoop both have a single node (namenode & master) that tells the clients where stuff is. That's all they do: keep up with where stuff is. They do not handle the getting of the data; the client API will go to the node with the data they tell it to, find the data, and get the data. As for…

Skipping Bad Records

2009-04-13 Thread Billy Pearson
I am running a job that pulls data from HBase, but I am getting heap errors on some of the records because they're too large to fit in the heap of the task. I enabled, I thought, the skip option in the site conf file, and I also added these options to my job conf: conf.setMaxMapAttempts(10); Ski…

Re: shell 'table_att'

2009-04-12 Thread Billy Pearson
…Billy, these are currently the only attributes that you can set on the table level in the shell. J-D. On Sat, Apr 11, 2009 at 5:37 PM, Billy Pearson wrote: Can we get someone to post all the correct options for 'table_att' in the shell, in the wiki FAQ or somewhere? I know there is t…

Re: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

2009-04-11 Thread Billy Pearson
…is an HDFS problem. Maybe I am not understanding what you are saying? So you have not increased the number of xceivers in the datanode configs? Are there any messages of interest in the datanode logs? - Andy. From: Billy Pearson. Subject: Re: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedY…

Re: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

2009-04-11 Thread Billy Pearson
you'd see this on the HLogs first. HDFS blocks are allocated most frequently for them, except during compaction. Seems like a classic sign of DFS stress to me. What are your configuration details in terms of max open files, maximum xceiver limit, and datanode handlers? - Andy From

WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

2009-04-11 Thread Billy Pearson
I am getting a bunch of WARNs: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping. This is only happening on the hlogs on the servers while under heavy import, 30K/sec on 7 servers. I tried to bump the hlog size between rolls to 100K instead of 10K thinking that would help, but the…

Re: shell 'table_att'

2009-04-11 Thread Billy Pearson
…dld780741f65087...@mail.gmail.com... Billy, these are currently the only attributes that you can set on the table level in the shell. J-D. On Sat, Apr 11, 2009 at 5:37 PM, Billy Pearson wrote: Can we get someone to post all the correct options for 'table_att' in the shell, in the wiki…

shell 'table_att'

2009-04-11 Thread Billy Pearson
Can we get someone to post all the correct options for 'table_att' in the shell, in the wiki FAQ or somewhere? I know there are the ones below, but I think there is a major compaction setting also, and I cannot find anywhere that all the table-level options are listed. alter 't1', {METHOD => 'table_att',…

Re: Bulk import - does sort order of input data affect success rate?

2009-04-05 Thread Billy Pearson
I found that using HRegionPartitioner on tables that are not new and have multiple regions per server speeds things up. Might look into making an HServerPartitioner, one reduce per server, but it would lose performance if the server has many spare cores to use. Billy. - Original Message - From…

Re: Curious about using HBase

2009-03-31 Thread Billy Pearson
…have? Using a cache from above, you could pull a player's achievements from the players table, remove the ones he has, and use what's left, as Jonathan Gray pointed to. With the above suggestions, they should help you scale with fewer resources. Billy Pearson. "Ma…

Re: no memory tables

2009-03-27 Thread Billy Pearson
About 5.5 million rows; 3.7x compression; default block size (pre-compression) of 64KB; in-memory block index size: 770KB. One problem with 0.19 is the size of in-memory indexes... With HFile in 0.20 we will have many fewer problems. On Thu, Mar 26, 2009 at 11:20 PM, Billy Pearson wro…

no memory tables

2009-03-26 Thread Billy Pearson
…size of the data except the Hadoop storage size. Anyone else think they could use something like this also? Billy Pearson

Re: HDFS unbalance issue. (HBase over HDFS)

2009-03-25 Thread Billy Pearson
If you load your data onto HDFS from node1, then it will always get more blocks, as Hadoop saves one copy to the local datanode, then copies to others to meet the replication setting. The same thing should be happening in HBase: wherever the region is open and a compaction happens, the data should be written l…

Re: HBase new user

2009-03-23 Thread Billy Pearson
There is also a web interface here: http://MASTER_HOSTNAME:60010/. That will give you some stats, but there is no GUI, just an API. Billy. "Ryan Rawson" wrote in message news:78568af10903221449p7431800dwfd6f75f06d8b6...@mail.gmail.com... Hey, here are some answers: 1. The directory setting is:…

Re: Some REST GET questions

2009-03-23 Thread Billy Pearson
Also note you might look into using Thrift; I think it took over a lot of users from REST. The support for keeping REST up to date and tested may not be there anymore, and the PHP class is from a long time ago, 9/2008; there have been lots of changes in HBase since then. Billy. "Chris Hostetter"…

Re: Some REST GET questions

2009-03-23 Thread Billy Pearson
I did a PHP class: https://issues.apache.org/jira/browse/HBASE-37. It will give you some clues about the API; if you are using PHP or can read PHP, then it should help. Billy. "Chris Hostetter" wrote in message news:pine.lnx.4.64.0903231341200.22...@radix.cryptio.net... I've got mys…

Re: MR Job question

2009-03-06 Thread Billy Pearson
And if you go with the timestamp, there is an open issue to deal with this problem: HBASE-1170. If you have a set time you want to keep the data, then there is always the TTL option on the table's columns. Billy. "stack" wrote in message news:7c962aed0903032253n50753c66q57a0c8c4fef2d...@mail…

Re: MapReduce job to update HBase table in-place

2009-02-26 Thread Billy Pearson
As long as the column you are updating is the same as the one you are reading, then just using a map will work; if the data read will update a different column, I would use a reduce step to do all the reading first, then write the updates. Billy. "Stuart White" wrote in message news:4af5cd780902250908q4…

Re: Backup again

2009-02-12 Thread Billy Pearson
…because we append edits to the edit logs, by default 300 edits per append. It would not be wise to copy the files without shutting down the cluster; multiple reasons not to do this: think memcache flushes/compactions/updates all working on the data files and directories at the same time. Bu…

Re: hbase.client.scanner.caching

2009-02-03 Thread Billy Pearson
hbase.client.scanner.caching may not be the reason the requests are under-reported. I set hbase.client.scanner.caching = 1 and still get about 2K requests a sec in the GUI, but when the job is done I take records / job time and get 36,324 records/sec. So there must be some caching outside of t…

hbase.client.scanner.caching

2009-02-03 Thread Billy Pearson
Quick question: I am seeing a lower number of requests in the GUI than I have seen in 0.18.0 while scanning. I think part of it is we moved to reporting requests per sec, not per 3 secs, so the requests should be 1/3 of the old numbers I was getting. Does hbase.client.scanner.caching make the request…

Re: Hbase cluster configuration

2009-02-03 Thread Billy Pearson
I also recommend upgrading to 0.19.0 Hadoop/HBase if you can upgrade. Billy. "Andrew Purtell" wrote in message news:519047.50909...@web65513.mail.ac4.yahoo.com... Hi Michael, I have found that trial and error is necessary now. There are no clear formulas. How large the system can scale depend…

Re: Hbase 0.19 failed to start: exceeds the limit of concurrent xcievers 3000

2009-01-28 Thread Billy Pearson
Are there updates happening in your MR job? If so, the slowness might be caused by memcache flushing and compaction; with that many regions on so few servers, compaction would take a while to run on all the regions, and if it's time for a major compaction then you are looking at a lot of cpu/disk/net…

Re: Region server memory requirements

2008-12-22 Thread Billy Pearson
Hey guys, there is a var in Hadoop that can help without having to change the index interval: io.map.index.skip. This can be changed to lower memory usage without having to wait until the map files are compacted again, and you can change it as needed. "stack" wrote in message news:494c6e27.7030...@d…

thrift mutateRows example

2008-12-15 Thread Billy Pearson
Does anyone know the format to build a mutations array in PHP for the Thrift function mutateRows? I know how to do mutateRow, but I have not seen an example in PHP of how the arrays should look for mutateRows. Billy

Re: merging into MapFile

2008-12-09 Thread Billy Pearson
Hadoop is where this should be posted, since it has nothing to do with HBase, but to answer your question: Hadoop 0.19.0 is a write-once read-many filesystem with append support only; no seek-write option as of now. Billy. "yoav.morag" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] hi all…

Re: HBASE_OPT 's?

2008-12-04 Thread Billy Pearson
let us know if you see any improvements Billy "Tim Sell" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] please ignore this I'm an idiot.. it's HBASE_OPTS not HBASE_OPT sigh 2008/12/4 Tim Sell <[EMAIL PROTECTED]> Due to recent memory issues, I was thinking of trying to run the

Re: A time-stamp

2008-11-12 Thread Billy Pearson
Not sure if that's possible, but versions is not timestamp: version is how many versions of the row/column: you want to keep. They are stored by timestamp, but if you just want to keep the last inserted version, then you want version=1. Billy. "Edward J. Yoon" <[EMAIL PROTECTED]> wrote in message new…

Re: java.io.IOException: java.util.NoSuchElementException

2008-11-11 Thread Billy Pearson
…wrote in message news:[EMAIL PROTECTED] Billy Pearson wrote: could it be from the global memcache limit? I set my hbase.hregion.memcache.flush.size = hbase.regionserver.globalMemcacheLimit so that memcache flushes happen only as needed. That would probably explain it. The global memcache limi…

Re: java.io.IOException: java.util.NoSuchElementException

2008-11-11 Thread Billy Pearson
…addresses the immediate silly error of trying to get a first element from a Set that has none, but do you have an idea why there'd be memory pressure in the HBase heap though seemingly no regions are online? Thanks, St.Ack. Billy Pearson wrote: The first post was from the reducer. This is from…

Re: java.io.IOException: java.util.NoSuchElementException

2008-11-10 Thread Billy Pearson
…"Billy Pearson" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] I started getting these when the servers are under heavy load: java.io.IOException: java.io.IOException: java.util.NoSuchElementException at java.util.TreeMap.key(TreeMap.java:1206) at java.util.TreeMap.first…

java.io.IOException: java.util.NoSuchElementException

2008-11-10 Thread Billy Pearson
I started getting these when the servers are under heavy load: java.io.IOException: java.io.IOException: java.util.NoSuchElementException at java.util.TreeMap.key(TreeMap.java:1206) at java.util.TreeMap.firstKey(TreeMap.java:267) at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushSomeReg…

Re: low performance on hadoop

2008-11-10 Thread Billy Pearson
I just added a patch to HBASE-987, https://issues.apache.org/jira/browse/HBASE-987; it has a Partitioner in it to group the records into a reducer per region. You can set it as the partitioner in the job. I am not sure if it will work for 0.17, but you can give it a try; the file in the patch you a…

Re: how many map in a map task?

2008-11-10 Thread Billy Pearson
…first 99 maps; output only exists in the last map. So I should know how many maps are in a map task. On Mon, Nov 10, 2008 at 4:33 PM, Billy Pearson <[EMAIL PROTECTED]> wrote: If you are using TableMap to read the data back, each mapper will map over all the rows in a single region. You can use…

Re: How to start a Configuration (Java API) with a specific file?

2008-11-10 Thread Billy Pearson
On MR jobs I do this: conf = new HBaseConfiguration(); conf.set("hbase.master","123.123.123.123:6000"); but I do this in a JobConf. I think HBaseConfiguration() loads the hbase-site.xml file for you. Billy. "Yossi Ittach" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Hi all, I want to spe…

Re: how many map in a map task?

2008-11-10 Thread Billy Pearson
If you are using TableMap to read the data back, each mapper will map over all the rows in a single region. You can use counters to count the records so you can get a total record count, but there is no set number a mapper will process; it depends on the input. Billy. "ma qiang" <[EMAIL P…

Re: Low-cost and fast HTable.exists(...)?

2008-11-09 Thread Billy Pearson
We have an open issue for stuff similar to this, atomic increment operations: https://issues.apache.org/jira/browse/HBASE-803. Either that, or use something like memcache, or scan the table. Billy. "Lars George" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Hi, I was wondering if the…

hbase Partitioner for MR Jobs

2008-11-08 Thread Billy Pearson
Does anyone out there have any experience writing a Hadoop partitioner? We need one for HBase to split the records from map outputs so that all records for a region will fall in one partition. It would need to be something fast, as every output record would have to be run through it. Then if we set up our new…
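The partitioner being asked for amounts to a binary search over the sorted region start keys, so every map output key lands in the partition (reducer) owning its region. A minimal sketch of that idea (hypothetical Python illustration, not the HRegionPartitioner source):

```python
import bisect

def region_partition(row_key, region_start_keys):
    """Return the partition (reducer) index for row_key.

    region_start_keys is the sorted list of region start keys; a row
    belongs to the last region whose start key is <= the row key."""
    return bisect.bisect_right(region_start_keys, row_key) - 1

starts = ["", "key200", "key400"]  # three regions -> three reducers
print(region_partition("key250", starts))  # -> 1 (region starting at "key200")
```

A per-key binary search over a small in-memory list is cheap, which matters since, as noted above, every map output record passes through the partitioner.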

Re: Unable to create RegionHistorian with NoServerForRegionException

2008-11-08 Thread Billy Pearson
…Billy, currently trunk is broken by those. I think that Jim is on the issue. J-D. On Sat, Nov 8, 2008 at 8:29 PM, Billy Pearson <[EMAIL PROTECTED]> wrote: Working from trunk downloaded today, I get a bunch of these on startup and from the shell after the Root Region is reported available, even on s…

Unable to create RegionHistorian with NoServerForRegionException

2008-11-08 Thread Billy Pearson
Working from trunk downloaded today, I get a bunch of these on startup and from the shell after the Root Region is reported available, even on the same server as the root region. Startup head from a RS: 2008-11-08 19:25:15,565 INFO org.apache.hadoop.ipc.Server: IPC Server handler 96 on 60020: starting 2008-1…

Re: Map File index bug?

2008-11-06 Thread Billy Pearson
…the same in memory whether it was compressed in the filesystem or not? Or am I missing something, Billy? St.Ack. On Thu, Nov 6, 2008 at 7:55 AM, Billy Pearson <[EMAIL PROTECTED]> wrote: There is no method to change the compression of the index; it's just always block compressed. I hacked the code…

Re: Map File index bug?

2008-11-06 Thread Billy Pearson
…files. "stack" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] On Wed, Nov 5, 2008 at 11:52 PM, Billy Pearson <[EMAIL PROTECTED]> wrote: I ran a job on 80 MapFiles to write 80 new files with non-compressed indexes, and it still took ~4X the memory of the sizes of…

Re: Map File index bug?

2008-11-05 Thread Billy Pearson
…help any. Billy. "Billy Pearson" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] I have been looking over the MapFile class in Hadoop for memory problems and think I might have found an index bug: org.apache.hadoop.io.MapFile line 202: if (size % indexInterval == 0) {…

Map File index bug?

2008-11-04 Thread Billy Pearson
I have been looking over the MapFile class in Hadoop for memory problems and think I might have found an index bug. org.apache.hadoop.io.MapFile line 202: if (size % indexInterval == 0) { // add an index entry. This is where it's writing the index, skipping every indexInterval rows; then o…
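The `size % indexInterval == 0` logic quoted above samples every indexInterval-th key into the in-memory index, which is why the index size (and heap use) scales with entry count divided by the interval. A hedged sketch of that sampling (hypothetical Python, not the Hadoop source):

```python
def build_sparse_index(keys, index_interval=128):
    """Keep every index_interval-th key (by insertion count), as the
    MapFile writer does with its `size % indexInterval == 0` check.

    Returns (key, position) pairs; a reader would binary-search these
    and then scan forward at most index_interval entries on disk."""
    index = []
    for size, key in enumerate(keys):
        if size % index_interval == 0:  # add an index entry
            index.append((key, size))
    return index

keys = ["k%05d" % i for i in range(1000)]
idx = build_sparse_index(keys, index_interval=128)
print(len(idx))  # 1000 keys / interval 128 -> 8 index entries kept
```

Raising the interval (or, per the later message in this thread, setting io.map.index.skip on the reader) shrinks the in-memory index at the cost of a longer forward scan per lookup.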

Re: how many rows per GB memory

2008-10-27 Thread Billy Pearson
…Search, Microsoft Corporation). -Original Message- From: news [mailto:[EMAIL PROTECTED] On Behalf Of Billy Pearson. Sent: Monday, October 27, 2008 12:07 PM. To: hbase-user@hadoop.apache.org. Subject: Re: how many rows per GB memory. Is the 64MB changeable, or is that a hard limit in the code? Billy

Re: how many rows per GB memory

2008-10-27 Thread Billy Pearson
…(number of rows), which is limited to 64MB per memcache. The heap size required is determined by: (number of regions being hosted) * (number of families) * 64MB. --- Jim Kellerman, Powerset (Live Search, Microsoft Corporation). -Original Message- From: news [mailto:[EMAIL PROTECTED] On…
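Plugging illustrative numbers into the quoted rule of thumb makes its implications concrete (a worked example with made-up figures, not measurements):

```python
MB = 1024 * 1024

def worst_case_memcache_heap(regions, families, memcache_limit=64 * MB):
    """Worst-case heap for memcaches per the quoted rule of thumb:
    (regions hosted) * (families) * (per-memcache limit, 64MB here)."""
    return regions * families * memcache_limit

# e.g. a regionserver hosting 50 regions with 2 families each:
print(worst_case_memcache_heap(50, 2) // MB)  # -> 6400 (MB)
```

This is a worst case (every memcache full at once); in practice flushes keep actual usage lower, but it shows why region count per server dominates heap sizing.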

Re: how many rows per GB memory

2008-10-26 Thread Billy Pearson
Sorry, late here, but looking to see if there is a way to figure out how much memory each row uses of the heap. Billy. "Billy Pearson" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Are there some numbers I can figure per GB of heap that I can get into a reg…

how many rows per GB memory

2008-10-26 Thread Billy Pearson
Are there some numbers I can figure per GB of heap that I can get into a regionserver? Say something like this: (x bytes avg per rowkey * max rows) / index interval = y GB heap. Billy

Re: Improving locality of table access...

2008-10-22 Thread Billy Pearson
generate a patch and post it here https://issues.apache.org/jira/browse/HBASE-675 Billy "Jim Kellerman (POWERSET)" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] In the future, you should send HBase questions to the HBase user mailing list: hbase-user@hadoop.apache.org if you wa

Re: HBase (0.18) RegionServer : "unable to report to master for X ms - aborting server"

2008-10-22 Thread Billy Pearson
I think your problem has an open issue here: https://issues.apache.org/jira/browse/HBASE-616. Billy. "Yossi Ittach" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Hi all, I'm using HBase 0.18 and Hadoop 0.18.1 and I'm running some benchmarks. Every now and then, a RegionServer pri…

Re: TableOutputFormat

2008-10-17 Thread Billy Pearson
Appears to be a problem with the reduce tasks reporting higher than 100% done; I see tasks reporting as high as 377% done when viewing running tasks. Billy. "Billy Pearson" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] I am running jobs and I am noticing that t…

Re: set split size per table

2008-10-14 Thread Billy Pearson
Yes, we have that workaround to use for now; also, we have an open issue to get it fixed so the shell, Thrift, and REST can change these settings: https://issues.apache.org/jira/browse/HBASE-800. Billy. "Dru Jensen" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] nm. Found the wo…

TableOutputFormat

2008-10-08 Thread Billy Pearson
I am running jobs and I am noticing that the % done on the reduce jobs is hitting 100% way before the job is done writing to the table. I run a map and reduce and use setOutputFormat(TableOutputFormat.class) to write the reducer's BatchUpdates to the table I need, but on say 10M rows to update th…

Re: I made a ppt to introduce the bigtable and hbase.

2008-09-30 Thread Billy Pearson
Looks like stuff from Bigtable, not HBase! Just a few points. Page 3, Architecture: we do not have Chubby or a CMS server; we have the Jobtracker, and ZooKeeper coming soon. Page 4, data model: our rows/columns/data are not strings; we use byte arrays now. The above stuff about Chubby is repeated in…

Re: table structure best practice

2008-09-28 Thread Billy Pearson
If they're all related to the app, I would stick them all in one table; just use different columns for each type of doc. Billy. "Avitzur, Aharon" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Hi, I wonder if there are some guidelines/best practices for how to structure the data in HBas…

Re: [ANN] hbase 0.18.0 available

2008-09-21 Thread Billy Pearson
There are two ways I know of right offhand: Thrift and REST. View the API docs on usage. Billy. "Guo Leitao" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Many thanks! I'd like to know whether there is a plan for HBase to provide a C/C++ programming interface. Is there any way to access…

Re: [VOTE] HBase 0.18.0 Release Candidate 1

2008-09-18 Thread Billy Pearson
+1 "Jim Kellerman" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] HBase 0.18.0 candidate 1 is now available at http://people.apache.org/~jimk/hbase-0.18.0-candidate-1 Please download it, beat it up, and vote +1 or -1 by Friday September 19 --- Jim Kellerman, Powerset (Live Searc

Re: Starting HBase from an old snapshot

2008-09-18 Thread Billy Pearson
Was the old directory made from the same version of HBase as the current one you are trying to use? Billy. "Иван" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] The situation is quite simple: I'm just trying to launch an HBase instance from an hbase directory in HDFS which was rema…

Re: BatchUpdate

2008-09-18 Thread Billy Pearson
I think what you are looking for is here: HBASE-493, https://issues.apache.org/jira/browse/HBASE-493. Billy Pearson. "Slava Gorelik" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Hi. Thank you for a quick response. About question 3, I want to clarify myself: for e…

Re: BatchUpdate and BatchOperation

2008-09-13 Thread Billy Pearson
…not implemented yet; it is scheduled for 0.19.0. Please leave some comments in the JIRA regarding which design you prefer. Thx, J-D. On Fri, Sep 12, 2008 at 1:37 AM, Billy Pearson <[EMAIL PROTECTED]> wrote: Thanks, looks like HBASE-882 solves my problem in trunk. I am using 0.2.1 right now so I…
