Re: Please welcome new HBase committer Jing Chen (Jerry) He

2015-04-02 Thread Andrey Stepachev
Congrats, Jerry! On Thu, Apr 2, 2015 at 8:34 AM, abhishek kr wrote: > Congrats, Jerry. > > Regards, > Abhishek > > -Original Message- > From: Ted Yu [mailto:yuzhih...@gmail.com] > Sent: 02 April 2015 01:56 > To: user@hbase.apache.org > Cc: d...@hbase.apache.org > Subject: Re: Please welc

Re: Please welcome new HBase committer Srikanth Srungarapu

2015-04-02 Thread Andrey Stepachev
Congratulations, Srikanth! On Thu, Apr 2, 2015 at 7:57 AM, abhishek kr wrote: > Congrats, Srikanth. > > Regards, > Abhishek > > -Original Message- > From: Andrew Purtell [mailto:apurt...@apache.org] > Sent: 02 April 2015 01:53 > To: d...@hbase.apache.org; user@hbase.apache.org > Subject:

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-09 Thread Andrey Stepachev
0m17.624s, sys 0m4.779s > > To be fair, I am able to setCaching on Java Client, but didn't find a way > to do the same through the C++ API, which also make some difference > > Demai > > > On Sun, Mar 8, 2015 at 1:40 PM, Andrey Stepachev wrote: > > >

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-08 Thread Andrey Stepachev
Hi Demai. That seems odd to me; in my tests I got very similar performance. I'd suggest checking that the scans have identical parameters (cache size in particular). That can cause very different performance in your case. Thanks. On Sun, Mar 8, 2015 at 6:50 PM, Mike Axiak wrote: > If you'

Re: [ANNOUNCE] Apache HBase 1.0.0 is now available for download

2015-02-24 Thread Andrey Stepachev
Congrats, almost a decade passed at once! And finally this happens. Awesome work! On Tue, Feb 24, 2015 at 7:26 PM, Shahab Yunus wrote: > Congrats and thanks to everyone involved. A big milestone! HBase *1.0* > > Regards > Shahab > > On Tue, Feb 24, 2015 at 2:24 PM, anil gupta wrote: > > > Kudos

Re: Streaming data to htable

2015-02-13 Thread Andrey Stepachev
this > > something you've done in the past? Are there any sample codes we could > use > > as guide? On another side, what would you consider "big enough" to switch > > from regular Puts to HFiles-writing? > > > > Thanks! > > > > On Fri Feb

Re: Streaming data to htable

2015-02-13 Thread Andrey Stepachev
Hi hongbin, It seems that depends on how much data you ingest. If it's big enough, I'd look at creating HFiles directly without mapreduce (for example using HFileOutputFormat without mapreduce, or using HFileWriter directly). Created files can be imported by LoadIncrementalHFiles#doBulkLoad direct

Re: Loading hbase from parquet files

2014-10-08 Thread Andrey Stepachev
ort. > > > > > > > On Wed, Oct 8, 2014 at 10:20 AM, Andrey Stepachev > wrote: > > > Hi Nishanth. > > > > Not clear what exactly you are building. > > Can you share more detailed description of what you are building, how > > parquet files are su

Re: Loading hbase from parquet files

2014-10-08 Thread Andrey Stepachev
Hi Nishanth. It is not clear what exactly you are building. Can you share a more detailed description of what you are building and how the parquet files are supposed to be ingested? Some questions arise: 1. is this an online import or a bulk load 2. why do rules need to be deployed to the cluster. Do you suppose to do readi

Re: Filtering out metrics emitted to ganglia?

2014-10-06 Thread Andrey Stepachev
Hello Manish. You can try to filter them out in hadoop-metrics2-hbase.properties with lines like this: hbase.sink.ganglia.source.filter.exclude=*Regions That worked for me (tested on CDH 5.0+). On Mon, Oct 6, 2014 at 3:19 AM, Manish wrote: > Hello All, > We are plotting metrics using ganglia31 c

Re: HBase MOB performance

2014-09-04 Thread Andrey Stepachev
e handler monitor? in other words, how to measure the busy > degree of handler? jstack or else? > > thanks > > > 2014-09-04 20:27 GMT+08:00 Andrey Stepachev : > > > Hi Mike. > > > > Need to know how many handler you have and how many clients. > > Can it hap

Re: HBase MOB performance

2014-09-04 Thread Andrey Stepachev
Hi Mike. I need to know how many handlers you have and how many clients. Can it happen that all your handlers are busy with writes? On Wed, Sep 3, 2014 at 11:30 PM, Bi,hongyu—mike wrote: > btw, i disable the block cache for the hyperloglog table to avoid the cache > pollution > > > 2014-09-0

Re: HBase java.util.concurrent.RejectedExecutionException

2014-09-03 Thread Andrey Stepachev
Hello Serega. Looks like a closed HTable pool. That can happen if the HTable is closed while an operation is in progress (a put in your case). Check that the HTable is not closed too early or concurrently with the put operation. On Wed, Sep 3, 2014 at 12:15 AM, Serega Sheypak wrote: > Hi, I'm using HBase CDH 4.7

Re: Getting "Table Namespace Manager not ready yet" while creating table in hbase

2014-08-27 Thread Andrey Stepachev
Hi, Praveen. Can you share more details on the HBase version? You can look at the hbase:namespace table to find out whether it is enabled and deployed somewhere. (Typically you can find this on the master: http://master:60010/table.jsp?name=hbase:namespace) Andrey. On Wed, Aug 27, 2014 at 10:56 AM, Praveen G wrote:

Re: Hbase InputFormat for multi-row + column range, how to do it?

2014-08-20 Thread Andrey Stepachev
Hi Jianshi. You can create your own. Just inherit from TableInputFormatBase or TableInputFormat and add a ColumnRangeFilter to the scan (either construct your own, or intercept the setScan method). Hope this helps. -- Andrey. On Wed, Aug 20, 2014 at 1:35 PM, Jianshi Huang wrote: > Hi, > > I know Table

Re: Hbase region count and RS count for 2TB+

2014-08-15 Thread Andrey Stepachev
There can be another problem here - timestamp-based keys or too many deletions. Compaction removes most of the data and regions become nearly empty. lars hofhansl 15 Aug 2014 22:12 HBase initially tries to spread the load out to more region servers by splitting regions early when there a

Re: Storing extremely large size file

2012-04-18 Thread Andrey Stepachev
I think, that HBase should have something like http://www.mongodb.org/display/DOCS/GridFS+Specification. On Wed, Apr 18, 2012 at 10:55 AM, kim young ill wrote: > +1 documentation please > > On Wed, Apr 18, 2012 at 7:21 AM, anil gupta wrote: > >> +1 for documentation. It will help a lot of people

Re: Disk Seeks and Column families

2012-01-23 Thread Andrey Stepachev
2012/1/24 Andrey Stepachev : > 2012/1/24 Praveen Sripati : > > a) As in 1), add something to key. For example each 5 minutes. Later your > can issue 16 queries and merge them (for realtime) eah... 3 minutes :) -- Andrey.

Re: Disk Seeks and Column families

2012-01-23 Thread Andrey Stepachev
2012/1/24 Praveen Sripati : > Thanks for the response. I am just getting started with HBase. And before > getting into the code/api level details, I am trying to understand the > problem area HBase is trying to address through it's architecture/design. > > 1) So, what are the recommendations for ha

Re: Disk Seeks and Column families

2012-01-21 Thread Andrey Stepachev
r some attribute. Such a situation does not scale well and is not handled well by hbase (due to splits, which are performed on row boundaries). > > > > > On 1/21/12 8:52 AM, "Doug Meil" wrote: > >> >>Also, for #2 Hbase supports large-scale aggregation through MapRedu

Re: Disk Seeks and Column families

2012-01-21 Thread Andrey Stepachev
2012/1/21 Praveen Sripati : > Hi, > > 1) According to the this url (1), HBase performs well for two or three > column families. Why is it so? First, each column family is stored in a separate location, so, as stated in '6.2.1. Cardinality of ColumnFamilies', such a schema design can lead to many small pi

Re: ceph and hbase.

2011-12-28 Thread Andrey Stepachev
28.12.2011 19:52, user "Mikael Sitruk" wrote: > > On Dec 24, 2011 10:40 PM, "Andrey Stepachev" wrote: > > > > 23 December 2011, 22:48, user Todd Lipcon wrote: > > > > > On Tue, Dec 20, 2011 at 12:56 AM, Andrey Stepachev

Re: hdfs-1623 Was: ceph and hbase.

2011-12-27 Thread Andrey Stepachev
Thanks Todd. I'll check svn. 27 December 2011, 5:00, user Todd Lipcon wrote: > That's my github but I don't regularly push there -- so it's a bit out of > date. Working from the apache svn is probably the best bet > > Todd > > On Sunday, Dece

Re: hdfs-1623 Was: ceph and hbase.

2011-12-24 Thread Andrey Stepachev
Thanks! 25 December 2011, 3:13, user Ted Yu wrote: > Andrey: > A quick search led me to https://github.com/toddlipcon where you would be > able to find: > https://github.com/toddlipcon/hadoop-common > > Cheers > > On Sat, Dec 24, 2011 at 12:39 PM, Andrey Step

Re: ceph and hbase.

2011-12-24 Thread Andrey Stepachev
23 December 2011, 22:48, user Todd Lipcon wrote: > On Tue, Dec 20, 2011 at 12:56 AM, Andrey Stepachev > wrote: > > I see, that most issues addressed to HA branch. > > What version this branch is applied to? 0.22 or > > i should build hadoop from sources?

Re: ceph and hbase.

2011-12-20 Thread Andrey Stepachev
tainly has > promise and is a nice design, but it hasn't had the years of pounding > that HDFS has. > > -Todd > > On Mon, Dec 19, 2011 at 10:23 PM, Andrey Stepachev > wrote: > > Hi all. > > > > I have requirements to use hbase in several datacenter

ceph and hbase.

2011-12-19 Thread Andrey Stepachev
Hi all. I have requirements to use hbase in several datacenters. But HDFS has a SPOF, so we can't use it. I plan to use ceph as the file system for hbase. In general, I am interested in: a) use of hadoop + ceph in the production environment b) use of hbase + ceph in the production environme

Re: HBase performance troubleshooting

2011-09-07 Thread Andrey Stepachev
Hi Dmitry. Looks like high network latency. Do you run this test with client and server on the same machine, or do you test from another machine? Maybe over wireless? 2011/9/6 Дмитрий > Hello everyone! > We started using hbase (hadoop) system and faced some performance issues. > Actually we are

Re: Observer/Observable MapReduce

2011-03-25 Thread Andrey Stepachev
Look at http://yahoo.github.com/oozie/. Maybe it will help you. 2011/3/25 Vishal Kapoor > Can someone give me a direction on how to start a map reduce based on > an outcome of another map reduce? ( nothing common between them apart > from the first decides about the scope of the second. > > I

Re: OT - Hash Code Creation

2011-03-16 Thread Andrey Stepachev
Try hash table with double hashing. Something like this http://www.java2s.com/Code/Java/Collections-Data-Structure/Hashtablewithdoublehashing.htm 2011/3/17 Peter Haidinyak > Hi, >This is a little off topic but this group seems pretty swift so I > thought I would ask. I am aggregating a d

Re: backup utility (HBASE-897)

2011-03-01 Thread Andrey Stepachev
Why can't you use org.apache.hadoop.hbase.mapreduce.Export? ./hadoop/bin/hadoop jar hbase/hbase-0.89.20100830-ncb3-hadoop737.jar An example program must be given as the first argument. Valid program names are: Copy Table: Export a table from local cluster to peer cluster completebulkload: Com

Re: Truncate tables

2011-02-15 Thread Andrey Stepachev
It is a strange thing in hbase. Operations like create or drop are asynchronous, so immediately after the first 'disable' rpc the hbase client tries to check for successful execution. Often it is not really complete yet, so the hbase client pauses for an amount of time configured in 'hbase.client.pause'. If you change it in

Re: Parent/child relation - go vertical, horizontal, or many tables?

2011-02-11 Thread Andrey Stepachev
In such a case I think you can use tall tables with parent:child keys and filters or range scans to get the children. Your queries would be: -Fetch all children of a single parent: scan [parent:0, parent+1:0) -Find a few children by their keys or values within a single parent: scan [parent:min_of_ch
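The `parent+1` bound in the ranges above is the row immediately after every key sharing the parent prefix. A minimal sketch of computing that exclusive stop row in plain Java (a hypothetical helper for illustration, not an HBase API, though HBase ships similar logic internally):

```java
import java.util.Arrays;

public class PrefixStop {
    // Returns the smallest byte[] strictly greater than every key that
    // starts with `prefix`, or null if no bound exists (prefix is all 0xFF).
    public static byte[] stopRow(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;                          // bump last non-0xFF byte
                return Arrays.copyOf(stop, i + 1);  // drop the 0xFF tail
            }
        }
        return null; // scan to the end of the table instead
    }

    public static void main(String[] args) {
        // ':' (0x3A) bumps to ';' (0x3B)
        System.out.println(new String(stopRow("parent42:".getBytes())));
    }
}
```

A scan over `[prefix, stopRow(prefix))` then covers exactly the children of one parent.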

Re: Determining the unqiue row keys for Hbase table

2011-02-08 Thread Andrey Stepachev
You can use HTable.incrementColumnValue. Of course you can increment in steps (e.g. 100) and reduce the amount of rpc generated by your MR jobs. 2011/2/7 som_shekhar > > Hi All, > I would like to know how to provide the unique row names for hbase table. > Since if there are similar row keys then th
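The step-increment idea can be sketched in plain Java. The `AtomicLong` below merely stands in for the HBase counter cell; in real code each block fetch would be one `HTable.incrementColumnValue(row, family, qualifier, blockSize)` call (row/family/qualifier are whatever your schema uses):

```java
import java.util.concurrent.atomic.AtomicLong;

public class BlockIdAllocator {
    private final AtomicLong counterCell = new AtomicLong(0); // stand-in for the HBase counter cell
    private final long blockSize;
    private long next = 0, limit = 0;
    private long rpcCount = 0; // how many "RPCs" (counter bumps) we issued

    public BlockIdAllocator(long blockSize) { this.blockSize = blockSize; }

    public long nextId() {
        if (next == limit) {                           // local block exhausted
            limit = counterCell.addAndGet(blockSize);  // one increment "RPC"
            next = limit - blockSize;
            rpcCount++;
        }
        return next++;                                 // hand out ids locally
    }

    public long rpcCount() { return rpcCount; }

    public static void main(String[] args) {
        BlockIdAllocator ids = new BlockIdAllocator(100);
        for (int i = 0; i < 250; i++) ids.nextId();
        System.out.println(ids.rpcCount()); // 250 ids cost only 3 counter bumps
    }
}
```

With a block size of 100, a task generating a million rows issues ~10,000 increments instead of a million.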

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-12 Thread Andrey Stepachev
wrong setting the env var. It > should just be picked up when it's in hbase-env.sh, right? > > > Friso > > > > On 12 jan 2011, at 10:59, Andrey Stepachev wrote: > > > with MALLOC_ARENA_MAX=2 > > > > I check -XX:MaxDirectMemorySize=256m, before, but it doe

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-12 Thread Andrey Stepachev
the leakage with > LZO... > > > Thanks, > Friso > > > On 12 jan 2011, at 07:46, Andrey Stepachev wrote: > > > My bad. All things work. Thanks for Todd Lipcon :) > > > > 2011/1/11 Andrey Stepachev > > > >> I tried to set MALLOC_ARENA_MAX=2. But s

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-11 Thread Andrey Stepachev
My bad. All things work. Thanks to Todd Lipcon :) 2011/1/11 Andrey Stepachev > I tried to set MALLOC_ARENA_MAX=2. But still the same issue as in the LZO > problem thread. All those 65M blocks here. And the JVM continues to eat memory > on heavy write load. And yes, I use "improved"

Re: problem with LZO compressor on write only loads

2011-01-11 Thread Andrey Stepachev
Yes, I tried. 2011/1/12 Sandy Pratt > I'm curious if you've tried -XX:MaxDirectMemorySize=256m (or whatever > value). > > > -Original Message- > > From: Andrey Stepachev [mailto:oct...@gmail.com] > > Sent: Tuesday, January 11, 2011 12:58 > >

Re: problem with LZO compressor on write only loads

2011-01-11 Thread Andrey Stepachev
Not only with LZO; with regular gzip I got the same issue (on the Sun and JRockit JVMs). Looks like a bug to me. Don't know how to work around it. 2011/1/3 Friso van Vollenhoven > Hi all, > > I seem to run into a problem that occurs when using LZO compression on a > heavy write only load. I am

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-11 Thread Andrey Stepachev
with the new and "improved" memory > allocator? > > If so try setting this in hadoop-env.sh: > > export MALLOC_ARENA_MAX= > > Maybe start by setting it to 4. You can thank Todd Lipcon if this works > for you. > > Cheers, > > > -Xavier > > On

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-11 Thread Andrey Stepachev
No. I don't use LZO. I even tried removing all native support (i.e. all .so from the class path) and using java gzip. But nothing. 2011/1/11 Friso van Vollenhoven > Are you using LZO by any chance? If so, which version? > > Friso > > > On 11 jan 2011, at 15:57, Andrey Stepa

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-11 Thread Andrey Stepachev
t of lost memory: 65M * 32 + 132M = 2212M So, it looks like HLog allocates too much memory, and the question is: how to restrict it? 2010/12/30 Andrey Stepachev > Hi All. > > After heavy load into hbase (single node, nondistributed test system) I got > 4Gb process size of my HBase java proc

Re: How to rename table's family name

2011-01-10 Thread Andrey Stepachev
the replacement of code with > actual columnfamily name? Over on client or on server before result > is sent the client? > St.Ack > > On Sun, Jan 9, 2011 at 12:29 PM, Andrey Stepachev > wrote: > > 2011/1/9 Stack > > > >> > >> To respond to Andrey, y

Re: question about merge-join (or AND operator betwween colums)

2011-01-10 Thread Andrey Stepachev
it can be very usefull (for inplace indexing). > > -Jack > > On Sat, Jan 8, 2011 at 2:54 PM, Andrey Stepachev wrote: > > > Ok. Understand. > > > > But do you check is it really an issue? I think that it is only 1 IO > here, > > (especially > > i

Re: How to rename table's family name

2011-01-09 Thread Andrey Stepachev
enting the master-coordination for this (for > > instance, all RS's should delete 'atomically') will be interesting > > > > > > > > On Sat, Jan 8, 2011 at 11:36 AM, Andrey Stepachev > wrote: > > > >> 2011/1/8 Stack >

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Andrey Stepachev
of. I only given the worst case > scenario example, I understand that filtering will produce results we want > but at cost of examining every row and offloading AND/join logic to the > application. > > -Jack > > On Sat, Jan 8, 2011 at 1:59 PM, Andrey Stepachev wrote: > > >

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Andrey Stepachev
More details on binary sorting can be found at http://brunodumon.wordpress.com/2010/02/17/building-indexes-using-hbase-mapping-strings-numbers-and-dates-onto-bytes/ 2011/1/8 Jack Levin > Basic problem described: > > user uploads 1 image and creates some text -10 days ago, then creates 1000 > text m
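The core trick from the linked post can be sketched as follows: flipping the sign bit of a big-endian `long` makes unsigned lexicographic byte order (the order HBase sorts row keys in) agree with numeric order, negatives included. This is a minimal illustration, not HBase's own `Bytes` API:

```java
import java.nio.ByteBuffer;

public class OrderedLong {
    // Order-preserving encoding: XOR with Long.MIN_VALUE flips the sign bit,
    // so negative values sort before positive ones under unsigned comparison.
    public static byte[] encode(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Unsigned lexicographic comparison, as HBase compares row key bytes.
    public static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compare(encode(-5), encode(3)) < 0); // -5 sorts first
    }
}
```

The `Long.MAX - dayNum` reverse-ordering trick from the neighboring reply composes with this: encode the complemented value and newest entries come back first.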

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Andrey Stepachev
Hm. But what is the problem with having Long.MAX - dayNum instead of dayNum? In that case you get all data sorted in reverse order and the scan returns the last entries first. 2011/1/8 Jack Levin > Basic problem described: > > user uploads 1 image and creates some text -10 days ago, then creates 10

Re: How to rename table's family name

2011-01-08 Thread Andrey Stepachev
2011/1/8 Stack > > > Perhaps we should consider > detaching CF name from whats stored? > Yes! Are there any jira? I'll vote for it. > > St.Ack >

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Andrey Stepachev
I don't think it is possible at the scanner level with bloomfilters (families are in separate files, so they are scanned independently). But you can use filters to filter out unneeded data. 2011/1/8 Jack Levin > Hello all, I have a scanner question, we have this table: > > hbase(main):002:0> scan

Re: Java Commited Virtual Memory significally larged then Heap Memory

2010-12-30 Thread Andrey Stepachev
3b3? There was a leak in earlier > versions of hadoop-lzo that showed up under CDH3b3. You should upgrade to > the newest. > > If that's not it, let me know, will keep thinking. > > -Todd > > On Thu, Dec 30, 2010 at 12:13 AM, Andrey Stepachev > wrote: > > > A

Re: Java Commited Virtual Memory significally larged then Heap Memory

2010-12-30 Thread Andrey Stepachev
Additional information: ps shows that my HBase process eats up to 4GB of RSS. $ ps --sort=-rss -eopid,rss | head | grep HMaster PID RSS 23476 3824892 2010/12/30 Andrey Stepachev > Hi All. > > After heavy load into hbase (single node, nondistributed test system) I got > 4Gb pro

Java Commited Virtual Memory significally larged then Heap Memory

2010-12-30 Thread Andrey Stepachev
Hi All. After heavy load into hbase (single node, nondistributed test system) I got a 4Gb process size for my HBase java process. On a 6GB machine there was no room for anything else (disk cache and so on). Does anybody know what is going on, and how you solved this? What heap memory is set on you ho

Re: Insert into tall table 50% faster than wide table

2010-12-23 Thread Andrey Stepachev
2010/12/23 Ted Dunning > But the tall table is FASTER than the wide table. > Oops. :). Maybe you put in more data? Are you using compression? (in the case of prefixed qualifiers you get more data; the uuid can have a length comparable to an order row) > > On Wed, Dec 22, 2010 at 11:1

Re: HBase Client connect to remote HBase

2010-12-23 Thread Andrey Stepachev
> On Thu, Dec 23, 2010 at 3:44 PM, Andrey Stepachev > wrote: > > > As you see, Region servers and master registered on 127.0.0.1, so clients > > can't reach them. > > Why that is, I can't say. Only three things you can check: > > 1. /etc/hosts, where you

Re: HBase Client connect to remote HBase

2010-12-23 Thread Andrey Stepachev
Zxid: 0xe >Mode: standalone >Node count: 10 > > On Thu, Dec 23, 2010 at 3:35 PM, Andrey Stepachev > wrote: > > Very strange. Can you post > > http://you_hbase_server:60010/zk.jsp<http://mars.nkb:60010/zk.jsp> (or >

Re: HBase Client connect to remote HBase

2010-12-23 Thread Andrey Stepachev
:::127.0.0.1:18965 > ESTABLISHED > > > On Thu, Dec 23, 2010 at 3:17 PM, Andrey Stepachev > wrote: > > > Are the error lines the same as above? Can you post error here? > > > > 10/12/23 14:26:06 INFO ipc.HbaseRPC: Server at /127.0.0.1:45369 could > not >

Re: HBase Client connect to remote HBase

2010-12-23 Thread Andrey Stepachev
Are the error lines the same as above? Can you post error here? 10/12/23 14:26:06 INFO ipc.HbaseRPC: Server at /127.0.0.1:45369 could not be 2010/12/23 King JKing > I had already config this parameter. But still error > > On Thu, Dec 23, 2010 at 2:48 PM, Andrey Stepachev > wrote:

Re: HBase Client connect to remote HBase

2010-12-22 Thread Andrey Stepachev
. 2010/12/23 King JKing > HBase and client are on the different machine > > On Thu, Dec 23, 2010 at 2:32 PM, Andrey Stepachev > wrote: > > > Are HBase and client on the same machine? > > > > 2010/12/23 King JKing > > > > > But still error... >

Re: HBase Client connect to remote HBase

2010-12-22 Thread Andrey Stepachev
> be > reached after 1 tries, giving up. > > > > On Thu, Dec 23, 2010 at 2:18 PM, Andrey Stepachev > wrote: > > > place hbase-site.xml with two parameters in you classpath. > > > > hbase.zookeeper.quorum=your host > > hbase.zookeeper.property.clientPort

Re: HBase Client connect to remote HBase

2010-12-22 Thread Andrey Stepachev
place hbase-site.xml with two parameters in your classpath. hbase.zookeeper.quorum=your host hbase.zookeeper.property.clientPort=2181 (this parameter is optional) another way: create a Configuration and set these parameters from any source you like (i use, for example, jndi in some of my apps). c =
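Equivalently, the two parameters above can go into an `hbase-site.xml` on the client classpath; the quorum host below is a placeholder:

```xml
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk-host.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```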

Re: Insert into tall table 50% faster than wide table

2010-12-22 Thread Andrey Stepachev
I think row locks slow things down here. Each row you insert tries to acquire a lock and then release it. The wide table has significantly fewer rows, so far fewer locks are acquired during the insert. 2010/12/23 Bryan Keller > I have been testing a couple of different approaches to storing customer > orders. One

Re: HBase filter in scan

2010-12-13 Thread Andrey Stepachev
Not very easy, but using jRuby and the java api, you can. But it depends heavily on how you store your data. ## function converts any value to bytes (using Bytes class methods) def toBytes(val) String.from_java_bytes(Bytes.toBytes(val)) end ## example analog of the "select * from where key" def

Re: HBase cluster with heterogeneous resources

2010-10-17 Thread Andrey Stepachev
https://issues.apache.org/jira/browse/HDFS-236 How bad is HDFS random access? - Random access in HDFS always seemed to have bad PR though hardly anyone used the interface. Claims/rumours range from "transfers a lot of excess data" (not true) to "we noticed it is 10 times slower than our

Re: Using external indexes in an HBase Map/Reduce job...

2010-10-12 Thread Andrey Stepachev
Still don't understand. Looks like you want to optimize scans in hbase. Let's invent a method for you :). 1. Create your custom input format, which overrides the getSplits method, like this: http://pastebin.org/166201 2. Change splits.start and split.end to the min and max keys in your 100k. for example:

Re: Using external indexes in an HBase Map/Reduce job...

2010-10-12 Thread Andrey Stepachev
Hi Michael Segel. If I understand your question correctly, you are looking for an optimal way of scanning index search results? If not, my answer below is not relevant :). 1. For mr joins or large index result scans, bloom filters can be used as described here http://blog.rapleaf.com/dev/2009/09/25/b

Re: Number of column families vs Number of column family qualifiers

2010-10-11 Thread Andrey Stepachev
2010/10/11 Jean-Daniel Cryans : > On Mon, Oct 11, 2010 at 4:20 AM, Andrey Stepachev wrote: >> Hi. >> Yes. I agree. OOME unlikely. I misinterpreted my current problem. I found, that this (gc timeout) on my 0.89-stumpbleupon hbase occurs only if writeToWAL=false. My RS eats all a

Re: Number of column families vs Number of column family qualifiers

2010-10-11 Thread Andrey Stepachev
Hi. One additional issue with column families: the number of memstores. Each family utilizes one memstore on insert. If you write to several memstores at once, you get more memstores and more memory will be used by your region server. Especially with random inserts you can easily get gc timeouts or O

Re: I'm looking for usage examples of HBase.

2010-10-06 Thread Andrey Stepachev
2010/10/7 Hisayoshi Tamaki : > Hi. everyone. > > I'm looking for the following usage examples of HBase. >  (1) A database which is stored user data to authenticate users. Why no ldap? (OpenDS, OpenLDAP, ActiveDirectory) Andrey.

Re: Can't insert data into hbase, OOME.

2010-09-30 Thread Andrey Stepachev
's > already in apache's SVN). > > J-D > > On Thu, Sep 30, 2010 at 12:01 AM, Andrey Stepachev wrote: >> Thanx once more. Now works fine. >> >> 2010/9/29 Jean-Daniel Cryans : >>> The fix is here: http://pastebin.com/zuL23e0U >>> >>&g

Re: Can't insert data into hbase, OOME.

2010-09-30 Thread Andrey Stepachev
Thanx once more. Now works fine. 2010/9/29 Jean-Daniel Cryans : > The fix is here: http://pastebin.com/zuL23e0U > > We're going to do a push to github later today, along with other > patches that require more testing. > > J-D > > On Wed, Sep 29, 2010 at 10:54 AM, An

Re: Can't insert data into hbase, OOME.

2010-09-29 Thread Andrey Stepachev
Thanks. I found it. I was looking in the upstream branch instead of the stumbleupon branch in my local repo. 2010/9/30 Alexey Kovyrin : > http://github.com/stumbleupon/hbase/blob/master/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L1418 >

Re: Can't insert data into hbase, OOME.

2010-09-29 Thread Andrey Stepachev
ohh... sorry. I found it. Thanks for the patch. 2010/9/30 Andrey Stepachev : > Against what version is this patch? I can't find "Only 1 KV, does" in > any stumbleupon or > upstream repositories > > > 2010/9/29 Jean-Daniel Cryans : >> The fix is here: http://pasteb

Re: Can't insert data into hbase, OOME.

2010-09-29 Thread Andrey Stepachev
atches that require more testing. > > J-D > > On Wed, Sep 29, 2010 at 10:54 AM, Andrey Stepachev wrote: >> wow. i'll wait. thanks for reply. >> >> 2010/9/29 Jean-Daniel Cryans : >>> Ok I found the bug, I think it's only in our distro. >>&g

Re: Can't insert data into hbase, OOME.

2010-09-29 Thread Andrey Stepachev
4 times (theoretically >> 256MB of data) I don't even see a flush request... although you're >> running at INFO level instead of DEBUG. Could you switch that and send >> us just the full log. >> >> Thanks a lot! >> >> J-D >> >> On

Re: Can't insert data into hbase, OOME.

2010-09-29 Thread Andrey Stepachev
Data is a simple table with two column families. info: a json object (small). rows: ~300 bigdecimal columns per row.

Can't insert data into hbase, OOME.

2010-09-29 Thread Andrey Stepachev
Hi all, I'm stuck. I can't insert any sizable piece of data into hbase. The data is around ~20mil rows (20G). I try to insert them into a nondistributed hbase with 4 parallel jobs. The MR job runs until all memory given to hbase is exhausted and then hbase produces an hprof file. As the profiler shows, a

Re: Client hanging 20 seconds after job's over (WAS: Re: Can I run HBase 0.20.6 on Hadoop 0.21?)

2010-09-27 Thread Andrey Stepachev
Perhaps the reasons for those slowdowns are: 1. copying and unpacking the job jar. 2. starting the child java process

Re: Duplicates region after upgrade to 0.89

2010-09-27 Thread Andrey Stepachev
this is the restored data (before any updates, and all regions look good) http://paste.ubuntu.com/501716/ 2010/9/27 Andrey Stepachev : > Thanks for the reply, Stack. > > How the data was imported and changed (full story): > 1. stop 0.20.6. > 2. copy /hbase into new location (where 0.89 lives)

Re: Duplicates region after upgrade to 0.89

2010-09-27 Thread Andrey Stepachev
What happens when you do check_meta.rb --fix? > > Can you do a: > > $ echo "scan '.META.'" | ./bin/hbase shell &> /tmp/meta.txt > > ... and pastebin the /tmp/meta.txt so I can take a look? > > St.Ack > > On Mon, Sep 27, 2010 at 6:42 AM, Andre

Duplicates region after upgrade to 0.89

2010-09-27 Thread Andrey Stepachev
After upgrading from 0.20.6 to 0.89 (stumbleupon version) I noticed that I have duplicate entries in .META. (encoded names are different, but keys are identical). check_meta.rb said: What can I do to recover from this situation? Export/Import only? And if so, must I export/import production da

Re: Queries regarding Put and Scanner Result

2010-09-25 Thread Andrey Stepachev
2010/9/25 Imran M Yousuf : > On Thu, Sep 23, 2010 at 11:28 AM, Andrey Stepachev wrote: >> 2010/9/23 Imran M Yousuf : >> How you populate family cf1? 1,2,3,4 = are qualifiers? or you put values >> under the same qualifiers? >> >> In case of qualifiers, you should

Re: Queries regarding Put and Scanner Result

2010-09-22 Thread Andrey Stepachev
2010/9/23 Imran M Yousuf : > Hi, > > > Second, when doing object to row, we are mapping a one to many > relation in a specific column family, e.g. cf1, now when I will update > the row I will populate cf1 with values set newly, e.g. old values are > 1,2,3 and new values are 1,3,4, now I will popula

Re: hbase-0.89/trunk: org.apache.hadoop.fs.ChecksumException: Checksum error

2010-09-22 Thread Andrey Stepachev
/9/22 Andrey Stepachev : > Very strange. With hbase over hadoop no such errors with checksums. > Very strange. I'll recheck on another big family. > > 2010/9/22 Andrey Stepachev : >> Thanks. Now i run the same job on >> hbase 0.89 over cloudera hadoop instead of standalon

Re: hbase-0.89/trunk: org.apache.hadoop.fs.ChecksumException: Checksum error

2010-09-22 Thread Andrey Stepachev
Very strange. With hbase over hadoop there are no such errors with checksums. Very strange. I'll recheck on another big family. 2010/9/22 Andrey Stepachev : > Thanks. Now i run the same job on > hbase 0.89 over cloudera hadoop instead of standalone mode. > May be here some bug in standalo

Re: hbase-0.89/trunk: org.apache.hadoop.fs.ChecksumException: Checksum error

2010-09-22 Thread Andrey Stepachev
systems > have been known to give weird flaky faults under load.  It used to be > compiling the linux kernel was a simple benchmark for RAM problems. > If you have time you could try memtest86 to see if the memory has > issues, since that is a common place of errors. > > -ryan &

Re: hbase-0.89/trunk: org.apache.hadoop.fs.ChecksumException: Checksum error

2010-09-22 Thread Andrey Stepachev
One more note. This database was 0.20.6 before. Then I started 0.89 over it. (but the table with the wrong checksum was created in 0.89 hbase) 2010/9/22 Andrey Stepachev : > 2010/9/22 Ryan Rawson : >> why are you using such expensive disks?  raid + hdfs = lower >> performance than non-r

Re: hbase-0.89/trunk: org.apache.hadoop.fs.ChecksumException: Checksum error

2010-09-22 Thread Andrey Stepachev
g/1074628. It should bail out on exception (except ScannerTimeoutException) > > On Wed, Sep 22, 2010 at 2:14 AM, Andrey Stepachev wrote: >> hp proliant raid 10 with 4 sas. 15k. smartarray 6i. 2cpu/4core. >> >> 2010/9/22 Ryan Rawson : >>> generally checksum errors are du

Re: hbase-0.89/trunk: org.apache.hadoop.fs.ChecksumException: Checksum error

2010-09-22 Thread Andrey Stepachev
2010/9/22 Andrey Stepachev : > hp proliant raid 10 with 4 sas. 15k. smartarray 6i. 2cpu/4core. > > 2010/9/22 Ryan Rawson : >> generally checksum errors are due to hardware faults of one kind or another. >> >> what is your hardware like? >>

Re: hbase-0.89/trunk: org.apache.hadoop.fs.ChecksumException: Checksum error

2010-09-22 Thread Andrey Stepachev
hp proliant raid 10 with 4 sas. 15k. smartarray 6i. 2cpu/4core. 2010/9/22 Ryan Rawson : > generally checksum errors are due to hardware faults of one kind or another. > > what is your hardware like? > > On Wed, Sep 22, 2010 at 2:08 AM, Andrey Stepachev wrote: >> But

Re: hbase-0.89/trunk: org.apache.hadoop.fs.ChecksumException: Checksum error

2010-09-22 Thread Andrey Stepachev
inue normally? >> >> -ryan >> >> On Wed, Sep 22, 2010 at 1:37 AM, Andrey Stepachev wrote: >>> Hi All. >>> >>> I get org.apache.hadoop.fs.ChecksumException for a table on heavy >>> write in standalone mode. >>> table tmp.bsn.main created 2010-09-22 10:42:28,860 and then 5 threads >>> writes data to it. >>> At some moment exception thrown. >>> >>> Andrey. >>> >> >

Re: Where are hadoop distributions compatible with hbase-0.89

2010-09-22 Thread Andrey Stepachev
Sep 22, 2010 at 1:39 AM, Andrey Stepachev wrote: >> Good. Thanks for reply. >> >> But if you have repo on github why not to make hadoop repo for the same with >> branches/tags for uploaded artefacts? >>

Re: Where are hadoop distributions compatible with hbase-0.89

2010-09-22 Thread Andrey Stepachev
-ryan > > On Wed, Sep 22, 2010 at 1:29 AM, Andrey Stepachev wrote: >> Hi All. >> >> I found many hadoop jars floating around (for example >> http://people.apache.org/~rawson/repo/org/apache/hadoop/hadoop-core/) >> but can't find distribution and/or source

Where are hadoop distributions compatible with hbase-0.89

2010-09-22 Thread Andrey Stepachev
Hi All. I found many hadoop jars floating around (for example http://people.apache.org/~rawson/repo/org/apache/hadoop/hadoop-core/) but can't find the distribution and/or source repository used to build these jars. stumbleupon uses 0.20.2-321, which is close to cloudera, but not cloudera 0.20.2-320 r

Re: Are there any hbase client pom.

2010-09-21 Thread Andrey Stepachev
Ok. Looks like I should do it myself via exclusions.

Are there any hbase client pom.

2010-09-21 Thread Andrey Stepachev
Hi all. I am trying to migrate to hbase 0.89, but I use maven. For 0.20.x I made a special pom and placed all needed jars into my local repository. But 0.89 already uses maven, even though it is very bloated with dependencies (tomcat, jetty, jruby etc). Is there a floating pom anywhere for the hbase client ap

Re: got HBASE-2516 on 0.20.6

2010-09-17 Thread Andrey Stepachev
> StumbleUpon we have been running on 0.89.20100830 (the release > candidates) since the first week of september in our production > environment. Thanks. Interesting. I'll try to move my test server and look how it works. 0.89 has too many features (like timestamp scan) that i

got HBASE-2516 on 0.20.6

2010-09-17 Thread Andrey Stepachev
I need to do a massive data rewrite in some family on a standalone server. I get org.apache.hadoop.hbase.NotServingRegionException or java.io.IOException: Region xxx closed if I write and read at the same time. What shall I do in the 0.20.6 version? One thing that I can try - write to another family and then

Re: Composite Row Key and Partial Matching

2010-09-17 Thread Andrey Stepachev
my bad. thrift doesn't support this. 2010/9/17 Shuja Rehman : > Andrey > > I have checked this filters but i think they can be used with java client > API, are you confirmed that these can be used with thrift api? > > On Thu, Sep 16, 2010 at 5:29 PM, Andrey Stepachev wr

Re: Composite Row Key and Partial Matching

2010-09-16 Thread Andrey Stepachev
DateTime is easy: scan [DateTime; DateTime+1). For DateTime_x_ProductName use a row filter http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/filter/RowFilter.html or a FilterList to compose a predicate for server-side filtering. CompareOp can be any of http://hbase.apache.org/docs/current/
