Re: Behavior of Filter.transform() in FilterList?

2013-07-01 Thread Christophe Taton
Hi, I created https://issues.apache.org/jira/browse/HBASE-8847, with a small patch ( https://github.com/kryzthov/hbase/commit/bd9a3b325d5d335fba04b5f7ce5f588e673cac91) based on 0.94.8. That seems to fix the problem on my side, but I would need to do some more testing to ensure it doesn't introduce

Re: Behavior of Filter.transform() in FilterList?

2013-07-01 Thread Ted Yu
Christophe: Looks like you have a clear idea of what to do. If you can show us in the form of a patch, that would be nice. Cheers On Mon, Jul 1, 2013 at 7:17 PM, Christophe Taton wrote: > On Mon, Jul 1, 2013 at 12:01 PM, lars hofhansl wrote: > > > It would make sense, but it is not immediately cl

Re: Behavior of Filter.transform() in FilterList?

2013-07-01 Thread Christophe Taton
On Mon, Jul 1, 2013 at 12:01 PM, lars hofhansl wrote: > It would make sense, but it is not immediately clear how to do so cleanly. > We would no longer be able to call transform at the StoreScanner level (or > evaluate the filter multiple times, or require the filters to maintain > their - last -

Re: data loss after cluster wide power loss

2013-07-01 Thread Azuryy Yu
Thanks Dave. On Tue, Jul 2, 2013 at 8:34 AM, Dave Latham wrote: > On Mon, Jul 1, 2013 at 4:52 PM, Azuryy Yu wrote: > > > how to enable "sync on block close" in HDFS? > > > Set dfs.datanode.synconclose to true > > See https://issues.apache.org/jira/browse/HDFS-1539 >

Re: data loss after cluster wide power loss

2013-07-01 Thread Dave Latham
On Mon, Jul 1, 2013 at 4:52 PM, Azuryy Yu wrote: > how to enable "sync on block close" in HDFS? > Set dfs.datanode.synconclose to true. See https://issues.apache.org/jira/browse/HDFS-1539
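[Editor's note: for reference, a minimal hdfs-site.xml fragment for the setting named in this thread; verify the property against the hdfs-default.xml of your HDFS version.]

```xml
<!-- hdfs-site.xml: sync blocks to disk when they are closed -->
<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>
```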

Re: data loss after cluster wide power loss

2013-07-01 Thread Azuryy Yu
how to enable "sync on block close" in HDFS? --Send from my Sony mobile. On Jul 2, 2013 6:47 AM, "Lars Hofhansl" wrote: > HBase is interesting here, because it rewrites old data into new files. So > a power outage by default would not just lose new data but potentially old > data as well. > You

Re: data loss after cluster wide power loss

2013-07-01 Thread Lars Hofhansl
HBase is interesting here, because it rewrites old data into new files. So a power outage by default would not just lose new data but potentially old data as well. You can enable "sync on block close" in HDFS, and then at least be sure that closed blocks (and thus files) are synced to disk physi

Re: how can i improve sequence write speed?

2013-07-01 Thread Mohammad Tariq
Hello there, I'm sorry I didn't quite get it. What do you mean by "sequence write speed"? If you are looking for ways to improve HBase writes, you might find this useful : http://hbase.apache.org/book/perf.writing.html Warm Regards, Tariq cloudfront.blogspot.com On Mon, Jul 1, 2013 at 9:

Re: data loss after cluster wide power loss

2013-07-01 Thread Dave Latham
Thanks for the response, Suresh. I'm not sure that I understand the details properly. From my reading of HDFS-744, the hsync API would allow a client to make sure that at any point in time its writes so far hit the disk. For example, for HBase it could apply an fsync after adding some edits to it

Re: stop_replication dangerous?

2013-07-01 Thread Patrick Schless
sure thing: https://issues.apache.org/jira/browse/HBASE-8844 On Mon, Jul 1, 2013 at 3:59 PM, Jean-Daniel Cryans wrote: > Yeah that package documentation ought to be changed. Mind opening a jira? > > Thx, > > J-D > > On Mon, Jul 1, 2013 at 1:51 PM, Patrick Schless > wrote: > > The first two tuto

Re: data loss after cluster wide power loss

2013-07-01 Thread Suresh Srinivas
Yes this is a known issue. The HDFS part of this was addressed in https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is not available in 1.x release. I think HBase does not use this API yet. On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham wrote: > We're running HBase over HDFS 1.0

Re: Poor HBase map-reduce scan performance

2013-07-01 Thread Bryan Keller
I attached my patch to the JIRA issue, in case anyone is interested. It can pretty easily be used on its own without patching HBase. I am currently doing this. On Jul 1, 2013, at 2:23 PM, Enis Söztutar wrote: > Bryan, > > 3.6x improvement seems exciting. The ballpark difference between HBase

Re: Poor HBase map-reduce scan performance

2013-07-01 Thread Enis Söztutar
Bryan, 3.6x improvement seems exciting. The ballpark difference between HBase scan and hdfs scan is in that order, so it is expected I guess. I plan to get back to the trunk patch, add more tests etc next week. In the mean time, if you have any changes to the patch, pls attach the patch. Enis

Re: stop_replication dangerous?

2013-07-01 Thread Jean-Daniel Cryans
Yeah that package documentation ought to be changed. Mind opening a jira? Thx, J-D On Mon, Jul 1, 2013 at 1:51 PM, Patrick Schless wrote: > The first two tutorials for enabling replication that google gives me [1], > [2] take very different tones with regard to stop_replication. The HBase > doc

stop_replication dangerous?

2013-07-01 Thread Patrick Schless
The first two tutorials for enabling replication that google gives me [1], [2] take very different tones with regard to stop_replication. The HBase docs [1] make it sound fine to start and stop replication as desired. The Cloudera docs [2] say it may cause data loss. Which is true? If data loss is

simple export --> bulk import

2013-07-01 Thread Michael Ellery
I'm currently struggling with export/import between two hbase clusters. I have managed to create incremental exports from the source cluster (using hbase Export). Now I would like to bulk load the export into the destination (presumably using HFiles). The reason for the bulk load requirement is t

Re: How many column families in one table ?

2013-07-01 Thread Viral Bajaria
On Mon, Jul 1, 2013 at 10:06 AM, Vimal Jain wrote: > Sorry for the typo .. please ignore previous mail.. Here is the corrected > one.. > 1)I have around 140 columns for each row , out of 140 , around 100 columns > hold java primitive data type , remaining 40 columns contain serialized > java obj

Re: Behavior of Filter.transform() in FilterList?

2013-07-01 Thread lars hofhansl
It would make sense, but it is not immediately clear how to do so cleanly. We would no longer be able to call transform at the StoreScanner level (or evaluate the filter multiple times, or require the filters to maintain their - last - state and only apply transform selectively). I added transf

BigSecret: A secure data management framework for Key-Value Stores

2013-07-01 Thread erman pattuk
My name is Erman Pattuk. Together with my advisor Prof. Murat Kantarcioglu, I have developed an open source tool that enables secure and encrypted outsourcing of Key-Value stores to public cloud infrastructures. I would like to get feedback from interested users, so that I can improve and stren

Re: Behavior of Filter.transform() in FilterList?

2013-07-01 Thread Christophe Taton
On Mon, Jul 1, 2013 at 4:14 AM, lars hofhansl wrote: > You want transform to only be called on filters that are "reached"? > I.e. FilterA and FilterB, FilterB.transform should not be called if a KV > is already filtered by FilterA? > Yes, that's what I naively expected, at first. That's not how

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
Sorry for the typo .. please ignore previous mail.. Here is the corrected one.. 1)I have around 140 columns for each row , out of 140 , around 100 columns hold java primitive data type , remaining 40 columns contain serialized java object as byte array(Inside each object is an ArrayList). Yes , I

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
Hi Lars, 1)I have around 140 columns for each row , out of 140 , around 100 rows are holds java primitive data type , remaining 40 rows contains serialized java object as byte array. Yes , I do delete data but the frequency is very less ( 1 out of 5K operations ). I dont run any compaction. 2) I ha

Re: How many column families in one table ?

2013-07-01 Thread lars hofhansl
The performance you're seeing is definitely not typical. 'couple of further questions:
- How large are your KVs (columns)?
- Do you delete data? Do you run major compactions?
- Can you measure: CPU, IO, context switches, etc, during the scanning?
- Do you have many versions of the columns?
Note

Re: Issues with delete markers

2013-07-01 Thread lars hofhansl
That is the easy part :) The hard part is to add this to filters in a backwards compatible way. -- Lars - Original Message - From: Varun Sharma To: user@hbase.apache.org; lars hofhansl Cc: "d...@hbase.apache.org" Sent: Monday, July 1, 2013 8:18 AM Subject: Re: Issues with delete marke

Re: Issues with delete markers

2013-07-01 Thread Varun Sharma
I mean version tracking with delete markers... On Mon, Jul 1, 2013 at 8:17 AM, Varun Sharma wrote: > So, yesterday, I implemented this change via a coprocessor which basically > initiates a scan which is raw, keeps tracking of # of delete markers > encountered and stops when a configured thresh

Re: Issues with delete markers

2013-07-01 Thread Varun Sharma
So, yesterday, I implemented this change via a coprocessor which basically initiates a scan which is raw, keeps track of the number of delete markers encountered, and stops when a configured threshold is met. It instantiates its own ScanDeleteTracker to do the masking through delete markers. So raw scan,

Re: question about hbase envionmnet variable

2013-07-01 Thread Ted Yu
Looking at bin/hbase:

    # check envvars which might override default args
    if [ "$HBASE_HEAPSIZE" != "" ]; then
      #echo "run with heapsize $HBASE_HEAPSIZE"
      JAVA_HEAP_MAX="-Xmx""$HBASE_HEAPSIZE""m"
      #echo $JAVA_HEAP_MAX
    fi

Meaning, if you set HBASE_HEAPSIZE environment variable, bin/hbase would ta
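[Editor's note: a simplified, standalone sketch of the logic Ted quotes — not the actual bin/hbase script; the heap value is a hypothetical example, and the variable is interpreted as megabytes.]

```shell
#!/bin/sh
# Sketch: if HBASE_HEAPSIZE is set (in MB), it becomes the JVM -Xmx value.
HBASE_HEAPSIZE=8000   # example: 8000 MB

if [ "$HBASE_HEAPSIZE" != "" ]; then
  JAVA_HEAP_MAX="-Xmx""$HBASE_HEAPSIZE""m"
fi

echo "$JAVA_HEAP_MAX"   # -Xmx8000m
```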

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
Hi, We had some hardware constraints along with the fact that our total data size was in GBs. That's why, to start with Hbase, we first began with pseudo distributed mode and thought if required we would upgrade to fully distributed mode. On Mon, Jul 1, 2013 at 5:09 PM, Ted Yu wrote: > bq. I

Re: How many column families in one table ?

2013-07-01 Thread Ted Yu
bq. I have configured Hbase in pseudo distributed mode on top of HDFS. What was the reason for using pseudo distributed mode in production setup ? Cheers On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain wrote: > Thanks Dhaval/Michael/Ted/Otis for your replies. > Actually , i asked this question beca

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
Hi Lars, I am using Hadoop version - 1.1.2 and Hbase version - 0.94.7. Yes , I have enabled scanner caching with value 10K but performance is not too good. :( On Mon, Jul 1, 2013 at 4:48 PM, lars hofhansl wrote: > Which version of HBase? > Did you enable scanner caching? Otherwise each call to

Re: How many column families in one table ?

2013-07-01 Thread lars hofhansl
Which version of HBase? Did you enable scanner caching? Otherwise each call to next() is an RPC roundtrip and you are basically measuring your network's RTT. -- Lars From: Vimal Jain To: user@hbase.apache.org Sent: Monday, July 1, 2013 4:11 AM Subject: Re: How
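[Editor's note: back-of-envelope arithmetic for why scanner caching matters. The row count and RTT below are hypothetical; only the shape of the cost is the point — one RPC per next() versus one RPC per batch.]

```shell
#!/bin/sh
# With caching=1, every next() is one RPC; with caching=N, one RPC fetches N rows.
ROWS=100000      # assumed number of rows to scan
RTT_MS=1         # assumed network round-trip time, in ms
CACHING=10000    # the caching value mentioned in this thread

NO_CACHE_MS=$(( ROWS * RTT_MS ))
CACHED_MS=$(( (ROWS / CACHING) * RTT_MS ))

echo "without caching: ${NO_CACHE_MS} ms spent on RTT alone"
echo "with caching=${CACHING}: ${CACHED_MS} ms"
```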

Re: Behavior of Filter.transform() in FilterList?

2013-07-01 Thread lars hofhansl
You want transform to only be called on filters that are "reached"? I.e. FilterA and FilterB, FilterB.transform should not be called if a KV is already filtered by FilterA? That's not how it works right now, transform is called in a completely different code path from the actual filtering logic.

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
Can someone please reply ? Also what is the typical read/write speed of hbase and how much deviation would be there in my scenario mentioned above (14 cf , total 140 columns ) ? I am asking this because i am not simply printing out the scanned values , instead i am applying some logic on the data

Re: Poor HBase map-reduce scan performance

2013-07-01 Thread lars hofhansl
Absolutely. - Original Message - From: Ted Yu To: user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 9:32 PM Subject: Re: Poor HBase map-reduce scan performance Looking at the tail of HBASE-8369, there were some comments which are yet to be addressed. I think trunk patch should be

Re: Issues with delete markers

2013-07-01 Thread lars hofhansl
That would be quite a dramatic change; we cannot pass delete markers to the existing filters without confusing them. We could invent a new method (filterDeleteKV or filterDeleteMarker or something) on filters along with a new "filter type" that implements that method. -- Lars - Original Me

HBASE-7846 : is it safe to use on 0.94.4 ?

2013-07-01 Thread Viral Bajaria
Hi, Just wanted to check if it's safe to use the JIRA mentioned in the subject i.e. https://issues.apache.org/jira/browse/HBASE-7846 Thanks, Viral

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
I scanned it during normal traffic hours. There was no I/O load on the server. I don't see any GC locks either. Also i have given 1.5G to RS , 512M to each Master and Zookeeper. One correction in the post above : Actual time to scan whole table is even more , it takes 10 mins to scan 0.1 million rows (

Re: lzo lib missing ,region server can not start

2013-07-01 Thread Ted Yu
Please take a look at http://hbase.apache.org/book.html#lzo.compression and the links in that section. Cheers On Mon, Jul 1, 2013 at 3:57 PM, ch huang wrote: > i add lzo compression in config file ,but region server can not start,it > seems lzo lib is miss,how can i install lzo lib for hbase,an
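[Editor's note: the book section Ted links covers installing the native libraries; for quick reference, compression in HBase is set per column family, and the bundled CompressionTest tool can verify a codec before it is enabled. Table/family names and the file path below are hypothetical.]

```shell
# Verify the codec is usable on this node (path is just an example):
hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/test.txt lzo

# Then, from `hbase shell`, enable LZO per column family:
#   create 'mytable', {NAME => 'cf', COMPRESSION => 'LZO'}
```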

Re: How many column families in one table ?

2013-07-01 Thread Viral Bajaria
When you did the scan, did you check what the bottleneck was ? Was it I/O ? Did you see any GC locks ? How much RAM are you giving to your RS ? -Viral On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain wrote: > To completely scan the table for all 140 columns , it takes around 30-40 > minutes. >

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
Thanks Dhaval/Michael/Ted/Otis for your replies. Actually , i asked this question because i am seeing some performance degradation in my production Hbase setup. I have configured Hbase in pseudo distributed mode on top of HDFS. I have created 17 Column families :( . I am actually using 14 out of th

lzo lib missing ,region server can not start

2013-07-01 Thread ch huang
i add lzo compression in config file, but region server can not start. it seems the lzo lib is missing. how can i install the lzo lib for hbase, and in production which compression is used? snappy or lzo? thanks all

    # /etc/init.d/hadoop-hbase-regionserver start
    starting regionserver, logging to /var/log/hbase/hb

question about hbase envionmnet variable

2013-07-01 Thread ch huang
if i set HBASE_HEAPSIZE=2 (heap is 20G), can i set the jvm options -Xmx20g -Xms20g? if not, how much can i set?