Hi,
I created https://issues.apache.org/jira/browse/HBASE-8847, with a small
patch (
https://github.com/kryzthov/hbase/commit/bd9a3b325d5d335fba04b5f7ce5f588e673cac91)
based on 0.94.8.
That seems to fix the problem on my side, but I would need to do some more
testing to ensure it doesn't introduce
Christophe:
Looks like you have a clear idea of what to do.
If you can show it to us in the form of a patch, that would be nice.
Cheers
On Mon, Jul 1, 2013 at 7:17 PM, Christophe Taton wrote:
> On Mon, Jul 1, 2013 at 12:01 PM, lars hofhansl wrote:
>
> > It would make sense, but it is not immediately cl
On Mon, Jul 1, 2013 at 12:01 PM, lars hofhansl wrote:
> It would make sense, but it is not immediately clear how to do so cleanly.
> We would no longer be able to call transform at the StoreScanner level (or
> evaluate the filter multiple times, or require the filters to maintain
> their - last -
Thanks Dave.
On Tue, Jul 2, 2013 at 8:34 AM, Dave Latham wrote:
> On Mon, Jul 1, 2013 at 4:52 PM, Azuryy Yu wrote:
>
> > how to enable "sync on block close" in HDFS?
> >
> Set dfs.datanode.synconclose to true
>
> See https://issues.apache.org/jira/browse/HDFS-1539
>
On Mon, Jul 1, 2013 at 4:52 PM, Azuryy Yu wrote:
> how to enable "sync on block close" in HDFS?
>
Set dfs.datanode.synconclose to true
See https://issues.apache.org/jira/browse/HDFS-1539
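A minimal illustration of the property (the key belongs in hdfs-site.xml on the
datanodes and typically needs a datanode restart; the programmatic form below is
only to make the key and value explicit):

  import org.apache.hadoop.conf.Configuration;

  public class SyncOnBlockCloseSketch {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Ask datanodes to fsync block files to disk when a block is closed (HDFS-1539).
      // Normally set in hdfs-site.xml rather than in code; shown here for illustration only.
      conf.setBoolean("dfs.datanode.synconclose", true);
      System.out.println("dfs.datanode.synconclose = " + conf.get("dfs.datanode.synconclose"));
    }
  }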
How to enable "sync on block close" in HDFS?
--Sent from my Sony mobile.
On Jul 2, 2013 6:47 AM, "Lars Hofhansl" wrote:
> HBase is interesting here, because it rewrites old data into new files. So
> a power outage by default would not just lose new data but potentially old
> data as well.
> You
HBase is interesting here, because it rewrites old data into new files. So a
power outage by default would not just lose new data but potentially old data
as well.
You can enable "sync on block close" in HDFS, and then at least be sure that
closed blocks (and thus files) are synced to disk physically.
Hello there,
I'm sorry I didn't quite get it. What do you mean by "sequence write
speed"? If you are looking for ways to improve HBase writes, you might find
this useful:
http://hbase.apache.org/book/perf.writing.html
Warm Regards,
Tariq
cloudfront.blogspot.com
On Mon, Jul 1, 2013 at 9:
Thanks for the response, Suresh.
I'm not sure that I understand the details properly. From my reading of
HDFS-744, the hsync API would allow a client to make sure that at any point
in time its writes so far have hit the disk. For example, for HBase it could
apply an fsync after adding some edits to it
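For context, a minimal sketch of what that API looks like from a client,
assuming a Hadoop 2.x FileSystem (HDFS-744 added hsync to FSDataOutputStream;
the path below is made up for illustration):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HsyncSketch {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      FSDataOutputStream out = fs.create(new Path("/tmp/hsync-demo"));
      out.write("some edits".getBytes("UTF-8"));
      // hflush: pushes the data to the datanodes (visible to readers, not necessarily on disk).
      out.hflush();
      // hsync: additionally asks the datanodes to fsync the data to disk (2.0.2-alpha and later).
      out.hsync();
      out.close();
    }
  }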
sure thing: https://issues.apache.org/jira/browse/HBASE-8844
On Mon, Jul 1, 2013 at 3:59 PM, Jean-Daniel Cryans wrote:
> Yeah that package documentation ought to be changed. Mind opening a jira?
>
> Thx,
>
> J-D
>
> On Mon, Jul 1, 2013 at 1:51 PM, Patrick Schless
> wrote:
> > The first two tuto
Yes, this is a known issue.
The HDFS part of this was addressed in
https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is not
available in the 1.x releases. I think HBase does not use this API yet.
On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham wrote:
> We're running HBase over HDFS 1.0
I attached my patch to the JIRA issue, in case anyone is interested. It can
pretty easily be used on its own without patching HBase. I am currently doing
this.
On Jul 1, 2013, at 2:23 PM, Enis Söztutar wrote:
> Bryan,
>
> 3.6x improvement seems exciting. The ballpark difference between HBase
Bryan,
A 3.6x improvement seems exciting. The ballpark difference between an HBase scan
and an HDFS scan is in that order, so it is expected, I guess.
I plan to get back to the trunk patch, add more tests, etc., next week. In the
meantime, if you have any changes to the patch, please attach them.
Enis
Yeah that package documentation ought to be changed. Mind opening a jira?
Thx,
J-D
On Mon, Jul 1, 2013 at 1:51 PM, Patrick Schless
wrote:
> The first two tutorials for enabling replication that google gives me [1],
> [2] take very different tones with regard to stop_replication. The HBase
> doc
The first two tutorials for enabling replication that Google gives me [1],
[2] take very different tones with regard to stop_replication. The HBase
docs [1] make it sound fine to start and stop replication as desired. The
Cloudera docs [2] say it may cause data loss.
Which is true? If data loss is
I'm currently struggling with export/import between two HBase clusters.
I have managed to create incremental exports from the source cluster
(using the HBase Export tool). Now I would like to bulk load the export into the
destination (presumably using HFiles). The reason for the bulk load
requirement is t
On Mon, Jul 1, 2013 at 10:06 AM, Vimal Jain wrote:
> Sorry for the typo .. please ignore previous mail.. Here is the corrected
> one..
> 1)I have around 140 columns for each row , out of 140 , around 100 columns
> hold java primitive data type , remaining 40 columns contain serialized
> java obj
It would make sense, but it is not immediately clear how to do so cleanly. We
would no longer be able to call transform at the StoreScanner level (or
evaluate the filter multiple times, or require the filters to maintain their -
last - state and only apply transform selectively).
I added transf
My name is Erman Pattuk. Together with my advisor, Prof. Murat Kantarcioglu, I
have developed an open-source tool that enables secure and encrypted
outsourcing of key-value stores to public cloud infrastructures. I would like
to get feedback from interested users, so that I can improve and stren
On Mon, Jul 1, 2013 at 4:14 AM, lars hofhansl wrote:
> You want transform to only be called on filters that are "reached"?
> I.e. FilterA and FilterB, FilterB.transform should not be called if a KV
> is already filtered by FilterA?
>
Yes, that's what I naively expected, at first.
That's not how
Sorry for the typo... please ignore the previous mail. Here is the corrected
one:
1) I have around 140 columns for each row; out of 140, around 100 columns
hold Java primitive data types, and the remaining 40 columns contain serialized
Java objects as byte arrays (inside each object is an ArrayList). Yes, I
Hi Lars,
1) I have around 140 columns for each row; out of 140, around 100 rows
hold Java primitive data types, and the remaining 40 rows contain serialized Java
objects as byte arrays. Yes, I do delete data, but the frequency is very low
(1 out of 5K operations). I don't run any compactions.
2) I ha
The performance you're seeing is definitely not typical. A couple of further
questions:
- How large are your KVs (columns)?
- Do you delete data? Do you run major compactions?
- Can you measure CPU, IO, context switches, etc., during the scanning?
- Do you have many versions of the columns?
Note
That is the easy part :)
The hard part is to add this to filters in a backwards-compatible way.
-- Lars
- Original Message -
From: Varun Sharma
To: user@hbase.apache.org; lars hofhansl
Cc: "d...@hbase.apache.org"
Sent: Monday, July 1, 2013 8:18 AM
Subject: Re: Issues with delete marke
I mean version tracking with delete markers...
On Mon, Jul 1, 2013 at 8:17 AM, Varun Sharma wrote:
> So, yesterday, I implemented this change via a coprocessor which basically
> initiates a scan which is raw, keeps tracking of # of delete markers
> encountered and stops when a configured thresh
So, yesterday, I implemented this change via a coprocessor which basically
initiates a raw scan, keeps track of the number of delete markers
encountered, and stops when a configured threshold is met. It instantiates
its own ScanDeleteTracker to do the masking by delete markers. So raw
scan,
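A heavily simplified, client-side sketch of the same idea (not the coprocessor
itself): a raw scan that counts delete markers up to a threshold, assuming the
0.94 client API; the table name and threshold below are placeholders:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;

  public class DeleteMarkerCountSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "mytable");   // placeholder table name
      Scan scan = new Scan();
      scan.setRaw(true);      // raw scan: delete markers are returned instead of being applied
      scan.setCaching(1000);
      int deleteMarkers = 0;
      final int threshold = 10000;   // placeholder; the coprocessor makes this configurable
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          for (KeyValue kv : r.raw()) {
            if (kv.isDelete()) {
              deleteMarkers++;
              if (deleteMarkers >= threshold) {
                System.out.println("threshold reached: " + deleteMarkers);
                return;
              }
            }
          }
        }
        System.out.println("delete markers seen: " + deleteMarkers);
      } finally {
        scanner.close();
        table.close();
      }
    }
  }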
Looking at bin/hbase:
# check envvars which might override default args
if [ "$HBASE_HEAPSIZE" != "" ]; then
#echo "run with heapsize $HBASE_HEAPSIZE"
JAVA_HEAP_MAX="-Xmx""$HBASE_HEAPSIZE""m"
#echo $JAVA_HEAP_MAX
fi
Meaning, if you set the HBASE_HEAPSIZE environment variable, bin/hbase would
take it and use it as the -Xmx value (in MB).
Hi,
We had some hardware constraints, along with the fact that our total data
size was in GBs.
That's why, to start with HBase, we first began with pseudo-distributed
mode and thought that, if required, we would upgrade to fully distributed mode.
On Mon, Jul 1, 2013 at 5:09 PM, Ted Yu wrote:
> bq. I
bq. I have configured HBase in pseudo-distributed mode on top of HDFS.
What was the reason for using pseudo-distributed mode in a production setup?
Cheers
On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain wrote:
> Thanks Dhaval/Michael/Ted/Otis for your replies.
> Actually , i asked this question beca
Hi Lars,
I am using Hadoop version 1.1.2 and HBase version 0.94.7.
Yes, I have enabled scanner caching with a value of 10K, but the performance is
not too good. :(
On Mon, Jul 1, 2013 at 4:48 PM, lars hofhansl wrote:
> Which version of HBase?
> Did you enable scanner caching? Otherwise each call to
Which version of HBase?
Did you enable scanner caching? Otherwise each call to next() is an RPC
round trip and you are basically measuring your network's RTT.
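For illustration, a minimal sketch of enabling scanner caching with the 0.94
client API (the table name and caching value are placeholders):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;

  public class ScannerCachingSketch {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(HBaseConfiguration.create(), "mytable");  // placeholder table
      Scan scan = new Scan();
      // Fetch 1000 rows per RPC instead of making one round trip per next() call.
      scan.setCaching(1000);
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          // process r
        }
      } finally {
        scanner.close();
        table.close();
      }
    }
  }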
-- Lars
From: Vimal Jain
To: user@hbase.apache.org
Sent: Monday, July 1, 2013 4:11 AM
Subject: Re: How
You want transform to only be called on filters that are "reached"?
I.e., with FilterA and FilterB, FilterB.transform should not be called if a KV is
already filtered by FilterA?
That's not how it works right now; transform is called in a completely
different code path from the actual filtering logic.
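To make the scenario concrete, a sketch of the kind of setup being discussed,
assuming the 0.94 filter API (KeyOnlyFilter just stands in as a filter that
rewrites KVs; whether its transform runs for KVs the first filter already
excluded is exactly the behavior in question):

  import java.util.Arrays;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.filter.BinaryComparator;
  import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
  import org.apache.hadoop.hbase.filter.Filter;
  import org.apache.hadoop.hbase.filter.FilterList;
  import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
  import org.apache.hadoop.hbase.filter.ValueFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  public class FilterTransformSketch {
    public static void main(String[] args) {
      // "FilterA": only passes KVs whose value equals "x".
      Filter filterA = new ValueFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("x")));
      // "FilterB": KeyOnlyFilter rewrites each KV to drop its value.
      Filter filterB = new KeyOnlyFilter();
      FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL,
          Arrays.asList(filterA, filterB));
      Scan scan = new Scan();
      scan.setFilter(list);
      // One might expect filterB's transform to be skipped for KVs that filterA
      // already excluded; as described above, transform currently runs in a
      // separate code path from the include/exclude decision.
    }
  }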
Can someone please reply?
Also, what is the typical read/write speed of HBase, and how much deviation
would there be in my scenario mentioned above (14 CFs, 140 columns total)?
I am asking this because I am not simply printing out the scanned values;
instead I am applying some logic to the data
Absolutely.
- Original Message -
From: Ted Yu
To: user@hbase.apache.org
Cc:
Sent: Sunday, June 30, 2013 9:32 PM
Subject: Re: Poor HBase map-reduce scan performance
Looking at the tail of HBASE-8369, there were some comments which are yet
to be addressed.
I think the trunk patch should be
That would be quite a dramatic change; we cannot pass delete markers to the
existing filters without confusing them.
We could invent a new method (filterDeleteKV or filterDeleteMarker or
something) on filters along with a new "filter type" that implements that
method.
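Purely hypothetical, just to sketch the kind of extension point described above;
none of these names exist in HBase today:

  import org.apache.hadoop.hbase.KeyValue;

  // Hypothetical interface: filters that opt in to seeing delete markers would
  // implement this; existing filters would be left untouched.
  public interface DeleteMarkerAwareFilter {
    // Return true if the delete marker should be filtered out of the raw scan.
    boolean filterDeleteMarker(KeyValue deleteMarker);
  }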
-- Lars
- Original Me
Hi,
Just wanted to check if it's safe to use the JIRA mentioned in the subject,
i.e. https://issues.apache.org/jira/browse/HBASE-7846
Thanks,
Viral
I scanned it during normal traffic hours. There was no I/O load on the
server.
I don't see any GC locks either.
Also, I have given 1.5G to the RS and 512M each to the Master and ZooKeeper.
One correction to the post above:
the actual time to scan the whole table is even more; it takes 10 mins to scan 0.1
million rows (
Please take a look at http://hbase.apache.org/book.html#lzo.compression and
the links in that section.
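A minimal sketch of setting LZO on a column family from the Java client,
assuming the 0.94 API and that the LZO codec (shipped separately, GPL-licensed)
is already installed on every region server, which is what the link above
covers; the table and family names are placeholders:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.io.hfile.Compression;

  public class LzoFamilySketch {
    public static void main(String[] args) throws Exception {
      HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
      HTableDescriptor desc = new HTableDescriptor("mytable");   // placeholder table
      HColumnDescriptor family = new HColumnDescriptor("cf");    // placeholder family
      // Requires the LZO codec on every region server (see the link above).
      family.setCompressionType(Compression.Algorithm.LZO);
      desc.addFamily(family);
      admin.createTable(desc);
      admin.close();
    }
  }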
Cheers
On Mon, Jul 1, 2013 at 3:57 PM, ch huang wrote:
> i add lzo compression in config file ,but region server can not start,it
> seems lzo lib is miss,how can i install lzo lib for hbase,an
When you did the scan, did you check what the bottleneck was? Was it I/O?
Did you see any GC locks? How much RAM are you giving to your RS?
-Viral
On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain wrote:
> To completely scan the table for all 140 columns , it takes around 30-40
> minutes.
>
Thanks Dhaval/Michael/Ted/Otis for your replies.
Actually, I asked this question because I am seeing some performance
degradation in my production HBase setup.
I have configured HBase in pseudo-distributed mode on top of HDFS.
I have created 17 column families :(. I am actually using 14 out of th
I added LZO compression in the config file, but the region server cannot start;
it seems the LZO lib is missing. How can I install the LZO lib for HBase, and in
production which compression is used, Snappy or LZO?
Thanks all.
# /etc/init.d/hadoop-hbase-regionserver start
starting regionserver, logging to
/var/log/hbase/hb
If I set HBASE_HEAPSIZE=2 (the heap is 20G), can I set the JVM options -Xmx20g
-Xms20G? If not, how much can I set?