On 10/25/2012 07:44 AM, Anoop Sam John wrote:
Hi
Can you give more details? How much data is your scan going to retrieve?
it's a full scan of 1.7TB of data on 62 regionservers plus the master and ZK
quorum machines. I hoped that in some way block caching might slightly
improve the read performance.
Hi all,
I'm on cdh3u3 (hbase 0.90.4) and I need to provide a bunch of row keys based on
a column value (e.g. give me all keys where column dataset = 1234). That's
straightforward using a scan and filter. The trick is that I want to return an
Iterator over my key type (Integer) rather than
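For what it's worth, here is a minimal sketch of that scan-and-filter approach
against the 0.90 client API, wrapping the ResultScanner in an Iterator over
Integer. The table, column family, qualifier, and value encoding are all
hypothetical, it assumes the row keys were written as 4-byte ints, and the
caller is still responsible for closing the scanner:

  import java.io.IOException;
  import java.util.Iterator;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
  import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  public class KeysByDataset {
    public static Iterator<Integer> keysForDataset(Configuration conf,
        String dataset) throws IOException {
      HTable table = new HTable(conf, "mytable");        // hypothetical table
      SingleColumnValueFilter f = new SingleColumnValueFilter(
          Bytes.toBytes("cf"), Bytes.toBytes("dataset"), // hypothetical cf/qual
          CompareOp.EQUAL, Bytes.toBytes(dataset));      // encoding is a guess
      f.setFilterIfMissing(true);    // skip rows without the column at all
      Scan scan = new Scan();
      scan.setFilter(f);
      final Iterator<Result> results = table.getScanner(scan).iterator();
      return new Iterator<Integer>() {                   // adapt Result -> key
        public boolean hasNext() { return results.hasNext(); }
        public Integer next() { return Bytes.toInt(results.next().getRow()); }
        public void remove() { throw new UnsupportedOperationException(); }
      };
    }
  }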
I'm using CDH4, but I'd imagine that the steps would be similar for upstream
HBase. I've already got the tests JAR file loaded as part of my classpath. I
also have the following in my hbase-site.xml:
<property>
  <name>hbase.coprocessor.region.classes</name>
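For reference, a complete entry would look something like the following; the
observer class name here is only a hypothetical placeholder:

  <property>
    <name>hbase.coprocessor.region.classes</name>
    <value>com.example.MyRegionObserver</value> <!-- hypothetical class -->
  </property>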
Hi all,
First, sorry about my slowness to reply to this thread, but it went to
my spam folder and I lost sight of it.
I don't have good knowledge of RDBMS, so I don't have good
knowledge of triggers either. That's why I looked at the endpoints too,
because they are pretty new to me.
First, I
Hi Ted,
Sorry, I totally missed this email too :( Many of my HBase list emails
went into the junk folder these last few weeks, and I only figured that out
yesterday. So I have a LOT of reading to do.
I looked at the whole HBASE-6942 thread, and it seems it's now committed in
0.94.3. I will wait for this
Glad that you like the approach, Jean-Marc.
Tell us about your experience when 0.94.3 comes out next month.
On Thu, Oct 25, 2012 at 6:51 AM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:
Hi Ted,
Sorry, I totally missed this email too :( Many of my HBase list emails
went into the junk folder
Hi everyone
Continuing on my journey in the Hadoop world.
I have installed:
Hive 0.9 on
an HBase 0.94.2 cluster atop
a Hadoop 1.0.3 cluster.
When I do a simple query like select * everything is fine.
When I try select foo from myTestTable I see the MR
start on my Hadoop monitor UI and it fails with:
Hi Nick,
Have you tried what's in this documentation?
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-Usage
It seems to me that you should add guava to the auxpath.
J-D
On Thu, Oct 25, 2012 at 7:52 AM, Nick maillard
nicolas.maill...@fifty-five.com wrote:
Hi
Hi Jean-Daniel
Thanks for the quick reply.
Guava is already in my auxpath.
I guess I have found a solution by adding protobuf to the auxpath.
Basically instead of:
./bin/hive --auxpath
$HIVE_HOME/lib/hive-hbase-handler-0.9.0.jar,
$HBASE_HOME/hbase-0.94.2.jar,
Oh I meant to say protobuf, not guava. And yes that's how it should be
done. You can also set your auxpath in your hive-env so that you don't
need to keep a big command line around.
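For example, something along these lines in conf/hive-env.sh should do it,
using the standard HIVE_AUX_JARS_PATH variable that the hive launcher script
reads; the protobuf jar name/version below is a guess and may differ in your
install:

  # conf/hive-env.sh -- same jars as on the --auxpath command line above
  export HIVE_AUX_JARS_PATH=$HIVE_HOME/lib/hive-hbase-handler-0.9.0.jar,$HBASE_HOME/hbase-0.94.2.jar,$HBASE_HOME/lib/protobuf-java-2.4.0a.jar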
J-D
On Thu, Oct 25, 2012 at 8:06 AM, Nick maillard
nicolas.maill...@fifty-five.com wrote:
Hi Jean-Daniel
Thanks
Hi Jean-Daniel
OK, I'll set it in the env, thanks for the advice.
Are there other libs I might need to add?
Could I just tell Hive to use its lib directory or HBase's lib directory in its
classpath in some way?
I could just set it in the bashrc, but that's not very elegant.
Another thing I am
Sorry, typo: my query was 'select count(*) from myTestTable'
On Thu, Oct 25, 2012 at 8:31 AM, Nick maillard
nicolas.maill...@fifty-five.com wrote:
Hi Jean-Daniel
OK, I'll set it in the env, thanks for the advice.
Are there other libs I might need to add?
The usual client libs... doesn't seem like we documented them
anywhere... it's pretty much what you
Nicolas,
I just went through the same exercise. There are many ways to get this to
go faster, but eventually I decided that bulk loading is the best solution,
as run times scaled with the number of machines in my cluster when I used that
approach.
One thing you can try is to turn off hbase's write
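The truncated suggestion above is presumably about the write-ahead log; on the
normal client write path this is a per-Put switch in the 0.90/0.94 API. A
minimal sketch, with the table, family, and qualifier hypothetical:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class NoWalPut {
    // Write one cell without the WAL; unflushed data is lost if a
    // region server crashes, so this trades durability for speed.
    public static void putNoWal(Configuration conf, byte[] row, byte[] val)
        throws IOException {
      HTable table = new HTable(conf, "mytable");              // hypothetical
      try {
        Put put = new Put(row);
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), val); // hypothetical
        put.setWriteToWAL(false);   // skip the write-ahead log for this put
        table.put(put);
      } finally {
        table.close();
      }
    }
  }

Note the caveat later in this thread: with HFileOutputFormat bulk loading the
WAL is never written anyway, so this switch only matters when going through
TableOutputFormat or direct puts.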
Hi Jean-Daniel
Again, thanks for the quick reply and for the env detail; I'll get to it.
Of course select count(*) is not what I want to optimize.
My more regular queries will have an HBase schema designed for them using the
row keys and potentially column families, etc...
I'm guessing Hive uses
Hi Amit,
You might want to add details to your question.
1) Lots of small files are a known 'problem' for Hadoop MapReduce, and you
will find information on it by searching.
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
I assume you have a more specific issue; what is it?
2) I am
Hi Amit
I am starting with HBase and MR, so my opinion is more about what I read than
real-world experience.
However, the documentation says Hadoop will deal better with a set of large
files than with a lot of small ones.
regards
amit bohra bohra.a@... writes:
On Thu, Oct 25, 2012 at 9:00 AM, Nick maillard
nicolas.maill...@fifty-five.com wrote:
Hi Jean-Daniel
Again, thanks for the quick reply and for the env detail; I'll get to it.
Of course select count(*) is not what I want to optimize.
My more regular queries will have an HBase schema designed
Hi Jean-Daniel
We are trying different types of software to solve our size issue.
We aggregate data on website interactions, and this can easily go up to a
couple of million lines with a couple of tens of interaction types per day. We
would like to keep a rolling 2-year history. We expect this
What I still don't understand is: since CP and MR both run on the region
side, why is MR better than CP?
For the bulk delete case alone, CP (Endpoint) will be better than MR for
sure.. Considering your overall need, people were suggesting MR as the
better fit.. You need a scan and move
When you say small files, do you mean to say those are stored within HBase
columns?
If so, you need not worry, as HBase would eventually write bigger HFiles on
disk (or HDFS).
If you are storing a lot of small files on HDFS itself, then you will have
scalability problems, as a single NameNode cannot
Hi Nicolas,
In my experience you won't get good performance if you run 3 map tasks
simultaneously on one hard drive. That seems like a lot of I/O on one disk.
HBase performs well when you have at least 5 nodes in the cluster. So, running
HBase on 3 nodes is not something you would do in prod.
@Jonathan,
As per Anoop and Ram, the WAL is not used with bulk loading, so turning off
the WAL won't have any impact on performance.
On Thu, Oct 25, 2012 at 1:33 PM, anil gupta anilgupt...@gmail.com wrote:
Hi Nicolas,
In my experience you won't get good performance if you run 3 map tasks
Hi JM:
There was a thread discussing M/R bulk delete vs. Coprocessor bulk delete.
The thread subject is "Bulk Delete".
The guy in that post suggested writing an HFile which contains all the
delete markers and then using the bulk incremental load facility to actually
move all the delete markers to the
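For context, the bulk incremental load step mentioned here is normally driven
by LoadIncrementalHFiles (the completebulkload tool). A minimal sketch of the
load call, assuming a directory of HFiles already produced by a MapReduce job;
the path and table name are hypothetical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

  public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "mytable");   // hypothetical table
      // Move HFiles written by an HFileOutputFormat job into the regions
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/tmp/hfiles"), table);
    }
  }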
Anoop: In the prePut hook you call HTable#put()?
Anil: Yes, I call HTable#put() in prePut. Is there a better way of doing it?
Anoop: Why use network calls from the server side here, then?
Anil: I thought this was a cleaner approach since I am using the BulkLoader. I
decided not to run two jobs since I am
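For context, the pattern being questioned looks roughly like this
RegionObserver sketch against the 0.94 coprocessor API; the second table and
what gets written to it are hypothetical, and the put inside the hook is
exactly the server-side network call Anoop is asking about:

  import java.io.IOException;
  import org.apache.hadoop.hbase.client.HTableInterface;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
  import org.apache.hadoop.hbase.coprocessor.ObserverContext;
  import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
  import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SecondaryWriteObserver extends BaseRegionObserver {
    @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> e,
        Put put, WALEdit edit, boolean writeToWAL) throws IOException {
      // A client-style put to another table, issued from inside the
      // region server while the original put is being processed.
      HTableInterface other =
          e.getEnvironment().getTable(Bytes.toBytes("tableB")); // hypothetical
      try {
        Put copy = new Put(put.getRow());
        copy.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),      // hypothetical
            put.getRow());   // toy payload: just echo the row key
        other.put(copy);
      } finally {
        other.close();
      }
    }
  }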
Hi Anil,
I have some confusion after seeing your reply.
You use bulk loading? You created your own mapper? You call HTable#put() from
your mappers?
I think there is confusion in another thread also.. I was referring to the
HFileOutputReducer.. There is a TableOutputFormat also... In
As per Anoop and Ram, the WAL is not used with bulk loading, so turning off
the WAL won't have any impact on performance.
This is if HFileOutputFormat is being used.. There is a TableOutputFormat
which can also be used as the OutputFormat for MR.. Here the write to WAL is
applicable
This one, instead of
Use SingleColumnValueFilter#setFilterIfMissing(true)
s.setBatch(10);
How many total columns in the schema? When using the SingleColumnValueFilter,
setBatch() might not always work out.. FYI
-Anoop-
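Put together, the suggested scan setup would look roughly like this; the
family and qualifier are hypothetical, and Anoop's caveat is that with
batching the filter may only see a slice of the row that does not contain the
tested column:

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
  import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  public class FilterSetup {
    public static Scan build(byte[] value) {
      SingleColumnValueFilter f = new SingleColumnValueFilter(
          Bytes.toBytes("cf"), Bytes.toBytes("col"),  // hypothetical column
          CompareOp.EQUAL, value);
      f.setFilterIfMissing(true);  // drop rows that lack the column entirely
      Scan s = new Scan();
      s.setFilter(f);
      // s.setBatch(10);  // see the caveat above before enabling batching
      return s;
    }
  }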
From: jian fan [xiaofanhb...@gmail.com]
Sent: Friday,
Is it a good idea to create an HTable instance on B and do puts in my
mapper? I might try this idea.
Yes, you can do this.. Maybe in the same mapper you can do a put for table
B. This is how we have tried loading data to another table by using the
main table A puts.
Now your main question is
Anil
Have a look at MultiTableOutputFormat (I am referring to the 0.94 code base;
not sure whether it's available in older versions).
-Anoop-
From: Ramkrishna.S.Vasudevan [ramkrishna.vasude...@huawei.com]
Sent: Friday, October 26, 2012 9:50 AM
To:
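A rough sketch of the MultiTableOutputFormat pattern from the 0.94 code base,
where the map output key names the destination table; the table, family, and
qualifier names below are hypothetical:

  import java.io.IOException;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Each input line is written to two tables; the output key
  // (ImmutableBytesWritable) carries the destination table name.
  public class TwoTableMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    private static final ImmutableBytesWritable TABLE_A =
        new ImmutableBytesWritable(Bytes.toBytes("tableA"));  // hypothetical
    private static final ImmutableBytesWritable TABLE_B =
        new ImmutableBytesWritable(Bytes.toBytes("tableB"));  // hypothetical

    @Override
    protected void map(LongWritable key, Text line, Context ctx)
        throws IOException, InterruptedException {
      byte[] row = Bytes.toBytes(line.toString());
      Put p = new Put(row);
      p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), row);    // hypothetical
      ctx.write(TABLE_A, p);
      ctx.write(TABLE_B, p);  // same Put routed to the second table
    }
  }
  // Job setup: job.setOutputFormatClass(MultiTableOutputFormat.class);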
Hi Anoop,
Yes, I use bulk loading for loading table A. I wrote my own mapper as
ImportTsv won't suffice for my requirements. :) No, I don't call HTable#put()
from my mapper. I was thinking about trying out calling HTable#put() from
my mapper and seeing the outcome.
I meant to say that when we use an MR job