These are two questions with one solution. I want to put a blog post
together on this if it's right.
I am playing with HTable.batch for multi-gets to see if I can remove my
external HBase indexes. This is what I am trying to do.
#1 What is the best model for a column family that is just used as a
Timestamp is in every key value pair.
Take a look at this method in Scan:
public Scan setTimeRange(long minStamp, long maxStamp)
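For reference, setTimeRange keeps cells whose timestamp falls in [minStamp, maxStamp): the lower bound is inclusive, the upper bound exclusive. A pure-Java sketch of that predicate (the helper name is mine, not part of the HBase API):

```java
public class TimeRangeSketch {
    // Mirrors HBase's time-range check: minStamp inclusive, maxStamp exclusive.
    static boolean withinTimeRange(long ts, long minStamp, long maxStamp) {
        return ts >= minStamp && ts < maxStamp;
    }

    public static void main(String[] args) {
        // A scan configured with setTimeRange(100, 200) would keep a cell
        // stamped 100 but drop one stamped exactly 200.
        System.out.println(withinTimeRange(100L, 100L, 200L)); // prints true
        System.out.println(withinTimeRange(200L, 100L, 200L)); // prints false
    }
}
```

So to pull exactly one day of data, pass that day's midnight as minStamp and the next day's midnight as maxStamp.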
Cheers
On Sat, Mar 19, 2011 at 3:43 PM, Oleg Ruchovets wrote:
Good point,
let me explain the process. We choose the keys _
because after insertion we run scans and want to analyse data which is
related to a specific date.
Can you provide more details on using hashing, and how can I scan HBase data
for a specific date with it?
Oleg.
On Sun, Mar 20,
Thank you both for your replies. I took a look at the information you
pointed me to, and it has already helped me quite a lot.
For now, I still have these questions:
How do I deal with 'nested' one-to-one relationships? I'm
talking about the following case: a patient has many episodes
I guess you chose the date prefix for query considerations.
You should introduce hashing so that the row keys are not clustered
together.
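One common shape for this (a sketch only, not from the thread: the key layout, separator, and bucket count below are my own choices) is to prefix the date-based key with a small hash bucket, then at read time issue one prefix scan per bucket for the target date:

```java
import java.util.ArrayList;
import java.util.List;

public class SaltedKeys {
    static final int BUCKETS = 16; // small, fixed number of salt buckets

    // Row key: "<salt>|<yyyyMMdd>|<recordId>". The salt spreads writes for
    // the same day across the key space instead of clustering them.
    static String rowKey(String yyyyMMdd, String recordId) {
        int salt = Math.floorMod(recordId.hashCode(), BUCKETS);
        return String.format("%02d|%s|%s", salt, yyyyMMdd, recordId);
    }

    // To read one date back, scan every bucket's prefix "<salt>|<date>|".
    static List<String> scanPrefixesFor(String yyyyMMdd) {
        List<String> prefixes = new ArrayList<>();
        for (int salt = 0; salt < BUCKETS; salt++) {
            prefixes.add(String.format("%02d|%s|", salt, yyyyMMdd));
        }
        return prefixes;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("20110319", "rec-42"));
        System.out.println(scanPrefixesFor("20110319").size()); // prints 16
    }
}
```

The trade-off: writes for one day no longer hit a single region, but a per-date read becomes BUCKETS scans instead of one.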
On Sat, Mar 19, 2011 at 3:00 PM, Oleg Ruchovets wrote:
We want to insert into HBase on a daily basis (hbase 0.90.1, hadoop append).
Currently we have ~10 million records per day. We use map/reduce to prepare the
data and write it to HBase in chunks (5000 puts per chunk).
The whole process takes 1h 20 minutes. Making some tests verified that wri
Thank you St.Ack.
The question is regarding setting the heap size for HBase:
As I understand it, there are 3 processes: HBase Master, HBase RegionServer, and
ZooKeeper.
What heap size should I set for these processes? I don't remember
where I saw 4000m recommended, but does it mean that all
See this section in your hbase-env.sh:
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101 -javaagent:lib/HelloWorldAgent.jar"
# export HBASE_REGIO
Hi, we started our tests on the cluster (hbase 0.90.1, hadoop append).
I set HBASE_HEAPSIZE to 4000m in hbase-env.sh and got 3 processes, each
with a heap size of 4000m.
My questions are:
1) What is the way to set the heap size separately for these processes? In case
I want to give ZooKeeper less h
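On question 1, one approach (a sketch against the stock hbase-env.sh, where these *_OPTS variables already exist; the sizes below are illustrative examples, not recommendations) is to leave HBASE_HEAPSIZE alone and pass a per-process -Xmx instead:

```sh
# hbase-env.sh: give each daemon its own max heap instead of one
# global HBASE_HEAPSIZE. Sizes here are examples only.
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xmx1000m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xmx4000m"
export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xmx1000m"
```

A later -Xmx on the command line overrides an earlier one, so the per-process setting wins over the global default.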
There is also this small section in our book:
http://hbase.apache.org/book/schema.html It refers to a useful paper
by Ian Varley on modelling in non-rdbms dbs. Sounds like it would be
good preparatory reading for the project you've just started.
St.Ack
On Sat, Mar 19, 2011 at 9:30 AM, Ted Yu w
Thank you for the info. HFile looks interesting; I can't wait to dig into the
code and get a better understanding of HFile!
On Sat, Mar 19, 2011 at 11:28 AM, Harsh J wrote:
> Hello,
See:
http://search-hadoop.com/m/zbKmE14o0Js/wide+tall+hbase+table&subj=Re+Parent+child+relation+go+vertical+horizontal+or+many+tables+
You can also search for related discussion on tall vs. wide tables.
On Sat, Mar 19, 2011 at 8:53 AM, Niels Nuyttens wrote:
Hello,
On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung wrote:
> Is all data written through Hadoop, including data from HBase, saved in the
> above formats? It seems like SequenceFile is in key/value pair format.
HBase provides its own format called HFile. See
http://hbase.apache.org/apidocs/org/
sreejith:
I leave your second question to other experts.
Let me try to answer the schema question.
You didn't mention how URLs and keywords scale (there are ~1 trillion URLs in
the world), so I base my suggestion on what you outlined.
First, you need to use a hash/index to represent each URL.
You can then
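A sketch of that "hash each URL" step (MD5 is my choice here; any stable digest works — the point is a fixed-width row key regardless of URL length):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class UrlKey {
    // Fixed-width row key for a URL: the 32 hex chars of its MD5 digest.
    static String urlRowKey(String url) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(url.getBytes(StandardCharsets.UTF_8));
            // Left-pad to 32 hex characters so every key has the same width.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is always present in the JDK
        }
    }

    public static void main(String[] args) {
        System.out.println(urlRowKey("http://hbase.apache.org/"));
    }
}
```

The original URL then lives in a column, with the digest serving only as the row key.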
I am browsing through the hadoop.io package and was wondering what other
file formats are available in Hadoop besides SequenceFile and TFile.
Is all data written through Hadoop, including data from HBase, saved in the
above formats? It seems like SequenceFile is in key/value pair format.
Thank y
Hi all,
I need a database that scales to large datasets and high throughput. HBase
seemed like the way to go. However, while designing my database schema I
started to doubt my choice, due to the conversion of the current
relational schema to a NoSQL variant. I can't get my head around the
efficient
Have you tried out the mix of importtsv + completebulkload? Would that
work for you?
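For anyone following along, the flow is roughly this (a sketch: the jar name, column mapping, table name, and paths are placeholders for your own setup):

```sh
# 1) Parse the TSV input into HFiles (no puts against the live cluster).
hadoop jar hbase-0.90.1.jar importtsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1 \
  -Dimporttsv.bulk.output=/tmp/hfiles \
  mytable /user/me/input.tsv

# 2) Move the generated HFiles into the running table.
hadoop jar hbase-0.90.1.jar completebulkload /tmp/hfiles mytable
```

This skips the RPC/put path entirely, which is why it tends to be much faster than batched puts for a one-shot 20 GB load.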
On Sat, Mar 19, 2011 at 9:18 PM, Vivek Krishna wrote:
I have around 20 GB of data to be dumped into an HBase table.
Initially, I had a simple Java program to put the values in batches of
(5000-1) records. I tried concurrent inserts, and each insert took about
15 seconds to write, which is very slow and was taking ages.
The next approach was to use i