Have a look at FuzzyRowFilter
-Anoop-
On Sat, Jun 22, 2013 at 9:20 AM, Tony Dean wrote:
> I understand more, but have additional questions about the internals...
>
> So, in this example I have 6000 rows X 40 columns in this table. In this
> test my startRow and stopRow do not narrow the scan c
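For reference, a minimal sketch of what a FuzzyRowFilter scan could look like. The
fixed-width key layout here (4-byte id followed by a 4-byte type) is entirely made up,
since the thread does not show the actual row-key format:

import java.util.Arrays;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

public class FuzzyScanSketch {
  public static Scan buildScan() {
    // Hypothetical key layout: 4 arbitrary bytes followed by the literal "EVNT".
    byte[] keyTemplate = Bytes.toBytes("????EVNT"); // '?' positions are ignored
    byte[] fuzzyMask   = new byte[] {1, 1, 1, 1, 0, 0, 0, 0}; // 1 = any byte, 0 = must match

    FuzzyRowFilter filter = new FuzzyRowFilter(
        Arrays.asList(new Pair<byte[], byte[]>(keyTemplate, fuzzyMask)));

    Scan scan = new Scan();
    scan.setFilter(filter);
    return scan;
  }
}

This lets the scan skip row keys whose fixed positions cannot match, instead of reading
every KV and filtering afterwards.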
I understand more, but have additional questions about the internals...
So, in this example I have 6000 rows X 40 columns in this table. In this test
my startRow and stopRow do not narrow the scan criteria, therefore all 6000x40
KVs must be included in the search and thus read from disk and int
On Fri, Jun 21, 2013 at 4:41 PM, Joel Alexandre wrote:
...
>
> In my jar there is a log4j.properties file, but it is being ignored.
>
Your log4j.properties is in the right location inside the job jar? (
http://stackoverflow.com/questions/9081625/override-log4j-properties-in-hadoop
).
St.Ack
Also, if you assign "only" 24GB for the heap, the OS will still use
some of the remaining memory as cache. And you will need some memory
for the hadoop process too.
JM
2013/6/21 Jean-Daniel Cryans :
> 24GB is often cited as an upper limit, but YMMV.
>
> It also depends if you need memory for MapR
Worst case you can modify the log level directly in your code?
JM
2013/6/21 Joel Alexandre :
> hi,
>
> i'm running some Hbase MR jobs through the bin/hadoop jar command line.
>
> How can i change the log level for those specific executions without
> changing hbase/conf/log4j.properties ?
>
> I'm
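To illustrate JM's suggestion above, a minimal log4j 1.x sketch that sets levels
programmatically before the job is submitted (the class name and the chosen loggers are
just examples, not anything from the thread):

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class JobDriverSketch {
  public static void main(String[] args) throws Exception {
    // Adjust log levels in code, without touching hbase/conf/log4j.properties.
    Logger.getLogger("org.apache.hadoop.hbase").setLevel(Level.WARN);
    Logger.getLogger("org.apache.zookeeper").setLevel(Level.ERROR);

    // ... normal MapReduce job setup and submission would follow here ...
  }
}

This only affects the client/driver JVM; task JVMs still pick up whatever log4j
configuration the cluster ships to them.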
Lars,
I thought that the column family is the locality group, and that placing columns which
are frequently accessed together into the same column family (locality group) is the
obvious performance-improvement tip. What are the "essential column families" for in
this context?
As for original question..
hi,
i'm running some Hbase MR jobs through the bin/hadoop jar command line.
How can i change the log level for those specific executions without
changing hbase/conf/log4j.properties ?
In my jar there is a log4j.properties file, but it is being ignored.
Thanks,
Joel
HBase is a key value (KV) store. Each column is stored in its own KV, a row is
just a set of KVs that happen to have the same row key (which is the first part of
the key).
I tried to summarize this here:
http://hadoop-hbase.blogspot.de/2011/12/introduction-to-hbase.html
In the StoreFiles all KVs ar
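As a small illustration of the point above, a sketch using the 0.94-era client API
(table and column names are made up): a single Put with two columns produces two
independent KeyValues that only share the row key.

import java.util.List;
import java.util.Map;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RowAsKVsSketch {
  public static void main(String[] args) {
    // One logical "row" with two columns; each column becomes its own KeyValue.
    Put put = new Put(Bytes.toBytes("row-0001"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("colA"), Bytes.toBytes("valueA"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("colB"), Bytes.toBytes("valueB"));

    // Already on the client side there is one KeyValue per column.
    for (Map.Entry<byte[], List<KeyValue>> e : put.getFamilyMap().entrySet()) {
      for (KeyValue kv : e.getValue()) {
        System.out.println(kv); // prints row/family:qualifier/timestamp per KV
      }
    }
  }
}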
24GB is often cited as an upper limit, but YMMV.
It also depends if you need memory for MapReduce, if you are using it.
J-D
On Wed, Jun 19, 2013 at 3:17 PM, prakash kadel wrote:
> hi everyone,
>
> i am quite new to hbase and java. I have a few questions.
>
> 1. on the web ui for hbase i have th
I think that the same way writing with more clients helped throughput,
writing with only 1 replication thread will hurt it. The clients in
both cases have to read something (a file from HDFS or the WAL) then
ship it, meaning that you can utilize the cluster better since a
single client isn't consis
Hi,
I hope that you can shed some light on these 2 scenarios below.
I have 2 small tables of 6000 rows.
Table 1 has only 1 column in each of its rows.
Table 2 has 40 columns in each of its rows.
Other than that the two tables are identical.
In both tables there is only 1 row that contains a matc
Thanks Asaf and Anoop.
You are right, data in the Memstore is already sorted, so flush() would not
block too much with the current write stream to another Memstore...
But wait, flush() consumes disk IO, which I think would interfere with
WAL writes. Say we have two Memstores, A and B, on one node. A is d
Hi,
FilterList with multiple filters is not working for us.
We have a table 'country_details' with family 'country' having columns with
prefix 'AGE' and 'SALARY'.
Data is inserted as shown below.
We need to get following rows and columns based on filters
'SRILANKA' if 'AGE' prefix column
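The message is cut off, so the exact intent is unclear, but a general sketch of
combining filters with explicit operators looks like this (assuming 'SRILANKA' is the
row key; adjust if it is actually a cell value):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class CountryScanSketch {
  public static Scan buildScan() {
    // OR the two column-prefix filters together...
    FilterList prefixes = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    prefixes.addFilter(new ColumnPrefixFilter(Bytes.toBytes("AGE")));
    prefixes.addFilter(new ColumnPrefixFilter(Bytes.toBytes("SALARY")));

    // ...then AND them with a row filter on the row key.
    FilterList all = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    all.addFilter(new RowFilter(CompareOp.EQUAL,
        new BinaryComparator(Bytes.toBytes("SRILANKA"))));
    all.addFilter(prefixes);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("country"));
    scan.setFilter(all);
    return scan;
  }
}

A common gotcha is that the default FilterList operator is MUST_PASS_ALL (AND), so two
column-prefix filters in one flat list will match nothing.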
Hmm... Yes. Was worth a try :) Should've checked and I even wrote that part of
the code.
I have no good explanation then, and also no good suggestion about how to
improve this.
From: Asaf Mesika
To: "user@hbase.apache.org" ; lars hofhansl
Sent: Friday, J
On Fri, Jun 21, 2013 at 2:38 PM, lars hofhansl wrote:
> Another thought...
>
> I assume you only write to a single table, right? How large are your rows
> on average?
>
> I'm writing to 2 tables: Avg row size for 1st table is 1500 bytes, and the
second is around 800 bytes
>
> Replication
Another thought...
I assume you only write to a single table, right? How large are your rows on
average?
Replication will send 64MB blocks by default (or 25000 edits, whichever is
smaller). The default HTable buffer is only 2MB, so the slave RS receiving a
block of edits (assuming it is a full
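For reference, a sketch of where those sizes come from, assuming the usual property
names for these knobs (values shown are the defaults mentioned above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ReplicationTuningSketch {
  public static Configuration tuned() {
    Configuration conf = HBaseConfiguration.create();
    // Size of a replication shipment, ~64MB by default.
    conf.setLong("replication.source.size.capacity", 64L * 1024 * 1024);
    // Maximum number of edits per shipment, 25000 by default.
    conf.setInt("replication.source.nb.capacity", 25000);
    // Client-side write buffer, 2MB by default.
    conf.setLong("hbase.client.write.buffer", 2L * 1024 * 1024);
    return conf;
  }
}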
Thanks!
The default mapper (org.apache.hadoop.hbase.mapreduce.TsvImporterMapper)
uses the same ts; I can rewrite it to achieve my goal!
On 2013-6-21, at 16:44, Anoop John wrote:
> he ts for each row in the raw data file.. While
> running the tool we can specify which column (in raw data file) should be
>
You can specify a max size to indicate the region split (when a region should
get split). But this size is the size of the HFile. To be precise, it is the
size of the biggest HFile under that region. If you specify this size as 10G,
then when the region has a file of size bigger than 10G the region
w
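To make the above concrete, a minimal sketch of setting that threshold on a
(hypothetical) table descriptor:

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.util.Bytes;

public class SplitSizeSketch {
  public static HTableDescriptor describe() {
    HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("my_table"));
    desc.addFamily(new HColumnDescriptor("cf"));
    // Region is split once its biggest HFile exceeds ~10GB.
    desc.setMaxFileSize(10L * 1024 * 1024 * 1024);
    return desc;
  }
}

The cluster-wide default comes from hbase.hregion.max.filesize; a per-table
setMaxFileSize overrides it for that table only.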
Thanks for checking... Interesting. So talking to 3 RSs as opposed to only 1
before had no effect on the throughput?
Would be good to explore this a bit more.
Since our RPC is not streaming, latency will affect throughput. In this case
there is latency while all edits are shipped to the RS in the
When adding data to HBase with same key, it is the timestamp (ts) which
determines the version. Different ts values will make different versions of the cell. But
in case of bulk load using ImportTSV tool, the ts used by one mapper will
be same. All the Puts created from it will have the same ts. The tool
allows us
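Along the lines of rewriting the mapper mentioned above, a sketch of a Put that carries
an explicit per-record timestamp (family, qualifier and method name are hypothetical;
0.94-era client API):

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionedPutSketch {
  // Give each record its own timestamp so two records with the same row key
  // become two versions of the cell instead of one overwriting the other.
  public static Put buildPut(String rowKey, long recordTs, String value) {
    Put put = new Put(Bytes.toBytes(rowKey));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), recordTs, Bytes.toBytes(value));
    return put;
  }
}

Note that the column family must also be configured to retain more than one version
(HColumnDescriptor maxVersions), otherwise only the newest version is kept.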
hello everyone
When I use bulk-load to import data into HBase, I found that if I have some
rows with the same rowkey, only one of them gets imported into HBase!
But I want to import all of them into HBase as different versions. How should
I do this?
Original data
mike 18:20
mike 16:20
mike 1