I want to copy the files of one table from one cluster to another cluster.
I do it in these steps:
1. bin/hadoop fs -copyToLocal on cluster A
2. scp the files from A to B
3. bin/hadoop fs -copyFromLocal on cluster B
I scan the table and save the data to a file in a file format that I
defined myself the day before yesterday.
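For what it's worth, below is a rough, untested sketch of the same copy done
programmatically with the Hadoop FileSystem API instead of
copyToLocal/scp/copyFromLocal. The cluster URIs and the /hbase/mytable path
are placeholders for your own layout, and (as noted later in this thread)
nothing should be writing to the table while its files are copied.

// Sketch only: copy a table directory straight from cluster A's HDFS to
// cluster B's HDFS. URIs and paths are hypothetical.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CopyTableFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem srcFs = FileSystem.get(URI.create("hdfs://clusterA:9000"), conf);
    FileSystem dstFs = FileSystem.get(URI.create("hdfs://clusterB:9000"), conf);
    Path src = new Path("/hbase/mytable");  // table directory on cluster A
    Path dst = new Path("/hbase/mytable");  // destination on cluster B
    // Recursive copy without deleting the source.
    FileUtil.copy(srcFs, src, dstFs, dst, false, conf);
  }
}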
Do you mean 100MB rows? That seems pretty fast.
On Thu, Mar 31, 2011 at 5:29 PM, Jean-Daniel Cryans wrote:
> Sub-second responses for 100MB files? You sure that's right?
>
> Regarding proper case studies, I don't think a single one exists.
> You'll find presentation decks about some use cases
Solr/Elasticsearch is a fine solution, but probably won't be quite as fast
as a well-tuned HBase solution.
One key assumption you seem to be making is that you will store messages
only once. If you are willing to make multiple updates to tables, then you
can arrange the natural ordering of the t
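To make the multiple-updates idea concrete, here is a minimal, untested
sketch. The "messages" table, the "msg" family, and the row-key layout are
all made up for illustration: each message is written once per participant,
with the user first in the row key and an inverted timestamp so a plain Scan
returns that user's threads newest-first with no server-side sorting.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class StoreThreadedMessage {
  // Row key: userId : invertedTimestamp : threadId
  static byte[] rowKey(String userId, long threadTs, String threadId) {
    // Invert the timestamp so newer threads sort first lexicographically.
    return Bytes.add(Bytes.toBytes(userId + ":"),
                     Bytes.toBytes(Long.MAX_VALUE - threadTs),
                     Bytes.toBytes(":" + threadId));
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "messages");
    String sender = "user123", recipient = "user456";
    long ts = System.currentTimeMillis();
    byte[] body = Bytes.toBytes("hello");
    // One Put per participant: the same message lands in each user's "inbox" rows.
    for (String user : new String[] { sender, recipient }) {
      Put put = new Put(rowKey(user, ts, "thread-42"));
      put.add(Bytes.toBytes("msg"), Bytes.toBytes(String.valueOf(ts)), body);
      table.put(put);
    }
    table.close();
  }
}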
Thanks for your help J.D., answers inline:
On Mar 31, 2011, at 8:00 PM, Jean-Daniel Cryans wrote:
> I wouldn't worry too much at the moment for what seems to be double
> deletes of blocks, I'd like to concentrate on the state of your
> cluster first.
>
> So if you run hbck, do you see any inconsistencies?
Thanks, please submit a patch and I can try to test it.
Jira is :
https://issues.apache.org/jira/browse/HBASE-3722
-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: April 1, 2011 1:20 AM
To: Gaojinchao; user@hbase.apache.org
Subject: Re: A lot of data is lost when
This is all determined in hbase-daemon.sh:
https://github.com/apache/hbase/blob/trunk/bin/hbase-daemon.sh#L117
The log4j file sets default values just in case the processes are
started in another way (as far as I understand it).
J-D
On Thu, Mar 31, 2011 at 5:58 PM, Geoff Hendrey wrote:
> whoop
I wouldn't worry too much at the moment for what seems to be double
deletes of blocks, I'd like to concentrate on the state of your
cluster first.
So if you run hbck, do you see any inconsistencies?
In the datanode logs, do you see any exceptions regarding xcievers
(just in case).
In the region
whoops, yep that's the one. Just trying to understand how it relates to
the master logfile, the regionserver logfile, and the zookeeper logfile.
-geoff
-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Thursday, March 31, 2011 5:54 PM
Hi all,
I'm modelling a schema for storing and retrieving threaded messages, where, for
planning purposes:
- there are many millions of users.
- a user might have up to 1000 threads.
- each thread might have up to 5 messages (with some threads being
sparse with only
The HBase log4j.properties doesn't have that, but it has hbase.log.file
https://github.com/apache/hbase/blob/trunk/conf/log4j.properties
Is that what you're talking about?
Thx,
J-D
On Thu, Mar 31, 2011 at 5:48 PM, Geoff Hendrey wrote:
> it is in log4j.properties (/conf).
>
> -geoff
>
> -Original Message-
it is in log4j.properties (/conf).
-geoff
-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Thursday, March 31, 2011 5:26 PM
To: user@hbase.apache.org
Subject: Re: hadoop.log.file
Where is that "hadoop.log.file" you're talking about?
Sub-second responses for 100MB files? You sure that's right?
Regarding proper case studies, I don't think a single one exists.
You'll find presentation decks about some use cases if you google a
bit tho.
J-D
On Thu, Mar 31, 2011 at 12:20 PM, Shantian Purkad
wrote:
> Hello,
>
> Does anyone kno
Where is that "hadoop.log.file" you're talking about?
J-D
On Thu, Mar 31, 2011 at 3:22 PM, Geoff Hendrey wrote:
> Hi -
>
>
>
> I was wondering where I can find an explanation of what hbase logs to
> hadoop.log.file. This file is defined in log4j.properties. I see
> DFSClient logging to it, but I
I think this is expected. The caching means that you only get blocks of 2000
rows. And if you go for longer than 60 seconds between blocks, then the
scanner will time out. You could try tuning your caching down to 100 to see if
that works for a bit (although, due to variance in the time you t
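For reference, below is a small untested sketch of the tuning suggested
above, with made-up table and family names: dropping the caching makes the
client go back to the region server more often, so the scanner lease (60
seconds by default, hbase.regionserver.lease.period) is less likely to expire
while the slow per-row work runs.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanWithSmallerCaching {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));
    scan.setCaching(100);  // fetch 100 rows per trip instead of 2000
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result row : scanner) {
        // the slow per-row verification work would go here
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}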
That's the correct guess.
J-D
On Thu, Mar 31, 2011 at 4:59 PM, Joseph Boyd
wrote:
> We're using hbase 0.90.0 here, and I'm seeing a curious behavior with my
> scans.
>
> I have some code that does a scan over a table, and for each row
> returned some work to verify the data...
>
> I set the sca
I've been trying to track down some hbase strangeness from what looks to be
lost hbase puts: in one thrift put we insert data into two different column
families at different rowkeys, but only one of the rows is there. There
were no errors to the client or the thrift log, which is a little
disturbi
We're using hbase 0.90.0 here, and I'm seeing a curious behavior with my scans.
I have some code that does a scan over a table, and for each row
returned some work to verify the data...
I set the scan up like so:
byte[] family = Bytes.toBytes("mytable");
Scan scan = new Scan();
scan.setCaching(2000);
Hi -
I was wondering where I can find an explanation of what HBase logs to
hadoop.log.file. This file is defined in log4j.properties. I see
DFSClient logging to it, but I can't locate a doc describing exactly
what HBase uses hadoop.log.file for.
-geoff
On 3/31/11 12:41 PM, Ted Yu wrote:
Adam:
I logged https://issues.apache.org/jira/browse/HBASE-3721
Thanks for opening that. I haven't delved much into the HBase code
previously, but I may take a look into this since it is causing us some
trouble currently.
- Adam
Adam:
I logged https://issues.apache.org/jira/browse/HBASE-3721
Feel free to comment on that JIRA.
On Thu, Mar 31, 2011 at 11:14 AM, Adam Phelps wrote:
> On 3/30/11 8:39 PM, Stack wrote:
>
>> What is slow? The running of the LoadIncrementalHFiles or the copy?
>>
>
> It's the LoadIncrementalHFiles portion.
Hello,
Does anyone know of any case studies where HBase is used in production for
large data volumes (including big files/documents on the scale of a few KB
to 100MB stored in rows) and gives subsecond responses to online queries?
Thanks and Regards,
Shantian
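Not a case study, but to illustrate what "documents stored in rows" looks
like at the API level, here is a tiny untested sketch against a hypothetical
"docs" table with a single "d" family. Whether a cell value in the tens of
megabytes still gives subsecond reads is exactly the kind of thing to
benchmark on your own setup.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class DocumentInARow {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "docs");
    byte[] family = Bytes.toBytes("d");
    byte[] qualifier = Bytes.toBytes("content");
    byte[] doc = Bytes.toBytes("...the document bytes...");

    // Store the whole document as one cell value.
    Put put = new Put(Bytes.toBytes("doc-0001"));
    put.add(family, qualifier, doc);
    table.put(put);

    // Read it back with a point Get, which is the "online query" path.
    Result result = table.get(new Get(Bytes.toBytes("doc-0001")));
    byte[] stored = result.getValue(family, qualifier);
    System.out.println("read back " + stored.length + " bytes");
    table.close();
  }
}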
On 3/30/11 8:39 PM, Stack wrote:
> What is slow? The running of the LoadIncrementalHFiles or the copy?
It's the LoadIncrementalHFiles portion.
> If the former, is it because the table it's loading into has different
> boundaries than those of the HFiles so the HFiles have to be split?
I'm sure that co
Depends on what you're trying to do. Like I said, you didn't give us a lot
of information, so we're pretty much in the dark regarding what you're
trying to achieve.
At first you asked why the files were so big; I don't see the relation
with the log files.
Also, I'm not sure why you referred to the numbe
Inline.
J-D
> I assume the block cache tuning key you talk about is
> "hfile.block.cache.size", right? If it is only 20% by default then
> what is the rest of the heap used for? Since there are no fancy
> operations like joins and since I'm not using memory tables the only
> thing I can think of
(sending back to the list, please don't answer directly to the
sender, always send back to the mailing list)
MasterFileSystem has most of DFS interactions, it seems that
checkFileSystem is never called (it should be) and splitLog catches
the ERROR when splitting but doesn't abort.
Would you mi
So you tried to write to another cluster instead? Because it says you
didn't specify the other cluster correctly; CopyTable prints a help
message that describes how that value should be constructed.
J-D
On Thu, Mar 31, 2011 at 2:23 AM, Stuart Scott wrote:
> Hi J-D,
>
> Thanks for the info.
> I tried t
I assume the block cache tuning key you talk about is
"hfile.block.cache.size", right? If it is only 20% by default then
what is the rest of the heap used for? Since there are no fancy
operations like joins and since I'm not using memory tables, the only
thing I can think of is the memstore, right?
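For reference, here is a small untested sketch that just prints the two
heap-related knobs in play here, using what I believe are the 0.90-era
defaults: 20% of the heap for the block cache (hfile.block.cache.size) and up
to roughly 40% reserved for the memstores
(hbase.regionserver.global.memstore.upperLimit). Both are normally set in
hbase-site.xml on the region servers rather than in client code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class PrintHeapKnobs {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Fraction of the region server heap used for the block cache.
    float blockCache = conf.getFloat("hfile.block.cache.size", 0.2f);
    // Upper bound on the fraction of the heap the memstores may use.
    float memstoreUpper =
        conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
    System.out.printf("block cache: %.0f%% of heap, memstores: up to %.0f%%%n",
                      blockCache * 100, memstoreUpper * 100);
  }
}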
Yeah... excise_regions seems to work,
but plug_hole doesn't plug the hole.. it thinks the region still exists in META
Maybe the issue is with excise_regions.. it doesn't cleanly remove it..
I also tried
/hbase org.apache.hadoop.hbase.util.Merge
That doesn't work for me in 0.20.6..
What are the r
You will have to make sure you are not writing to the table whose files you
are copying to the local disk.
It seems reasonable to me, but I would suggest trying it out with a small data
set to make sure you get the process down.
Dave
-Original Message-
From: 陈加俊 [mailto:cjjvict...@gmail.com]
If you skip the log files, you are likely dropping data.
St.Ack
On Thu, Mar 31, 2011 at 12:27 AM, 陈加俊 wrote:
> Can I skip the log files?
>
> On Thu, Mar 31, 2011 at 2:17 PM, 陈加俊 wrote:
>
>> I found that there are so many log files under the table folder and they
>> are very big!
>>
>>
>> On Thu, Mar 31,
The ZooKeeper session expired, so the regionserver shut itself down,
probably because of a long GC pause.
Please upgrade to 0.90.x.
St.Ack
On Thu, Mar 31, 2011 at 2:50 AM, 陈加俊 wrote:
> 2011-03-30 20:25:12,798 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x932ed83611540001 to sun
2011-03-30 20:25:12,798 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x932ed83611540001 to sun.nio.ch.SelectionKeyImpl@54e184e6
java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0
lim=4 cap=4]
at
org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCn
Hi J-D,
Thanks for the info.
I tried this but ended up with the following error. Any ideas?
Exception in thread "main" java.io.IOException: Please specify the peer cluster
as hbase.zookeeper.quorum:zookeeper.znode.parent
at
org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableR
Can I skip the log files?
On Thu, Mar 31, 2011 at 2:17 PM, 陈加俊 wrote:
> I found that there are so many log files under the table folder and they
> are very big!
>
>
> On Thu, Mar 31, 2011 at 2:16 PM, 陈加俊 wrote:
>
>> I found that there are so many log files under the table folder and they
>> are very big!
>>
>>