Re: HBase scanning for a couple of terabytes of data

2016-05-11 Thread Ted Yu
TableInputFormatBase is abstract; most likely you would use TableInputFormat for the scan. See the javadoc of getSplits(): "Calculates the splits that will serve as input for the map tasks. The number of splits matches the number of regions in a table." FYI. On Wed, May 11, 2016 at 6:05 PM…
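The one-split-per-region behavior Ted quotes can be sketched with a toy model. This is plain Python, not the HBase API; the Split type and region boundary keys are illustrative assumptions, not anything from hbase-mapreduce.

```python
# Toy model of TableInputFormat.getSplits(): one input split per region,
# covering [region start, next region start). Not the real HBase API.
from dataclasses import dataclass

@dataclass
class Split:
    start_row: bytes  # inclusive
    stop_row: bytes   # exclusive; b"" means "to end of table"

def get_splits(region_starts):
    """One split per region, so #splits == #regions (per the javadoc)."""
    splits = []
    for i, start in enumerate(region_starts):
        stop = region_starts[i + 1] if i + 1 < len(region_starts) else b""
        splits.append(Split(start, stop))
    return splits

regions = [b"", b"row-3000", b"row-6000"]  # a hypothetical 3-region table
splits = get_splits(regions)
assert len(splits) == len(regions)
```

The practical takeaway is that map-task parallelism for a scan job is bounded by the region count, so a badly pre-split table limits the job.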

HBase scanning for a couple of terabytes of data

2016-05-11 Thread Yi Jiang
Hi, guys. Recently we have been debating using HBase as the destination for our data pipeline jobs. Basically, we want to save our logs into HBase, and our pipeline generates 2-4 terabytes of data every day, but our IT department thinks it is not a good idea to scan HBase at that scale; it will cause performance…
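The usual mitigation for this concern is to design the row key so that scans are bounded instead of touching the whole table. The sketch below is plain Python, not the HBase client; the "YYYYMMDD#id" key format is an assumption for illustration, modeling a Scan with start/stop rows over a time-bucketed log table.

```python
# Sketch: time-prefixed row keys let a daily job read one lexicographic
# key range, like Scan(startRow, stopRow), instead of the full table.
from bisect import bisect_left

# Hypothetical log row keys: "YYYYMMDD#sequence".
rows = sorted(f"201605{d:02d}#{i:04d}" for d in (10, 11, 12) for i in range(3))

def scan(rows, start, stop):
    """Lexicographic range scan over [start, stop), as HBase scans do."""
    lo = bisect_left(rows, start)
    hi = bisect_left(rows, stop)
    return rows[lo:hi]

day = scan(rows, "20160511", "20160512")
assert all(k.startswith("20160511") for k in day)
```

With keys shaped this way, a 2-4 TB/day table never needs a full-table scan for per-day processing; each job reads only its own bucket's regions.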

RE: Able to search by all the columns and faster than impala

2016-05-11 Thread Dave Birdsall
Hi, if your SQL-on-Hadoop solution supports secondary indexes, you can simply create them on the popular columns to speed up query time. Dave
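Dave's suggestion can be illustrated with a toy model: a secondary index on a popular column turns a full scan into an index lookup plus point gets. This is plain Python, not any real SQL-on-Hadoop engine; the table layout and column names are assumptions.

```python
# Toy secondary index: column value -> row keys, so a query on "status"
# does an index lookup + point gets instead of scanning every row.
table = {
    "r1": {"user": "alice", "status": "ok"},
    "r2": {"user": "bob",   "status": "err"},
    "r3": {"user": "alice", "status": "err"},
}

# Build the index once (a real system maintains it on writes).
status_idx = {}
for rk, cols in table.items():
    status_idx.setdefault(cols["status"], []).append(rk)

def query_by_status(value):
    """Lookup via the index, then fetch each matching row by key."""
    return [table[rk] for rk in sorted(status_idx.get(value, []))]

assert [r["user"] for r in query_by_status("err")] == ["bob", "alice"]
```

The trade-off is write amplification: every insert must also update each index, which is why indexes are usually limited to the popular query columns.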

Able to search by all the columns and faster than impala

2016-05-11 Thread Bin Wang
Hi there, I have a use case where I have a table with low billions of rows and fewer than 50 columns. It is a very popular data source; there is huge internal demand to query the table, nothing more complex than "select * from where .. and ..". However, not everyone…

hbasecon2016 less than two weeks away; signup and spread the word

2016-05-11 Thread Stack
I just took a pass through the talks. This is the best set we've had at any of the hbasecons. You should all come. Thanks all, St.Ack