[off-topic] HBaseCon ticket substitution

2012-05-07 Thread Cristofer Weber
Hi! First of all, I apologize for sending this kind of off-topic e-mail to this list. I'm from Brazil and I tried to buy a ticket to HBaseCon last week, but unfortunately tickets for HBaseCon were sold out right after I had received authorization from my company to register for both HBaseCon and

RES: HBase Fault tolerance

2012-07-13 Thread Cristofer Weber
Hi Sever, Coprocessors are still new to me, so I don't have a good answer for your second question. But for your first (as far as I understand), remember that you can send Puts/Deletes in any order, and the Memstore is responsible for keeping your data sorted before flushing it to a StoreFile, and k
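To illustrate the ordering point, here is a toy model (not HBase's actual MemStore code) in which writes arrive out of order and come out sorted at "flush" time. It assumes a `ConcurrentSkipListMap` keyed by row only; the class `ToyMemstore` is hypothetical, and real HBase orders full cells and writes delete markers rather than removing data.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy model only: the real MemStore orders full cells (row, family, qualifier,
// timestamp, type); here each entry is keyed by row alone to show the ordering idea.
class ToyMemstore {
    // A skip-list map keeps its keys sorted on every insert, whatever the arrival order.
    private final ConcurrentSkipListMap<String, String> cells = new ConcurrentSkipListMap<>();

    void put(String row, String value) {
        cells.put(row, value);
    }

    // Simplification: real HBase records a delete marker (tombstone) instead of
    // removing data in place; this toy just drops the entry.
    void delete(String row) {
        cells.remove(row);
    }

    // "Flush": walk the entries in key order, the order a sorted StoreFile is written in.
    String flush() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : cells.entrySet()) {
            if (out.length() > 0) out.append(',');
            out.append(e.getKey()).append('=').append(e.getValue());
        }
        cells.clear();
        return out.toString();
    }

    public static void main(String[] args) {
        ToyMemstore m = new ToyMemstore();
        m.put("row-c", "3"); // arrives first, but sorts last
        m.put("row-a", "1");
        m.put("row-b", "2");
        m.delete("row-b");
        System.out.println(m.flush()); // prints row-a=1,row-c=3
    }
}
```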

RES: Rowkey hashing to avoid hotspotting

2012-07-16 Thread Cristofer Weber
Hi Anand, As usual, the answer is that 'it depends' :) I think that the main question here is: why are you afraid that this setup would lead to region server hotspotting? Is it because you don't know what your production data will look like? Based on what you told us about your rowkey, you will query m

CCSHB : PASS!

2012-07-17 Thread Cristofer Weber
Hi there! I need to share this :) A few minutes ago I got my Cloudera Certified Specialist in Apache HBase certification with 42 correct answers out of 45! I am very grateful to the following people and groups: * All those who have shared knowledge on this list * All those who have contributed f

RES: Rowkey hashing to avoid hotspotting

2012-07-17 Thread Cristofer Weber
and some other columns. I scan the table with a column value filter for this case. I will evaluate salting as you have explained. Regards, Anand.C On Tue, Jul 17, 2012 at 12:30 AM, Cristofer Weber <cristofer.we...@neogrid.com> wrote:
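Salting, as discussed in this thread, can be sketched as prefixing each rowkey with a bucket number derived from a hash of the key. This is a minimal illustration, not an HBase API: the class `SaltedKeys`, the bucket count of 16 and the `NN|key` format are all assumptions.

```java
// Hypothetical helper, not an HBase API: prefix each rowkey with a bucket number
// derived from its hash, so monotonically increasing keys are spread over BUCKETS
// key ranges (and over several regions, if the table is pre-split on those prefixes).
class SaltedKeys {
    static final int BUCKETS = 16; // assumption; often sized to the number of region servers

    static String salt(String rowKey) {
        // String.hashCode() is fully specified, so the bucket is stable across JVMs;
        // floorMod keeps it non-negative even when hashCode() is negative.
        int bucket = Math.floorMod(rowKey.hashCode(), BUCKETS);
        return String.format("%02d|%s", bucket, rowKey);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            System.out.println(salt("M100|S42|" + i));
        }
    }
}
```

The cost shows up on reads: a scan over a key range has to be issued once per bucket prefix and the results merged on the client.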

RES: Bulk Import & Data Locality

2012-07-18 Thread Cristofer Weber
Hi Alex, Here we do bulk imports by creating the HFiles in an MR job, and we finish the load by calling the doBulkLoad method of the LoadIncrementalHFiles class (probably the same method used by the completebulkload tool); the HFiles generated by the reducer tasks are correctly 'adopted' by each corresponding re
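Why reducer output lines up with regions can be shown with a small routing model. As far as I know, HFileOutputFormat.configureIncrementalLoad in HBase 0.94 sets up a total-order partitioner from the table's region start keys, with one reducer per region; the hypothetical `RegionRouter` below only mimics that lookup with a binary search and is not HBase code.

```java
import java.util.Arrays;

// Toy model: route each rowkey to the region whose [startKey, nextStartKey) range
// contains it. If every reducer receives exactly one such range, each HFile it writes
// falls inside a single region and can be adopted by that region as-is.
class RegionRouter {
    private final String[] startKeys; // the first start key must be "" (start of table)

    RegionRouter(String... startKeys) {
        this.startKeys = startKeys.clone();
        Arrays.sort(this.startKeys);
    }

    // Note: String order is UTF-16 order, which matches HBase's byte order for ASCII keys.
    int regionFor(String rowKey) {
        int idx = Arrays.binarySearch(startKeys, rowKey);
        // Exact hit: the key is a region's start key. Otherwise binarySearch returns
        // -(insertionPoint) - 1, and the containing region is insertionPoint - 1.
        return idx >= 0 ? idx : -idx - 2;
    }

    public static void main(String[] args) {
        RegionRouter router = new RegionRouter("", "g", "p");
        System.out.println(router.regionFor("m")); // prints 1
    }
}
```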

RES: Rowkey hashing to avoid hotspotting

2012-07-18 Thread Cristofer Weber
Hi Cristofer, The data I store is test cell reports about a component. I have many test cell reports for each model number + serial number combination. So, to make the rowkey unique, I added a timestamp. On Wed, Jul 18, 2012 at 3:14 AM, Cristofer Weber <cristofer.we...@neogrid.com> wrote: > So, An
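A composite rowkey like the one described (model number + serial number + timestamp) might be built as below. This is a sketch under assumptions: the `|` separator, the 13-digit zero padding and the class `ReportRowKey` are hypothetical, and the fields are assumed not to contain `|`.

```java
// Hypothetical rowkey layout: model|serial|timestamp. All reports for one component
// are contiguous, and within that prefix they sort by time.
class ReportRowKey {
    static String build(String model, String serial, long timestampMillis) {
        // Zero-padding to 13 digits keeps lexicographic order equal to numeric order
        // for millisecond timestamps up to roughly the year 2286.
        return String.format("%s|%s|%013d", model, serial, timestampMillis);
    }

    // Prefix covering every report of one model + serial combination (e.g. for a scan).
    static String prefix(String model, String serial) {
        return model + "|" + serial + "|";
    }
}
```

Because the timestamp only makes the key unique within one model/serial prefix, this layout alone does not prevent hotspotting when new reports for many components arrive in key order; that is where salting would come in.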

RES: Bulk Import & Data Locality

2012-07-18 Thread Cristofer Weber
Hi Alex, I ran one of our bulk import jobs with a partial payload, without running a major compaction, and you are right: some HDFS blocks are on a different DataNode. -Original Message- From: Alex Baranau [mailto:alex.barano...@gmail.com] Sent: Wednesday, July 18, 20
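One way to quantify the observation is the share of a region's blocks that have a replica on the region server's own host. The calculation below is a toy (the class `LocalityCheck` and its inputs are hypothetical, not HBase's locality code). After a major compaction the region server rewrites its HFiles, and HDFS normally places the first replica of each new block on the writing node, which is why locality tends to recover.

```java
import java.util.List;
import java.util.Set;

// Toy calculation: fraction of a region's HDFS blocks with at least one replica on
// the host running the region server. Each list entry is the set of hosts holding
// replicas of one block.
class LocalityCheck {
    static double localFraction(List<Set<String>> blockReplicaHosts, String regionServerHost) {
        if (blockReplicaHosts.isEmpty()) return 1.0; // no blocks, so nothing is remote
        long local = blockReplicaHosts.stream()
                .filter(hosts -> hosts.contains(regionServerHost))
                .count();
        return (double) local / blockReplicaHosts.size();
    }
}
```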

Re: [poll] Does anyone run or test against hadoop 0.21, 0.22, 0.23 under HBase 0.92.0+/0.94.0?

2012-07-18 Thread Cristofer Weber
We are using CDH4 Sent from my iPad On Jul 18, 2012, at 18:48, "Tony Dean" wrote: > We are using HBase 0.94.0 against Hadoop 1.0.3, but plan to move to 0.23.x. > > -Original Message- > From: Ted Yu [mailto:yuzhih...@gmail.com] > Sent: Wednesday, July 18, 2012 4:12 PM > To: d...@hbase.

RES: Schema for sorted results

2012-07-24 Thread Cristofer Weber
Hello Hari! If it's just for the sake of keeping results sorted, that's it: you have to keep them in lexicographic order. An alternative, for example, could be to maintain date|category as the RowKey and store your N URLs as members of a Column Family, where padded_visits could be the Column Qualifier and
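The reason for padding visits is that HBase sorts qualifiers as raw bytes, so "9" would sort after "10". The sketch below assumes a fixed width of 10 digits and uses a `TreeMap<String, ...>` as a stand-in for one row's qualifiers; `PaddedVisits` is a hypothetical name.

```java
import java.util.TreeMap;

// Sketch of the padded_visits idea: zero-pad counts to a fixed width so that
// lexicographic (byte) order matches numeric order.
class PaddedVisits {
    static final int WIDTH = 10; // assumption: visit counts fit in 10 digits

    static String pad(long visits) {
        return String.format("%0" + WIDTH + "d", visits);
    }

    // Stand-in for one row: qualifiers come back in sorted order, as in an HBase Get.
    // In this simplified form, two URLs with the same count would collide on one qualifier.
    static String sortedOrder(long... counts) {
        TreeMap<String, Long> row = new TreeMap<>();
        for (long c : counts) row.put(pad(c), c);
        return row.values().toString();
    }

    public static void main(String[] args) {
        System.out.println(sortedOrder(100, 9, 42)); // prints [9, 42, 100]
    }
}
```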

RES: Schema for sorted results

2012-07-24 Thread Cristofer Weber
Hi Hari, Using a date as the column qualifier is nice, but I ran into a drawback in a scenario where I left the window open: I kept a large range of dates per RowKey, and the number of rows per region became lower and lower as I started to split regions. You can manage this with a TTL if you don't

RES: Hbase Data Model to purge old data.

2012-07-26 Thread Cristofer Weber
Hi there, There are some really good ideas in this presentation from HBaseCon: http://www.cloudera.com/resource/video-hbasecon-2012-real-performance-gains-with-real-time-data/ Regards, Cristofer -Original Message- From: Alex Baranau [mailto:alex.barano...@gmail.com] Sent: Thu

RES: HBase Is So Slow To Save Data?

2012-08-29 Thread Cristofer Weber
There are also a lot of conversions of the same values to their byte array representation, e.g., your NeighborStructure constants. You should do each conversion only once to save time, since you are doing it inside 3 nested loops. I'm not sure how much this will improve things, but you should try it as well. Be
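A minimal sketch of hoisting those conversions: encode each constant once into a `static final byte[]` and reuse it inside the loops. The constant names below are hypothetical stand-ins for the NeighborStructure constants; HBase's `Bytes.toBytes(String)` also encodes as UTF-8, so the resulting arrays are the same.

```java
import java.nio.charset.StandardCharsets;

// Encode constant names once, at class initialization, instead of inside the loops.
class NeighborColumns {
    static final byte[] FAMILY = "n".getBytes(StandardCharsets.UTF_8);
    static final byte[] NORTH = "north".getBytes(StandardCharsets.UTF_8);
    static final byte[] SOUTH = "south".getBytes(StandardCharsets.UTF_8);
    static final byte[][] QUALIFIERS = {NORTH, SOUTH};

    // Stand-in for the nested loops that build Puts: it only sums the qualifier bytes
    // that would be written, reusing the same arrays on every iteration.
    static long qualifierBytes(int nx, int ny, int nz) {
        long total = 0;
        for (int x = 0; x < nx; x++) {
            for (int y = 0; y < ny; y++) {
                for (int z = 0; z < nz; z++) {
                    for (byte[] q : QUALIFIERS) {
                        total += q.length; // no String-to-byte[] conversion per iteration
                    }
                }
            }
        }
        return total;
    }
}
```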

[maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management

2012-08-30 Thread Cristofer Weber
I just read this article, "Solving Big Data Challenges for Enterprise Application Performance Management," published this month in Volume 5, No. 12 of the Proceedings of the VLDB Endowment, where they measured 6 different databases (Project Voldemort, Redis, HBase, Cassandra, MySQL Cluster and VoltDB)

RES: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management

2012-08-30 Thread Cristofer Weber
ice to see some of the more recent work done in the area of performance. One thing the paper does touch on is the relative difficulty of standing up the cluster, which has not changed since 0.90.4. I think that's definitely something that could be improved upon. - Dave On Thu, Aug 3

RES: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management

2012-08-30 Thread Cristofer Weber
k [st...@duboce.net] Sent: Thursday, August 30, 2012 19:04 To: user@hbase.apache.org Subject: Re: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management On Thu, Aug 30, 2012 at 7:51 AM, Cristofer Weber wrote: > About HMasters, yes,

HBase and unit tests

2012-08-30 Thread Cristofer Weber
Hi there! After I started studying HBase, I searched for open source projects backed by HBase and found the Titan distributed graph database (you have probably heard of it). As soon as I read in its documentation that the HBase adapter is experimental and suboptimal (disclaimer here: https://gith

RES: HBase and unit tests

2012-08-31 Thread Cristofer Weber
nd of each test, either clean up your HBase or use a different "area" per test. best regards, ulrich -- connect on xing or linkedin. sent from my tablet. On 31.08.2012, at 06:46, Stack wrote: > On Thu, Aug 30, 2012 at 4:44 PM, Cristofer Weber > wrote: >> Hi there! >

RES: HBase and unit tests

2012-08-31 Thread Cristofer Weber
ngleton class + prefixing the table names by a random key (to allow multiple tests in parallel on the same cluster without relying on cleanup) + getProperty to decide between starting a mini cluster or connecting to a cluster. HTH, Nicolas On Fri, Aug 31, 2012 at 12:28 PM, Cristofer Weber &l
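The suggestion above (a singleton holding a random table-name prefix, plus a system property that picks between a mini cluster and an existing cluster) might look roughly like this. The property name `hbase.test.cluster`, the `Mode` enum and the class `TestTables` are hypothetical; the actual startup, e.g. via HBaseTestingUtility.startMiniCluster(), is left out.

```java
import java.util.UUID;

// One random prefix per JVM, held by this singleton-style class, so parallel test runs
// can share a cluster without relying on cleanup between them.
class TestTables {
    enum Mode { MINI_CLUSTER, EXISTING_CLUSTER }

    private static final String RUN_PREFIX =
            UUID.randomUUID().toString().replace("-", "").substring(0, 8);

    // Hex digits plus '_' are valid characters in an HBase table name.
    static String tableName(String base) {
        return RUN_PREFIX + "_" + base;
    }

    // Run with -Dhbase.test.cluster=existing to target a real cluster; default is a mini cluster.
    static Mode mode() {
        return "existing".equals(System.getProperty("hbase.test.cluster"))
                ? Mode.EXISTING_CLUSTER
                : Mode.MINI_CLUSTER;
    }
}
```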

RES: HBase and unit tests

2012-08-31 Thread Cristofer Weber
TH2, Ulrich On Fri, Aug 31, 2012 at 12:28 PM, Cristofer Weber < cristofer.we...@neogrid.com> wrote: > Hi Sonal, Stack and Ulrich! > > Yes, I should provide more details :$ > > I reached the links you provided when I was searching for a way to > start HBase with JUnit. F