Re: Bulk loading

2015-04-08 Thread Flavio Pompermaier
, 2015 at 7:52 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Hi all, I have a non-mapreduce process that produces a lot of data that I want to import into HBase through programmatic bulk loading, because using TableOutputFormat from the client makes my HBase stop working (too many

Bulk loading

2015-04-08 Thread Flavio Pompermaier
Hi all, I have a non-mapreduce process that produces a lot of data that I want to import into HBase through programmatic bulk loading, because using TableOutputFormat from the client makes my HBase stop working (too many writes in parallel, I think). How can I create the necessary HFiles starting
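The usual answer to this thread's (truncated) question is the HFileOutputFormat + LoadIncrementalHFiles pattern from the 0.94/0.98-era API. A hedged sketch, not a tested recipe: the table name, output path, and the omitted mapper are hypothetical, and the job must emit rows sorted as configureIncrementalLoad expects.

```java
// Sketch: produce HFiles with a MapReduce job, then bulk-load them.
// Table name and output path are hypothetical; the mapper (which must emit
// (ImmutableBytesWritable rowKey, KeyValue) pairs) is omitted.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");      // hypothetical table
    Path hfileDir = new Path("/tmp/bulkload-out");   // hypothetical path

    Job job = Job.getInstance(conf, "hfile-writer");
    // configureIncrementalLoad wires in the reducer, total-order partitioner
    // and output format so HFiles line up with current region boundaries.
    HFileOutputFormat.configureIncrementalLoad(job, table);
    FileOutputFormat.setOutputPath(job, hfileDir);
    if (!job.waitForCompletion(true)) return;

    // Move the finished HFiles into the regions (no write-path load).
    new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
  }
}
```

Because the files are handed directly to the region servers, this avoids the flood of parallel writes that TableOutputFormat generates.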

Re: Compute region splits

2014-11-10 Thread Flavio Pompermaier
Any help here? On Fri, Nov 7, 2014 at 5:18 PM, Flavio Pompermaier pomperma...@okkam.it wrote: Hi to all, in the TableInputFormatBase there's a method that computes the splits depending on the region start/end key. I'd like to further split each split so as to be able to assign work
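For anyone hitting the same problem: the usual trick is to treat the region's start/end keys as unsigned big-endian integers and interpolate intermediate boundaries (HBase's own org.apache.hadoop.hbase.util.Bytes.split does essentially this). A self-contained sketch with no HBase dependency, assuming keys compare as unsigned bytes:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class SplitRange {
    // Subdivide the byte range [start, end) into n contiguous sub-ranges by
    // interpolating keys numerically. Keys are padded to a common width so
    // they compare like unsigned big-endian integers.
    static List<byte[]> subdivide(byte[] start, byte[] end, int n) {
        int width = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, pad(start, width));
        BigInteger hi = new BigInteger(1, pad(end, width));
        List<byte[]> boundaries = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            BigInteger b = lo.add(hi.subtract(lo)
                    .multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(n)));
            boundaries.add(pad(b.toByteArray(), width));
        }
        return boundaries;
    }

    // Left-pad (or trim a leading sign byte) to exactly `width` bytes.
    static byte[] pad(byte[] b, int width) {
        int off = (b.length > width) ? b.length - width : 0;
        byte[] out = new byte[width];
        System.arraycopy(b, off, out, width - (b.length - off), b.length - off);
        return out;
    }
}
```

Each sub-range can then be wrapped in its own TableSplit and handed to a separate worker.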

Re: Region split during mapreduce

2014-11-01 Thread Flavio Pompermaier
the splits by retrying. -- Lars From: Flavio Pompermaier pomperma...@okkam.it To: user@hbase.apache.org Sent: Friday, October 31, 2014 10:23 AM Subject: Re: Region split during mapreduce The problem is that I don't know if what they say at that link is true or not. In the past I

Re: Region split during mapreduce

2014-10-31 Thread Flavio Pompermaier
Is there anybody here..? On Thu, Oct 30, 2014 at 2:28 PM, Flavio Pompermaier pomperma...@okkam.it wrote: Any help about this..? On Wed, Oct 29, 2014 at 9:08 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Hi to all, I was reading http://www.abcn.net/2014/07/spark-hbase-result-keyvalue

Re: Region split during mapreduce

2014-10-31 Thread Flavio Pompermaier
: Flavio: Have you considered using TableSnapshotInputFormat? See TableMapReduceUtil#initTableSnapshotMapperJob() Cheers On Fri, Oct 31, 2014 at 10:01 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Is there anybody here..? On Thu, Oct 30, 2014 at 2:28 PM, Flavio Pompermaier
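The TableSnapshotInputFormat route suggested here can be sketched as follows (0.98-era API; the snapshot name, restore dir, and no-op mapper are hypothetical). Reading from a snapshot bypasses the region servers, so a region splitting mid-job is no longer an issue:

```java
// Sketch of TableMapReduceUtil#initTableSnapshotMapperJob() usage.
// Snapshot name and restore dir are hypothetical placeholders.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class SnapshotScanSketch {
  // No-op mapper: the default map() passes rows through unchanged.
  static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> { }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(HBaseConfiguration.create(), "snapshot-scan");
    TableMapReduceUtil.initTableSnapshotMapperJob(
        "mytable-snapshot",        // taken beforehand, e.g. via the shell
        new Scan(),
        MyMapper.class,
        ImmutableBytesWritable.class,
        Result.class,
        job,
        true,                      // ship dependency jars with the job
        new Path("/tmp/snapshot-restore")); // scratch dir on the cluster FS
    job.waitForCompletion(true);
  }
}
```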

Re: Region split during mapreduce

2014-10-30 Thread Flavio Pompermaier
Any help about this..? On Wed, Oct 29, 2014 at 9:08 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Hi to all, I was reading http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html?m=1 and they say still using org.apache.hadoop.hbase.mapreduce.TableInputFormat is a big

Region split during mapreduce

2014-10-29 Thread Flavio Pompermaier
Hi to all, I was reading http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html?m=1 and they say that still using org.apache.hadoop.hbase.mapreduce.TableInputFormat is a big problem: your job will fail when one of the HBase regions for the target HBase table is splitting! because the original

Re: Error during HBase import

2014-10-28 Thread Flavio Pompermaier
see any of the error logs above? Looks like setup() should exit the program when clusterIds cannot be initialized. Cheers On Fri, Oct 24, 2014 at 1:53 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Hi all, I'm trying to import in my HBase 0.98 an export of 0.96

Error during HBase import

2014-10-24 Thread Flavio Pompermaier
Hi all, I'm trying to import into my HBase 0.98 an export from 0.96 with the following command: sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Driver import X /somedir But I get this error: Error: java.lang.NullPointerException at
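For context, the round trip attempted here uses the standard MapReduce Export/Import tools shipped with HBase (table name X and path /somedir taken from the original command; a sketch, not a tested cross-version recipe):

```shell
# On the source (0.96) cluster: dump table X as sequence files on HDFS
sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export X /somedir

# On the target (0.98) cluster: replay the dump into a pre-created table X
sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Driver import X /somedir
```

The target table (with matching column families) must exist before the import runs.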

Migration from 0.92 to 0.98

2014-09-16 Thread Flavio Pompermaier
Hi guys, I'm reading these days about migrating from 0.94 to 0.98 and it seems not so easy. Unfortunately, I still have to upgrade from 0.92.1-cdh4.1.2 to 0.98 on CDH5.. what is the best way to migrate my data in this specific case? Best, Flavio

Re: HBase 0.94 vs 0.98

2014-08-26 Thread Flavio Pompermaier
is open source :) 0.94 will be supported as long as folks still make patches for it. Currently there is still a steady stream of 10-20 fixes/improvements per month. -- Lars From: Flavio Pompermaier pomperma...@okkam.it To: user@hbase.apache.org Sent: Monday

HBase 0.94 vs 0.98

2014-08-25 Thread Flavio Pompermaier
Hi to all, I saw the announcement about the fact that HBase 0.96 is going to be out of maintenance. From a developer perspective, HBase 0.94 is supported by almost all ORM frameworks (Datanucleus, Kundera, Trafodion, etc..) while HBase 0.98 only by Phoenix and Spring Data. How long will HBase

Re: HBase cluster design

2014-05-28 Thread Flavio Pompermaier
to scan HBase you should setCacheBlocks to false Regards, Dhaval From: Flavio Pompermaier pomperma...@okkam.it To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Sent: Friday, 23 May 2014 3:16 AM Subject: Re: HBase cluster design
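The setCacheBlocks advice above can be sketched as follows (0.94-era API; the table name, caching value, and no-op mapper are hypothetical illustrations):

```java
// Sketch of scan settings for a MapReduce job over an HBase table.
// Table name and caching value are hypothetical.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class ScanJobSketch {
  // No-op mapper: the default map() passes rows through unchanged.
  static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> { }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(HBaseConfiguration.create(), "full-scan");
    Scan scan = new Scan();
    scan.setCacheBlocks(false); // a one-off full scan would only evict hot data
    scan.setCaching(500);       // rows fetched per RPC: fewer round trips
    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, MyMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
    job.waitForCompletion(true);
  }
}
```

Disabling block caching for scans keeps a bulk read from flushing the cache entries that serve interactive reads.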

Re: HBase cluster design

2014-05-28 Thread Flavio Pompermaier
, Flavio Pompermaier pomperma...@okkam.it wrote: Thank you all for the great suggestions, I'll try ASAP to test them. Just 2 questions: - why should I set setCacheBlocks to false? - How can I increase/decrease the amount of RAM provided to the block caches and memstores? Best

Region servers crashing during mapreduce

2014-05-20 Thread Flavio Pompermaier
Hi to all, I'm using Cloudera CDH4 (4.5.0) with default parameters and HBase 0.94.6. I'm experiencing bad behaviour in my mapreduce jobs, where region servers keep crashing. I checked the logs and the region servers seem to die without logging anything.. this seems to happen at the 2nd or 3rd

Re: Region servers crashing during mapreduce

2014-05-20 Thread Flavio Pompermaier
(@marcosluis2186, http://twitter.com/marcosluis2186) http://about.me/marcosortiz On Tuesday, May 20, 2014 02:18:50 PM Flavio Pompermaier wrote: In the attached zip the config files generated by Cloudera. The core-site and the hdfs-site are slightly different if I download them from mapreduce or hbase

Re: HBase cluster design

2014-05-17 Thread Flavio Pompermaier
Could you please tell me in detail which parameters you'd like to see, so I can look for them and learn the important ones? I'm using Cloudera: CDH4 in one cluster and CDH5 in the other. Best, Flavio On May 17, 2014 2:48 AM, prince_mithi...@yahoo.co.in prince_mithi...@yahoo.co.in wrote: Can you

Re: HBase cluster design

2014-05-16 Thread Flavio Pompermaier
a couple of mapred jobs). However in general, speaking also with other people using HBase, it seems that it is not very safe to run mapred jobs while updating the table.. are we wrong? Best, Flavio On Wed, May 14, 2014 at 7:04 PM, Stack st...@duboce.net wrote: On Tue, May 13, 2014 at 3:14 AM, Flavio

Re: HBase cluster design

2014-05-14 Thread Flavio Pompermaier
at http://hbase.apache.org/book/performance.html ? Cheers On May 13, 2014, at 3:14 AM, Flavio Pompermaier pomperma...@okkam.it wrote: So just to summarize the result of this discussion.. can you confirm that the latest version of HBase should (in theory) support mapreduce jobs on tables

Re: HBase cluster design

2014-05-13 Thread Flavio Pompermaier
HBase clusters can fail easily under heavy load.. Could you suggest some tuning to avoid the crashing of HBase in such situations? Best, Flavio On Fri, Apr 11, 2014 at 12:06 PM, Flavio Pompermaier pomperma...@okkam.it wrote: Today I was able to catch an error during a mapreduce job

Re: Releasing HSearch 1.0 - Search and Analytics Engine on hadoop/hbase

2014-04-17 Thread Flavio Pompermaier
Is HSearch able to index JSON fields too? That would be awesome :) On Apr 17, 2014 1:01 PM, shubhendu.singh shubhendu.si...@bizosys.com wrote: A filter using filterRow() to filter out an entire row, or filterRow(List) to modify the final list of included values, must also override the

Re: HBase cluster design

2014-04-11 Thread Flavio Pompermaier
was logged at WARN level. If you can pastebin more of the region server log before its crash, I would take a deeper look. BTW I assume your zookeeper quorum was healthy during that period of time. On Fri, Apr 4, 2014 at 7:29 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Yes I

HBase cluster design

2014-04-04 Thread Flavio Pompermaier
Hi to everybody, I have a probably stupid question: is it a problem to run many mapreduce jobs on the same HBase table at the same time? And multiple jobs on different tables on the same cluster? Should I use Hoya to have a better cluster usage..? In my current cluster I noticed that the region

Re: HBase cluster design

2014-04-04 Thread Flavio Pompermaier
. Cheers On Apr 4, 2014, at 3:08 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Hi to everybody, I have a probably stupid question: is it a problem to run many mapreduce jobs on the same HBase table at the same time? And multiple jobs on different tables on the same cluster? Should

HBase indexing and updating

2013-08-28 Thread Flavio Pompermaier
Hi to everybody, I have two questions: - My HBase table is composed of a UUID as key and XML content in a single column. What is at the moment the best option to read all those XML documents, deserialize them to their object representation and add them to Solr (or another indexing system)? The problem

Programmatically create HTable with snappy-compressed columns

2013-07-16 Thread Flavio Pompermaier
Hi to all, I have to programmatically create an HTable (from Java) with a compressed (snappy) column. Is it possible to do it from code or do I have to manually create it via the hbase shell? If it is possible, could someone show me a snippet? Best, Flavio
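A snippet along the requested lines, using the 0.94-era admin API (table and family names are hypothetical; in later versions the Compression class moved to org.apache.hadoop.hbase.io.compress):

```java
// Sketch: create a table with a Snappy-compressed column family from Java.
// Snappy native libraries must be installed on the region servers, or
// writes/compactions on the table will fail.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateSnappyTable {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    HTableDescriptor desc = new HTableDescriptor("mytable");   // hypothetical
    HColumnDescriptor family = new HColumnDescriptor("cf");    // hypothetical
    family.setCompressionType(Compression.Algorithm.SNAPPY);
    desc.addFamily(family);
    admin.createTable(desc);
    admin.close();
  }
}
```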

Re: Programmatically create HTable with snappy-compressed columns

2013-07-16 Thread Flavio Pompermaier
help someone else! Best, Flavio On Tue, Jul 16, 2013 at 3:08 PM, Flavio Pompermaier pomperma...@okkam.it wrote: Hi to all, I have to programmatically create HTable (from Java) with a compressed (snappy) column. Is it possible to do it from code or do I have to manually create them via hbase

Re: Making HBase easier to understand

2013-07-15 Thread Flavio Pompermaier
Very helpful, thanks! On Mon, Jul 15, 2013 at 7:59 AM, Azuryy Yu azury...@gmail.com wrote: impressive. thanks On Jul 15, 2013 9:00 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Hello all, In order to help understand the intricacies of HBase, I started a small pet

Re: Help in designing row key

2013-07-04 Thread Flavio Pompermaier
. Have you seen that in Scan you can set start time and end time? On Wednesday, July 3, 2013, Flavio Pompermaier wrote: All my enums produce positive integers so I don't have +/-ve Integer problems. Obviously if I use fixed-length rowKeys I could take away the separator.. Sorry but I'm

Re: Help in designing row key

2013-07-03 Thread Flavio Pompermaier
at FuzzyRowFilter.java Cheers On Tue, Jul 2, 2013 at 10:35 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Thanks for the reply! I thus have two questions more: 1) is it true that filtering on timestamps doesn't affect performance..? 2) could you send me a little snippet of how you would

Re: Help in designing row key

2013-07-03 Thread Flavio Pompermaier
FuzzyRowFilter and other techniques to quickly perform scans. The disadvantage is that you have to normalize the source integer, but I find I can either store that in an enum or cache it for a long time so it's not a big issue. -Mike On Wed, Jul 3, 2013 at 4:05 AM, Flavio Pompermaier pomperma

Re: Help in designing row key

2013-07-03 Thread Flavio Pompermaier
, Jul 3, 2013 at 2:44 PM, Flavio Pompermaier pomperma...@okkam.it wrote: Yeah, I was thinking to use a normalization step in order to allow the use of FuzzyRowFilter, but what is not clear to me is whether integers must also be normalized or not. I will explain myself better. Suppose that I

Re: Help in designing row key

2013-07-03 Thread Flavio Pompermaier
numbers no issues :) Well, when all the parts of the RK are of fixed width, will you need any separator? -Anoop- On Wed, Jul 3, 2013 at 2:44 PM, Flavio Pompermaier pomperma...@okkam.it wrote: Yeah, I was thinking to use a normalization step in order to allow the use of FuzzyRowFilter but what
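The fixed-width point is the crux of this exchange: if every row-key component is zero-padded to a constant width, lexicographic (byte) order matches numeric order and no separator is needed, which is also the layout FuzzyRowFilter expects. A minimal, HBase-free illustration (the helper name and width are hypothetical):

```java
public class FixedWidthKeys {
  // Zero-pad a non-negative id to a fixed width so that the lexicographic
  // ordering HBase uses on row keys agrees with numeric ordering.
  static String normalize(long id, int width) {
    return String.format("%0" + width + "d", id);
  }
}
```

Without padding, "42" sorts after "137" as a string; with it, "0000000042" sorts before "0000000137" as intended.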

Help in designing row key

2013-07-02 Thread Flavio Pompermaier
Hi to everybody, in my use case I have to perform batch analysis skipping old data. For example, I want to process all rows created after a certain timestamp, passed as a parameter. What is the most effective way to do this? Should I design my row-key to embed the timestamp? Or just filter by
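Both options from this thread can be sketched. Filtering is one call on the scan (scan.setTimeRange(minStamp, maxStamp) in the client API); embedding time in the row key usually means prefixing a fixed-width reversed timestamp so the newest rows sort first and a scan can stop early. The key-building half, HBase-free (the key layout and separator are illustrative assumptions, not a recommendation from the thread):

```java
public class TimeKeys {
  // Reversed, zero-padded timestamp prefix: newer rows sort
  // lexicographically first, so a scan from the start of the table
  // reads the newest data first and can stop at the cutoff.
  static String timeKey(long timestampMillis, String id) {
    // Long.MAX_VALUE has 19 decimal digits, hence the %019d width.
    return String.format("%019d", Long.MAX_VALUE - timestampMillis) + "-" + id;
  }
}
```

The trade-off discussed later in the thread applies: timestamp-prefixed keys hotspot the first region on writes, while plain timestamp filtering still scans every row.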