Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Gabriel Reid
On Fri, Dec 18, 2015 at 4:31 PM, Riesland, Zack wrote: > We are able to ingest MUCH larger sets of data (hundreds of GB) using the CSVBulkLoadTool. > However, we have found it to be a huge memory hog. > We dug into the source a bit and found that

Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Gabriel Reid
Hi Jonathan, Sounds like something is very wrong here. Are you running the job on an actual cluster, or are you using the local job tracker (i.e. running the import job on a single computer)? Normally an import job, regardless of the size of the input, should run with map and reduce tasks that
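(One quick way to check, as a sketch: in Hadoop 2.x the mapreduce.framework.name property in mapred-site.xml decides whether jobs run in the single-JVM local runner or as separate tasks on YARN; "local" is the default when it is unset, which would match the single-machine memory pressure described in this thread.)

<!-- mapred-site.xml: if this property is missing or set to "local", the whole
     import job runs inside one client JVM instead of as separate YARN tasks -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>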

Re: Help calling CsvBulkLoadTool from Java Method

2015-12-18 Thread Gabriel Reid
Hi Jonathan, It looks like this is a bug that was relatively recently introduced in the bulk load tool (i.e. that the exit status is not correctly reported if the bulk load fails). I've logged this as a jira ticket: https://issues.apache.org/jira/browse/PHOENIX-2538. This means that for now,
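A minimal sketch of calling the tool from Java and checking its exit status, using the standard ToolRunner pattern; the table name, input path, and ZooKeeper quorum below are placeholders, not values from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

public class BulkLoadCaller {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String[] toolArgs = {
            "--table", "MY_TABLE",            // placeholder table name
            "--input", "/data/my_table.csv",  // placeholder HDFS input path
            "--zookeeper", "zkhost:2181"      // placeholder ZooKeeper quorum
        };
        int exitCode = ToolRunner.run(conf, new CsvBulkLoadTool(), toolArgs);
        // Per PHOENIX-2538, the exit code may not reflect a failed bulk load on
        // affected versions, so it should not be trusted alone until the fix lands.
        if (exitCode != 0) {
            throw new RuntimeException("CSV bulk load failed with exit code " + exitCode);
        }
    }
}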

RE: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Riesland, Zack
We are able to ingest MUCH larger sets of data (hundreds of GB) using the CSVBulkLoadTool. However, we have found it to be a huge memory hog. We dug into the source a bit and found that HFileOutputFormat.configureIncrementalLoad(), in using TotalOrderPartitioner and KeyValueReducer,
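For context, a minimal sketch (not Phoenix's actual driver code) of how an HBase bulk-load job is typically wired through HFileOutputFormat.configureIncrementalLoad(), which is where the TotalOrderPartitioner and the KeyValue sort reducer mentioned above get configured; the table name, output path, and mapper are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadJobSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hfile-bulk-load-sketch");
        job.setJarByClass(BulkLoadJobSketch.class);
        // The mapper (not shown) would parse CSV lines and emit KeyValues.
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));

        // This call plugs in TotalOrderPartitioner, the KeyValue sort reducer,
        // and one reduce task per region of the target table.
        HTable table = new HTable(conf, "MY_TABLE");
        try {
            HFileOutputFormat.configureIncrementalLoad(job, table);
        } finally {
            table.close();
        }

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}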

RE: [EXTERNAL] Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Cox, Jonathan A
Gabriel, I am running the job on a single machine in pseudo distributed mode. I've set the max Java heap size in two different ways (just to be sure): export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx48g" and also in mapred-site.xml: mapred.child.java.opts -Xmx48g
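For reference, the flattened mapred-site.xml fragment above corresponds to a property block like the following (the -Xmx48g value is taken from the message itself):

<!-- mapred-site.xml: the pre-Hadoop-2.x style child-JVM heap setting -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx48g</value>
</property>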

Re: Questions: history of deleted records, controlling timestamps

2015-12-18 Thread Thomas D'Silva
John, You can use a connection with an SCN to ensure all changes are written with the specified timestamp: https://phoenix.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API We are also working on transaction support using Tephra for the upcoming 4.7
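A minimal sketch of what that looks like from JDBC, using the CurrentSCN connection property; the connection URL and the table/columns in the UPSERT are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class ScnWriteExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Every mutation on this connection is written at the given timestamp.
        props.setProperty("CurrentSCN", Long.toString(System.currentTimeMillis()));
        // "localhost" stands in for the real ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props)) {
            conn.createStatement().executeUpdate(
                    "UPSERT INTO MY_TABLE (ID, VAL) VALUES (1, 'example')");
            conn.commit();
        }
    }
}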

Re: [EXTERNAL] Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Gabriel Reid
Hi Jonathan, Which Hadoop version are you using? I'm actually wondering if mapred.child.java.opts is still supported in Hadoop 2.x (I think it has been replaced by mapreduce.map.java.opts and mapreduce.reduce.java.opts). The HADOOP_CLIENT_OPTS won't make a difference if you're running in
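For illustration, the Hadoop 2.x equivalents Gabriel refers to would look roughly like this in mapred-site.xml (carrying over the -Xmx48g value from the earlier message; whether such a large per-task heap is sensible is a separate question):

<!-- mapred-site.xml, Hadoop 2.x style: separate map and reduce task JVM options
     replace the older mapred.child.java.opts -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx48g</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx48g</value>
</property>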

RE: [EXTERNAL] Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Cox, Jonathan A
Hi Gabriel, The Hadoop version is 2.6.2. -Jonathan

Re: Order by/limit clause on partitioned data

2015-12-18 Thread Sachin Katakdound
Thanks for the quick reply. We are using a 4.2+ version at the moment; will that be a problem? Or is this the default behavior in all versions of Phoenix? Regards, Sachin > On Dec 13, 2015, at 4:17 PM, James Taylor wrote: > bq. When this simple query with Order by and

Remote driver error: Encountered exception in sub plan [0] execution

2015-12-18 Thread Youngwoo Kim
Hi, I'm running Phoenix (current master branch) and HBase (1.1.2). My query failed with an error as follows [1]. There are two tables, each with 1 billion rows, salted with 20 buckets and compressed with SNAPPY. It's just a simple inner join like this: SELECT a.val1 FROM tbl1 a INNER JOIN tbl2 b ON