Re: CsvBulkLoadTool with ~75GB file

2016-08-19 Thread Gabriel Reid
Hi Aaron, How many regions are there in the LINEITEM table? The fact that you needed to bump the hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily setting up to 48 suggests that the amount of data going into a single region of that table is probably pretty large. Along the same line, I beli

Re: CsvBulkLoadTool with ~75GB file

2016-08-19 Thread John Leach
Gabriel, Do you guys provide pre-split mechanisms (sampling of import/query data, splitting policies, etc.) or does the admin have to determine the split points? I guess that begs the question of how you would do a basic ETL operation in Phoenix? How would you do the following on a 100 gigs

Re: CsvBulkLoadTool with ~75GB file

2016-08-19 Thread Gabriel Reid
Hi John, You can actually pre-split a table when creating it, either by specifying split points in the CREATE TABLE statement[1] or by using salt buckets[2]. In my current use cases I always use salting, but take a look at the salting documentation[2] for the pros and cons of this. Your approache
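As a rough illustration of the two pre-split options mentioned above, here is a minimal Python sketch that only builds the DDL text for each. The table name, columns, and split values are hypothetical and not taken from this thread. The SPLIT ON and SALT_BUCKETS clauses follow the Phoenix CREATE TABLE grammar as I understand it, so check the generated statements against the documentation for your version before running them.

```python
def quote_literal(value):
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def base_ddl(table, column_defs, pk_columns):
    """Common CREATE TABLE prefix; all arguments are hypothetical examples."""
    pk = ", ".join(pk_columns)
    return f"CREATE TABLE {table} ({column_defs} CONSTRAINT pk PRIMARY KEY ({pk}))"


def split_on_ddl(table, column_defs, pk_columns, split_points):
    """DDL that pre-splits the table at explicit row key boundaries."""
    points = ", ".join(quote_literal(p) for p in split_points)
    return f"{base_ddl(table, column_defs, pk_columns)} SPLIT ON ({points})"


def salted_ddl(table, column_defs, pk_columns, buckets):
    """DDL that pre-splits the table by salting the row key into N buckets."""
    return f"{base_ddl(table, column_defs, pk_columns)} SALT_BUCKETS={buckets}"


if __name__ == "__main__":
    cols = "host VARCHAR NOT NULL, txn_count BIGINT"
    print(split_on_ddl("example.metrics", cols, ["host"], ["g", "n", "t"]))
    print(salted_ddl("example.metrics", cols, ["host"], 16))
```

Explicit split points only help when the leading key column's value distribution is known ahead of time; salting needs no such knowledge, but has trade-offs of its own that the salting documentation covers.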

Re: [ANNOUNCE] Apache Phoenix 4.8.0 released

2016-08-19 Thread Afshin Moazami
I am wondering why many of these interesting features are not listed in the official release notes: https://phoenix.apache.org/release_notes.html Best, Afshin On Aug 12, 2016, at 1:25 PM, Ankit Singhal <an...@apache.org> wrote: Apache Phoenix enables OLTP and operational analytics for Hadoop

Re: [ANNOUNCE] Apache Phoenix 4.8.0 released

2016-08-19 Thread Josh Elser
(-cc other lists) Hi Afshin, The release notes you referenced are more meant to alert users about any issues in the new release that you may run into over previous releases. "Release notes provide details on issues and their fixes which may have an impact on prior Phoenix behavior" - Josh

Re: CsvBulkLoadTool with ~75GB file

2016-08-19 Thread John Leach
Gabriel, Thanks for the response; I appreciate it. I struggle to understand how to use split points in the create statement. (1) Creating a table with Split Points: CREATE TABLE stats.prod_metrics ( host char(50) not null, created_date date not null, txn_count bigint CONSTRAINT pk PRIM

Cannot select data from a system table

2016-08-19 Thread Aaron Molitor
Looks like the SYSTEM.FUNCTION table is named with a reserved word. Is this a known bug? 0: jdbc:phoenix:stl-colo-srv073.splicemachine> !tables ++--+-+---+--+++-+--+---

Re: [ANNOUNCE] Apache Phoenix 4.8.0 released

2016-08-19 Thread James Taylor
This is good feedback, Afshin. Thanks for letting us know. I've updated the download page to provide a link to the new fixes/features. Would be great if this link could be dynamic (i.e. always point to the release notes from the last released version). Anyone know how to do this? I've also updated

Re: CsvBulkLoadTool with ~75GB file

2016-08-19 Thread John Leach
Aaron, Looks like a permission issue? org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint threw java.lang.IllegalStateException: Failed to get FileSystem instance java.lang.IllegalStateException: Failed to get FileSystem instance at org.apache.hadoop.hbase.security.access.Sec

Re: CsvBulkLoadTool with ~75GB file

2016-08-19 Thread James Taylor
Maybe this will help? http://phoenix.apache.org/bulk_dataload.html#Permissions_issues_when_uploading_HFiles bq. I struggle to understand how to use split points in the create statement. You can't always use split points - it depends on your schema and the knowledge you have about the data being l
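When the key distribution is not known in advance, one way to get candidate split points is to sample the leading row key values from the input file and take evenly spaced quantiles. The sketch below is a generic illustration of that idea, not a Phoenix feature. The function name and the sample data are hypothetical, and it assumes string keys whose sort order matches the row key order in the table.

```python
def split_points_from_sample(sample_keys, num_regions):
    """Pick up to num_regions - 1 boundaries at evenly spaced sample quantiles."""
    if num_regions < 2 or not sample_keys:
        return []
    keys = sorted(sample_keys)  # duplicates are kept so hot keys weigh more
    points = []
    for i in range(1, num_regions):
        candidate = keys[i * len(keys) // num_regions]
        # Skip repeated boundaries, which would describe an empty region.
        if not points or candidate != points[-1]:
            points.append(candidate)
    return points


if __name__ == "__main__":
    # Hypothetical sample: 100 zero-padded keys standing in for real row keys.
    sample = [f"{i:03d}" for i in range(100)]
    print(split_points_from_sample(sample, 4))
```

The quality of the boundaries depends entirely on how representative the sample is; a skewed sample gives skewed regions.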