Re: Select dynamic column content

2015-01-07 Thread James Taylor
If the dynamic column approach with the cf.* feature (PHOENIX-374) meets your needs, that's good feedback. FWIW, you would not need to create all the views up front at schema creation time. You can create them on-the-fly. All views share the same, single underlying HBase table, so no HBase metadata
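
For illustration, a dynamic-column query and an on-the-fly view might look like the following sketch (table and column names such as EVENT_LOG and CF.PAYLOAD are hypothetical, not from the original thread):

    -- Declare a dynamic column inline at query time; PHOENIX-374
    -- extends this so cf.* can select every column in the family.
    SELECT event_type, cf.payload
    FROM event_log (cf.payload VARCHAR)
    WHERE event_type = 'click';

    -- A view created on the fly; all such views share the single
    -- underlying HBase table, so no new HBase metadata is needed.
    CREATE VIEW click_events AS
    SELECT * FROM event_log WHERE event_type = 'click';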

Re: Performance options for doing Phoenix full table scans to complete some data statistics and summary collection work

2015-01-07 Thread James Taylor
Hi Sun, Can you give us a sample DDL and upsert/select query for #1? What's the approximate cluster size and what does the client look like? How much data are you scanning? Are you using multiple column families? We should be able to help tune things to improve #1. Thanks, James On Monday, January
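
As a reference point, the kind of DDL and aggregating UPSERT SELECT being asked about might look like this sketch (all table, column, and type choices are illustrative assumptions, not from the thread):

    CREATE TABLE raw_events (
        host VARCHAR NOT NULL,
        event_time DATE NOT NULL,
        metric_value BIGINT
        CONSTRAINT pk PRIMARY KEY (host, event_time));

    CREATE TABLE daily_summary (
        host VARCHAR NOT NULL,
        event_day DATE NOT NULL,
        total BIGINT
        CONSTRAINT pk PRIMARY KEY (host, event_day));

    -- The full-scan aggregation whose tuning is under discussion.
    UPSERT INTO daily_summary
    SELECT host, TRUNC(event_time, 'DAY'), SUM(metric_value)
    FROM raw_events
    GROUP BY host, TRUNC(event_time, 'DAY');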

Re: Phoenix in production

2015-01-07 Thread James Taylor
Thanks for sharing your experiences, Vaclav. That's very valuable. Yes, for (1) bad things can happen if a region server doesn't have the Phoenix jar. This was improved as of HBase 0.98.9 with HBASE-12573 and HBASE-12575. For (3), this was fixed as of Phoenix 3.1/4.1 with PHOENIX-1075. If you have

Re: Data type integer representation documentation

2015-01-07 Thread James Taylor
Hi Jamie, This looks like a bug - UNSIGNED_TIMESTAMP should have a unique sql type. Would you mind filing a JIRA? Thanks, James On Wed, Jan 7, 2015 at 2:55 PM, Nick Dimiduk wrote: > Your observation that UNSIGNED_DATE and UNSIGNED_TIMESTAMP share a common > sql type is correct, at least on the 4.

Re: Data type integer representation documentation

2015-01-07 Thread Nick Dimiduk
Your observation that UNSIGNED_DATE and UNSIGNED_TIMESTAMP share a common sql type is correct, at least on the 4.2 branch and on master. On Wed, Jan 7, 2015 at 11:32 AM, Jamie Murray wrote: > Hello, > > > > I am looking to find documentation on the integer representation used by > Phoenix for i

Re: Phoenix in production

2015-01-07 Thread Vaclav Loffelmann
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, here's my 2 cents. We've had a few serious issues before deploying to production. 1) deploying a new server without the Phoenix jar - make sure you have a properly configured tool for automated server maintenance 2) inserting an empty string (byte array) to Phoe
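
Assuming point (2) refers to Phoenix's documented behavior of treating an empty string as NULL (the message is truncated here), the gotcha looks like this sketch against a hypothetical USERS table:

    UPSERT INTO users (id, nickname) VALUES (1, '');
    -- Reads back as NULL, not as an empty string.
    SELECT nickname FROM users WHERE id = 1;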

Data type integer representation documentation

2015-01-07 Thread Jamie Murray
Hello, I am looking to find documentation on the integer representation used by Phoenix for its data types listed here: http://phoenix.apache.org/language/datatypes.html Here is what I have found so far; this came from a combination of java.sql types for the non-unsigned types and some phoenix

Re: Re: Fwd: Phoenix in production

2015-01-07 Thread anil gupta
Yup, I am aware of Spark HBase integration. Phoenix-Spark integration would be even sweeter. :) On Wed, Jan 7, 2015 at 12:40 AM, su...@certusnet.com.cn < su...@certusnet.com.cn> wrote: > Hi Anil, > Well, there are already good open-source projects on GitHub for Spark on > HBase, like the following: >

Re: high CPU when using bulk loading

2015-01-07 Thread Gabriel Reid
Hi Noam, It doesn't sound all that surprising that you're CPU bound on a batch import job like this if you consider everything that is going on within the mappers. Let's say you're importing data for a table with 20 columns. For each line of input data, the following is then occurring within the

Re: RE: high CPU when using bulk loading

2015-01-07 Thread Wangwenli
What kind of disk are you using, SAS or SATA? How much CPU is spent in system vs. user time? Also, can you use jstack to check what the mappers are doing? Are too many mappers being started on one node? Wangwenli From: Bulvik, Noam Date: 2015-01-07 21:29 To: user@ph

RE: high CPU when using bulk loading

2015-01-07 Thread Puneet Kumar Ojha
What is the cluster size and the number of salt buckets? Are you using compression? SNAPPY is recommended. From: Bulvik, Noam [mailto:noam.bul...@teoco.com] Sent: Wednesday, January 07, 2015 7:00 PM To: user@phoenix.apache.org Subject: RE: high CPU when using bulk loading Only when doing bulk load
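
Both of those knobs can be set directly in the Phoenix DDL; a minimal sketch (table name, columns, and bucket count are illustrative only):

    -- SALT_BUCKETS pre-splits the table into one region per bucket;
    -- COMPRESSION is passed through to the underlying HBase table.
    CREATE TABLE bulk_target (
        id VARCHAR NOT NULL PRIMARY KEY,
        payload VARCHAR)
        SALT_BUCKETS = 16, COMPRESSION = 'SNAPPY';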

RE: high CPU when using bulk loading

2015-01-07 Thread Bulvik, Noam
Only when doing bulk loading, and only during the mapping phase -Original Message- From: Puneet Kumar Ojha [puneet.ku...@pubmatic.com] Received: Wednesday, 07 Jan 2015, 15:03 To: user@phoenix.apache.org [user@phoenix.apache.org] Subject: RE: high CPU when using bulk loading Is the CPU usage 100% a

RE: high CPU when using bulk loading

2015-01-07 Thread Puneet Kumar Ojha
Is the CPU usage 100% all the time OR only while doing bulk loading? From: Bulvik, Noam [mailto:noam.bul...@teoco.com] Sent: Wednesday, January 07, 2015 6:26 PM To: user@phoenix.apache.org Subject: high CPU when using bulk loading Hi, We are tuning our system for bulk loading. We managed to

high CPU when using bulk loading

2015-01-07 Thread Bulvik, Noam
Hi, We are tuning our system for bulk loading. We managed to load ~250M records per hour (~96G of raw input CSV data) on a cluster with 8 nodes. We use the MR bulk loading tool with a pre-split table and salted key. What we currently see is that while mappers are working we have 100% CPU usage ac

Re: Select dynamic column content

2015-01-07 Thread Sumanta Gh
Thanks James for replying. The example below is really a smart way to map dynamic columns into static ones. I will implement the idea in another case. But I cannot create these views at schema creation time, and I really have an infinite set of values for event_type. Keeping a column for all dyn

Re: Re: Fwd: Phoenix in production

2015-01-07 Thread Kristoffer Sjögren
We have been using Phoenix 2.2.3 in production for about a year and I agree with the previous comments. - Mainly a storage for temporal OLAP-like data in single tables without secondary indexes. - Data ingested via Pig on an hourly basis. - Heavy usage of composite primary keys using skip-scans whene

Re: Re: Fwd: Phoenix in production

2015-01-07 Thread su...@certusnet.com.cn
Hi Anil, Well, there are already good open-source projects on GitHub for Spark on HBase, like the following: https://github.com/cloudera-labs/SparkOnHBase Phoenix integration should be more convenient based on that. We are considering sharing our code for using that schema. Thanks, Sun. CertusNet

Re: Re: Fwd: Phoenix in production

2015-01-07 Thread anil gupta
Hi Sun, Phoenix-Spark would be a nice add-on if you can open-source it. I am planning to use Spark on HBase for one of my projects. ~Anil On Wed, Jan 7, 2015 at 12:17 AM, su...@certusnet.com.cn < su...@certusnet.com.cn> wrote: > Hi, > spark-phoenix integration would be great as Spark c

Re: Re: Fwd: Phoenix in production

2015-01-07 Thread su...@certusnet.com.cn
Hi, spark-phoenix integration would be great, as the Spark community is very active now and more and more developers are using Apache Spark. Thanks, Sun. From: James Taylor Date: 2015-01-07 16:10 To: su...@certusnet.com.cn Subject: Re: Fwd: Phoenix in production This is great, Sun! Thank yo

Re: Re: Phoenix in production

2015-01-07 Thread su...@certusnet.com.cn
Hi, Glad to share our experience of using Phoenix in production. I believe that Siddharth has done sufficient testing and practice with Phoenix performance. Here are some tips about how we are using Phoenix for our projects: 1. We use Phoenix to provide convenience for both R&D and QA engin

Re: Phoenix in production

2015-01-07 Thread anil gupta
Inline. On Tue, Jan 6, 2015 at 11:54 PM, Justin Workman wrote: > I am also using Phoenix in production and have been now for roughly 6 > months. We adopted Phoenix for most of the same reasons Anil mentions. > > We are connecting to a secure cluster without issue. We have also > implemented our

Re: Phoenix in production

2015-01-07 Thread anil gupta
Hi Siddharth, I haven't used Phoenix with Storm but I have used HBase with Storm. IMHO, it should be fairly simple to write your own connector (code in a Spout/Bolt) to talk to Phoenix. It should mostly be the same code that you use in a standalone Phoenix Java app. Thanks, Anil Gupta On Tue, Jan 6, 20

Re: Select dynamic column content

2015-01-07 Thread James Taylor
Hi Sumanta, Another alternative option is to leverage support for VIEWs in Phoenix ( http://phoenix.apache.org/views.html). In many use cases I've seen where there are hundreds of sparse columns defined for a schema, there's a column that determines *which* sparse columns are applicable for a given
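
To sketch the pattern being described (hypothetical names; see http://phoenix.apache.org/views.html for the actual syntax), the discriminating column drives one view per value, and the type-specific sparse columns live on the view rather than on the base table:

    CREATE TABLE event_log (
        id BIGINT NOT NULL PRIMARY KEY,
        event_type VARCHAR);

    -- One view per event type, created on demand.
    CREATE VIEW click_event AS
    SELECT * FROM event_log WHERE event_type = 'click';

    -- Columns that apply only to clicks are added to the view.
    ALTER VIEW click_event ADD referrer VARCHAR;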