Sorry, we're not on AWS but on bare metal.

On Mon, Mar 9, 2015 at 6:13 PM, Brady, John <john.br...@intel.com> wrote:
> Hi Yohan,
>
> Apologies, I don’t have an answer to your question.
>
> Could I ask a separate question please? Is your cluster on AWS?
>
> I have Apache Phoenix installed on a 5-node cluster with 3 ZooKeeper nodes
> on AWS, also using Phoenix 4.2 with HBase 0.98.6 from CDH5.3.2. I put
> the Phoenix server and client jars on the HBase classpath on all nodes and
> restarted the cluster. The Phoenix command line works on the cluster, and
> running a JDBC app on the cluster returns data.
>
> The problem is that I can’t run a JDBC app from outside the cluster.
>
> I've read at the link below that there is an issue on AWS where internal
> and external IPs get confused and ZooKeeper can't connect to HBase
> properly. Did you have this problem?
>
> http://stackoverflow.com/questions/28676561/apache-phoenix-jdbc-connection-zookeeper-error
>
> As suggested in the link, I solved this by creating aliases in /etc/hosts
> on the machines in the cluster pointing at the internal IP addresses, then
> using the same aliases on my local desktop but pointing at the external IPs.
> I then altered my cluster setup to use the aliases everywhere instead of IP
> addresses. With that, I could run the app on my local machine. But modifying
> the Cloudera config files to point to aliases on the servers ultimately
> breaks Cloudera and isn’t a viable long-term solution.
>
> Thanks
> John
>
> *From:* Yohan Bismuth [mailto:yohan.bismu...@gmail.com]
> *Sent:* Monday, March 09, 2015 5:02 PM
> *To:* user@phoenix.apache.org
> *Subject:* Phoenix table scan performance
>
> Hello,
>
> we're currently using Phoenix 4.2 with HBase 0.98.6 from CDH5.3.2 on our
> cluster and we're experiencing some performance issues.
>
> What we need to do is a full table scan over 1 billion rows. We've got 50
> regionservers and approximately 1000 regions of 1 GB each, evenly
> distributed across these regionservers (which means ~20 regions per
> regionserver). Each node has 14 disks and 12 cores.
>
> A simple "Select count(1) from table" currently takes 400~500 sec.
>
> We noticed that a range scan over 2 regions located on 2 different
> regionservers seems to be done in parallel (taking 15~20 sec), but a range
> scan over 2 regions of a single regionserver takes twice as long (about
> 30~40 sec). We see the same behavior with more than 2 regions.
>
> *Could this mean that parallelization is done at the regionserver level but
> not at the region level?* In that case 400~500 seconds seems plausible with
> 20~25 regions per regionserver. We expected regions of a single
> regionserver to be scanned in parallel. Is this normal behavior, or are we
> doing something wrong?
>
> Thanks for your help
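
As a rough sanity check on the numbers quoted above, the sketch below models the hypothesis from the question: regionservers work in parallel, but each one scans its regions one at a time. The function name and the parallel_regions_per_server parameter are hypothetical and only there for illustration, and the 20 sec per region is taken from the upper end of the 15~20 sec measurement. This is a back-of-the-envelope model under those assumptions, not a description of how Phoenix actually schedules scans.

```python
import math


def estimated_scan_seconds(regions_per_server, seconds_per_region,
                           parallel_regions_per_server=1):
    """Estimate wall-clock scan time under a simple model.

    Assumptions (not taken from Phoenix itself): all regionservers run
    concurrently, and each regionserver scans at most
    `parallel_regions_per_server` regions at the same time.
    """
    # Number of sequential "waves" of region scans on one regionserver.
    waves = math.ceil(regions_per_server / parallel_regions_per_server)
    return waves * seconds_per_region


# Figures from the thread: 20~25 regions per regionserver, ~20 sec per region.
serial_low = estimated_scan_seconds(20, 20)
serial_high = estimated_scan_seconds(25, 20)

# Same cluster if every region of a regionserver were scanned concurrently.
fully_parallel = estimated_scan_seconds(20, 20, parallel_regions_per_server=20)

print(serial_low, serial_high, fully_parallel)
```

With one region at a time per regionserver the model gives 400 to 500 sec, which is in line with the observed count(1) time. If all 20 regions of a regionserver were scanned concurrently, the same model would give about 20 sec.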