Thanks Thomas. My table has about 10 billion rows and about 12 columns.
-----Original Message-----
From: Thomas D'Silva [mailto:[email protected]]
Sent: Wednesday, July 22, 2015 12:51 PM
To: [email protected]
Subject: Re: How fast is upsert select?

Zack,

It depends on how wide the rows are in your table. On an 8-node cluster, creating an index with 3 columns (char(15), varchar, and date) on a 1 billion row table takes about 1 hour 15 minutes. How many rows does your table have, and how wide are they?

On Wed, Jul 22, 2015 at 8:29 AM, Riesland, Zack <[email protected]> wrote:
> Thanks Ravi,
>
> I think I may not have IndexTool in my version of Phoenix.
>
> I'm calling:
> HADOOP_CLASSPATH=/usr/hdp/current/hbase-master/conf/:/usr/hdp/current/hbase-master/lib/hbase-protocol.jar hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar org.apache.phoenix.mapreduce.index.IndexTool
>
> And getting a java.lang.ClassNotFoundException: org.apache.phoenix.mapreduce.index.IndexTool
>
> From: Ravi Kiran [mailto:[email protected]]
> Sent: Wednesday, July 22, 2015 10:36 AM
> To: [email protected]
> Subject: Re: How fast is upsert select?
>
> Hi,
>
> Since you are dealing with billions of rows, why don't you try the MapReduce route to speed up the process? You can take a look at how IndexTool.java (https://github.com/apache/phoenix/blob/359c255ba6c67d01a810d203825264907f580735/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexTool.java) was written, as it does a similar task of reading from a Phoenix table and writing the data into the target table using bulk load.
>
> Regards,
> Ravi
>
> On Wed, Jul 22, 2015 at 6:23 AM, Riesland, Zack <[email protected]> wrote:
>
> I want to play with some options for splitting a table to test performance.
>
> If I were to create a new table and perform an upsert select * to the table, with billions of rows in the source table, is that like an overnight operation or should it be pretty quick?
>
> For reference, we have 6 (beefy) region servers in our cluster.
>
> Thanks!
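For readers following the thread: the pre-split-and-copy experiment Zack describes could be sketched roughly as below in Phoenix SQL. This is a minimal sketch, not his actual schema; the table and column names are hypothetical, and SALT_BUCKETS is just one common way to pre-split a new Phoenix table across region servers.

```sql
-- Hypothetical pre-split target table; the schema must mirror the source
-- so that UPSERT ... SELECT * lines up column-for-column.
-- SALT_BUCKETS spreads writes across regions (here, 6 buckets per
-- region server on a 6-server cluster) to avoid hotspotting one region.
CREATE TABLE SENSOR_DATA_SPLIT (
    SENSOR_ID   CHAR(15) NOT NULL,
    READING_TS  DATE     NOT NULL,
    VAL         DECIMAL,
    CONSTRAINT PK PRIMARY KEY (SENSOR_ID, READING_TS)
) SALT_BUCKETS = 36;

-- Copy every row from the existing (hypothetical) source table:
UPSERT INTO SENSOR_DATA_SPLIT SELECT * FROM SENSOR_DATA;
```

With auto-commit enabled, Phoenix commits the copied rows in batches rather than buffering the whole result client-side; even so, at billions of rows a single UPSERT SELECT can take a long time, which is why Ravi suggests the MapReduce bulk-load route in the thread.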
