Re: Error with lines ended with backslash when Bulk Data Loading

2016-12-08 Thread Gabriel Reid
ema name (optional) > -z,--zookeeper Zookeeper quorum to connect to (optional) > -it,--index-table Index table name to load (optional) > > > > > From: Gabriel Reid > Subject: Re: Error with lines ended with backslash when Bulk Data Loading

Re: Error with lines ended with backslash when Bulk Data Loading

2016-12-08 Thread Gabriel Reid
Hi Backslash is the default escape character that is used for parsing CSV data when running a bulk import, so it has a special meaning. You can supply a different (custom) escape character with the -e or --escape flag on the command line so that parsing your CSV files that include backslashes lik
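
A minimal sketch of such an invocation (jar name, table name, and input path are placeholders; only the -e/--escape flag is the point here):

    hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        --table MY_TABLE \
        --input /data/input_with_backslashes.csv \
        --escape '|'

Picking '|' (or any other character that never occurs in the data) as the escape character means a trailing backslash on a line is loaded as a literal backslash.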

Re: Issue w/ CsvBulkUploadTool when column data has "," character

2016-10-08 Thread Gabriel Reid
Hi Zack, My initial gut feeling is that this doesn't have anything to do with the commas in the input data, but it looks like instead the pipe separator isn't being taken into account. Has this been working for you with other data files? I've got more questions than answers to start with: * Whic

Re: Using CsvBulkInsert With compressed Hive data

2016-09-29 Thread Gabriel Reid
Hi Zack, Am I correct in understanding that the files are under a structure like x/.deflate/csv_file.csv? In that case, I believe everything under the .deflate directory will simply be ignored, as directories whose names start with a period are considered "hidden" files. However, assuming the dat

Re: Loading via MapReduce, Not Moving HFiles to HBase

2016-09-29 Thread Gabriel Reid
t; > R’s > > Ravi Kumar B > > > > *From:* Gabriel Reid [mailto:gabriel.r...@gmail.com] > *Sent:* Wednesday, September 28, 2016 5:51 PM > *To:* user@phoenix.apache.org > *Subject:* Re: Loading via MapReduce, Not Moving HFiles to HBase > > > > Hi Ravi, > > >

Re: Loading via MapReduce, Not Moving HFiles to HBase

2016-09-28 Thread Gabriel Reid
Hi Ravi, It looks like those log file entries you posted are from a mapreduce task. Could you post the output of the command that you're using to start the actual job (i.e. console output of "hadoop jar ..."). - Gabriel On Wed, Sep 28, 2016 at 1:49 PM, Ravi Kumar Bommada wrote: > Hi All, > > I

Re: Phoenix and HBase data type serialization issue

2016-08-24 Thread Gabriel Reid
g the PDataType is some kind of equivalent > and that this is part of the Phoenix JDBC (fat) driver? > > Thanks, > Ryan > > > > >> On 8/24/16, 3:01 AM, "Gabriel Reid" wrote: >> >> Hi Ankit, >> >> All data stored in HBase is stored in th

Re: Phoenix and HBase data type serialization issue

2016-08-24 Thread Gabriel Reid
ards, > ANKIT BEOHAR > > > On Wed, Aug 24, 2016 at 1:31 PM, Gabriel Reid > wrote: >> >> Hi Ankit, >> >> All data stored in HBase is stored in the form of byte arrays. The >> conversion from richer types (e.g. date) to byte arrays is one of the >> (m

Re: Phoenix and HBase data type serialization issue

2016-08-24 Thread Gabriel Reid
Hi Ankit, All data stored in HBase is stored in the form of byte arrays. The conversion from richer types (e.g. date) to byte arrays is one of the (many) functionalities included in Phoenix. When you add a date value in the form of a string to HBase directly (bypassing Phoenix), you're simply sav

Re: Can't find some of our Phoenix primary keys in our HBase table when looking for row keys

2016-08-23 Thread Gabriel Reid
Hi Tom, What's the primary key definition of your table? Does it have salted row keys? In the first example (the one that works) I see a leading byte on the row key, which makes me think that you're using salting. In the second example (the one that isn't working) I see the leading "\x00" being a

Re: JDBC Client issue

2016-08-21 Thread Gabriel Reid
Hi Aaron, I feel like I've seen this one before, but I'm not totally sure. What I would look at first is a possible hbase-xxx version issue. Something along these lines that I've seen in the past is that another uber-jar JDBC driver that is on the classpath also contains hbase or zookeeper classe

Re: CsvBulkLoadTool with ~75GB file

2016-08-19 Thread Gabriel Reid
suspect the insert does MapReduce as well or is there some other mechanism > that would scale? > (8) Compaction/Statistics Operation on Aggregate Table > > I really appreciate all the support. We are trying to run a Phoenix TPCH > benchmark and are struggling a bit to underst

Re: CsvBulkLoadTool with ~75GB file

2016-08-19 Thread Gabriel Reid
0) > at > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:350) > at > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:324) > at > org.a

Re: CsvBulkLoadTool with ~75GB file

2016-08-18 Thread Gabriel Reid
Hi Aaron, I'll answer your questions directly first, but please see the bottom part of this mail for important additional details. You can specify the "hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily" parameter (referenced from your StackOverflow link) on the command line of your CsvBulk
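
A hedged sketch of passing that parameter on the command line (the value 128, the table name, and the input path are illustrative only; -D options go before the tool's own arguments):

    hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily=128 \
        --table MY_TABLE \
        --input /data/my_75gb_file.csv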

Re: Invoking org.apache.phoenix.mapreduce.CsvBulkLoadTool from phoenix-4.4.0.2.4.0.0-169-client.jar is not working properly

2016-08-03 Thread Gabriel Reid
Hi Radha, This looks to me as if there is an issue in your data somewhere past the first 100 records. The bulk loader isn't supposed to fail due to issues like this. Instead, it's intended to simply report the problem input lines and continue on, but it appears that this isn't happening. Could yo

Re: Phoenix 4.7 CSVBulk Loading not populating index tables

2016-06-17 Thread Gabriel Reid
Hi Vikash, If I'm not mistaken, the bulk load tool was changed in 4.7 to populate the main table and index tables in a single job (instead of one job per table). However, based on what you're seeing, it sounds like there's a problem with this change. Could you verify that only one index table wa

Re: error Loading via MapReduce

2016-05-11 Thread Gabriel Reid
hbase-site.xml. >> You may take a look at that property. >> >> Thanks, >> Sandeep Nemuri >> >> On Wed, May 11, 2016 at 1:49 PM, kevin wrote: >> >>> Thanks, I didn't find the fs.defaultFS property being overwritten. And I hav

Re: error Loading via MapReduce

2016-05-10 Thread Gabriel Reid
-all-1.8.jar: >> >> /home/dcos/hadoop-2.7.1/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar: >> /home/dcos/hadoop-2.7.1/share/hadoop/mapreduce/lib/javax.inject-1.jar: >> >> /home/dcos/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.1.jar: &g

Re: error Loading via MapReduce

2016-05-10 Thread Gabriel Reid
t; > dfs.permissions > false > > > 2016-05-10 14:32 GMT+08:00 kevin : >> >> thanks, what I use is from apache. and hadoop ,hbase was in cluster model >> with one master and three slaves >> >> 2016-05-10 14:17 GMT+08:00 Gabriel Reid : >>> >

Re: error Loading via MapReduce

2016-05-09 Thread Gabriel Reid
Hi, It looks like your setup is using a combination of the local filesystem and HDFS at the same time, so this looks to be a general configuration issue. Are you running on a real distributed cluster, or a single-node setup? Is this a vendor-based distribution (i.e. HDP or CDH), or apache release

Re: how to tune phoenix CsvBulkLoadTool job

2016-03-19 Thread Gabriel Reid
m. > But, how would it impact the aggregate queries? > > Vamsi Attluri > > On Wed, Mar 16, 2016 at 9:06 AM Gabriel Reid wrote: >> >> Hi Vamsi, >> >> The first thing that I notice looking at the info that you've posted >> is that you have 13 nodes a

Re: how to tune phoenix CsvBulkLoadTool job

2016-03-19 Thread Gabriel Reid
Hi Vamsi, The first thing that I notice looking at the info that you've posted is that you have 13 nodes and 13 salt buckets (which I assume also means that you have 13 regions). A single region is the unit of parallelism that is used for reducers in the CsvBulkLoadTool (or HFile-writing MapReduc
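
To illustrate the parallelism point, a table created with more salt buckets gives the HFile-writing reduce phase more regions to write to in parallel; the schema and bucket count below are hypothetical:

    CREATE TABLE my_big_table (
        id BIGINT NOT NULL PRIMARY KEY,
        val VARCHAR
    ) SALT_BUCKETS = 52;  -- e.g. 4 buckets per node on a 13-node cluster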

Re: Does phoenix CsvBulkLoadTool write to WAL/Memstore

2016-03-15 Thread Gabriel Reid
Hi Vamsi, I can't answer your question about the Phoenix-Spark plugin (although I'm sure that someone else here can). However, I can tell you that the CsvBulkLoadTool does not write to the WAL or to the Memstore. It simply writes HFiles and then hands those HFiles over to HBase, so the memstore a

Re: leveraging hive.hbase.generatehfiles

2016-02-24 Thread Gabriel Reid
Hi Zack, If bulk loading is currently slow or error prone, I don't think that this approach would improve the situation. From what I understand from that link, this is a way to copy the contents of a Hive table into HFiles. Hive operates via mapreduce jobs, so this is technically a map reduce jo

Re: PhoenixBulkLoader command as generic user.

2016-01-30 Thread Gabriel Reid
> James > > On Fri, Jan 29, 2016 at 11:03 PM, Parth Sawant > wrote: >> >> Hi Gabriel, >> This worked perfectly. >> >> Thanks a lot. >> Parth S >> >> On Fri, Jan 29, 2016 at 10:29 PM, Gabriel Reid >> wrote: >>> >>&

Re: PhoenixBulkLoader command as generic user.

2016-01-29 Thread Gabriel Reid
Hi Parth, Setting the "fs.permissions.umask-mode" config setting to "000" should do the job. You can do this in your hadoop config on the machine where you're submitting the job, or just supply it as the leading command-line parameter as follows: hadoop jar phoenix-client.jar org.apache.phoen
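
A sketch of the full command with the umask override supplied as the leading parameter (table name and input path are placeholders):

    hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dfs.permissions.umask-mode=000 \
        --table MY_TABLE \
        --input /data/input.csv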

Re: Unable to Use bulkloader to load Control-A delimited file

2015-12-31 Thread Gabriel Reid
.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1444) >>> >>> Is my bulkloader command incorrect? >>> >>> >>> Thanks, >>> Anil Gupta >>> >>> On Wed, Dec 30, 2015 at 11:23 AM, anil gupta >>> wrote: >>>> &g

Re: Unable to Use bulkloader to load Control-A delimited file

2015-12-29 Thread Gabriel Reid
Hi Anil, This issue was resolved a while back, via this ticket: https://issues.apache.org/jira/browse/PHOENIX-2238 Unfortunately, that fix is only available starting from Phoenix 4.6 and 4.5.3 (i.e. it wasn't back-ported to 4.4.x). - Gabriel On Wed, Dec 30, 2015 at 1:21 AM, anil gupta wrote: >

Re: [EXTERNAL] Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-19 Thread Gabriel Reid
On Fri, Dec 18, 2015 at 9:35 PM, Cox, Jonathan A wrote: > > The Hadoop version is 2.6.2. > I'm assuming the reduce phase is failing with the OOME, is that correct? Could you run "jps -v" to see what the full set of JVM parameters are for the JVM that is running the task that is failing? I can't

Re: [EXTERNAL] Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Gabriel Reid
different ways (just to be sure): > > export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx48g" > > and also in mapred-site.xml: > > mapred.child.java.opts > -Xmx48g > > > -Original Message- > From: Gabriel Reid [mailto:g

Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Gabriel Reid
On Fri, Dec 18, 2015 at 4:31 PM, Riesland, Zack wrote: > We are able to ingest MUCH larger sets of data (hundreds of GB) using the > CSVBulkLoadTool. > > However, we have found it to be a huge memory hog. > > We dug into the source a bit and found that > HFileOutputFormat.configureIncrementalLoa

Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Gabriel Reid
Hi Jonathan, Sounds like something is very wrong here. Are you running the job on an actual cluster, or are you using the local job tracker (i.e. running the import job on a single computer). Normally an import job, regardless of the size of the input, should run with map and reduce tasks that h

Re: Help calling CsvBulkLoadTool from Java Method

2015-12-18 Thread Gabriel Reid
Hi Jonathan, It looks like this is a bug that was relatively recently introduced in the bulk load tool (i.e. that the exit status is not correctly reported if the bulk load fails). I've logged this as a jira ticket: https://issues.apache.org/jira/browse/PHOENIX-2538. This means that for now, ther

Re: CsvBulkUpload not working after upgrade to 4.6

2015-12-12 Thread Gabriel Reid
This looks like an incompatibility between HBase versions (i.e. between the version that Phoenix is built against, and the version that you've got installed on your cluster). The reason that the bulk loader and fat client are causing issues is that they include the linked versions of the hbase jar

Re: Index rows are not updated when the index key updated using bulk loader

2015-12-12 Thread Gabriel Reid
Hi Afshin, That looks like a bug to me, although I'm not too confident about coming up with a good fix for it. There isn't any handling in the bulk load tool for multiple updates to the same row in a single input. Basically, the code assumes that a single given row is only included once in any gi

Re: Phoenix 4.4 does not accept null date value in bulk load

2015-11-25 Thread Gabriel Reid
ch > also take care of date column? > > > On Wed, Nov 25, 2015 at 3:33 AM, Gabriel Reid > wrote: > >> Indeed, this was a regression. It has since been fixed in PHOENIX-1277 >> [1], and is available in Phoenix 4.4.1 and Phoenix 4.5.0. >> >> - Gabriel >>

Re: Phoenix 4.4 does not accept null date value in bulk load

2015-11-25 Thread Gabriel Reid
Indeed, this was a regression. It has since been fixed in PHOENIX-1277 [1], and is available in Phoenix 4.4.1 and Phoenix 4.5.0. - Gabriel 1. https://issues.apache.org/jira/browse/PHOENIX-1277 On Wed, Nov 25, 2015 at 4:07 AM, 彭昶勳 wrote: > Hi, > > In Phoenix-4.3.0 or later version, They change

Re: Kerberos and bulkload

2015-11-18 Thread Gabriel Reid
Re-adding the user list, which I accidentally left off. On Wed, Nov 18, 2015 at 3:55 PM, Gabriel Reid wrote: > Yes, I believe that's correct, if you change the umask you make the > HFiles readable to all during creation. > > I believe that the alternate solutions listed o

Re: Kerberos and bulkload

2015-11-17 Thread Gabriel Reid
is the correct > thing to do ? > > conf.set("fs.permissions.umask-mode", "000"); > > > Thanks Again > > Sanooj > > On Wed, Nov 18, 2015 at 12:29 AM, Gabriel Reid > wrote: >> >> Hi Sanooj, >> >> I believe that this is rela

Re: Kerberos and bulkload

2015-11-17 Thread Gabriel Reid
Hi Sanooj, I believe that this is related to the issue described in PHOENIX-976 [1]. In that case, it's not strictly related to Kerberos, but instead to file permissions (could it be that your dev environment also doesn't have file permissions turned on?) If you look at the comments on that jira

Re: replace CsvToKeyValueMapper with my implementation

2015-10-29 Thread Gabriel Reid
On Thu, Oct 29, 2015 at 6:33 PM, James Taylor wrote: > I seem to remember you starting down that path, Gabriel - a kind of > pluggable transformation for each row. It wasn't pluggable on the input > format, but that's a nice idea too, Ravi. I'm not sure if this is what Noam > needs or if it's some

Re: replace CsvToKeyValueMapper with my implementation

2015-10-29 Thread Gabriel Reid
Hi Noam, That specific piece of code in CsvBulkLoadTool that you referred to allows packaging the CsvBulkLoadTool within a different job jar file, but won't allow setting a different mapper class. The actual setting of the mapper class is done further down in the submitJob method, specifically the

Re: Help With CSVBulkLoadTool

2015-10-23 Thread Gabriel Reid
:02:19 WARN hbase.HBaseConfiguration: Config option > "hbase.regionserver.lease.period" is deprecated. Instead, use > "hbase.client.scanner.timeout.period" > 15/10/23 06:02:19 INFO util.ChecksumType: Checksum using > org.apache.hadoop.util.PureJavaCrc32 > 15/1

Re: Help With CSVBulkLoadTool

2015-10-23 Thread Gabriel Reid
oot cause of my job actually failing. > > I certainly never noticed this before, though. > > The main things that we have changed since these scripts worked cleanly were > upgrading our stack and adding new region servers. > > Does that help at all? > > -Original Message---

Re: Help With CSVBulkLoadTool

2015-10-22 Thread Gabriel Reid
Hi Zack, I can't give you any information about compatibility of a given Phoenix version with a given version of HDP (because I don't know). However, could you give a bit more info on what you're seeing? Are all import jobs failing with this error for a given set of tables? Or is this a random fa

Re: how to unsubscribe?

2015-09-30 Thread Gabriel Reid
Please send a message to user-unsubscr...@phoenix.apache.org to unsubscribe from this list. - Gabriel On Wed, Sep 30, 2015 at 3:05 PM, Ashutosh Sharma wrote: > > > -- > With best Regards: > Ashutosh Sharma

Re: Connecting to Phoenix from AquaDataStudio?

2015-09-30 Thread Gabriel Reid
sterIdZNode(ZKClusterId.java:65) > at > org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:106) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:858) >

Re: Connecting to Phoenix from AquaDataStudio?

2015-09-30 Thread Gabriel Reid
Hi Zack, It looks like there is probably an older version of HBase somewhere (earlier) in the classpath. I don't know anything about Aqua Data Studio, but could it be that it somehow bundles support for HBase 0.94 somewhere (or perhaps there is another JDBC driver on the class path that works wi

Re: Problems with Phoenix SqlLine loading large amounts of data

2015-09-28 Thread Gabriel Reid
Hi Gaurav, Looking at your DDL statement, I'm guessing that your table is currently made up of 33 regions, which means that the time to do a full count query will take at least as long as it takes to count 27 million rows with a single thread (900 million rows divided by 33 regions). The most-

Re: Capacity Scheduler Queues in CSVBulkLoadTool?

2015-09-24 Thread Gabriel Reid
Hi Zack, I've never actually tried it, but I don't see any reason why it shouldn't work. Starting the job with the parameter -Dmapreduce.job.queuename= should do the trick, assuming everything else is set up. - Gabriel On Thu, Sep 24, 2015 at 7:05 PM, Riesland, Zack wrote: > Hello, > > > > Can
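
A minimal sketch, assuming a capacity-scheduler queue named etl (queue name, table, and input path are hypothetical):

    hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.job.queuename=etl \
        --table MY_TABLE \
        --input /data/input.csv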

Re: BulkloadTool issue even after successful HfileLoads

2015-09-23 Thread Gabriel Reid
>> first=\x80\x00\x00\x00\x00\x0009 last=\x80\x00\x00\x00\x00\x01\x092 > > > That's why we explicitly provided --output as hdfs:// and things at least > worked. > > -- > Dhruv > > On Wednesday 23 September 2015 06:54 PM, Gabriel Reid wrote: >> >&g

Re: BulkloadTool issue even after successful HfileLoads

2015-09-23 Thread Gabriel Reid
Hi Dhruv, This is a bug in Phoenix, although it appears that your hadoop configuration is also somewhat unusual. As far as I can see, your hadoop configuration is set up to use the local filesystem, and not hdfs. You can test this by running the following command: hadoop dfs -ls / If that c

Re: Using Phoenix Bulk Upload CSV to upload 200GB data

2015-09-19 Thread Gabriel Reid
; total of more than 3.2 GB temp space is required. > > I will of course look at using compression of map output - but just wanted > to check if this is expected behavior on workloads of this size. > > Thanks > Gaurav > > > > On 16 September 2015 at 12:21, Gaurav Kanade >

Re: Using Phoenix Bulk Upload CSV to upload 200GB data

2015-09-16 Thread Gabriel Reid
ctlynvnzlysu3nnyyhqmcwjbee.gx.internal.cloudapp.net:19888/jobhistory/singlejobcounter/job_1442389862209_0002/Shuffle%20Errors/WRONG_LENGTH> > 0 > 0 0 WRONG_MAP > <http://headnode0.ctlynvnzlysu3nnyyhqmcwjbee.gx.internal.cloudapp.net:19888/jobhistory/singlejobcounter/job_1442389862209

Re: Using Phoenix Bulk Upload CSV to upload 200GB data

2015-09-16 Thread Gabriel Reid
le ? > > Thanks > Gaurav > > > On 12 September 2015 at 11:16, Gabriel Reid wrote: >> >> Around 1400 mappers sounds about normal to me -- I assume your block >> size on HDFS is 128 MB, which works out to 1500 mappers for 200 GB of >> input. >> >>

Re: Question about IndexTool

2015-09-16 Thread Gabriel Reid
not lexically larger than previous > key" > > Thanks a lot! > > > On 15 September 2015 at 19:46, Gabriel Reid wrote: >> >> The upsert statements in the MR jobs are used to convert data into the >> appropriate encoding for writing to an HFile -- the data doe

Re: Question about IndexTool

2015-09-15 Thread Gabriel Reid
The upsert statements in the MR jobs are used to convert data into the appropriate encoding for writing to an HFile -- the data doesn't actually get pushed to Phoenix from within the MR job. Instead, the created KeyValues are extracted from the "output" of the upsert statement, and the statement is

Re: Renaming a column in phoenix

2015-09-12 Thread Gabriel Reid
There isn't a command available to rename or change the data type of a column in Phoenix -- to do something like this, you need to drop a column and then create a new column. If you have existing data that you want to migrate, I would suggest doing the following: 1. Create the new column (with the
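
A sketch of that migration in Phoenix SQL (table, column names, and types are hypothetical):

    ALTER TABLE my_table ADD new_col VARCHAR;   -- 1. create the new column
    UPSERT INTO my_table (id, new_col)          -- 2. copy the existing data across
        SELECT id, old_col FROM my_table;
    ALTER TABLE my_table DROP COLUMN old_col;   -- 3. drop the old column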

Re: Using Phoenix Bulk Upload CSV to upload 200GB data

2015-09-12 Thread Gabriel Reid
Around 1400 mappers sounds about normal to me -- I assume your block size on HDFS is 128 MB, which works out to 1500 mappers for 200 GB of input. To add to what Krishna asked, can you be a bit more specific on what you're seeing (in log files or elsewhere) which leads you to believe the data nodes

Re: Upsert command with prepared statement syntax

2015-09-10 Thread Gabriel Reid
Hi, Using prepared statements with Phoenix is basically the same as using insert or update statements with any other JDBC driver (see https://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html for more details on this in general). The one difference is that the actual SQL syntax that you u
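
A minimal JDBC sketch, assuming a hypothetical table MY_TABLE(ID BIGINT PRIMARY KEY, NAME VARCHAR) and a ZooKeeper quorum on localhost:

    import java.sql.*;

    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
         PreparedStatement stmt = conn.prepareStatement(
                 "UPSERT INTO my_table (id, name) VALUES (?, ?)")) {
        stmt.setLong(1, 42L);
        stmt.setString(2, "example");
        stmt.executeUpdate();
        conn.commit();  // Phoenix connections do not auto-commit by default
    }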

Re: Help Tuning CsvBulkImport MapReduce

2015-09-01 Thread Gabriel Reid
On Tue, Sep 1, 2015 at 11:29 AM, Riesland, Zack wrote: > You say I can find information about spills in the job counters. Are you > talking about “failed” map tasks, or is there something else that will help > me identify spill scenarios? "Spilled records" is a counter that is available at the jo

Re: Help Tuning CsvBulkImport MapReduce

2015-09-01 Thread Gabriel Reid
On Tue, Sep 1, 2015 at 3:04 AM, Behdad Forghani wrote: > In my experience the fastest way to load data is directly write to HFile. I > have measured a performance gain of 10x. Also, if you have binary data or > need to escape characters HBase bulk loader does not escape characters. For > my use

Re: Help Tuning CsvBulkImport MapReduce

2015-08-31 Thread Gabriel Reid
If the bulk of the time is being spent in the map phase, then there probably isn't all that much that can be done in terms of tuning that will make a huge difference. However, there may be a few things to look at. You mentioned that HDFS decided to translate the hive export to 257 files -- do you

Re: Configuring phoenix.query.dateFormatTimeZone

2015-08-20 Thread Gabriel Reid
responding to > 2015-08-12 10:20:21.125 directly at Phoenix? > > Thanks > On 18 Aug 2015 at 21:48, "Gabriel Reid" wrote: > > Ok, thanks for those snippets -- I think that's enough to explain what is >> happening. The biggest cause for confusion here is pr

Re: Configuring phoenix.query.dateFormatTimeZone

2015-08-18 Thread Gabriel Reid
mple, if I insert the timestamp corresponding to 1-1-2015 > 10:00:00, > > the inserted timestamp column would be 1-1-2015 07:00:00 and so on. > > FYI, I use the DateTimeFormatter class for converting the date (which > comes > > with GMT+3 suffix) to a TimeStamp object as above

Re: Configuring phoenix.query.dateFormatTimeZone

2015-08-17 Thread Gabriel Reid
responding to 1-1-2015 10:00:00, > the inserted timestamp column would be 1-1-2015 07:00:00 and so on. > FYI, I use the DateTimeFormatter class for converting the date (which comes > with GMT+3 suffix) to a TimeStamp object as above for inserting the date as > a TimeStamp. > > - Da

Re: Configuring phoenix.query.dateFormatTimeZone

2015-08-14 Thread Gabriel Reid
Hi David, How are you upserting timestamps? The phoenix.query.dateFormatTimeZone config property only affects string parsing or the TO_DATE function (docs on this are at [1]). If you're using the TO_DATE function, it's also possible to supply a custom time zone in the function call (docs on this a
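
A hedged example of the second option, passing the time zone directly to TO_DATE (table, column, format pattern, and zone are illustrative):

    SELECT * FROM my_table
    WHERE created_at >= TO_DATE('2015-08-12 10:20:21', 'yyyy-MM-dd HH:mm:ss', 'GMT+3');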

Re: Understanding keys

2015-07-23 Thread Gabriel Reid
Filtering a query on the leading columns of the primary key (i.e. [A], [A,B], or [A,B,C]) will give optimal performance. This is because the records are in sorted order based on the combination of [A,B,C], so filtering on a leading subset of the primary key is basically the same as filtering on the
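
A short illustration, assuming a hypothetical table declared with CONSTRAINT pk PRIMARY KEY (A, B, C):

    SELECT * FROM my_table WHERE A = 'x' AND B = 'y';  -- leading prefix of the key: efficient range scan
    SELECT * FROM my_table WHERE C = 'z';              -- no leading prefix: full table scan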

Re: Problems with casts and TO_DATE in WHERE clauses in views

2015-07-19 Thread Gabriel Reid
Hi Tom, I've tried your SQL statements with 4.3.1, and the initial one does indeed work in 4.3.1 and later. The view definition that includes a CAST statement still fails (due to a bug in CastParseNode for which I'll post a patch shortly). By the way, the way I tested this (and an easy way to te

Re: Permissions Question

2015-07-07 Thread Gabriel Reid
Hi Zack, There are two options that I know of, and I think that both of them should work. First is that you can supply a custom output directory to the bulk loader using the -o parameter (see http://phoenix.apache.org/bulk_dataload.html). In this way you can ensure that the output directory doesn

Re: Help w/ connection issue

2015-07-06 Thread Gabriel Reid
Are you supplying the -z parameter with the ZooKeeper quorum? This will be necessary if ZooKeeper isn't running on the localhost and/or isn't configured in the local configuration (see http://phoenix.apache.org/bulk_dataload.html). - Gabriel On Mon, Jul 6, 2015 at 12:08 PM Riesland, Zack wrote:
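
For example (quorum hosts, table name, and input path below are placeholders):

    hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -z zk1.example.com,zk2.example.com,zk3.example.com:2181 \
        --table MY_TABLE \
        --input /data/input.csv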

Re: Bug in CsvBulkLoad tool?

2015-06-25 Thread Gabriel Reid
correct? > > > > That would clear up a lot of confusion… > > > > *From:* Gabriel Reid [mailto:gabriel.r...@gmail.com] > *Sent:* Thursday, June 25, 2015 2:44 PM > > > *To:* user@phoenix.apache.org > *Subject:* Re: Bug in CsvBulkLoad tool? > > > >

Re: Bug in CsvBulkLoad tool?

2015-06-25 Thread Gabriel Reid
un the CsvBulkLoad tool at the command line and > have the same SSH window open when it finishes, it is easy to see all the > statistics. > > > > But where I can find this data in the logs? Since these ingests can take > several hours, I sometimes lose my VPN connection and my

Re: Bug in CsvBulkLoad tool?

2015-06-25 Thread Gabriel Reid
Hi Zack, No, you don't need to worry about the name of the primary key getting in the way of the rows being added. Like Anil pointed out, the best thing to look at first is the job counters. The relevant ones for debugging this situation are the total map inputs and total map outputs, total reduc

Re: SocketTimeoutException on Update Statistics

2015-06-24 Thread Gabriel Reid
low. > > What more do I need to do to increase this timeout effectively? > > > > > > phoenix.query.timeoutMs > > 90 > > > > > > > > Error: org.apache.phoenix.exception.PhoenixIOException: Failed after > attempts=36,

Re: What's column family name for columns of table created by phoniex create table statement without a specific cf name?

2015-06-23 Thread Gabriel Reid
The default column family name is "0". This is the string containing the character representation of zero (or in other words, a single byte with value 48). And yes, it's possible to read Phoenix tables using the HBase API (although it's of course a lot easier if you go via Phoenix). - Gabriel O
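
A small sketch of reading such a table through the plain HBase client API (HBase 1.x style; table name and row key are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    try (Connection hbaseConn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = hbaseConn.getTable(TableName.valueOf("MY_TABLE"))) {
        Get get = new Get(Bytes.toBytes("some-row-key"));  // hypothetical row key
        get.addFamily(Bytes.toBytes("0"));                 // "0" is the default Phoenix column family
        Result result = table.get(get);
        System.out.println(result);
    }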

Re: CsvBulkLoad output questions

2015-06-23 Thread Gabriel Reid
a way to modify my existing table, or would I have to drop it and > start over? > > > > *From:* Gabriel Reid [mailto:gabriel.r...@gmail.com] > *Sent:* Tuesday, June 23, 2015 1:47 PM > *To:* user@phoenix.apache.org > *Subject:* Re: CsvBulkLoad output questions > > > >

Re: CsvBulkLoad output questions

2015-06-23 Thread Gabriel Reid
ws.hasNext(SqlLine.java:2440) > > at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2074) > > at sqlline.SqlLine.print(SqlLine.java:1735) > > at sqlline.SqlLine$Commands.execute(SqlLine.java:3683) > > at sqlline.SqlLine$Commands.sql(SqlL

Re: How To Count Rows In Large Phoenix Table?

2015-06-22 Thread Gabriel Reid
Hi Zack, Would it be possible to provide a few more details on what kinds of failures that you're getting, both with the CsvBulkLoadTool, and with the "SELECT COUNT(*)" query? About question #1, there aren't any known bugs (that I'm aware of) that would cause some records to go missing in the Csv

Re: Date parsing error, or user error.

2015-05-26 Thread Gabriel Reid
ache/phoenix/blob/master/phoenix-core/src/it/java/org/apache/phoenix/mapreduce/CsvBulkLoadToolIT.java#L99-L100 > > On Sat, May 16, 2015 at 9:14 AM, Gabriel Reid > wrote: > >> Hi Nick, >> >> The date format is (if I'm not mistaken) ISO-8601, so I think you'll

Re: Date parsing error, or user error.

2015-05-16 Thread Gabriel Reid
Hi Nick, The date format is (if I'm not mistaken) ISO-8601, so I think you'll have to format your date values as 1970-01-01. - Gabriel On Fri, May 15, 2015 at 02:02 Nick Dimiduk wrote: > Heya, > > Giving the RC a spin, and also investigating the impact of HBASE-13604, I'm > having a spot of tr

Re: select w/ limit hanging on large tables

2015-05-12 Thread Gabriel Reid
Hi Kiru, How many regions are there on this table? Could you also share some information on the schema of the table (e.g. how many columns are defined)? Does a "limit 10" query also hang in this table? Could you also elaborate a bit on the issues you were running into when loading data into the

Re: TO_DATE is not working as expected

2015-05-04 Thread Gabriel Reid
Hi Siva, Yes, that's pretty much correct -- TO_DATE is returning a Date value, which has millisecond granularity -- the fact that you're only seeing a date (with no time component) is due to the way in which the Date is formatted, and not its internal value. - Gabriel On Sun, May 3, 2015 at 5:2

Re: TO_DATE is not working as expected

2015-05-01 Thread Gabriel Reid
Hi Siva, The TO_DATE function returns a java.sql.Date value, and the string representation of the java.sql.Date value is what you're seeing in your sqlline session. The internal long representation of the Date value coming out of TO_DATE will represent the date to millisecond granularity however.

Re: CsvBulkLoadTool question

2015-04-23 Thread Gabriel Reid
Hi Kiru, The CSV bulk loader won't automatically make multiple regions for you, it simply loads data into the existing regions of the table. In your case, it means that all data has been loaded into a single region (as you're seeing), which means that any kind of operations that scan over a large

Re: Timezones in Phoenix

2015-04-16 Thread Gabriel Reid
Hi Matt, How are you viewing the timestamps (or in other words, how are you verifying that they're not in GMT)? The reason I ask is because internally in Phoenix, timestamps are used without a timezone (they're just based on a long, as you've saved in your table). However, the java.sql.Timestamp'

Re: group by problem

2015-04-06 Thread Gabriel Reid
That certainly looks like a bug. Would it be possible to make a small reproducible test case and if possible, log this in the Phoenix JIRA ( https://issues.apache.org/jira/browse/PHOENIX) ? - Gabriel On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka wrote: > Hi All, > I came across a weird situati

Re: bulk loader MR counters

2015-04-03 Thread Gabriel Reid
About the record count differences: the output values of the mapper are KeyValues, not Phoenix rows. Each column's value is stored in separate KeyValue, so one input row with a single-column primary key and five other columns will result in 6 output KeyValues: one KeyValue for each of the non-prima

Re: Phoenix sqlline command !tables issue

2015-04-02 Thread Gabriel Reid
Hi Ashish, The other columns are being cut off by the size of your terminal window. If you make your window larger, you'll be able to see the additional columns. - Gabriel On Thu, Apr 2, 2015 at 9:46 PM ashish tapdiya wrote: > I am issuing command "!tables" using sqlline to see all tables. Th

Re: UPSERT SELECT works partially

2015-03-30 Thread Gabriel Reid
Hi, I believe Squirrel does some kind of implicit "LIMIT 100" on statements, so my guess would be that it's adding a "LIMIT 100" to your UPSERT SELECT statement. Could you try the same thing using sqlline to verify if it's a problem there as well? - Gabriel On Mon, Mar 30, 2015 at 12:28 PM Dar

Re: phoenix psql.py tool call sql script and send the parameter

2015-03-28 Thread Gabriel Reid
Hi, No, there isn't currently any way to do any kind of variable/parameter replacement via psql.py. I think there must be some clever ways to do this automatically on the command line outside of psql, although I'm not aware of any specific way of doing it. - Gabriel On Fri, Mar 27, 2015 at 9:27

Re: Connection params on phoenix

2015-03-18 Thread Gabriel Reid
You can't set auto-commit via the connection properties or JDBC url in version 3.0.0. However, this is possible (via the AutoCommit property) as of version 3.3 and 4.3. Other client-side properties can be set via the Properties object passed in to DriverManager.connect. - Gabriel On Wed, Mar 18
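
A short sketch of the 3.3/4.3+ approach (connection URL is a placeholder):

    import java.sql.*;
    import java.util.Properties;

    Properties props = new Properties();
    props.setProperty("AutoCommit", "true");  // recognized as of Phoenix 3.3 / 4.3
    Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181", props);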

Re: Phoenix Multitenancy - sqlline tenant-specific connection

2015-03-13 Thread Gabriel Reid
The correct syntax for a Phoenix JDBC url with a tenant id is as follows: localhost:2181;TenantId=foo Note that the TenantId parameter is capitalized (it's case-sensitive). However (on Linux or Mac at least), it's not currently possible to connect with a tenant-specific connection like this, as t
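
The same URL in full JDBC form (host and tenant id are placeholders):

    Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181;TenantId=foo");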

Re: CSV bulk loading question

2015-03-10 Thread Gabriel Reid
Hi Noam, From the perspective of row-based data storage you're writing the same data, but from the perspective of HBase it's not the same at all. This is because HBase stores everything as KeyValues, with one KeyValue per column in Phoenix. Let's say you've got a table with a single primary key c

Re: Error: org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter

2015-03-02 Thread Gabriel Reid
CDH 5.3.1. There are a couple of workarounds mentioned in the linked page for the meantime, but I think the easiest one which should work is just to do this before starting your import job: export HADOOP_USER_CLASSPATH_FIRST=true - Gabriel On Mon, Mar 2, 2015 at 2:32 PM, Gabriel Reid wrote

Re: Error: org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter

2015-03-02 Thread Gabriel Reid
)Lorg/joda/time/format/DateTimeFormatter; > > 15/03/02 15:21:35 INFO client.HConnectionManager$HConnectionImplementation: > Closing zookeeper sessionid=0x24bd5223c471259 > > 15/03/02 15:21:35 INFO zookeeper.ZooKeeper: Session: 0x24bd5223c471259 > closed > > 15

Re: Error: org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter

2015-03-02 Thread Gabriel Reid
Is there more of a stack trace you could post around the error? I'm guessing that this is something along the lines of NoSuchMethodException. Could you also post some info on your environment? Which version of Hadoop and HBase are you using? Is there an alternate version of joda-time on your class

Re: BigDecimal casting issue?

2015-02-26 Thread Gabriel Reid
Hi Matt, Although the object representation of the Phoenix DECIMAL type is BigDecimal, the byte-level encoding is different than that of Bytes.toBytes(BigDecimal). The reason for this is to allow for ordering of these values based on comparison of binary values. Sorting the values with binary value c

Re: Phoenix bulk loading

2015-02-12 Thread Gabriel Reid
it. Use Phoenix just for sql queries. > > I think we should enhance the Phoenix data loader in the same way like Hbase > loader. What do you say, any thoughts on this? > > Thanks, > Siva. > > On Wed, Feb 11, 2015 at 11:34 PM, Gabriel Reid > wrote: >> >> Hi

Re: Line separator option in Bulk loader

2015-02-11 Thread Gabriel Reid
Hi Siva, The Bulk CSV Loader (i.e. the MapReduce-based loader) definitely won't support records split over multiple input lines. It could be that loading via PSQL (as described on http://phoenix.apache.org/bulk_dataload.html) will allow multi-line records, as this migh
