Re: Error with lines ended with backslash when Bulk Data Loading

2016-12-08 Thread Gabriel Reid
(quoting the bulk load tool usage from the earlier reply) -z,--zookeeper: Zookeeper quorum to connect to (optional); -it,--index-table: index table name to load (optional)

Re: Error with lines ended with backslash when Bulk Data Loading

2016-12-08 Thread Gabriel Reid
Hi, Backslash is the default escape character used when parsing CSV data for a bulk import, so it has a special meaning. You can supply a different (custom) escape character with the -e or --escape flag on the command line so that CSV files containing backslashes can be parsed correctly.
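
For illustration, a hypothetical invocation that sets '~' as the escape character (jar name, table name and input path are placeholders):

    hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -t MY_TABLE -i /data/input.csv -e '~'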

Re: Issue w/ CsvBulkUploadTool when column data has "," character

2016-10-08 Thread Gabriel Reid
Hi Zack, My initial gut feeling is that this doesn't have anything to do with the commas in the input data, but rather that the pipe separator isn't being taken into account. Has this been working for you with other data files? I've got more questions than answers to start with:
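
For reference, a custom field delimiter can be supplied to the bulk load tool with -d/--delimiter; a hypothetical invocation (table name and input path are placeholders):

    hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -t MY_TABLE -i /data/input.psv -d '|'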

Re: Using CsvBulkInsert With compressed Hive data

2016-09-29 Thread Gabriel Reid
Hi Zack, Am I correct in understanding that the files are under a structure like x/.deflate/csv_file.csv? In that case, I believe everything under the .deflate directory will simply be ignored, as directories whose names start with a period are considered "hidden" files. However, assuming the

Re: Loading via MapReduce, Not Moving HFiles to HBase

2016-09-29 Thread Gabriel Reid

Re: Loading via MapReduce, Not Moving HFiles to HBase

2016-09-28 Thread Gabriel Reid
Hi Ravi, It looks like those log file entries you posted are from a mapreduce task. Could you post the output of the command that you're using to start the actual job (i.e. console output of "hadoop jar ..."). - Gabriel On Wed, Sep 28, 2016 at 1:49 PM, Ravi Kumar Bommada

Re: Phoenix and HBase data type serialization issue

2016-08-24 Thread Gabriel Reid

Re: Phoenix and HBase data type serialization issue

2016-08-24 Thread Gabriel Reid
Hi Ankit, All data stored in HBase is stored in the form of byte arrays. The conversion from richer types (e.g. date) to byte arrays is one of the (many) functionalities included in Phoenix. When you add a date value in the form of a string to HBase directly (bypassing Phoenix), you're simply

Re: JDBC Client issue

2016-08-21 Thread Gabriel Reid
Hi Aaron, I feel like I've seen this one before, but I'm not totally sure. What I would look at first is a possible hbase-xxx version issue. Something along these lines that I've seen in the past is that another uber-jar JDBC driver that is on the classpath also contains hbase or zookeeper

Re: CsvBulkLoadTool with ~75GB file

2016-08-19 Thread Gabriel Reid
> suspect the insert does MapReduce as well or is there some other mechanism > that would scale? > (8) Compaction/Statistics Operation on Aggregate Table > > I really appreciate all the support. We are trying to run a Phoenix TPCH > benchmark and are struggling a bit to unders

Re: CsvBulkLoadTool with ~75GB file

2016-08-19 Thread Gabriel Reid
(quoted stack trace from the reporter, showing the HBase client scanner retry path: ScannerCallableWithReplicas.call -> ScannerCallableWithReplicas$RetryingRPC.call -> RpcRetryingCaller.callWithRetries)

Re: CsvBulkLoadTool with ~75GB file

2016-08-18 Thread Gabriel Reid
Hi Aaron, I'll answer your questions directly first, but please see the bottom part of this mail for important additional details. You can specify the "hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily" parameter (referenced from your StackOverflow link) on the command line of your bulk load job.

Re: Invoking org.apache.phoenix.mapreduce.CsvBulkLoadTool from phoenix-4.4.0.2.4.0.0-169-client.jar is not working properly

2016-08-03 Thread Gabriel Reid
Hi Radha, This looks to me as if there is an issue in your data somewhere past the first 100 records. The bulk loader isn't supposed to fail due to issues like this. Instead, it's intended to simply report the problem input lines and continue on, but it appears that this isn't happening. Could

Re: error Loading via MapReduce

2016-05-11 Thread Gabriel Reid
(quoting the reporter) Thanks, I didn't find the fs.defaultFS property being overwritten, and I have changed to using Pig to load the table data into Phoenix.

Re: error Loading via MapReduce

2016-05-11 Thread Gabriel Reid

Re: how to tune phoenix CsvBulkLoadTool job

2016-03-19 Thread Gabriel Reid
parallelism. > But, how would it impact the aggregate queries? > > Vamsi Attluri > > On Wed, Mar 16, 2016 at 9:06 AM Gabriel Reid <gabriel.r...@gmail.com> wrote: >> >> Hi Vamsi, >> >> The first thing that I notice looking at the info that you've posted

Re: how to tune phoenix CsvBulkLoadTool job

2016-03-19 Thread Gabriel Reid
Hi Vamsi, The first thing that I notice looking at the info that you've posted is that you have 13 nodes and 13 salt buckets (which I assume also means that you have 13 regions). A single region is the unit of parallelism that is used for reducers in the CsvBulkLoadTool (or HFile-writing
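
For reference, the salt bucket count (and with it the initial number of regions available to the reducers) is fixed when a table is created; a purely hypothetical DDL sketch:

    CREATE TABLE BIG_TABLE (
        ID BIGINT NOT NULL PRIMARY KEY,
        VAL VARCHAR
    ) SALT_BUCKETS = 48;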

Re: Does phoenix CsvBulkLoadTool write to WAL/Memstore

2016-03-15 Thread Gabriel Reid
Hi Vamsi, I can't answer your question about the Phoenix-Spark plugin (although I'm sure that someone else here can). However, I can tell you that the CsvBulkLoadTool does not write to the WAL or to the Memstore. It simply writes HFiles and then hands those HFiles over to HBase, so the memstore and WAL are bypassed entirely.

Re: leveraging hive.hbase.generatehfiles

2016-02-24 Thread Gabriel Reid
Hi Zack, If bulk loading is currently slow or error prone, I don't think that this approach would improve the situation. From what I understand from that link, this is a way to copy the contents of a Hive table into HFiles. Hive operates via mapreduce jobs, so this is technically a mapreduce job as well.

Re: PhoenixBulkLoader command as generic user.

2016-01-29 Thread Gabriel Reid
Hi Parth, Setting the "fs.permissions.umask-mode" config setting to "000" should do the job. You can do this in your hadoop config on the machine where you're submitting the job, or just supply it as the leading command-line parameter as follows: hadoop jar phoenix-client.jar
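
A hypothetical full invocation along those lines (the original snippet is cut off above; table name and input path are placeholders):

    hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dfs.permissions.umask-mode=000 \
        -t MY_TABLE -i /data/input.csv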

Re: Unable to Use bulkloader to load Control-A delimited file

2015-12-31 Thread Gabriel Reid
(quoting Anil Gupta) I don't see the 4.5.3 release over here: http://download.nextag.com/apache/phoenix/ -- is 4.5.3 not released yet?

Re: [EXTERNAL] Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-19 Thread Gabriel Reid
On Fri, Dec 18, 2015 at 9:35 PM, Cox, Jonathan A wrote: > > The Hadoop version is 2.6.2. > I'm assuming the reduce phase is failing with the OOME, is that correct? Could you run "jps -v" to see what the full set of JVM parameters are for the JVM that is running the task that

Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Gabriel Reid
On Fri, Dec 18, 2015 at 4:31 PM, Riesland, Zack wrote: > We are able to ingest MUCH larger sets of data (hundreds of GB) using the > CSVBulkLoadTool. > > However, we have found it to be a huge memory hog. > > We dug into the source a bit and found that >

Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Gabriel Reid
Hi Jonathan, Sounds like something is very wrong here. Are you running the job on an actual cluster, or are you using the local job tracker (i.e. running the import job on a single computer). Normally an import job, regardless of the size of the input, should run with map and reduce tasks that

Re: Help calling CsvBulkLoadTool from Java Method

2015-12-18 Thread Gabriel Reid
Hi Jonathan, It looks like this is a bug that was relatively recently introduced in the bulk load tool (i.e. that the exit status is not correctly reported if the bulk load fails). I've logged this as a jira ticket: https://issues.apache.org/jira/browse/PHOENIX-2538. This means that for now,

Re: [EXTERNAL] Re: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Gabriel Reid
(quoting Jonathan, describing how the heap size was raised in two different ways, just to be sure) export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx48g", and also in mapred-site.xml: mapred.child.java.opts = -Xmx48g

Re: Index rows are not updated when the index key updated using bulk loader

2015-12-12 Thread Gabriel Reid
Hi Afshin, That looks like a bug to me, although I'm not too confident about coming up with a good fix for it. There isn't any handling in the bulk load tool for multiple updates to the same row in a single input. Basically, the code assumes that a single given row is only included once in any

Re: CsvBulkUpload not working after upgrade to 4.6

2015-12-12 Thread Gabriel Reid
This looks like an incompatibility between HBase versions (i.e. between the version that Phoenix is built against, and the version that you've got installed on your cluster). The reason that the bulk loader and fat client are causing issues is that they include the linked versions of the hbase

Re: Phoenix 4.4 does not accept null date value in bulk load

2015-11-25 Thread Gabriel Reid
Indeed, this was a regression. It has since been fixed in PHOENIX-1277 [1], and is available in Phoenix 4.4.1 and Phoenix 4.5.0. - Gabriel 1. https://issues.apache.org/jira/browse/PHOENIX-1277 On Wed, Nov 25, 2015 at 4:07 AM, 彭昶勳 wrote: > Hi, > > In Phoenix-4.3.0 or later

Re: Kerberos and bulkload

2015-11-18 Thread Gabriel Reid
Re-adding the user list, which I accidentally left off. On Wed, Nov 18, 2015 at 3:55 PM, Gabriel Reid <gabriel.r...@gmail.com> wrote: > Yes, I believe that's correct, if you change the umask you make the > HFiles readable to all during creation. > > I believe that the alternat

Re: Kerberos and bulkload

2015-11-17 Thread Gabriel Reid
(quoting Sanooj) It worked now... I hope this is the correct thing to do? conf.set("fs.permissions.umask-mode", "000"); Thanks again, Sanooj

Re: replace CsvToKeyValueMapper with my implementation

2015-10-29 Thread Gabriel Reid
On Thu, Oct 29, 2015 at 6:33 PM, James Taylor wrote: > I seem to remember you starting down that path, Gabriel - a kind of > pluggable transformation for each row. It wasn't pluggable on the input > format, but that's a nice idea too, Ravi. I'm not sure if this is what

Re: Help With CSVBulkLoadTool

2015-10-23 Thread Gabriel Reid
(quoted job log from the reporter) 15/10/23 06:02:19 WARN hbase.HBaseConfiguration: Config option "hbase.regionserver.lease.period" is deprecated. Instead, use "hbase.client.scanner.timeout.period"

Re: Help With CSVBulkLoadTool

2015-10-22 Thread Gabriel Reid
Hi Zack, I can't give you any information about compatibility of a given Phoenix version with a given version of HDP (because I don't know). However, could you give a bit more info on what you're seeing? Are all import jobs failing with this error for a given set of tables? Or is this a random

Re: Connecting to Phoenix from AquaDataStudio?

2015-09-30 Thread Gabriel Reid
Hi Zack, It looks like there is probably an older version of HBase somewhere (earlier) in the classpath. I don't know anything about Aqua Data Studio, but could it be that it somehow bundles support for HBase 0.94 somewhere (or perhaps there is another JDBC driver on the class path that works

Re: Connecting to Phoenix from AquaDataStudio?

2015-09-30 Thread Gabriel Reid
(quoted stack trace from the reporter, failing in the HBase client ZooKeeper path: ZKClusterId.readClusterIdZNode -> ZooKeeperRegistry.getClusterId -> ConnectionManager$HConnectionImplementation.retrieveClusterId)

Re: Problems with Phoenix SqlLine loading large amounts of data

2015-09-28 Thread Gabriel Reid
Hi Gaurav, Looking at your DDL statement, I'm guessing that your table is currently made up of 33 regions, which means that a full count query will take at least as long as it takes to count 27 million rows with a single thread (900 million rows divided by 33 regions). The

Re: Capacity Scheduler Queues in CSVBulkLoadTool?

2015-09-24 Thread Gabriel Reid
Hi Zack, I've never actually tried it, but I don't see any reason why it shouldn't work. Starting the job with the parameter -Dmapreduce.job.queuename= should do the trick, assuming everything else is set up. - Gabriel On Thu, Sep 24, 2015 at 7:05 PM, Riesland, Zack
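
For illustration, with a hypothetical queue name filled in (table name and input path are also placeholders):

    hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.job.queuename=my_queue \
        -t MY_TABLE -i /data/input.csv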

Re: BulkloadTool issue even after successful HfileLoads

2015-09-23 Thread Gabriel Reid
Hi Dhruv, This is a bug in Phoenix, although it appears that your hadoop configuration is also somewhat unusual. As far as I can see, your hadoop configuration is set up to use the local filesystem, and not hdfs. You can test this by running the following command: hadoop dfs -ls / If that

Re: BulkloadTool issue even after successful HfileLoads

2015-09-23 Thread Gabriel Reid
(quoting Dhruv) That's why we explicitly provided --output as hdfs:// and things at least worked.

Re: Using Phoenix Bulk Upload CSV to upload 200GB data

2015-09-19 Thread Gabriel Reid
(quoting Gaurav) Thanks for the pointers Gabriel! Will give it a shot now!

Re: Question about IndexTool

2015-09-16 Thread Gabriel Reid
ors "Added a key not lexically larger than previous > key" > > Thanks a lot! > > > On 15 September 2015 at 19:46, Gabriel Reid <gabriel.r...@gmail.com> wrote: >> >> The upsert statements in the MR jobs are used to convert data into the >> appro

Re: Using Phoenix Bulk Upload CSV to upload 200GB data

2015-09-16 Thread Gabriel Reid
(quoting Gaurav) ...Am I missing something simple? Thanks, Gaurav

Re: Using Phoenix Bulk Upload CSV to upload 200GB data

2015-09-16 Thread Gabriel Reid

Re: Question about IndexTool

2015-09-15 Thread Gabriel Reid
The upsert statements in the MR jobs are used to convert data into the appropriate encoding for writing to an HFile -- the data doesn't actually get pushed to Phoenix from within the MR job. Instead, the created KeyValues are extracted from the "output" of the upsert statement, and the statement

Re: Renaming a column in phoenix

2015-09-12 Thread Gabriel Reid
There isn't a command available to rename or change the data type of a column in Phoenix -- to do something like this, you need to drop a column and then create a new column. If you have existing data that you want to migrate, I would suggest doing the following: 1. Create the new column (with the new name)
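
The rest of the list is cut off in this snippet, but the overall migration pattern would presumably look something like this (table and column names are hypothetical):

    -- 1. Add the new column
    ALTER TABLE MY_TABLE ADD NEW_COL VARCHAR;

    -- 2. Copy the existing data across
    UPSERT INTO MY_TABLE (ID, NEW_COL) SELECT ID, OLD_COL FROM MY_TABLE;

    -- 3. Drop the old column once the copy has been verified
    ALTER TABLE MY_TABLE DROP COLUMN OLD_COL;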

Re: Using Phoenix Bulk Upload CSV to upload 200GB data

2015-09-12 Thread Gabriel Reid
Around 1400 mappers sounds about normal to me -- I assume your block size on HDFS is 128 MB, which works out to 1500 mappers for 200 GB of input. To add to what Krishna asked, can you be a bit more specific on what you're seeing (in log files or elsewhere) which leads you to believe the data

Re: Help Tuning CsvBulkImport MapReduce

2015-09-01 Thread Gabriel Reid
On Tue, Sep 1, 2015 at 3:04 AM, Behdad Forghani wrote: > In my experience the fastest way to load data is directly write to HFile. I > have measured a performance gain of 10x. Also, if you have binary data or > need to escape characters HBase bulk loader does not escape

Re: Help Tuning CsvBulkImport MapReduce

2015-09-01 Thread Gabriel Reid
On Tue, Sep 1, 2015 at 11:29 AM, Riesland, Zack wrote: > You say I can find information about spills in the job counters. Are you > talking about “failed” map tasks, or is there something else that will help > me identify spill scenarios? "Spilled records" is a counter

Re: Configuring phoenix.query.dateFormatTimeZone

2015-08-20 Thread Gabriel Reid
(quoting the reporter) ...-12 10:20:21.125 directly at Phoenix? Thanks. [On 18 Aug 2015 21:48, Gabriel Reid gabriel.r...@gmail.com wrote:] Ok, thanks for those snippets -- I think that's enough to explain what is happening. The biggest cause for confusion here is probably the way that sqlline retrieves values from

Re: Understanding keys

2015-07-23 Thread Gabriel Reid
Filtering a query on the leading columns of the primary key (i.e. [A], [A,B], or [A,B,C]) will give optimal performance. This is because the records are in sorted order based on the combination of [A,B,C], so filtering on a leading subset of the primary key is basically the same as filtering on
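
A hypothetical sketch of this (table and column names invented):

    CREATE TABLE T (
        A VARCHAR NOT NULL,
        B VARCHAR NOT NULL,
        C VARCHAR NOT NULL,
        V INTEGER
        CONSTRAINT PK PRIMARY KEY (A, B, C)
    );

    -- Filtering on a leading prefix of the key can use the row ordering directly:
    SELECT V FROM T WHERE A = 'a1' AND B = 'b1';

    -- Filtering only on a trailing key column (e.g. C alone) cannot be answered
    -- from the sort order in the same way:
    SELECT V FROM T WHERE C = 'c1';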

Re: Problems with casts and TO_DATE in WHERE clauses in views

2015-07-19 Thread Gabriel Reid
Hi Tom, I've tried your SQL statements with 4.3.1, and the initial one does indeed work in 4.3.1 and later. The view definition that includes a CAST statement still fails (due to a bug in CastParseNode for which I'll post a patch shortly). By the way, the way I tested this (and an easy way to

Re: Help w/ connection issue

2015-07-06 Thread Gabriel Reid
Are you supplying the -z parameter with the ZooKeeper quorum? This will be necessary if ZooKeeper isn't running on the localhost and/or isn't configured in the local configuration (see http://phoenix.apache.org/bulk_dataload.html). - Gabriel On Mon, Jul 6, 2015 at 12:08 PM Riesland, Zack

Re: Bug in CsvBulkLoad tool?

2015-06-25 Thread Gabriel Reid
Hi Zack, No, you don't need to worry about the name of the primary key getting in the way of the rows being added. Like Anil pointed out, the best thing to look at first is the job counters. The relevant ones for debugging this situation are the total map inputs and total map outputs, total

Re: Bug in CsvBulkLoad tool?

2015-06-25 Thread Gabriel Reid
(quoting Gabriel's earlier reply) Hi Zack, The job counters are available in the YARN resource manager and/or YARN...

Re: CsvBulkLoad output questions

2015-06-23 Thread Gabriel Reid

Re: What's column family name for columns of table created by phoniex create table statement without a specific cf name?

2015-06-23 Thread Gabriel Reid
The default column family name is 0. This is the string containing the character representation of zero (or in other words, a single byte with value 48). And yes, it's possible to read Phoenix tables using the HBase API (although it's of course a lot easier if you go via Phoenix). - Gabriel On

Re: How To Count Rows In Large Phoenix Table?

2015-06-23 Thread Gabriel Reid
Hi Zack, Would it be possible to provide a few more details on what kinds of failures that you're getting, both with the CsvBulkLoadTool, and with the SELECT COUNT(*) query? About question #1, there aren't any known bugs (that I'm aware of) that would cause some records to go missing in the

Re: Date parsing error, or user error.

2015-05-26 Thread Gabriel Reid
(link to an example in the integration tests) .../it/java/org/apache/phoenix/mapreduce/CsvBulkLoadToolIT.java#L99-L100

Re: Date parsing error, or user error.

2015-05-16 Thread Gabriel Reid
Hi Nick, The date format is (if I'm not mistaken) ISO-8601, so I think you'll have to format your date values as 1970-01-01. - Gabriel On Fri, May 15, 2015 at 02:02 Nick Dimiduk ndimi...@gmail.com wrote: Heya, Giving the RC a spin, and also investigating the impact of HBASE-13604, I'm

Re: select w/ limit hanging on large tables

2015-05-12 Thread Gabriel Reid
Hi Kiru, How many regions are there on this table? Could you also share some information on the schema of the table (e.g. how many columns are defined)? Does a limit 10 query also hang in this table? Could you also elaborate a bit on the issues you were running into when loading data into the

Re: TO_DATE is not working as expected

2015-05-04 Thread Gabriel Reid
Hi Siva, Yes, that's pretty much correct -- TO_DATE is returning a Date value, which has millisecond granularity -- the fact that you're only seeing a date (with no time component) is due to the way in which the Date is formatted, and not its internal value. - Gabriel On Sun, May 3, 2015 at

Re: TO_DATE is not working as expected

2015-05-01 Thread Gabriel Reid
Hi Siva, The TO_DATE function returns a java.sql.Date value, and the string representation of the java.sql.Date value is what you're seeing in your sqlline session. The internal long representation of the Date value coming out of TO_DATE will represent the date to millisecond granularity

Re: Timezones in Phoenix

2015-04-16 Thread Gabriel Reid
Hi Matt, How are you viewing the timestamps (or in other words, how are you verifying that they're not in GMT)? The reason I ask is because internally in Phoenix, timestamps are used without a timezone (they're just based on a long, as you've saved in your table). However, the

Re: group by problem

2015-04-06 Thread Gabriel Reid
That certainly looks like a bug. Would it be possible to make a small reproducible test case and if possible, log this in the Phoenix JIRA ( https://issues.apache.org/jira/browse/PHOENIX) ? - Gabriel On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka marek.wiewio...@gmail.com wrote: Hi All, I

Re: UPSERT SELECT works partially

2015-03-30 Thread Gabriel Reid
Hi, I believe Squirrel does some kind of implicit LIMIT 100 on statements, so my guess would be that it's adding a LIMIT 100 to your UPSERT SELECT statement. Could you try the same thing using sqlline to verify if it's a problem there as well? - Gabriel On Mon, Mar 30, 2015 at 12:28 PM Dark

Re: Connection params on phoenix

2015-03-18 Thread Gabriel Reid
You can't set auto-commit via the connection properties or JDBC url in version 3.0.0. However, this is possible (via the AutoCommit property) as of version 3.3 and 4.3. Other client-side properties can be set via the Properties object passed in to DriverManager.connect. - Gabriel On Wed, Mar
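
A minimal JDBC sketch of passing such a property (connection URL is a placeholder; the AutoCommit property per the versions noted above):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class AutoCommitExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Connection property named in the reply above (available as of 3.3/4.3)
            props.setProperty("AutoCommit", "true");
            try (Connection conn =
                    DriverManager.getConnection("jdbc:phoenix:zk-host:2181", props)) {
                System.out.println("autoCommit=" + conn.getAutoCommit());
            }
        }
    }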

Re: Phoenix Multitenancy - sqlline tenant-specific connection

2015-03-13 Thread Gabriel Reid
The correct syntax for a Phoenix JDBC url with a tenant id is as follows: localhost:2181;TenantId=foo Note that the TenantId parameter is capitalized (it's case-sensitive). However (on Linux or Mac at least), it's not currently possible to connect with a tenant-specific connection like this, as
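
In plain JDBC code the same URL shape would be used like this (host, port and tenant id are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class TenantConnectionExample {
        public static void main(String[] args) throws Exception {
            // Note the case-sensitive TenantId property in the URL
            String url = "jdbc:phoenix:localhost:2181;TenantId=foo";
            try (Connection conn = DriverManager.getConnection(url)) {
                // Statements on this connection run in the context of tenant "foo"
                System.out.println("connected as tenant foo");
            }
        }
    }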

Re: Error: org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter

2015-03-02 Thread Gabriel Reid

Re: Phoenix bulk loading

2015-02-12 Thread Gabriel Reid
(quoting Siva) ...way like the HBase loader. What do you say, any thoughts on this? Thanks, Siva.

Re: Cascading / Scalding Tap to read / write into Phoenix

2015-02-11 Thread Gabriel Reid
I'm not aware of a Cascading Tap for reading/writing to and from Phoenix. Phoenix-specific InputFormat and OutputFormat implementations were recently added to Phoenix, so if there's an easy way to wrap an existing InputFormat and OutputFormat as a Tap in Cascading, then this would probably be the

Re: Phoenix bulk loading

2015-02-11 Thread Gabriel Reid
Hi Siva, If I understand correctly, you want to explicitly supply null values in a CSV file for some fields. In general, this should work by just leaving the field empty in your CSV file. For example, if you have three fields (id, first_name, last_name) in your CSV file, then a record like
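
For illustration (values invented here), with the (id, first_name, last_name) fields from the example above, the second record leaves last_name empty and should therefore be loaded as NULL:

    1,John,Smith
    2,Jane,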

Re: Time change when bulk load csv to phoenix

2015-02-10 Thread Gabriel Reid
Hi Thanaphol, Could you elaborate on how you're debugging this issue? The reason I ask is that the JDBC Timestamp class does some of its own formatting when you query it as a string (it formats the string to a timestamp in the local timezone). The general rules are as follows: * the bulk loader

Re: MapReduce bulk load into Phoenix table

2015-01-16 Thread Gabriel Reid
Hi Constantin, The issues you're having sound like they're (probably) much more related to MapReduce than to Phoenix. In order to first determine what the real issue is, could you give a general overview of how your MR job is implemented (or even better, give me a pointer to it on GitHub or

Re: Whether or not multi-thread share the single PhoenixConnection object?

2015-01-13 Thread Gabriel Reid
Hi David, The PhoenixConnection class is not thread-safe, and shouldn't be shared over multiple threads. I think that this is probably the case with quite a few other JDBC drivers as well, so it's generally safer to use a JDBC connection pool if you want to use connections in multiple threads.

Re: JDBC: Result set is null (Select)

2015-01-12 Thread Gabriel Reid
Hi Marco, Because the name of your 'e' column is not quoted in your DDL statement, the column name 'e' internally gets upper-cased to 'E'. If you store values under column-family 'events' and column qualifier 'E', they will show up in Phoenix queries. Alternatively, you can change your DDL
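
A hypothetical DDL sketch of the difference (table names invented):

    -- Unquoted column names are upper-cased: this column maps to qualifier 'E'
    CREATE TABLE T1 (pk VARCHAR PRIMARY KEY, "events".e VARCHAR);

    -- Quoting preserves the lower-case name, matching an existing qualifier 'e'
    CREATE TABLE T2 (pk VARCHAR PRIMARY KEY, "events"."e" VARCHAR);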

Re: Numbers low-level format in Phoenix

2015-01-11 Thread Gabriel Reid
(quoting the reporter's HBase shell session) ...= 1234567890; ll.to_byte_array => NoMethodError: undefined method 'to_byte_array' for 1234567890:Fixnum

Re: Query returning different results in Apache Phoenix and MySQL

2015-01-11 Thread Gabriel Reid
Hi Kunal, I think you'll need to post some additional information to get an answer to your question. You said that MySQL returns 35 rows and Phoenix returns 32 rows, but it's not clear from your description what the rows are that are missing from the Phoenix result, or what it is that makes the

Re: high CPU when using bulk loading

2015-01-07 Thread Gabriel Reid
Hi Noam, It doesn't sound all that surprising that you're CPU bound on a batch import job like this if you consider everything that is going on within the mappers. Let's say you're importing data for a table with 20 columns. For each line of input data, the following is then occurring within the

Re: CSV bulk loading using map reduce

2014-12-23 Thread Gabriel Reid
Hi Noam, I think that the things that most typically can affect MR loading performance are: * number of regions (as this affects the number of reducers used to create the HFiles) * amount of memory used for sort buffers * use of compression on map output With your 32-region salted table, it
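
As an illustration of the second and third points, standard MapReduce settings can be passed on the bulk load command line (the values below are placeholders, not recommendations):

    hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.task.io.sort.mb=512 \
        -Dmapreduce.map.output.compress=true \
        -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
        -t MY_TABLE -i /data/input.csv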

Re: CSV bulk loading using map reduce

2014-12-23 Thread Gabriel Reid
(quoting Noam) Thanks for the answer, we will look into it and update. The Impala table is an Impala Parquet table.

Re: Phoenix batch insert support

2014-12-19 Thread Gabriel Reid
Hi Vamsi, Running upsert statements like that will indeed not work in Phoenix (that grammar isn't supported). What you're trying to accomplish is technically the same as executing multiple upsert statements and then committing at the end of the batch. This can be accomplished by running multiple
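
A minimal sketch of that pattern (URL, table and column names are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BatchUpsertExample {
        public static void main(String[] args) throws Exception {
            try (Connection conn =
                    DriverManager.getConnection("jdbc:phoenix:zk-host:2181")) {
                conn.setAutoCommit(false);
                try (PreparedStatement stmt = conn.prepareStatement(
                        "UPSERT INTO MY_TABLE (ID, NAME) VALUES (?, ?)")) {
                    for (int i = 0; i < 1000; i++) {
                        stmt.setInt(1, i);
                        stmt.setString(2, "name-" + i);
                        stmt.executeUpdate();   // buffered client-side until commit
                    }
                }
                conn.commit();                  // send the whole batch at once
            }
        }
    }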

Re: Mapreduce bulk csv error

2014-12-18 Thread Gabriel Reid
Hi, This is an issue when using newer versions of HBase with MapReduce, explained here: https://hbase.apache.org/book.html#hbase.mapreduce.classpath The specific command invocation that you need to get around this issue is documented in the Loading via MapReduce section of this page:
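
The general shape of that invocation is roughly as follows (paths and table name are placeholders; see the referenced page for the exact form):

    HADOOP_CLASSPATH=$(hbase mapredcp):/path/to/hbase/conf \
        hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -t MY_TABLE -i /data/input.csv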

Re: Phoenix loading via psql.py - specifying tab separator

2014-12-12 Thread Gabriel Reid
character as separator. On Thu, Dec 11, 2014 at 2:04 AM, Gabriel Reid gabriel.r...@gmail.com wrote: I just discovered how to get this working properly (I had wrongly assumed that simply supplying '\t' was sufficient). In order to supply a tab character as the delimiter, you need to supply

Re: Phoenix loading via psql.py - specifying tab separator

2014-12-10 Thread Gabriel Reid
Hi Rama, Could you give a bit more information, including: * what is the exact full command that you're entering * which version of Phoenix are you using * what is your environment (i.e. which OS are you running on) * what is the exact error (and/or stacktrace) that you're getting Thanks,

Re: Phoenix - loading via mapreduce

2014-12-10 Thread Gabriel Reid
Hi Rama, Sorry, I lost track of this. The steps to set up your environment to run mapreduce will depend on which version of Hadoop you're using, as well as which distribution (i.e. the base Apache release, CDH, HDP, or something else). If you're running the base Apache release, then the docs

Re: Phoenix - loading via mapreduce

2014-12-04 Thread Gabriel Reid
Thanks for pointing out PHOENIX-976 James (I had lost track of that one), but I think that this is a different issue. @Rama, I see you're running on Windows. Can you confirm that you're able to start (non-Phoenix) MapReduce jobs from your Windows machine? In any case, the configuration parameter

Re: bulk loading with dynamic columns

2014-10-17 Thread Gabriel Reid
you'd approach generating hfiles. Would you extend the csv bulk loader? How would you represent dynamic columns in a csv? A general solution is also further complicated by the fact that a dynamic column may have heterogeneous types. -Bob On Thursday, October 16, 2014 12:24 AM, Gabriel Reid

Re: data ingestion

2014-10-10 Thread Gabriel Reid
(quoting Gabriel's earlier reply) Hi Ralph, I think this depends a bit on how quickly you want to get the data into Phoenix after it arrives, what kind

Re: How to change default field delimiter from COMMA to SEMICOLON

2014-10-09 Thread Gabriel Reid
Hi, You've got the usage of the command correct there, but the semi-colon character has a special meaning in most shells. Wrapping it with single quotes should resolve the issue, as follows: ./psql.py z1:/hbase -t NATION ../sample/NATION.csv -d ';' - Gabriel On Thu, Oct 9, 2014 at 5:26

Re: bulk loading using OOZIE

2014-10-06 Thread Gabriel Reid
Hi Noam, Could you post the error message and/or stack trace you're getting when Oozie says that a jar is missing or you don't have permission to read it? - Gabriel On Sun, Oct 5, 2014 at 8:40 AM, Bulvik, Noam noam.bul...@teoco.com wrote: Hi, We are trying to do periodic bulk loading using

Re: Error connecting through SQuirrel

2014-09-04 Thread Gabriel Reid
as we are using hadoop 2.3. Is there anything i am missing here? Thanks, Abe On Fri, Jul 25, 2014 at 8:55 AM, Gabriel Reid gabriel.r...@gmail.com wrote: Hi Sid, The location of the jar file looks correct. However, there seems to be an issue with the build of hadoop2

Re: Using Squirrel Sql Client to connect Phoenix

2014-08-12 Thread Gabriel Reid
Hi, This is due to the permgen space (a pre-allocated portion of memory set up by the JVM) being too small. It looks like you're on Windows, and I'm guessing that you're using a client-mode VM, which I think means your permgen is 32 MB. I'm not really familiar with SQuirreL on Windows, but from

Re: CsvBulkLoadTool error with Phoenix 4.0

2014-08-07 Thread Gabriel Reid
Hi Vadim, Sorry for the long delay on this. Just to be sure, can you confirm that you're using the hadoop-2 build of Phoenix 4.0 on the client when starting up the CsvBulkLoadTool? Even if you are, this may actually require a rebuild of Phoenix using CDH 5.1.0 dependencies. Could you post the

Re: checkandput op

2014-08-06 Thread Gabriel Reid
(quoted stack trace from Ashish, failing in HTable.checkAndPut called from HAdminTest.testCheckNPut)

Re: table seems to get corrupt

2014-08-06 Thread Gabriel Reid
Hi Abe, I believe the second part of the "Expected single, aggregated KeyValue" error message is "Ensure aggregating coprocessors are loaded correctly on server". The fact that a basic scan isn't finding the org.apache.phoenix.filter.ColumnProjectionFilter class also points to the same thing: that

Re: checkandput op

2014-08-05 Thread Gabriel Reid
Hi Ashish, Could you post the full stack trace you're getting when the checkAndPut fails? No immediate reason I can think of as to why this would happen. - Gabriel On Tue, Aug 5, 2014 at 7:57 PM, ashish tapdiya ashishtapd...@gmail.com wrote: Folks, any intuition why this is happening.

Re: counting returns different results

2014-06-11 Thread Gabriel Reid
the same results for relatively small query sizes, but diverge when the query size is really big (30*70 million rows) On Tue, Jun 10, 2014 at 2:07 PM, Gabriel Reid gabriel.r...@gmail.com wrote: Hi Sean, That doesn't sound right -- any idea which of the queries (if either) is returning