Re: Spaces disappear in HBase?

2011-10-03 Thread Andrew Purtell
Keys and values need to be base64 encoded in all non-binary representations, XML and JSON currently.   Best regards,    - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) From: Ben West bwsithspaw...@yahoo.com To:
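
For illustration, a minimal Java sketch of the encoding step those representations require (java.util.Base64 is used for brevity; the class name and literals here are made up):

    import java.util.Base64;

    public class RestBase64 {
        public static void main(String[] args) {
            // HBase keys and values are raw byte[], so the non-binary REST
            // representations carry them base64 encoded -- spaces survive
            // the round trip intact.
            String value = "a value with spaces";
            String encoded = Base64.getEncoder().encodeToString(value.getBytes());
            String decoded = new String(Base64.getDecoder().decode(encoded));
            System.out.println(encoded + " -> " + decoded);
        }
    }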

Re: Hbase-Hive integration performance issues

2011-09-30 Thread Andrew Purtell
I believe this is the latest status:     https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel Suggest following up to d...@hive.apache.org and/or u...@hive.apache.org. Best regards,    - Andy Problems worthy of attack prove their

Re: Creation of Hfiles for multiple tables using Single Bulk Load Job?

2011-09-23 Thread Andrew Purtell
Try this: https://gist.github.com/1237770 See line 135. Best regards,    - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) From: Shuja Rehman shujamug...@gmail.com To: user@hbase.apache.org; Andrew Purtell

Re: [announce] Accord: A high-performance coordination service for write-intensive workloads

2011-09-23 Thread Andrew Purtell
Some code seems licensed under the GPLv2, some under the LGPL.   Best regards,     - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) - Original Message - From: OZAWA Tsuyoshi ozawa.tsuyo...@lab.ntt.co.jp To: user@hbase.apache.org;

Re: Using TTL to purge data automatically?

2011-09-23 Thread Andrew Purtell
This is an occasional source of confusion. Curious if anyone thinks that TTLs in milliseconds make sense. My opinion is no: of what practical use is data with a lifetime of 1, 10, or 100 ms? - Original Message - From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org

Re: Creation of Hfiles for multiple tables using Single Bulk Load Job?

2011-09-22 Thread Andrew Purtell
From: Shuja Rehman shujamug...@gmail.com I am using bulk load to insert data into hbase. Its runs fine if I need to insert in one table. But Now, I have the requirement in which I need to insert data into more than  one table. We started some support for this here: 

Re: Cannot create table over REST

2011-09-13 Thread Andrew Purtell
Try curl -v -X PUT \   http://localhost:8080/test/schema \   -H "Accept: application/json" \   -H "Content-Type: application/json" \   -d '{"name":"test","ColumnSchema":[{"name":"data"}]}'   You should see: PUT /test/schema HTTP/1.1 User-Agent: curl/7.21.3 (x86_64-pc-linux-gnu) libcurl/7.21.3

Re: Cannot create table over REST

2011-09-13 Thread Andrew Purtell
are not very clear then -- I've copied over the JSON from `GET /table/schema` example (which contains `@name`, not `name` in the column definition). I think adding examples for creating tables (and rows, for that matter), would be very handy. Karel On 14.Sep, 2011, at 1:11 , Andrew Purtell wrote

Re: scanner deadlock?

2011-09-12 Thread Andrew Purtell
From: Sandy Pratt prat...@adobe.com TLDR: OpenJDK ~= Oracle JDK, so why not use it? This advice is given out of an abundance of caution. Some have been burned in production by bad JVM versions in the past. Oracle's 1.6.0_u18 is a particularly egregious example; it will segfault all over the

Re: HBase Vs CitrusLeaf?

2011-09-07 Thread Andrew Purtell
While generalizations are dangerous, the one place where C++ code could shine over java (JVM really) is one does not have to fight the GC. Yes. That being said, the folks working on hbase have actively been addressing this problem to the extent possible in pure java by using unmanaged

Re: Get query in REST for HBASE

2011-09-01 Thread Andrew Purtell
Because keys in HBase are byte[], the REST interface base-64 encodes row key, column name, and the value if you choose XML representation. Though I do receive timestamp, but rest of column families (info:name and info:age) are not sent as a response. You say you are looking for two values.

Re: HBase and Cassandra on StackOverflow

2011-09-01 Thread Andrew Purtell
So the setup starts by recommending rolling your own hadoop (pain in the ass). OR using a beta ( :(  ). CDH3 is not in beta. The latest version is a release, CDH3U1. I think most people at this point will just use CDH, so all of that about rolling your own compile of Hadoop sources -- that is

Re: HBase and Cassandra on StackOverflow

2011-09-01 Thread Andrew Purtell
From: Michael Segel michael_se...@hotmail.com Can't we just all get along? :-) My personal introduction to Cassandra came maybe in the 2009 timeframe. We evaluated it and HBase at the time and chose HBase. No point to discuss why, the world has changed many times over. From there, my

Re: Coprocessors?

2011-08-31 Thread Andrew Purtell
We have both backported coprocessors to our 0.90-ish HBase (FrankenBase?) and use them in production with security enabled -- HBASE-3025, but more recent code than the patch on the issue. This code ported to HBase trunk is here: https://github.com/trendmicro/hbase/tree/security Backporting is

Re: HBase and Cassandra on StackOverflow

2011-08-31 Thread Andrew Purtell
http://www.quora.com/How-does-HBase-write-performance-differ-from-write-performance-in-Cassandra-with-consistency-level-ALL Thanks, that was what I was referring to earlier in this thread. Now bookmarked. Comments there from those more knowledgeable about Cassandra than I seem to indicate that

Re: HBase and Cassandra on StackOverflow

2011-08-30 Thread Andrew Purtell
Hi Chris, Appreciate your answer on the post. Personally speaking however the endless Cassandra vs. HBase discussion is tiresome and rarely do blog posts or emails in this regard shed any light. Often, Cassandra proponents mis-state their case out of ignorance of HBase or due to commercial or

Re: HBase and Cassandra on StackOverflow

2011-08-30 Thread Andrew Purtell
) From: Sam Seigal selek...@yahoo.com To: user@hbase.apache.org; Andrew Purtell apurt...@apache.org Cc: hbase-u...@hadoop.apache.org hbase-u...@hadoop.apache.org Sent: Tuesday, August 30, 2011 7:35 PM Subject: Re: HBase and Cassandra on StackOverflow A question inline: On Tue, Aug 30

Re: HBase and Cassandra on StackOverflow

2011-08-30 Thread Andrew Purtell
Hi Chris, Would you mind if I paraphrase your responses on StackOverflow? Go right ahead. Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) From: Chris Tarnas c...@email.com To: Andrew

Re: HBase and Cassandra on StackOverflow

2011-08-30 Thread Andrew Purtell
on StackOverflow On Aug 30, 2011, at 2:47 AM, Andrew Purtell wrote: Better to focus on improving HBase than play whack a mole. Absolutely.  So let's talk about improving HBase.  I'm speaking here as someone who has been learning about and experimenting with HBase for more than six months. HBase

Re: HBase and Cassandra on StackOverflow

2011-08-30 Thread Andrew Purtell
) From: Sam Seigal selek...@yahoo.com To: user@hbase.apache.org; Andrew Purtell apurt...@apache.org Cc: hbase-u...@hadoop.apache.org hbase-u...@hadoop.apache.org Sent: Wednesday, August 31, 2011 3:22 AM Subject: Re: HBase and Cassandra on StackOverflow Will the write call to HBase block

Re: HBase and Cassandra on StackOverflow

2011-08-30 Thread Andrew Purtell
be in the write ahead log. ...joe On Tue, Aug 30, 2011 at 9:17 AM, Andrew Purtell apurt...@apache.org wrote: Is the replication strategy for HBase completely reliant on HDFS' block replication pipelining ? Yes. Is this replication process asynchronous ? No. Best regards

Re: Facing issues in rest webservice with Hbase using Ruby

2011-08-26 Thread Andrew Purtell
Stuti, 2011-08-25 15:10:56,807 ERROR org.mortbay.log: /api/userstable java.lang.RuntimeException: org.apache.hadoop.hbase.TableNotFoundException: api Whatever you are using to communicate with the HBase REST server was designed for the previous version of it. URLs to the REST interface no

Re: The number of fd and CLOSE_WAIT keep increasing.

2011-08-22 Thread Andrew Purtell
We are running cdh3u0 hbase/hadoop suites on 28 nodes.  For your information, CDH3U1 does contain this:   Author: Eli Collins e...@cloudera.com   Date:   Tue Jul 5 16:02:22 2011 -0700       HDFS-1836. Thousand of CLOSE_WAIT socket.       Reason: Bug       Author: Bharath Mundlapudi       Ref:

Re: Announcing Crux - a reporting and charting application for HBase

2011-08-22 Thread Andrew Purtell
Wow!   Best regards,    - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) - Original Message - From: Sonal Goyal sonalgoy...@gmail.com To: user@hbase.apache.org Cc: Sent: Monday, August 22, 2011 12:02 AM Subject: Announcing Crux - a

Re: About puppet and fabric (WAS: operational overhead for HBase)

2011-08-18 Thread Andrew Purtell
From: Aravind Gottipati arav...@freeshell.org  imo, puppet is okay with configuration management tasks, but not really great at orchestrating a sequence of steps across multiple machines. This is our experience as well. We use Puppet to maintain a synchronized static configuration for the

Re: TTL for cell values

2011-08-14 Thread Andrew Purtell
 When I was talking to someone the other day about the current TTL policy, he was like WTF, who would want that, it eats your data?   I don't think anyone is well served by that kind of shallow analysis.  The TTL feature was introduced for the convenience of having the system automatically

Re: TTL for cell values

2011-08-14 Thread Andrew Purtell
 It's part of the mindset shift you have to go through coming from a database world to a NoSQL world This is useful. If you have more insights like this, Ian, and care to share them, I think we would be really interested to hear them. Best regards,    - Andy Problems worthy of attack prove

Re: Allow RegionCoprocessorEnvironment to register custom scanners?

2011-08-12 Thread Andrew Purtell
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) From: lars hofhansl lhofha...@yahoo.com To: Andrew Purtell apurt...@apache.org; user@hbase.apache.org user@hbase.apache.org Sent: Thursday, August 11, 2011 11:19 PM Subject

Re: Allow RegionCoprocessorEnvironment to register custom scanners?

2011-08-09 Thread Andrew Purtell
) - Original Message - From: lars hofhansl lhofha...@yahoo.com To: user@hbase.apache.org user@hbase.apache.org; Andrew Purtell apurt...@apache.org Cc: Sent: Monday, August 8, 2011 7:53 PM Subject: Re: Allow RegionCoprocessorEnvironment to register custom scanners? I see.I just didn't see how

Re: Allow RegionCoprocessorEnvironment to register custom scanners?

2011-08-08 Thread Andrew Purtell
The RegionObserver already wraps all of the scanner operations. RegionObserver.preScannerOpen can create an InternalScanner and return it exactly as you propose with HRegionServer.addScanner(InternalScanner) .  preScannerOpen takes a Scan object. Only if preScannerOpen does not return an
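
A minimal sketch of that pattern, against the trunk-era coprocessor API this thread describes (signatures changed between releases, so treat it as illustrative and adjust to your version; the observer name is made up):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.InternalScanner;

    // Hypothetical observer: a non-null scanner returned from preScannerOpen
    // is used in place of the region's default scanner.
    public class CustomScannerObserver extends BaseRegionObserver {
        @Override
        public InternalScanner preScannerOpen(ObserverContext<RegionCoprocessorEnvironment> e,
                Scan scan, InternalScanner s) throws IOException {
            // Inspect the Scan and build a custom InternalScanner here if
            // desired; this sketch just passes the default through.
            return s;
        }
    }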

Re: Monitoring

2011-07-29 Thread Andrew Purtell
Have you tried Ganglia2?     http://sourceforge.net/apps/trac/ganglia/wiki/ganglia-web-2   Best regards,     - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) From: Otis Gospodnetic otis_gospodne...@yahoo.com To:

Re: Something like Execution Plan as in the RDBMS world?

2011-07-27 Thread Andrew Purtell
 Or is this a completely different way of thinking? Yes. There isn't an execution plan when using HBase, as that term is commonly understood from RDBMS systems. The commands you issue against HBase using the client API are executed in order as you issue them.  Depending on the access pattern, we might

Re: Stargate: Only getting HTTP 200 responses in 0.90.x

2011-07-25 Thread Andrew Purtell
. I'll try out 0.90.4 when it's released. Thanks, Greg. -Original Message- From: Andrew Purtell [mailto:apurt...@apache.org] Sent: Saturday, 23 July 2011 4:20 PM To: user@hbase.apache.org Subject: Re: Stargate: Only getting HTTP 200 responses in 0.90.x  We used to get

Re: Filters for non-Java clients?

2011-07-25 Thread Andrew Purtell
The REST API has filter support. Strictly speaking the representation is multilanguage, but only the Java API -- the ScannerModel class, ScannerModel.stringifyFilter -- has support for converting a Java filter tree into a JSON encoded representation of same. However you could do this in Java
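
For example, a sketch using the method named above (the filter choice is arbitrary):

    import org.apache.hadoop.hbase.filter.Filter;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.rest.model.ScannerModel;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FilterToJson {
        public static void main(String[] args) throws Exception {
            // Build the filter tree with the Java API...
            Filter filter = new PrefixFilter(Bytes.toBytes("user_"));
            // ...then convert it to the JSON representation a non-Java
            // client can submit when creating a REST scanner.
            System.out.println(ScannerModel.stringifyFilter(filter));
        }
    }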

Re: Stargate: Only getting HTTP 200 responses in 0.90.x

2011-07-23 Thread Andrew Purtell
 We used to get a 201 after creating a scanner with the scanner ID in the Location property.  We still get this packet with a valid scanner ID but it's now an HTTP 200 packet. The real problem is that we used to get an HTTP 204 when we exhausted the scanner, but now we get a 200 packet

Re: HBase rest RowSpec bug on startRow and endRow

2011-07-18 Thread Andrew Purtell
Allan, Thanks for the bug report, analysis, and contribution. I will incorporate your patches as part of HBASE-4116: https://issues.apache.org/jira/browse/HBASE-4116   Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) -

Avro connector

2011-07-14 Thread Andrew Purtell
HBASE-2400 introduced a new contrib connector architecturally equivalent to the Thrift connector, but using Avro serialization and associated transport and RPC server work. However, it remains unfinished, was developed against an old version of Avro, is currently not maintained, and is regarded

Re: Hbase performance with HDFS

2011-07-11 Thread Andrew Purtell
Message - From: Arvind Jayaprakash w...@anomalizer.net To: user@hbase.apache.org; Andrew Purtell apurt...@apache.org Cc: Sent: Monday, July 11, 2011 6:34 AM Subject: Re: Hbase performance with HDFS On Jul 07, Andrew Purtell wrote: Since HDFS is mostly write once how are updates/deletes

Re: IN_MEMORY setting

2011-07-08 Thread Andrew Purtell
IN_MEMORY means that HBase will try really hard to keep all of the data blocks for a table in the block cache. Block cache is expired on an LRU basis in three priority bands. Blocks for IN_MEMORY tables have the highest priority. They will be the last to be evicted. It does not mean data is
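
Enabling it from the Java admin API looks roughly like this (a sketch; the table and family names are made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class InMemoryTable {
        public static void main(String[] args) throws Exception {
            HTableDescriptor desc = new HTableDescriptor("hot_lookups"); // hypothetical
            HColumnDescriptor family = new HColumnDescriptor("f");
            family.setInMemory(true); // blocks get the highest LRU priority band
            desc.addFamily(family);
            new HBaseAdmin(HBaseConfiguration.create()).createTable(desc);
        }
    }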

Re: Enable BLOOMFILTER on existing tables

2011-07-08 Thread Andrew Purtell
You can update existing data by manually triggering compaction. After you make a change like this, go to the hbase shell and execute:   major_compaction 'yourtablename' After major compaction all of the store files for the table will abide by the most recent schema settings.   Best regards,

Re: Enable BLOOMFILTER on existing tables

2011-07-08 Thread Andrew Purtell
Sorry that is:   major_compact 'tablename' Typing too fast...   Best regards,    - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) - Original Message - From: Andrew Purtell apurt...@apache.org To: user@hbase.apache.org user

Re: Hbase performance with HDFS

2011-07-07 Thread Andrew Purtell
. Best regards,   - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) From: Mohit Anchlia mohitanch...@gmail.com To: user@hbase.apache.org; Andrew Purtell apurt...@apache.org Sent: Thursday, July 7, 2011 11:53 AM

Re: Hbase performance with HDFS

2011-07-07 Thread Andrew Purtell
) From: Mohit Anchlia mohitanch...@gmail.com To: Andrew Purtell apurt...@apache.org Cc: user@hbase.apache.org user@hbase.apache.org Sent: Thursday, July 7, 2011 12:30 PM Subject: Re: Hbase performance with HDFS Thanks that helps! Just few more questions: You mentioned about

Re: Hbase performance with HDFS

2011-07-07 Thread Andrew Purtell
Message- From: Mohit Anchlia [mailto:mohitanch...@gmail.com] Sent: Thursday, July 07, 2011 2:02 PM To: user@hbase.apache.org; Andrew Purtell Subject: Re: Hbase performance with HDFS Thanks Andrew. Really helpful. I think I have one more question right now :) Underneath HDFS replicates

Re: hbck -fix

2011-07-03 Thread Andrew Purtell
Wayne, Did you by chance have your NameNode configured to write the edit log to only one disk, and in this case only the root volume of the NameNode host? As I'm sure you are now aware, the NameNode's edit log was corrupted, at least the tail of it anyway, when the volume upon which it was

Re: hbck -fix

2011-07-03 Thread Andrew Purtell
earlier adequately conveyed the thought.   - Andy From: Andrew Purtell apurt...@apache.org To: user@hbase.apache.org user@hbase.apache.org Cc: Sent: Sunday, July 3, 2011 12:39 AM Subject: Re: hbck -fix Wayne, Did you by chance have your NameNode configured to write the edit log

Re: HBase region size

2011-07-01 Thread Andrew Purtell
From: Stack st...@duboce.net  3. The size of them varies like this: 70% of them have length < 1MB, 29% have length between 1MB and 10MB, 1% have length > 10MB (they can also reach 100MB)   What David says above though

Re: HBase region size

2011-07-01 Thread Andrew Purtell
One reasonable way to handle native storage of large objects in HBase would  be to introduce a layer of indirection.   Do you see this layer on the client or on the server side? Client side. I was also thinking about the update: Let's say we store a new version of  the large object which is
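
One way to realize that client-side indirection (my sketch, not code from the thread; table, family, and key names are made up): write the large object under a content-hash key in a side table and keep only the pointer in the primary row, so storing a new version writes a new blob row and repoints the cell.

    import java.security.MessageDigest;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LobPointer {
        public static void main(String[] args) throws Exception {
            byte[] blob = Bytes.toBytes("...large object bytes...");
            byte[] blobKey = MessageDigest.getInstance("SHA-1").digest(blob);

            // 1. The large object goes into a dedicated blob table.
            HTable blobs = new HTable(HBaseConfiguration.create(), "blobs");
            Put p = new Put(blobKey);
            p.add(Bytes.toBytes("b"), Bytes.toBytes("data"), blob);
            blobs.put(p);

            // 2. The primary row stores only the pointer to the blob.
            HTable primary = new HTable(HBaseConfiguration.create(), "documents");
            Put ptr = new Put(Bytes.toBytes("doc-42"));
            ptr.add(Bytes.toBytes("meta"), Bytes.toBytes("blobref"), blobKey);
            primary.put(ptr);
        }
    }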

Re: how to get hbase 0.92.0 or any other version could work with hadoop 0.21.0

2011-06-30 Thread Andrew Purtell
HBase trunk will be 0.92.0 when released. HBASE-2233 (Support both Hadoop 0.20 and 0.22) went into trunk on June 9th. I have not personally tried it, though. Best regards,    - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) - Original

Re: Multi-family support for Bulkload

2011-06-27 Thread Andrew Purtell
I think the sender asked if HBASE-1861 is the right approach.    - Andy - Original Message - From: Stack st...@duboce.net To: user@hbase.apache.org Cc: Sent: Monday, June 27, 2011 2:51 PM Subject: Re: Multi-family support for Bulkload On Sun, Jun 26, 2011 at 8:42 PM, Gan, Xiyun

Re: Multi-family support for Bulkload

2011-06-27 Thread Andrew Purtell
From: Gan, Xiyun ganxi...@gmail.com I still have another question, how to remove the partitions_$timestamp files produced in the HFileOutputFormat? One option is to accept a configuration option for the partitions file: -    Path partitionsPath = new Path(job.getWorkingDirectory(), -        

Re: Does anybody enable MSLAB in production system? I am not sure if it's stable enough for production system?

2011-06-25 Thread Andrew Purtell
From: Jack Zhang(jian) jack.zhangj...@huawei.com Subject: Does anybody enable MSLAB in production system? I am not sure if it's stable enough for production system? Yes, we use it in production and have not observed any negative effects of its use.  Best regards,   - Andy Problems worthy

Re: hadoop without append in the absence of puts

2011-06-23 Thread Andrew Purtell
From: Andreas Neumann neun...@gmail.com we will use LoadIncrementalHFiles, are you saying that this will never cause a split? Create the table with the region split threshold set to Long.MAX_VALUE and a set of pre-split points that partitions the key space as evenly as possible. Use HBase's
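
A sketch of that setup with the 0.90-era admin API (the split points and names are illustrative only):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PresplitBulkTable {
        public static void main(String[] args) throws Exception {
            HTableDescriptor desc = new HTableDescriptor("bulk_only"); // hypothetical
            desc.setMaxFileSize(Long.MAX_VALUE); // effectively disable size-based splits
            desc.addFamily(new HColumnDescriptor("f"));
            // Pre-split points that partition the key space as evenly as possible:
            byte[][] splits = { Bytes.toBytes("2"), Bytes.toBytes("4"),
                                Bytes.toBytes("6"), Bytes.toBytes("8") };
            new HBaseAdmin(HBaseConfiguration.create()).createTable(desc, splits);
        }
    }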

Re: hadoop without append in the absence of puts

2011-06-22 Thread Andrew Purtell
From: Andreas Neumann neun...@gmail.com If we only load data in bulk (that is, via doBulkLoad(), not using TableOutputFormat), do we still risk data loss? My understanding is that append is needed for the WAL, and the WAL is needed only for puts. But bulk loads bypass the WAL. Correct. If

Re: hadoop without append in the absence of puts

2011-06-22 Thread Andrew Purtell
From: Andreas Neumann neun...@gmail.com I guess that includes region splits but also reassignment of a region after its region server died. If you are not writing, then you won't see splits. Reassignment does involve writes to META but missing these is less serious than missing a change in

Re: Keytabs and secure hadoop

2011-06-21 Thread Andrew Purtell
From: Francis Christopher Liu fc...@yahoo-inc.com Thanks for the warning, we'd like to stick with the ASF releases of hadoop. That's not really advisable with HBase. It's a touchy subject, the 0.20-ish support for append in HDFS exists in production at some large places but isn't in any ASF

Re: on the impact of incremental counters

2011-06-20 Thread Andrew Purtell
From: Claudio Martella claudio.marte...@tis.bz.it So, basically it's expensive to increment old data. HBase employs a buffer hierarchy to make updating a working set that can fit in RAM reasonably efficient. (But like I said there are some things remaining we can improve in terms of internal

Re: any multitenancy suggestions for HBase?

2011-06-20 Thread Andrew Purtell
Hi Bill, From: Bill Graham billgra...@gmail.com Andy, I can see value in having ACLs on a per-column-pattern (or maybe just per-prefix to make multiple pattern conflict resolution simpler) basis. I know this isn't in scope for the initial release, but would the current design lend itself to

Re: coprocessor failure question and examples?

2011-06-20 Thread Andrew Purtell
Dean, I am considering putting a List<lines> in the Account basically (i.e. a column family with n columns) and a coprocessor then processes the lines for that account (lots of work is needed to be done here including checking the Activity table which is keyed by account-sequence so it is

Re: on the impact of incremental counters

2011-06-18 Thread Andrew Purtell
This is from memory, but I expect someone will chime in if any detail is inaccurate. :-) If the blocks containing the values you are updating fit into blockcache then read IOPS are avoided, satisfied from cache, not disk. Evictions from blockcache are done on an LRU basis. (Packing related
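
For reference, the client call under discussion (a sketch; the table and key names are made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CounterUpdate {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "counters");
            // Server-side atomic increment: the old cell is read (from the
            // block cache if resident, otherwise from disk), then rewritten.
            long n = table.incrementColumnValue(Bytes.toBytes("pageviews-20110618"),
                Bytes.toBytes("f"), Bytes.toBytes("count"), 1L);
            System.out.println("counter is now " + n);
            table.close();
        }
    }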

Re: Replication state

2011-06-13 Thread Andrew Purtell
From: Jason Rutherglen jason.rutherg...@gmail.com Right thanks.  I think replication is fairly simple, I don't know much about the HDFS sync code, if one has sync'd on the HLog writer, then an HLog reader should be able to read from there? See my comments on HBASE-2357 regarding the

RE: in-memory data grid vs. ehcache + hbase

2011-06-12 Thread Andrew Purtell
From: Hiller, Dean x66079 dean.hil...@broadridge.com I would think most domains have a low write, high read rate There are IN_MEMORY tables and blockcache in general for that. with low number of rows in certain tables so I am kinda surprised this optimization is not there. Right, you want

Re: our customer delivering compressed file to hadoop question

2011-06-12 Thread Andrew Purtell
From: Hiller, Dean x66079 dean.hil...@broadridge.com Is there an example of LZO and is that what I want customers to deliver as, correct? As when LZO comes in and is laid over hadoop, it is split into chunks on different nodes so I can decompress in parallel? When you copy a big LZO

Re: Adding HQuorum dynamically.

2011-06-10 Thread Andrew Purtell
James, You should refer to the HBase processes by their short class name at least. If you execute 'jps' (java process list) on the command line, it will give you the process ID and the short class name. Region servers run as HRegionServer. Masters run as HMaster. ZooKeeper quorum peers, if

Re: increasing hbase get latencies

2011-06-10 Thread Andrew Purtell
Stack, Aside from the other ideas you mention, this could also be HBASE-3855 if a lot of values are up in memstore? By the way, I patched our 0.90-ish in house HBase with 3855 with no untoward effects and noticeable improvement under profiling. Why not commit it to 0.90 as well? Best

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-10 Thread Andrew Purtell
From: Stack st...@duboce.net Also, I second what Andrew says where I do not know of any place where the name of the jar is inscribed so how the jar is named should play no role at all. Maybe the name does matter. Do you think you ran into the issue that Hari figured at the end of

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-10 Thread Andrew Purtell
We (at least I) was talking about the name of the Hadoop core jar in HBase lib/ being not of any particular importance. Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) --- On Fri, 6/10/11, Mike Spreitzer mspre...@us.ibm.com

Re: Hbase Hardware requirement

2011-06-08 Thread Andrew Purtell
From: Ted Dunning tdunn...@maprtech.com Lots of people are moving towards more spindles per box to increase IOP/s This is particularly important for cases where the working set gets pushed out of memory. Indeed. Our spec is more like 12x 500 GB SATA disks, to push IOPS and more evenly

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-07 Thread Andrew Purtell
Pardon if I've missed something but I think this thread comes down to: On Mon, 6/6/11, Mike Spreitzer mspre...@us.ibm.com wrote: So my suggestion is to be unequivocal about it: when running distributed, always build your own Hadoop and put its -core JAR into your HBase installation (or use

Re: Best practices for HBase in EC2?

2011-06-04 Thread Andrew Purtell
I recommend you look at Whirr: http://incubator.apache.org/whirr/ specifically: http://www.philwhln.com/run-the-latest-whirr-and-deploy-hbase-in-minutes - Andy

Re: Failure to Launch: hbase-0.90.3 with hadoop-0.20.203.0

2011-06-03 Thread Andrew Purtell
If you are using security features of 0.20.203 (security != simple) then you will need to do what J-D says about making sure the Hadoop jars are in sync *and* use the version of HBase that works with secure Hadoop: https://github.com/trendmicro/hbase/tree/security Be advised this is HBase

Re: Does Hadoop 0.20.2 and HBase 0.90.3 compatible ??

2011-06-03 Thread Andrew Purtell
Is *Hadoop 0.20.2  also not compatible with Hbase 0.90.3 ???* In a strict sense they are, but without append support HBase cannot guarantee that the last block of a write ahead log is synced to disk, so in some failure cases edits will be lost. With append support then the hole of these

Re: REST API doesn't support checkAndPut

2011-06-03 Thread Andrew Purtell
From: Henri Chenosky henri.cheno...@gmail.com Subject: REST API doesn't support checkAndPut To: user@hbase.apache.org Date: Thursday, June 2, 2011, 9:52 PM It seems that HTable atomic operations (e.g., checkAndPut and checkAndDelete) are not supported by the current REST implementation.

Re: REST API doesn't support checkAndPut

2011-06-03 Thread Andrew Purtell
AM PDT Andrew Purtell wrote: From: Henri Chenosky henri.cheno...@gmail.com Subject: REST API doesn't support checkAndPut To: user@hbase.apache.org Date: Thursday, June 2, 2011, 9:52 PM It seems that HTable atomic operations (e.g., checkAndPut and checkAndDelete) are not supported

Re: ANN: HBase 0.90.3 available for download

2011-05-31 Thread Andrew Purtell
From: Jack Levin magn...@gmail.com Hello, is there a git repo URL I could use to check out that code version? git://git.apache.org/hbase.git or git://github.com/apache/hbase.git or https://github.com/apache/hbase.git Then checkout tag '0.90.3'

Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread Andrew Purtell
The hypervisor steals a lot of CPU time from m1.large instances. You should be using c1.xlarge instances. Are you using local storage or EBS? Be aware that I/O performance on EC2 for any system is lower than if you are using real hardware, significantly so if not using one of the instance

Re: REST Atomic increment

2011-05-25 Thread Andrew Purtell
Do you have any preference for how this might be accomplished? Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) --- On Wed, 5/25/11, Mark Jarecki mjare...@bigpond.net.au wrote: From: Mark Jarecki mjare...@bigpond.net.au

Re: hbase and hypertable comparison

2011-05-25 Thread Andrew Purtell
I think I can speak for all of the HBase devs that in our opinion this vendor benchmark was designed by hypertable to demonstrate a specific feature of their system -- autotuning -- in such a way that HBase was, obviously, not tuned. Nobody from the HBase project was consulted on the results or

Re: Any trigger like facility for HBase tables

2011-05-24 Thread Andrew Purtell
For coprocessors you need to use trunk. - Andy --- On Tue, 5/24/11, Ted Yu yuzhih...@gmail.com wrote: From: Ted Yu yuzhih...@gmail.com Subject: Re: Any trigger like facility for HBase tables To: user@hbase.apache.org Cc: billgra...@gmail.com Date: Tuesday, May 24, 2011, 1:48 PM I don't

Re: 0.90.3

2011-05-24 Thread Andrew Purtell
From: Jack Levin magn...@gmail.com figured it out... the /etc/hosts file has the ip-to-name mapping; the name used by ZooKeeper was *.prod.imageshack.com, while the hostname, used by RegionServer/Master, was imgXX.imageshack.us -  Ideally, all three components should source hostnames from the same place, whether its

Re: GC and High CPU

2011-05-16 Thread Andrew Purtell
This is interesting because our conventional wisdom is those settings should increase the chance of stop-the-world GC and should be avoided. - Andy (who always gets nervous when we start talking about GC black magic) From: Jack Levin magn...@gmail.com Subject: Re: GC and High CPU To:

Re: Lost hbase table after restart

2011-05-11 Thread Andrew Purtell
Furthermore, be sure to read about what HBase 0.90.x requires: http://hbase.apache.org/notsoquick.html#requirements Best regards, - Andy --- On Wed, 5/11/11, Andrew Purtell apurt...@apache.org wrote: From: Andrew Purtell apurt...@apache.org Subject: Re: Lost hbase table after restart

Re: putting a border around 0.92 release

2011-05-02 Thread Andrew Purtell
I agree. We did get code back from an internal dev group for the secondary indexing implementation but I don't think we are satisfied with its current state. Also I have been swamped, therefore remiss, in online schema edit. Shouldn't hold up a release on our account. Next one. - Andy

Re: massive zk expirations under heavy network load

2011-04-20 Thread Andrew Purtell
Kazuki-san, Setting the ZK timeout to a large value will stop the expirations but may not provide sufficiently fast failure detection for your use case of course. However if even Ganglia stops working during a large mapreduce job, I think you need to question the adequacy of the network

Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

2011-04-18 Thread Andrew Purtell
From: Jason Rutherglen jason.rutherg...@gmail.com With the new replication feature of 0.92 edits are streamed from one cluster to another Interesting, what does 'cluster' mean in this context? Cluster in this context is a typical data center deployment: HDFS + ZK + HBase master(s) +

Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

2011-04-18 Thread Andrew Purtell
From: Jason Rutherglen jason.rutherg...@gmail.com Andrew, thanks for the information.  On the surface it looks like HBASE-2357 would be using the same mechanism for streaming the WAL (except the master slave failover) as HBASE-1295, however HBASE-2357 seems to imply that's not the case? It

Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

2011-04-17 Thread Andrew Purtell
Jason, Andrew, when you say this: Because HBase is a DOT (distributed, ordered table) it can provide strongly consistent and atomic operations on rows, because rows exist in only one place at a time. This excludes the use of HBase replication? Yes. With the new replication feature of 0.92 edits are streamed

Re: Region Splitting for moderate amount of daily data - Improve MapReduce Performance

2011-04-15 Thread Andrew Purtell
From: Joe Pallas pal...@cs.stanford.edu Could it be that your row key is not distributing the data well enough? That is, if your key is primarily based on the current date, it will only put the data into a small number of regions. This, I have come to realize, is an essential
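
A common mitigation (my sketch, not from this thread) is to prefix the date-leading key with a stable hash bucket so concurrent writes spread across regions; the tradeoff is that scans must fan out over all buckets:

    import org.apache.hadoop.hbase.util.Bytes;

    public class SaltedKeys {
        static final int BUCKETS = 16; // tune to roughly the region count

        // Prefix the natural (date-leading) key with a bucket id derived
        // from a stable hash, so writes for the same day land in many regions.
        static byte[] saltedKey(String naturalKey) {
            int bucket = (naturalKey.hashCode() & Integer.MAX_VALUE) % BUCKETS;
            return Bytes.toBytes(String.format("%02d-%s", bucket, naturalKey));
        }

        public static void main(String[] args) {
            System.out.println(Bytes.toString(saltedKey("20110415-event-123")));
        }
    }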

RE: HBase is not ready for Primetime

2011-04-13 Thread Andrew Purtell
Hi Doug, 3) Cluster restart We schedule a full shutdown and restart of our cluster each week.  It's pretty quick, and HBase just seems happier when we do this. Can you say a bit more about how HBase is happier versus not? I can speculate on a number of reasons why this may be the case,

Re: rpc call logging

2011-04-13 Thread Andrew Purtell
This sounds like HBASE-2014: https://issues.apache.org/jira/browse/HBASE-2014 BTW apologies for the weird English in that issue, it appears I cut and pasted a request from our China development center without sufficient editing. - Andy

Re: just open sourced Orderly -- a row key schema system (composite keys, etc) for use with HBase

2011-04-13 Thread Andrew Purtell
Michael (and GotoMetrics), Thank you for opening this up! Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) --- On Wed, 4/13/11, Michael Dalton mwdal...@gmail.com wrote: Hi all, I'm with a startup, GotoMetrics, doing

RE: cpu profiling

2011-04-11 Thread Andrew Purtell
We use JProfiler and connect to the remote VM via SSH tunnel. (Our testing is done up in EC2.) - Andy From: Peter Haidinyak phaidin...@local.com Subject: RE: cpu profiling To: user@hbase.apache.org user@hbase.apache.org Date: Monday, April 11, 2011, 8:51 AM I've been using JProfiler

Re: Hadoop 0.20.3 Append branch?

2011-04-11 Thread Andrew Purtell
Head of branch-0.20-append is 0.20.3-SNAPSHOT (http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/build.xml) - Andy From: Jason Rutherglen jason.rutherg...@gmail.com Subject: Re: Hadoop 0.20.3 Append branch? To: apurt...@apache.org, hbase-u...@hadoop.apache.org Date:

Re: file is already being created by NN_Recovery

2011-04-08 Thread Andrew Purtell
I've wondered if the master should not copy logs to some '.' prefix directory, delete the original, then split the copies. Haven't thought through all of the consequences though. - Andy --- On Fri, 4/8/11, Daniel Iancu daniel.ia...@1and1.ro wrote: From: Daniel Iancu daniel.ia...@1and1.ro

RE: HBase Stability

2011-03-21 Thread Andrew Purtell
table#setAutoFlush(false) ? --- On Mon, 3/21/11, Buttler, David buttl...@llnl.gov wrote: From: Buttler, David buttl...@llnl.gov Subject: RE: HBase Stability To: user@hbase.apache.org user@hbase.apache.org Date: Monday, March 21, 2011, 1:46 PM Have you seen Todd Lipcon's post on MSLAB's? 
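
That is (a sketch with the 0.90-era client API; the table name is made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class BufferedWrites {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "mytable");
            table.setAutoFlush(false);                 // buffer puts client-side
            table.setWriteBufferSize(2 * 1024 * 1024); // flush in ~2 MB batches
            // ... issue many puts ...
            table.flushCommits(); // push any buffered puts before closing
            table.close();
        }
    }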

Re: hash function per table

2011-03-21 Thread Andrew Purtell
Or use a bulk load process to import sequential data as new stores all in one shot. - Andy --- On Sun, 3/20/11, Pete Haidinyak javam...@cox.net wrote: From: Pete Haidinyak javam...@cox.net Subject: Re: hash function per table To: user@hbase.apache.org Date: Sunday, March 20, 2011, 1:03

Re: Stargate and Hbase

2011-03-18 Thread Andrew Purtell
Whether to use REST or Thrift or Avro connectors is a matter of architecture; it depends on what you are trying to do. In all cases, we are here to help you if the system does not appear to function normally. We rely on volunteer effort for this. It is unlikely someone will volunteer time to help

Re: habse schema design and retrieving values through REST interface

2011-03-16 Thread Andrew Purtell
This facility is not exposed in the REST API at the moment (not that I know of -- please someone correct me if I'm wrong). Wrong. :-) See ScannerModel in the rest package: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/rest/model/ScannerModel.html ScannerModel#setBatch - Andy
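
For example (a sketch; the batch size is arbitrary):

    import org.apache.hadoop.hbase.rest.model.ScannerModel;

    public class BatchedRestScanner {
        public static void main(String[] args) {
            ScannerModel model = new ScannerModel();
            model.setBatch(100); // cap the number of cells returned per batch
            // POST the XML or JSON form of this model to /<table>/scanner to
            // create the scanner, then GET the returned Location to fetch rows.
        }
    }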

Re: Long client pauses with compression

2011-03-15 Thread Andrew Purtell
We have a separate compression setting for major compaction vs store files written during minor compaction (for background/archival apps). Why not a separate compression setting for flushing? I.e. none? --- On Mon, 3/14/11, Jean-Daniel Cryans jdcry...@apache.org wrote: From: Jean-Daniel

Re: HBase => replication => Hive

2011-03-11 Thread Andrew Purtell
Pardon, I'm not as familiar with this area as I should be, but apparently Hive queries run about 5x slower than queries that go against normal Hive tables. Is this not a reasonable place to start? Why is this? I was wondering if people think it would be possible to implement HBase=>Hive
