Re: Building Hadoop on macOS Monterey?

2022-03-25 Thread Andrew Purtell
figured it out (at least with an older version of Hadoop). I ended up writing a short post about it: https://creechy.wordpress.com/2022/03/22/building-hadoop-spark-jupyter-on-macos/ --joe On Thu, Mar 24, 2022 at 3:14 PM Andrew Purtell wrote:

Re: Building Hadoop on macOS Monterey?

2022-03-24 Thread Andrew Purtell
If you build with -Dbundle.snappy and -Dbundle.zstd on the Maven command line, the build produces a tarball containing copies of the native shared libraries in lib/native/. That would be like your symlink workaround, but less hacky and something the build already supports. Does this work
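
A minimal sketch of the kind of Maven invocation being described, assuming a native-profile build; the -Dsnappy.lib/-Dzstd.lib paths are illustrative Homebrew locations, not from the original mail:

    # Build a distribution tarball that bundles the native compression libraries.
    # Library paths are assumptions (Homebrew on Apple Silicon); adjust for your machine.
    mvn clean package -Pdist,native -DskipTests -Dtar \
        -Drequire.snappy -Dbundle.snappy -Dsnappy.lib=/opt/homebrew/lib \
        -Drequire.zstd -Dbundle.zstd -Dzstd.lib=/opt/homebrew/lib
    # The tarball under hadoop-dist/target/ should then carry copies of the
    # shared libraries in lib/native/, so no symlinks are needed at runtime.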

Re: a non-commercial distribution of hadoop ecosystem?

2015-06-01 Thread Andrew Purtell
Bigtop, in a nutshell, is a non-commercial multi-stakeholder Apache project that produces a build framework that takes as input source from Hadoop and related big data projects and produces as output OS native packages for installation and management - certainly, a distribution of the Hadoop

Re: clarification on HBASE functionality

2014-07-15 Thread Andrew Purtell
HBase will take advantage of HDFS specific features if they are available but can run on anything that has a Hadoop FileSystem driver. Gluster is an option. Maybe Lustre and Ceph also. If you plan on dedicating storage to Cassandra, then you don't have to worry about managing a distributed

Re: Intel Hadoop Distribution.

2013-03-02 Thread Andrew Purtell
Chengi, This is not the forum for inquiries about this or that vendor's changes. I work for Intel but on community focused work. I couldn't even answer your questions. You should start by looking on the vendor's website for where to direct further inquiry. On the other hand if the question is about

Re: [OT] MapR m3

2013-02-11 Thread Andrew Purtell
There is no hate here. It's a simple courtesy. Questions about Apache Hadoop should be directed to the Apache Hadoop mailing lists. Questions about $VENDOR products should be directed to the vendor's mailing lists or support infrastructure. Furthermore, the question as posed has nothing to do

Re: Hadoop and Cuda , JCuda (CPU+GPU architecture)

2012-09-24 Thread Andrew Purtell
On Mon, Sep 24, 2012 at 10:38 AM, Harsh J ha...@cloudera.com wrote: Make sure to check out the Rootbeer compiler that makes life easy: https://github.com/pcpratts/rootbeer1 Indeed. Interesting to think about how one might plumb Mapper and Reducer to Rootbeer's ParallelRuntime. Best regards,

Re: Announcement of Project Panthera: Better Analytics with SQL, MapReduce and HBase

2012-09-17 Thread Andrew Purtell
Hi Jason, On Mon, Sep 17, 2012 at 6:55 AM, Dai, Jason jason@intel.com wrote: I'd like to announce Project Panthera, our open source efforts that showcase better data analytics capabilities on Hadoop/HBase (through both SW and HW improvements), available at

Re: hadoop security API (repost)

2012-07-02 Thread Andrew Purtell
You could do that, but that means your app will have to have keytabs for all the users it wants to act as. Proxyuser will be much easier to manage. Maybe we should get proxyuser support into HBase if it is not there yet. I don't think proxy auth is what the OP is after. Do I have that right? Implies the
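
For reference, proxyuser (impersonation) is configured on the cluster side in core-site.xml; a hedged sketch, with the superuser name "myapp" and the host/group values purely illustrative:

    <!-- Allow the "myapp" principal to impersonate other users -->
    <property>
      <name>hadoop.proxyuser.myapp.hosts</name>
      <value>gateway.example.com</value>
    </property>
    <property>
      <name>hadoop.proxyuser.myapp.groups</name>
      <value>users</value>
    </property>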

HA MRv1 JobTracker?

2012-06-16 Thread Andrew Purtell
We are planning to run a next generation of Hadoop ecosystem components in our production in a few months. We plan to use HDFS 2.0 for the HA NameNode work. The platform will also include YARN but its use will be experimental. So we'll be running something equivalent to the CDH MR1 package to

Re: troubles with HBase unit tests using MiniMRCluster on 0.23.1-SNAPSHOT

2012-01-14 Thread Andrew Purtell
- Andy. Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) From: Ted Yu yuzhih...@gmail.com To: mapreduce-user@hadoop.apache.org Cc: Andrew Purtell apurt...@apache.org; Stack st...@duboce.net Sent: Thursday, January 12, 2012 8

Re: troubles with HBase unit tests using MiniMRCluster on 0.23.1-SNAPSHOT

2012-01-12 Thread Andrew Purtell
Hi Mahadev, Was this reproducible? Best regards, - Andy. Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) From: Andrew Purtell apurt...@apache.org To: mapreduce-user@hadoop.apache.org mapreduce-user

Re: Need help regarding HDFS-RAID

2011-09-20 Thread Andrew Purtell
) From: Dhruba Borthakur dhr...@gmail.com To: hdfs-user@hadoop.apache.org; Andrew Purtell apurt...@apache.org Sent: Tuesday, September 20, 2011 9:49 AM Subject: Re: Need help regarding HDFS-RAID Hi Andy, I will be very grateful to you if you merge and contribute it to Apache Hadoop

Re: Need help regarding HDFS-RAID

2011-09-17 Thread Andrew Purtell
. - Piet Hein (via Tom White) From: Dhruba Borthakur dhr...@gmail.com To: hdfs-user@hadoop.apache.org; Andrew Purtell apurt...@apache.org Sent: Thursday, September 15, 2011 10:14 AM Subject: Re: Need help regarding HDFS-RAID That's right Andy. 0.22+. We

Re: Need help regarding HDFS-RAID

2011-09-15 Thread Andrew Purtell
But that is the HDFS RAID effectively in 0.22+, not 0.21, right Dhruba?   Best regards,    - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) From: Dhruba Borthakur dhr...@gmail.com To:

Re: Need help regarding HDFS-RAID

2011-09-15 Thread Andrew Purtell
) From: Ajit Ratnaparkhi ajit.ratnapar...@gmail.com To: hdfs-user@hadoop.apache.org Cc: Andrew Purtell apurt...@apache.org Sent: Thursday, September 15, 2011 10:54 AM Subject: Re: Need help regarding HDFS-RAID Thanks for the info! So can I use HDFS-RAID taken from apache hdfs trunk

Re: Does Hadoop 0.20.2 and HBase 0.90.3 compatible ??

2011-06-03 Thread Andrew Purtell
Is Hadoop 0.20.2 also not compatible with HBase 0.90.3? In a strict sense they are, but without append support HBase cannot guarantee that the last block of its write-ahead logs is synced to disk, so in some failure cases edits will be lost. With append support then the hole of these
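
On builds that actually carry the append work, the feature is switched on in hdfs-site.xml; a minimal sketch (the property has no effect on branches without the append patches):

    <property>
      <name>dfs.support.append</name>
      <value>true</value>
    </property>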

Re: hbase and hypertable comparison

2011-05-25 Thread Andrew Purtell
I think I can speak for all of the HBase devs that in our opinion this vendor benchmark was designed by hypertable to demonstrate a specific feature of their system -- autotuning -- in such a way that HBase was, obviously, not tuned. Nobody from the HBase project was consulted on the results or

Re: HDFS + ZooKeeper

2011-04-24 Thread Andrew Purtell
From: Jason Rutherglen jason.rutherg...@gmail.com Right and AN is not using ZK for the actual NameNode methods/functions, only for the failover election of the 'backup' NameNode? What do you mean by actual NameNode methods/functions? I believe there was an effort during the development of

Re: ZooKeeper approved by Apache Board as TLP!

2010-11-22 Thread Andrew Purtell
Congratulations Patrick and all! Best regards, - Andy From: Patrick Hunt ph...@apache.org Subject: ZooKeeper approved by Apache Board as TLP! To: zookeeper-...@hadoop.apache.org, zookeeper-user zookeeper-user@hadoop.apache.org Date: Monday, November 22, 2010, 10:28 AM We are now

Re: Xcievers Load

2010-09-23 Thread Andrew Purtell
From: Todd Lipcon t...@cloudera.com [...] 4000 xceivers is a lot. There is a 2:1 ratio of file descriptors to xceivers. 4000 xceivers is quite normal on a heavily loaded HBase cluster in my experience. We run with 10K xceivers... The problem is the pain is not quite high enough to devote months to
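
For reference, the datanode transceiver ceiling of that era was raised in hdfs-site.xml; a sketch, with 4096 as an assumed value for a loaded HBase cluster (note the historical misspelling in the property name):

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>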

RE: Lots of Different Kind of Datanode Errors

2010-06-08 Thread Andrew Purtell
? Thanks, Gokul From: Andrew Purtell [mailto:apurt...@apache.org] Sent: Tuesday, June 08, 2010 1:24 AM To: hdfs-user@hadoop.apache.org Subject: Re: Lots of Different Kind of Datanode Errors Current synchronization on FSDataset seems not quite

Re: Lots of Different Kind of Datanode Errors

2010-06-07 Thread Andrew Purtell
Current synchronization on FSDataset seems not quite right. Doing what amounted to applying Todd's patch that modifies FSDataSet to use reentrant rwlocks cleared up that type of problem for us.    - Andy From: Jeff Whiting je...@qualtrics.com Subject: Re: Lots of Different Kind of Datanode

RE: Using HBase on other file systems

2010-05-15 Thread Andrew Purtell
: Andrew Purtell [mailto:apurt...@apache.org] Sent: Thu 5/13/2010 11:54 PM To: hbase-user@hadoop.apache.org Subject: RE: Using HBase on other file systems You really want to run HBase backed by Eucalyptus' Walrus? What do you have behind that? From: Gibbon, Robert, VF-Group

Re: HBase client hangs after upgrade to 0.20.4 when used from reducer

2010-05-14 Thread Andrew Purtell
That has been a problem before. But I see Jon has already filed a jira for this. From: Todd Lipcon Subject: Re: HBase client hangs after upgrade to 0.20.4 when used from reducer It appears like we might be stuck in an infinite loop here: IPC Server handler 9 on 60020 daemon prio=10

RE: Using HBase on other file systems

2010-05-13 Thread Andrew Purtell
You really want to run HBase backed by Eucalyptus' Walrus? What do you have behind that? From: Gibbon, Robert, VF-Group Subject: RE: Using HBase on other file systems [...] NB. I checked out running HBase over Walrus (an AWS S3 clone): bork - you want me to file a Jira on that?

Stargate WAR target

2010-05-12 Thread Andrew Purtell
Anybody use it? - Andy

public HBase 0.20.4 EC2 AMIs available in all regions

2010-05-11 Thread Andrew Purtell
HBase 0.20.4 EC2 AMIs are now available in all regions. These are instance store backed AMIs. The latest launch scripts can be found here: https://hbase.s3.amazonaws.com/hbase-ec2-0.20.4.tar.gz Region -- AMI ID -- Arch -- Name --

Re: Using HBase on other file systems

2010-05-09 Thread Andrew Purtell
Our experience with Gluster 2 is that self heal when a brick drops off the network is very painful. The high performance impact lasts for a long time. I'm not sure, but I think Gluster 3 may only re-replicate missing sections instead of entire files. On the other hand I would not trust Gluster 3

Re: Does HBase do in-memory replication of rows?

2010-05-09 Thread Andrew Purtell
Others have followed up on the central question, which is about durability, and have pointed out that the text is misleading. However more generally regarding the question Does HBase do in-memory replication of rows?: HBase will have a replication feature in the next release independent of

Re: Using HBase on other file systems

2010-05-09 Thread Andrew Purtell
or you'll need to extend the FileSystem class to write a client that Hadoop Core can use. There is one: https://issues.apache.org/jira/browse/HADOOP-6253 It even exports stripe locations in a way useful for distributing MR task placement, but provides only one host per block. - Andy

RE: Theoretical question...

2010-04-30 Thread Andrew Purtell
Given your take, I encourage you to check out HBASE-1697. - Andy On Fri Apr 30th, 2010 6:14 AM PDT Michael Segel wrote: Andrew, Not exactly. Within HBase, if you have access, you can do anything to any resource. I don't believe there's a concept of permissions. (Unless you can use the

Re: Theoretical question...

2010-04-29 Thread Andrew Purtell
From: Michael Segel Imagine you have a cloud of 100 hadoop nodes. In theory you could create multiple instances of HBase on the cloud. Obviously I don't think you could have multiple region servers running on the same node. The use case I was thinking about if you have a centralized

Re: REST/Stargate Performance

2010-04-20 Thread Andrew Purtell
This is the way it is with REST. You have HTTP transaction overheads for each access to a (path specified) resource. Multiple clients and Stargate instances will help. REST/WS is best suited for the case where you will have thousands of concurrent clients making fairly infrequent requests along

Re: REST/Stargate Performance

2010-04-20 Thread Andrew Purtell
Actually I can get several thousand values per second using scanners and small-ish values, roughly on par with the Thrift connector. From: Andrew Purtell [...] Also, I can get several hundred reads per second using scanners and batching, and I can do several hundred puts per second using

Re: REST Interface: Required ordering of JSON name/value pairs when performing Insert/Update

2010-04-20 Thread Andrew Purtell
I filed a JIRA for this and will take a look at it soon: https://issues.apache.org/jira/browse/HBASE-2475 Thanks for the report, very helpful. In the equivalent XML notation, the ordering is specifically required per the schema. ... and Jersey adds a marshaller and unmarshaller to the JAXB

RE: Get operation in HBase Map-Reduce methods

2010-04-20 Thread Andrew Purtell
My advice is to use a scanner with explicit start/end key (best) or filters (still good), not temporary tables (not so good). HBase 0.21 will have MultiGet, so that would be another option then. - Andy From: Geoff Hendrey [...] As I understand it, you have a table, and you need to do
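
A minimal sketch of the bounded-scan approach against the HBase 0.20-era client API; the table and row key names are illustrative:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BoundedScan {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        // Scan only the key range [row-0100, row-0200) rather than copying
        // rows into a temporary table.
        Scan scan = new Scan(Bytes.toBytes("row-0100"), Bytes.toBytes("row-0200"));
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            System.out.println(Bytes.toString(r.getRow()));
          }
        } finally {
          scanner.close();
        }
      }
    }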

RE: extremely sluggish hbase

2010-04-20 Thread Andrew Purtell
From: Geoff Hendrey [...] Yes, it shows BLOCKCACHE = 'false' The HBase shell is taking 63 seconds to scan a table with {LIMIT => 1}! The actual time to perform the action is sub-second, but there's ~62 seconds of hanging around waiting for region locations to come back. ROOT or META might not

Hackathon agenda

2010-04-17 Thread Andrew Purtell
The Hackathon is basically agenda-less, but I'd like to propose a general topic of discussion we should cover while we are all in the room together: - For HBASE-1964 (HBASE-2183, HBASE-2461, and related): injecting and/or mocking exceptions thrown up from DFSClient. I think we want a toolkit

Re: Recommendations (WAS - Re: DFSClient errors during massive HBase load)

2010-04-17 Thread Andrew Purtell
The short answer is you need more HDFS datanodes. It's a question of trying to do too much at peak load with too few cluster resources. Brief reminder: I have a small cluster, 3 regionservers (+datanodes), 1 master (+namenode). We perform a massive load of data into HBase every few

Re: About test/production server configuration

2010-04-10 Thread Andrew Purtell
Hey Todd, I don't think commodity hardware is a joke at all, it's just a different definition of commodity. Yes, this is why I said: The commodity hardware talk around MR and BigTable is a bit of a joke -- I tend to think of commodity as USD$6000 or so. Really, standard server class

Re: get the impact hbase brings to HDFS, datanode log exploded after we started HBase.

2010-04-08 Thread Andrew Purtell
My suggestions: Don't run below INFO logging level for performance reasons once you have a cluster up and running. Instead of using DN logs, export HBase and HDFS metrics via Ganglia. http://wiki.apache.org/hadoop/GangliaMetrics
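
A hedged sketch of the metrics wiring of that era, via hadoop-metrics.properties; the gmond address is an assumption:

    # Send HDFS metrics to Ganglia instead of grepping datanode logs
    dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    dfs.period=10
    dfs.servers=gmond.example.com:8649
    # The analogous stanza in HBase's copy of the file
    hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    hbase.period=10
    hbase.servers=gmond.example.com:8649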

Re: get the impact hbase brings to HDFS, datanode log exploded after we started HBase.

2010-04-08 Thread Andrew Purtell
Subject: Re: get the impact hbase brings to HDFS, datanode log exploded after we started HBase. To: hbase-user@hadoop.apache.org, apurt...@apache.org Date: Thursday, April 8, 2010, 6:49 PM thanks Andrew, On Fri, Apr 9, 2010 at 2:30 AM, Andrew Purtell apurt...@apache.org wrote: My

Re: Using SPARQL against HBase

2010-04-05 Thread Andrew Purtell
Just some ideas, possibly half-baked: From: Amandeep Khurana Subject: Re: Using SPARQL against HBase To: hbase-user@hadoop.apache.org 1. We want to have a SPARQL query engine over it that can return results to queries in real time, comparable to other systems out there. And since we will

Re: Performance of reading rows with a large number of columns

2010-04-03 Thread Andrew Purtell
Sammy, "Is HBase deserializing the entire row when it reads the data from disk?" No. "So limiting the column doesn't have any effect." HBase is a column oriented store -- values are grouped independently at the store level by column family. It appears you are using only one column family,
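
The practical consequence: grouping rarely-read large values into their own column family lets reads of the other family skip those stores entirely. A sketch in the HBase shell, with table and family names invented for illustration:

    hbase> create 'videos', {NAME => 'meta'}, {NAME => 'content'}
    # Scans and gets that touch only 'meta' never read the 'content' stores.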

come to HUG10!

2010-04-02 Thread Andrew Purtell
Internet - Special room rate of $135/night Best regards, Andrew Purtell apurt...@apache.org andrew_purt...@trendmicro.com

Re: [DISCUSSION] Release process

2010-04-01 Thread Andrew Purtell
Our org (Trend Micro) will be using an internal build based on 0.20 for at least the rest of this year. It is, really, already 1.0 from our point of view, the first ASF Hadoop release officially adopted into our production environment. I hope other users of Hadoop will speak up on this thread

Re: DFSClient errors during massive HBase load

2010-04-01 Thread Andrew Purtell
First, "ulimit: 1024". That's fatal. You need to up file descriptors to something like 32K. See http://wiki.apache.org/hadoop/Hbase/Troubleshooting, item #6. From there, let's see. - Andy From: Oded Rosen o...@legolas-media.com Subject: DFSClient errors during massive HBase load To:
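
A sketch of the usual fix; the user name and limit value are illustrative (see the troubleshooting page above for specifics):

    # /etc/security/limits.conf -- raise nofile for the user running the daemons
    hadoop  soft  nofile  32768
    hadoop  hard  nofile  32768
    # verify after logging in again
    $ ulimit -n
    32768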

Re: Using SPARQL against HBase

2010-03-31 Thread Andrew Purtell
HBase has nice properties for efficiently storing, for example, a sparse adjacency representation of a graph, very large graphs. I'm sure it could be used to store an enormous number of RDF triples. But this is a long long way from something that can respond to SPARQL queries. The RDF store

RE: Using SPARQL against HBase

2010-03-31 Thread Andrew Purtell
I thought Heart was dead. - Andy From: Jonathan Gray Subject: RE: Using SPARQL against HBase Stack pointed this out to me yesterday which could be of interest to you: http://wiki.apache.org/incubator/HeartProposal http://heart.korea.ac.kr/

RE: Using SPARQL against HBase

2010-03-31 Thread Andrew Purtell
Hi Raffi, To read up on fundamentals I suggest Google's BigTable paper: http://labs.google.com/papers/bigtable.html Detail on how HBase implements the BigTable architecture within the Hadoop ecosystem can be found here: http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture

Re: Using SPARQL against HBase

2010-03-31 Thread Andrew Purtell
, 2010 at 11:56 AM, Andrew Purtell apurt...@apache.orgwrote: Hi Raffi, To read up on fundamentals I suggest Google's BigTable paper: http://labs.google.com/papers/bigtable.html Detail on how HBase implements the BigTable architecture within the Hadoop ecosystem can be found here

Re: Region assignment in Hbase

2010-03-30 Thread Andrew Purtell
Hi, Read this: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html [...] In the thread "Data distribution in HBase", one of the people mentioned that the data hosted by the Region Server may not actually reside on the same machine. So when asked for data, it fetches from

Re: Delete Range Of Rows In HBase or How To Age Out Old Data

2010-03-29 Thread Andrew Purtell
Hi David, What about setting time to lives on column families? You can add or change the 'TTL' attribute on a column family in the shell, or specify a time to live when creating a table. See the javadoc for HColumnDescriptor. A time to live is an integer value (unit is seconds) associated with
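
A minimal sketch in the HBase shell; the table, family, and seven-day value are illustrative:

    hbase> disable 'mytable'
    hbase> alter 'mytable', {NAME => 'cf', TTL => 604800}   # TTL is in seconds; 604800 = 7 days
    hbase> enable 'mytable'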

Re: Delete Range Of Rows In HBase or How To Age Out Old Data

2010-03-29 Thread Andrew Purtell
Please see inline. From: David Swift Andrew, The TimeToLive works exactly as you described. It's perfect for our needs. However, I aged out several hundred thousand rows, waited about 10 minutes, and then ran a compact from the HBase shell. During the whole period, I ran a periodic

Re: Questions about data distribution in HBase

2010-03-29 Thread Andrew Purtell
This use case is an ideal one for coprocessors. Alas, the coprocessor feature is not finished yet. More inline. From: William Kang Subject: Re: Questions about data distribution in HBase What I need is a low latency system that can perform some video processing on the fly. For this reason, a

Re: how to do fast scan on huge table

2010-03-27 Thread Andrew Purtell
Steven, If you are going to go that route, please check out Coprocessors (HBASE-2000, HBASE-2001). The current patch on HBASE-2001 for example implements an in-process MapReduce framework for the regionserver that allows you to load (at this time) arbitrary classes from HDFS which implement

Re: how to do fast scan on huge table

2010-03-27 Thread Andrew Purtell
A really good suggestion. We advocate and use this extensively. When the queries (or some reasonable subset) can be anticipated and some amount of lag is acceptable, then you can periodically run a MR job that precomputes answers to anticipated queries and writes them to a table that you will

Re: EC2 scripts

2010-03-27 Thread Andrew Purtell
Hi Tim, Currently we only have public AMIs registered in the us-east-1 region. EC2 AMIs are region-locked. So the scripts are not finding any public AMIs for the region you are running in -- EU? -- hence the error. I have also never seen the Xalan complaints. What OS? I did replicate the

Re: EC2 scripts

2010-03-27 Thread Andrew Purtell
Yes, if you are creating your own private AMIs you have to change the S3_BUCKET setting in hbase-ec2-env.sh to a bucket that you own. The HBase public AMI bucket is read only. :-) If the bucket name you specify does not exist it will be created by ec2-upload-bundle. You may want to try that

Re: EC2 scripts

2010-03-27 Thread Andrew Purtell
From: Tim Robertson timrobertson...@gmail.com I added to the ec2 env: EC2_URL=https://us-east-1.ec2.amazonaws.com and also: $ echo $EC2_URL https://us-east-1.ec2.amazonaws.com But the error remains: Required parameter 'AMI' missing (-h for usage) Waiting for instance to

Re: EC2 scripts

2010-03-27 Thread Andrew Purtell
Tim, Try these scripts: http://hbase.s3.amazonaws.com/hbase/hbase-ec2-eu.tar.gz I'd appreciate it. They are what I'm working with now over in eu-west-1 with no issues. I'll have AMIs for HBase 0.20.3 on Hadoop 0.20.2 ready to go in the EU within the hour. Please let me know if you continue

Re: EC2 scripts

2010-03-27 Thread Andrew Purtell
x86_64 From: Andrew Purtell apurt...@apache.org Try these scripts:    http://hbase.s3.amazonaws.com/hbase/hbase-ec2-eu.tar.gz I'd appreciate it. They are what I'm working with now over in eu-west-1 with no issues. I'll have AMIs for HBase 0.20.3 on Hadoop 0.20.2 ready to go in the EU within

Re: EC2 scripts

2010-03-27 Thread Andrew Purtell
Not a waste of time at all. :-) Right, src/contrib/ec2/ becomes contrib/ec2/ after 'ant package' with appropriate substitutions. - Andy From: Tim Robertson timrobertson...@gmail.com Subject: Re: EC2 scripts To: apurt...@apache.org, hbase-user@hadoop.apache.org Date: Saturday, March 27,

Re: good, not evil and HBase package for Debian

2010-03-26 Thread Andrew Purtell
Hey Thomas, On Apache's legal-discuss@ list it was resolved some time ago that the Good, Not Evil phrase in the JSON license does not preclude inclusion of it with ASF 2.0 licensed works. ASF 2.0 licensed works are already in Debian main. I think this is a non issue but you can easily ask

Re: good, not evil and HBase package for Debian

2010-03-26 Thread Andrew Purtell
I've opened an issue on this topic in the hbase jira: https://issues.apache.org/jira/browse/HBASE-2383 Thanks for asking Debian, Thomas. - Andy

Re: About URL character escape in HBase Rest Servlet

2010-03-25 Thread Andrew Purtell
梁爽, I'm using Hadoop 0.20.1 and HBase 0.20.3. I have Stargate running in Tomcat and use Apache as a proxy. But here is the problem: some of my row keys have special characters like '/'. Can you provide more detail? Anything in the Stargate log? Or Tomcat log? Or Apache log? We have a simple

RE: Problems with region server OOME

2010-03-24 Thread Andrew Purtell
We would appreciate tips/information on how to change the configuration so that OOME probability is minimized. Try running with 4GB heaps if you can. On recent JVMs -- but don't use 1.6.0_18! -- you can have the JVM compress 64 bit object references into 32 bits. This will save heap at minor
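
A sketch of the corresponding hbase-env.sh settings; the heap size follows the advice above, and the flag is a standard HotSpot option (again, avoid 1.6.0_18):

    # hbase-env.sh
    export HBASE_HEAPSIZE=4000
    export HBASE_OPTS="-XX:+UseCompressedOops $HBASE_OPTS"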

Re: Data Loss During Bulk Load

2010-03-23 Thread Andrew Purtell
This was an exception talking to the HDFS NameNode. Please check the NameNode log, grep using the file URI there. - Andy From: Nathan Harkenrider nathan.harkenri...@gmail.com Subject: Re: Data Loss During Bulk Load To: hbase-user@hadoop.apache.org Date: Monday, March 22, 2010, 3:47 PM

Re: trouble with ec2 scripts

2010-03-21 Thread Andrew Purtell
Hi Seth, Very simple. You have to wait for EC2 to launch all of the slaves before your cluster will come up. The master in your case is waiting for DFS to initialize. Check the master log under /mnt/hbase/logs if you want to see the "Waiting for DFS to initialize..." messages. The master will

Re: EC2 Scripts

2010-03-20 Thread Andrew Purtell
[Copied to hbase-u...@] Hi Lars, Required parameter 'AMI' missing (-h for usage) I guess that is OK? No, that means the scripts are not picking up a valid public AMI. I think you may be the first to attempt to use the EC2 scripts in the EU. One thing I did not mention in my presentation is

Re: Remote Java client connection into EC2 instance

2010-03-20 Thread Andrew Purtell
The last time I checked the Rackspace instance types had less disk. For example the 8192 MB option has 320 GB of disk. For roughly the same price and RAM on EC2 you get 840 GB of instance storage (m1.large). Presumably for a HBase/Hadoop deployment, storage capacity is a top concern. And note

Re: Remote Java client connection into EC2 instance

2010-03-20 Thread Andrew Purtell
Something you might want to look into is the EC2 scripts on Hadoop core trunk (0.21-dev). These have moved beyond bash scripts tied to the EC2 command line tools to a set of Python scripts which use libcloud to abstract away the infrastructure mechanics. So it's about equal effort to deploy a

Re: Remote Java client connection into EC2 instance

2010-03-19 Thread Andrew Purtell
The IP addresses assigned on the cluster are all internal ones, so when the regionservers do a reverse lookup, they get something like foo.internal. Then they report this to the master, which hands them out to the client library as region locations. So while you can telnet to 60020 on the slaves as

Re: Remote Java client connection into EC2 instance

2010-03-19 Thread Andrew Purtell
best to update should an instance fail and be replaced, but this should be hopefully a rare event and elastic IPs can help, though each account only gets 5 of them without justification to AWS. - Andy On Fri Mar 19th, 2010 9:45 AM PDT Andrew Purtell wrote: The IP addresses assigned

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

2010-03-15 Thread Andrew Purtell
ec2 related blog   (http://aws-musings.com/) Regards, Vaibhav On Sun, Mar 14, 2010 at 3:58 PM, Andrew Purtell wrote: Hey Vaibhav, Do you think any of your #2 would be generally useful for others and something we might fold into the public HBase EC2 scripts? I don't want

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

2010-03-14 Thread Andrew Purtell
Hey Vaibhav, Do you think any of your #2 would be generally useful for others and something we might fold into the public HBase EC2 scripts? I don't want to be presumptive, but let me kindly plant the idea... Best, - Andy - Original Message From: Vaibhav Puranik

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

2010-03-13 Thread Andrew Purtell
The data will be intact, but the config will be invalidated, right? After a cluster has been suspended and then resumed, all of the assigned IP addresses will be different. So this would render all of the Hadoop and HBase configuration files invalid. The data will be there but you will have to

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

2010-03-13 Thread Andrew Purtell
Hmm... I know you only used leeware as an example Edward. :-) I'd caution you have to be careful. Obviously only a subset of low cost options are suitable and you need to know what you are doing. Given this example, leeware servers would be possibly useful but underperforming for plain

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

2010-03-13 Thread Andrew Purtell
Hi Vaibhav, My advice is for the unaware. :-) No implication or disrespect is meant for others. We have targeted our EC2 scripts at the newcomer, early evaluator, or casual experimenter, though they can for sure serve as a starting point to build something more professional/production. So

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

2010-03-13 Thread Andrew Purtell
and trigger recovery actions. Maybe someone can experiment, confirm or refute, and then share their experiences? - Andy - Original Message From: Andrew Purtell apurt...@apache.org To: hbase-user@hadoop.apache.org Sent: Sat, March 13, 2010 12:32:22 PM Subject: Re: on Hadoop

on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

2010-03-12 Thread Andrew Purtell
=AttachFile&do=get&target=HUG9_HBaseUpdate_JonathanGray.pdf HBase and HDFS by Todd Lipcon of Cloudera http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseAndHDFS_ToddLipcon_Cloudera.pdf HBase on EC2 by Andrew Purtell of Trend Micro http://hbase.s3

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

2010-03-12 Thread Andrew Purtell
Hi James, I architected something once using hosted servers in ServerBeach as core resources with elastic extension onto the EC2 cloud to handle peaks and spikes. My current thinking is this kind of hybrid model may be the best way to go for hosted elastic Hadoop clusters. HBase use is different

Re: Use cases of HBase

2010-03-10 Thread Andrew Purtell
of materialized? Would you kindly give more details? Thanks! - Hua On Wed, Mar 10, 2010 at 8:12 AM, Andrew Purtell wrote: I came to this discussion late. Ryan and J-D's use case is clearly successful. In addition to what others have said, I think another case where HBase really

Re: Regionserver problems because of datanode timeouts

2010-03-10 Thread Andrew Purtell
However, once and every while our Nagios (our service monitor) detects that requesting the Hbase master page takes a long time. Sometimes 10 sec, rarely around 30 secs but most of the time 10 secs. In the cases the page loads slowly, there is a fair amount of load on Hbase. I've

Re: Use cases of HBase

2010-03-09 Thread Andrew Purtell
I came to this discussion late. Ryan and J-D's use case is clearly successful. In addition to what others have said, I think another case where HBase really excels is supporting analytics over Big Data (which I define as on the order of a petabyte). Some of the best performance numbers are put

Re: Trying to understand HBase/ZooKeeper Logs

2010-03-03 Thread Andrew Purtell
I built the HBase RPMs for Cloudera. Just for future reference if someone needs patched versions of those RPMs, it's easy enough for me to spin them for you. Just drop me a note. And/or you may want to send a note to Cloudera explaining your needs. I put together a version of Cloudera-ized

Re: Handling Interactive versus Batch Calculations

2010-03-01 Thread Andrew Purtell
I think Jonathan Gray began working on something similar to this a few months ago for Streamy. Regrettably that was proprietary and remains so to the best of my knowledge. As JD said, Coprocessors are very interesting, and I think they're worth looking at (or contributing a patch for!)

Re: Handling Interactive versus Batch Calculations

2010-03-01 Thread Andrew Purtell
You may have made the mental substitution, but just in case not: Also the server side implementation holds all intermediate values in the heap. What we have now is a sketch that needs some work. It really should spill intermediates to local disk (as HFiles) as necessary and then

Re: Why windows support is critical

2010-02-28 Thread Andrew Purtell
Since you have already decided to move I don't see the point in asking what your troubles were. Suffice it to say I use HBase in all-localhost mode on a Windows development box every day (though I do prefer to use my Ubuntu box whenever possible for other reasons). You can't effectively run

HUG9 @ Mozilla March 10th, register at http://su.pr/4pe8Of

2010-02-26 Thread Andrew Purtell
To any interested parties, The 9th edition of the HBase Users' Group is kindly being sponsored and hosted by Mozilla in Mountain View (California) on March 10th. Register at http://su.pr/4pe8Of Yours, The HBaseistas

Re: Hbase has been Mavenized

2010-02-23 Thread Andrew Purtell
Thanks Paul! - Original Message From: Paul Smith psm...@aconex.com To: hbase-user@hadoop.apache.org; hbase-...@hadoop.apache.org Sent: Tue, February 23, 2010 10:29:00 PM Subject: Hbase has been Mavenized Just a quick cross post to mention that Hbase trunk has now been migrated

Re: thinking about HUG9

2010-02-22 Thread Andrew Purtell
I talked with Stack over the weekend. We're thinking more like the week of March 29. The 8th seems too soon. - Andy - Original Message From: Andrew Purtell apurt...@apache.org To: hbase-...@hadoop.apache.org; hbase-user@hadoop.apache.org Cc: hbase-...@hadoop.apache.org Sent

Re: thinking about HUG9

2010-02-22 Thread Andrew Purtell
3/10 at mozilla. Thats the week you can't do, right? St.Ack On Mon, Feb 22, 2010 at 2:08 PM, Andrew Purtell wrote: I talked with Stack over the weekend. We're thinking more like the week of March 29. The 8th seems too soon. - Andy - Original Message From: Andrew

Re: ext3 or ext4 filesystem for Hadoop/Hbase?

2010-02-20 Thread Andrew Purtell
ext4 is the clear winner over ext3. xfs if ext4 is not available (RHEL, CentOS, etc.) This is what our EC2 scripts use. Both ext4 and xfs use extents and do lazy/group allocation. - Original Message From: Sujee Maniyam su...@sujee.net To: hbase-user hbase-user@hadoop.apache.org
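
Illustrative only (device and mount point are assumptions): formatting a data disk as ext4 and mounting it with noatime, a common companion tweak for Hadoop data volumes:

    mkfs.ext4 /dev/sdb1
    mount -o noatime /dev/sdb1 /data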

Re: ext3 or ext4 filesystem for Hadoop/Hbase?

2010-02-20 Thread Andrew Purtell
://people.canonical.com/~smoser/bugs/428692/ Sujee http://sujee.net On Sat, Feb 20, 2010 at 10:24 AM, Andrew Purtell wrote: ext4 is the clear winner over ext3. xfs if ext4 is not available (RHEL, CentOS, etc.) This is what our EC2 scripts use. Both ext4 and xfs use extents and do lazy

Re: thinking about HUG9

2010-02-19 Thread Andrew Purtell
, Andrew Purtell wrote: March 8 is ok -- afternoon/evening. - Andy From: Stack Can we do March 8th? I can't do March 9th. St.Ack On Thu, Feb 11, 2010 at 12:43 PM, Andrew Purtell wrote: Hi all, Trend Micro would like to host HUG9 at our offices in Cupertino:

thinking about HUG9

2010-02-11 Thread Andrew Purtell
Hi all, Trend Micro would like to host HUG9 at our offices in Cupertino:

Re: thinking about HUG9

2010-02-11 Thread Andrew Purtell
March 8 is ok -- afternoon/evening. - Andy From: Stack Can we do March 8th? I can't do March 9th. St.Ack On Thu, Feb 11, 2010 at 12:43 PM, Andrew Purtell wrote: Hi all, Trend Micro would like to host HUG9 at our offices in Cupertino: http://maps.google.com/maps?f=qsource

Re: Zookeeper - usage and load

2010-02-09 Thread Andrew Purtell
The ZooKeeper devs suggest giving 1 GB of heap to each process. I run it with the default heap (256 MB) and it's stable for me, but I run relatively small clusters. ZK wants its own disk for the transaction log. So if you can, dedicate a disk, or run ZK on separate servers. Our EC2 scripts start a
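
A sketch of the zoo.cfg arrangement being described; the paths are illustrative:

    # Snapshots and the transaction log on separate spindles, so transaction
    # log fsyncs don't compete with other I/O.
    dataDir=/var/zookeeper/data
    dataLogDir=/disk2/zookeeper/txnlog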
