Re: Lily pre-release info available

2010-06-08 Thread Lars George
Hi Steven, First off, congrats on the progress! This is super exciting. As usual, if you need help you know where to find us :) but it seems you have it all well under control. As far as Buzzwords is concerned, I had submitted a proposal but it was rejected. But with projects like Lily we will get

[ANN] Next Munich OpenHUG Meeting

2010-11-05 Thread Lars George
Ackermann - "Datenmodellierung und Architektur beim Einsatz von Semantik-Web-Technologien in der Praxis" by Sacha Berger - "Hadoop and HBase - Ein Überblick" by Lars George As usual this is followed by an open discussion at a nearby pub/restaurant so that we can also enjoy

Next Munich OpenHUG Meeting

2010-11-06 Thread Lars George
Ackermann - "Datenmodellierung und Architektur beim Einsatz von Semantik-Web-Technologien in der Praxis" by Sacha Berger - "Hadoop and HBase - Ein Überblick" by Lars George As usual this is followed by an open discussion at a nearby pub/restaurant so that we can also enjoy

Re: Xceiver problem

2010-11-17 Thread Lars George
Hi Lucas, What OS are you on? What kernel version? What is your Hadoop and HBase version? How much heap do you assign to each Java process? Lars On Wed, Nov 17, 2010 at 3:05 PM, Lucas Nazário dos Santos wrote: > Hi, > > This problem is widely know, but I'm not able to come up with a decent > so

Re: Correlating traffic with regions

2010-11-17 Thread Lars George
JD, Should we create a metric for it so that it dynamically counts its usage per region? That can then be exposed via a Ganglia context or JMX. Just wondering. Lars On Wed, Nov 17, 2010 at 5:04 PM, Vaibhav Puranik wrote: > hi, > > Thanks for the suggestions JD & Michael. > The region servers serv

Re: Xceiver problem

2010-11-17 Thread Lars George
Lucas > > > > On Wed, Nov 17, 2010 at 2:12 PM, Lars George wrote: > >> Hi Lucas, >> >> What OS are you on? What kernel version? What is your Hadoop and HBase >> version? How much heap do you assign to each Java process? >> >> Lars >> >>

Re: Xceiver problem

2010-11-17 Thread Lars George
ase your epoll limit. Some > tips about that here: > > http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/ > > Thanks > -Todd > > On Wed, Nov 17, 2010 at 9:10 AM, Lars George wrote: > >> Are you running on EC2? Couldn't you sim

Re: Xceiver problem

2010-11-17 Thread Lars George
/proc/sys/fs/epoll/max_user_watches. I'm not quite sure about what to do. > > Can I favor max_user_watches over max_user_instances? With what value? > > I also tried to play with the Xss argument and decreased it to 128K with no > luck (xcievers at 4096). > > Lucas > > >

Re: Using Patched Version of Hadoop for HBase 0.90.0

2010-11-17 Thread Lars George
Hi Navraj, This is because 0.90 uses Maven, and that has a local cache (usually under ~/.m2). You need to replace the existing jar with yours, see http://maven.apache.org/guides/mini/guide-3rd-party-jars-local.html for an example of how to do this. Replace the jar with yours and use the following art

Re: Restoring table from HFiles

2010-11-17 Thread Lars George
I would not say "no" immediately. I know some have done so (given the version was the same) and used add_table.rb to add the table to META. YMMV. Lars On Thu, Nov 18, 2010 at 6:01 AM, Ted Yu wrote: > No. > > See https://issues.apache.org/jira/browse/HBASE-1684 > > On Wed, Nov 17, 2010 at 8:25 PM

Re: TableInputFormat vs. a map of table regions (data locality)

2010-11-17 Thread Lars George
Hi Joy, [1] is what [2] does. They are just a thin wrapper around the raw API. And as Alex pointed out and you noticed too, [2] adds the benefit to have locality support. If you were to add this to [1] then you have [2]. Lars On Thu, Nov 18, 2010 at 5:30 AM, Saptarshi Guha wrote: > Hello, > >

Re: Restoring table from HFiles

2010-11-18 Thread Lars George
compactions do not run while you are copying. Splits/compactions >> change hfile layout on disk. You must freeze the table so your copy is >> consistent (source data remains unchanged start to finish). >> >> Best regards, >> >>    - Andy >> >> >&

Re: Confusing the retrieve result with a given timestamp

2010-11-19 Thread Lars George
Have a read here: http://outerthought.org/blog/417-ot.html Especially: "One interesting option that is missing is the ability to retrieve the latest version less than or equal to a given timestamp, thus giving the 'latest' state of the record at a certain point in time. Update: this is (obviously

Re: Ganglia website refuses connection despite proxy (Hbase EC2)

2010-11-19 Thread Lars George
Yeah, this will be superseded by WHIRR-25 over the next month or two. The "root" name was simply a choice, no reason not to change it. As for Ganglia, do you see the Ganglia daemon run on each node? If not, please have a look into the logs on the servers, the user scripts usually log their process

Re: map task performance degradation - any idea why?

2010-11-19 Thread Lars George
Hi Henning, Could you look at the Master UI while doing the import? The issue with a cold bulk import is that you are hitting one region server initially, and while it is filling up its in-memory structures all is nice and dandy. Then you start to tax the server as it has to flush data out and it b

Re: map task performance degradation - any idea why?

2010-11-19 Thread Lars George
> the start of the run and then even got better. An unexpected load > behavior for me (would have expected early changes but then > some stable behavior up to the end). > > Thanks, >  Henning > > Am Freitag, den 19.11.2010, 15:21 +0100 schrieb Lars George: > >> Hi

Re: map task performance degradation - any idea why?

2010-11-19 Thread Lars George
Am Freitag, den 19.11.2010, 16:16 +0100 schrieb Lars George: > >> Hi Henning, >> >> And you what you have seen is often difficult to explain. What I >> listed are the obvious contenders. But ideally you would do a post >> mortem on the master and slave logs for H

Re: ClassNotFoundException while running some HBase m/r jobs

2010-11-20 Thread Lars George
Hi Hari, This is most certainly a classpath issue. You either have to add the jar to all TaskTracker servers and add it to the HADOOP_CLASSPATH line in hadoop-env.sh (and copy it to all servers again *and* restart the TaskTracker process!) or put the jar into a /lib directory inside the job jar

Re: ClassNotFoundException while running some HBase m/r jobs

2010-11-21 Thread Lars George
g the same error. Weird thing is that tasks on the master node are also > failing with the same error, even though all my files are available on > master. I am sure I'm missing something basic here, but unable to pinpoint > the exact problem. > > hari > > On Sun,

Re: Question about Zookeeper quorum

2010-11-21 Thread Lars George
Hi Hari, You are missing the quorum setting. It seems the hbase-site.xml is missing from the classpath on the clients. Did you pack it into the jar? And yes, even one ZK server is fine in such a small cluster. You can see it is trying to connect to localhost which is the default if the site f
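The missing setting Lars refers to is `hbase.zookeeper.quorum` in hbase-site.xml, which must ship with the client (e.g. packed into the job jar) or the client falls back to `localhost`. A minimal sketch of the fragment — the host name is illustrative, substitute your own quorum members:

```xml
<!-- hbase-site.xml: must be on the client classpath, otherwise clients
     connect to the default quorum of "localhost" and hang or fail. -->
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <!-- comma-separated list of hosts running ZooKeeper;
         a single server is fine for a small cluster -->
    <value>zkhost1</value>
  </property>
</configuration>
```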

Re: Question about Zookeeper quorum

2010-11-21 Thread Lars George
fixed when I add ejabber to the > quroum right? After all, it is responding to changes I make in my xml file. > What else can be the issue here? > > hari > > On Mon, Nov 22, 2010 at 12:54 AM, Lars George wrote: > >> Hi Hari, >> >> You are missing the quorum

Re: Cell versioning/timestamp limits

2010-11-22 Thread Lars George
Hi Mark, First please read this post: http://outerthought.org/blog/417-ot.html Rest inline below. On Nov 22, 2010, at 7:45, Mark Jarecki wrote: > Hi, > > I'm completely new to HBase and have some questions regarding cell > timestamps. > > My questions: Are there practical limitations to t

Re: Question about Zookeeper quorum

2010-11-22 Thread Lars George
what is the default value > of HBASE_MANAGES_ZK ? Because I have not explicitly set it to true in my > hbase-env.sh file. > > thanks, > hari > > On Mon, Nov 22, 2010 at 10:39 AM, Lars George wrote: > >> Hi Hari, >> >> On which of these for machines do you h

Re: Cell versioning/timestamp limits

2010-11-22 Thread Lars George
I agree with Mark. HBase starts the built in ZK support on the nodes that are listed in the quorum. That is why it works as Mark says when you add the ejabber host. What is broken is your job config. For some reason you do not seem to have the right config in your jar as it tries to connect to

Re: map task performance degradation - any idea why?

2010-11-22 Thread Lars George
ion/split(?) rate and make > judgement on whether some configuration is properly set (e.g. > hbase.hregion.memstore.flush.size). > > Thanks, > Alex Baranau > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase > > On Fri, Nov 19, 2010 at 5:

Re: Confusing the retrieve result with a given timestamp

2010-11-22 Thread Lars George
Also see: https://issues.apache.org/jira/browse/HBASE-2470 (I completely forgot I had opened it). On Fri, Nov 19, 2010 at 2:15 PM, Pan W wrote: > Hi,  Lars > > It's very nice of you to show the helpful blog to me. Now I see  :-) > -- > Pan W > >

Re: Question about Zookeeper quorum

2010-11-22 Thread Lars George
null this way. But when I > change dfs.replication in my config file, I do see a change in replication > after upload into HDFS. > > On Mon, Nov 22, 2010 at 2:36 PM, Hari Sreekumar > wrote: > >> Hi Lars, >> >>       I start them through HBase implicitly. I

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Lars George
Oleg, Do you have Ganglia or some other graphing tool running against the cluster? It gives you metrics that are crucial here, for example the load on Hadoop and its DataNodes as well as insertion rates etc. on HBase. What is also interesting is the compaction queue to see if the cluster is going

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Lars George
mon. > > It will help a lot if you can provide your configurations and system > characteristics (maybe in a Wiki page). > It will also help to get more of the "small tweaks" that you found helpful. > > > Lior Schachter > > > > > > > > On Mo

Re: Question about Zookeeper quorum

2010-11-22 Thread Lars George
lt config > all the while! Printing out the config turns was really helpful. So > getConf() reads the hadoop default and core-site.xml files, and > HBaseConfiguration() reads hbase properties. > > Thanks a lot, > Hari > > On Mon, Nov 22, 2010 at 4:41 PM, Lars George w

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Lars George
heap !! very cheap :) >> >> I believe that more money will come when we show the viability of the >> system... I also read that heterogeneous clusters are common. >> >> It will help a lot if you can provide your configurations and system >> characteristics (maybe

Re: LazyFetching of Row Results in MapReduce

2010-11-23 Thread Lars George
Hi fnord, See https://issues.apache.org/jira/browse/HBASE-1537 and https://issues.apache.org/jira/browse/HBASE-2673 for details. Not sure when that went in though but you should have that available, no? Lars On Tue, Nov 23, 2010 at 2:48 PM, fnord 99 wrote: > Hi, > > our machines have 24GB of RA

Re: Error: It took too long to wait for the table

2010-11-24 Thread Lars George
Hi Hari, Disabling a table simply takes time as all RSs need to report back that the regions are flushed and closed. You may time out on that. This is where the async, "fire and forget" version of that call comes in. But if you need to wait, you need to use the async version and then poll the status of the

Re: Error: It took too long to wait for the table

2010-11-24 Thread Lars George
ction. Set "hbase.client.retries.number" to something higher until it works. Lars On Wed, Nov 24, 2010 at 12:01 PM, Hari Sreekumar wrote: > Hi Lars, > >        Is the async version available in hbase-0.20.6 ASF version? It is > still in development right? > > hari >

Re: managing 5-10 servers

2010-11-24 Thread Lars George
I have set up and maintained clusters between 6 and 40 machines while being a full time developer, so all as part of the development process. I used simple scripts like the ones I documented here (http://www.larsgeorge.com/2009/02/hadoop-scripts-part-1.html). Cluster SSH as mentioned is also used m

Re: (Newbie) Use column family for versioning?

2010-11-25 Thread Lars George
Hi Alex, Oh no, you do NOT want to use column families that way. They are semi-static and should not be changed too often, nor should there be too many. Adding a CF requires disabling the table too. Use columns, row keys or timestamps for that use-case. Lars On Nov 25, 2010, at 17:31, Nanheng

Re: decommissioning nodes

2010-11-25 Thread Lars George
Hi Tim, You can issue a $ hbase-daemon.sh stop regionserver on each node which tells the master to move the regions over and shut down the RS properly. Lars On Nov 25, 2010, at 18:24, Tim Robertson wrote: > Hi all, > > Please forgive this rather naive question - I have a cluster and want >

Re: decommissioning nodes

2010-11-25 Thread Lars George
Hi Tim, Understood, but yes it is handled by the master. Lars On Nov 25, 2010, at 19:25, Tim Robertson wrote: > Hi Lars, > > Thanks. I was just wary of doing that on the .META and ROOT. and > wanted a confirmation. > > Cheers, > Tim > > > > On Thu, Nov

Re: Region server memory configuration question

2010-11-25 Thread Lars George
Hi Rod, I guess you will have to be careful setting them right. The hbase.regionserver.global.memstore.upperLimit does not allocate anything but acts as a barrier or threshold to flush data out. If you were to set this to 1.0 then it would never trigger, while only relying on the other triggers to
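The thresholds discussed above live in hbase-site.xml; they are fractions of the region server heap acting as triggers, not allocations. A sketch with the usual defaults, shown for illustration only:

```xml
<!-- hbase-site.xml: global memstore thresholds. upperLimit is the fraction of
     heap at which flushes are forced; lowerLimit is where forced flushing
     stops again. Setting upperLimit to 1.0 would effectively disable this
     trigger, as the post warns. -->
<configuration>
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.4</value>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.lowerLimit</name>
    <value>0.35</value>
  </property>
</configuration>
```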

Re: (Newbie) Use column family for versioning?

2010-11-25 Thread Lars George
t; Alex > > > On Thu, Nov 25, 2010 at 10:18 AM, Lars George wrote: >> Hi Alex, >> >> Oh no, you do NOT want to use column families that way. The are semi static >> and should not be changed too often nor should there be too many. Adding a >> CF require

Re: Region server memory configuration question

2010-11-25 Thread Lars George
by the > hbase.regionserver.global.memstore.upperLimit > other than the block cache? > Rod > > On Thu, Nov 25, 2010 at 2:00 PM, Lars George wrote: > >> Hi Rod, >> >> I guess you will have to be careful setting them right. The >> hbase.regionserver.global.memstore.upperLimit does not allocate &

Re: IndexedTable class

2010-11-26 Thread Lars George
Hi Hari, ITHBase is what you are asking for I assume? Check the contrib directory, they are in a separate jar you also will need to add. Lars On Fri, Nov 26, 2010 at 8:12 AM, Hari Sreekumar wrote: > Hi, > >     From which version of HBase is this available. I have v0.20.6, but > couldn't find t

Re: IndexedTable class

2010-11-26 Thread Lars George
ything like that? What is the logic behind contrib > folder? > > On Fri, Nov 26, 2010 at 2:05 PM, Lars George wrote: > >> Hi Hari, >> >> ITHBase is what you are asking for I assume? Check the contrib >> directory, they are in a separate jar you also will need to add

Re: IndexedTable class

2010-11-26 Thread Lars George
n you knowledge? > > Thanks a lot for you time, > Hari > > On Fri, Nov 26, 2010 at 8:09 PM, Lars George wrote: > >> Hi Hari, >> >> In the new 0.89+ versions those were all complete removed and moved to >> GitHub. The biggest reason being that you need to ha

Re: Scalability on multi-core machines

2010-11-29 Thread Lars George
It sure does. You have a number of handler threads that serve multiple clients. As with Hadoop you can easily saturate on IO, so the core count only helps if you actually have a chance to use the cores properly. That is also why, if you can, you should have a spindle per core. Must run

Re: Garbage collection issues

2010-11-29 Thread Lars George
Hi Friso, Great to know! Todd was the last one to try to crash G1 and the recent iteration seemed much more stable. Lars On Nov 29, 2010, at 10:49, Friso van Vollenhoven wrote: > On a slightly related note, we've been running with G1 with default settings > on a 16GB heap for some weeks no

Re: incremental counters and a global String->Long Dictionary

2010-11-29 Thread Lars George
Hi Claudio, Did you have a look at Google's Percolator paper? I think a mechanism like this may work. Another option often used to implement distributed transactions is using ZooKeeper, where you could create an ephemeral node on the new word, and the host that succeeds in doing so adds it and the
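The ZooKeeper idea above — every host races to create a node for the new word, and only the winner assigns the ID — can be modelled in plain Java with `putIfAbsent` standing in for the atomic znode creation. This is only an analogy of the first-creator-wins semantics under toy assumptions, not ZooKeeper code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the "create ephemeral node, winner assigns the ID" scheme.
public class WordDictionary {
    private final ConcurrentHashMap<String, Long> ids = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Returns the ID for a word, assigning a new one only if no other caller won the race. */
    public long idFor(String word) {
        Long existing = ids.get(word);
        if (existing != null) return existing;
        long candidate = nextId.getAndIncrement();     // tentative new ID (losers leave gaps)
        Long prev = ids.putIfAbsent(word, candidate);  // atomic claim, like the znode create
        return prev != null ? prev : candidate;        // loser adopts the winner's ID
    }

    public static void main(String[] args) {
        WordDictionary dict = new WordDictionary();
        long a = dict.idFor("hbase");
        long b = dict.idFor("hbase");   // a later lookup must see the same ID
        System.out.println(a == b);     // true
    }
}
```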

Re: incremental counters and a global String->Long Dictionary

2010-11-29 Thread Lars George
I like that idea Dave. As for the checkAndPut(), this will not work as Claudio intended? He wanted the counter and put to run together, so that former is only half the deal? Just wondering. Lars On Tue, Nov 30, 2010 at 1:43 AM, Buttler, David wrote: > A while back I had a strange idea to bypass

Re: something wrong with hbase mapreduce

2010-12-01 Thread Lars George
What version of HBase are you using? On Dec 1, 2010, at 9:24, 梁景明 wrote: > i found that if i didnt control timestamp of the put > mapreduce can run, otherwise just one time mapreduce. > the question is i scan by timestamp to get my data > so to put timestamp is my scan thing. > > any ideas

Re: something wrong with hbase mapreduce

2010-12-02 Thread Lars George
: > 0.20.6 > > 2010/12/2 Lars George > >> What version of HBase are you using? >> >> On Dec 1, 2010, at 9:24, 梁景明 wrote: >> >> > i found that if i didnt control timestamp of the put >> > mapreduce can run, otherwise just one time mapreduce. &

Re: Hbase setup on a single windows machine - running commands throws MasterNotRunningException

2010-12-02 Thread Lars George
Have you seen the "HBase on Windows" page on the HBase Wiki? It may help along the way. On Dec 2, 2010, at 19:30, Vijay wrote: > Hi Dave, >As you suggested, I started the zookeeper explicitly using the > following command > $ /cygdrive/g/hbase/bin/hbase-daemon.sh --config > /cygdri

Re: something wrong with hbase mapreduce

2010-12-03 Thread Lars George
5、i ran javacode again. > 6、i ran shell_2 to scan, insert data failed > -- > ROW COLUMN+CELL > ------ > 7、i ran shell_4 > 8、i ran javacode again. > 9、i ran shell_2 to scan, ins

Re: Writing to HBase cluster on EC2 with Java client

2010-12-04 Thread Lars George
Hi Alex, You will need to add your client IP address (or - but not really recommended - 0.0.0.0/0 for the world) into the Security Group that you used to start the cluster on EC2 and allow TCP access to a few ports that the client needs to communicate with HBase. For starters 2181 which is the Zo

Re: Writing to HBase cluster on EC2 with Java client

2010-12-05 Thread Lars George
mons.sh start zookeeper > Of cause, you need't restart HBase cluster if you do as i say. > > On Sun, Dec 5, 2010 at 1:03 AM, Lars George wrote: > >> Hi Alex, >> >> You will need to add your client IP address (or - but not really >> recommmended - 0.0.0.0/0 f

Re: Newbie question about scan filters

2010-12-05 Thread Lars George
Hi Hari, The filters are applied server side (aka predicate pushdown). If you were to scan and skip on the client you incur extra IO cost. Filters, as they are implemented right now, simply skip rows and therefore all they can do is slightly improve normal scan performance. In the future you might get bette
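The IO argument above — the same predicate yields the same rows either way, but pushing it to the server means non-matching rows never cross the wire — can be shown with a small simulation. This is not HBase code; the "table" and "transfer" counter are toy stand-ins:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy illustration of predicate pushdown vs. client-side skipping.
public class PushdownDemo {
    static int transferred;  // rows that crossed the simulated network

    static List<String> serverSideScan(List<String> table, Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String row : table) {
            if (filter.test(row)) { transferred++; out.add(row); }  // only matches ship
        }
        return out;
    }

    static List<String> clientSideScan(List<String> table, Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String row : table) {
            transferred++;                    // every row ships, match or not
            if (filter.test(row)) out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> table = List.of("row-a", "row-b", "key-c", "row-d");
        Predicate<String> startsWithRow = r -> r.startsWith("row-");

        transferred = 0;
        List<String> server = serverSideScan(table, startsWithRow);
        int serverCost = transferred;

        transferred = 0;
        List<String> client = clientSideScan(table, startsWithRow);

        System.out.println(server.equals(client));       // true: identical results
        System.out.println(serverCost < transferred);    // true: but fewer rows shipped
    }
}
```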

Re: Command line integration question

2010-12-06 Thread Lars George
Hi Dmitriy, I think you sent this to the wrong list? You sent to hbase-user but this is a Mahout related question. Please check. Lars On Mon, Dec 6, 2010 at 12:17 AM, Dmitriy Lyubimov wrote: > Dear all, > > I am testing the command line integration for the SSVD patch in hadoop mode > and runnin

Re: something wrong with hbase mapreduce

2010-12-06 Thread Lars George
ather the shell delete thing setting some > current timestamp in hbase. > > so, when i put the data timestamp before current  , that would not set. > > i am not sure about this. > > thanks any way , > > 2010/12/3 Lars George > >> Did you check that the compaction was

Re: Newbie question about scan filters

2010-12-06 Thread Lars George
Hi Jiajun, Sure, why not? What are you trying to achieve? Lars On Mon, Dec 6, 2010 at 3:19 AM, 陈加俊 wrote: > Hi > > Do I can use the scan filter ? the HBase that we used is version 0.20.6. > > jiajun > > On Mon, Dec 6, 2010 at 2:05 AM, Lars George wrote: > >>

Re: Restricting insert/update in HBase

2010-12-06 Thread Lars George
Hi Hari, What you are asking for is transactions. I'd say try to avoid it. HBase can only guarantee atomicity on the row level. So if you want something across tables and rows then you need to use for example ZooKeeper to implement a transactional support system. There is also THBase, which gives

Re: blocked when creating HTable

2010-12-06 Thread Lars George
Hi Exception, For starters the logs say you are trying the wrong ZooKeeper node to get the HBase details (localhost) and your config has: hbase.zookeeper.quorum dev32 hbase.zookeeper.quorum localhost You are declaring it twice and the last one wins. Remove the second

Re: Issues running a large MapReduce job over a complete HBase table

2010-12-06 Thread Lars George
Hi Gabriel, What max heap do you give the various daemons? It is really odd that you see OOMEs, I would like to know what it has consumed. You are saying the Hadoop DataNodes actually crash with the OOME? Lars On Mon, Dec 6, 2010 at 9:02 AM, Gabriel Reid wrote: > Hi, > > We're currently runni

Re: Error while creating a table with compression enabled

2010-12-06 Thread Lars George
Hi AK, This issue? https://issues.apache.org/jira/browse/HBASE-3310 Lars On Mon, Dec 6, 2010 at 9:17 AM, Amandeep Khurana wrote: > The command I'm running on the shell: > > create 'table', {NAME=>'fam', COMPRESSION=>'GZ'} > > or > > create 'table', {NAME=>'fam', COMPRESSION=>'LZO'} > > > Here'

Re: blocked when creating HTable

2010-12-07 Thread Lars George
problem. > > I also run the flume instance on the same cluster. Do the flume and hbase > share the same zookeeper? Is this the reason why I get this problem? > > > > > On Mon, Dec 6, 2010 at 7:27 PM, Lars George wrote: > >> Hi Exception

Re: Best Practices Adding Rows

2010-12-07 Thread Lars George
Hi Alex, That is indeed the recommended way, i.e. use binary values if you can. As long as you can express the same sorting as a long as opposed to a string then that's the way to go for sure. Lars On Dec 7, 2010, at 8:21, Alex Baranau wrote: > I think I've faced by the key format, smth lik
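The point about expressing the sort order as a long rather than a string is the classic pitfall that "10" sorts before "9" lexicographically, while fixed-width big-endian bytes (what HBase's `Bytes.toBytes(long)` produces) sort numerically under byte comparison. A JDK-only sketch:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Fixed-width big-endian longs sort numerically under lexicographic
// (unsigned) byte comparison, as HBase compares keys; decimal strings do not.
// Note: this holds for non-negative values — the sign bit makes negative
// longs compare as larger when treated as unsigned bytes.
public class KeyOrder {
    static byte[] longKey(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();  // big-endian
    }

    public static void main(String[] args) {
        // String keys: wrong order for numbers
        System.out.println("10".compareTo("9") < 0);   // true — "10" sorts before "9"!

        // Binary keys: byte order matches numeric order
        System.out.println(Arrays.compareUnsigned(longKey(9L), longKey(10L)) < 0);  // true
    }
}
```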

Re: blocked when creating HTable

2010-12-07 Thread Lars George
= 0x12cc00271d00013, negotiated > timeout = 4 > > As you can see, it can connect to zookeeper but still block. > > I dig into the code a little bit and find the program blocked at this line: > this.connection.locateRegion(tableName, HConstants.EMPTY_START_ROW); > at HTable.java > > T

Re: Region loadbalancing

2010-12-14 Thread Lars George
Hi Jan, Any day now! Really, there are just a few little road bumps, but nothing major, and once they are resolved it will be released. Just rushing it for the sake of releasing it will not make anyone happy (if we find issues right away just afterwards). Please bear with us! Lars On Tue, Dec 14, 2010

Re: Region loadbalancing

2010-12-14 Thread Lars George
t; > On 14.12.2010 16:17, Lars George wrote: >> >> Hi Jan, >> >> Any day now! >> >> Really, there just a few little road bumps but nothing major ad once >> they are resolved it will be released. Just rushing it for the sake of >> releasing it will n

Re: recent TableInputFormat implementation

2010-12-14 Thread Lars George
Hi Norbert, You have seen http://dumbotics.com/2009/07/31/dumbo-over-hbase/ and https://github.com/tims/lasthbase though, right? Isn't that what you are looking for? Lars On Tue, Dec 14, 2010 at 7:32 PM, Norbert Burger wrote: > Thanks J-D :-)  Somehow, I missed the javadocs for TIFB/TIF, w

Re: Difference between logs present under /hbase/.logs and /hbase/testTable/6674094/.logs directory

2010-12-14 Thread Lars George
Hi Mohit, The one under /hbase/.logs is the one per region server. It is split in case there is a region server crash and put into /hbase///.oldlogs before the region is redeployed. Are you sure you saw a .logs underneath the region directory or was it in fact a .oldlogs? Lars On Wed, Dec 15, 2

Re: Proper Versioning of Data is not happening as configured in HColumnDescriptor object while creating table via HBase client.

2010-12-20 Thread Lars George
Hi, What Stack says is right and the same for trunk, i.e. when you ask with a specific timestamp and have not compacted the stores yet you will see the specific version even if there are 3 or more newer ones. The logic is in the ScanQueryMatcher.match() function. It skips the newer version and

Re: Slow MR data load to table

2010-12-21 Thread Lars George
Hi Bradford, I heard this before recently and one of the things that bit the person in question in the butt was swapping. Could you check that all machines are positively healthy and not swapping etc. - just to rule out the (not so) obvious stuff. Lars On Mon, Dec 20, 2010 at 8:22 PM, Bradford S

Re: Slow MR data load to table

2010-12-22 Thread Lars George
If this is up on EC2 then you may know that write performance is a magnitude slower than on a comparable dedicated cluster! Most EC2 clusters I have tested (with and without EBS and various instance sizes etc.) only did about 2-3MB/s - taking this into account, can you do the math if they do even less

Re: Location of 0.9 REST Documentation?

2010-12-23 Thread Lars George
Hi Michael, I noticed the same and raised that issue a few days back. We will add the documentation back in, it must have been dropped during the merge. Thanks for bringing it up here though. Lars On Thu, Dec 23, 2010 at 4:32 AM, Michael Russo wrote: > The 0.20 branch has detailed documentation

Re: HBase Client connect to remote HBase

2010-12-23 Thread Lars George
Please note that there is an issue with a test config file in the hbase-test.jar that overrides the configuration. Can you make sure you do not have the hbase-test.jar on your client's classpath? Lars On Thu, Dec 23, 2010 at 9:34 AM, King JKing wrote: > When I comment any line contain 127.0.0.1

Re: Insert into tall table 50% faster than wide table

2010-12-23 Thread Lars George
Writing data only hits the WAL and MemStore, so that should result in the same performance for both models. One thing that Mike mentioned is how you distribute the load. How many servers are you using? How are you inserting your data (sequential or random)? Why do you use a Put since this sounds like a

Re: HBase Bulk Load script

2010-12-23 Thread Lars George
Hi Marc, > 1) It seems importtsv will only accept one family at a time. It shows some > sort of security access error if I give it a column list with columns from > different families.  Is this a limitation of the bulk loader, or is this a > consequence of some security configuration somewhere? T

Re: HBase Bulk Load script

2010-12-28 Thread Lars George
> > Does that sound right? > > Marc > > > On Thu, Dec 23, 2010 at 2:34 PM, Todd Lipcon wrote: > >> You beat me to it, Lars! Was writing a response when some family arrived >> for >> the holidays, and when I came back, you had written just what I had started

Re: Hbase/Hadoop cluster setup on AWS

2011-01-03 Thread Lars George
Hi H, While you can do that by hand I strongly recommend using Apache Whirr (http://incubator.apache.org/projects/whirr.html) which has Hadoop and (in trunk now) also HBase support, straight from the Apache tarballs. If you want to set them up manually then you simply spin up N machines and follo

Re: Hbase/Hadoop cluster setup on AWS

2011-01-04 Thread Lars George
p.com/ > > > > - Original Message >> From: Lars George >> To: user@hbase.apache.org >> Sent: Mon, January 3, 2011 12:32:11 PM >> Subject: Re: Hbase/Hadoop cluster setup on AWS >> >> Hi H, >> >> While you can do that by hand I stro

Re: Hbase/Hadoop cluster setup on AWS

2011-01-04 Thread Lars George
.philwhln.com/map-reduce-with-ruby-using-hadoop > > Thanks, > Phil > > On Tue, Jan 4, 2011 at 12:23 AM, Lars George wrote: >> Hi Otis, >> >> It also supports CDH although it does only start Hadoop >> (HDFS/MapReduce). I am going to open a JIRA to facilitate the s

Re: Hbase/Hadoop cluster setup on AWS

2011-01-05 Thread Lars George
10-245-121-242.ec2.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/ > > I'm guessing if you do not browse the file-system through the browser. > Otherwise, it's possible I missed a step that then required me to have > to do this. > > Cheers, > Phil > >

Re: HBase / HDFS on EBS?

2011-01-05 Thread Lars George
Hi, I ran some tests on various EC2 clusters from c1.medium, c1.xlarge to m2.2xlarge with EBS on 1+10 instances. The instance storage usually averages at around 2-3 MB/s for writes and the EBS backed m2.2xlarge did 7-8 MB/s on writes. Reading I think is less an issue, but writing is really bad. On

Re: hbase

2011-01-05 Thread Lars George
I never got that reverse logic, who came up with that? On Wed, Jan 5, 2011 at 6:59 PM, Jean-Daniel Cryans wrote: > What you are doing is filtering out the rows with the value you are looking > for. > > J-D > > On Wed, Jan 5, 2011 at 12:45 AM, how to get the  cell value > wrote: >>  I would like

Re: hbase

2011-01-06 Thread Lars George
Oh, well spotted Tatsuya, didn't even see his use of "tablename" there! On Thu, Jan 6, 2011 at 4:22 AM, Tatsuya Kawano wrote: > > Hi, > >>  byte [] by = rs.getValue(Bytes.toBytes(tablename)); > > Result#getValue() doesn't take a table name but a column name in > "family:qualifier" format, so you

Re: HBase tuning - minimise table read latency

2011-01-14 Thread Lars George
Hi Joel, Marking it "in-memory" does *not* make it all stay or be loaded into memory. It is just a priority flag to retain blocks of that CF preferably in the block cache. So it caches up to the max block cache size. The rest may cause some churn but that is the best you can do. Lars On Tue,

Re: How to delete a table manually

2011-01-21 Thread Lars George
Hi Wayne, 0.90.0 is out. Get it while it's hot from the HBase home page. Lars On Jan 21, 2011, at 20:22, Wayne wrote: > I enthusiastically created a ticket: > https://issues.apache.org/jira/browse/HBASE-3463 > > This might be a dumb question I should already know the answer to...but when > i

Re: Building LZO jar

2011-01-26 Thread Lars George
I agree with Friso, using Todd's LZO Packager this is really easy: https://github.com/toddlipcon/hadoop-lzo-packager Lars On Wed, Jan 26, 2011 at 10:41 AM, Friso van Vollenhoven wrote: > Are you sure it is not a problem with the network on your side? Clicking the > link and downloading that ja

Re: LZO Codec not found

2011-01-26 Thread Lars George
No as the data is now stored compressed and needs to be read somehow. You simply have to follow the steps outlined above and get it started again. :( Lars On Tue, Jan 25, 2011 at 8:16 PM, Peter Haidinyak wrote: > Thanks, is there a way to turn off compression on a table when the region > server

Re: estimate HBase DFS filesystem usage

2011-01-26 Thread Lars George
Benoit, You probably tripped this up? https://issues.apache.org/jira/browse/HBASE-3476 Lars On Wed, Jan 26, 2011 at 5:53 AM, tsuna wrote: > You can run ``hbase org.apache.hadoop.hbase.io.hfile.HFile -f > "$region" -m'' where $region is every HFile (located under > /hbase/$table/*/$family).  Thi

Region Split Comic

2011-01-26 Thread Lars George
Hi, Have a look at http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/ This is awesome and applies equally to HBase, simply s/Tablet/Region/g and you have the same issues using sequential row keys. Ikai, if you read this, please do another set for HB

Re: Compress output using HFileOutputFormat

2011-01-26 Thread Lars George
Hi Nanheng, You have to use its own configuration key to enable it. So when you create the job configuration do add a conf.set("hfile.compression", "gz"); Obviously before you create the job or use job.getConfiguration().set("hfile.compression", "gz"); instead. Lars On Thu, Jan 27, 2011 at

Re: No job jar file set Map Reduce Job

2011-01-26 Thread Lars George
Hi Stuart, Do you have the usual job.setJarByClass(.class); ? Lars On Thu, Jan 27, 2011 at 7:53 AM, Stack wrote: > Does the job run anyway? > St.Ack > > On Wed, Jan 26, 2011 at 10:43 PM, Stuart Scott wrote: >> Hi, >> >> >> >> Has anyone come across the error below? Any ideas how to resol

Re: LZO Codec not found

2011-01-26 Thread Lars George
Hi Jesse, Yeah, I'd recommend Todd's version as well. I will ask if we should update the Wiki accordingly. Lars On Wed, Jan 26, 2011 at 9:27 PM, Jesse Hutton wrote: > There may still be an issue with http://github.com/kevinweil/hadoop-lzo and > CDH3B3. I ran into something similar to > https://

Re: Building LZO jar

2011-01-26 Thread Lars George
, so it also works > for the local installs on dev machines... > > > Friso > > > > On 26 jan 2011, at 12:38, Lars George wrote: > >> I agree with Friso, using Todd's LZO Packager this is really easy: >> https://github.com/toddlipcon/hadoop-lzo-packager &g

Re: No job jar file set Map Reduce Job

2011-01-27 Thread Lars George
AME+"_"+tablename); >        job.setJarByClass(RowCount.class); > > > -Original Message- > From: Lars George [mailto:lars.geo...@gmail.com] > Sent: 27 January 2011 07:07 > To: user@hbase.apache.org > Subject: Re: No job jar file set Map Reduce Job > >

Re: Row Keys

2011-01-30 Thread Lars George
Hi Pete, Look into the Mozilla Socorro project (http://code.google.com/p/socorro/) for how to "salt" the keys to get better load balancing across sequential keys. The principle is to add a salt, in this case a number reflecting the number of servers available (some multiple of that to allow for gr
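The Socorro-style salting described above — prefix each sequential key with a number derived from the key, bounded by (a multiple of) the server count — can be sketched in plain Java. The bucket count and separator here are illustrative choices, not taken from the post:

```java
// Sketch of key salting for load balancing sequential row keys:
// a deterministic prefix spreads writes over BUCKETS regions/servers.
public class SaltedKeys {
    static final int BUCKETS = 8;  // e.g. some multiple of the number of region servers

    static String salt(String key) {
        int bucket = Math.floorMod(key.hashCode(), BUCKETS);  // same key, same bucket
        return bucket + "|" + key;
    }

    public static void main(String[] args) {
        // Sequential keys now land in (up to) BUCKETS different key ranges.
        System.out.println(salt("2011-01-30-000001"));
        System.out.println(salt("2011-01-30-000002"));
        // The cost: a full read-back needs BUCKETS scans, one per prefix.
    }
}
```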

Re: maven repository

2011-01-30 Thread Lars George
Thank you Stack for doing this. Appreciated. On Mon, Jan 31, 2011 at 5:33 AM, Stack wrote: > Daniel: > > It looks like 0.90.0 hbase is showing in the releases repository now. > Let me know if an issue with it. > > Sorry it took so long, > St.Ack > > > > On Wed, Jan 26, 2011 at 1:05 PM, Daniel Ian

Re: is there a pluggable conflict resolver in hbase

2011-01-30 Thread Lars George
Hi Dean, Yes you can do that. See a (slightly outdated) post about them here http://hbaseblog.com/2010/11/30/hbase-coprocessors/ Think of Coprocessors at their simplest level as triggers before and after basically every event that can happen on the server side. You can use this as you intended to

Re: Persist JSON into HBase

2011-02-03 Thread Lars George
Sorry for the late bump... It is quite nice to store JSON as strings in HBase, i.e. use for example JSONObject to convert to something like "{ "name" : "lars" }" and then Bytes.toBytes(jsonString). Since Hive now has an HBase handler you can use Hive and its built-in JSON support to query cells lik
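Storing a JSON document as a string cell boils down to a UTF-8 byte conversion; HBase's `Bytes.toBytes(String)` is UTF-8 under the hood, so this JDK-only sketch shows the equivalent round trip without the HBase dependency:

```java
import java.nio.charset.StandardCharsets;

// JSON stored as a plain string cell: serialize to UTF-8 bytes for the Put,
// decode back on read — a lossless round trip.
public class JsonCell {
    public static void main(String[] args) {
        String json = "{\"name\":\"lars\"}";
        byte[] cellValue = json.getBytes(StandardCharsets.UTF_8);   // what would go into the Put
        String roundTrip = new String(cellValue, StandardCharsets.UTF_8);
        System.out.println(json.equals(roundTrip));  // true
    }
}
```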

Re: Region Servers Crashing during Random Reads

2011-02-03 Thread Lars George
Hi Stack, I was just asking Todd the same thing, ie. fixed new size vs NewRatio. He and you have done way more on GC debugging than me so I trust whatever Todd or you say. I would leave the UseParNewGC for good measure (not relying on implicit defaults). I also re-read just before I saw your reply
