Re: TableInputFormat vs. a map of table regions (data locality)

2010-11-17 Thread Lars George
Hi Joy, [1] is what [2] does. They are just a thin wrapper around the raw API. And as Alex pointed out and you noticed too, [2] adds the benefit of locality support. If you were to add this to [1], then you would have [2]. Lars On Thu, Nov 18, 2010 at 5:30 AM, Saptarshi Guha wrote: > Hello, > >

Re: Restoring table from HFiles

2010-11-17 Thread Lars George
I would not say "no" immediately. I know some have done so (given the version was the same) and used add_table.rb to add the table to META. YMMV. Lars On Thu, Nov 18, 2010 at 6:01 AM, Ted Yu wrote: > No. > > See https://issues.apache.org/jira/browse/HBASE-1684 > > On Wed, Nov 17, 2010 at 8:25 PM

Re: Using Patched Version of Hadoop for HBase 0.90.0

2010-11-17 Thread Lars George
Hi Navraj, This is because 0.90 uses Maven, and that has a local cache (usually under ~/.m2). You need to replace the existing jar with yours; see http://maven.apache.org/guides/mini/guide-3rd-party-jars-local.html for an example of how to do this. Replace the jar with yours and use the following art

Re: Xceiver problem

2010-11-17 Thread Lars George
You haven't answered all questions yet :) Are you running this on EC2? What instance types? On Thu, Nov 18, 2010 at 12:12 AM, Lucas Nazário dos Santos wrote: > It seems that newer Linux versions don't have the > file /proc/sys/fs/epoll/max_user_instances, but instead > /proc/sys/fs/epoll/max_user

Re: TableInputFormat vs. a map of table regions (data locality)

2010-11-17 Thread Alex Baranau
What are the benefits you are looking for with the first option? With TableInputFormat it'll start as many map tasks as you have regions and data processing will benefit from data locality. From javadoc ( http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/mapreduce/package-summary.htm
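A rough illustration of the one-map-task-per-region idea mentioned in this reply, written in plain Python rather than against the HBase API. The region list, the host names, and the `Split` and `splits_for_table` names are all invented for this sketch. It only shows how each region's row range and serving host could become one input split that a scheduler might place near the data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Split:
    """One unit of map work: a row range plus a preferred host (hypothetical type)."""
    start_row: bytes  # inclusive; b"" stands for the start of the table
    end_row: bytes    # exclusive; b"" stands for the end of the table
    location: str     # host assumed to be serving the region


def splits_for_table(regions: List[Tuple[bytes, bytes, str]]) -> List[Split]:
    """Turn (start_key, end_key, host) region entries into one split per region."""
    # Sort by start key so the splits cover the table in row order.
    ordered = sorted(regions, key=lambda region: region[0])
    return [Split(start, end, host) for start, end, host in ordered]


# Hypothetical three-region table; keys and hosts are made up for the example.
example_regions = [
    (b"m", b"t", "rs2.example.com"),
    (b"", b"m", "rs1.example.com"),
    (b"t", b"", "rs3.example.com"),
]
example_splits = splits_for_table(example_regions)
```

Each split carries the host of its region, which is the piece of information a hand-rolled per-region scan would have to supply itself to get the same locality.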

Re: Restoring table from HFiles

2010-11-17 Thread Ted Yu
No. See https://issues.apache.org/jira/browse/HBASE-1684 On Wed, Nov 17, 2010 at 8:25 PM, Hari Sreekumar wrote: > Hi, > >I just wanted to know if it is possible to copy an HBase table into > another HDFS by simply copying the directory from old HDFS to > local system and dumping it into it

TableInputFormat vs. a map of table regions (data locality)

2010-11-17 Thread Saptarshi Guha
Hello, I'm fairly new to HBase and would appreciate your comments. [1] One way to compute across an HBase dataset would be to run as many maps as regions and, for each map, run a scan across the region row limits (within the map method). This approach does not use TableInputFormat. In the reduce (if need

Restoring table from HFiles

2010-11-17 Thread Hari Sreekumar
Hi, I just wanted to know if it is possible to copy an HBase table into another HDFS by simply copying the directory from the old HDFS to the local system and dumping it into the new HDFS? thanks, hari

Re: Problem in Running HBase in Pseudo Distributed Mode

2010-11-17 Thread Hari Sreekumar
Have you tried using localhost everywhere instead of the server name (alaukik) too? Do you get the same exception in that case? By "no master log" you mean no exceptions, right? On Wed, Nov 17, 2010 at 11:06 PM, Alaukik Aggarwal < alaukik.aggar...@gmail.com> wrote: > I tried this. I waited for 5 min before execut

RE: Problem in Running HBase in Pseudo Distributed Mode

2010-11-17 Thread Buttler, David
Why was zookeeper killed? Did you ever look into the zookeeper data to see if hbase was able to store its information in there? Dave -----Original Message----- From: Alaukik Aggarwal [mailto:alaukik.aggar...@gmail.com] Sent: Wednesday, November 17, 2010 4:45 PM To: user@hbase.apache.org Subject:

Re: Problem in Running HBase in Pseudo Distributed Mode

2010-11-17 Thread Alaukik Aggarwal
I have turned the distributed flag to true, but the problem still persists ... Following are the recent logs of Zookeeper 2010-11-17 18:39:26,029 INFO org.apache.zookeeper.server.ZooKeeperServerMain: Starting server 2010-11-17 18:39:26,071 INFO org.apache.zookeeper.server.ZooKeeperServer: Server envir

Using Patched Version of Hadoop for HBase 0.90.0

2010-11-17 Thread Navraj S. Chohan
Hi All, I'm trying to upgrade from HBase 0.20.6 to HBase 0.90.0. I'm running into a problem with "Wrong FS". The patch I'm applying to hadoop, which used to work for my hbase-0.20.6, is not correctly showing up in the 0.90.0 install when running. The patch I apply to hadoop is: http://bazaar.launc

Re: Xceiver problem

2010-11-17 Thread Lucas Nazário dos Santos
It seems that newer Linux versions don't have the file /proc/sys/fs/epoll/max_user_instances, but instead /proc/sys/fs/epoll/max_user_watches. I'm not quite sure about what to do. Can I favor max_user_watches over max_user_instances? With what value? I also tried to play with the Xss argument and
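To see which of the two epoll limit files named in this message a given kernel actually exposes, something like the following Python sketch could be used. The `read_epoll_limits` helper and the `EPOLL_LIMIT_NAMES` constant are names made up here, and the sketch assumes each file holds a single integer. It only reports the current values and does not suggest what to set them to.

```python
from pathlib import Path
from typing import Dict, Union

# File names taken from the message above; which ones exist depends on the kernel.
EPOLL_LIMIT_NAMES = ("max_user_instances", "max_user_watches")


def read_epoll_limits(base_dir: Union[str, Path] = "/proc/sys/fs/epoll") -> Dict[str, int]:
    """Return the epoll limits found under base_dir, keyed by file name."""
    limits: Dict[str, int] = {}
    for name in EPOLL_LIMIT_NAMES:
        path = Path(base_dir) / name
        # Skip files this kernel does not provide instead of failing.
        if path.is_file():
            limits[name] = int(path.read_text().strip())
    return limits


if __name__ == "__main__":
    for name, value in read_epoll_limits().items():
        print(f"{name} = {value}")
```

On a machine without `/proc/sys/fs/epoll` the function returns an empty dictionary, so the script prints nothing.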

RE: Problem in Running HBase in Pseudo Distributed Mode

2010-11-17 Thread Buttler, David
I believe that you should have turned the distributed flag to true. Also, you should check your zookeeper to see if the master had successfully registered there. Dave -----Original Message----- From: Alaukik Aggarwal [mailto:alaukik.aggar...@gmail.com] Sent: Wednesday, November 17, 2010 1:34

Re: Xceiver problem

2010-11-17 Thread Lars George
That is what I was also thinking about, thanks for jumping in Todd. I was simply not sure if that is just on .27 or all after that one and the defaults have never been increased. On Wed, Nov 17, 2010 at 8:24 PM, Todd Lipcon wrote: > On that new of a kernel you'll also need to increase your epoll

Re: Problem in Running HBase in Pseudo Distributed Mode

2010-11-17 Thread Alaukik Aggarwal
Please... any suggestions? On Wed, Nov 17, 2010 at 12:36 PM, Alaukik Aggarwal wrote: > I tried this. I waited for 5 min before executing list command on shell. > > Also, there was no log in HMaster logs when this exception (ERROR: > org.apache.hadoop.hbase.MasterNotRunningException: null) was t

Re: Xceiver problem

2010-11-17 Thread Todd Lipcon
On that new of a kernel you'll also need to increase your epoll limit. Some tips about that here: http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/ Thanks -Todd On Wed, Nov 17, 2010 at 9:10 AM, Lars George wrote: > Are you running on EC2? Couldn't you simp

Re: Problem in Running HBase in Pseudo Distributed Mode

2010-11-17 Thread Alaukik Aggarwal
I tried this. I waited for 5 min before executing the list command in the shell. Also, there was no log in the HMaster logs when this exception (ERROR: org.apache.hadoop.hbase.MasterNotRunningException: null) was thrown while accessing through the HBase shell. On Wed, Nov 17, 2010 at 3:12 AM, Hari Sreekumar wr

Re: Correlating traffic with regions

2010-11-17 Thread Jean-Daniel Cryans
AFAIK most monitoring systems don't like dynamically-named metrics, for example in ganglia you would end up with an ever growing number of metrics for req/regions (one for each region that the region server ever had). At the very least it should be included in the region server report so that the m

Re: Xceiver problem

2010-11-17 Thread Lars George
Are you running on EC2? Couldn't you simply up the heap size for the java processes? I do not think there is a hard and fast rule for how many xcievers you need; trial and error is common. Or if you have enough heap, simply set it high, like 4096, and that usually works fine. It all depends on ho

Re: Xceiver problem

2010-11-17 Thread Lucas Nazário dos Santos
I'm using Linux, the Amazon beta version that they recently released. I'm not very familiar with Linux, so I think the kernel version is 2.6.34.7-56.40.amzn1.x86_64. Hadoop version is 0.20.2 and HBase version is 0.20.6. Hadoop and HBase have 2 GB each and they are not swapping. Besides all other q

Re: Correlating traffic with regions

2010-11-17 Thread Lars George
JD, Should we create a metric for it so that it dynamically counts usage per region? That can then be exposed via Ganglia context or JMX. Just wondering. Lars On Wed, Nov 17, 2010 at 5:04 PM, Vaibhav Puranik wrote: > hi, > > Thanks for the suggestions JD & Michael. > The region servers serv

Re: Xceiver problem

2010-11-17 Thread Lars George
Hi Lucas, What OS are you on? What kernel version? What is your Hadoop and HBase version? How much heap do you assign to each Java process? Lars On Wed, Nov 17, 2010 at 3:05 PM, Lucas Nazário dos Santos wrote: > Hi, > > This problem is widely known, but I'm not able to come up with a decent > so

Re: Correlating traffic with regions

2010-11-17 Thread Vaibhav Puranik
hi, Thanks for the suggestions JD & Michael. The region servers serving ROOT & META regions are fine. I will try analysing tcpdump output. Regards, Vaibhav GumGum On Tue, Nov 16, 2010 at 7:15 AM, Michael Segel wrote: > > Beyond this... which region is serving your ROOT and meta data? > > Tha

Xceiver problem

2010-11-17 Thread Lucas Nazário dos Santos
Hi, This problem is widely known, but I'm not able to come up with a decent solution for it. I'm scanning 1.000.000+ rows from one table in order to index their content. Each row has around 100 KB. The problem is that I keep getting the exception: Exception in thread "org.apache.hadoop.dfs.datano

Re: Bulk Load Sample Code

2010-11-17 Thread Alex Baranau
I believe "by chance" here = *1* reduce job. Alex Baranau Sematext :: http://sematext.com/ On Mon, Nov 15, 2010 at 7:09 PM, Todd Lipcon wrote: > On Mon, Nov 15, 2010 at 12:24 AM, Shuja Rehman >wrote: > > > If HRegionPartitioner works correctly then what is the use of > > configureIncremen

Re: Problem in Running HBase in Pseudo Distributed Mode

2010-11-17 Thread Hari Sreekumar
It seems to be all right. How long do you wait after starting HBase before trying to list? Wait 5 mins after starting the daemons before listing. Did you try list after checking the master log (the one you have uploaded here)? You could also try using localhost instead of the hostname in the conf