Re: Help with Map/Reduce program

2009-06-10 Thread llpind
Sorry, I forgot to mention: the overflow then spills into new row keys per 10,000 column entries (or some other split number). llpind wrote: > > > What is the plan for releasing 0.20? This particular issue is really > important to us. > > Stack, I also have another question: The problem we
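The overflow scheme described above can be sketched as follows; the `_<bucket>` suffix format, class name, and the 10,000-entry split size are illustrative assumptions, not HBase API:

```java
// Sketch of spilling a wide logical row into multiple physical row keys,
// one new key per 10,000 column entries (suffix scheme is an assumption
// for illustration only).
public class RowKeyBuckets {
    static final int SPLIT = 10_000;

    // Physical row key for the n-th column entry (0-based) of a logical row.
    static String physicalKey(String logicalKey, long entryIndex) {
        return logicalKey + "_" + (entryIndex / SPLIT);
    }

    public static void main(String[] args) {
        System.out.println(physicalKey("user42", 9_999));   // user42_0
        System.out.println(physicalKey("user42", 10_000));  // user42_1
    }
}
```

A scan over `user42_0`, `user42_1`, … then reassembles the logical row without any single row holding millions of columns.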

Re: Help with Map/Reduce program

2009-06-10 Thread llpind
What is the plan for releasing 0.20? This particular issue is really important to us. Stack, I also have another question: the problem we are trying to solve doesn't really need the extra layer present in the HBase (BigTable) structure (RowResult holds a row key and a HashMap of column name to value).

Re: PerformanceEvaluation test

2009-06-10 Thread llpind
cambridgemike wrote: > > > - tried moving hbase-0.19.2.jar to the hadoop/lib folder of all the slave > machines. > > Hmm, that's weird. Moving the hbase jars solved my issue. Go to the JobTracker UI, look at which machine is throwing the exception, and make sure you have the hbase jars in

Re: fetch data in 1-n relationship

2009-06-10 Thread Jean-Daniel Cryans
Monty, two things you can do: 1. Serialize course data into the courses family in the student table. You duplicate data, but disk is cheap, so that's OK now. 2. If all you need is to first show the course id and the course title (or description), you can just put that as the value in the courses f
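The denormalization in option 2 can be modeled in plain Java (this is a sketch of the schema idea, not HBase client code; all names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Model of storing the course title directly as the value in the student's
// "courses" family, so one read of the student row is enough to render the
// course list -- at the cost of duplicating the title per enrolled student.
public class DenormalizedStudent {
    // studentRow -> ("courses:<courseId>" -> courseTitle)
    static Map<String, Map<String, String>> students = new HashMap<>();

    static void enroll(String studentId, String courseId, String courseTitle) {
        students.computeIfAbsent(studentId, s -> new HashMap<>())
                .put("courses:" + courseId, courseTitle); // duplicated data
    }

    public static void main(String[] args) {
        enroll("s1", "c101", "Intro to HBase");
        // One lookup, no second read against the Course table:
        System.out.println(students.get("s1").get("courses:c101"));
    }
}
```

Full course details (syllabus, etc.) would still live in the Course table and only be fetched when a single course is opened.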

Re: Help with Map/Reduce program

2009-06-10 Thread stack
On Wed, Jun 10, 2009 at 4:52 PM, llpind wrote: > > Thanks. I think the problem is I have potentially millions of columns, > > where a given RowResult can hold millions of column-to-value mappings. That's why > Map/Reduce is having problems as well (Java heap exception). I've upped > mapred.child.jav

Re: PerformanceEvaluation test

2009-06-10 Thread cambridgemike
Hi, I'm having almost the exact same problem (this rowcounter jar is one I compiled myself) ./bin/hadoop jar rowcounter.jar org.myorg.RowCounter /user/myUser/output/ TABLE_NAME 09/06/10 19:44:19 INFO mapred.TableInputFormatBase: split: 0->domain:,18226263 09/06/10 19:44:19 INFO mapred.TableInput

Re: Help with Map/Reduce program

2009-06-10 Thread llpind
Thanks. I think the problem is I have potentially millions of columns, where a given RowResult can hold millions of column-to-value mappings. That's why Map/Reduce is having problems as well (Java heap exception). I've upped mapred.child.java.opts, but the problem persists. Ryan Rawson wrote: > > Hey,

Re: Help with Map/Reduce program

2009-06-10 Thread Ryan Rawson
Hey, A scanner's lease expires in 60 seconds. I'm not sure what version you are using, but try: table.setScannerCaching(1); This way you won't retrieve 60 rows that each take 1-2 seconds to process. This is the new default value in 0.20, but I don't know if it ended up in 0.19.x anywhere. On
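A quick back-of-envelope check of why a large scanner cache trips the 60-second lease; the 1–2 s/row figure is from the thread, the rest is arithmetic (class and method names are illustrative):

```java
// Why caching many slow rows per next() call expires the scanner lease:
// the client does not contact the regionserver again until the whole
// cached batch is processed, and the lease clock keeps ticking.
public class LeaseCheck {
    static boolean leaseExpires(int caching, double secondsPerRow, int leaseSeconds) {
        return caching * secondsPerRow > leaseSeconds;
    }

    public static void main(String[] args) {
        System.out.println(leaseExpires(60, 1.5, 60)); // true  -> UnknownScannerException
        System.out.println(leaseExpires(1, 1.5, 60));  // false -> each next() renews the lease
    }
}
```

With `setScannerCaching(1)` each row fetch is its own round trip, so the lease is renewed as long as a single row is processed within 60 seconds.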

Re: HBase Failing on Large Loads

2009-06-10 Thread Ryan Rawson
Hey, Looks like you have some HDFS issues. Things I did to make myself stable: - run HDFS with -Xmx2000m - run HDFS with a 2047 xciever limit (goes into hdfs-site.xml or hadoop-site.xml) - ulimit -n 32k - also important With this I find that HDFS is very stable; I've imported hundreds of gigs. Y
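A sketch of the xceiver setting Ryan describes, as it would appear in a 0.19-era single config file (the value is from the thread; note the property name really is spelled "xcievers" in Hadoop):

```xml
<!-- hadoop-site.xml: raise the per-datanode transfer-thread limit -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2047</value>
</property>
```

The heap would typically be raised via `HADOOP_HEAPSIZE` in hadoop-env.sh, and the file-descriptor limit via `ulimit -n` (or limits.conf) on each node; exact placement depends on your deployment.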

Re: HBase Failing on Large Loads

2009-06-10 Thread Bradford Stephens
Thanks so much for all the help, everyone... things are still broken, but maybe we're getting close. All the regionservers were dead by the time the job ended. I see quite a few error messages like this (I've put the entirety of the regionserver logs on pastebin): http://pastebin.com/m2e6f9283

Re: HBase Failing on Large Loads

2009-06-10 Thread Ryan Rawson
That is a client exception that is a sign of problems on the regionserver...is it still running? What do the logs look like? On Jun 10, 2009 2:51 PM, "Bradford Stephens" wrote: OK, I've tried all the optimizations you've suggested (still running with a M/R job). Still having problems like this:

Re: HBase Failing on Large Loads

2009-06-10 Thread Bradford Stephens
Also, there's a slight variation: "Trying to contact region server Some server for region joinedcontent" "Some server"? Interesting :) On Wed, Jun 10, 2009 at 2:50 PM, Bradford Stephens wrote: > OK, I've tried all the optimizations you've suggested (still running > with a M/R job). Still having p

Re: HBase Failing on Large Loads

2009-06-10 Thread Bradford Stephens
OK, I've tried all the optimizations you've suggested (still running with a M/R job). Still having problems like this: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 192.168.18.15:60020 for region joinedcontent,242FEB3ED9BE0D8EF3856E9C4251464C,12446665943

Re: Help with Map/Reduce program

2009-06-10 Thread llpind
Okay, I think I got it figured out, although when scanning large row keys I do get the following exception: NativeException: java.lang.RuntimeException: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: -4424757523660246367 at org.apache.ha

Re: scanner on a given column: whole table scan or just the rows that have values

2009-06-10 Thread Billy Pearson
You might look into the API for these packages: org.apache.hadoop.hbase.regionserver.tableindexed and org.apache.hadoop.hbase.client.tableindexed (http://hadoop.apache.org/hbase/docs/r0.19.3/api/index.html). I don't know anything about them and have never used them, but I think they allow an index on columns. Billy "Navee

Re: Help with Map/Reduce program

2009-06-10 Thread Billy Pearson
Yes, that's what scanners are good for: they will return all the column:label combos for a row. What do the MR job stats say for rows processed by the maps and reduces? Billy Pearson "llpind" wrote in message news:23967196.p...@talk.nabble.com... also, I think what we want is a way to

Re: for one specific row: are the values of all columns of one family stored in one physical/grid node?

2009-06-10 Thread Billy Pearson
All the columns for any row key will be stored on one server, hosted by one region; regions are split by row key, not by columns. So all the columns for rowx will be in only one region on one server. A table is made up of regions, 1 to start with; as more rows are added, the regions split by row, eac
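The lookup Billy describes can be sketched with a sorted map: regions cover sorted, non-overlapping row-key ranges, so a row lands in "the region with the largest start key less than or equal to the row key". The start keys and region names below are made up for illustration:

```java
import java.util.TreeMap;

// Model of row-key -> region assignment: one region per sorted key range,
// so every column of a given row lives in exactly one region.
public class RegionLookup {
    static final TreeMap<String, String> regions = new TreeMap<>();
    static {
        regions.put("", "region-1");   // first region starts at the empty key
        regions.put("m", "region-2");
        regions.put("t", "region-3");
    }

    static String regionFor(String rowKey) {
        // Largest start key <= rowKey identifies the hosting region.
        return regions.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        System.out.println(regionFor("apple")); // region-1
        System.out.println(regionFor("rowx"));  // region-2
    }
}
```

A split replaces one range with two adjacent ranges; the mapping from any row key to a single region is preserved.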

Re: Help with Map/Reduce program

2009-06-10 Thread llpind
Also, I think what we want is a way to wildcard everything after colFam1: (e.g. colFam1:*). Is there a way to do this in HBase? This is assuming we don't know the column names; we want them all. llpind wrote: > > Thanks. > > Yea, I've got that colFam for sure in the HBase table: > > {NAME =
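The effect the poster is after (and which, in the 0.19-era API, asking a scanner for the bare family "colFam1:" should give) can be modeled as a simple prefix match over column names. This is a sketch of the selection logic, not HBase client code:

```java
import java.util.Map;
import java.util.TreeMap;

// Model of the "colFam1:*" wildcard: select every column whose qualified
// name starts with the family prefix, without knowing the names up front.
public class FamilyWildcard {
    static Map<String, String> columnsInFamily(Map<String, String> row, String family) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (e.getKey().startsWith(family + ":")) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = new TreeMap<>();
        row.put("colFam1:a", "1");
        row.put("colFam1:b", "2");
        row.put("colFam2:c", "3");
        System.out.println(columnsInFamily(row, "colFam1").size()); // 2
    }
}
```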

Re: Help with Map/Reduce program

2009-06-10 Thread stack
rowcounter counts rows only; it does not produce any output. St.Ack On Wed, Jun 10, 2009 at 10:03 AM, llpind wrote: > > Thanks. > > Yea, I've got that colFam for sure in the HBase table. > > I've been trying to play with rowcounter, and not having much luck either. > > I run the command: > hadoo

Re: Help with Map/Reduce program

2009-06-10 Thread llpind
Thanks. Yea I've got that colFam for sure in the HBase table. I've been trying to play with rowcounter, and not having much luck either. I run the command: hadoop19/bin/hadoop org.apache.hadoop.hbase.mapred.Driver rowcounter /home/hadoop/dev/rowcounter7 tableA colFam1: The map/reduce finishe

Re: for one specific row: are the values of all columns of one family stored in one physical/grid node?

2009-06-10 Thread Ric Wang
Billy, By saying "columns for key1 will not be on all the nodes but just one node in the cluster", you really mean "columns of the SAME family for key1...", right? Please correct me if I am wrong, but I think for the row key "key1", the data value of "familyA:labelX" and that of "familyB:labelY"

Re: scanner on a given column: whole table scan or just the rows that have values

2009-06-10 Thread Naveen Koorakula
That's correct - if you meant "it will have to scan EACH row in that column family with at least one non-empty cell". From http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture: "Each column family in a region is managed by an *HStore*. Each HStore may have one or more *MapFiles* (a Hadoop HDFS fi

fetch data in 1-n relationship

2009-06-10 Thread monty123
Hi all, I have defined two tables having a one-to-many relationship: Student: student id; student data (name, address, ...); courses (use course ids as column qualifiers here). Course: course id; course data (name, syllabus, ...). My problem is: using a Java client/program, how do I fetch cour

Re: HBase Failing on Large Loads

2009-06-10 Thread stack
On Tue, Jun 9, 2009 at 11:51 AM, Bradford Stephens < bradfordsteph...@gmail.com> wrote: > I sort of need the reduce since I'm combining primary keys from a CSV > file. Although I guess I could just use the combiner class... hrm. > > How do I decrease the batch size? Below is from hbase-default.

Re: HBase/Hadoop production use?

2009-06-10 Thread stack
You have seen this list: http://wiki.apache.org/hadoop/Hbase/PoweredBy ? These are the folks who volunteered to share the fact that they are using HBase in production. St.Ack 2009/6/9 Jürgen Kaatz > Hi, > > can anybody tell me if anyone uses HBase/Hadoop in a production environment? > Any hints wou