Re: incremental loads into hadoop

2011-10-01 Thread Bejoy KS
Sam, try looking into Flume if you need to load incremental data into HDFS. If the source data is present on some JDBC-compliant databases then you can use Sqoop to get the data directly into HDFS or Hive incrementally. For big data aggregation and analytics, Hadoop is definitely a good choice

Re: Hadoop : Linux-Window interface

2011-10-05 Thread Bejoy KS
Hi Aditya Definitely you can do it. As a very basic solution you can FTP the contents to the LFS (local/Linux file system) and then do a copyFromLocal into HDFS. Create a Hive table with appropriate regex support and load the data in. Hive has classes that effectively support parsing and loading
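
A minimal Java sketch of that copyFromLocal step, assuming the file has already been FTPed onto the Linux file system; the paths and class name below are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToHdfs {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();                  // picks up fs.default.name from core-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path localFile = new Path("/data/incoming/report.log");    // hypothetical LFS path
        Path hdfsDir = new Path("/user/aditya/raw/");               // hypothetical HDFS target dir
        fs.copyFromLocalFile(false, true, localFile, hdfsDir);      // delSrc=false, overwrite=true
        fs.close();
      }
    }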

Re: FW: Error running org.apache.hadoop.examples.DBCountPageView

2011-10-07 Thread Bejoy KS
Hi Clovis From the exception, it is clearly due to a type mismatch in the key/value flow between mapper, combiner and reducer. The reducer/combiner is expecting a key from the mapper of type Text, but instead it is receiving a key of type LongWritable. I didn't get a chance to debug the
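
One common fix for this class of error (not necessarily what applies to this particular job) is to declare the map output types explicitly in the driver whenever they differ from the final output types; a fragment with placeholder names:

    // inside the driver, old mapred API; MyDriver and the chosen types are placeholders
    JobConf conf = new JobConf(MyDriver.class);
    conf.setMapOutputKeyClass(Text.class);          // what the mapper actually emits
    conf.setMapOutputValueClass(IntWritable.class);
    conf.setOutputKeyClass(Text.class);             // what the reducer finally writes
    conf.setOutputValueClass(IntWritable.class);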

Re: execute hadoop job from remote web application

2011-10-18 Thread Bejoy KS
Oleg If you are looking at how to submit your jobs using JobClient then the below sample can give you a start. //get the configuration parameters and assigns a job name JobConf conf = new JobConf(getConf(), MyClass.class); conf.setJobName("SMS Reports"); //setting ke
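
To flesh that out, a self-contained sketch of a JobClient submission on the old mapred API; IdentityMapper/IdentityReducer stand in for real classes and the input/output paths come from the command line:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class ReportDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ReportDriver.class);
        conf.setJobName("SMS Reports");
        conf.setOutputKeyClass(LongWritable.class);   // key/value types the job writes out
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(IdentityMapper.class);    // replace with your real mapper
        conf.setReducerClass(IdentityReducer.class);  // replace with your real reducer
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);                       // submits to the JobTracker and waits
      }
    }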

Re: execute hadoop job from remote web application

2011-10-18 Thread Bejoy KS
s all classes from my_hadoop_job.jar? > 2) Do I need to submit a my_hadoop_job.jar too? If yes what is the way to > do > it? > > Thanks In Advance > Oleg. > > On Tue, Oct 18, 2011 at 2:11 PM, Uma Maheswara Rao G 72686 < > mahesw...@huawei.com> wrote: > > > - Origi

Re: running sqoop on hadoop cluster

2011-10-21 Thread Bejoy KS
Hi Firantika HDFS is the underlying file system; its metadata is stored in the NameNode and the actual data blocks are in the DataNodes. You can have a NameNode and a DataNode running on the same physical machine, in which case the metadata and some data blocks would be on the same physical machine.

Re: Input split for a streaming job!

2011-11-11 Thread Bejoy KS
Hi Raj Is your streaming job using WholeFileInputFormat or some custom input format that reads files as a whole? If that is the case then this is the expected behavior. Also, you mentioned you changed dfs.block.size to 32 MB. AFAIK this value would be applicable only for new files

Re: Performance test practices for hadoop jobs - capturing metrics

2011-11-15 Thread Bejoy Ks
Including the hadoop common-user group in the loop as well. On Tue, Nov 15, 2011 at 1:01 PM, Bejoy Ks wrote: > Hi Experts > > I'm currently working on incorporating a performance test plan > for a series of hadoop jobs. My entire application consists of map reduce, >

Re: Regarding loading a big XML file to HDFS

2011-11-21 Thread Bejoy Ks
Hi All I'm sharing my understanding here. Please correct me if I'm wrong (Uma and Michael). The explanation by Michael is the common working of map reduce programs, I believe. Just take the case of a common text file of size 96 MB: if my HDFS block size is 64 MB then this file

Re: Issue with DistributedCache

2011-11-24 Thread Bejoy Ks
Hi Denis Unfortunately the mailing list strips off attachments, so it'd be great if you could paste the source in some location and share the URL. If the source is small enough then please include it in the mail body. For a quick comparison, try comparing your code with t

Re: MapReduce Examples

2011-11-24 Thread Bejoy Ks
Hi Hans-Peter I can help you out with a different example than the conventional word count, though not a really sophisticated or real-time one. This is just a cooked-up scenario for demonstrating the use of the Distributed Cache. http://kickstarthadoop.blogspot.com/2011/05/hadoop-for-dependent-data

Re: Issue with DistributedCache

2011-11-24 Thread Bejoy Ks
Hi Denis I tried your code without distributed cache locally and it worked fine for me. Please find it at http://pastebin.com/ki175YUx I echo Mike's words on submitting map reduce jobs remotely. The remote machine can be your local PC or any utility server as Mike specified. What you need

Re: Issue with DistributedCache

2011-11-24 Thread Bejoy Ks
t.n...@googlemail.com> wrote: > Hi, > > a typo? > import com.bejoy.sampels.worcount.WordCountDriver; > = wor_d_count ? > > - alex > > On Thu, Nov 24, 2011 at 3:45 PM, Bejoy Ks wrote: > > > Hi Denis > > I tried your code with out distributed cache locally

Re: Multiple reducers

2011-11-29 Thread Bejoy Ks
Hi Hoot You can specify the number of reducers explicitly using -D mapred.reduce.tasks=n. hadoop jar wordcount.jar com.wc.WordCount -D mapred.reduce.tasks=n /input /output Currently your word count is triggering just 1 reducer because the default value of mapred.reduce.tasks would be set as 1 in
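
Note that the -D generic option is only honoured when the driver goes through ToolRunner/GenericOptionsParser; if it doesn't, the reducer count can also be fixed in the driver itself, roughly like this (driver fragment, class name made up):

    JobConf conf = new JobConf(WordCount.class);   // your driver class
    conf.setNumReduceTasks(4);                     // e.g. 4 reduce tasks instead of the default 1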

Re: Multiple Mappers for Multiple Tables

2011-12-05 Thread Bejoy Ks
Justin If I get your requirement right, you need to get in data from multiple RDBMS sources and do a join on the same, and maybe some more custom operations on top of this. For this you don't need to go in for writing custom mapreduce code unless it is really required. You can achieve t

Re: Multiple Mappers for Multiple Tables

2011-12-05 Thread Bejoy Ks
h use cases. If you have Pig competency you can also look into pig instead of hive. Hope it helps!... Regards Bejoy.K.S On Tue, Dec 6, 2011 at 1:36 AM, Bejoy Ks wrote: > Justin > If I get your requirement right you need to get in data from > multiple rdbms sources and do a jo

Re: Pig Output

2011-12-05 Thread Bejoy Ks
Hi Aaron Instead of copyFromLocal use getmerge. It would do your job. The syntax for the CLI is hadoop fs -getmerge <srcDirInHdfs> /xyz.txt Hope it helps!... Regards Bejoy.K.S On Tue, Dec 6, 2011 at 1:57 AM, Aaron Griffith wrote: > Using PigStorage() my pig script output gets put into partial files on
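
The programmatic relative of getmerge is FileUtil.copyMerge; a small sketch (the paths are made up, and here the merged file is written back into HDFS rather than to the local file system):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MergePigOutput {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Concatenates every part file under the source dir into one destination file
        FileUtil.copyMerge(fs, new Path("/user/aaron/pigout"),        // hypothetical Pig output dir
                           fs, new Path("/user/aaron/merged/xyz.txt"),
                           false,                                     // keep the source directory
                           conf, null);
      }
    }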

Re: MAX_FETCH_RETRIES_PER_MAP (TaskTracker dying?)

2011-12-05 Thread Bejoy Ks
Hi Chris From the stack trace, it looks like a JVM corruption issue. It is a known issue and has been fixed in CDH3u2; I believe an upgrade would solve your issues. https://issues.apache.org/jira/browse/MAPREDUCE-3184 Then regarding your queries, I'd try to help you out a bit. In mapreduce

Re: Running a job continuously

2011-12-05 Thread Bejoy Ks
Burak If you have a continuous inflow of data, you can choose Flume to aggregate the files into larger sequence files or so if they are small, and when you have a substantial chunk of data (equal to the HDFS block size) you can push that data on to HDFS. Based on your SLAs you need to schedule you

Re: Where do i see Sysout statements after building example ?

2011-12-13 Thread Bejoy Ks
Adding on to Harsh's response. If your sysouts are in mapper or reducer classes, you can get the same from the web UI as well, http://<jobTrackerHost>:50030/jobtracker.jsp . You need to select your job and drill down to the individual task level. Regards Bejoy.K.S On Tue, Dec 13, 2011 at 10:30 PM, Harsh J wrote:

Re: DistributedCache in NewAPI on 0.20.X branch

2011-12-16 Thread Bejoy Ks
Hi Shi Try out the following, it could get things working. Use DistributedCache.getCacheFiles() instead of DistributedCache.getLocalCacheFiles(): public void setup(JobConf job) { DistributedCache.getCacheFiles(job) . . . } If that also doesn't seem to work and if you have

Re: DistributedCache in NewAPI on 0.20.X branch

2011-12-17 Thread Bejoy Ks
Hi Shi My bad, the syntax I posted last time was not the right one, sorry, it was from my handheld. @Override public void setup(Context context) { File file = new File("TestFile.txt"); . . . } I didn't get a chance to debug your code, but if you are looking for a working example
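
For completeness, a hedged sketch of both halves of that approach on the 0.20.x new API (the HDFS path is made up); the "#TestFile.txt" fragment plus createSymlink() is what lets the plain new File("TestFile.txt") lookup resolve in the task's working directory:

    // Driver side, before submitting the org.apache.hadoop.mapreduce.Job named "job"
    Configuration conf = job.getConfiguration();
    DistributedCache.createSymlink(conf);
    DistributedCache.addCacheFile(new java.net.URI("/user/shi/cache/TestFile.txt#TestFile.txt"), conf);

    // Task side, inside setup(Context context)
    java.io.File cached = new java.io.File("TestFile.txt");   // resolved via the symlink created above
    java.io.BufferedReader reader = new java.io.BufferedReader(new java.io.FileReader(cached));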

Re: How to create Output files of about fixed size

2011-12-21 Thread Bejoy Ks
Hi JJ If you use the default TextInputFormat, it won't do the job as it would generate at least one split for each file. So in your case there would be a minimum of 78 splits as there are that many input files, hence 78 mappers and the same 78 output files. You need to use CombineFileInputFormat

Re: has bzip2 compression been deprecated?

2012-01-09 Thread Bejoy Ks
Hi Tony Adding on to Harsh's comments. If you want the generated sequence files to be utilized by a Hive table, define your Hive table as CREATE EXTERNAL TABLE tableName(col1 INT, col2 STRING) ... ... STORED AS SEQUENCEFILE; Regards Bejoy.K.S On Mon, Jan 9, 2012 at 10:32 PM, alo.alt

Re: increase number of map tasks

2012-01-09 Thread Bejoy Ks
Hi Satish What is your value for mapred.max.split.size? Try setting these values as well mapred.min.split.size=0 (it is the default value) mapred.max.split.size=40 Try executing your job once you apply these changes on top of others you did. Regards Bejoy.K.S On Mon, Jan 9, 2012 at 10:16 P

Re: has bzip2 compression been deprecated?

2012-01-09 Thread Bejoy Ks
e > latter, how would I read a sequence file into a Hive table? > > Thanks, > > Tony > > > > > -Original Message- > From: Bejoy Ks [mailto:bejoy.had...@gmail.com] > Sent: 09 January 2012 17:33 > To: common-user@hadoop.apache.org > Subject: Re: has bzi

Re: has bzip2 compression been deprecated?

2012-01-10 Thread Bejoy Ks
Hi Tony Please find responses inline So, to summarise: when I CREATE EXTERNAL TABLE in Hive, the STORED AS, ROW FORMAT and other parameters you mention are telling Hive what to expect when it reads the data I want to analyse, despite not checking the data to see if it meets these criteria?

Re: how to specify class name to run in mapreduce job

2012-01-10 Thread Bejoy Ks
Hi Vinod You can use the format hadoop jar <jar path> <fully qualified main class> <arguments>. Like -> hadoop jar /home/user/sample.jar com.sample.apps.MainClass .. Don't specify the main class while packing your jar. This would help you incorporate multiple entry points in the same jar for different functionality. Hope it helps!..

Re: Processing compressed files in Hadoop

2012-02-08 Thread Bejoy Ks
Hi Leo You can index the LZO files as // Run the LZO indexer on files in hdfs LzoIndexer indexer = new LzoIndexer(fs.getConf()); indexer.index(filePath); Regards Bejoy.K.S On Wed, Feb 8, 2012 at 11:26 PM, Tim Broberg wrote: > Leo, splittable bzip is available > ...in versions > 0.21 - ht
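
A fuller sketch of that, assuming the hadoop-lzo library (com.hadoop.compression.lzo) is on the classpath; the path argument is whichever .lzo file or directory needs indexing:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import com.hadoop.compression.lzo.LzoIndexer;   // from the hadoop-lzo project

    public class IndexLzoFiles {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        LzoIndexer indexer = new LzoIndexer(conf);
        // Writes a .index file alongside each .lzo file so MapReduce can split it
        indexer.index(new Path(args[0]));           // e.g. /user/leo/data/part-00000.lzo
      }
    }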

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Bejoy Ks
Mohit Rather than just appending the content into a normal text file or so, you can create a sequence file with the individual smaller file content as values. Regards Bejoy.K.S On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia wrote: > We have small xml files. Currently I am planning to app
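
A rough sketch of that packing step; the local input directory, HDFS output path and key/value choices are all made up (key = original file name, value = raw file content):

    import java.io.File;
    import java.io.FileInputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class PackSmallXmlFiles {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/user/mohit/packed/xml-pack-0001.seq");   // hypothetical HDFS path
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, out, Text.class, BytesWritable.class);
        for (File xml : new File("/data/xml-in").listFiles()) {        // hypothetical local dir
          byte[] content = new byte[(int) xml.length()];
          FileInputStream in = new FileInputStream(xml);
          IOUtils.readFully(in, content, 0, content.length);
          in.close();
          writer.append(new Text(xml.getName()), new BytesWritable(content));  // one small file per record
        }
        writer.close();
      }
    }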

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Bejoy Ks
PM, Mohit Anchlia wrote: > On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks wrote: > > > Mohit > > Rather than just appending the content into a normal text file or > > so, you can create a sequence file with the individual smaller file > content > > as values. &

Re: how to get rid of -libjars ?

2012-03-06 Thread Bejoy Ks
Hi Jane + Adding on to Joey's comments If you want to eliminate the process of distributing the dependent jars every time, then you need to manually pre-distribute these jars across the nodes and add them on to the classpath of all nodes. This approach may be chosen if you are periodically running

Re: does hadoop always respect setNumReduceTasks?

2012-03-09 Thread Bejoy Ks
k because i don't know if there is a gotcha like the combiner (where it may or may not run at all). Yes it is guaranteed. The combiner comes into play only if you specify one, else no combiner is used by default. Regards Bejoy KS On Fri, Mar 9, 2012 at 8:08 AM, Lance Norskog wrote: > Instead

Re: SequenceFile split question

2012-03-15 Thread Bejoy Ks
Hi Mohit If you are using a standalone client application to do this, there is definitely just one instance of it running and you'd be writing the sequence file to one HDFS block at a time. Once it reaches the HDFS block size the writing continues to the next block; in the meantime the fir

Re: Best practice to setup Sqoop,Pig and Hive for a hadoop cluster ?

2012-03-15 Thread Bejoy Ks
Hi Manu Please find my responses inline >I had read that we can install Pig, Hive & Sqoop on the client node, no need to install them in the cluster. What is the client node actually? Can I use my management-node as a client? On larger clusters we have a different node that is outside the hadoop cluster

Re: SequenceFile split question

2012-03-15 Thread Bejoy Ks
one of the options I should consider? > > On Thu, Mar 15, 2012 at 2:15 AM, Bejoy Ks wrote: > > > Hi Mohit > > If you are using a stand alone client application to do the same > > definitely there is just one instance of the same running and you'd be > > writ

Re: Hadoop 1.0.0 - stuck at map reduce

2012-03-16 Thread Bejoy Ks
Hey Bob Looks like you don't have the map and reduce slots configured on your system; it is pointing to zero. In your mapred-site.xml set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum. Also restart the task tracker daemon and try again. Regards Bejoy On F

Re: Not able to Make directory in HDFS

2012-03-16 Thread Bejoy Ks
Hi Sujit Strange. Is it throwing any errors? If so please attach the console log. Regards Bejoy.K.S On Fri, Mar 16, 2012 at 11:23 PM, Sujit Dhamale wrote: > Hi, > i am not able to create Directory in Hadoop file system > > while putting file only Directory is created but file is not copying

Re: _temporary doesn't exist

2012-03-16 Thread Bejoy Ks
Hi Vipul Is there any reason you are on the 0.17 version of hadoop? It is a pretty old version of hadoop (more than 2 years back) and tons of bug fixes and optimizations have gone into trunk since then. You should really upgrade to one of the 1.0.x releases. It would be hard for anyone on the list

Re: _temporary doesn't exist

2012-03-16 Thread Bejoy Ks
porary folder before the task is finished. > the clean up should happen at the task completion If I am not wrong. > Correct me If I am wrong. I am new to hadoop. > Thank you. > -Vipul > > On Fri, Mar 16, 2012 at 12:09 PM, Bejoy Ks wrote: > > > Hi Vipul > > Is the

Re: how to implements the 'diff' cmd in hadoop

2012-03-20 Thread Bejoy Ks
to determine the number of non-matching patterns and write those patterns to the output file. If the values match for a key, do nothing; there is no need to even write to the output dir. Regards Bejoy KS On Tue, Mar 20, 2012 at 2:01 PM, botma lin wrote: > Hi, all > > I'm newbie to hadoop

Re: how to implements the 'diff' cmd in hadoop

2012-03-20 Thread Bejoy Ks
which all files then the value from mapper could be prefixed with some value like file name. Regards Bejoy KS 2012/3/20 botma lin > Thanks Bejoy, that makes sense . > > If I want to know the different record's original file, I need to > put an extra file id into the mapp
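
One simplified way to realize such a diff (not necessarily the exact scheme discussed in this thread): the mapper emits each whole record as the key and a tag naming its source file as the value, and the reducer writes out only the records seen in a single file. A minimal sketch with invented class names and tags:

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Key = the whole record; values = tags ("A" or "B") the mapper attached to mark the source file
    public class DiffReducer extends Reducer<Text, Text, Text, NullWritable> {
      @Override
      protected void reduce(Text record, Iterable<Text> fileTags, Context context)
          throws IOException, InterruptedException {
        Set<String> sources = new HashSet<String>();
        for (Text tag : fileTags) {
          sources.add(tag.toString());
        }
        if (sources.size() == 1) {              // present in only one file -> part of the diff
          context.write(record, NullWritable.get());
        }
      }
    }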

Re: setNumTasks

2012-03-22 Thread Bejoy Ks
e Job based on the splits and Input Format. Regards Bejoy KS On Thu, Mar 22, 2012 at 8:28 PM, Mohit Anchlia wrote: > Sorry I meant *setNumMapTasks. *What is mapred.map.tasks for? It's > confusing as to what it's purpose is for? I tried setting it for my job > still I see mor

Re: tasktracker/jobtracker.. expectation..

2012-03-22 Thread Bejoy Ks
Hi Patai JobTracker automatically handles this situation by attempting the task on different nodes. Could you verify the number of attempts that these failed tasks made? Was that just one? If more, were all the task attempts triggered on the same node or not? Did all of them fail with th

Re: Number of retries

2012-03-22 Thread Bejoy KS
there a way to look at which tasks were retried? I am not sure what else might cause because when I look at the output data I don't see any duplicates in the file. Regards Bejoy KS Sent from handheld, please excuse typos.

Re: Number of retries

2012-03-22 Thread Bejoy KS
Hi Mohit To add on, duplicates won't be there if your output is written to an HDFS file, because once one attempt of a task is completed only that attempt's output file is copied to the final output destination and the files generated by other task attempts that are killed are just ignored. Regards Bej

Re: Zero Byte file in HDFS

2012-03-26 Thread Bejoy KS
Hi Abhishek I can propose a better solution: enable merge in Hive, so that the smaller files would be merged to at least the HDFS block size (your choice) and would benefit subsequent Hive jobs on the same data. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message

Re: Zero Byte file in HDFS

2012-03-26 Thread Bejoy KS
merges the smaller files into larger ones. The intermediate output files are not retained in hive and hence the final large enough files remain in hdfs. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Abhishek Pratap Singh Date: Mon, 26 Mar 2012 14

Re: Parts of a file as input

2012-03-26 Thread Bejoy KS
r use cases or the queries that are intended for the data set. If you are looking at sampling then you may need to incorporate Buckets as well. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Franc Carter Date: Tue, 27 Mar 2012 17:26:49 To: Reply-To: c

Re: 0 tasktrackers in jobtracker but all datanodes present

2012-04-02 Thread Bejoy Ks
small files, use HAR or Sequence File for grouping the same - Increase the NN heap Regards Bejoy KS On Mon, Apr 2, 2012 at 12:08 PM, madhu phatak wrote: > Hi, > 1. Stop the job tracker and task trackers. - bin/stop-mapred.sh > > 2. Disable namenode safemode - bin/hadoop dfsadm

Re: how to fine tuning my map reduce job that is generating a lot of intermediate key-value pairs (a lot of I/O operations)

2012-04-03 Thread Bejoy Ks
Jane, From my first look, properties that can help you could be - increase io.sort.factor to 100 - increase io.sort.mb to 512 MB - increase map task heap size to 2 GB. If the task still stalls, try providing less input for each mapper. Regards Bejoy KS On Tue, Apr 3, 2012 at 2:08 PM
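
As a rough driver fragment (Hadoop 1.x property names; the values are just the starting points suggested above and MyJob is a placeholder class):

    JobConf conf = new JobConf(MyJob.class);
    conf.setInt("io.sort.factor", 100);                 // streams merged at once while sorting map output
    conf.setInt("io.sort.mb", 512);                     // in-memory sort buffer (MB) for map output
    conf.set("mapred.child.java.opts", "-Xmx2048m");    // ~2 GB heap for each task JVM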

Re: reducers and data locality

2012-04-27 Thread Bejoy KS
Hi Mete A custom Partitioner class can control the flow of keys to the desired reducer. It gives you more control over which key goes to which reducer. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: mete Date: Fri, 27 Apr 2012 09:19:21 To: Reply-To
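
As an illustration only, a toy Partitioner on the new mapreduce API (the routing rule and class name are invented): keys starting with a-m land on reducer 0 and the rest are hashed over the remaining reducers.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
      @Override
      public int getPartition(Text key, IntWritable value, int numPartitions) {
        String k = key.toString();
        if (numPartitions == 1 || k.isEmpty()) {
          return 0;
        }
        char first = Character.toLowerCase(k.charAt(0));
        if (first >= 'a' && first <= 'm') {
          return 0;                                                 // a-m keys go to reducer 0
        }
        return 1 + (k.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1);  // spread the rest
      }
    }

It would be wired into the job with job.setPartitionerClass(AlphabetPartitioner.class).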

Re: updating datanode config files on namenode recovery

2012-05-01 Thread Bejoy KS
Hi Sumadhur, The easier approach is to make the hostname of the new NN the same as the old one; else you'll have to update the new hostname in the config files across the cluster. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: sumadhur Date: Tue, 1 May 2012

Re: set the mapred.map.tasks.speculative.execution=false, but it is not useful.

2012-06-12 Thread Bejoy KS
Hi If your intention is to control the number of attempts every task makes, then the property to be tweaked is mapred.map.max.attempts. The default value is 4; for no map task reattempts make it 1. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From
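
A hedged driver fragment combining the two settings, so that each map task gets exactly one attempt (MyJob is a placeholder class):

    JobConf conf = new JobConf(MyJob.class);
    conf.setMapSpeculativeExecution(false);   // mapred.map.tasks.speculative.execution = false
    conf.setMaxMapAttempts(1);                // mapred.map.max.attempts; the default is 4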

Re: [Newbie] How to make Multi Node Cluster from Single Node Cluster

2012-06-14 Thread Bejoy KS
You can follow the documents for 0.20.x . It is almost same for 1.0.x as well. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Alpha Bagus Sunggono Date: Thu, 14 Jun 2012 17:15:16 To: Reply-To: common-user@hadoop.apache.org Subject: Re: [Newbie

Re: Setting number of mappers according to number of TextInput lines

2012-06-16 Thread Bejoy KS
very small input size (kB), but processing to produce some output takes several minutes. Is there a way how to say, file has 100 lines, i need 10 mappers, where each mapper node has to process 10 lines of input file? Thanks for advice. Ondrej Klimpera Regards Bejoy KS Sent from handheld

Re: change hdfs block size for file existing on HDFS

2012-06-26 Thread Bejoy KS
Regards Bejoy KS Sent from handheld, please excuse typos.

Re: change hdfs block size for file existing on HDFS

2012-06-26 Thread Bejoy Ks
Hi Anurag, To add on, you can also change the replication of existing files with hadoop fs -setrep <replication factor> <path> http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html#setrep On Tue, Jun 26, 2012 at 7:42 PM, Bejoy KS wrote: > Hi Anurag, > > The easiest option would be, in your map reduce jo
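
Programmatically, the replication part looks roughly like this (the path is made up); note that, unlike replication, the block size of an already written file can only be changed by rewriting the file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ChangeReplication {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Equivalent of "hadoop fs -setrep 2 /user/anurag/data/big.log"
        fs.setReplication(new Path("/user/anurag/data/big.log"), (short) 2);
        fs.close();
      }
    }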

Re: Error using MultipleInputs

2012-07-05 Thread Bejoy Ks
Hi Sanchita Try your code after commenting out the following line of code: //conf.setInputFormat(TextInputFormat.class); AFAIK this explicitly sets the input format as TextInputFormat instead of MultipleInputs, and hence the error stating 'no input path specified'. Reg
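
With MultipleInputs (org.apache.hadoop.mapred.lib) each path carries its own input format and mapper, which is why the job-wide setInputFormat() call has to go; a hedged driver fragment where JoinDriver, OrderMapper, CustomerMapper and the paths all stand in for the user's own classes and data:

    JobConf conf = new JobConf(JoinDriver.class);                 // placeholder driver class
    MultipleInputs.addInputPath(conf, new Path("/data/orders"),
        TextInputFormat.class, OrderMapper.class);                // placeholder mapper #1
    MultipleInputs.addInputPath(conf, new Path("/data/customers"),
        TextInputFormat.class, CustomerMapper.class);             // placeholder mapper #2
    FileOutputFormat.setOutputPath(conf, new Path("/data/joined"));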

Re: Hive/Hdfs Connector

2012-07-05 Thread Bejoy KS
Regards Bejoy KS Sent from handheld, please excuse typos.

Re: fail and kill all tasks without killing job.

2012-07-20 Thread Bejoy KS
Hi Jay Did you try hadoop job -kill-task <task-attempt-id>? And is that not working as desired? Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: jay vyas Date: Fri, 20 Jul 2012 17:17:58 To: common-user@hadoop.apache.org Reply-To: common-user@hadoop.apache.org

Re: Unexpected end of input stream (GZ)

2012-07-24 Thread Bejoy Ks
Hi Oleg From the job tracker page, you can get to the failed tasks and see which file split was processed by that task. The split information is available under the status column for each task. The file split information is not available in the job history. Regards Bejoy KS On Tue, Jul

Re: Hadoop 1.0.3 start-daemon.sh doesn't start all the expected daemons

2012-07-27 Thread Bejoy Ks
Hi Dinesh Try using $HADOOP_HOME/bin/start-all.sh . It starts all the hadoop daemons including TT and DN. Regards Bejoy KS

Re: No Support for setting MapFileOutputFormat in newer API

2012-07-27 Thread Bejoy Ks
HI Abhinav MapFileOutputFormat is currently not available for the new mapreduce API in hadoop 1.x . However a jira is in place to accommodate it in the future releases. https://issues.apache.org/jira/browse/MAPREDUCE-4158 Regards Bejoy KS

Re: Retrying connect to server: localhost/127.0.0.1:9000.

2012-07-27 Thread Bejoy KS
Hi Keith Your NameNode is still not up. What do the NN logs say? Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: anil gupta Date: Fri, 27 Jul 2012 11:30:57 To: Reply-To: common-user@hadoop.apache.org Subject: Re: Retrying connect to server

Re: Merge Reducers Output

2012-07-30 Thread Bejoy KS
Hi Why not use 'hadoop fs -getmerge <outputFolderInHdfs> <destFileInLfs>' while copying files out of HDFS for the end users to consume? This will merge all the files in 'outputFolderInHdfs' into one file and put it in the LFS. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Messag

Re: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.LongWritable, recieved org.apache.hadoop.io.Text

2012-08-02 Thread Bejoy Ks
); job.setMapOutputValueClass(Text.class); //setting the final reduce output data type classes job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); Regards Bejoy KS

Re: Disable retries

2012-08-02 Thread Bejoy KS
two steps you can ensure that a task is attempted only once. These properties to be set in mapred-site.xml or at job level. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Marco Gallotta Date: Thu, 2 Aug 2012 16:52:00 To: Reply-To: common-user

Re: Problem with hadoop filesystem after restart cluster

2012-08-08 Thread Bejoy Ks
Hi Andy Is your hadoop.tmp.dir or dfs.name.dir configured to /tmp? If so this can happen, as the /tmp dir gets wiped out on OS restarts. Regards Bejoy KS

Re: help in distribution of a task with hadoop

2012-08-13 Thread Bejoy Ks
er in your jar 2) use -libjars / -files to distribute jars or files Regards Bejoy KS

Re: Using hadoop for analytics

2012-09-05 Thread Bejoy Ks
Hi Prashant Welcome to the Hadoop community. :) Hadoop is meant for processing large data volumes. That said, for your custom requirements you should write your own mapper and reducer containing your business logic for processing the input data. Also you can have a look at Hive and Pig, which ar

Re: Add file to distributed cache

2012-10-01 Thread Bejoy KS
to distributed cache Sent: Oct 2, 2012 05:44 Hi all How do you add a small file to distributed cache in MR program Regards Abhi Sent from my iPhone Regards Bejoy KS Sent from handheld, please excuse typos.

Re: Hadoop installation on mac

2012-10-16 Thread Bejoy KS
) Regards Bejoy KS -- View this message in context: http://hadoop-common.472056.n3.nabble.com/Hadoop-installation-on-mac-tp3999520p3999535.html Sent from the Users mailing list archive at Nabble.com.

Hive and Hbase not working with cloudera VM

2011-09-08 Thread Bejoy KS
Hi I was using the Cloudera training VM to test out my map reduce code, which was working really well. Now I have some requirements to run Hive, HBase and Sqoop as well on this VM for testing purposes. For Hive and HBase I'm able to log in to the CLI client, but none of the commands are get

Hadoop Streaming job Fails - Permission Denied error

2011-09-12 Thread Bejoy KS
Hi I wanted to try out hadoop streaming and got the sample python code for the mapper and reducer. I copied both into my LFS and tried running the streaming job as mentioned in the documentation. Here is the command I used to run the job: hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streami

Re: Out of heap space errors on TTs

2011-09-19 Thread Bejoy KS
John, Did you try out map join with Hive? It uses the Distributed Cache and hash maps to achieve the goal. set hive.auto.convert.join = true; I have tried the same over joins involving huge tables and a few smaller tables. My smaller tables were less than 25 MB (configuration tables) and it

Maintaining map reduce job logs - The best practices

2011-09-23 Thread Bejoy KS
Hi All I have a query here on maintaining Hadoop map-reduce logs. By default the logs appear on the respective task tracker nodes, which you can easily drill down to from the job tracker web UI in case of any failure (which is what I was doing till now). Now I need to get to the next level