Re: Job tracker history not visible

2012-09-17 Thread Mapred Learn
Found the issue. For people interested, the issue was: the /mapred/history/done directory was missing. Thanks, JJ Sent from my iPhone On Sep 16, 2012, at 11:03 AM, Mapred Learn wrote: > Hi, > When I go to job tracker web ui and click on job tracker history, I see no > history, >

Job tracker history not visible

2012-09-16 Thread Mapred Learn
Hi, When I go to the job tracker web UI and click on job tracker history, I see no history. Looks like some dir on HDFS where jobs' history goes, and the JT pulls it from, is missing. Does anyone know which dir it is and what permissions it needs? Thanks, JJ Sent from my iPhone

Re: how to set huge memory for reducer in streaming

2012-07-29 Thread Mapred Learn
mem > limit via mapred.child.ulimit (try setting it to 2x or 3x the heap > size, in KB, or higher). I think its the latter you're running out > with, since there's a subprocess involved. > > Let us know if that helps. > > On Sun, Jul 29, 2012 at 1:47 PM, Mapred Learn wrote: >

Re: how to set huge memory for reducer in streaming

2012-07-29 Thread Mapred Learn
+ CDH users Sent from my iPhone On Jul 29, 2012, at 1:17 AM, Mapred Learn wrote: > hi, > One of my programs create a huge python dictionary and reducers fails with > Memory Error everytime. > > Is there a way to specify reducer memory to be a bigger value for reducers to >

how to set huge memory for reducer in streaming

2012-07-29 Thread Mapred Learn
hi, One of my programs creates a huge python dictionary and the reducers fail with Memory Error every time. Is there a way to specify a bigger memory value for the reducers to succeed? I know we should not have this requirement in the first place and not create this kind of dictionary, but still

delete _logs dir ?

2012-06-22 Thread Mapred Learn
Hi, I see that every job has a _logs dir whose history dir takes 1 block. Is it safe to delete such _logs directories, as we have a lot of them? Thanks, JJ

Re: how to access a mapper counter in reducer

2011-12-06 Thread Mapred Learn
Hi Praveen, Could you share it here so that we can use it? Thanks, Sent from my iPhone On Dec 6, 2011, at 6:29 AM, Praveen Sripati wrote: > Robert, > > > I have made the above thing work. > > Any plans to make it into the Hadoop framework? There had been similar > queries about it in other forum

Re: how to access a mapper counter in reducer

2011-12-01 Thread Mapred Learn
Hi, I have a similar query. In fact, I sent it yesterday and am waiting for a response from anybody who might have done it. Thanks, Anurag Tangri 2011/11/30 rabbit_cheng > I have created a counter in mapper to count something, I wanna get the > counter's value in reducer phase, the code segment is as

How to use mapper counters in reducer

2011-11-30 Thread Mapred Learn
Hi, I am defining custom counters in the mapper that I want to access in the reducer, using the new API. Does anyone know how to do this? Thanks, JJ Sent from my iPhone
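For reference, in the Java new API a task updates a counter with context.getCounter(group, name).increment(n); from a streaming job (Python mappers come up elsewhere in this archive) the equivalent is writing a reporter line to stderr. A minimal sketch; the group and counter names here are made up:

```python
import sys

def counter_line(group, counter, amount=1):
    # The exact line format Hadoop Streaming parses from a task's stderr.
    return "reporter:counter:%s,%s,%d" % (group, counter, amount)

def map_line(line):
    """Toy mapper: count malformed records via a custom counter."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        sys.stderr.write(counter_line("MyJob", "BAD_RECORDS") + "\n")
        return None
    return fields[0], fields[1]
```

Note that counters are aggregated by the framework and surfaced at the job level; a reducer does not see live per-mapper counter values, which is why threads like this one usually end up reading counters from the job client or web UI after the job.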

Re: how to implement error thresholds in a map-reduce job ?

2011-11-16 Thread Mapred Learn
comparator tweaks). > > But, also good to fail if a single map task itself exceeds > 10. The above is > to ensure the global check, while doing this as well would ensure faster > failure depending on the situation. > > On 16-Nov-2011, at 1:16 AM, Mapred Learn wrote: >

Re: how to implement error thresholds in a map-reduce job ?

2011-11-15 Thread Mapred Learn
ve it fail the job faster. > > Is killing the job immediately a necessity? Why? > > I s'pose you could call kill from within the mapper, but I've never seen > that as necessary in any situation so far. Whats wrong with letting the job > auto-die as a result of a failing task?

Re: how to implement error thresholds in a map-reduce job ?

2011-11-15 Thread Mapred Learn
ries. > > In the mapper, you parse each line, and use the result to update the > counter. > > -Mingxi > > *From:* Mapred Learn [mailto:mapred.le...@gmail.com] > *Sent:* Monday, November 14, 2011 3:06 PM > *To:* mapreduce-user@h

Re: how to implement error thresholds in a map-reduce job ?

2011-11-15 Thread Mapred Learn
PM, Mapred Learn wrote: > >> Hi, >> >> I have a use case where I want to pass a threshold value to a map-reduce >> job. For eg: error records=10. >> >> I want map-reduce job to fail if total count of error_records in the job >> i.e. all mappers, is reached.

how to implement error thresholds in a map-reduce job ?

2011-11-14 Thread Mapred Learn
Hi, I have a use case where I want to pass a threshold value to a map-reduce job. For eg: error records=10. I want the map-reduce job to fail if the total count of error_records in the job, i.e. across all mappers, is reached. How can I implement this considering that each mapper would be processing some part
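The replies in this thread converge on a custom counter that every mapper increments, with the driver (or each task) checking the aggregated total. The aggregation check itself is trivial; a standalone sketch with the threshold of 10 from the example:

```python
def job_should_fail(per_mapper_error_counts, threshold=10):
    """Return True once the job-wide error count (summed across all
    mappers' counters) reaches the threshold."""
    return sum(per_mapper_error_counts) >= threshold
```

In a real job the driver would poll the aggregated counter between progress checks and kill the job when this trips; each mapper can additionally fail fast on its own local count, as suggested later in the thread.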

Re: Example of chain map red

2011-10-28 Thread Mapred Learn
> http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security/src/test/org/apache/hadoop/mapred/jobcontrol/TestJobControl.java > > On Fri, Oct 28, 2011 at 8:03 PM, Mapred Learn wrote: >> Hi, >> Could somebody point me to chained map- red example ? >> I

Example of chain map red

2011-10-28 Thread Mapred Learn
Hi, Could somebody point me to chained map- red example ? I m trying to run another map only job after a map red job. Thanks, JJ Sent from my iPhone

How to create Output files of about fixed size

2011-10-25 Thread Mapred Learn
Hi, I am trying to create output files of fixed size by using: -Dmapred.max.split.size=6442450812 (6 GB). But the problem is that the input data size and metadata vary, and I have to adjust the above value manually to achieve a fixed size. Is there a way I can programmatically determine split size
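A sketch of deriving the split size instead of hand-tuning it: given the total input size and a target output size per file, compute the max split size and the resulting file count. The output/input byte ratio is an assumed estimate (metadata and compression make it vary, which is exactly why the manual tuning is painful):

```python
def plan_splits(total_input_bytes, target_output_bytes, output_input_ratio=1.0):
    """Return (max_split_size, expected_number_of_output_files).

    output_input_ratio is an assumed estimate of output bytes produced
    per input byte consumed.
    """
    split_size = int(target_output_bytes / output_input_ratio)
    n_files = -(-total_input_bytes // split_size)  # ceiling division
    return split_size, n_files

# The first value is what you would pass as -Dmapred.max.split.size=<value>.
```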

Re: Streaming jar creates only 1 reducer

2011-10-22 Thread Mapred Learn
Hi Arun, Thanks! I was thinking the streaming jar would do that itself, but looks like not. Sent from my iPhone On Oct 21, 2011, at 11:46 PM, Arun C Murthy wrote: > You can also use -numReduceTasks <#reduces> option to streaming. > > On Oct 21, 2011, at 10:22 PM, Mapred Learn wrot

Re: Streaming jar creates only 1 reducer

2011-10-21 Thread Mapred Learn
> You need to pass -Dmapred.reduce.tasks=N along. Reducers are a per-job > configurable number, unlike mappers whose numbers can be determined based on > inputs. > > P.s. Please do not cross post questions to multiple lists. > > On 22-Oct-2011, at 4:05 AM, Mapred Learn wrote: >

Re: Streaming jar creates only 1 reducer

2011-10-21 Thread Mapred Learn
s on the submitting node. > > Nick Jones > > On Oct 21, 2011, at 5:00 PM, Mapred Learn wrote: > >> Hi, >> Does streaming jar create 1 reducer by default ? We have reduce tasks per >> task tracker configured to be more than 1 but my job has about 150 mappers >

Streaming jar creates only 1 reducer

2011-10-21 Thread Mapred Learn
Hi, Does the streaming jar create 1 reducer by default? We have reduce tasks per task tracker configured to be more than 1, but my job has about 150 mappers and only 1 reducer; reducer.py basically just reads each line and prints it. Why doesn't streaming.jar invoke multiple reducers in this case?
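As the replies note, the reducer count is a per-job setting, not derived from the input. A command-line configuration fragment (paths and the streaming jar location are placeholders):

```shell
hadoop jar hadoop-streaming.jar \
    -D mapred.reduce.tasks=4 \
    -input /user/jj/in -output /user/jj/out \
    -mapper mapper.py -reducer reducer.py \
    -file mapper.py -file reducer.py
```

Streaming also accepts -numReduceTasks <#reduces> as an equivalent, as mentioned in the reply above.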

Re: sample usage of custom counters with new map Reduce API

2011-07-29 Thread Mapred Learn
; throws IOException, InterruptedException { > context.getCounter(MyCounters.INPUT_UNIQUE_USERS).increment(1); > doReduce(); > } > > @Override > protected void cleanup(Context context) throws java.io.IOException, > InterruptedException { > doClean();

Re: Can you access Distributed cache in custom output format ?

2011-07-29 Thread Mapred Learn
used the -files option (I don't know if it will copy the > files to HDFS for you or you have to put them there first). > > My usage pattern of the DC is copying the files to HDFS, then use the DC > API to add those files to the jobconf. > > Alejandro > > > On Fri

Re: Can you access Distributed cache in custom output format ?

2011-07-29 Thread Mapred Learn
n are you adding the file/JAR to the DC? > How are you retrieving the file/JAR from your outputformat code? > > Thxs. > > Alejandro > > > On Fri, Jul 29, 2011 at 10:43 AM, Mapred Learn wrote: > >> I am trying to create a custom text outputformat where I want to

Can you access Distributed cache in custom output format ?

2011-07-29 Thread Mapred Learn
Hi, I am trying to access the distributed cache in my custom output format, but it does not work: the file open in the custom output format fails with "file does not exist" even though the file physically does exist. Looks like the distributed cache only works for mappers and reducers? Is there a way I can read Distributed

Loading seq file into hive

2011-06-27 Thread Mapred Learn
Hi, I have seq files with the key as line number and the value as Ctrl-B delimited text. A sample value is: 45454^B567^Brtrt^B-7.8 56577^B345^Bdrtd^B-0.9 when I create a table like: create table temp_seq (no. int, code string, rank string, amt string) row format delimited fields terminated by '\002' lines
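For the table-definition side, a HiveQL sketch; note that `no.` is not a legal column identifier, so the first column is renamed here (an assumption about the intended schema), and the table is declared as a SequenceFile so Hive reads only the value side of each record (the line-number keys are ignored):

```sql
CREATE TABLE temp_seq (rec_no INT, code STRING, rank STRING, amt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002'
STORED AS SEQUENCEFILE;
```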

Re: Resend -> how to load sequence file with decimal data

2011-06-27 Thread Mapred Learn
te command ? On Fri, Jun 24, 2011 at 5:12 PM, Steven Wong wrote: > Not sure if this is what you're asking for: Hive has a LOAD DATA command. > There is no decimal data type. > > *From:* Mapred Learn [mailto:mapred.le...@gmail.com] > *Sent:* Thur

Re: sample usage of custom counters with new map Reduce API

2011-06-24 Thread Mapred Learn
> context.getCounter(MyCounters.INPUT_UNIQUE_USERS).increment(1); > doReduce(); > } > > @Override > protected void cleanup(Context context) throws java.io.IOException, > InterruptedException { > doClean(); > } > } > > On Fri, Jun 24, 2011 at 2:

sample usage of custom counters with new map Reduce API

2011-06-24 Thread Mapred Learn
Hi, Could anyone point me to an example of custom counters with the new map reduce API? Thanks, -JJ

Re: how to identify inputsplit that a mapper is failing on ?

2011-06-24 Thread Mapred Learn
Hi Ian, I think the input split locations on the WebGUI are the block locations in the cluster where the input file was distributed while uploading to HDFS, not the physical split being processed by the mapper. On Fri, Jun 24, 2011 at 10:37 AM, Mapred Learn wrote: > Hi Ian, > I do not see the sp

Re: how to identify inputsplit that a mapper is failing on ?

2011-06-24 Thread Mapred Learn
Hi Ian, I do not see the split in the mapper GUI. I was avoiding logging from the mapper, but looks like that is the only option. Thanks for the response! Sent from my iPhone On Jun 24, 2011, at 10:28 AM, Ian Wrigley wrote: > Hi > > On Jun 24, 2011, at 10:19 AM, Mapred Learn wrote: > >>

how to identify inputsplit that a mapper is failing on ?

2011-06-24 Thread Mapred Learn
Hi, One of the mappers in my job is failing and I cannot find out the input split that it is failing on. How can I find this info ? Thanks, -JJ

Resend -> how to load sequence file with decimal data

2011-06-23 Thread Mapred Learn
> Hi, > I have a sequence file where the value is text with delimited data and some > fields are decimal fields. > For eg: decimal(16,6). Sample value: 123.456735. > How do I upload such a sequence file in hive and what should I give in the table > definition for decimal values as above? > > Thanks

can we split a big gzipped file on HDFS ?

2011-06-22 Thread Mapred Learn
Hi, If I have a big gzipped text file (~60 GB) in HDFS, can I split it into smaller chunks (~1 GB) so that I can run a map-red job on those files and finish faster than running the job on 1 big file? Thanks, -JJ

Fwd: How to load a sequence file with decimal data to hive ?

2011-06-22 Thread Mapred Learn
In case anybody has some inputs : Sent from my iPhone Begin forwarded message: > From: Mapred Learn > Date: June 22, 2011 6:21:03 PM PDT > To: "u...@hive.apache.org" > Subject: How to load a sequence file with decimal data to hive ? > > Hi, > I have a seque

Re: how to get output files of fixed size in map-reduce job output

2011-06-22 Thread Mapred Learn
B read can't be guaranteed local (theoretically speaking). > > On Thu, Jun 23, 2011 at 12:04 AM, Mapred Learn > wrote: > > Hi Harsh, > > Thanks ! > > i) I was currently doing it by extending CombineFileInputFormat and > > specifying -Dmapred.max.split.size but

Re: how to get output files of fixed size in map-reduce job output

2011-06-22 Thread Mapred Learn
parameter > until the results are satisfactory. > > Note: Tasks would no longer be perfectly data local since you're > requesting much > block size perhaps. > > On Wed, Jun 22, 2011 at 10:52 PM, Mapred Learn > wrote: > > I have a use case where I want to proce

how to get output files of fixed size in map-reduce job output

2011-06-22 Thread Mapred Learn
I have a use case where I want to process data and generate seq file output of fixed size, say 1 GB, i.e. each map-reduce job output should be 1 GB. Does anybody know of any -D option or any other way to achieve this? -Thanks JJ

Re: AW: How to split a big file in HDFS by size

2011-06-20 Thread Mapred Learn
gt; across the cluster. > > Regards, > Christoph > > -Ursprüngliche Nachricht- > Von: Mapred Learn [mailto:mapred.le...@gmail.com] > Gesendet: Montag, 20. Juni 2011 08:36 > An: mapreduce-user@hadoop.apache.org > Betreff: Re: How to split a big file in HDFS by size

Re: how to change default name of a sequence file

2011-06-19 Thread Mapred Learn
Another question here on getDefaultWorkFile(): how is it possible to find out the mapper number that is used in the output? For eg, if you have 30 mappers, how can I append _30 to the output file of the 30th mapper? On Sun, Jun 19, 2011 at 11:19 PM, Mapred Learn wrote: > Thanks ! > I wi

Re: How to split a big file in HDFS by size

2011-06-19 Thread Mapred Learn
a single hard drive, > reading in parallel might actually be slower than reading serially because > it means a lot of random disk accesses. > > Regards, > Christoph > > -Ursprüngliche Nachricht- > Von: Mapred Learn [mailto:mapred.le...@gmail.com] > Gesendet: Monta

Re: how to change default name of a sequence file

2011-06-19 Thread Mapred Learn
t;return new Path(committer.getWorkPath(), > myOwnMethodToComputeTheFileName(context)); >} > } > > Regards, > > Christoph > > -Ursprüngliche Nachricht- > Von: Mapred Learn [mailto:mapred.le...@gmail.com] > Gesendet: Montag, 20. Juni 2011 06:59

how to change default name of a sequence file

2011-06-19 Thread Mapred Learn
Hi, I want to name the output files of my map-red job (sequence files) with a certain name instead of the default part* format. Has anyone ever tried to over-ride the default filename and give an output file name per map-red? Thanks, -JJ

How to split a big file in HDFS by size

2011-06-19 Thread Mapred Learn
Hi, I am trying to upload text files of size 60 GB or more. I want to split these files into smaller files of say 1 GB each so that I can run further map-red jobs on them. Anybody has any idea how I can do this? Thanks a lot in advance! Any ideas are greatly appreciated! -JJ

hadoop fs -lsr does not show right status

2011-06-16 Thread Mapred Learn
Hi, I am trying to do a copyFromLocal to HDFS. But when I do fs -lsr, I always see something like: bash-3.2$ hadoop fs -lsr /user/cloudera/input1 -rw-r--r-- 3 cloudera supergroup 0 2011-06-16 23:30 /user/cloudera/input1/out_0.seq I run the above every few seconds and still get the same output. Seei

Re: Delimiter selection for Sequence Files

2011-06-15 Thread Mapred Learn
chosen. > > I've seen Hive take ascii and octal representations in its statements > for delimiters. You can use a hex value in your shell simply by > passing it as a literal. > > For ex., on Bash/ZSH I do: > $ echo $'\x1B' # For the 'escape' character.

Re: Delimiter selection for Sequence Files

2011-06-15 Thread Mapred Learn
If I use the hex value of a delimiter, for eg. \x01 for Ctrl-A, can I use it as a delimiter in hive/unix cut commands? On Tue, Jun 14, 2011 at 7:10 AM, Mapred Learn wrote: > Thanks Joe for the reply! > "@@##@@" looks like a big value for a delimiter. > I will al
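In both cases the answer is generally yes, since the delimiter is just the byte 0x01. In Python it is written "\x01"; bash accepts the same escape as $'\x01' (e.g. cut -d $'\x01' -f2), and Hive's DDL takes the octal form '\001', which is in fact Hive's default field delimiter. A tiny sketch:

```python
CTRL_A = "\x01"  # the Ctrl-A byte, 0x01

def split_record(line):
    """Split one Ctrl-A delimited record into its fields."""
    return line.rstrip("\n").split(CTRL_A)
```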

Re: mapred.child.java.opts question

2011-06-14 Thread Mapred Learn
s if you do `ps aux` > on the slave during the execution (but you need to catch the right time to > catch the execution). -- Alex K > > > On Tue, Jun 14, 2011 at 8:34 AM, Mapred Learn wrote: > Sorry about the last message. Here we go again: > > I am t

Re: mapred.child.java.opts question

2011-06-14 Thread Mapred Learn
opagate to all the task-trackers ? Thanks a lot in advance, -JJ On Tue, Jun 14, 2011 at 8:30 AM, Mapred Learn wrote: > Hi, > I am trying to pass this option with my >

mapred.child.java.opts question

2011-06-14 Thread Mapred Learn
Hi, I am trying to pass this option with my

Re: Delimiter selection for Sequence Files

2011-06-14 Thread Mapred Learn
wing > what to split on... well that is very related to your context > > you could JOIN map side a list of all possible characters with your data set > and then reduce output only characters not found and use that as your > delimiter.... who knows maybe you will find out that ~ i

Delimiter selection for Sequence Files

2011-06-13 Thread Mapred Learn
Hi, I was thinking of using CTRL A as delimiter but data that I am loading to Hadoop already has CTRL A in it. What are other good choices of delimiters that anybody might have used in this kind of scenario, considering that I also want to query this data using Hive. Thanks in advance -JJ

Re: How to send files to task trackers in a map-red job

2011-06-10 Thread Mapred Learn
Found the solution: Hadoop jar <> -conf <> -files Sent from my iPhone On Jun 9, 2011, at 11:06 PM, Mapred Learn wrote: > Resending, in case anybody has any inputs ::: > > > On Jun 9, 2011, at 5:35 PM, Mapred Learn wrote: > >> Hi, >> I have 2 files

Re: How to send files to task trackers in a map-red job

2011-06-09 Thread Mapred Learn
Resending, in case anybody has any inputs ::: On Jun 9, 2011, at 5:35 PM, Mapred Learn wrote: > Hi, > I have 2 files that I want to send to all tasktrackers during job execution. > I try something like: > hadoop jar abc.jar -conf -cacheFile > 'hdfs://:port/user/

How to send files to task trackers in a map-red job

2011-06-09 Thread Mapred Learn
Hi, I have 2 files that I want to send to all tasktrackers during job execution. I try something like: hadoop jar abc.jar -conf -cacheFile 'hdfs://:port/user/jj/dummy/abc.dat#abc' -cacheFile 'hdfs://:port/user/jj/dummy/abc.txt#abc1' But looks like I don't get the second file on the task trackers and

how to add metadata to a sequence file output format in a map-red job

2011-06-08 Thread Mapred Learn
Hi, I am running a map-red job, basically a map-only job. I want to add metadata to every sequence file that is created as output. How can I do it? I know that when you manually create a seq file writer you can add metadata as key,value pairs to it, but how can we do it in a map-red job? Than

Re: Sequence file format in python and serialization

2011-06-02 Thread Mapred Learn
ite your python > mapper/reducer. The dumbo module handles the > serialization/deserialization to/from typedbytes to native python types. > > J > > On Thu, 2011-06-02 at 00:06 -0700, Mapred Learn wrote: >> Hi, >> I have a question regarding using sequence file in

Sequence file format in python and serialization

2011-06-02 Thread Mapred Learn
Hi, I have a question regarding using sequence file input format in the hadoop streaming jar with mappers and reducers written in python. If I use sequence file as the input format for the streaming jar and use mappers written in python, can I take care of serialization and de-serialization in mapper/reducer c

Query regarding internal/working of hadoop fs -copyFromLocal and fs.write()

2011-05-31 Thread Mapred Learn
Hi guys, I asked this question earlier but did not get any response, so posting again. Hope somebody can point to the right description: when you do hadoop fs -copyFromLocal or use the API to call fs.write() (when FileSystem fs is HDFS), does it write to the local filesystem first before writing to HDFS

Re: Are hadoop fs commands serial or parallel

2011-05-26 Thread Mapred Learn
pyFromLocal $FILE $DEST_PATH & > done > > If doing this via the Java API, then, yes you will have to use multiple > threads. > > On Wed, May 18, 2011 at 1:04 AM, Mapred Learn >wrote: > > > Thanks harsh ! > > That means basically both APIs as well as hadoop client com

Re: How to store an instance of a class in the Configuration?

2011-05-26 Thread Mapred Learn
I agree. Especially people like Harsh, who are always there to answer everyone's queries! On Wed, May 25, 2011 at 11:38 AM, Michael Giannakopoulos < miccagi...@gmail.com> wrote: > Thanks a lot! Your help was invaluable! Those guys like you, who answer to > anyone are heroes! Thanks mate! Hope to t

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
Sorry, it is working; I was not giving the right value with -Dmapred.max.split.size. Thanks for your help! On Wed, May 25, 2011 at 11:34 AM, Mapred Learn wrote: > Hi Harsh, > I just implemented a combineFile InputFormat and its record reader for my > case. > > Now my input has 10 fi

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
ap > task. > > On Wed, May 25, 2011 at 10:28 PM, Mapred Learn > wrote: > > I gave mapred.min.size=10L i.e. 1 GB and each input file is 233 > MB > > and block size = 64 MB. > > With all these values, i thought my split size would work and 4 input > files >

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
MB file. On Wed, May 25, 2011 at 7:59 AM, Mapred Learn wrote: > Thanks Juwei ! > I will go through this.. > > Sent from my iPhone > > On May 25, 2011, at 7:51 AM, Juwei Shi wrote: > > The following are suitable for hadoop 0.20.2. > > 2011/5/25 Juwei Shi > >

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
to refer to FileInputFormat.java for more details. > > > 2011/5/25 Mapred Learn > Resending > > > > > Hi, > > I have few input splits that are few MB in size. > > I want to submit 1 GB of input to every mapper. Does anyone know how can I > > do

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
Resending > > Hi, > I have few input splits that are few MB in size. > I want to submit 1 GB of input to every mapper. Does anyone know how can I do > it ? > Currently each mapper gets one input split that results in many small > map-output files. > > I tried setting -Dmapred.map.min.spli

how to use mapred.min.split.size option ?

2011-05-24 Thread Mapred Learn
Hi, I have a few input splits that are a few MB in size. I want to submit 1 GB of input to every mapper. How can I do it? Currently each mapper gets one input split, which results in many small map-output files. I tried setting -Dmapred.map.min.split.size= , but still it does not take effect. Thanks,
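For completeness, the knob as a command-line configuration fragment (the value shown is 1 GB in bytes; job.jar and MyDriver are placeholders). Note that a minimum split size alone cannot merge records from separate small files into one mapper; that needs CombineFileInputFormat, which is where this thread eventually ends up:

```shell
hadoop jar job.jar MyDriver -Dmapred.min.split.size=1073741824 ...
```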

Re: Custom input format query

2011-05-23 Thread Mapred Learn
rride > public Text getCurrentValue() { > return value; > } > > 2) do not make offset static > 3) nextKeyValue should read a single record not > while (offset < fileSize ) { ... > > On Thu, May 19, 2011 at 5:44 PM, Mapred Learn wrote: > >&

Custom input format query

2011-05-19 Thread Mapred Learn
Hi, I have implemented a custom record reader to read fixed-length records. Pseudo code is as follows: class CRecordReader extends RecordReader { private FileSplit fileSplit; private Configuration conf; private int recordSize; private int fileSize; private int recordNum = 0; private FSDataInputStre
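The core of such a reader, reduced to standalone Python (the Java version wraps the same loop in nextKeyValue() and exposes the record number, or byte offset, as the key; a fixed record size per file is assumed):

```python
import io

def read_fixed_records(stream, record_size):
    """Yield (record_number, record_bytes) for fixed-length records.
    A short trailing read means a partial record and ends iteration."""
    num = 0
    while True:
        rec = stream.read(record_size)
        if len(rec) < record_size:
            break
        yield num, rec
        num += 1
```

For example, read_fixed_records(io.BytesIO(data), 3) walks a byte buffer three bytes at a time.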

Re: Are hadoop fs commands serial or parallel

2011-05-17 Thread Mapred Learn
Hadoop. > > Doing copyFromLocal could write multiple files in parallel (I'm not > sure if it does or not), but a single file would be written serially. > > -Joey > > On Tue, May 17, 2011 at 5:44 PM, Mapred Learn wrote: >> Hi, >> My question is when I run a

Are hadoop fs commands serial or parallel

2011-05-17 Thread Mapred Learn
Hi, My question is when I run a command from hdfs client, for eg. hadoop fs -copyFromLocal or create a sequence file writer in java code and append key/values to it through Hadoop APIs, does it internally transfer/write data to HDFS serially or in parallel ? Thanks in advance, -JJ

Null pointer exception in Mapper initialization

2011-05-10 Thread Mapred Learn
Hi, I get an error like: java.lang.NullPointerException at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73) at org.apache.hadoop.mapred.MapTask$MapOut

Re: how to specify -Xmx option in a hadoop jar command ?

2011-04-28 Thread Mapred Learn
--config <> jar -D > mapred.child.java.opts=-Xmx1024M > > -s > > > On Thu, Apr 28, 2011 at 2:26 PM, Mapred Learn wrote: > >> Hi, >> I am runnnig a hadoop jar command as: >> hadoop --config <> jar -conf >> >> My question is how and where can I specify -Xmx option to increase heap >> assigned to my JVM ? >> >> Thanks in advance >> -JJ >> > >

how to specify -Xmx option in a hadoop jar command ?

2011-04-28 Thread Mapred Learn
Hi, I am running a hadoop jar command as: hadoop --config <> jar -conf My question is how and where can I specify the -Xmx option to increase the heap assigned to my JVM? Thanks in advance -JJ
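The task JVM heap is controlled through mapred.child.java.opts, as the reply above shows (the client-side JVM launched by hadoop jar itself is sized via HADOOP_HEAPSIZE / HADOOP_OPTS instead). A command-line configuration fragment, with the config dir left as a placeholder:

```shell
hadoop --config <conf_dir> jar job.jar -Dmapred.child.java.opts=-Xmx1024m ...
```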

Re: A way to monitor HDFS for a file to come live, and then kick off a job?

2011-03-25 Thread Mapred Learn
Does the Oozie co-ordinator work? Last time I tried it, it had a lot of problems: i) jobs from start to end_timestamp were all being submitted at once, not at the actual wall clock time. ii) The links to all the jobs in a particular co-ordinator work-flow were not working, i.e. you were not able to see the p

How to generate Value Classes at runtime for Sequence files

2011-03-11 Thread Mapred Learn
Hi, I have some types of data that I have to upload to HDFS as Sequence Files. Initially, I had thought of creating a .jr file at runtime depending on the type of schema and using Hadoop's rcc DDL tool to create these classes and use them. But looking at the rcc documentation, I see that it has been d

rename() removed from FileSystem.java but not FsShell.java

2011-03-09 Thread Mapred Learn
Hi, I was trying to use rename() in FileSystem.java to mv files through my java code, but found that it has been deprecated in FileSystem.java though not in FsShell.java. Is there a particular reason for this? Should I use FsShell.java's rename() instead, or avoid it altogether and implement "fs -mv

Re: how to use hadoop apis with cloudera distribution ?

2011-03-08 Thread Mapred Learn
, 2011 at 7:22 AM, Marcos Ortiz wrote: > On Tue, 2011-03-08 at 07:16 -0800, Mapred Learn wrote: > > > Hi, > > > I downloaded CDH3 VM for hadoop but if I want to use something like: > > > > > > import org.apache.hadoop.conf.Configuration;

Re: how to use hadoop apis with cloudera distribution ?

2011-03-08 Thread Mapred Learn
> Hi, > I downloaded CDH3 VM for hadoop but if I want to use something like: > > import org.apache.hadoop.conf.Configuration; > > in my java code, what else do I need to do ? > > > Do i need to download hadoop from apache ? > > if yes, then what does cdh3 do ? > > if not, then where

how to use hadoop apis with cloudera distribution ?

2011-03-07 Thread Mapred Learn
Hi, I downloaded the CDH3 VM for hadoop, but if I want to use something like: import org.apache.hadoop.conf.Configuration; in my java code, what else do I need to do? Do I need to download hadoop from apache? If yes, then what does cdh3 do? If not, then where can I find the hadoop code on the cdh VM?

Re: TextInputFormat to SequenceFile Output format question

2011-02-28 Thread Mapred Learn
On Sat, Feb 26, 2011 at 7:22 AM, Mapred Learn > wrote: > > Hi guys, > > If I have a text file of 10 GB and I want to convert it to sequence file > > using map-reduce and make filesplits of 1 GB each so that 10 mappers work > in > > parallel on it and convert it to Seque

TextInputFormat to SequenceFile Output format question

2011-02-25 Thread Mapred Learn
Hi guys, If I have a text file of 10 GB and I want to convert it to sequence file using map-reduce and make filesplits of 1 GB each so that 10 mappers work in parallel on it and convert it to Sequence file output. Can I combine these 10 mapper outputs into 1 sequence file of 10 GB size in reduce st

Re: Change the storage directory.

2011-02-24 Thread Mapred Learn
Did you try running the stop and start scripts? On Thu, Feb 24, 2011 at 4:32 PM, real great.. wrote: > Hi, > As I guess, Hadoop creates the default dfs in the temp directory. > I tried changing it by editing the hdfs-site.xml to: > <?xml version="1.0"?> > > > > > > > dfs.replication > 2

Re: Sequence File usage queries

2011-02-23 Thread Mapred Learn
on, long blockSize, > CompressionType compressionType, CompressionCodec codec, > Progressable progress, Metadata metadata) throws > IOException { > > > > On Thu, Feb 17, 2011 at 1:16 PM, Mapred Learn wrote: > >> Hi, >> I have a use cas

Sequence File usage queries

2011-02-17 Thread Mapred Learn
Hi, I have a use case to upload some terabytes of text files as sequence files on HDFS. These text files have several layouts ranging from 32 to 62 columns (metadata). What would be a good way to upload these files along with their metadata: i) creating a key, value class per text file layout

how/where to set metadata for a sequence file ?

2011-02-16 Thread Mapred Learn
Hi, I have text file data that I want to upload to hdfs as a sequence file. So, where can I define the metadata for this file so that users accessing it as a sequence file can understand and read it? Thanks, JJ

hadoop fs -put vs writing text files to hadoop as sequence files

2011-02-16 Thread Mapred Learn
Hi, I have to upload some terabytes of data that is text files. What would be a good option to do so: i) using hadoop fs -put to copy text files directly to hdfs. ii) copying text files as sequence files to hdfs? What would be the extra time in case (ii) as opposed to (i)? Thanks, Jimmy

question for understanding partitioning

2011-01-18 Thread Mapred Learn
hi, I have a basic question. How does partitioning work? Following is a scenario I created to put up my question. i) A partition function is defined as partitioning map-output based on alphabetical sorting of the key, i.e. a partition for keys starting with 'a', a partition for keys starting with '
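The contract being described is Partitioner.getPartition(key, value, numPartitions): it returns an int in [0, numPartitions), every key with the same return value goes to the same reducer, and each reducer then sees its keys in sorted order. The alphabetical scheme from the scenario, sketched standalone (the treatment of non-alphabetic keys is an assumption):

```python
def get_partition(key, num_partitions):
    """Alphabetical partitioner: 'a'..'z' map to buckets 0..25,
    folded into the available number of partitions."""
    first = key[0].lower()
    if not ("a" <= first <= "z"):
        return 0  # assumption: non-alphabetic keys all land in bucket 0
    return (ord(first) - ord("a")) % num_partitions
```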

Namenode shutting down again and again in single node setup

2011-01-11 Thread Mapred Learn
Hi, I have set up a single node setup and any command I run gives an error: $ /home/Owner/hadoop/hadoop-0.20.2/bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' 11/01/11 10:54:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s). 11/01/11

Re: hadoop single user setup

2011-01-10 Thread Mapred Learn
ipc.Client: Retrying connect to server: > localhost/127.0.0 > .1:9000. Already tried 0 time(s). > > > > On Mon, Jan 10, 2011 at 2:35 PM, Mapred Learn wrote: > >> hi, >> I am a newbie and am trying to setup hadoop in single user setup on my >> windows

hadoop single user setup

2011-01-10 Thread Mapred Learn
hi, I am a newbie and am trying to set up hadoop in single user mode on my windows 7 machine. I followed the steps at: http://hadoop.apache.org/common/docs/current/single_node_setup.html#L... but I keep on getting an error: $ bin/