Re: log

2013-04-19 Thread Bejoy Ks
This basically happens while running a MapReduce job. When a MapReduce job is triggered, the job files are put in HDFS with high replication (controlled by 'mapred.submit.replication', default value 10). The job files are cleaned up after the job is completed and hence that c
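As a config sketch (Hadoop 1.x property name; the value shown is illustrative, not from the thread), the job-file replication could be tuned in mapred-site.xml:

```xml
<!-- mapred-site.xml: replication used for job files (jar, splits, conf)
     staged in HDFS at submit time; the shipped default is 10 -->
<property>
  <name>mapred.submit.replication</name>
  <value>5</value>
</property>
```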

Re: Physically moving HDFS cluster to new

2013-04-17 Thread Bejoy Ks
Adding on to the comments: you might need to update /etc/hosts with the new values. If the host name changes as well, you may need to update fs.default.name and mapred.job.tracker with the new values. On Thu, Apr 18, 2013 at 10:08 AM, Azuryy Yu wrote: > Data nodes name or IP changed cannot ca
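If the master host names change, the client-side configs would need updates along these lines (the host names below are placeholders):

```xml
<!-- core-site.xml: new NameNode address (hypothetical host) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://new-namenode-host:8020</value>
</property>

<!-- mapred-site.xml: new JobTracker address (hypothetical host) -->
<property>
  <name>mapred.job.tracker</name>
  <value>new-jobtracker-host:8021</value>
</property>
```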

Re: Adjusting tasktracker heap size?

2013-04-17 Thread Bejoy Ks
Hi Marcos, You need to decide the slots based on the available memory: Available Memory = Total RAM - (memory for OS + memory for Hadoop daemons like DN, TT + memory for other services, if any, running on that node). Now you need to consider the generic MR jobs planned on your cluster. Say if your
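The slot arithmetic above can be sketched as follows (all figures are illustrative, none are from the thread):

```python
# Rough slot-count estimate from available memory (all figures in GB).
total_ram = 48
os_mem = 4            # OS overhead
daemon_mem = 2        # DataNode + TaskTracker heaps
other_services = 2    # e.g. an HBase RegionServer, if one runs on the node
task_jvm = 1          # heap per task, from mapred.child.java.opts

available = total_ram - (os_mem + daemon_mem + other_services)
max_slots = available // task_jvm  # total map + reduce slots this node can hold

print(available, max_slots)  # 40 40
```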

Re: How to improve performance of this cluster

2013-04-17 Thread Bejoy Ks
Hi Geelong Let me just put in my thoughts here. You have 8G of RAM, but you have 8+8 = 16 slots with a task JVM size of 1G. This means that if all slots are utilized simultaneously the tasks need 16G while only 8G is available, hence high chances of OOM errors. When you decide on slots you need to consid
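A quick check of the 8G/16-slot arithmetic described above (a sketch):

```python
# Why 16 slots with 1 GB task JVMs on an 8 GB node risks OOM errors.
map_slots, reduce_slots = 8, 8
task_heap_gb = 1
ram_gb = 8

# Worst case: every slot occupied by a task at full heap simultaneously.
worst_case = (map_slots + reduce_slots) * task_heap_gb
print(worst_case, worst_case > ram_gb)  # 16 True
```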

Re: How to change secondary namenode location in Hadoop 1.0.4?

2013-04-17 Thread Bejoy Ks
Hi Henry You can change the secondary name node storage location by overriding the property 'fs.checkpoint.dir' in your core-site.xml On Wed, Apr 17, 2013 at 2:35 PM, Henry Hung wrote: > Hi All, > > ** ** > > What is the property name of Hadoop 1.0.4 to change secondary namenode > locatio
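For example (the path is hypothetical; pick a durable local directory), in core-site.xml:

```xml
<!-- core-site.xml: where the secondary namenode stores checkpoint images -->
<property>
  <name>fs.checkpoint.dir</name>
  <value>/data/hadoop/namesecondary</value>
</property>
```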

Re: VM reuse!

2013-04-16 Thread Bejoy Ks
you have > mentioned. > Much more number of mappers and less number of mappers slots. > > Regards, > Rahul > > > On Tue, Apr 16, 2013 at 2:40 PM, Bejoy Ks wrote: > >> Hi Rahul >> >> If you look at larger cluster and jobs that involve larger input data

Re: HW infrastructure for Hadoop

2013-04-16 Thread Bejoy Ks
+1 for "Hadoop Operations" On Tue, Apr 16, 2013 at 3:57 PM, MARCOS MEDRADO RUBINELLI < marc...@buscapecompany.com> wrote: > Tadas, > > "Hadoop Operations" has pretty useful, up-to-date information. The chapter > on hardware selection is available here: > http://my.safaribooksonline.com/book/dat

Re: VM reuse!

2013-04-16 Thread Bejoy Ks
Hi Rahul In larger clusters, jobs often involve larger input data sets. The data would be spread across the whole cluster, and a single node might hold various blocks of that entire data set. Imagine you have a cluster with 100 map slots and your job has 500 map tasks; in that ca
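The 100-slot/500-task example works out as below (a sketch; in Hadoop 1.x, JVM reuse is controlled by mapred.job.reuse.jvm.num.tasks):

```python
import math

map_slots = 100
map_tasks = 500

waves = math.ceil(map_tasks / map_slots)  # tasks run in successive waves
jvms_without_reuse = map_tasks            # one fresh JVM spawned per task
jvms_with_full_reuse = map_slots          # each slot's JVM serves all its tasks

print(waves, jvms_without_reuse, jvms_with_full_reuse)  # 5 500 100
```

With full reuse, the JVM start-up cost is paid 100 times instead of 500, which is where the savings come from.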

Re: Submitting mapreduce and nothing happens

2013-04-16 Thread Bejoy Ks
Hi Amit Are you seeing any errors or warnings on JT logs? Regards Bejoy KS

Re: Unexpected Hadoop behavior: map task re-running after reducer has been running

2013-03-11 Thread Bejoy Ks
triggering the job after increasing the value for tasktracker.http.threads Regards Bejoy KS

Re: fundamental doubt

2012-11-21 Thread Bejoy KS
. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: jamal sasha Date: Wed, 21 Nov 2012 14:50:51 To: user@hadoop.apache.org Reply-To: user@hadoop.apache.org Subject: fundamental doubt Hi.. I guess i am asking alot of fundamental questions but i thank you guys

Re: guessing number of reducers.

2012-11-21 Thread Bejoy KS
. You can round this value and use it to set the number of reducers in the conf programmatically. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Manoj Babu Date: Wed, 21 Nov 2012 23:28:00 To: Cc: bejoy.had...@gmail.com Subject: Re: guessing number of
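A sketch of that sizing calculation (all figures are illustrative). In the Java API, the rounded value would typically be passed to job.setNumReduceTasks():

```python
import math

# Size the reducer count from data volume, then cap it slightly below
# the cluster's reduce slot capacity (hypothetical numbers).
shuffle_bytes = 50 * 1024**3      # data expected to reach the reduce phase
bytes_per_reducer = 2 * 1024**3   # target load per reducer
reduce_slots = 40                 # reduce slots available in the cluster

estimate = math.ceil(shuffle_bytes / bytes_per_reducer)
num_reducers = min(estimate, reduce_slots - 2)  # stay a bit under capacity

print(estimate, num_reducers)  # 25 25
```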

Re: guessing number of reducers.

2012-11-21 Thread Bejoy KS
Reducers at the max capacity was very clos. :) AK47 From: Bejoy KS [mailto:bejoy.had...@gmail.com] Sent: Wednesday, November 21, 2012 11:51 AM To: user@hadoop.apache.org Subject: Re: guessing number of reducers. Hi Sasha In general the number of reduce tasks is chosen mainly based on the data vo

Re: guessing number of reducers.

2012-11-21 Thread Bejoy KS
then you need a smaller volume of data per reducer for better performance. In general it is better to have the number of reduce tasks slightly less than the number of available reduce slots in the cluster. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message

Re: Supplying a jar for a map-reduce job

2012-11-20 Thread Bejoy KS
Hi Pankaj AFAIK You can do the same. Just provide the properties like mapper class, reducer class, input format, output format etc using -D option at run time. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Pankaj Gupta Date: Tue, 20 Nov 2012 20

Re: number of reducers

2012-11-20 Thread Bejoy KS
Hi Sasha By default the number of reducers is set to 1. If you want more, you need to specify it as: hadoop jar myJar.jar myClass -D mapred.reduce.tasks=20 ... Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: jamal sasha Date: Tue, 20 Nov 2012 14

Re: Strange error in Hive

2012-11-15 Thread Bejoy KS
r jar Guava jar Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Mark Kerzner Date: Wed, 14 Nov 2012 17:05:20 To: Hadoop User Reply-To: user@hadoop.apache.org Subject: Strange error in Hive Hi, I am trying to insert a table in hive, and I am getting thi

Re: Hadoop Java compilation error

2012-11-14 Thread Bejoy KS
on your hadoop client machine and run it using 'hadoop jar ...' command. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: R J Date: Wed, 14 Nov 2012 15:21:39 To: Robert Evans; user@hadoop.apache.org Reply-To: user@hadoop.apache.org S

Re: Setting up a edge node to submit jobs

2012-11-14 Thread Bejoy KS
. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Manoj Babu Date: Thu, 15 Nov 2012 10:03:24 To: Reply-To: user@hadoop.apache.org Subject: Setting up a edge node to submit jobs Hi, How to setup a edge node for a hadoop cluster to submit jobs? Thanks in

Re: Map-Reduce V/S Hadoop Ecosystem

2012-11-07 Thread Bejoy KS
updates along with the Map/Reduce job that does the hdfs data processing. Summarizing my thoughts, the custom MR code can limit the number of MR jobs in this case. There can be any number of complex scenarios like this where your custom code turns out more efficient and performant. Regards Bejoy KS Sent from

Re: Map-Reduce V/S Hadoop Ecosystem

2012-11-07 Thread Bejoy KS
code can be efficient as yours would be very specific to your app, but the MR in hive and pig may be more generic. To just write your custom mapreduce functions, basic knowledge of java is good. As you are better with java you can understand the internals better. Regards Bejoy KS Sent from

Re: Set the number of maps

2012-11-01 Thread Bejoy KS
map tasks for your job you need to increase the value for min and max split sizes. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Ted Dunning Date: Thu, 1 Nov 2012 09:50:10 To: Reply-To: user@hadoop.apache.org Subject: Re: Set the number of ma

Re: How to do HADOOP RECOVERY ???

2012-10-29 Thread Bejoy KS
. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Date: Mon, 29 Oct 2012 10:43:44 To: Reply-To: user@hadoop.apache.org Subject: RE: How to do HADOOP RECOVERY ??? Thanks Uma, I am using hadoop-0.20.2 version. UI shows. Cluster Summary 379 files and

Re: Data locality of map-side join

2012-10-22 Thread Bejoy KS
query. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Sigurd Spieckermann Date: Mon, 22 Oct 2012 22:29:15 To: Reply-To: user@hadoop.apache.org Subject: Data locality of map-side join Hi guys, I've been trying to figure out whether a map-side join

Re: Old vs New API

2012-10-22 Thread Bejoy KS
was not available in the new mapreduce API at that point. Now the mapreduce API is pretty good and you can go ahead with it for development. AFAIK the mapreduce API is the future. Let's wait for a committer to officially comment on this. Regards Bejoy KS Sent from handheld, please excuse

Re: extracting lzo compressed files

2012-10-21 Thread Bejoy KS
Hi Manoj You can get the file in a readable format using hadoop fs -text Provided you have lzo codec within the property 'io.compression.codecs' in core-site.xml A 'hadoop fs -ls' command would itself display the file size. Regards Bejoy KS Sent from handheld,
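A core-site.xml sketch registering the LZO codecs (this assumes the hadoop-lzo library is installed; class names can vary by distribution):

```xml
<!-- core-site.xml: codecs hadoop will use for -text and for job I/O -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
```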

Re: Hadoop counter

2012-10-19 Thread Bejoy KS
Thanks Harsh. Great learning from you as always. :) Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Harsh J Date: Fri, 19 Oct 2012 21:20:07 To: ; Reply-To: user@hadoop.apache.org Subject: Re: Hadoop counter Bejoy is almost right, except that

Re: Hadoop counter

2012-10-19 Thread Bejoy KS
Hi Jay Counters are reported to the JT at the end of a task. So if a task fails, the counters from that task are not sent to the JT and hence won't be included in the final value of the counters for that job. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From

Re: document on hdfs

2012-10-10 Thread Bejoy KS
Hi Murthy Hadoop - The definitive Guide by Tom White has the details on file write anatomy. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: murthy nvvs Date: Wed, 10 Oct 2012 04:27:58 To: user@hadoop.apache.org Reply-To: user@hadoop.apache.org

Re: stable release of hadoop

2012-10-09 Thread Bejoy KS
Hi Nisha The current stable version is the 1.0.x release line. This is well suited for production environments. The 0.23.x/2.x.x releases are of alpha quality and hence not recommended for production. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From

Re: Task Attempt Failed

2012-10-08 Thread Bejoy Ks
Hi Dave Can you post the task logs corresponding to this? You can browse the web UI down to the failed task's log; it'll contain more information to help you analyze the task failure reasons. On Mon, Oct 8, 2012 at 5:58 PM, Dave Shine < dave.sh...@channelintelligence.com> wrote: > I’m starting to see th

Re: One file per mapper?

2012-10-08 Thread Bejoy Ks
Hi Terry If your files are smaller than the hdfs block size and you are using the default TextInputFormat with the default split-size properties, there would be just one file per mapper. If you have files larger than an hdfs block, please take a look at a sampl

Re: What is the difference between Rack-local map tasks and Data-local map tasks?

2012-10-07 Thread Bejoy KS
tasks when the number of input splits/map tasks are large which is quite common. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: centerqi hu Date: Sun, 7 Oct 2012 23:28:55 To: Reply-To: user@hadoop.apache.org Subject: Re: What is the difference

Re: hadoop memory settings

2012-10-05 Thread Bejoy KS
Hi Sadak AFAIK HADOOP_HEAPSIZE determines the jvm size of the daemons like NN, JT, TT, DN etc. mapred.child.java.opts and mapred.child.ulimit are used to set the jvm heap for the child jvms launched for each map/reduce task. Regards Bejoy KS Sent from handheld, please excuse typos

Re: Multiple Aggregate functions in map reduce program

2012-10-05 Thread Bejoy KS
aggregated sum and count for each key. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: iwannaplay games Date: Fri, 5 Oct 2012 12:32:28 To: user; ; hdfs-user Reply-To: user@hadoop.apache.org Subject: Multiple Aggregate functions in map reduce program Hi All

Re: copyFromLocal

2012-10-04 Thread Bejoy KS
ld be accessible from this client. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: "Kartashov, Andy" Date: Thu, 4 Oct 2012 16:51:35 To: user@hadoop.apache.org Reply-To: user@hadoop.apache.org Subject: RE: copyFromLocal I use -put -get commands to

Re: How to lower the total number of map tasks

2012-10-02 Thread Bejoy KS
Hi Shing Is your input a single file or a set of small files? If the latter, you need to use CombineFileInputFormat. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Shing Hing Man Date: Tue, 2 Oct 2012 10:38:59 To: user@hadoop.apache.org Reply-To: user

Re: How to lower the total number of map tasks

2012-10-02 Thread Bejoy KS
Shing This doesn't change the block size of existing files in hdfs; only new files written to hdfs will be affected. To get this in effect for old files you need to re-copy them, at least within hdfs: hadoop fs -cp src destn. Regards Bejoy KS Sent from handheld, please excuse

Re: How to lower the total number of map tasks

2012-10-02 Thread Bejoy Ks
Sorry for the typo, the property name is mapred.max.split.size Also just for changing the number of map tasks you don't need to modify the hdfs block size. On Tue, Oct 2, 2012 at 10:31 PM, Bejoy Ks wrote: > Hi > > You need to alter the value of mapred.max.split size to a value

Re: How to lower the total number of map tasks

2012-10-02 Thread Bejoy Ks
Hi You need to alter the value of mapred.max.split size to a value larger than your block size to have less number of map tasks than the default. On Tue, Oct 2, 2012 at 10:04 PM, Shing Hing Man wrote: > > > > I am running Hadoop 1.0.3 in Pseudo distributed mode. > When I submit a map/reduce j
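For reference, FileInputFormat in Hadoop 1.x sizes splits as max(minSplitSize, min(maxSplitSize, blockSize)); note that by this formula it is the minimum split size that must be raised above the block size to yield fewer, larger splits. An illustrative sketch:

```python
import math

def split_size(min_split, max_split, block_size):
    """FileInputFormat's split sizing: max(minSize, min(maxSize, blockSize))."""
    return max(min_split, min(max_split, block_size))

block = 64 * 1024**2       # 64 MB block size (illustrative)
file_size = 640 * 1024**2  # one 640 MB input file

# Defaults: one split per block, so 10 map tasks.
default = split_size(1, 2**63 - 1, block)
print(math.ceil(file_size / default))   # 10

# Raise the *minimum* split size to 128 MB: 5 map tasks.
bigger = split_size(128 * 1024**2, 2**63 - 1, block)
print(math.ceil(file_size / bigger))    # 5
```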

Re: File block size use

2012-10-01 Thread Bejoy KS
Hi Anna If you want to increase the block size of existing files, you can use an Identity Mapper with no reducer. Set the min and max split sizes to your requirement (512Mb). Use SequenceFileInputFormat and SequenceFileOutputFormat for your job. Your job should be done. Regards Bejoy KS

Re: CombineFileInputFormat and mapreduce in v20.2

2012-09-27 Thread Bejoy Ks
Hi Anna One option I can think of is getting the CombineFileInputFormat from the latest release, adding it as a custom input format in your application code, and shipping it with your map reduce application jar. Similar to how you'll implement an input format of your own and use it with map reduce. Regards

Re: CombineFileInputFormat and mapreduce in v20.2

2012-09-27 Thread Bejoy Ks
Hi Anna CombineFileInputFormat is included in the mapreduce package in the latest releases http://hadoop.apache.org/docs/r1.0.3/api/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.html Regards Bejoy KS

Re: Two map inputs (file & HBase). "Join" file data and Hbase data into a map reduce.

2012-09-27 Thread Bejoy Ks
Hi Pablo >I could read the file and do a get followed by a put, but this would not be a MR job >and would be very slow if there are a lot of entries in the file. If you have a large file, by using mapreduce you can parallelize the hbase gets and puts. Configure the split size accordingly so

Re: How not to clean MapReduce temp data?

2012-09-27 Thread Bejoy Ks
Hi The temporary output from tasks can be preserved using the following property 'keep.task.files.pattern' http://books.google.co.in/books?id=drbI_aro20oC&pg=PA178&lpg=PA178&dq=keep.task.files.pattern&source=bl&ots=tZAmxgm_j4&sig=Guc0bh2BQzlbMqOADtic5WciIz0&hl=en&sa=X&ei=zI9kULbDM8zhrAe3jYH4BA&ved
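A mapred-site.xml sketch (the pattern shown is hypothetical; it keeps the intermediate files of one specific map attempt):

```xml
<!-- mapred-site.xml: keep temp files of tasks whose id matches the pattern -->
<property>
  <name>keep.task.files.pattern</name>
  <value>.*_m_000000_0</value>
</property>
```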

Re: Advice on Migrating to hadoop + hive

2012-09-27 Thread Bejoy Ks
s://cwiki.apache.org/Hive/tutorial.html#Tutorial-PartitionBasedQuery Regards Bejoy KS

Re: Unit tests for Map and Reduce functions.

2012-09-26 Thread Bejoy Ks
Hi Ravi You can take a look at mockito http://books.google.co.in/books?id=Nff49D7vnJcC&pg=PA138&lpg=PA138&dq=mockito+%2B+hadoop&source=bl&ots=IifyVu7yXp&sig=Q1LoxqAKO0nqRquus8jOW5CBiWY&hl=en&sa=X&ei=b2pjULHSOIPJrAeGsIHwAg&ved=0CC0Q6AEwAg#v=onepage&q=mockito%20%2B%20hadoop&f=false On Thu, Sep 27,

Re: Programming Question / Joining Dataset

2012-09-26 Thread Bejoy Ks
Hi Oliver I have scribbled a small post on reduce side joins; the implementation matches your requirement http://kickstarthadoop.blogspot.in/2011/09/joins-with-plain-map-reduce.html Regards Bejoy KS

Re: Detect when file is not being written by another process

2012-09-25 Thread Bejoy Ks
Hi Peter AFAIK oozie has a mechanism to achieve this. You can trigger your jobs as soon as the files are written to a certain hdfs directory. On Tue, Sep 25, 2012 at 10:23 PM, Peter Sheridan < psheri...@millennialmedia.com> wrote: > These are log files being deposited by other processes, which

Re: Help on a Simple program

2012-09-25 Thread Bejoy Ks
Hi If you don't want either the key or the value in the output, just make the corresponding data type NullWritable. Since you just need to filter out a few records/items from your logs, the reduce phase is not mandatory; a mapper alone would suffice your needs. From your mapper just output the records tha
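With Hadoop Streaming, such a map-only filter could be sketched as below (the "ERROR" pattern and plain-line records are assumptions); running the job with -D mapred.reduce.tasks=0 makes it map-only, so mapper output goes straight to HDFS.

```python
#!/usr/bin/env python
# Sketch of a map-only Hadoop Streaming filter mapper.
import sys

def filter_lines(lines, needle="ERROR"):
    """Emit only the records containing the needle; no key/value split needed."""
    return [line.rstrip("\n") for line in lines if needle in line]

if __name__ == "__main__":
    # In the real streaming job, records arrive one per line on stdin.
    for record in filter_lines(sys.stdin):
        print(record)
```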

Re: Map issue in Hive.

2012-09-21 Thread Bejoy KS
column if it is just a collection of elements rather than a key value pair, you can use an Array data type instead. Here just specify the delimiter for each values using 'COLLECTION ITEMS TERMINATED BY' Regards, Bejoy KS From: Manish To: user Cc:

Re: Job failed with large volume of small data: java.io.EOFException

2012-09-20 Thread Bejoy Ks
inux boxes that run DNs. You can verify the current value using 'ulimit -n' and then try increasing the same to a much higher value. Regards Bejoy KS

Re: How to make the hive external table read from subdirectories

2012-09-12 Thread Bejoy KS
', dayofmonth='11') LOCATION '/user/myuser/MapReduceOutput/2012/09/11'; Like this you need to register each of the partitions. After this your query should work as desired. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Nataraj Rashmi -

Re: How to make the hive external table read from subdirectories

2012-09-12 Thread Bejoy KS
Hi Natraj Create a partitioned table and add the sub dirs as partitions. You need to have some logic in place for determining the partitions. Say if the sub dirs denote data based on a date then make date as the partition. Regards Bejoy KS Sent from handheld, please excuse typos

Re: How to remove datanode from cluster..

2012-09-11 Thread Bejoy Ks
Hi Yogesh The detailed steps are available in the hadoop wiki FAQ page http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F Regards Bejoy KS On Wed, Sep 12, 2012 at 12:14 AM, yogesh dhari wrote

Re: Issue in access static object in MapReduce

2012-09-11 Thread Bejoy Ks
of map()/reduce() on each record can have a look up into the json object. public void configure(JobConf job) Regards Bejoy KS On Tue, Sep 11, 2012 at 8:23 PM, Stuti Awasthi wrote: > ** ** > > Hi, > > I have a configuration JSON file which is accessed by MR job for every

Re: Some general questions about DBInputFormat

2012-09-11 Thread Bejoy KS
ions >allowed for that db. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Yaron Gonen Date: Tue, 11 Sep 2012 15:41:26 To: Reply-To: user@hadoop.apache.org Subject: Some general questions about DBInputFormat Hi, After reviewing the

Re: what's the default reducer number?

2012-09-11 Thread Bejoy Ks
Hi Lin The default values for all the properties are in core-default.xml hdfs-default.xml and mapred-default.xml Regards Bejoy KS On Tue, Sep 11, 2012 at 5:06 PM, Jason Yang wrote: > Hi, Bejoy > > Thanks for you reply. > > where could I find the default value of mapred.reduce

Re: what's the default reducer number?

2012-09-11 Thread Bejoy Ks
Hi Lin The default value for the number of reducers is 1 (mapred.reduce.tasks = 1). It is not determined by data volume. You need to specify the number of reducers for your mapreduce jobs as per your data volume. Regards Bejoy KS On Tue, Sep 11, 2012 at 4:53 PM, Jason Yang wrote: > Hi, all &g
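Reconstructed, the shipped default looks like this in mapred-default.xml (override it per job with -D, programmatically, or site-wide in mapred-site.xml):

```xml
<!-- mapred-default.xml: one reduce task unless the job says otherwise -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
</property>
```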

Re: Understanding of the hadoop distribution system (tuning)

2012-09-10 Thread Bejoy Ks
can degrade the performance to a greater extent. In larger data volumes a few non data local map tasks are common. Regards Bejoy KS On Tue, Sep 11, 2012 at 11:37 AM, Elaine Gan wrote: > Hi Hermanth > > Thank you for your detailed answered. Your answers helped me much in > u

Re: Replication Factor Modification

2012-09-05 Thread Bejoy Ks
Hi Uddipan As Harsh mentioned, the replication factor is a client side property. So you need to update the value for 'dfs.replication' in hdfs-site.xml as per your requirement on your edge nodes or on the machines you are copying files to hdfs from. If you are using some of the existing DN's for this

Re: Replication Factor Modification

2012-09-05 Thread Bejoy Ks
Hi You can change the replication factor of an existing directory using '-setrep' http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html#setrep The below command will recursively set the replication factor to 1 for all files within the given directory '/user' hadoop fs -setrep -w 1 -R /use

Re: Exception while running a Hadoop example on a standalone install on Windows 7

2012-09-04 Thread Bejoy Ks
Hi Udayani By default hadoop works well on Linux and Linux-based OSes. Since you are on Windows you need to install and configure ssh using cygwin before you start the hadoop daemons. On Tue, Sep 4, 2012 at 6:16 PM, Udayini Pendyala wrote: > Hi, > > > Following is a description of what I am trying t

Re: Integrating hadoop with java UI application deployed on tomcat

2012-09-04 Thread Bejoy KS
the exact configuration files and hadoop jars from the cluster machines on this tomcat environment as well. I mean on the classpath of your application. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Visioner Sadak Date: Tue, 4 Sep 2012 15:31:25

Re: knowing the nodes on which reduce tasks will run

2012-09-03 Thread Bejoy Ks
"0" without > restarting tasktracker? > > ~Abhay > > > On Mon, Sep 3, 2012 at 8:53 PM, Bejoy Ks wrote: > >> HI Abhay >> >> The TaskTrackers on which the reduce tasks are triggered is chosen in >> random based on the redu

Re: knowing the nodes on which reduce tasks will run

2012-09-03 Thread Bejoy Ks
Hi Abhay The TaskTrackers on which the reduce tasks are triggered are chosen at random based on reduce slot availability. So if you don't want the reduce tasks to be scheduled on some particular nodes, you need to set 'mapred.tasktracker.reduce.tasks.maximum' on those nodes to 0. The bottleneck

Re: reading a binary file

2012-09-03 Thread Bejoy Ks
Hi Francesco TextInputFormat reads line by line based on '\n' by default; there the key and value are the byte offset and the line contents respectively. But in your case it is just a sequence of integers, and the file is binary. Also you require the offset of each integer value and not the offset by l
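A custom record reader for such a file would step through fixed-width binary records and track each record's byte offset itself. The core idea, sketched locally with Python's struct module (4-byte big-endian integers are an assumption):

```python
import io
import struct

def read_int_records(stream):
    """Yield (byte_offset, value) pairs from a stream of 4-byte big-endian ints."""
    offset = 0
    while True:
        chunk = stream.read(4)
        if len(chunk) < 4:  # EOF or trailing partial record
            break
        yield offset, struct.unpack(">i", chunk)[0]
        offset += 4

# Demo with three packed integers in an in-memory "file".
data = io.BytesIO(struct.pack(">3i", 7, 42, -1))
print(list(read_int_records(data)))  # [(0, 7), (4, 42), (8, -1)]
```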

Re: MRBench Maps strange behaviour

2012-08-29 Thread Bejoy KS
Hi Gaurav You can get the information on the num of map tasks in the job from the JT web UI itself. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Gaurav Dasgupta Date: Wed, 29 Aug 2012 13:14:11 To: Reply-To: user@hadoop.apache.org Subject: Re

Re: one reducer is hanged in "reduce-> copy" phase

2012-08-28 Thread Bejoy KS
is re attempted. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Abhay Ratnaparkhi Date: Tue, 28 Aug 2012 19:40:58 To: Reply-To: user@hadoop.apache.org Subject: one reducer is hanged in "reduce-> copy" phase Hello, I have a MR job which ha

Re: namenode not starting

2012-08-24 Thread Bejoy KS
Hi Abhay What is the value for hadoop.tmp.dir or dfs.name.dir? If it was set to /tmp the contents would be deleted on an OS restart. You need to change this location before you start your NN. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Abhay

Re: Reading multiple lines from a microsoft doc in hadoop

2012-08-23 Thread Bejoy KS
Hi Siddharth I believe doc and docx have custom formatting beyond plain text. In that case you may have to build your own input format, and also your own record reader if you want the record delimiter to be an empty line. Regards Bejoy KS Sent from handheld, please excuse typos

Re: Info required regarding JobTracker Job Details/Metrics

2012-08-23 Thread Bejoy Ks
en to LFS. Regards Bejoy KS On Thu, Aug 23, 2012 at 4:54 PM, Gaurav Dasgupta wrote: > Sorry, the correct outcomes are for the single wordcount job are: > > 12/08/23 04:31:22 INFO mapred.JobClient: Job complete: > job_201208230144_0002 > 12/08/23 04:31:22 INFO mapred.JobClient:

Re: why is num of map tasks gets overridden?

2012-08-23 Thread Bejoy Ks
believe you need to restart TT as well). Regards Bejoy KS On Thu, Aug 23, 2012 at 4:42 PM, nutch buddy wrote: > how do I adjust number of slots per node? > and also, is the parameter mapred.tasktracker.map.tasks.maximum relevant > here? > > thanks > > > On Wed, Aug 22

Re: why is num of map tasks gets overridden?

2012-08-21 Thread Bejoy KS
tsize to the maximum data chunk a map task can process with your current memory constraint. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: nutch buddy Date: Wed, 22 Aug 2012 08:57:31 To: Reply-To: user@hadoop.apache.org Subject: Re: why is num of map

Re: Changing tab delimiter between key and value

2012-08-21 Thread Bejoy Ks
Hi Siddharth Can you try providing a single character instead of multiple ones and see whether it gives the desired result? Regards Bejoy KS

Re: Collecting MAP output in a Iterator

2012-08-21 Thread Bejoy Ks
performance edge to your jobs. Details on map join in hive : https://cwiki.apache.org/Hive/joinoptimization.html http://hive.apache.org/docs/r0.9.0/language_manual/joins.html Regards Bejoy KS

Re: Collecting MAP output in a Iterator

2012-08-21 Thread Bejoy Ks
Hi Siddharth Cross joins are implemented in a well optimized manner in Hive and Pig. Your use case is a good fit for Hive or pig , it'd be better going for those rather than spending much time on implementing the MR jobs yourself. Regards Bejoy KS

Re: Streaming issue ( URGENT )

2012-08-20 Thread Bejoy Ks
/joins-with-plain-map-reduce.html Regards Bejoy KS

Re: hadoop 1.0.3 config exception

2012-08-18 Thread Bejoy KS
Rahul, Please start a new thread for your queries. That would keep the mail archives neat and clean. Also share more details, like logs, so we can help you better. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: rahul p Date: Sat, 18 Aug 2012 19:37:04 To

Re: Number of Maps running more than expected

2012-08-17 Thread Bejoy Ks
Hi Gaurav While calculating, you got the number of map tasks per file as 8.12, i.e. 9 map tasks for each file. So for 100 files it is 900 map tasks, and now your numbers match. Doesn't that look right? Regards Bejoy KS
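The arithmetic, sketched (the file and block sizes are illustrative, chosen to reproduce roughly the 8.12 ratio from the thread):

```python
import math

# A 520 MB file on a 64 MB block size gives 8.125 blocks -> 9 map tasks,
# since a partial block still needs its own map task.
file_mb, block_mb, num_files = 520, 64, 100

maps_per_file = math.ceil(file_mb / block_mb)
total_maps = maps_per_file * num_files
print(maps_per_file, total_maps)  # 9 900
```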

Re: Number of Maps running more than expected

2012-08-17 Thread Bejoy Ks
even for a hdfs block. Regards Bejoy KS

Re: Number of Maps running more than expected

2012-08-17 Thread Bejoy Ks
for the number of splits won't hold. If small files are there then definitely the number of map tasks should be more. Also, did you change the split sizes as well along with the block size? Regards Bejoy KS

Re: medical diagnosis project

2012-08-16 Thread Bejoy Ks
Hi Mallik It looks like a good candidate for Mahout. Please post this in mahout user group as well, there is a possibility that similar projects are already implemented using mahout. Regards Bejoy KS

Re: how to enhance job start up speed?

2012-08-13 Thread Bejoy KS
considering data locality. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Matthias Kricke Sender: matthias.zeng...@gmail.com Date: Mon, 13 Aug 2012 17:53:55 To: Reply-To: user@hadoop.apache.org Subject: Re: how to enhance job start up speed? @Bejoy

Re: how to enhance job start up speed?

2012-08-13 Thread Bejoy KS
the map task node. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Matthias Kricke Sender: matthias.zeng...@gmail.com Date: Mon, 13 Aug 2012 16:33:06 To: Reply-To: user@hadoop.apache.org Subject: Re: how to enhance job start up speed? Ok, I try to

Re: Hbase JDBC API

2012-08-10 Thread Bejoy Ks
/HBaseIntegration Regards Bejoy KS

Re: fs.local.block.size vs file.blocksize

2012-08-09 Thread Bejoy Ks
To understand the working in more detail, I have scribbled something a while back that may help you start off http://kickstarthadoop.blogspot.in/2011/04/word-count-hadoop-map-reduce-example.html Regards Bejoy KS

Re: Please use the correct way to unsubscribe

2012-08-08 Thread Bejoy KS
have set some email rules based on the old email Ids need to update those to new one. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Tharindu Mathew Date: Wed, 8 Aug 2012 21:52:31 To: ; Sebastian Feher Reply-To: user@hadoop.apache.org Subject: Re

Re: need help asap !!!

2012-08-08 Thread Bejoy KS
Regards Bejoy KS

Re: text file to sequence file

2012-08-08 Thread Bejoy KS
If you have a large number of files and are using MapReduce to do the conversion to Sequence Files, set the output format of the MR job to SequenceFileOutputFormat. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Flavio Dias Date: Wed, 8 Aug 2012 09:43:26

Re:

2012-08-08 Thread Bejoy KS
Yes Ashwin it is. Welcome to the community. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: A Ashwin Date: Wed, 8 Aug 2012 10:04:46 To: Reply-To: user@hadoop.apache.org Subject: Hi, Is this the mail id to contact hadoop for any queries. Thanks