How to specify delimiters in MultipleInputPaths

2013-10-31 Thread Inder Pall
I want to use MultipleInputs with multiple mappers to process different files. Let's say I want to use KeyValueTextInputFormat in all mappers. The challenge is that the separator for this input format seems to be set at the job level. So if I have two files where one is COMMA separated and the other
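A minimal sketch of the per-file separator problem raised in this thread, written in plain Python rather than as Hadoop code: the key/value split is chosen per input source instead of once for the whole job. The paths, the `SEPARATORS` table, and both function names are hypothetical and exist only for illustration. The split rule assumed here is that the key is the text before the first separator and the value is the remainder, with the whole line used as the key when no separator is present.

```python
# Hypothetical mapping from input path to its separator (illustration only).
SEPARATORS = {
    "/data/comma_input": ",",
    "/data/tab_input": "\t",
}


def split_key_value(line, separator):
    """Split a line at the first occurrence of separator.

    If the separator is absent, the whole line becomes the key and the
    value is an empty string.
    """
    key, _found, value = line.partition(separator)
    return key, value


def parse_record(path, line):
    """Parse one line using the separator registered for its input path."""
    return split_key_value(line.rstrip("\n"), SEPARATORS[path])
```

In a real job, the same idea would mean each mapper reads its lines as plain text and applies its own separator, so no single job-level setting has to cover every input.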

Re: Large-scale collection of logs from multiple Hadoop nodes

2013-08-05 Thread Inder Pall
We have been using a Flume-like system for such use cases at significantly large scale, and it has been working quite well. I would like to hear thoughts/challenges around using ZeroMQ-like systems at good enough scale. inder you are the average of 5 people you spend the most time with On Aug 5,

Clarifications on excludedNodeList in DFSClient

2012-11-19 Thread Inder Pall
Folks, I was wondering if there is any mechanism/logic to move a node back from the excludedNodeList to the live nodes so that it is tried again for new block creation. In the current DFSClient code I do not see this. The use-case is if the write timeout is being reduced and certain nodes get aggressively added
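To make the question concrete, here is a sketch of one possible shape for such a mechanism: an exclude list whose entries expire after a fixed time, so that an excluded node becomes eligible again. This is purely an assumption for illustration, written in Python. It is not DFSClient code and does not describe how HDFS behaves. The class name, the `ttl_seconds` parameter, and the node label used in the usage note are all hypothetical.

```python
import time


class ExpiringExcludeList:
    """Hypothetical exclude list whose entries lapse after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the expiry can be tested
        self._excluded = {}          # node -> time at which it was excluded

    def exclude(self, node):
        """Record the node as excluded, starting (or restarting) its timer."""
        self._excluded[node] = self._clock()

    def is_excluded(self, node):
        """Return True while the node's exclusion has not yet expired."""
        added = self._excluded.get(node)
        if added is None:
            return False
        if self._clock() - added >= self._ttl:
            # The exclusion has lapsed: the node may be tried again.
            del self._excluded[node]
            return False
        return True
```

A writer would call `exclude("dn1")` after a timeout and check `is_excluded("dn1")` before picking targets for a new block. With a short write timeout, a short expiry would keep aggressively excluded nodes from staying out of rotation indefinitely.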

Re: Clarifications on excludedNodeList in DFSClient

2012-11-19 Thread Inder Pall
On Mon, Nov 19, 2012 at 3:25 PM, Inder Pall inder.p...@gmail.com wrote: Folks, I was wondering if there is any mechanism/logic to move a node back from the excludedNodeList to the live nodes so that it is tried again for new block creation. In the current DFSClient code I do not see this. The use-case

Differences between hflush and hsync()

2012-04-12 Thread Inder Pall
Folks, can someone share more technical details than what the Javadoc covers? Also, which one should be used when? -- Thanks, - Inder Tech Platforms @Inmobi Linkedin - http://goo.gl/eR4Ub

Re: Differences between hflush and hsync()

2012-04-12 Thread Inder Pall
/common/docs/r0.23.1/api/org/apache/hadoop/fs/FSDataOutputStream.html#hsync() Ticket https://issues.apache.org/jira/browse/HDFS-744 tracks completion of the hsync() feature. On Thu, Apr 12, 2012 at 3:59 PM, Inder Pall inder.p...@gmail.com wrote: Folks, Can some one shed out more technical

Re: Is append allowed in HDFS?

2012-04-09 Thread Inder Pall
Based on what I have tried, after a sync you need to open a new Reader. Please correct me if that's not the write semantics. Thanks, - Inder On Mon, Apr 9, 2012 at 4:23 PM, Harsh J ha...@cloudera.com wrote: I'd also like to note that there are some unresolved issues with the append version in

Re: Is append allowed in HDFS?

2012-04-09 Thread Inder Pall
and debug-printing). On Mon, Apr 9, 2012 at 6:05 PM, Inder Pall inder.p...@gmail.com wrote: Based on what I have tried, after a sync you need to open a new Reader. Please correct me if that's not the write semantics. Thanks, - Inder On Mon, Apr 9, 2012 at 4:23 PM, Harsh J ha...@cloudera.com

Re: Is append allowed in HDFS?

2012-04-09 Thread Inder Pall
it by a few, for your problem? On Mon, Apr 9, 2012 at 10:34 PM, Inder Pall inder.p...@gmail.com wrote: Yes, makes sense. My use-case is more like a producer/consumer, with the consumer trying to stream data as it arrives. Has anyone hit this before and, if so, resolved it in a better way? Apologies

Re: Running a job continuously

2011-12-11 Thread Inder Pall
Have you looked at Kafka? It provides a streaming view of the data stream. Flume is currently being rewritten as Flume NG. On Dec 6, 2011 4:28 PM, Praveen Sripati praveensrip...@gmail.com wrote: If the requirement is for real time data processing, using Flume will not suffice as there is a

SymLink related query

2011-12-04 Thread Inder Pall
People, I have a symlink, something like test_current, in HDFS. When I open this file it contains the name of the actual file being pointed to. I am looking for a Java API with which I can get the name of the target file the symlink points to. -- Thanks, - Inder Tech Platforms @Inmobi Linkedin -

Re: What's the equivalent of tail -f

2011-11-28 Thread Inder Pall
Thanks Harsh, that's what I was looking for. - inder On Mon, Nov 28, 2011 at 4:48 PM, Harsh J ha...@cloudera.com wrote: If you ask just from a shell POV, then yes, you may use hadoop fs -tail -f file. On Mon, Nov 28, 2011 at 4:17 PM, Inder Pall inder.p...@gmail.com wrote

hdfs behavior

2011-11-28 Thread Inder Pall
People, I am seeing the following: 1. Writing to a large file on HDFS. 2. tail -f on the same file shows data is streaming. 3. hadoop dfs -ls on the same file shows the size as 0. Has anyone experienced this? -- Inder

Re: What does hdfs balancer do after adding more disks to existing datanode.

2011-11-22 Thread Inder Pall
This is an interesting use case. Based on my understanding, data nodes send block information to the name node, so if you move the block files around, the old data node should stop sending and the new nodes would start sending. Each block is a separate file. It would be better to try this, but I don't think this is

error building libhdfs

2011-11-08 Thread Inder Pall
facing the following if /bin/sh ./libtool --mode=compile --tag=CC gcc -DPACKAGE_NAME=\libhdfs\ -DPACKAGE_TARNAME=\libhdfs\ -DPACKAGE_VERSION=\0.1.0\ -DPACKAGE_STRING=\libhdfs\ 0.1.0\ -DPACKAGE_BUGREPORT=\omal...@apache.org\ -DPACKAGE=\libhdfs\ -DVERSION=\0.1.0\ -DSTDC_HEADERS=1