Hi Peter,

AFAIK Oozie has a mechanism to achieve this: a coordinator can trigger
your jobs as soon as the files are written to a certain HDFS directory.
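For what it's worth, a minimal coordinator sketch might look like the
following. All names, paths and the hourly frequency are made up for
illustration, and it assumes the writing process (or a wrapper around it)
drops a _SUCCESS done-flag in each directory once it is complete:

    <coordinator-app name="pick-up-logs" frequency="${coord:hours(1)}"
                     start="2012-09-26T00:00Z" end="2013-09-26T00:00Z"
                     timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
      <datasets>
        <!-- One directory of log files per hour; Oozie waits for the
             done-flag before it considers the instance available. -->
        <dataset name="logs" frequency="${coord:hours(1)}"
                 initial-instance="2012-09-26T00:00Z" timezone="UTC">
          <uri-template>hdfs:///incoming/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
          <done-flag>_SUCCESS</done-flag>
        </dataset>
      </datasets>
      <input-events>
        <data-in name="input" dataset="logs">
          <instance>${coord:current(0)}</instance>
        </data-in>
      </input-events>
      <action>
        <workflow>
          <app-path>hdfs:///apps/pick-up-logs/workflow.xml</app-path>
        </workflow>
      </action>
    </coordinator-app>

If the producers can't be changed to write a done-flag, you would need some
other signal that a directory is finished before this helps.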

On Tue, Sep 25, 2012 at 10:23 PM, Peter Sheridan <
psheri...@millennialmedia.com> wrote:

>  These are log files being deposited by other processes, which we may not
> have control over.
>
>  We don't want multiple processes to write to the same files — we just
> don't want to start our jobs until they have been completely written.
>
>  Sorry for lack of clarity & thanks for the response.
>
>
>  --Pete
>
>   From: Bertrand Dechoux <decho...@gmail.com>
> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Date: Tuesday, September 25, 2012 12:33 PM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Re: Detect when file is not being written by another process
>
>  Hi,
>
> Multiple files and aggregation, or something like HBase?
>
> Could you tell us more about your context? What are the volumes? Why do
> you want multiple processes to write to the same file?
>
> Regards
>
> Bertrand
>
> On Tue, Sep 25, 2012 at 6:28 PM, Peter Sheridan <
> psheri...@millennialmedia.com> wrote:
>
>>  Hi all.
>>
>>  We're using Hadoop 1.0.3.  We need to pick up a set of large (4+GB)
>> files when they've finished being written to HDFS by a different process.
>>  There doesn't appear to be an API specifically for this.  We
>> discovered through experimentation that the FileSystem.append() method can
>> be used for this purpose — it will fail if another process is writing to
>> the file.
>>
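For reference, a rough sketch of what the probe described above might look
like. The class and method names here are invented, and on 1.0.3 append()
also has to be enabled via dfs.support.append; as the next paragraph
explains, the approach turned out to corrupt files on a multi-node cluster,
so this only illustrates the experiment, not a recommendation:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendProbe {

        // HDFS rejects append() on a file another client still has open
        // for write, so a successful append-then-close suggests the
        // writer is done.
        public static boolean looksClosed(FileSystem fs, Path path) {
            try {
                fs.append(path).close();   // release the lease right away
                return true;
            } catch (IOException e) {
                return false;              // most likely still being written
            }
        }

        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            System.out.println(looksClosed(fs, new Path(args[0])));
        }
    }
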
>>  However: when running this on a multi-node cluster, using that API
>> actually corrupts the file.  Perhaps this is a known issue?  Looking at the
>> bug tracker I see https://issues.apache.org/jira/browse/HDFS-265 and a
>> bunch of similar-sounding things.
>>
>>  What's the right way to solve this problem?  Thanks.
>>
>>
>>  --Pete
>>
>>
>
>
> --
> Bertrand Dechoux
>
