Re: Avro

2012-08-04 Thread Mohit Anchlia
On Sat, Aug 4, 2012 at 11:43 PM, Nitin Kesarwani wrote: > Mohit, > > You can use this patch to suit your need: > https://issues.apache.org/jira/browse/PIG-2579 > > New fields in Avro schema descriptor file need to have a non-null default > value. Hence, using the new schema file, you should be abl

Re: Avro

2012-08-04 Thread Nitin Kesarwani
Mohit, You can use this patch to suit your need: https://issues.apache.org/jira/browse/PIG-2579 New fields in the Avro schema descriptor file need to have a non-null default value. Hence, using the new schema file, you should be able to read older data as well. Try it out. It is very straightforward
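
A minimal sketch of what such an evolved schema descriptor might look like. The record name, namespace, and all field names are hypothetical and do not come from the thread; the only point illustrated is that the newly added field (`country` here) carries a non-null default:

```json
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "doc": "Hypothetical record for illustration only; names are assumptions.",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {
      "name": "country",
      "type": "string",
      "default": "unknown",
      "doc": "Field added in the new schema version; absent from older data."
    }
  ]
}
```

Under Avro's schema resolution rules, reading older files with this descriptor as the reader schema should fill `country` with `"unknown"` for records that were written before the field existed.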

[ANNOUNCE] - New user@ mailing list for hadoop users in lieu of (common,hdfs,mapreduce)-user@

2012-08-04 Thread Arun C Murthy
All, Given our recent discussion (http://s.apache.org/hv), the new u...@hadoop.apache.org mailing list has been created and all existing users in (common,hdfs,mapreduce)-user@ have been migrated over. I'm in the process of changing the website to reflect this (HADOOP-8652). Henceforth, ple

Re: Re: Re: MapReduce shuffle question

2012-08-04 Thread Satheesh Kumar
Thanks, again, Liyin. On Sat, Aug 4, 2012 at 6:59 AM, 梁李印 wrote: > The optimization you mentioned is reduce-task locality-aware. > Unfortunately, > the current scheduler doesn't consider the reduce task's data locality. So > a > reduce task can be scheduled to any node with free slots. > The fol

fs cache giving me headaches

2012-08-04 Thread Koert Kuipers
nothing has confused me as much in hadoop as FileSystem.close(). any decent java programmer that sees that an object implements Closeable writes code like this:

    final FileSystem fs = FileSystem.get(conf);
    try {
        // do something with fs
    } finally {
        fs.close();
    }

so i started out using hadoop

Maven dependency problems

2012-08-04 Thread Matthias Friedrich
Hi, I'm currently trying to fix Maven dependencies for Crunch and ran into trouble with the POM for hadoop-core 1.0.3. It looks like the Maven dependencies are different from the actual dependencies at runtime. As a result, bugs caused by dependency conflicts won't show up until runtime, making

Re: Re: MapReduce shuffle question

2012-08-04 Thread 梁李印
The optimization you mentioned is reduce-task locality-aware. Unfortunately, the current scheduler doesn't consider the reduce task's data locality. So a reduce task can be scheduled to any node with free slots. The following jira is discussing this problem: https://issues.apache.org/jira/browse/MA

Re: migrate cluster to different datacenter

2012-08-04 Thread Nitin Kesarwani
Given the size of the data, there can be several approaches here:
1. Moving the boxes. Not possible, as I suppose the data must be needed for 24x7 analytics.
2. Mirroring the data. This is a good solution. However, if you have data being written/removed continuously (if part of a live system), there