Re: HDF5 and Hadoop

2010-05-03 Thread Andrew Nguyen
Chris, Thanks for the heads up! --Andrew On May 3, 2010, at 10:45 AM, Mattmann, Chris A (388J) wrote: > Hi Andrew, > > There has been some work in the Tika [1] project recently on looking at > NetCDF4 [2] and HDF4/5 [3] and extracting metadata/text content from them. > Though this doesn't di

Questions on dfs.block.size

2010-05-03 Thread stan lee
Hi Experts: Is there any way to make a changed dfs.block.size take effect on files that were written before the change? Or would that even be meaningful? If I run a job A, would it copy the input files into HDFS if those files are already in HDFS? If so, then perhaps make dfs.block.size has is meaningful
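
On the first question: HDFS records the block size per file when the file is written, so changing dfs.block.size affects only files created afterwards, and an existing file keeps its blocks until it is rewritten. A minimal Python sketch of the arithmetic follows; the 1000 MB file size and the 64 MB and 128 MB block sizes are illustrative assumptions, not values from the thread.

```python
MB = 1024 * 1024

def block_count(file_size, block_size):
    """Number of blocks needed to hold file_size bytes (integer ceiling division)."""
    return (file_size + block_size - 1) // block_size

# Hypothetical file and block sizes, chosen only for illustration.
file_size = 1000 * MB
old_blocks = block_count(file_size, 64 * MB)   # layout of the file as originally written
new_blocks = block_count(file_size, 128 * MB)  # layout if the file were rewritten afterwards

print(old_blocks, new_blocks)
```

Only a rewrite of the file (for example, copying it after the setting has changed) would produce the second layout; the stored file is not re-blocked in place.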

Re: java.io.FileNotFoundException

2010-05-03 Thread Carlos Eduardo Moreira dos Santos
I tried E:\tmp and also /cygdrive/e/tmp, but the error message stays the same, except for the job IDs. I think the file conf/mapred-site.xml is being ignored; is that possible (I restarted HDFS after the conf changes)? This is the file: mapred.job.tracker = hadoop-cemsbr:9001, mapred.child.tmp = E:\tmp

Re: problem w/ data load

2010-05-03 Thread Amr Awadallah
Yep, Hive will work fine if you point it to the .gz file. Just note that if this is one large .gz file, it will use only one mapper and one reducer; it will not be parallelized. -- amr On 5/3/2010 11:29 AM, Edward Capriolo wrote: On Mon, May 3, 2010 at 2:00 PM, Susanne Lehmann< sus
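
The single-mapper behavior comes from gzip not being a splittable format: a reader cannot start decoding at an arbitrary byte offset, so one task has to read the whole file from the beginning. A small Python sketch of that property, using only the standard gzip module; the sample rows and the helper name decodes_from are made up for illustration and have nothing to do with Hive's internals.

```python
import gzip
import zlib

def decodes_from(compressed, offset):
    """Return True if gzip decoding succeeds when started at byte `offset`."""
    try:
        gzip.decompress(compressed[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # No gzip header at this position, or the stream cannot be decoded from here.
        return False

# Hypothetical sample data standing in for a large table file.
rows = b"".join(b"row %d\n" % i for i in range(10000))
compressed = gzip.compress(rows)

from_start = decodes_from(compressed, 0)
from_middle = decodes_from(compressed, len(compressed) // 2)

print(from_start, from_middle)
```

Splitting the data into several smaller .gz files before loading is one way to let more than one mapper work on it.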

Re: problem w/ data load

2010-05-03 Thread Edward Capriolo
On Mon, May 3, 2010 at 2:00 PM, Susanne Lehmann < susanne.lehm...@metamarketsgroup.com> wrote: > Hi Tom, > > Yes. I store the file in HDFS with a .gz extension. Do I need to > somehow "tell" Hive that it is a compressed file? > > Best, > Susanne > > PS: Thanks for the tip with the list, I will use

Re: problem w/ data load

2010-05-03 Thread Susanne Lehmann
Hi Tom, Yes. I store the file in HDFS with a .gz extension. Do I need to somehow "tell" Hive that it is a compressed file? Best, Susanne PS: Thanks for the tip with the list, I will use the other list for further questions if necessary. I wasn't sure which one to use. On Mon, May 3, 2010 at 9:5

Re: HDF5 and Hadoop

2010-05-03 Thread Mattmann, Chris A (388J)
Hi Andrew, There has been some work in the Tika [1] project recently on looking at NetCDF4 [2] and HDF4/5 [3] and extracting metadata/text content from them. Though this doesn't directly apply to your question below, it might be worth perhaps looking at how to marry Tika and Hadoop in that rega

HDF5 and Hadoop

2010-05-03 Thread Andrew Nguyen
Does anyone know of any existing work integrating HDF5 (http://www.hdfgroup.org/HDF5/whatishdf5.html) with Hadoop? I don't know much about HDF5 but it was recently brought to my attention as a way to store high-density scientific data. Since I've confirmed that having Hadoop dramatically speed

Re: Custom file formats

2010-05-03 Thread William Kinney
Not sure if anything else exists, but you can easily implement your own RecordReader that gets an FSDataInputStream from the FileSystem for the FileSplit and then reads records from it as you would from any other InputStream (with offset, length, byte[], etc.). On Thu, Apr 29, 2010 at 5:36 AM, Pete
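
The reading pattern described here (seek to the split's offset, then read records until the split's length is used up) can be sketched independently of Hadoop. The Python below is only an illustration of that loop, not the Hadoop RecordReader API: the function name read_records and the fixed-length record format are assumptions, and a real custom format would define its own record boundaries.

```python
import io

def read_records(stream, offset, length, record_size):
    """Yield fixed-size records that lie entirely inside [offset, offset + length)."""
    stream.seek(offset)
    end = offset + length
    pos = offset
    while pos + record_size <= end:
        record = stream.read(record_size)
        if len(record) < record_size:
            break  # the stream ended before a full record could be read
        yield record
        pos += record_size

# Hypothetical file of four 4-byte records; the "split" covers the middle two.
data = io.BytesIO(b"AAAABBBBCCCCDDDD")
records = list(read_records(data, offset=4, length=8, record_size=4))
print(records)
```

In a Hadoop RecordReader the same loop would run over the stream opened for the FileSplit, with the offset and length taken from the split.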

Re: problem w/ data load

2010-05-03 Thread Tom White
Hi Susanne, Hadoop uses the file extension to detect that a file is compressed. I believe Hive does too. Did you store the compressed file in HDFS with a .gz extension? Cheers, Tom BTW It's best to send Hive questions like these to the hive-user@ list. On Sun, May 2, 2010 at 11:22 AM, Susanne L

Re: Assertions

2010-05-03 Thread Mithila Nagendra
Gianmarco, You might want to increase the heap size. It's a property that can be set: try setting "mapred.child.java.opts" to -Xmx1024M. Mithila On Mon, May 3, 2010 at 8:04 AM, Gianmarco wrote: > Hi all, > is there a way to enable Java assertions inside a map/reduce function? > I tried settin

Assertions

2010-05-03 Thread Gianmarco
Hi all, is there a way to enable Java assertions inside a map/reduce function? I tried setting the -enableassertions switch in hadoop-env.sh using the HADOOP_TASKTRACKER_OPTS variable, but it didn't work. I also tried setting a property in mapred-site.xml: mapred.child.java.opts -enableassertio

Re: java.io.FileNotFoundException

2010-05-03 Thread Aleksandar Stupar
Hi, I had the same problem. This worked for me: mapred.child.tmp = D:\tmp Kind regards, Aleksandar Stupar. From: Carlos Eduardo Moreira dos Santos To: common-user Sent: Sun, May 2, 2010 9:10:03 PM Subject: Re: java.io.FileNotFoundException Yes, I can crea