Setting input paths

2011-04-06 Thread Mark
How can I tell my job to include all of the subdirectories of a certain path and their contents? My directory structure is as follows: logs/{YEAR}/{MONTH}/{DAY}. I tried setting my input path to 'logs/' using FileInputFormat.addInputPath, but I keep receiving the following error:

Re: Setting input paths

2011-04-06 Thread Robert Evans
I believe that opening a directory as a file will result in a file-not-found error. You probably need to set it to a glob that points to the actual files. Something like /user/root/logs/2011/*/*/* for all entries in 2011, or /user/root/logs/2011/01/*/* if you want to restrict it to just
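
A minimal sketch of the glob approach Robert describes, using the 2011-era org.apache.hadoop.mapreduce API; the class name and job name are hypothetical, and the paths are the example globs from his message:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class GlobInputSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "glob-input-sketch");
            // All of 2011: the year/month/day layout expands to the leaf files.
            FileInputFormat.addInputPath(job, new Path("/user/root/logs/2011/*/*/*"));
            // Or restrict it, e.g. to January 2011 only:
            // FileInputFormat.addInputPath(job, new Path("/user/root/logs/2011/01/*/*"));
            // ... set mapper, reducer and output path as usual, then
            // job.waitForCompletion(true);
        }
    }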

Re: Setting input paths

2011-04-06 Thread hadoopman
I have a process which is loading data into Hive hourly. Loading data hourly isn't a problem; however, when I load historical data, say 24-48 hours' worth, I receive the error message below. In googling, I've come across some suggestions that JVM memory needs to be increased. Are there any other options
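
(The error message itself is cut off in the archive.) If the memory suggestions refer to the task JVMs spawned by the load job, a hedged sketch of raising their heap through the pre-YARN property follows; the class name and the -Xmx value are illustrative, not taken from the thread:

    import org.apache.hadoop.conf.Configuration;

    public class TaskHeapSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Pre-YARN property controlling the child task JVM options;
            // the heap size here is an illustrative value, not a recommendation.
            conf.set("mapred.child.java.opts", "-Xmx1024m");
            System.out.println(conf.get("mapred.child.java.opts"));
        }
    }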

Re: Setting input paths

2011-04-06 Thread Mark
Ok, so the behavior is a little different when using FileInputFormat.addInputPath as opposed to using Pig. I'll try the glob. Thanks. On 4/6/11 8:41 AM, Robert Evans wrote: I believe that opening a directory as a file will result in a file-not-found error. You probably need to set it to a glob,

Re: Setting input paths

2011-04-06 Thread Harsh Chouraria
Hello Mark and Robert, On Wed, Apr 6, 2011 at 9:55 PM, Mark static.void@gmail.com wrote: On 4/6/11 9:53 AM, Mark static.void@gmail.com wrote: How can I tell my job to include all the subdirectories and their content of a certain path? Also worth noting is that in future releases of
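
The last sentence is truncated in the archive, so what Harsh goes on to note is not visible; it appears to point at upcoming FileInputFormat behavior. For reference only, later Hadoop releases did add an opt-in flag that makes FileInputFormat descend into subdirectories itself; the sketch below assumes one of those later releases and is not necessarily what the truncated note said:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class RecursiveInputSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Later releases honour this flag and walk subdirectories,
            // so a bare 'logs/' input path picks up logs/{YEAR}/{MONTH}/{DAY}.
            conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true);
            Job job = new Job(conf, "recursive-input-sketch");
            FileInputFormat.addInputPath(job, new Path("logs/"));
            // ... set mapper, reducer and output path as usual.
        }
    }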