How can I tell my job to include all the subdirectories and their
content of a certain path?
My directory structure is as follows: logs/{YEAR}/{MONTH}/{DAY}. I
tried setting my input path to 'logs/' using
FileInputFormat.addInputPath, but I keep receiving the following error:
I believe that opening a directory as a file will result in a file not found.
You probably need to set it to a glob that points to the actual files.
Something like
/user/root/logs/2011/*/*/* for all entries in 2011, or
/user/root/logs/2011/01/*/* if you want to restrict it to just
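As a rough illustration of the glob suggestion above, here is a minimal sketch that builds the two patterns for the logs/{YEAR}/{MONTH}/{DAY} layout and checks how deep they reach. The class LogGlobs and its methods are hypothetical names, not part of Hadoop, and the matching uses java.nio globbing on the local filesystem purely as a stand-in (assuming Unix-style paths); it does not call Hadoop, and the file name app.log is made up.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper (not a Hadoop class): builds input globs for the
// logs/{YEAR}/{MONTH}/{DAY} layout described in the question.
final class LogGlobs {
    private LogGlobs() {}

    // Pattern for every file under every day of every month of one year.
    static String forYear(String root, int year) {
        return String.format(Locale.ROOT, "%s/%04d/*/*/*", root, year);
    }

    // Pattern for every file under every day of a single month.
    static String forMonth(String root, int year, int month) {
        return String.format(Locale.ROOT, "%s/%04d/%02d/*/*", root, year, month);
    }

    // Local approximation only: in java.nio globs a single '*' does not
    // cross a '/' boundary, so this shows how many levels a pattern covers.
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }
}
```

In a job driver, the resulting string would presumably be passed along the lines of FileInputFormat.addInputPath(job, new Path(LogGlobs.forYear("/user/root/logs", 2011))), where job is your own job object. Note that the pattern reaches the files themselves, one level below the day directories, which is the point of the suggestion.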
I have a process which loads data into Hive hourly. Loading data
hourly isn't a problem; however, when I load historical data, say 24-48
hours' worth, I receive the error message below. While googling I came across some
suggestions that JVM memory needs to be increased. Are there any other
options?
OK, so the behavior is a little different when using
FileInputFormat.addInputPath
as opposed to using Pig. I'll try the glob.
Thanks
On 4/6/11 8:41 AM, Robert Evans wrote:
I believe that opening a directory as a file will result in a file not found.
You probably need to set it to a glob,
Hello Mark and Robert,
On Wed, Apr 6, 2011 at 9:55 PM, Mark <static.void@gmail.com> wrote:
On 4/6/11 9:53 AM, Mark <static.void@gmail.com> wrote:
How can I tell my job to include all the subdirectories and their
content of a certain path?
Also worth noting is that in future releases of