I am just starting to use NiFi and have built a flow that implements the following:

unzip -p my.zip *LMTD* | tail -n +2 | gzip --fast | hdfs dfs -put - /some/hdfs/file

I used the following processor flow:

ExecuteProcess(unzip -p) -> ExecuteStreamCommand(tail -n +2) ->
CompressContent(gzip) -> PutHDFS

A couple of questions/observations:

1. I got hung up for a while on the ExecuteStreamCommand(tail -n +2)
part. I need it to strip the header line off of CSV files, and I did
not see a simple way to strip the first line of a flow file with a
dedicated processor. Is there a better way? I also noticed a very odd
behavior with this command. If I configured the command arguments as
"-n +2" (without the quotes, and with a space between the two parts),
the command behaved like "tail -n2": instead of giving me all lines
EXCEPT the first, it gave me only the last 2 lines. However, using
"-n+2" (without the quotes and REMOVING the space) worked as expected.
I believe this is confusing to the user. Both forms work perfectly
from the bash command line, but only one works in NiFi. Would anyone
care to comment on this? Should there be an enhancement to remove this
sort of inconsistent behavior?

2. Regarding my need to unzip ONLY one specific file from the zip
files (the one that matches *LMTD*), I did not see a way to do that
using the UnpackContent processor. It seems to unzip only the whole
zip file and to provide index numbers for each file unpacked. That
would be quite inefficient in my case, because there are a number of
large files inside the zip file and I only need one. So it seems like
I am doing this the preferred way but, being new to NiFi, I wanted to
see if there are any other ideas on how to do this.
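
To make the tail question in item 1 concrete, here is a small sketch
that runs outside NiFi. The sample CSV and the variable names are made
up for illustration. The third call rests on an assumption of mine,
namely that NiFi hands "-n +2" to tail as ONE argument rather than two;
I have not confirmed that this is what ExecuteStreamCommand does.

```shell
# Hypothetical sample CSV: one header line followed by three data rows.
csv='id,name
1,alpha
2,beta
3,gamma'

# Two separate arguments ("-n" and "+2"), which is how bash splits them:
# prints everything except the header line.
two_args=$(printf '%s\n' "$csv" | tail -n +2)

# No space, so a single argument "-n+2": same result as above.
no_space=$(printf '%s\n' "$csv" | tail -n+2)

# ASSUMPTION: a single argument that contains the space ("-n +2"), which
# is my guess at what tail receives from NiFi. The result depends on the
# tail implementation, so errors are captured instead of aborting.
one_arg=$(printf '%s\n' "$csv" | tail "-n +2" 2>&1 || true)

printf 'two arguments:\n%s\n\n' "$two_args"
printf 'no space:\n%s\n\n' "$no_space"
printf 'one argument with a space:\n%s\n' "$one_arg"
```

If the third call prints only the last two rows on your system, that
would match what I saw in NiFi and would point at argument splitting
rather than at tail itself.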

Thanks in advance for any thoughts on this.
