Hello,

For the unpacking portion, are you saying you have a single archive (let's say in zip format) that contains multiple objects, and you'd like to use UnpackContent but tell it to skip or include specific items based on a regex (or something similar) matched against their names?
That seems reasonable to do, but I wanted to make sure I understood. For now, you can put a RouteOnAttribute processor after UnpackContent and route away (and discard) the unpacked items you don't care about. Create a property on that processor called 'stuff-i-dont-want' with a value something like ${filename:matches('.*stuff-i-dont-want.*')}. Note that matches takes a Java regular expression that must match the entire filename, hence the leading and trailing '.*'.

Thanks
Joe

On Sun, Oct 25, 2015 at 1:12 AM, Adam Lamar <adamond...@gmail.com> wrote:
> Mark,
>
>> If I configured the command arguments as
>> "-n +2" (without the quotes and space between the two parts), the
>> command would result in a "tail -n2" behavior.
>
> If you look at the tooltip for the Command Arguments property in
> ExecuteStreamCommand, you'll see that the arguments need to be delimited by
> a semicolon. Maybe try "-n;+2" instead? I'm not sure of the exact rules in
> NiFi, but I've seen similar behavior with regard to spaces in libraries that
> execute processes with command line arguments.
>
> There probably is a better way to process the CSV, but I'm afraid someone
> else will need to comment on that.
>
>> Seems like it will only unzip the
>> whole zip file and provide me index numbers for each file unpacked.
>
> A quick look at the UnpackContent source [1] suggests that there is no way
> to filter the filenames inside the zipfile prior to extraction. I agree that
> would be a useful feature. Maybe one of the NiFi devs will comment on the
> possibility of including it as a feature in the future.
>
> Cheers,
> Adam
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/UnpackContent.java#L304
>
>
>
> On 10/24/15 9:08 PM, Mark Petronic wrote:
>>
>> Just starting to use Nifi and built a flow that implements the following:
>>
>> unzip -p my.zip *LMTD* | tail -n +2 | gzip --fast | hdfs dfs -put -
>> /some/hdfs/file
>>
>> I used the following processor flow:
>>
>> ExecuteProcess(unzip -p) -> ExecuteStreamCommand(tail -n +2) ->
>> CompressContent(gzip) -> PutHDFS
>>
>> Couple questions/observations:
>>
>> 1. I got hung up for a while on the ExecuteStreamCommand(tail -n +2)
>> part. I need that to strip the header line off of CSV files. I did not
>> see a simple way using a specific processor to strip off the first
>> line of a flow file. Is there a better way? But, I did notice a very
>> odd behavior of this command. If I configured the command arguments as
>> "-n +2" (without the quotes and space between the two parts), the
>> command would result in a "tail -n2" behavior. So, instead of giving
>> me all EXCEPT the first line, I only got the last 2 lines. However,
>> using "-n+2" (without the quotes and REMOVING the space) it worked as
>> expected. I believe this is confusing to the user. Both forms work
>> perfectly from the bash command line but only one works in Nifi?
>> Anyone care to comment on this? Should there be an enhancement to
>> remove this sort of inconsistent behavior?
>>
>> 2. Regarding my need to unzip ONLY one specific file from the zip
>> files (the one that matches *LMTD*), I did not see a way to do that
>> using the UnpackContent processor. Seems like it will only unzip the
>> whole zip file and provide me index numbers for each file unpacked.
>> This would be quite inefficient in my case because there are a number
>> of large files inside the zip file and I only need one.
>> So, seems like I am doing this the preferred way but, being new to Nifi,
>> just wanted to see if there are any other ideas on how to do this?
>>
>> Thanks in advance for thoughts on this
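
For reference, the pipeline in the original question (pull one matching entry out of the zip, drop the CSV header line, gzip the rest) can be sketched outside NiFi in Python. This is only an illustration under assumptions: the function name extract_matching_without_header is hypothetical, it handles just the first entry whose name matches the glob, and the final "hdfs dfs -put" step is left out. It is not how UnpackContent or ExecuteStreamCommand work internally.

```python
import fnmatch
import gzip
import io
import zipfile


def extract_matching_without_header(zip_bytes, pattern="*LMTD*"):
    """Gzip the first zip entry whose name matches `pattern`, minus its first line.

    Hypothetical helper for illustration; roughly mirrors
    `unzip -p my.zip *LMTD* | tail -n +2 | gzip --fast`.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Select entries by name first, so non-matching entries are never decompressed.
        names = [n for n in archive.namelist() if fnmatch.fnmatchcase(n, pattern)]
        if not names:
            raise FileNotFoundError(f"no entry matches {pattern!r}")

        out = io.BytesIO()
        # compresslevel=1 is the fastest setting, comparable to `gzip --fast`.
        with archive.open(names[0]) as entry, \
                gzip.GzipFile(fileobj=out, mode="wb", compresslevel=1) as gz:
            entry.readline()  # discard the header line (the `tail -n +2` step)
            for line in entry:
                gz.write(line)
        return out.getvalue()
```

The point of the sketch is the ordering: the name filter runs against the archive's entry list before any decompression, which is the behavior the thread asks UnpackContent to support.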