Re: Is there a NiFi Design Pattern for file (SFTP) Synchronization?

2016-06-27 Thread Huagen peng
Michael, Let me work through an example to understand your requirement. 1. Let's start with no files on the SFTP server, then file A.txt lands, with a timestamp 1467042217000. 2. When ListSFTP runs at timestamp 1467042218000, it finds A.txt, and saves the timestamp 1467042218000. The ensuing FetchSFTP

Escape * and new line character

2016-06-21 Thread Huagen peng
Hi, I need help escaping characters in a couple of situations: 1. I use the ExecuteStreamCommand processor to output the content of all the *.txt files in a directory. I would like to use the cat command, but I was unable to escape the *.txt in the argument. For now I end up calling a

Re: How to use ListenHTTP processor?

2016-06-10 Thread Huagen peng
> path > ./ > restlistener.remote.source.host > 127.0.0.1 > restlistener.remote.user.dn > none > User-Agent > curl/7.43.0 > uuid > 5f8d1704-e328-4456-a2eb-4730454ae64c > > > > Are you not seeing similar results? > > Thanks > -Mark > > > >

How to use ListenHTTP processor?

2016-06-10 Thread Huagen peng
Hi, The ListenHTTP processor has a configuration “HTTP Headers to receive as Attributes (Regex)”. I tried many ways in vain to get some attributes in. Does anyone know how to get attributes directly in? I cannot find any example on it. I can post data in JSON format and use the

Re: How to effectively log the data flow in NiFi?

2016-06-10 Thread Huagen peng
> I hope this helps! > > -Mark > > [1] > http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance > > > On Jun 10, 2016, at 11:58 AM, Huagen peng <huagen.p...@gmail.com> wrote: > > Hi, > > I would like to learn about some better practice

How to effectively log the data flow in NiFi?

2016-06-10 Thread Huagen peng
Hi, I would like to learn about some better practices for logging. Here is what I would imagine in an ideal log for a flow like fetching files from SFTP, processing the files in a certain way, and then saving the files to disk. In the log, I would see that the SFTP step is triggered, with

Re: Controls order of execution in a queue?

2016-06-05 Thread Huagen peng
n. There is a list of prioritizers to choose from. Simply drag > and drop from the list to apply the priority strategy you want for that > connection. You will find FIFO is one of the available options. > > On Jun 5, 2016 9:23 PM, "Huagen peng" <huagen.p...@gmail.com >

Controls order of execution in a queue?

2016-06-05 Thread Huagen peng
Hi, I notice that the order of execution in an incoming queue is not FIFO (first in, first out). For example, I have an ExecuteStreamCommand processor, which at one point may have more than 20 flowfiles waiting. It appears that the processor just randomly selects a flowfile from the queue after

Excessive logging

2016-06-04 Thread Huagen peng
Hi, I am getting excessive logging from my NiFi instance. I suddenly see logging like the following going into nifi-bootstrap.log very fast, around 10 GB/hour, filling up my disk quickly. What is the cause of this? How can I stop it from going into the log? I tried to look into the logback.xml file and even

Re: Piping commands in ExecuteStreamCommand/ExecuteProcess

2016-06-03 Thread Huagen peng
pefully someone else will chime in. I’m still > new around here :) > > Thanks, > Bryan Rosander > > From: Huagen peng <huagen.p...@gmail.com> > Reply-To: "users@nifi.apache.org" <users@nifi.apache.org> > Date: Friday, June 3, 2016 at 12:38 P

Piping commands in ExecuteStreamCommand/ExecuteProcess

2016-06-03 Thread Huagen peng
Hi, I was trying to get the ExecuteStreamCommand processor to execute one command and then pipe the result right to another Linux command in the same processor. This is how I try to configure the processor: Property Value Command Arguments

How does GetSFTP treat files being loaded?

2016-06-03 Thread Huagen peng
Hi, I need to get files from an SFTP server and then remove the files afterward. GetSFTP seems to be the processor to use. If a user uploads a large file, say 20 GB, to the server and the GetSFTP processor happens to run in the middle of the upload, what is the expected behavior?

Merge multiple flowfiles

2016-06-01 Thread Huagen peng
Hi, In the data flow I am dealing with now, there are multiple (up to 200) logs associated with a given hour. I need to process these fragmented hourly logs and then concatenate them into a single file. The approach I am using now has an UpdateAttribute processor to set an arbitrary

OutOfMemoryError from ListSFTP

2016-06-01 Thread Huagen peng
Hi, I tried to use the ListSFTP processor on a server with tens of thousands of files. The processor ran for a long time and then emitted an OutOfMemoryError. Can I fix this error by modifying the JVM settings in the conf/bootstrap.conf file? Thanks, Huagen

Re: Date operations

2016-05-31 Thread Huagen peng
toDate("/MM/dd > HH:mm:ss"):toNumber():lt(${now():toNumber():minus(8640))} > > To be changed with the correct format. > > Hope this helps. > > > 2016-05-31 22:09 GMT+02:00 Huagen peng <huagen.p...@gmail.com > <mailto:huagen.p...@gmail.com>

Re: Wildcard character in the Command Argument field of the ExecuteStreamCommand processor

2016-05-31 Thread Huagen peng
> alopresto.apa...@gmail.com <mailto:alopresto.apa...@gmail.com> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > >> On May 31, 2016, at 12:08 PM, Huagen peng <huagen.p...@gmail.com >> <mailto:huagen.p...@gmail.com>> wrote: >> >

Date operations

2016-05-31 Thread Huagen peng
Hi, Besides toDate(), now(), and format(), are there any other date operations/manipulations? I want to check if a string representing a datetime is 24 hours before now and also need to advance the datetime, e.g., by an hour. Is ExecuteScript my only option? Thanks, Huagen
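
The two checks described in this message (is a datetime string more than 24 hours before now, and what is the datetime one hour later) can be sketched in plain Python, for example as the core of a script. This is only an illustration: the format string and both function names are assumptions made for the example, not anything taken from the thread or from NiFi.

```python
from datetime import datetime, timedelta

# Assumed input format for the example; the thread does not state the real one.
FORMAT = "%Y/%m/%d %H:%M:%S"


def is_older_than_24_hours(text, now):
    """Return True if the datetime in `text` is more than 24 hours before `now`."""
    return now - datetime.strptime(text, FORMAT) > timedelta(hours=24)


def advance_by_one_hour(text):
    """Return the datetime in `text` moved forward by one hour, in the same format."""
    moved = datetime.strptime(text, FORMAT) + timedelta(hours=1)
    return moved.strftime(FORMAT)
```

Passing `now` in as an argument, rather than calling `datetime.now()` inside the function, keeps the comparison easy to test with fixed values.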

Re: Wildcard character in the Command Argument field of the ExecuteStreamCommand processor

2016-05-31 Thread Huagen peng
se “${absolute.path}/${filename}” as the command arguments, > in which case you would not need to set the working directory > > Andy LoPresto > alopre...@apache.org <mailto:alopre...@apache.org> > alopresto.apa...@gmail.com <mailto:alopresto.apa...@gmail.com> > PGP Fingerprint:

Wildcard character in the Command Argument field of the ExecuteStreamCommand processor

2016-05-31 Thread Huagen peng
Hi, I would like to run an md5sum command on all the *.gz files under a certain directory. However, I keep getting this error: md5sum: stat '/tmp/transfer/16-05-22_00/*.gz': No such file or directory I tried quoting the * wildcard character, adding a . dot or / in front, to no avail. Can I do
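
The error above is consistent with the pattern reaching md5sum as a literal string, since wildcard expansion is normally done by a shell before the command runs. As a hedged illustration of doing that expansion explicitly, the Python sketch below expands the pattern itself and hashes each match. The function name and its parameters are hypothetical, and it does not describe how ExecuteStreamCommand works internally.

```python
import glob
import hashlib
import os


def md5_of_matching_files(directory, pattern="*.gz"):
    """Expand `pattern` inside `directory` and return {path: md5 hex digest}."""
    results = {}
    # glob.glob performs the wildcard expansion that a shell would normally do.
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            # Read in chunks so large files are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        results[path] = digest.hexdigest()
    return results
```

If no file matches, the function returns an empty dictionary instead of failing the way md5sum does when it is handed the unexpanded pattern.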