Thank you, Matt, for this response. Yeah, it worked 😊

On 20-Jun-2017 7:55 AM, "Matt Burgess" <mattyb...@apache.org> wrote:
Prabhu,

I'm no Python/Jython master by any means, so I'm sure there's a better way to do this than what I came up with. Along the way I noticed some things about the input data and Jython vs Python:

1) Your "for line in text[1:]:" is skipping the first line. I assume there is a header in the "real" data?

2) The second row of data refers to a leap day (Feb 29), which did not exist in 2015, so it throws an exception. I changed all the months to 03 and kept going.

3) Your third row doesn't have any fractional seconds. Is this on purpose? I assumed so and tried to provide for that.

4) Jython (and Python 2) don't support the %z directive in datetime formats, and %Z refers to a String like a City or Country in that timezone or the friendly name of the timezone, not the +-HHMM value. Also, your data includes only the hour offset, not the minutes.

I came up with a fairly fragile script that seems to work given your input:

import datetime
import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    logger = None

    def __init__(self, log):
        # Keep the logger on the instance so process() can use it
        self.logger = log

    def process(self, inputStream, outputStream):
        text = IOUtils.readLines(inputStream, StandardCharsets.UTF_8)
        # Skip the first line (assumed to be a header)
        for line in text[1:]:
            cols = line.split(",")
            df = "%d-%m-%Y %H:%M:%S.%f"
            trunc_3 = True
            try:
                # [:-3] drops the trailing hour offset, e.g. "-02"
                d2 = datetime.datetime.strptime(cols[3][:-3],df)
            except ValueError:
                # No fractional seconds in this row
                df = "%d-%m-%Y %H:%M:%S"
                trunc_3 = False
                d2 = datetime.datetime.strptime(cols[3][:-3],df)
            if trunc_3:
                # %f prints microseconds; cut the last 3 digits to get milliseconds
                cols[3] = d2.strftime(df)[:-3]
            else:
                cols[3] = d2.strftime(df)
            outputStream.write(",".join(cols) + "\n")

flowFile = session.get()
if (flowFile != None):
    flowFile = session.write(flowFile,PyStreamCallback(log))
    flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename'))
    session.transfer(flowFile, REL_SUCCESS)

Please let me know if I've misunderstood anything, and I will try to fix/improve
the script.

Regards,
Matt

On Mon, Jun 19, 2017 at 8:31 AM, prabhu Mahendran <prabhuu161...@gmail.com> wrote:
> I have a CSV file which contains lakhs of rows; below are some sample lines:
>
> 1,Ni,23,28-02-2015 12:22:33.2212-02
> 2,Fi,21,29-02-2015 12:22:34.3212-02
> 3,Us,33,30-03-2015 12:23:35-01
> 4,Uk,34,31-03-2015 12:24:36.332211-02
>
> The last column of the CSV data is in the wrong datetime format, so I need
> to convert it to the default datetime format ("YYYY-MM-DD hh:mm:ss[.nnn]").
>
> I have tried the following script to read the lines and write them into a
> flow file.
>
> import json
> import java.io
> from org.apache.commons.io import IOUtils
> from java.nio.charset import StandardCharsets
> from org.apache.nifi.processor.io import StreamCallback
>
> class PyStreamCallback(StreamCallback):
>     def __init__(self):
>         pass
>     def process(self, inputStream, outputStream):
>         text = IOUtils.readLines(inputStream, StandardCharsets.UTF_8)
>         for line in text[1:]:
>             outputStream.write(line + "\n")
>
> flowFile = session.get()
> if (flowFile != None):
>     flowFile = session.write(flowFile,PyStreamCallback())
>     flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename'))
>     session.transfer(flowFile, REL_SUCCESS)
>
> But I am not able to find a way to convert it to the output below.
>
> 1,Ni,23,28-02-2015 12:22:33.221
> 2,Fi,21,29-02-2015 12:22:34.321
> 3,Us,33,30-03-2015 12:23:35
> 4,Uk,34,31-03-2015 12:24:36.332
>
> I have checked this requirement with my friend (Google) and am still not
> able to find a solution.
>
> Can anyone guide me on converting this input data into my required output?
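
The per-line conversion discussed in this thread can also be tried outside NiFi. What follows is only a minimal sketch in plain Python, not the script from the thread: the function names normalize_timestamp and normalize_line are made up for illustration, and it assumes the trailing offset is always a sign plus two hour digits, as in the sample rows. An impossible date such as 29-02-2015 still raises ValueError, as noted in point 2 above.

```python
import datetime

# Hypothetical helper names; they are not part of NiFi or of the script in this thread.
FMT_FRACTION = "%d-%m-%Y %H:%M:%S.%f"
FMT_SECONDS = "%d-%m-%Y %H:%M:%S"


def normalize_timestamp(value):
    """Drop the trailing hour offset and truncate fractional seconds to milliseconds."""
    # Assumption: the offset is always 3 characters (sign + 2 hour digits), e.g. "-02".
    stamp = value[:-3]
    try:
        parsed = datetime.datetime.strptime(stamp, FMT_FRACTION)
    except ValueError:
        # No fractional seconds in this value (or the date itself is invalid,
        # in which case this second attempt raises ValueError as well).
        parsed = datetime.datetime.strptime(stamp, FMT_SECONDS)
        return parsed.strftime(FMT_SECONDS)
    # %f always prints 6 digits; cutting the last 3 leaves milliseconds.
    return parsed.strftime(FMT_FRACTION)[:-3]


def normalize_line(line):
    """Rewrite the last comma-separated column of one CSV line."""
    cols = line.split(",")
    cols[-1] = normalize_timestamp(cols[-1])
    return ",".join(cols)


if __name__ == "__main__":
    for row in [
        "1,Ni,23,28-02-2015 12:22:33.2212-02",
        "3,Us,33,30-03-2015 12:23:35-01",
        "4,Uk,34,31-03-2015 12:24:36.332211-02",
    ]:
        print(normalize_line(row))
```

Inside the NiFi callback, the body of the for loop would then reduce to writing normalize_line(line) followed by a newline. Rows with invalid dates would need their own handling, for example routing the flow file to a failure relationship.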