Thank you, Matt, for this response.

Yes, it worked 😊
On 20-Jun-2017 7:55 AM, "Matt Burgess" <mattyb...@apache.org> wrote:

Prabhu,

I'm no Python/Jython master by any means, so I'm sure there's a better
way to do this than what I came up with. Along the way I noticed some
things about the input data and Jython vs Python:

1) Your "for line in text[1:]:" is skipping the first line, I assume
in the "real" data there is a header?
2) The second row of data refers to a leap day (Feb 29) which did not
exist in 2015 so it throws an exception. I changed all the months to
03 and kept going
3) Your third row doesn't have any fractional seconds, is this on
purpose? I assumed so and tried to provide for that
4) Jython (and Python 2) don't support the %z directive when parsing
with strptime, and %Z refers to a String like a City or Country in that
timezone or the friendly name of the timezone, not the +-HHMM value.
Also, your data includes only the hour offset, not the minutes.
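
Points 2 and 4 can be checked outside NiFi with a small sketch like the
following. The helper name parses_ok is hypothetical, and the sample values
are assumed to have the trailing hour offset already removed:

```python
import datetime

# Input format used in the sample data, with fractional seconds
FMT = "%d-%m-%Y %H:%M:%S.%f"

def parses_ok(text, fmt=FMT):
    """Return True if text parses with fmt, False if strptime rejects it."""
    try:
        datetime.datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

# 2015 was not a leap year, so 29-02-2015 is rejected; 2016 was one.
print(parses_ok("28-02-2015 12:22:33.2212"))
print(parses_ok("29-02-2015 12:22:34.3212"))
print(parses_ok("29-02-2016 12:22:34.3212"))
```

A row without fractional seconds is also rejected by this format, which is
why a fallback format is needed for such rows.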

I came up with a fairly fragile script that seems to work given your input:

import datetime
import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
  def __init__(self, log):
    # Store the logger on the instance; a bare "logger = log" only sets a local
    self.logger = log
  def process(self, inputStream, outputStream):
    text = IOUtils.readLines(inputStream, StandardCharsets.UTF_8)
    # Skip the first line (assumed to be a header)
    for line in text[1:]:
      cols = line.split(",")
      # Drop the trailing hour-only offset (e.g. "-02") before parsing
      stamp = cols[3][:-3]
      df = "%d-%m-%Y %H:%M:%S.%f"
      trunc_3 = True
      try:
        d2 = datetime.datetime.strptime(stamp, df)
      except ValueError:
        # This row has no fractional seconds
        df = "%d-%m-%Y %H:%M:%S"
        trunc_3 = False
        d2 = datetime.datetime.strptime(stamp, df)
      if trunc_3:
        # %f prints six digits (microseconds); keep only milliseconds
        cols[3] = d2.strftime(df)[:-3]
      else:
        cols[3] = d2.strftime(df)
      outputStream.write(",".join(cols) + "\n")

flowFile = session.get()
if (flowFile != None):
  flowFile = session.write(flowFile,PyStreamCallback(log))
  flowFile = session.putAttribute(flowFile, "filename",
flowFile.getAttribute('filename'))
  session.transfer(flowFile, REL_SUCCESS)
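
To try the conversion logic without a running NiFi flow, the per-value part
can be pulled into a plain function. This is only a sketch under the same
assumptions as the script (a three-character offset such as "-02" at the end
of each value); the function name and the out_fmt parameter are hypothetical
and not part of NiFi or the script itself:

```python
import datetime

def strip_offset_and_trim(value, out_fmt="%d-%m-%Y %H:%M:%S"):
    """Drop a trailing 3-character offset and trim fractions to milliseconds."""
    base = value[:-3]  # assumes the value always ends with an offset like "-02"
    try:
        parsed = datetime.datetime.strptime(base, "%d-%m-%Y %H:%M:%S.%f")
    except ValueError:
        # No fractional seconds (an invalid date raises again here)
        parsed = datetime.datetime.strptime(base, "%d-%m-%Y %H:%M:%S")
        return parsed.strftime(out_fmt)
    # %f prints six digits; cutting the last three leaves milliseconds
    return parsed.strftime(out_fmt + ".%f")[:-3]

for sample in ["28-02-2015 12:22:33.2212-02",
               "30-03-2015 12:23:35-01",
               "31-03-2015 12:24:36.332211-02"]:
    print(strip_offset_and_trim(sample))
```

Passing out_fmt="%Y-%m-%d %H:%M:%S" would give the year-first layout instead,
if that is what the downstream system expects.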


Please let me know if I've misunderstood anything, and I will try to
fix/improve the script.

Regards,
Matt

On Mon, Jun 19, 2017 at 8:31 AM, prabhu Mahendran
<prabhuu161...@gmail.com> wrote:
> I have a CSV file which contains lakhs of rows; below are some sample
> lines:
>
> 1,Ni,23,28-02-2015 12:22:33.2212-02
> 2,Fi,21,29-02-2015 12:22:34.3212-02
> 3,Us,33,30-03-2015 12:23:35-01
> 4,Uk,34,31-03-2015 12:24:36.332211-02
> I need to fix the last column of the CSV data, which is in the wrong
> datetime format. I need to produce the default datetime format
> ("YYYY-MM-DD hh:mm:ss[.nnn]") from the last column of the data.
>
> I have tried the following script to get lines from it and write into flow
> file.
>
> import json
> import java.io
> from org.apache.commons.io import IOUtils
> from java.nio.charset import StandardCharsets
> from org.apache.nifi.processor.io import StreamCallback
>
> class PyStreamCallback(StreamCallback):
>   def __init__(self):
>         pass
>   def process(self, inputStream, outputStream):
>     text = IOUtils.readLines(inputStream, StandardCharsets.UTF_8)
>     for line in text[1:]:
>         outputStream.write(line + "\n")
>
> flowFile = session.get()
> if (flowFile != None):
>   flowFile = session.write(flowFile,PyStreamCallback())
>   flowFile = session.putAttribute(flowFile, "filename",
> flowFile.getAttribute('filename'))
>   session.transfer(flowFile, REL_SUCCESS)
> But I am not able to find a way to convert it to the output below.
>
> 1,Ni,23,28-02-2015 12:22:33.221
> 2,Fi,21,29-02-2015 12:22:34.321
> 3,Us,33,30-03-2015 12:23:35
> 4,Uk,34,31-03-2015 12:24:36.332
> I have checked this requirement with my friend (Google) and am still not
> able to find a solution.
>
> Can anyone guide me on converting this input data into my required output?
