Hey Jason,

Thanks for reaching out. That is definitely odd and not something that I’ve 
seen or heard about before.

Are you certain that the data is not being corrupted upstream of the processor? 
I ask because the code for the processor that handles writing out the content 
is pretty straightforward and hasn’t been modified in over 3 years, so I would 
expect to see this happen often if it were a bug in the MergeContent processor 
itself. Any chance that you can create a flow template/sample data that 
recreates the issue? Is there anything particularly unique about your flow?
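
One way to rule out upstream corruption is to check that every flowfile is 
valid JSON on its own before it reaches MergeContent. The following is only a 
minimal sketch, run outside NiFi against contents you have exported from the 
queue feeding the processor. The function name find_invalid_records and the 
sample payloads are hypothetical, made up for illustration.

```python
import json


def find_invalid_records(payloads):
    """Return (index, error message) for every payload that is not valid JSON.

    `payloads` is any iterable of str or bytes, one entry per flowfile.
    This helper is hypothetical and is not part of NiFi.
    """
    invalid = []
    for index, payload in enumerate(payloads):
        try:
            json.loads(payload)
        except ValueError as exc:
            # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
            invalid.append((index, str(exc)))
    return invalid


if __name__ == "__main__":
    # Illustrative payloads only: one complete record and two fragments
    # shaped like the split pieces described in the report.
    samples = [
        b'{"name": "1", "hexbytes": "A10F15B11D14", "timestamp": "123456789"}',
        b'xbytes": "A10F15D14B11", "timestamp": "123456790"}',
        b'{"name": "3", "h',
    ]
    for index, error in find_invalid_records(samples):
        print(f"payload {index} is not valid JSON: {error}")
```

If every input passes a check like this and the merged output still contains 
fragments, that would point away from the upstream processors.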

Thanks
-Mark


> On Jun 9, 2020, at 6:47 PM, Jason Iannone <bread...@gmail.com> wrote:
> 
> Hi all,
> 
> Within NiFi 1.10.0 we're seeing unexpected behavior with MergeContent. The 
> processor is being fed many flowfiles, each containing an individual JSON 
> record. The records have various field types including a hex-encoded byte[]. 
> We are not trying to merge the JSON records themselves but rather consolidate 
> many flowfiles into fewer flowfiles.
> 
> What we're seeing is that a random flowfile is split, causing the merged file 
> to be invalid JSON. When running multiple bins we saw the flowfile split 
> across bins.
> 
> Example
> Flowfile 1: {"name": "1", "hexbytes": A10F15B11D14", timestamp: "123456789" }
> Flowfile 2:  {"name": "2", "hexbytes": A10F15D14B11", timestamp: "123456790" 
> } 
> Flowfile 3:  {"name": "3", "hexbytes": A10F15D14B11", timestamp: "123456790" 
> } 
> 
> Merged Result:
> {"name": "1", "hexbyters": A10F15B11D14", timestamp: "123456789" } 
> xbytes": A10F15D14B11", timestamp: "123456790" }  
> {"name": "3", "hexbytes": A10F15D14B11", timestamp: "123456790" } 
> {"name": "3", "h  
> 
> MergeContent Configuration:
> Concurrent Tasks: 4
> Merge Strategy: Bin-Packing Algorithm
> Merge Format: Binary Concatenation
> Attribute Strategy: Keep Only Common Attributes
> Min. number of entries: 1000
> Max. number of entries: 20000
> Minimum group size: 10 KB
> Maximum number of bins: 5
> Header, Footer, and Demarcator are not set.
> 
> We then backed off the settings above, reducing the min and max entries and 
> setting bins to 1 and threads to 1, and still see the same issue.
> 
> Any insights?
> 
> Thanks,
> Jason
