Jens,

I don't quite follow the EXT4 usage on top of VMFS but the point here
is you'll ultimately need to truly understand your underlying storage
system and what sorts of guarantees it is giving you.  If linux/the
jvm/nifi think it has a typical EXT4 type block storage system to work
with it can only be safe/operate within those constraints.  I have no
idea about what VMFS brings to the table or the settings for it.

The sync properties I shared previously might help force the issue of
ensuring a formal sync/flush cycle all the way through the disk has
occurred which we'd normally not do or need to do but again in some
cases offers a stronger guarantee in exchange for performance.

In any case...Mark's path for you here will help identify what we're
dealing with and we can go from there.

I am aware of significant usage of NiFi on VMWare configurations
without issue at high rates for many years so whatever it is here is
likely solvable.

Thanks

On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <jmkofoed....@gmail.com> wrote:
>
> Hi Mark
>
> Thanks for the clarification. I will implement the script when I return to 
> the office at Monday next week ( November 1st).
> I don’t use NFS, but ext4. But I will implement the script so we can check if 
> it’s the case here. But I think the issue might be after the processors 
> writing content to the repository.
> I have a test flow running for more than 2 weeks without any errors. But this 
> flow only calculate hash and comparing.
>
> Two other flows both create errors. One flow use 
> PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow use 
> MergeContent->UnpackContent->CryptographicHashContent->compares. The last 
> flow is totally inside nifi, excluding other network/server issues.
>
> In both cases the CryptographicHashContent is right after a process which 
> writes new content to the repository. But in one case a file in our 
> production flow did calculate a wrong hash 4 times with a 1 minutes delay 
> between each calculation. A few hours later I looped the file back and this 
> time it was OK.
> Just like the case in step 5 and 12 in the pdf file
>
> I will let you all know more later next week
>
> Kind regards
> Jens
>
>
>
> Den 27. okt. 2021 kl. 15.43 skrev Mark Payne <marka...@hotmail.com>:
>
> And the actual script:
>
>
> import org.apache.nifi.flowfile.FlowFile
>
> import java.util.stream.Collectors
>
> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
>     final Map<String, String> histogram = 
> flowFile.getAttributes().entrySet().stream()
>         .filter({ entry -> entry.getKey().startsWith("histogram.") })
>         .collect(Collectors.toMap({ entry -> entry.key}, { entry -> 
> entry.value }))
>     return histogram;
> }
>
> Map<String, String> createHistogram(final FlowFile flowFile, final 
> InputStream inStream) {
>     final Map<String, String> histogram = new HashMap<>();
>     final int[] distribution = new int[256];
>     Arrays.fill(distribution, 0);
>
>     long total = 0L;
>     final byte[] buffer = new byte[8192];
>     int len;
>     while ((len = inStream.read(buffer)) > 0) {
>         for (int i=0; i < len; i++) {
>             final int val = buffer[i];
>             distribution[val]++;
>             total++;
>         }
>     }
>
>     for (int i=0; i < 256; i++) {
>         histogram.put("histogram." + i, String.valueOf(distribution[i]));
>     }
>     histogram.put("histogram.totalBytes", String.valueOf(total));
>
>     return histogram;
> }
>
> void logHistogramDifferences(final Map<String, String> previous, final 
> Map<String, String> updated) {
>     final StringBuilder sb = new StringBuilder("There are differences in the 
> histogram\n");
>     final Map<String, String> sorted = new TreeMap<>(previous)
>     for (final Map.Entry<String, String> entry : sorted.entrySet()) {
>         final String key = entry.getKey();
>         final String previousValue = entry.getValue();
>         final String updatedValue = updated.get(entry.getKey())
>
>         if (!Objects.equals(previousValue, updatedValue)) {
>             sb.append("Byte Value: ").append(key).append(", Previous Count: 
> ").append(previousValue).append(", New Count: 
> ").append(updatedValue).append("\n");
>         }
>     }
>
>     log.error(sb.toString());
> }
>
>
> def flowFile = session.get()
> if (flowFile == null) {
>     return
> }
>
> final Map<String, String> previousHistogram = getPreviousHistogram(flowFile)
> Map<String, String> histogram = null;
>
> final InputStream inStream = session.read(flowFile);
> try {
>     histogram = createHistogram(flowFile, inStream);
> } finally {
>     inStream.close()
> }
>
> if (!previousHistogram.isEmpty()) {
>     if (previousHistogram.equals(histogram)) {
>         log.info("Histograms match")
>     } else {
>         logHistogramDifferences(previousHistogram, histogram)
>         session.transfer(flowFile, REL_FAILURE)
>         return;
>     }
> }
>
> flowFile = session.putAllAttributes(flowFile, histogram)
> session.transfer(flowFile, REL_SUCCESS)
>
>
>
>
>
>
> On Oct 27, 2021, at 9:43 AM, Mark Payne <marka...@hotmail.com> wrote:
>
> Jens,
>
> For a bit of background here, the reason that Joe and I have expressed 
> interest in NFS file systems is that the way the protocol works, it is 
> allowed to receive packets/chunks of the file out-of-order. So, what happens 
> is let’s say a 1 MB file is being written. The first 500 KB are received. 
> Then instead of the the 501st KB it receives the 503rd KB. What happens is 
> that the size of the file on the file system becomes 503 KB. But what about 
> 501 & 502? Well when you read the data, the file system just returns ASCII 
> NUL characters (byte 0) for those bytes. Once the NFS server receives those 
> bytes, it then goes back and fills in the proper bytes. So if you’re running 
> on NFS, it is possible for the contents of the file on the underlying file 
> system to change out from under you. It’s not clear to me what other types of 
> file system might do something similar.
>
> So, one thing that we can do is to find out whether or not the contents of 
> the underlying file have changed in some way, or if there’s something else 
> happening that could perhaps result in the hashes being wrong. I’ve put 
> together a script that should help diagnose this.
>
> Can you insert an ExecuteScript processor either just before or just after 
> your CryptographicHashContent processor? Doesn’t really matter whether it’s 
> run just before or just after. I’ll attach the script here. It’s a Groovy 
> Script so you should be able to use ExecuteScript with Script Engine = Groovy 
> and the following script as the Script Body. No other changes needed.
>
> The way the script works, it reads in the contents of the FlowFile, and then 
> it builds up a histogram of all byte values (0-255) that it sees in the 
> contents, and then adds that as attributes. So it adds attributes such as:
> histogram.0 = 280273
> histogram.1 = 2820
> histogram.2 = 48202
> histogram.3 = 3820
> …
> histogram.totalBytes = 1780928732
>
> It then checks if those attributes have already been added. If so, after 
> calculating that histogram, it checks against the previous values (in the 
> attributes). If they are the same, the FlowFile goes to ’success’. If they 
> are different, it logs an error indicating the before/after value for any 
> byte whose distribution was different, and it routes to failure.
>
> So, if for example, the first time through it sees 280,273 bytes with a value 
> of ‘0’, and the second times it only sees 12,001 then we know there were a 
> bunch of 0’s previously that were updated to be some other value. And it 
> includes the total number of bytes in case somehow we find that we’re reading 
> too many bytes or not enough bytes or something like that. This should help 
> narrow down what’s happening.
>
> Thanks
> -Mark
>
>
>
> On Oct 26, 2021, at 6:25 PM, Joe Witt <joe.w...@gmail.com> wrote:
>
> Jens
>
> Attached is the flow I was using (now running yours and this one).  Curious 
> if that one reproduces the issue for you as well.
>
> Thanks
>
> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <joe.w...@gmail.com> wrote:
>>
>> Jens
>>
>> I have your flow running and will keep it running for several days/week to 
>> see if I can reproduce.  Also of note please use your same test flow but use 
>> HashContent instead of crypto hash.  Curious if that matters for any 
>> reason...
>>
>> Still want to know more about your underlying storage system.
>>
>> You could also try updating nifi.properties and changing the following lines:
>> nifi.flowfile.repository.always.sync=true
>> nifi.content.repository.always.sync=true
>> nifi.provenance.repository.always.sync=true
>>
>> It will hurt performance but can be useful/necessary on certain storage 
>> subsystems.
>>
>> Thanks
>>
>> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <joe.w...@gmail.com> wrote:
>>>
>>> Ignore "For the scenario where you can replicate this please share the 
>>> flow.xml.gz for which it is reproducible."  I see the uploaded JSON
>>>
>>> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <joe.w...@gmail.com> wrote:
>>>>
>>>> Jens,
>>>>
>>>> We asked about the underlying storage system.  You replied with some info 
>>>> but not the specifics.  Do you know precisely what the underlying storage 
>>>> is and how it is presented to the operating system?  For instance is it 
>>>> NFS or something similar?
>>>>
>>>> I've setup a very similar flow at extremely high rates running for the 
>>>> past several days with no issue.  In my case though I know precisely what 
>>>> the config is and the disk setup is.  Didn't do anything special to be 
>>>> clear but still it is important to know.
>>>>
>>>> For the scenario where you can replicate this please share the flow.xml.gz 
>>>> for which it is reproducible.
>>>>
>>>> Thanks
>>>> Joe
>>>>
>>>> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <jmkofoed....@gmail.com> 
>>>> wrote:
>>>>>
>>>>> Dear Joe and Mark
>>>>>
>>>>> I have created a test flow without the sftp processors, which don't 
>>>>> create any errors. Therefore I created a new test flow where I use a 
>>>>> MergeContent and UnpackContent instead of the sftp processors. This keeps 
>>>>> all data internal in NIFI, but force NIFI to write and read new files 
>>>>> totally local.
>>>>> My flow have been running for 7 days and this morning there where 2 files 
>>>>> where the sha256 has been given another has value than original. I have 
>>>>> set this flow up in another nifi cluster only for testing, and the 
>>>>> cluster is not doing anything else. It is using Nifi 1.14.0
>>>>> So I can reproduce issues at different nifi clusters and versions (1.13.2 
>>>>> and 1.14.0) where the calculation of a hash on content can give different 
>>>>> outputs. Is doesn't make any sense, but it happens. In all my cases the 
>>>>> issues happens where the calculations of the hashcontent happens right 
>>>>> after NIFI writes the content to the content repository. I don't know if 
>>>>> there cut be some kind of delay writing the content 100% before the next 
>>>>> processors begin reading the content???
>>>>>
>>>>> Please see attach test flow, and the previous mail with a pdf showing the 
>>>>> lineage of a production file which also had issues. In the pdf check step 
>>>>> 5 and 12.
>>>>>
>>>>> Kind regards
>>>>> Jens M. Kofoed
>>>>>
>>>>>
>>>>> Den tor. 21. okt. 2021 kl. 08.28 skrev Jens M. Kofoed 
>>>>> <jmkofoed....@gmail.com>:
>>>>>>
>>>>>> Joe,
>>>>>>
>>>>>> To start from the last mail :-)
>>>>>> All the repositories has it's own disk, and I'm using ext4
>>>>>> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
>>>>>> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
>>>>>> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
>>>>>>
>>>>>> My test flow WITH sftp looks like this:
>>>>>> <image.png>
>>>>>> And this flow has produced 1 error within 3 days. After many many loops 
>>>>>> the file fails and went out via the "unmatched" output to  the disabled 
>>>>>> UpdateAttribute, which is doing nothing. Just for keeping the failed 
>>>>>> flowfile in a queue.  I enabled the UpdateAttribute and looped the file 
>>>>>> back to the CryptographicHashContent and now it calculated the hash 
>>>>>> correct again. But in this flow I have a FetchSFTP Process right before 
>>>>>> the Hashing.
>>>>>> Right now my flow is running without the 2 sftp processors, and the last 
>>>>>> 24hours there has been no errors.
>>>>>>
>>>>>> About the Lineage:
>>>>>> Are there a way to export all the lineage data? The export only generate 
>>>>>> a svg file.
>>>>>> This is only for the receiving nifi which is internally calculate 2 
>>>>>> different hashes on the same content with ca. 1 minutes delay. Attached 
>>>>>> is a pdf-document with the lineage, the flow and all the relevant 
>>>>>> Provenance information's for each step in the lineage.
>>>>>> The interesting steps are step 5 and 12.
>>>>>>
>>>>>> Can the issues be that data is not written 100% to disk between step 4 
>>>>>> and 5 in the flow?
>>>>>>
>>>>>> Kind regards
>>>>>> Jens M. Kofoed
>>>>>>
>>>>>>
>>>>>>
>>>>>> Den ons. 20. okt. 2021 kl. 23.49 skrev Joe Witt <joe.w...@gmail.com>:
>>>>>>>
>>>>>>> Jens,
>>>>>>>
>>>>>>> Also what type of file system/storage system are you running NiFi on
>>>>>>> in this case?  We'll need to know this for the NiFi
>>>>>>> content/flowfile/provenance repositories? Is it NFS?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <joe.w...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > Jens,
>>>>>>> >
>>>>>>> > And to further narrow this down
>>>>>>> >
>>>>>>> > "I have a test flow, where a GenerateFlowfile has created 6x 1GB files
>>>>>>> > (2 files per node) and next process was a hashcontent before it run
>>>>>>> > into a test loop. Where files are uploaded via PutSFTP to a test
>>>>>>> > server, and downloaded again and recalculated the hash. I have had one
>>>>>>> > issue after 3 days of running."
>>>>>>> >
>>>>>>> > So to be clear with GenerateFlowFile making these files and then you
>>>>>>> > looping the content is wholly and fully exclusively within the control
>>>>>>> > of NiFI.  No Get/Fetch/Put-SFTP of any kind at all. In by looping the
>>>>>>> > same files over and over in nifi itself you can make this happen or
>>>>>>> > cannot?
>>>>>>> >
>>>>>>> > Thanks
>>>>>>> >
>>>>>>> > On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <joe.w...@gmail.com> wrote:
>>>>>>> > >
>>>>>>> > > Jens,
>>>>>>> > >
>>>>>>> > > "After fetching a FlowFile-stream file and unpacked it back into 
>>>>>>> > > NiFi
>>>>>>> > > I calculate a sha256. 1 minutes later I recalculate the sha256 on 
>>>>>>> > > the
>>>>>>> > > exact same file. And got a new hash. That is what worry’s me.
>>>>>>> > > The fact that the same file can be recalculated and produce two
>>>>>>> > > different hashes, is very strange, but it happens. "
>>>>>>> > >
>>>>>>> > > Ok so to confirm you are saying that in each case this happens you 
>>>>>>> > > see
>>>>>>> > > it first compute the wrong hash, but then if you retry the same
>>>>>>> > > flowfile it then provides the correct hash?
>>>>>>> > >
>>>>>>> > > Can you please also show/share the lineage history for such a flow
>>>>>>> > > file then?  It should have events for the initial hash, second hash,
>>>>>>> > > the unpacking, trace to the original stream, etc...
>>>>>>> > >
>>>>>>> > > Thanks
>>>>>>> > >
>>>>>>> > > On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed 
>>>>>>> > > <jmkofoed....@gmail.com> wrote:
>>>>>>> > > >
>>>>>>> > > > Dear Mark and Joe
>>>>>>> > > >
>>>>>>> > > > I know my setup isn’t normal for many people. But if we only 
>>>>>>> > > > looks at my receive side, which the last mails is about. Every 
>>>>>>> > > > thing is happening at the same NIFI instance. It is the same 3 
>>>>>>> > > > node NIFI cluster.
>>>>>>> > > > After fetching a FlowFile-stream file and unpacked it back into 
>>>>>>> > > > NiFi I calculate a sha256. 1 minutes later I recalculate the 
>>>>>>> > > > sha256 on the exact same file. And got a new hash. That is what 
>>>>>>> > > > worry’s me.
>>>>>>> > > > The fact that the same file can be recalculated and produce two 
>>>>>>> > > > different hashes, is very strange, but it happens. Over the last 
>>>>>>> > > > 5 months it have only happen 35-40 times.
>>>>>>> > > >
>>>>>>> > > > I can understand if the file is not completely loaded and saved 
>>>>>>> > > > into the content repository before the hashing starts. But I 
>>>>>>> > > > believe that the unpack process don’t forward the flow file to 
>>>>>>> > > > the next process before it is 100% finish unpacking and saving 
>>>>>>> > > > the new content to the repository.
>>>>>>> > > >
>>>>>>> > > > I have a test flow, where a GenerateFlowfile has created 6x 1GB 
>>>>>>> > > > files (2 files per node) and next process was a hashcontent 
>>>>>>> > > > before it run into a test loop. Where files are uploaded via 
>>>>>>> > > > PutSFTP to a test server, and downloaded again and recalculated 
>>>>>>> > > > the hash. I have had one issue after 3 days of running.
>>>>>>> > > > Now the test flow is running without the Put/Fetch sftp 
>>>>>>> > > > processors.
>>>>>>> > > >
>>>>>>> > > > Another problem is that I can’t find any correlation to other 
>>>>>>> > > > events. Not within NIFI, nor the server itself or VMWare. If I 
>>>>>>> > > > just could find any other event which happens at the same time, I 
>>>>>>> > > > might be able to force some kind of event to trigger the issue.
>>>>>>> > > > I have tried to force VMware to migrate a NiFi node to another 
>>>>>>> > > > host. Forcing it to do a snapshot and deleting snapshots, but 
>>>>>>> > > > nothing can trigger and error.
>>>>>>> > > >
>>>>>>> > > > I know it will be very very difficult to reproduce. But I will 
>>>>>>> > > > setup multiple NiFi instances running different test flows to see 
>>>>>>> > > > if I can find any reason why it behaves as it does.
>>>>>>> > > >
>>>>>>> > > > Kind Regards
>>>>>>> > > > Jens M. Kofoed
>>>>>>> > > >
>>>>>>> > > > Den 20. okt. 2021 kl. 16.39 skrev Mark Payne 
>>>>>>> > > > <marka...@hotmail.com>:
>>>>>>> > > >
>>>>>>> > > > Jens,
>>>>>>> > > >
>>>>>>> > > > Thanks for sharing the images.
>>>>>>> > > >
>>>>>>> > > > I tried to setup a test to reproduce the issue. I’ve had it 
>>>>>>> > > > running for quite some time. Running through millions of 
>>>>>>> > > > iterations.
>>>>>>> > > >
>>>>>>> > > > I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to 
>>>>>>> > > > the tune of hundreds of MB). I’ve been unable to reproduce an 
>>>>>>> > > > issue after millions of iterations.
>>>>>>> > > >
>>>>>>> > > > So far I cannot replicate. And since you’re pulling the data via 
>>>>>>> > > > SFTP and then unpacking, which preserves all original attributes 
>>>>>>> > > > from a different system, this can easily become confusing.
>>>>>>> > > >
>>>>>>> > > > Recommend trying to reproduce with SFTP-related processors out of 
>>>>>>> > > > the picture, as Joe is mentioning. Either using GetFile/FetchFile 
>>>>>>> > > > or GenerateFlowFile. Then immediately use 
>>>>>>> > > > CryptographicHashContent to generate an ‘initial hash’, copy that 
>>>>>>> > > > value to another attribute, and then loop, generating the hash 
>>>>>>> > > > and comparing against the original one. I’ll attach a flow that 
>>>>>>> > > > does this, but not sure if the email server will strip out the 
>>>>>>> > > > attachment or not.
>>>>>>> > > >
>>>>>>> > > > This way we remove any possibility of actual corruption between 
>>>>>>> > > > the two nifi instances. If we can still see corruption / 
>>>>>>> > > > different hashes within a single nifi instance, then it certainly 
>>>>>>> > > > warrants further investigation but i can’t see any issues so far.
>>>>>>> > > >
>>>>>>> > > > Thanks
>>>>>>> > > > -Mark
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > On Oct 20, 2021, at 10:21 AM, Joe Witt <joe.w...@gmail.com> wrote:
>>>>>>> > > >
>>>>>>> > > > Jens
>>>>>>> > > >
>>>>>>> > > > Actually is this current loop test contained within a single nifi 
>>>>>>> > > > and there you see corruption happen?
>>>>>>> > > >
>>>>>>> > > > Joe
>>>>>>> > > >
>>>>>>> > > > On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <joe.w...@gmail.com> 
>>>>>>> > > > wrote:
>>>>>>> > > >
>>>>>>> > > > Jens,
>>>>>>> > > >
>>>>>>> > > > You have a very involved setup including other systems (non 
>>>>>>> > > > NiFi).  Have you removed those systems from the equation so you 
>>>>>>> > > > have more evidence to support your expectation that NiFi is doing 
>>>>>>> > > > something other than you expect?
>>>>>>> > > >
>>>>>>> > > > Joe
>>>>>>> > > >
>>>>>>> > > > On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed 
>>>>>>> > > > <jmkofoed....@gmail.com> wrote:
>>>>>>> > > >
>>>>>>> > > > Hi
>>>>>>> > > >
>>>>>>> > > > Today I have another file which have been running through the 
>>>>>>> > > > retry loop one time. To test the processors and the algorithm I 
>>>>>>> > > > added the HashContent processor and also added hashing by SHA-1.
>>>>>>> > > > I file have been going through the system, and both the SHA-1 and 
>>>>>>> > > > SHA-256 are both different than expected. with a 1 minutes delay 
>>>>>>> > > > the file is going back into the hashing content flow and this 
>>>>>>> > > > time it calculates both hashes fine.
>>>>>>> > > >
>>>>>>> > > > I don't believe that the hashing is buggy, but something is very 
>>>>>>> > > > very strange. What can influence the processors/algorithm to 
>>>>>>> > > > calculate a different hash???
>>>>>>> > > > All the input/output claim information is exactly the same. It is 
>>>>>>> > > > the same flow/content file going in a loop. It happens on all 3 
>>>>>>> > > > nodes.
>>>>>>> > > >
>>>>>>> > > > Any suggestions for where to dig ?
>>>>>>> > > >
>>>>>>> > > > Regards
>>>>>>> > > > Jens M. Kofoed
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > Den ons. 20. okt. 2021 kl. 06.34 skrev Jens M. Kofoed 
>>>>>>> > > > <jmkofoed....@gmail.com>:
>>>>>>> > > >
>>>>>>> > > > Hi Mark
>>>>>>> > > >
>>>>>>> > > > Thanks for replaying and the suggestion to look at the content 
>>>>>>> > > > Claim.
>>>>>>> > > > These 3 pictures is from the first attempt:
>>>>>>> > > > <image.png>   <image.png>   <image.png>
>>>>>>> > > >
>>>>>>> > > > Yesterday I realized that the content was still in the archive, 
>>>>>>> > > > so I could Replay the file.
>>>>>>> > > > <image.png>
>>>>>>> > > > So here are the same pictures but for the replay and as you can 
>>>>>>> > > > see the Identifier, offset and Size are all the same.
>>>>>>> > > > <image.png>   <image.png>   <image.png>
>>>>>>> > > >
>>>>>>> > > > In my flow if the hash does not match my original first 
>>>>>>> > > > calculated hash, it goes into a retry loop. Here are the pictures 
>>>>>>> > > > for the 4th time the file went through:
>>>>>>> > > > <image.png>   <image.png>   <image.png>
>>>>>>> > > > Here the content Claim is all the same.
>>>>>>> > > >
>>>>>>> > > > It is very rare that we see these issues <1 : 1.000.000 files and 
>>>>>>> > > > only with large files. Only once have I seen the error with a 
>>>>>>> > > > 110MB file, the other times the files size are above 800MB.
>>>>>>> > > > This time it was a Nifi-Flowstream v3 file, which has been 
>>>>>>> > > > exported from one system and imported in another. But while the 
>>>>>>> > > > file has been imported it is the same file inside NIFI and it 
>>>>>>> > > > stays at the same node. Going through the same loop of processors 
>>>>>>> > > > multiple times and in the end the CryptographicHashContent 
>>>>>>> > > > calculate a different SHA256 than it did earlier. This should not 
>>>>>>> > > > be possible!!! And that is what concern my the most.
>>>>>>> > > > What can influence the same processor to calculate 2 different 
>>>>>>> > > > sha256 on the exact same content???
>>>>>>> > > >
>>>>>>> > > > Regards
>>>>>>> > > > Jens M. Kofoed
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > Den tir. 19. okt. 2021 kl. 16.51 skrev Mark Payne 
>>>>>>> > > > <marka...@hotmail.com>:
>>>>>>> > > >
>>>>>>> > > > Jens,
>>>>>>> > > >
>>>>>>> > > > In the two provenance events - one showing a hash of dd4cc… and 
>>>>>>> > > > the other showing f6f0….
>>>>>>> > > > If you go to the Content tab, do they both show the same Content 
>>>>>>> > > > Claim? I.e., do the Input Claim / Output Claim show the same 
>>>>>>> > > > values for Container, Section, Identifier, Offset, and Size?
>>>>>>> > > >
>>>>>>> > > > Thanks
>>>>>>> > > > -Mark
>>>>>>> > > >
>>>>>>> > > > On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed 
>>>>>>> > > > <jmkofoed....@gmail.com> wrote:
>>>>>>> > > >
>>>>>>> > > > Dear NIFI Users
>>>>>>> > > >
>>>>>>> > > > I have posted this mail in the developers mailing list and just 
>>>>>>> > > > want to inform all of our about a very odd behavior we are facing.
>>>>>>> > > > The background:
>>>>>>> > > > We have data going between 2 different NIFI systems which has no 
>>>>>>> > > > direct network access to each other. Therefore we calculate a 
>>>>>>> > > > SHA256 hash value of the content at system 1, before the flowfile 
>>>>>>> > > > and data are combined and saved as a "flowfile-stream-v3" pkg 
>>>>>>> > > > file. The file is then transported to system 2, where the pkg 
>>>>>>> > > > file is unpacked and the flow can continue. To be sure about file 
>>>>>>> > > > integrity we calculate a new sha256 at system 2. But sometimes we 
>>>>>>> > > > see that the sha256 gets another value, which might suggest the 
>>>>>>> > > > file was corrupted. But recalculating the sha256 again gives a 
>>>>>>> > > > new hash value.
>>>>>>> > > >
>>>>>>> > > > ----
>>>>>>> > > >
>>>>>>> > > > Tonight I had yet another file which didn't match the expected 
>>>>>>> > > > sha256 hash value. The content is a 1.7GB file and the Event 
>>>>>>> > > > Duration was "00:00:17.539" to calculate the hash.
>>>>>>> > > > I have created a Retry loop, where the file will go to a Wait 
>>>>>>> > > > process for delaying the file 1 minute and going back to the 
>>>>>>> > > > CryptographicHashContent for a new calculation. After 3 retries 
>>>>>>> > > > the file goes to the retries_exceeded and goes to a disabled 
>>>>>>> > > > process just to be in a queue so I manually can look at it. This 
>>>>>>> > > > morning I rerouted the file from my retries_exceeded queue back 
>>>>>>> > > > to the CryptographicHashContent for a new calculation and this 
>>>>>>> > > > time it calculated the correct hash value.
>>>>>>> > > >
>>>>>>> > > > THIS CAN'T BE TRUE :-( :-( But it is. - Something very very 
>>>>>>> > > > strange is happening.
>>>>>>> > > > <image.png>
>>>>>>> > > >
>>>>>>> > > > We are running NiFi 1.13.2 in a 3 node cluster at Ubuntu 20.04.02 
>>>>>>> > > > with openjdk version "1.8.0_292", OpenJDK Runtime Environment 
>>>>>>> > > > (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit 
>>>>>>> > > > Server VM (build 25.292-b10, mixed mode). Each server is a VM 
>>>>>>> > > > with 4 CPU, 8GB Ram on VMware ESXi, 7.0.2. Each NIFI node is 
>>>>>>> > > > running at different vm physical hosts.
>>>>>>> > > > I have inspected different logs to see if I can find any 
>>>>>>> > > > correlation what happened at the same time as the file is going 
>>>>>>> > > > through my loop, but there are no event/task at that exact time.
>>>>>>> > > >
>>>>>>> > > > System 1:
>>>>>>> > > > At 10/19/2021 00:15:11.247 CEST my file is going through a 
>>>>>>> > > > CryptographicHashContent: SHA256 value: 
>>>>>>> > > > dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
>>>>>>> > > > The file is exported as a "FlowFile Stream, v3" to System 2
>>>>>>> > > >
>>>>>>> > > > SYSTEM 2:
>>>>>>> > > > At 10/19/2021 00:18:10.528 CEST the file is going through a 
>>>>>>> > > > CryptographicHashContent: SHA256 value: 
>>>>>>> > > > f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>>>>>> > > > <image.png>
>>>>>>> > > > At 10/19/2021 00:19:08.996 CEST the file is going through the 
>>>>>>> > > > same CryptographicHashContent at system 2: SHA256 value: 
>>>>>>> > > > f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>>>>>> > > > At 10/19/2021 00:20:04.376 CEST the file is going through the 
>>>>>>> > > > same a CryptographicHashContent at system 2: SHA256 value: 
>>>>>>> > > > f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>>>>>> > > > At 10/19/2021 00:21:01.711 CEST the file is going through the 
>>>>>>> > > > same a CryptographicHashContent at system 2: SHA256 value: 
>>>>>>> > > > f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>>>>>> > > >
>>>>>>> > > > At 10/19/2021 06:07:43.376 CEST the file is going through the 
>>>>>>> > > > same a CryptographicHashContent at system 2: SHA256 value: 
>>>>>>> > > > dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
>>>>>>> > > > <image.png>
>>>>>>> > > >
>>>>>>> > > > How on earth can this happen???
>>>>>>> > > >
>>>>>>> > > > Kind Regards
>>>>>>> > > > Jens M. Kofoed
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > <Repro.json>
>
> <Try_to_recreate_Jens_Challenge.json>
>
>
>

Reply via email to