Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-11-03 Thread Joe Witt
Jens, I think we're at a loss as to how to help with your specific installation. We have attempted to recreate the scenario with no luck. We've offered suggestions for experiments that would help us narrow it down, but you don't think they will help. At this point we'll probably have

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-11-03 Thread Jens M. Kofoed
Hi Mark, All the files in my test flow are 1 GB files, but it happens in my production flow with different file sizes. When these issues have happened, I have the flowfile routed to a disabled UpdateAttribute processor, just to keep the file in a queue. Enable the process and sent the

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-11-03 Thread Mark Payne
So what I found interesting about the histogram output was that in each case, the input file was 1 GB, and only something like 500-700 bytes had different values between the ‘good’ and ‘bad’ hashes. But the values ranged significantly. There was no indication

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-11-03 Thread Joe Witt
Jens, 184 hours (7.6 days) in and zero issues. Will need to turn this off soon but wanted to give a final update. Looks great. Given the information on your system, there appears to be something we don't understand related to the virtual file system involved or something. Thanks On Tue, Nov 2,

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-11-01 Thread Jens M. Kofoed
Hi Mark and Joe, Yesterday morning I implemented Mark's script in my two test flows: one uses SFTP, the other MergeContent/UnpackContent. Both test flows are running on a 3-node test cluster with NiFi 1.14.0. The first flow, with SFTP, has had 1 file go into the failure queue after about

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-31 Thread Joe Witt
Jens,
118 hours in - still good.
Thanks
On Fri, Oct 29, 2021 at 10:22 AM Joe Witt wrote:
>
> Jens
>
> Update from hour 67. Still lookin' good.
>
> Will advise.
>
> Thanks
>
> On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed wrote:
> >
> > Many many thanks Joe for looking into this. My test

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-29 Thread Joe Witt
Jens
Update from hour 67. Still lookin' good.
Will advise.
Thanks
On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed wrote:
>
> Many many thanks Joe for looking into this. My test flow was running for 6
> days before the first error occurred
>
> Thanks
>
> > On 28 Oct 2021, at 16:57,

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-28 Thread Jens M. Kofoed
Many many thanks Joe for looking into this. My test flow was running for 6 days before the first error occurred
Thanks
> On 28 Oct 2021, at 16:57, Joe Witt wrote:
>
> Jens,
>
> Am 40+ hours in running both your flow and mine to reproduce. So far
> neither have shown any sign of trouble.

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-28 Thread Joe Witt
Jens,
Am 40+ hours in running both your flow and mine to reproduce. So far neither have shown any sign of trouble. Will keep running for another week or so if I can.
Thanks
On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed wrote:
>
> The Physical hosts with VMWare is using the vmfs but the vm

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-27 Thread Joe Witt
Jens, I don't quite follow the EXT4 usage on top of VMFS, but the point here is you'll ultimately need to truly understand your underlying storage system and what sorts of guarantees it is giving you. If Linux/the JVM/NiFi think it has a typical EXT4 type block storage system to work with it can

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-27 Thread Jens M. Kofoed
Hi Mark, Thanks for the clarification. I will implement the script when I return to the office on Monday next week (November 1st). I don’t use NFS, but ext4. But I will implement the script so we can check if it’s the case here. But I think the issue might be after the processors writing

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-27 Thread Mark Payne
And the actual script:
import org.apache.nifi.flowfile.FlowFile
import java.util.stream.Collectors
Map getPreviousHistogram(final FlowFile flowFile) {
    final Map histogram = flowFile.getAttributes().entrySet().stream()
        .filter({ entry -> entry.getKey().startsWith("histogram.") })
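
The script above is cut off in this archive, so the following is not a reconstruction of it. It is a minimal standalone Java sketch of the general idea the thread describes: count how often each byte value occurs, store the counts under keys with the "histogram." prefix seen in the fragment, and compare two histograms. The class name, the method names and the exact key format after the prefix are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of NiFi or of the script in this thread.
final class ByteHistogramSketch {

    // Count how often each byte value (0-255) occurs in the content.
    static long[] histogram(byte[] content) {
        long[] counts = new long[256];
        for (byte b : content) {
            counts[b & 0xFF]++; // mask so negative bytes map to 128-255
        }
        return counts;
    }

    // Encode non-zero counts as attribute-style entries, e.g. "histogram.65" -> "2".
    // Only the "histogram." prefix comes from the thread; the numeric suffix is assumed.
    static Map<String, String> toAttributes(long[] counts) {
        Map<String, String> attributes = new TreeMap<>();
        for (int value = 0; value < counts.length; value++) {
            if (counts[value] != 0) {
                attributes.put("histogram." + value, Long.toString(counts[value]));
            }
        }
        return attributes;
    }

    // Return byte value -> (current count - previous count) for every value that changed.
    static Map<Integer, Long> diff(long[] previous, long[] current) {
        Map<Integer, Long> changed = new TreeMap<>();
        for (int value = 0; value < 256; value++) {
            long delta = current[value] - previous[value];
            if (delta != 0) {
                changed.put(value, delta);
            }
        }
        return changed;
    }
}
```

A diff that shows a few hundred bytes moving from ordinary values to one particular value would point at a different cause than a diff spread evenly over many values, which is why a histogram can be more informative than the hash alone.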

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-27 Thread Mark Payne
Jens, For a bit of background here, the reason that Joe and I have expressed interest in NFS file systems is that the way the protocol works, it is allowed to receive packets/chunks of the file out-of-order. So, what happens is let’s say a 1 MB file is being written. The first 500 KB are
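
To illustrate why out-of-order arrival matters for hashing, here is a minimal in-memory Java sketch. Because the message is cut off here, it rests on two stated assumptions: that a later chunk can land before an earlier one, and that a reader in between sees zero bytes for the part not yet written. The class and method names are hypothetical, and nothing in it touches NFS or NiFi.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical simulation: a byte array stands in for the file on disk.
final class OutOfOrderWriteSketch {

    // SHA-256 of the given bytes as lowercase hex.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Copy one chunk into the simulated file at the given offset.
    static void writeChunk(byte[] file, int offset, byte[] chunk) {
        System.arraycopy(chunk, 0, file, offset, chunk.length);
    }

    public static void main(String[] args) {
        byte[] original = new byte[1024];
        Arrays.fill(original, (byte) 'x');

        byte[] file = new byte[1024]; // starts zero-filled (assumed view of unwritten data)

        // Assumed ordering: the second half arrives first.
        writeChunk(file, 512, Arrays.copyOfRange(original, 512, 1024));
        String early = sha256Hex(file); // reader hashes before the first half lands

        writeChunk(file, 0, Arrays.copyOfRange(original, 0, 512));
        String late = sha256Hex(file); // reader hashes the complete file

        System.out.println("hash while first half missing: " + early);
        System.out.println("hash after all chunks landed:  " + late);
        System.out.println("matches original: " + late.equals(sha256Hex(original)));
    }
}
```

In this simulation the two hashes differ even though nothing was ever written incorrectly; only the timing of the read changed.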

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Joe Witt
Jens,
Also, what type of file system/storage system are you running NiFi on in this case? We'll need to know this for the NiFi content/flowfile/provenance repositories. Is it NFS?
Thanks
On Wed, Oct 20, 2021 at 11:14 AM Joe Witt wrote:
>
> Jens,
>
> And to further narrow this down
>
> "I have

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Joe Witt
Jens, And to further narrow this down "I have a test flow, where a GenerateFlowfile has created 6x 1GB files (2 files per node) and next process was a hashcontent before it run into a test loop. Where files are uploaded via PutSFTP to a test server, and downloaded again and recalculated the

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Joe Witt
Jens, "After fetching a FlowFile-stream file and unpacked it back into NiFi I calculate a sha256. 1 minutes later I recalculate the sha256 on the exact same file. And got a new hash. That is what worry’s me. The fact that the same file can be recalculated and produce two different hashes, is very
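
SHA-256 is deterministic, so hashing identical bytes twice gives the same digest; a changed digest means the bytes that were read were not the same. A minimal Java sketch of such a repeat check, using only the JDK, is shown below. It is not how CryptographicHashContent is implemented, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for re-hashing the same content several times.
final class RepeatHashCheck {

    // Stream the input through SHA-256 in 8 KB reads and return lowercase hex.
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hash the same bytes `rounds` times; true only if every digest equals the first.
    static boolean stableAcrossReads(byte[] content, int rounds) {
        String first = sha256Hex(new ByteArrayInputStream(content));
        for (int i = 1; i < rounds; i++) {
            if (!first.equals(sha256Hex(new ByteArrayInputStream(content)))) {
                return false;
            }
        }
        return true;
    }
}
```

With an in-memory array the check always passes. Pointing the same loop at a stream backed by real storage is one way to separate "the hash function misbehaved" from "the storage returned different bytes on a later read".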

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Jens M. Kofoed
Dear Mark and Joe, I know my setup isn’t normal for many people. But if we only look at my receive side, which the last mails are about, everything is happening on the same NiFi instance. It is the same 3-node NiFi cluster. After fetching a FlowFile-stream file and unpacked it back into NiFi I

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Mark Payne
Jens, Thanks for sharing the images. I tried to set up a test to reproduce the issue. I’ve had it running for quite some time, through millions of iterations. I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of hundreds of MB). I’ve been unable to reproduce an

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-19 Thread Mark Payne
Jens, In the two provenance events, one showing a hash of dd4cc… and the other showing f6f0…, if you go to the Content tab, do they both show the same Content Claim? I.e., do the Input Claim / Output Claim show the same values for Container, Section, Identifier, Offset, and Size? Thanks