Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Hey Mark - Yeah, agreed. I'm moving some of the 15+ day old files out just because this is kind of an emergency. Yeah, that's not exactly "normal", but I have a new pipeline that batches up errors and the FlowFile is basically 0-bytes with attribute information regarding the error so they can be

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Mark Payne
Deleting the old files could certainly cause some problems. The weird thing is that it shows that you have 10,000+ FlowFiles, each of which is 0 bytes. Is that normal for your flow? Could you try running the following against your content repo:
find . -size +1M
find . | wc -l
Curious how
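
Editorial sketch, not part of the original message: the checks requested in this thread (total file count, files over 1 MB, files under an archive directory) can be gathered in one pass. The function name repo_summary is hypothetical, the -size +1M syntax assumes GNU or BSD find, and the -path pattern assumes archived content sits in directories named "archive" inside the repository.

```shell
# Sketch: summarize a content repository directory.
# Usage: repo_summary /path/to/content_repository   (defaults to the current directory)
repo_summary() {
  repo="${1:-.}"
  # All regular files under the repository.
  total=$(find "$repo" -type f | wc -l | tr -d ' ')
  # Regular files larger than 1 MiB (assumes GNU or BSD find for the M suffix).
  large=$(find "$repo" -type f -size +1M | wc -l | tr -d ' ')
  # Regular files that live under a directory named "archive" (assumed layout).
  archived=$(find "$repo" -type f -path '*/archive/*' | wc -l | tr -d ' ')
  echo "total files: $total"
  echo "files over 1M: $large"
  echo "archived files: $archived"
}
```

A repository holding thousands of files while "files over 1M" stays near zero would match the 0-byte FlowFiles described above, and "archived files: 0" would match the empty archive reported elsewhere in the thread.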

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Is it safe to manually remove some of the older files in the repository to keep our disk from filling up? On Wed, Jun 15, 2016 at 4:55 PM, Ricky Saltzer wrote: > Just a reminder, I just today noticed the "archive.enabled" option was > false and changed it to true. > > $

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Just a reminder, I just today noticed the "archive.enabled" option was false and changed it to true.
$ find . -type f -ls | grep archive | wc -l
0
On Wed, Jun 15, 2016 at 4:53 PM, Mark Payne wrote: > OK, thanks. It doesn't appear that it believes there is anything to >

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Mark Payne
OK, thanks. It doesn't appear that it believes there is anything to reclaim. Can you try going to your content repository and running: find . -type f -ls | grep archive Curious as to how much data it has archived. > On Jun 15, 2016, at 4:48 PM, Ricky Saltzer wrote: > > Oh

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Oh sorry! Trying again [1] https://gist.githubusercontent.com/rickysaltzer/b00196a3881c052df9b38b418722cd02/raw/279a1bc8c60530426732eb7b653de1f3f74574e2/gistfile1.txt On Wed, Jun 15, 2016 at 4:38 PM, Ricky Saltzer wrote: > I should also mention, I just realized that our

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Mark Payne
It is definitely best to try to keep those in sync, but that won't affect this, as the NCM isn't involved in the nodes' internal maintenance, etc. > On Jun 15, 2016, at 4:38 PM, Ricky Saltzer wrote: > > I should also mention, I just realized that our worker nodes are on

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Mark Payne
Ricky - can't get to that URL, unfortunately. Tells me "This site can't be reached". May be easier to just copy & paste those particular threads here. Thanks -Mark > On Jun 15, 2016, at 4:36 PM, Ricky Saltzer wrote: > > Looks like the threads are parked and waiting [1] >

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
I should also mention, I just realized that our worker nodes are on 0.5.1, and for some reason I missed updating the master from 0.4.0. I'm sure that is not helping. On Wed, Jun 15, 2016 at 4:36 PM, Ricky Saltzer wrote: > Looks like the threads are parked and waiting [1] > >

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Looks like the threads are parked and waiting [1] [1] http://github.mtv.cloudera.com/gist/ricky/7a5d89f2eeba58e2206d/raw/0e2b446ca049a8b5f27298c700ac709772d2847c/gistfile1.txt On Wed, Jun 15, 2016 at 4:33 PM, Joe Witt wrote: > thanks Ricky - then please take a look at

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Hey Joe - The NiFi web UI currently reads as:
Active threads: 3
Queued: 10,173 / 0 bytes
Connected nodes: 2 / 2
Stats last refreshed: 13:31:28 PDT
On Wed, Jun 15, 2016 at 4:29 PM, Joe Witt wrote: > And the data remains? If so that is an interesting data point I > think.

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Mark Payne
I do agree. Unfortunately, I was a bit off, apparently, when I said "an issue a while back." It turns out that the ticket was 1726 [1], which was fixed in 0.6.1. To determine if this is what is biting you, could you do a thread-dump (bin/nifi.sh dump thread-dump.txt) and then look in that file

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Joe Witt
And the data remains? If so, that is an interesting data point, I think. So, to Mark's point, how much data do you have actively queued up in the flow on that node? The number of objects you mention is 3273 files, corresponding to 825GB in the content repository. Does NiFi see those 825GB worth

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
I have two nodes in clustered mode. The other node, which isn't filling up, is my primary. I've actually already restarted NiFi a few times on the node which has the large repository. On Wed, Jun 15, 2016 at 4:22 PM, Joe Witt wrote: > Ricky, > > If you restart nifi and

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Joe Witt
Ricky, If you restart NiFi and then find that it cleans those things up, then I believe it is related to the defects corrected in the 0.5/0.6 timeframe. Is restarting an option for you at this time? Do you agree, Mark? Thanks Joe On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Hey Mark - Thanks for the quick reply! This is our production system so it's unfortunately running 0.4.0. There are currently 3273 files, with some files dating back to May 18th. The content repository itself is 825G. Ricky On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Mark Payne
Hey Ricky, The reclaim process is pretty much continuous. What version of NiFi are you running? I know there was an issue with this a while back that caused it not to clean up properly. Also, how much data & how many FlowFiles do you have queued up in your flow? Data won't be archived or

Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Hey guys - I recently discovered I didn't have my "archive.enabled" option set to true after my disk filled up to 95%. I enabled it and then set the retention period to 12 hours and 50% (default values). However, after restarting NiFi, I am not seeing any disk space reclaimed. I'm curious, is
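
Editorial note, not part of the original message: the settings described above most likely correspond to the following entries in conf/nifi.properties. This is a sketch that assumes the standard content repository property names; the values are the ones quoted in the message.

```properties
# Content repository archiving (values as described in the message above)
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
```

Changes to this file take effect only after NiFi is restarted, which matches the restart mentioned in the message.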