Hi all,
we have identified suboptimal behaviour in the way the linear store returns 
journal files to the Empty File Pool (EFP) in one particular use case. I would 
like to get feedback from the community on whether it would be worth 
implementing a better way of returning the files to the EFP.

Currently, the only way to return a journal file to the EFP while the broker is 
running is a check performed after every 100th dequeue on that queue/journal: 
while the _oldest_ journal file has no valid enqueue record, it is returned to 
the EFP and removed from the journal. This has one surprising consequence when 
purging a queue with many messages:

- you enqueue e.g. 1000099 messages into journal files numbered, let's say, 
from 1 to 100
- you dequeue all of them:
  - dequeueing a message means writing a dequeue record to the journal and 
discarding the corresponding enqueue record. Dequeueing 1000000 messages thus 
creates, let's say, 10 new journal files (numbered 101 to 110) filled with 
dequeue records only
  - every 100th dequeue checks whether the oldest journal file can be returned 
to the EFP. When the millionth message is dequeued, the check sees that the 
100th journal file still has 99 remaining enqueues, so it does not return it to 
the EFP, and the next 9 or 10 journal files holding only dequeue records are not 
checked at all. The last 99 dequeues never reach the next 100th dequeue, so no 
further check is made
- at the end, we have an empty queue backed by 10 journal files, while one file 
would be sufficient.

Note that the above is _not_ a journal file leak: the store only postpones 
returning the files to the EFP until the next 100th dequeue arrives. It only 
increases disk space utilization somewhat.

Also note that the use case assumes the journal gets many dequeue events in a 
row (with no or very few enqueues in between), and that no enqueue+dequeue 
activity follows for a longer time (in the scenario above, the next dequeue 
would move the 10 journal files to the EFP). In other cases, at most one 
journal file has its return to the EFP postponed, and only for a while.

It would be possible to implement a time-based trigger that would, for example, 
check all journal files in all journals to see whether they can be returned to 
the EFP. The question is how valuable that would be compared to the complexity 
it adds to the code. My own attitude is "disk space is cheap, don't implement 
it", but if somebody has a solid use case where such a feature would be much 
appreciated, please respond.
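
A time-based trigger could look roughly like the following sketch. The class 
names, the `interval` parameter and the representation of a journal as a list 
of per-file valid-enqueue counts are hypothetical. The sweep reuses the 
oldest-first rule, on the assumption that a newer file cannot be dropped while 
older files remain (its dequeue records may refer to enqueues in them). 
Synchronization with the enqueue/dequeue paths, which a real store would need, 
is left out and is part of the added complexity.

```python
import threading
from collections import deque


class Journal:
    """Hypothetical minimal model: each entry is the number of still-valid
    enqueue records in one journal file, oldest file first."""

    def __init__(self, live_counts):
        self.files = deque(live_counts)

    def return_free_head_files(self):
        returned = 0
        # Keep at least one file (assumed to be the one currently written to).
        while len(self.files) > 1 and self.files[0] == 0:
            self.files.popleft()
            returned += 1
        return returned


class PeriodicEfpSweeper:
    """Hypothetical time-based trigger: every `interval` seconds, run the
    oldest-first check on all journals, regardless of dequeue activity."""

    def __init__(self, journals, interval):
        self.journals = journals
        self.interval = interval
        self._lock = threading.Lock()  # only serializes sweeps with each other
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def sweep_once(self):
        with self._lock:
            return sum(j.return_free_head_files() for j in self.journals)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.interval):
            self.sweep_once()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

For a journal left in the end state of the purge scenario (all files free), a 
single sweep would shrink it to one file; a journal whose oldest file still 
holds enqueues would be left as it is from that file onwards.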


Kind regards,
Pavel Moravec


