On 3/25/20 1:11 PM, Jan Kiszka wrote:
> On 25.03.20 16:00, Tom Rini wrote:
>> On Wed, Mar 25, 2020 at 07:32:30AM +0100, Jan Kiszka wrote:
>>> On 20.03.20 19:21, Tom Rini wrote:
>>>> On Mon, Mar 16, 2020 at 08:09:53PM +0100, Jan Kiszka wrote:
>>>>> Hi all,
>>>>>
>>>>> => ls mmc 0:1 /usr/lib/linux-image-4.9.11-1.3.0-dirty
>>>>> CACHE: Misaligned operation at range [bdfff998, bdfffd98]
>>>>> CACHE: Misaligned operation at range [bdfff998, bdfffd98]
>>>>> CACHE: Misaligned operation at range [bdfff998, bdfffd98]
>>>>> CACHE: Misaligned operation at range [bdfff998, bdfffd98]
>>>>> invalid extent block
>>>>>
>>>>> I'm using master (50be9f0e1ccc) on the MCIMX7SABRE, defconfig.
>>>>>
>>>>> What could this be? The filesystem is fine from Linux POV.
>>>>
>>>> Use tune2fs -l and see if there's any new'ish features enabled that we
>>>> need some sort of check-and-reject for would be my first guess.
>>>>
>>>
>>> Here are the reported feature flags:
>>>
>>> has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg
>>> sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
>>
>> Of that, only metadata_csum means that you can't write to that image,
>> but you're just trying to read and that should be fine. Can you go back
>> in time a little and see if this problem persists or if it's been
>> introduced of late? Or recreate it on other platforms/SoCs? Thanks!
>>
>
> Bisected, regression of d5aee659f217 ("fs: ext4: cache extent data").
> Reverting this commit over master resolves the issue.
>
> Any idea what could be wrong? What I noticed is that the extent has a
> zeroed magic when things go wrong, so maybe it is falsely considered to
> be cached?

This is puzzling. I took another look at that patch and I don't see
anything wrong. My guess would be:

- Some unrelated memory corruption bug was exposed simply because this
  patch uses dynamic memory or stack slightly differently than before.

- Something writes to the cached block, whereas the cache code assumes
  the buffer is read-only.

The cache metadata exists on the stack and so only lasts for the
duration of read_allocated_block() or ext4fs_read_file(), so there's no
issue with re-using the cache across different devices, or persisting
across an ext4 write operation or anything like that.

Is this easy to reproduce; is there a small disk image that shows the
problem?
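
To make the second guess concrete, here is a minimal standalone sketch of
how a stack-scoped block cache could hand back a zeroed magic if a caller
scribbles on the buffer it was given. This is not the code from
d5aee659f217: struct demo_cache, demo_cache_read(), demo_disk_read() and
the 16-byte block size are all hypothetical names and values made up for
illustration. Only the extent header magic, 0xF30A, is the real ext4
value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BLOCK_SIZE 16      /* hypothetical, tiny for the demo */
#define DEMO_MAGIC      0xF30A  /* ext4 extent header magic */

/* Hypothetical cache metadata; it lives on the caller's stack. */
struct demo_cache {
	uint8_t *buf;    /* heap copy of the last block read, NULL if empty */
	uint32_t block;  /* block number currently held in buf */
};

/* Fake "disk": every block starts with the little-endian magic. */
static void demo_disk_read(uint32_t block, uint8_t *dst)
{
	memset(dst, 0, DEMO_BLOCK_SIZE);
	dst[0] = DEMO_MAGIC & 0xff;
	dst[1] = DEMO_MAGIC >> 8;
	dst[2] = (uint8_t)block;
}

/* Return the cached buffer on a hit; otherwise refill it from "disk". */
static uint8_t *demo_cache_read(struct demo_cache *c, uint32_t block)
{
	if (c->buf && c->block == block)
		return c->buf;  /* hit: contents are trusted, not re-read */

	free(c->buf);
	c->buf = malloc(DEMO_BLOCK_SIZE);
	if (!c->buf)
		return NULL;
	demo_disk_read(block, c->buf);
	c->block = block;
	return c->buf;
}

static uint16_t demo_magic(const uint8_t *blk)
{
	return (uint16_t)(blk[0] | (blk[1] << 8));
}

/*
 * Read block 7 twice through the cache. If 'scribble' is non-zero, the
 * caller overwrites the magic in between, violating the read-only
 * assumption. Returns 1 if the second lookup sees a bad magic.
 */
int demo_stale_hit(int scribble)
{
	struct demo_cache cache = { NULL, 0 };
	uint8_t *blk = demo_cache_read(&cache, 7);
	int bad;

	assert(blk && demo_magic(blk) == DEMO_MAGIC);
	if (scribble)
		memset(blk, 0, 2);  /* caller writes into the cached block */

	blk = demo_cache_read(&cache, 7);  /* cache hit, no re-read */
	bad = demo_magic(blk) != DEMO_MAGIC;

	free(cache.buf);
	return bad;
}
```

In this sketch demo_stale_hit(0) sees a valid magic on both lookups, while
demo_stale_hit(1) gets the zeroed magic back on the cache hit, which would
look a lot like the symptom you describe. Whether anything in the real
code path actually writes to the buffer is exactly what would need
checking.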