weirdness with linux loop devices and mounts

An investigation into the causes of data corruption when a block device is written to while also being mounted, and a request for help if anyone has Linux internals knowledge.

On Linux, "loop devices" are block devices whose backing data is a regular file or another block device, as opposed to ordinary block devices that are backed by a physical device like a hard drive. They are sometimes also called "loopback devices", but they have nothing to do with the networking sense of "loopback", as in a "loopback interface".

Having a block device that's backed by a regular file can be very useful. The kernel cannot mount a regular file directly as a filesystem, even if the bytes of the file are exactly the same as a disk, but it can mount a loop device. (The loop option to mount only sets up a loop device behind the scenes.) This is frequently used for operations such as creating a filesystem with the mkfs family of utilities, or editing a disk image before writing it to physical media.

In my specific case that prompted this article, I was using a regular file as a disk image for QEMU, but also wanted to mount a partition in it as a filesystem. The process of doing so is fairly simple:

losetup --partscan /dev/loop0 qemu.img
mount --mkdir --read-write /dev/loop0p1 /mnt/qemu -o umask=0000

These commands first create a loop device at /dev/loop0 backed by qemu.img, then mount the first partition of that device at the mount point /mnt/qemu. The umask=0000 option gives every file on the mount 777 permissions.
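Hardcoding /dev/loop0 fails if that device is already in use. As a variation, here is a minimal sketch that lets losetup pick a free device. The helper name mount_image is hypothetical, and the sketch assumes the image has a first partition and that it runs as root.

```shell
# Sketch: attach IMAGE to the first free loop device and mount partition 1.
# mount_image is a hypothetical helper name; it needs root to actually run.
# Note: plain sh has no local variables, so these names are global.
mount_image() {
    image=$1
    mountpoint=$2
    # --find picks the first unused loop device, --show prints its path.
    loopdev=$(losetup --find --show --partscan "$image") || return 1
    if ! mount --mkdir --read-write "${loopdev}p1" "$mountpoint" -o umask=0000; then
        # Do not leave a dangling loop device behind if the mount fails.
        losetup --detach "$loopdev"
        return 1
    fi
    # Print the chosen device so the caller can detach it later.
    echo "$loopdev"
}

# Usage (as root):
#   loopdev=$(mount_image qemu.img /mnt/qemu)
```

The function prints the device path because the caller needs it later for umount and losetup --detach.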

the problems

My problems began when I wanted to iterate faster: editing files on the mount point, then running operations in QEMU that read those files and wrote back to the disk. Edits made through the mount point showed up properly in QEMU, but QEMU's disk writes were not reflected on the mount point. In fact, I frequently saw data corruption when the disk image was modified by QEMU while it was mounted.

In hindsight, this is somewhat obvious. It is always advised to unmount any block devices before operating on them to avoid data corruption, and this is effectively the same scenario. But at the time, I was not thinking about that, since I assumed that the loop device was directly connected to the file. The TL;DR of this post then is as follows:

Always unmount a block device before operating on the device itself or its backing data. This includes loop devices, which are just a special type of block device.
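A guard for this rule could look roughly like the following sketch. Both function names are hypothetical. The first reads /proc/self/mounts, whose first field is the mount source; the second relies on losetup --associated printing nothing when no loop device uses the file.

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE appears as a mount source. MOUNTS_FILE defaults to
# /proc/self/mounts and is only a parameter to make the check easy to test.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/self/mounts}"
}

# has_loop_device FILE
# Succeeds if at least one loop device is currently backed by FILE.
has_loop_device() {
    [ -n "$(losetup --associated "$1")" ]
}

# Usage:
#   if is_mounted /dev/loop0p1 || has_loop_device qemu.img; then
#       echo "qemu.img is in use, detach it before writing to it" >&2
#   fi
```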

However, my journey to figuring this out was not that simple, and along the way I found some interesting information that I thought would be fun to share.

Observation 1: writes to the backing file are not reflected in the loop device

Once I noticed that the mount point didn't seem to have the same data as the backing file, I first checked the raw bytes of the file. I found that qemu.img had the expected data, but /dev/loop0 had the old data. I found this quite surprising, as it violated my assumption that the loop device was simply a "block view" of the backing file or something similar. The kernel was clearly holding some sort of cached view of the block device as a whole in memory, and then writing it to the backing file when changes were made.

Here I tried several options to losetup such as --direct-io, but none made a difference. Nothing could cause the loop device to update when the file updated. This sort of put a halt to my plans, as I really relied on being able to write to the backing file at the same time. However, after I shared my discovery with a friend, she suggested pointing QEMU at the loop device instead of the backing file. Unfortunately this resulted in

Observation 2: writes to a block device are not reflected in a mount point

This was slightly less surprising than Observation 1, as I knew that mounts were more complicated than simply viewing bytes of a file. I have implemented the FAT16 filesystem myself, and in doing so I know there's a lot of state involved and it would not be workable to have to re-fetch everything from disk every time any read or write happened.
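For anyone who wants to reproduce the byte-level check from Observation 1, here is a rough sketch. The helper names and the 1 MiB prefix length are my own assumptions, picked only for illustration. It hashes the start of two paths and compares the results, so it works on the backing file and the loop device alike.

```shell
# prefix_hash PATH BYTES
# Prints the SHA-256 of the first BYTES bytes of PATH.
prefix_hash() {
    head -c "$2" "$1" | sha256sum | cut -d ' ' -f 1
}

# same_prefix A B BYTES
# Succeeds if the first BYTES bytes of A and B are identical.
same_prefix() {
    [ "$(prefix_hash "$1" "$3")" = "$(prefix_hash "$2" "$3")" ]
}

# Usage (reading /dev/loop0 usually needs root):
#   same_prefix qemu.img /dev/loop0 1048576 || echo "loop device is stale"
```

Comparing a fixed-length prefix instead of the whole file sidesteps any size difference between the file and the device, at the cost of missing changes beyond that prefix.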

Workarounds?

Unfortunately, there do not seem to be any useful workarounds to these problems. The general issue is that a "higher level" view keeps its own state and does not expect a "lower level" to get out of sync. If a device is mounted, it must only be accessed via the mount point. If there's a loop device backed by a file, it must only be accessed via the loop device. While you might think that the sync command would cause higher level views to update, that does not seem to be the case. sync only flushes the higher level views down onto the lower level data, which will cause data corruption if the backing data has been modified in an incompatible way in the meantime.

The closest thing to a workaround I have found is unmounting the block device and removing the loop device before writing to the backing file, and then re-creating the loop device and remounting. This is not helpful to me, however, as the process invalidates any files that were open from the mount point, meaning that after every remount I have to re-open every file I cared about.
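That cycle can be wrapped up roughly as follows. This is only a sketch: with_image_detached is a hypothetical helper name, it reuses the same losetup and mount options as earlier, and it needs root. It does nothing about the open-file problem described above.

```shell
# with_image_detached IMAGE LOOPDEV MOUNTPOINT COMMAND...
# Unmounts and detaches the image, runs COMMAND while nothing else holds a
# view of the backing file, then attaches and mounts it again.
with_image_detached() {
    image=$1
    loopdev=$2
    mountpoint=$3
    shift 3
    umount "$mountpoint" || return 1
    losetup --detach "$loopdev" || return 1
    # The backing file is now safe to modify directly.
    "$@"
    status=$?
    # Re-create the loop device and remount, even if COMMAND failed.
    losetup --partscan "$loopdev" "$image" &&
        mount --mkdir --read-write "${loopdev}p1" "$mountpoint" -o umask=0000 ||
        return 1
    # Report the exit status of COMMAND itself.
    return "$status"
}

# Usage (as root; ./run-qemu.sh is a hypothetical script that writes to qemu.img):
#   with_image_detached qemu.img /dev/loop0 /mnt/qemu ./run-qemu.sh
```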

If anyone knows of a way to forcibly synchronize a loop device from its backing file, or a filesystem from its block device, please let me know. That information would be greatly appreciated.