An investigation into the causes of data corruption when a block device is written to while also being mounted, and a request for help if anyone has Linux internals knowledge.
On Linux, "loop devices" are block devices whose backing data is a regular file or another device, as opposed to normal block devices that are backed by a physical device like a hard drive. The name "loop device" should not be confused with the networking "loopback interface", even though some documentation also calls loop devices "loopback devices".
Having a block device that's backed by a regular file can be very useful. It is not possible to mount a regular file as a filesystem, even if the bytes of the file are exactly the same as a disk, but it is possible to mount a loop device as a filesystem. This is frequently used for operations such as creating a filesystem with the mkfs family of utilities, or editing a disk image before using it on some physical media.
In my specific case that prompted this article, I was using a regular file as a disk image for QEMU, but also wanted to mount a partition in it as a filesystem. The process of doing so is fairly simple:
losetup --partscan /dev/loop0 qemu.img
mount --mkdir --read-write /dev/loop0p1 /mnt/qemu -o umask=0000
The commands above first create a loop device at /dev/loop0 backed by qemu.img, and then mount the first partition on that device at the mount point /mnt/qemu with 777 permissions.
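As a rough sketch, the same setup can be wrapped so that it does not hard-code /dev/loop0: losetup's --find option picks the first unused loop device and --show prints its path. The helper name attach_image is made up for illustration, and it assumes the image has a first partition to mount, as above.

```shell
# Sketch only: attach_image is a hypothetical helper, not part of util-linux.
# It needs root, like the commands above.
attach_image() {
    image=$1
    mountpoint=$2
    # --find picks the first unused loop device, --show prints its path.
    loopdev=$(losetup --find --show --partscan "$image") || return 1
    # Mount the first partition with the same options as the original command.
    mount --mkdir --read-write "${loopdev}p1" "$mountpoint" -o umask=0000 || return 1
    # Print the chosen device so the caller can detach it later.
    echo "$loopdev"
}

# Usage (as root): attach_image qemu.img /mnt/qemu
```

The printed device path is worth keeping around, since it is what losetup --detach needs later.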
My problems began when I wanted to iterate faster: editing files on the mount point, then running operations in QEMU that read those files and wrote back to the disk. Edits made through the mount point showed up properly in QEMU, but QEMU's disk writes were not reflected on the mount point. In fact, I frequently saw data corruption when the disk image was modified from QEMU while it was mounted.
In hindsight, this is somewhat obvious. It is always advised to unmount any block devices before operating on them to avoid data corruption, and this is effectively the same scenario. But at the time, I was not thinking about that, since I assumed that the loop device was directly connected to the file. The TL;DR of this post then is as follows:
However, my journey to figure this out was not that simple, and along the way I found some interesting information that I thought would be fun to share.
Once I noticed that the mount point didn't seem to have the same data as the backing file, I first checked what the raw bytes of the file were. I found that qemu.img had the expected data, but /dev/loop0 had the old data. I found this quite surprising, as it violated my assumption that the loop device was simply a "block view" of the backing file or something similar. The kernel was clearly holding some sort of cached view of the block device as a whole in memory, and then writing it to the backing file when changes were made.
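One way to check for this kind of discrepancy is a byte-for-byte comparison of the two views with cmp. The sketch below is only an illustration: compare_views is a hypothetical helper name, and reading /dev/loop0 normally requires root.

```shell
# Sketch only: compare_views is a hypothetical helper name.
# Prints "in sync" when both paths hold identical bytes, "out of sync" otherwise.
compare_views() {
    status=0
    # cmp -s prints nothing; it exits 0 if identical, 1 if different, >1 on error.
    cmp -s "$1" "$2" || status=$?
    case $status in
        0) echo "in sync" ;;
        1) echo "out of sync" ;;
        *) echo "could not read $1 or $2" >&2; return 2 ;;
    esac
}

# Usage (as root): compare_views qemu.img /dev/loop0
```

An "out of sync" result after writing to qemu.img directly would match the behaviour described above, where the loop device keeps serving the old data.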
Here I tried several options to losetup such as --direct-io, but none made a difference. Nothing could cause the loop device to update when the file updated. This sort of put a halt to my plans, as I really relied on being able to write to the backing file at the same time. However, after sharing my discovery with a friend, she suggested pointing QEMU at the loop device instead of the backing file. Unfortunately this resulted in
This was slightly less surprising than Observation 1, as I knew that mounts were more complicated than simply viewing bytes of a file. I have implemented the FAT16 filesystem myself, and in doing so I know there's a lot of state involved and it would not be workable to have to re-fetch everything from disk every time any read or write happened.
Unfortunately there do not seem to be any useful workarounds to these problems. The general issue is that a "higher level" view keeps its own state and does not expect a "lower level" to change underneath it. If a device is mounted, it must only be accessed via the mount point. If there's a loop device backed by a file, it must only be accessed via the loop device. While you might think that the sync command would cause higher level views to update, that seems not to be the case. sync only flushes the higher level views down onto the lower level data, which will cause data corruption if the backing data has been modified in an incompatible way.
The closest thing to a workaround I have found is unmounting the block device and removing the loop device before writing to the backing file, and then remounting. This is not helpful to me however, as this process causes any files that were open from the mount point to become invalid, meaning after every remount I have to re-open every file I cared about.
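The teardown and rebuild described above can be sketched as a small wrapper. The name with_image_detached is hypothetical, the mount options are copied from the earlier commands, and the wrapped command stands in for whatever writes to the backing file (QEMU in this case). It does nothing about the open-file problem; it only keeps the steps in the right order.

```shell
# Sketch only: with_image_detached is a hypothetical helper; it needs root.
# Arguments: loop device, backing file, mount point, then the command to run.
with_image_detached() {
    loopdev=$1
    image=$2
    mountpoint=$3
    shift 3
    # Tear down from the top: the filesystem first, then the loop device.
    umount "$mountpoint" || return 1
    losetup --detach "$loopdev" || return 1
    # Run the command that writes to the backing file, remembering its status.
    status=0
    "$@" || status=$?
    # Rebuild from the bottom so both views start from the new file contents.
    losetup --partscan "$loopdev" "$image" || return 1
    mount --read-write "${loopdev}p1" "$mountpoint" -o umask=0000 || return 1
    return "$status"
}

# Usage (as root), with your usual QEMU command line after the mount point:
# with_image_detached /dev/loop0 qemu.img /mnt/qemu qemu-system-x86_64 -drive file=qemu.img,format=raw
```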
If anyone knows of a way to forcibly synchronize a loop device from its backing file, or a filesystem from its block device, please let me know. That information would be greatly appreciated.