Re: [Xen-devel] [PATCH 11/14] fuzz/x86_emulate: Make input more compact
On 08/25/2017 06:59 PM, Andrew Cooper wrote: > On 25/08/17 17:43, George Dunlap wrote: >> At the moment, AFL reckons that for any given input, 87% of it is >> completely irrelevant: that is, it can change it as much as it wants >> but have no impact on the result of the test; and yet it can't remove >> it. >> >> This is largely because we interpret the blob handed to us as a large >> struct, including CR values, MSR values, segment registers, and a full >> cpu_user_regs. >> >> Instead, modify our interpretation to have a "set state" stanza at the >> front. Begin by reading a byte; if it is lower than a certain >> threshold, set some state according to what byte it is, and repeat. >> Continue until the byte is above a certain threshold. >> >> This allows AFL to compact any given test case much smaller; to the >> point where now it reckons there is not a single byte of the test file >> which becomes irrelevant. Testing have shown that this option both >> allows AFL to reach coverage much faster, and to have a total coverage >> higher than with the old format. >> >> Make this an option (rather than a unilateral change) to enable >> side-by-side performance comparison of the old and new formats. >> >> Signed-off-by: George Dunlap> > I continue to think this is a bad idea. You are taking a genuine > problem and adding a complicated algorithm to try and fool alf, rather > than fixing the problem. > > The reason 87% of input is irrelevant is because it really is. The > input state is full of 64bit values being used for a one or two bits > which we ever look at. My take on it is this. The state struct is very large -- I think it's around 500 bytes without the FPU state, and over 1k with it. Nearly every bit of the state has *some* instruction that interacts with it. But in order for AFL to notice that, it has to find a combination such that the instruction and the state actually interact. For any given tuple, changing the vast majority of the state will have absolutely no effect -- meaning that AFL's fuzzing is almost entirely wasted. AFL will spend a huge amount of time fuzzing bits of the state struct that cannot possibly have an effect on the outcome, since the instuctions in the test case don't interact with those bits at all. With the "compact" input interpretation, once AFL finds a tuple that correspond, changing any of the three is pretty much guaranteed to have some effect; finding a set that correspond should be much easier, and in any case fuzzing it should be a lot faster. As such, I don't think I'm trying to "fool" AFL: I'm just giving it tools to search more effectively. > The solution to this problem is remove the irrelevant information from > fuzz_corpus. I already started doing this with the alf-fast work for > the Xen 4.9 release, but I've basically been doing security work ever > since and haven't had time to continue it. > > For the record, this hunk is how I intended to continue the work: > > diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c > b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c > index 74e8c85..dafe435 100644 > --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c > +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c > @@ -24,7 +24,27 @@ > /* Layout of data expected as fuzzing input. */ > struct fuzz_corpus > { > -unsigned long cr[5]; > +/* %cr0 */ > +bool pe:1; > +bool mp:1; > +bool em:1; > +bool ts:1; > +bool pg:1; > + > +/* %cr4 */ > +bool vme:1; > +bool pvi:1; > +bool tsd:1; > +bool osfxsr:1; > +bool osxmmexcpt:1; > +bool umip:1; > +bool fsgsbase:1; > +bool osxsave:1; > + > +/* EFER */ > +bool sce:1; > +bool lme:1; I'm 98% sure that the handful of bits in cr0 and cr4 aren't the problem. AFL isnt' looking at *bits* in the cr0 register, it's looking at that as an interger, and I'm sure that it notices, "I add X to this and it changes stuff, so it must be used." The problem is these: > uint64_t msr[MSR_INDEX_MAX]; > struct cpu_user_regs regs; > struct segment_register segments[SEG_NUM]; And in particular this one: char fxsave[512] __attribute__((aligned(16))); Your proposal doesn't change the fact that for any given test case, the vast majority of state won't interact with the instructions it contains at all. I've still got the scripts and analysis stuff, so if you send me a patch you think is better, I'm happy to run it through and add it to the comparison. In any case, prediction based on expert intuition is good, but empirical evidence is better*. The graph I sent clearly shows that the 'compact' format, with unlimited number of instructions, generates far more map coverage in far less time than any other combination. Unless you can find some flaw in my methodology, or an alternate interpretation of the data, I think we should go with the
Re: [Xen-devel] [PATCH 11/14] fuzz/x86_emulate: Make input more compact
On 25/08/17 17:43, George Dunlap wrote: > At the moment, AFL reckons that for any given input, 87% of it is > completely irrelevant: that is, it can change it as much as it wants > but have no impact on the result of the test; and yet it can't remove > it. > > This is largely because we interpret the blob handed to us as a large > struct, including CR values, MSR values, segment registers, and a full > cpu_user_regs. > > Instead, modify our interpretation to have a "set state" stanza at the > front. Begin by reading a byte; if it is lower than a certain > threshold, set some state according to what byte it is, and repeat. > Continue until the byte is above a certain threshold. > > This allows AFL to compact any given test case much smaller; to the > point where now it reckons there is not a single byte of the test file > which becomes irrelevant. Testing have shown that this option both > allows AFL to reach coverage much faster, and to have a total coverage > higher than with the old format. > > Make this an option (rather than a unilateral change) to enable > side-by-side performance comparison of the old and new formats. > > Signed-off-by: George DunlapI continue to think this is a bad idea. You are taking a genuine problem and adding a complicated algorithm to try and fool alf, rather than fixing the problem. The reason 87% of input is irrelevant is because it really is. The input state is full of 64bit values being used for a one or two bits which we ever look at. The solution to this problem is remove the irrelevant information from fuzz_corpus. I already started doing this with the alf-fast work for the Xen 4.9 release, but I've basically been doing security work ever since and haven't had time to continue it. For the record, this hunk is how I intended to continue the work: diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c index 74e8c85..dafe435 100644 --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c @@ -24,7 +24,27 @@ /* Layout of data expected as fuzzing input. */ struct fuzz_corpus { -unsigned long cr[5]; +/* %cr0 */ +bool pe:1; +bool mp:1; +bool em:1; +bool ts:1; +bool pg:1; + +/* %cr4 */ +bool vme:1; +bool pvi:1; +bool tsd:1; +bool osfxsr:1; +bool osxmmexcpt:1; +bool umip:1; +bool fsgsbase:1; +bool osxsave:1; + +/* EFER */ +bool sce:1; +bool lme:1; + uint64_t msr[MSR_INDEX_MAX]; struct cpu_user_regs regs; struct segment_register segments[SEG_NUM]; @@ -50,6 +70,9 @@ struct fuzz_state /* Emulation ops, some of which are disabled based on corpus->options. */ struct x86_emulate_ops ops; + +unsigned long cr0, cr2, cr3, cr4, cr8; +uint64_t efer; }; /* Which drops loads of useless bits out of AFLs view. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH 11/14] fuzz/x86_emulate: Make input more compact
At the moment, AFL reckons that for any given input, 87% of it is completely irrelevant: that is, it can change it as much as it wants but have no impact on the result of the test; and yet it can't remove it. This is largely because we interpret the blob handed to us as a large struct, including CR values, MSR values, segment registers, and a full cpu_user_regs. Instead, modify our interpretation to have a "set state" stanza at the front. Begin by reading a byte; if it is lower than a certain threshold, set some state according to what byte it is, and repeat. Continue until the byte is above a certain threshold. This allows AFL to compact any given test case much smaller; to the point where now it reckons there is not a single byte of the test file which becomes irrelevant. Testing have shown that this option both allows AFL to reach coverage much faster, and to have a total coverage higher than with the old format. Make this an option (rather than a unilateral change) to enable side-by-side performance comparison of the old and new formats. Signed-off-by: George Dunlap--- I'll reply to this e-mail with a graph of some tests I ran. CC: Ian Jackson CC: Wei Liu CC: Andrew Cooper CC: Jan Beulich --- tools/fuzz/x86_instruction_emulator/afl-harness.c | 13 +++- tools/fuzz/x86_instruction_emulator/fuzz-emul.c | 94 --- 2 files changed, 97 insertions(+), 10 deletions(-) diff --git a/tools/fuzz/x86_instruction_emulator/afl-harness.c b/tools/fuzz/x86_instruction_emulator/afl-harness.c index 79f8aec653..12b3765dcc 100644 --- a/tools/fuzz/x86_instruction_emulator/afl-harness.c +++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c @@ -4,6 +4,7 @@ #include #include #include +#include extern int LLVMFuzzerInitialize(int *argc, char ***argv); extern int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size); @@ -12,6 +13,8 @@ extern unsigned int fuzz_minimal_input_size(void); #define INPUT_SIZE 4096 static uint8_t input[INPUT_SIZE]; +extern bool opt_compact; + int main(int argc, char **argv) { size_t size; @@ -22,13 +25,17 @@ int main(int argc, char **argv) setbuf(stdin, NULL); setbuf(stdout, NULL); +opt_compact = true; + while ( 1 ) { enum { OPT_MIN_SIZE, +OPT_COMPACT, }; static const struct option lopts[] = { { "min-input-size", no_argument, NULL, OPT_MIN_SIZE }, +{ "compact", required_argument, NULL, OPT_COMPACT }, { 0, 0, 0, 0 } }; int c = getopt_long_only(argc, argv, "", lopts, NULL); @@ -43,8 +50,12 @@ int main(int argc, char **argv) exit(0); break; +case OPT_COMPACT: +opt_compact = atoi(optarg); +break; + case '?': -printf("Usage: %s $FILE [$FILE...] | [--min-input-size]\n", argv[0]); +printf("Usage: %s [--compact=0|1] $FILE [$FILE...] | [--min-input-size]\n", argv[0]); exit(-1); break; diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c index 89d1714125..48b02f2bf6 100644 --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c @@ -53,6 +53,15 @@ struct fuzz_state }; #define DATA_OFFSET offsetof(struct fuzz_state, corpus) +bool opt_compact; + +unsigned int fuzz_minimal_input_size(void) +{ +if ( opt_compact ) +return sizeof(unsigned long) + 1; +else +return DATA_OFFSET + 1; +} static inline int davail(struct fuzz_state *s, size_t size) { @@ -647,9 +656,81 @@ static void setup_state(struct x86_emulate_ctxt *ctxt) { struct fuzz_state *s = ctxt->data; -/* Fuzz all of the state in one go */ -if (!dread(s, s, DATA_OFFSET)) -exit(-1); +if ( !opt_compact ) +{ +/* Fuzz all of the state in one go */ +if (!dread(s, s, DATA_OFFSET)) +exit(-1); +return; +} + +/* Modify only select bits of state */ + +/* Always read 'options' */ +if ( !dread(s, >options, sizeof(s->options)) ) +return; + +while(1) { +uint16_t offset; + +/* Read 16 bits to decide what bit of state to modify */ +if ( !dread(s, , sizeof(offset)) ) +return; + +/* + * Then decide if it's "pointing to" different bits of the + * state + */ + +/* cr[]? */ +if ( offset < 5 ) +{ +if ( !dread(s, s->cr + offset, sizeof(*s->cr)) ) +return; +printf("Setting CR %d to %lx\n", offset, s->cr[offset]); +continue; +} + +offset -= 5; + +/* msr[]? */ +if ( offset < MSR_INDEX_MAX ) +{ +if (