Re: [Qemu-devel] Profiling Qemu for speed?

2005-04-18 Daniel J Guinan
This conversation, below, is very interesting. It is precisely this part of QEMU that fascinates me and potentially holds the most promise for performance gains. I have even imagined using a genetic algorithm to discover optimal block sizes and instruction reordering and whatnot. This could be

2005-04-18 Ian Rogers
There are some code sequences that are quite common, for example compare followed by branch. A threaded decoder tends to look like:

    ...
    // do some work
    load instruction
    mask out opcode
    address_of_decoder = load decoder_lookup[opcode]
    goto *address_of_decoder

but if you say compare and branch are
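For anyone who wants something compilable, the decoder pseudocode in this message maps onto plain C roughly as follows. This is only a sketch under stated assumptions: the toy instruction encoding (opcode in the low 8 bits, an immediate in bits 8-15), the opcodes, and the handler names such as `op_load` and `op_add` are all hypothetical, not QEMU's. It also swaps the `goto *address_of_decoder` (GCC's labels-as-values extension) for an indirect call through a table of function pointers, which keeps the example in portable C while preserving the load / mask / look up / jump sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy encoding: bits 0-7 = opcode, bits 8-15 = immediate. */
#define OPCODE_MASK 0xFFu
#define IMM(insn) ((int32_t)(((insn) >> 8) & 0xFFu))

typedef struct {
    int32_t acc;   /* single accumulator register */
    size_t pc;     /* index of the next instruction */
    int halted;
} cpu_state;

typedef void (*decoder_fn)(cpu_state *cpu, uint32_t insn);

/* Hypothetical handlers, one per opcode. */
static void op_halt(cpu_state *cpu, uint32_t insn) { (void)insn; cpu->halted = 1; }
static void op_load(cpu_state *cpu, uint32_t insn) { cpu->acc = IMM(insn); }
static void op_add(cpu_state *cpu, uint32_t insn)  { cpu->acc += IMM(insn); }
/* Unknown opcodes simply stop the toy machine. */
static void op_undef(cpu_state *cpu, uint32_t insn) { (void)insn; cpu->halted = 1; }

/* One entry per possible 8-bit opcode. */
static decoder_fn decoder_lookup[256];

static void init_decoder_lookup(void) {
    for (size_t i = 0; i < 256; i++)
        decoder_lookup[i] = op_undef;
    decoder_lookup[0] = op_halt;
    decoder_lookup[1] = op_load;
    decoder_lookup[2] = op_add;
}

/* Dispatch loop mirroring the pseudocode: load, mask, look up, jump. */
static int32_t run(const uint32_t *code, size_t len) {
    cpu_state cpu = {0, 0, 0};
    while (!cpu.halted) {
        assert(cpu.pc < len);                    /* stay inside the program */
        uint32_t insn = code[cpu.pc++];          /* load instruction */
        uint32_t opcode = insn & OPCODE_MASK;    /* mask out opcode */
        decoder_fn address_of_decoder = decoder_lookup[opcode];
        address_of_decoder(&cpu, insn);          /* indirect transfer of control */
    }
    return cpu.acc;
}
```

A caller would run `init_decoder_lookup()` once and then pass a program such as `{ (5u << 8) | 1u, (7u << 8) | 2u, 0u }` to `run`, which loads 5, adds 7 and halts with 12 in the accumulator. Every guest instruction pays for one table load and one indirect branch, which is the cost a combined compare-and-branch handler would try to amortize.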

2005-04-18 Daniel J Guinan
I was up until 3:00am studying Qemu, and I came to the conclusion that it doesn't make sense to try speeding up the output code, at least not yet. A peephole optimizer or hand-coded sequences made to handle common combinations of instructions would lead to the problems discussed here:
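Separately from the linked discussion, the "hand-coded sequences made to handle common combinations of instructions" can be pictured as a peephole pass that fuses a compare immediately followed by a conditional branch into one combined operation. This is a hypothetical sketch, not QEMU's micro-op set: the names `OP_CMP`, `OP_BEQ`, `OP_CMP_BEQ` and the `micro_op` layout are invented for illustration, and branch targets are assumed to be guest addresses rather than indices into the op list, so removing ops does not invalidate them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical micro-ops; OP_CMP_BEQ is the fused "superinstruction". */
typedef enum { OP_ADD, OP_CMP, OP_BEQ, OP_CMP_BEQ } op_kind;

typedef struct {
    op_kind kind;
    int arg0, arg1, arg2;  /* CMP/ADD: registers; BEQ: arg0 = target address */
} micro_op;

/* One peephole pass over ops[]: every CMP directly followed by BEQ is
   replaced by a single CMP_BEQ. Rewrites in place; returns the new length. */
static size_t fuse_cmp_beq(micro_op *ops, size_t n) {
    assert(ops != NULL || n == 0);
    size_t out = 0;  /* write index, never ahead of the read index i */
    for (size_t i = 0; i < n; i++) {
        if (ops[i].kind == OP_CMP && i + 1 < n && ops[i + 1].kind == OP_BEQ) {
            micro_op fused = { OP_CMP_BEQ, ops[i].arg0, ops[i].arg1, ops[i + 1].arg0 };
            ops[out++] = fused;
            i++;  /* skip the branch that was just absorbed */
        } else {
            ops[out++] = ops[i];
        }
    }
    return out;
}
```

The pass only matches the exact adjacent pair; anything that separates the compare from the branch (or a branch into the middle of the pair) defeats it, which is one reason such hand-coded combinations tend to cover fewer cases than hoped.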

2005-04-18 Ian Rogers
Christian MICHON wrote:
> Months ago I did gcc/FDO with an XP/lite installation as a repetitive task :) I did not improve the timings, despite all the effort.

Could this be down to the tables used to find the translators/generators? Are they constant? Is it possible to make them amenable to

2005-04-17 André Braga
The problem with table lookups (I'm assuming you're talking about function pointer vectors) is that they *destroy* the spatial locality of reference that you could otherwise attain by having a series of if-then-else instructions and some clever instruction prefetching mechanism on modern processors...
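To make the two code shapes being contrasted concrete, here is the same three-way dispatch written both ways: once through a vector of function pointers and once as an if-then-else chain with the work inlined next to the tests. The opcodes and handlers are hypothetical. Which version runs faster depends on the compiler, the CPU's branch predictors and caches, and the real opcode mix, so this illustrates the structure only and is not a benchmark.

```c
#include <assert.h>

typedef int (*handler_fn)(int x);

/* Hypothetical handlers; in a real emulator these could be scattered
   anywhere in the text segment. */
static int h_inc(int x) { return x + 1; }
static int h_dbl(int x) { return x * 2; }
static int h_neg(int x) { return -x; }

/* Table lookup: a load from a constant vector, then one indirect call. */
static handler_fn const handlers[3] = { h_inc, h_dbl, h_neg };

static int dispatch_table(int op, int x) {
    assert(op >= 0 && op < 3);
    return handlers[op](x);
}

/* if-then-else chain: a sequence of compare-and-branch tests, with each
   case's code laid out right after its test. */
static int dispatch_chain(int op, int x) {
    if (op == 0)
        return x + 1;
    else if (op == 1)
        return x * 2;
    else {
        assert(op == 2);
        return -x;
    }
}
```

The table version costs the same for every opcode but ends in an indirect branch; the chain uses only direct conditional branches but gets slower for opcodes tested late, so it tends to pay off when a few opcodes dominate.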

2005-04-17 Karl Magdsick
Ideally, we could force gcc to implement switch statements as indirect jumps with jump tables inline with the code. However, this may not be possible. I think Nathaniel was just saying that gcc is likely generating several hundred sequential if-else blocks for large switch statements. This
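Rather than guessing how gcc lowers a given switch, one can compile a small test with `gcc -O2 -S` and read the assembly, then rebuild with `-fno-jump-tables` to see the compare-and-branch form side by side. The two functions below are hypothetical test cases, not QEMU code. The comments describe gcc's usual behaviour, but the actual choice depends on the target, the optimization level and gcc's internal heuristics (such as its case-values threshold), so the assembly is the authority.

```c
#include <assert.h>

/* Dense case values 0..7 with different work per case: gcc will usually
   emit a bounds check plus an indirect jump through a table here. */
static int decode_dense(unsigned op, int x) {
    switch (op) {
    case 0: return x + 1;
    case 1: return x * 2;
    case 2: return x - 3;
    case 3: return x ^ 5;
    case 4: return x / 2;
    case 5: return x % 7;
    case 6: return -x;
    case 7: return x * x;
    default: return 0;
    }
}

/* Sparse case values: a table spanning 1..10000 would be mostly empty,
   so gcc will usually emit a short sequence or tree of compares. */
static int decode_sparse(unsigned op, int x) {
    switch (op) {
    case 1:     return x + 1;
    case 100:   return x * 2;
    case 10000: return x - 3;
    default:    return 0;
    }
}
```

Both functions compute the same results whichever lowering gcc picks; only the generated control flow differs, which is exactly what the `-S` output lets you inspect.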