[Qemu-devel] profiling qemu

2012-02-14 Thread Artyom Tarasenko
On an x86_64 host, sparc64 emulation feels considerably slower than sparc32. I tried to find out what can be optimized, and here are some questions. First of all, it's not clear how to profile with the current git: build-prof $ ../qemu/configure --target-list=sparc64-softmmu --enable-gprof

Re: [Qemu-devel] profiling qemu

2012-02-14 Thread Lluís Vilanova
Artyom Tarasenko writes: [...] QEMU 1.0.50 monitor - type 'help' for more information (qemu) profile unknown command: 'profile' (qemu) info profile async time 38505498320 (38.505) qemu time 35947093161 (35.947) Is there a way to find out more? Command info jit also has some information

Re: [Qemu-devel] profiling qemu

2012-02-14 Thread Laurent Desnogues
2012/2/14 Lluís Vilanova vilan...@ac.upc.edu: Artyom Tarasenko writes: [...] Here it looks like compute_all_sub and compute_all_sub_xcc are good candidates for optimizing: together they take the same amount of time as cpu_sparc_exec. I guess both operations would be trivial in the x86_64

Re: [Qemu-devel] profiling qemu

2012-02-14 Thread Artyom Tarasenko
2012/2/14 Lluís Vilanova vilan...@ac.upc.edu: Artyom Tarasenko writes: [...] QEMU 1.0.50 monitor - type 'help' for more information (qemu) profile unknown command: 'profile' (qemu) info profile async time 38505498320 (38.505) qemu time 35947093161 (35.947) Is there a way to find out

Re: [Qemu-devel] profiling qemu

2012-02-14 Thread Laurent Desnogues
On Tue, Feb 14, 2012 at 4:15 PM, Artyom Tarasenko atar4q...@gmail.com wrote: 2012/2/14 Laurent Desnogues laurent.desnog...@gmail.com: 2012/2/14 Lluís Vilanova vilan...@ac.upc.edu: Artyom Tarasenko writes: [...] Here it looks like compute_all_sub and compute_all_sub_xcc are good candidates

Re: [Qemu-devel] profiling qemu

2012-02-14 Thread Lluís Vilanova
Artyom Tarasenko writes: [...] Here it looks like compute_all_sub and compute_all_sub_xcc are good candidates for optimizing: together they take the same amount of time as cpu_sparc_exec. I guess both operations would be trivial in the x86_64 assembler. What would be the best strategy to make
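
For readers unfamiliar with what such helpers do, here is a minimal sketch in C of computing condition codes for a subtraction, on the assumption (suggested only by the names compute_all_sub and compute_all_sub_xcc) that they produce the 32-bit and 64-bit flag sets. The flag bit values and the function names flags_sub32 and flags_sub64 are hypothetical and are not QEMU's definitions.

```c
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not QEMU's actual layout. */
enum {
    FLAG_C = 1u << 0,   /* carry (borrow on subtraction) */
    FLAG_V = 1u << 1,   /* signed overflow */
    FLAG_Z = 1u << 2,   /* result is zero */
    FLAG_N = 1u << 3    /* result is negative */
};

/* Flags for the 32-bit subtraction dst = src1 - src2. */
static unsigned flags_sub32(uint32_t src1, uint32_t src2)
{
    uint32_t dst = src1 - src2;
    unsigned flags = 0;

    if (dst == 0) {
        flags |= FLAG_Z;
    }
    if (dst >> 31) {
        flags |= FLAG_N;
    }
    if (src1 < src2) {
        flags |= FLAG_C;            /* unsigned borrow */
    }
    /* Overflow: operands differ in sign and the result's sign differs
       from the first operand's sign. */
    if (((src1 ^ src2) & (src1 ^ dst)) >> 31) {
        flags |= FLAG_V;
    }
    return flags;
}

/* Same computation on 64-bit operands. */
static unsigned flags_sub64(uint64_t src1, uint64_t src2)
{
    uint64_t dst = src1 - src2;
    unsigned flags = 0;

    if (dst == 0) {
        flags |= FLAG_Z;
    }
    if (dst >> 63) {
        flags |= FLAG_N;
    }
    if (src1 < src2) {
        flags |= FLAG_C;
    }
    if (((src1 ^ src2) & (src1 ^ dst)) >> 63) {
        flags |= FLAG_V;
    }
    return flags;
}
```

On an x86_64 host a single sub or cmp instruction sets the equivalent host flags, which is presumably why the operations are described as trivial in assembler.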

Re: [Qemu-devel] profiling qemu

2012-02-14 Thread Artyom Tarasenko
2012/2/14 Laurent Desnogues laurent.desnog...@gmail.com: 2012/2/14 Lluís Vilanova vilan...@ac.upc.edu: Artyom Tarasenko writes: [...] Here it looks like compute_all_sub and compute_all_sub_xcc are good candidates for optimizing: together they take the same amount of time as cpu_sparc_exec. I

Re: [Qemu-devel] profiling qemu

2012-02-14 Thread Blue Swirl
2012/2/14 Lluís Vilanova vilan...@ac.upc.edu: Artyom Tarasenko writes: [...] QEMU 1.0.50 monitor - type 'help' for more information (qemu) profile unknown command: 'profile' (qemu) info profile async time 38505498320 (38.505) qemu time 35947093161 (35.947) Is there a way to find out

Re: [Qemu-devel] Profiling Qemu for speed?

2005-04-18 Thread Daniel J Guinan
This conversation, below, is very interesting. It is precisely this part of QEMU that fascinates me and potentially holds the most promise for performance gains. I have even imagined using a genetic algorithm to discover optimal block-sizes and instruction re-ordering and whatnot. This could be

Re: [Qemu-devel] Profiling Qemu for speed?

2005-04-18 Thread Ian Rogers
There are some code sequences that are quite common, for example compare followed by branch. A threaded decoder tends to look like: ... /* do some work */; load instruction; mask out opcode; address_of_decoder = load decoder_lookup[opcode]; goto *address_of_decoder; but if you say compare and branch are
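
The threaded-decoder pattern described in this message can be sketched in C as follows. This is an illustration only: it relies on the GCC/Clang "labels as values" extension (&&label and goto *ptr), and the toy opcodes and the name run_threaded are hypothetical, not taken from QEMU or from the thread.

```c
#include <stdint.h>

/* Hypothetical toy opcodes for this sketch. */
enum { OP_INC = 0, OP_DEC = 1, OP_HALT = 2 };

/* Runs a toy program and returns the accumulator.
   Each handler ends by decoding the next instruction and jumping
   straight to its handler, so there is no central dispatch loop. */
static int run_threaded(const uint8_t *code)
{
    /* One handler address per opcode value 0..3; the unused slot
       falls back to the halt handler so the mask below stays in range. */
    static void *const decoder_lookup[4] = {
        &&do_inc, &&do_dec, &&do_halt, &&do_halt
    };
    const uint8_t *pc = code;
    int acc = 0;

/* load instruction, mask out opcode, look up handler, jump to it */
#define DISPATCH() goto *decoder_lookup[*pc++ & 0x3]

    DISPATCH();

do_inc:
    acc++;          /* do some work */
    DISPATCH();

do_dec:
    acc--;          /* do some work */
    DISPATCH();

do_halt:
    return acc;

#undef DISPATCH
}
```

Because every handler carries its own indirect jump, the host branch predictor sees one jump site per opcode instead of a single shared one.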

Re: [Qemu-devel] Profiling Qemu for speed?

2005-04-18 Thread Daniel J Guinan
I was up until 3:00am studying Qemu, and I came to the conclusion that it doesn't make sense to try speeding up the output code, at least not yet. A peephole optimizer or hand-coded sequences made to handle common combinations of instructions would lead to the problems discussed here:

Re: [Qemu-devel] Profiling Qemu for speed?

2005-04-18 Thread Ian Rogers
Christian MICHON wrote: Months ago I tried gcc/FDO with an xp/lite installation as a repetitive task :) The timings did not improve after all the effort. Could this be down to the tables used to find the translators/generators? Are they constant? Is it possible to make them amenable to

Re: [Qemu-devel] Profiling Qemu for speed?

2005-04-17 Thread André Braga
The problem with table lookups (I'm assuming you're talking about function pointer vectors) is that they *destroy* the spatial locality of reference that you could otherwise attain by having a series of if-then-else instructions and some clever instruction prefetching mechanism on modern processors...
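
To make the two styles being compared concrete, here is a small C sketch of the same dispatch written once with a function pointer vector and once as an if-then-else chain. The operations and all names (op_table, dispatch_table, dispatch_chain) are hypothetical and chosen only for illustration.

```c
typedef int (*op_fn)(int);

static int op_inc(int x) { return x + 1; }
static int op_dbl(int x) { return x * 2; }
static int op_neg(int x) { return -x; }

/* Function pointer vector: the handlers live wherever the linker
   places them, and each call is an indirect jump through the table. */
static const op_fn op_table[] = { op_inc, op_dbl, op_neg };

static int dispatch_table(unsigned op, int x)
{
    if (op < sizeof(op_table) / sizeof(op_table[0])) {
        return op_table[op](x);
    }
    return x;   /* unknown opcode: leave the value unchanged */
}

/* If-then-else chain: the handler bodies sit inline, next to the
   comparisons, at the cost of testing each opcode in turn. */
static int dispatch_chain(unsigned op, int x)
{
    if (op == 0) {
        return x + 1;
    } else if (op == 1) {
        return x * 2;
    } else if (op == 2) {
        return -x;
    }
    return x;
}
```

Both functions return the same results; which one runs faster depends on the number of opcodes, their frequency distribution, and the host processor, so it needs to be measured.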

Re: [Qemu-devel] Profiling Qemu for speed?

2005-04-17 Thread Karl Magdsick
Ideally, we could force gcc to implement switch statements as indirect jumps with jump tables inline with the code. However, this may not be possible. I think Nathaniel was just saying that gcc is likely generating several hundred sequential if-else blocks for large switch statements. This
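
As a point of reference, the kind of switch statement under discussion looks like the C sketch below. The opcodes and the name step are hypothetical. With dense, consecutive case values a compiler is free to emit either a jump table or a sequence of compares; the C language gives no way to demand one or the other, so the generated assembly (for example from gcc -S) has to be inspected to see what was chosen.

```c
/* Applies one hypothetical toy opcode to an accumulator. */
static int step(unsigned op, int acc)
{
    switch (op) {
    case 0:
        return acc + 1;
    case 1:
        return acc - 1;
    case 2:
        return acc * 2;
    case 3:
        return -acc;
    case 4:
        return 0;
    case 5:
        return acc + 10;
    default:
        return acc;     /* unknown opcode: no change */
    }
}
```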