On an x86_64 host, sparc64 emulation feels considerably slower than sparc32.
I tried to find out what can be optimized, and here are some questions.
First of all, it's not clear how to enable profiling in the current git tree:
build-prof $ ../qemu/configure --target-list=sparc64-softmmu --enable-gprof
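For reference, the usual gprof flow looks roughly like this — a sketch only, assuming the --enable-gprof configure option is accepted by the tree being built (the build directory and target name follow the quote above; the guest command line is elided):

```shell
mkdir build-prof && cd build-prof
../qemu/configure --target-list=sparc64-softmmu --enable-gprof
make
# Run the emulator; on a clean exit gprof instrumentation writes
# gmon.out into the current working directory.
./sparc64-softmmu/qemu-system-sparc64 ...
# Then generate the flat profile and call graph:
gprof ./sparc64-softmmu/qemu-system-sparc64 gmon.out > profile.txt
```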
Artyom Tarasenko writes:
[...]
QEMU 1.0.50 monitor - type 'help' for more information
(qemu) profile
unknown command: 'profile'
(qemu) info profile
async time 38505498320 (38.505)
qemu time 35947093161 (35.947)
Is there a way to find out more?
The 'info jit' monitor command also shows some information.
On Tue, Feb 14, 2012 at 4:15 PM, Artyom Tarasenko atar4q...@gmail.com wrote:
2012/2/14 Laurent Desnogues laurent.desnog...@gmail.com:
2012/2/14 Lluís Vilanova vilan...@ac.upc.edu:
Artyom Tarasenko writes:
[...]
Here it looks like compute_all_sub and compute_all_sub_xcc are good
candidates for optimizing: together they take the same amount of time
as cpu_sparc_exec. I guess both operations would be trivial in x86_64
assembler. What would be the best strategy to make [...]
This conversation, below, is very interesting. It is precisely this
part of QEMU that fascinates me, and it potentially holds the most
promise for performance gains. I have even imagined using a genetic
algorithm to discover optimal block sizes, instruction re-ordering,
and whatnot. This could be [...]
There are some code sequences that are quite common, for example a
compare followed by a branch. A threaded decoder tends to look like:

... // do some work
load instruction
mask out opcode
address_of_decoder = load decoder_lookup[opcode]
goto *address_of_decoder

but if you say compare and branch are [...]
I was up until 3:00am studying QEMU, and I came to the conclusion that
it doesn't make sense to try speeding up the output code, at least not
yet. A peephole optimizer or hand-coded sequences made to handle common
combinations of instructions would lead to the problems discussed here:
Christian MICHON wrote:
Months ago I did gcc/FDO with an XP/lite installation as a repetitive task :)
After all that effort, I did not improve the timings.
Could this be down to the tables used to find the
translators/generators? Are they constant? Is it possible to make them
amenable to [...]
The problem with table lookups (I'm assuming you're talking about
function-pointer vectors) is that they *destroy* the spatial locality
of reference that you could otherwise attain with a series of
if-then-else instructions and some clever instruction-prefetching
mechanism on modern processors...
Ideally, we could force gcc to implement switch statements as indirect
jumps with jump tables inline with the code. However, this may not be
possible.
I think Nathaniel was just saying that gcc is likely generating
several hundred sequential if-else blocks for large switch statements.
This [...]