Hi,

Today I tried to use SSE operations, but I couldn't get the result I wanted.

Here is a simple example:

program SSETestsSimple;

{$mode objfpc}{$H+}

{$ALIGN 16}

uses
  SysUtils,
  MMX;

type
  TSSESingle = array[0..3] of Single;

{$ASMMODE ATT}

procedure SSEAdd(const v1, v2: TSSESingle; out v_r: TSSESingle); // assembler;
begin
  writeln(Format('@v1: %s, @v2: %s, @v_r: %s', [HexStr(Addr(v1)), HexStr(Addr(v2)), HexStr(Addr(v_r))]));
  asm
    movups v_r, %xmm0 // Just for testing
    movups v2, %xmm0  // Just for testing
    movups v1, %xmm0
//    addps  v2, %xmm0
    movups %xmm0, v_r
  end;
end;

var
  A,
  B,
  C: TSSESingle;
  I: Integer;
begin
  writeln('is_sse_cpu: ', BoolToStr(is_sse_cpu, True));

  for I := Low(A) to High(A) do begin
    A[I] := 2 * I;
    B[I] := 3 * I;
  end;

  writeln('Doing SSE in main program...');
  writeln(Format('@A: %s, @B: %s, @C: %s', [HexStr(Addr(A)), HexStr(Addr(B)), HexStr(Addr(C))]));
  asm
    movups A, %xmm0
    addps  B, %xmm0
    movups %xmm0, C
  end;
  writeln('Works');

  writeln('Doing SSE in subroutine...');
  SSEAdd(A, B, C);
  writeln('Works');

  for I := Low(C) to High(C) do
    write(C[I]:10:1);
  writeln;

  readln;
end.

I have three questions:

1.) Florian, in the mailing list archive I found an answer from 2004 in which you said that 'FPC doesn't align the stack properly to sixteen byte boundaries currently.' That still seems to be a problem, even with {$ALIGN 16}, right?

2.) I declared A, B and C as global variables so that I can influence their positions by adding dummy variables (which is no longer needed in this simple example), and I pass them as const and out parameters so that the global variables are used directly (see the printed addresses). The assembler lines in the main program work, but in SSEAdd() the addps line causes a SIGSEGV. Can somebody tell me why it doesn't work in the subroutine?

With the addps line enabled in SSEAdd():

(gdb) run
Starting program: /home/mm/Development/SSETests/ssetestssimple
is_sse_cpu: True
Doing SSE in main program...
@A: 08085BE0, @B: 08085BF0, @C: 08085C00
Works
Doing SSE in subroutine...
@v1: 08085BE0, @v2: 08085BF0, @v_r: 08085C00

Program received signal SIGSEGV, Segmentation fault.
SSEADD (V1={0, 2, 4, 6}, V2={0, 3, 6, 9}, V_R={0, 5, 10, 15})
    at ssetestssimple.pas:23
23          addps  v2, %xmm0

Btw: without the addps line in SSEAdd(), the program crashes on exit (after hitting Enter):

(gdb) run
Starting program: /home/mm/Development/SSETests/ssetestssimple
is_sse_cpu: True
Doing SSE in main program...
@A: 080AE290, @B: 080AE2A0, @C: 080AE2B0
Works
Doing SSE in subroutine...
@v1: 080AE290, @v2: 080AE2A0, @v_r: 080AE2B0
Works
       0.0       5.0      10.0      15.0


Program received signal SIGSEGV, Segmentation fault.
0x08049a70 in fpc_ansistr_decr_ref ()

3.) When I try to use the assembler keyword for SSEAdd(), I get the compile error 'Asm: [movups xmmreg,reg32] invalid combination of opcode and operands'. Why doesn't it work this way?

I'm using
~/Development/SSETests$ fpc -i
Free Pascal Compiler version 2.2.0

Compiler Date      : 2007/12/29
Compiler CPU Target: i386

...

Supported FPU instruction sets:
  SOFT
  X87
  SSE
  SSE2
  SSE3

...

under Linux on a Core 2 Duo machine.

Regards

Michael
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
