mg::interp Different register allocation #194

Open
Tracked by #178
blackgeorge-boom opened this issue Jul 27, 2022 · 2 comments
blackgeorge-boom commented Jul 27, 2022

void simple(int n)
{
    return;
}

static void interp(void *oz, int mm1, int mm2, int mm3,
                   void *ou, int n1, int n2, int n3);

int main()
{
    double u[10];

    interp(&u[0], 1, 1, 1, &u[0], 1, 1, 1);
}

static void interp(void *oz, int mm1, int mm2, int mm3,
                   void *ou, int n1, int n2, int n3)
{
  int (*z)[mm2][mm1] = (int (*)[mm2][mm1])oz;
  int (*u)[n2][n1] = (int (*)[n2][n1])ou;

  int i3 = 3, i2 = 2, i1 = 1;

  simple(1);
  for (i1 = 0; i1 < mm1-1; i1++) {
    u[i3][i2][i1] = z[i3][i2][i1];
  }
}
@blackgeorge-boom

The two architectures use callee-saved registers (CSRs) differently:

AArch64:

  5010e0: f3 03 01 2a                  	mov	w19, w1                 ; mm1
;   int (*u)[n2][n1] = (int (*)[n2][n1])ou;
  5010e4: f4 03 05 2a                  	mov	w20, w5                 ; n1

X86:

  5010c0:	8b 5d 10             	mov    ebx,DWORD PTR [rbp+0x10] ; n2
  5010c3:	45 89 cf             	mov    r15d,r9d                 ; n1
  5010c6:	4c 89 45 b0          	mov    QWORD PTR [rbp-0x50],r8
...
  5010eb:	e8 30 ff ff ff       	call   501020 <simple>
  5010f0:	49 89 d8             	mov    r8,rbx                   ; n2
  5010f3:	48 8b 5d 90          	mov    rbx,QWORD PTR [rbp-0x70] ; mm1
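
For context, a value that is read after a call has a live range that spans the call, so the allocator has to keep it in a callee-saved register or spill it to the stack. The sketch below is a reduced illustration of that situation, not the reproducer from this issue: `opaque_call`, `use_after_call` and `demo_live_across_call` are hypothetical names, and `__attribute__((noinline))` is a GCC/Clang extension assumed here to keep the call in place.

```c
#include <assert.h>

/* Hypothetical stand-in for simple() in the reproducer.
   noinline (GCC/Clang extension) asks the compiler not to inline the call. */
__attribute__((noinline)) static void opaque_call(int n)
{
    (void)n;
}

/* n1 and n2 are read after the call, so their live ranges span it.
   A register allocator must hold them in callee-saved registers or
   spill them; which one it picks is target-dependent. */
static long use_after_call(long n1, long n2)
{
    opaque_call(1);
    return n1 * n2;
}

/* Small self-check of the arithmetic only; it says nothing about
   which registers a given compiler actually chooses. */
static void demo_live_across_call(void)
{
    assert(use_after_call(3, 4) == 12);
}
```

Compiling such a function for both targets and comparing the disassembly is one way to see which values end up in CSRs on each architecture.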

@blackgeorge-boom

I think the problem is that X86 uses the same register as both destination and source operand (two-address instructions, marked tied-def 0 in the MIR):

288B	  undef %31.sub_32bit:gr64_with_sub_8bit = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (dereferenceable load 4 from %ir.n2.addr, align 16)
320B	  undef %3.sub_32bit:gr64_with_sub_8bit = MOV32rr %9:gr32
...
608B	  %22:gr64_with_sub_8bit = nuw IMUL64rr %22:gr64_with_sub_8bit(tied-def 0), %1:gr64_with_sub_8bit, implicit-def dead $eflags
640B	  %31:gr64_with_sub_8bit = nuw IMUL64rr %31:gr64_with_sub_8bit(tied-def 0), %3:gr64_with_sub_8bit, implicit-def dead $eflags
...
1008B	  %33:gr64 = nsw IMUL64rr %33:gr64(tied-def 0), %31:gr64_with_sub_8bit, implicit-def dead $eflags

So the X86 allocator first assigns a CSR (rbx) to n2, so that the value survives its whole live range, which spans the call to simple.
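
The tied-operand constraint can be modelled in plain C. This is only a toy sketch of the instruction semantics, assuming the x86 `imul r64, r64` form (destination is also the first source) and the AArch64 `mul xd, xn, xm` form (separate destination); it does not reproduce LLVM's allocator, and `mul_three_address`, `mul_two_address` and `demo_tied_operand` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Three-address model (like AArch64 `mul xd, xn, xm`): the result goes to
   a separate destination, so both inputs are still available afterwards. */
static int64_t mul_three_address(int64_t a, int64_t b)
{
    return a * b;
}

/* Two-address model (like x86 `imul r64, r64`): the destination is also
   the first source, so its previous value is overwritten. */
static void mul_two_address(int64_t *dst, int64_t src)
{
    *dst *= src;
}

static void demo_tied_operand(void)
{
    int64_t n1 = 3, n2 = 4;

    /* Three-address: n2 is untouched after the multiply. */
    int64_t p = mul_three_address(n2, n1);
    assert(p == 12 && n2 == 4);

    /* Two-address: r plays the role of the register that held n2 and is
       reused as the destination; the old value in r is lost. */
    int64_t r = n2;
    mul_two_address(&r, n1);
    assert(r == 12);
}
```

In the two-address model, the register that held the input becomes the register that holds the product, which ties the input's register assignment to the live range of the result.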
