Extend coverage of aarch64 lifter, including SIMD #1546

DukMastaaa · 2022-08-03T01:18:25Z

File structure

SIMD instructions are implemented in files under plugins/arm/semantics with the aarch64-simd- prefix,
in the aarch64 package. This is because bap only looks in the top level of the semantics folder,
so files placed in a simd subdirectory would not be recognised.
The FP instructions that are still to be implemented will use the same approach, with the aarch64-fp prefix.

`nth-reg-in-group` primitive

For instructions in the CASP and LDn families, LLVM gives BAP register groups like X0_X1 or Q0_Q1_Q2, which previously required a large switch statement like the following to extract an individual register:

(defun first-reg-in-group (r-pair)
  (case (symbol r-pair)
    'X0_X1 X0
    'X1_X2 X1
    ;; ...
    'X30_X31 X30))

A new Primus Lisp primitive, (nth-reg-in-group sym n), returns the nth register in a register group passed in as a symbol, sym.
For example, (nth-reg-in-group 'D0_D1_D2_D3 2) returns D2.
⚠️ There is a slight problem with the implementation; please see the notes on CASP below.
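
To make the intended behaviour concrete, here is a minimal Python model of the lookup. It assumes group symbols are always register names joined by underscores; the function name mirrors the primitive, but this is only an illustration and not the implementation in this PR.

```python
def nth_reg_in_group(sym: str, n: int) -> str:
    """Return the n-th (0-based) register name in a group such as 'X0_X1'."""
    regs = sym.split("_")  # assumption: '_' never appears inside a register name
    if not 0 <= n < len(regs):
        raise IndexError(f"{sym} has no register at index {n}")
    return regs[n]
```

For example, `nth_reg_in_group("D0_D1_D2_D3", 2)` mirrors the `(nth-reg-in-group 'D0_D1_D2_D3 2)` call above.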

Non-SIMD Instructions

This PR implements many instructions; together they are sufficient to fully lift the cntlm binary (cross-compiled for aarch64), except for two FMOV variants. This was tested using the --print-missing option to bap disassemble (#1410).

The added instructions are listed here, some of them with their BIL output.

Arithmetic

ADDS*ri, ADDS*rs, ADD*rx, ADDXrx64

Instruction: adds x0, x1, x2
Opcode: 20 00 02 ab
{
  #3 := R2 + R1
  NF := 63:63[#3]
  VF := 63:63[R1] & 63:63[R2] & ~63:63[#3] | ~63:63[R1] & ~63:63[R2] &
    63:63[#3]
  ZF := #3 = 0
  CF := 63:63[R1] & 63:63[R2] | 63:63[R2] & ~63:63[#3] | 63:63[R1] &
    ~63:63[#3]
  R0 := #3
}
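
As a rough cross-check of the flag expressions above, the sketch below evaluates the same sign-bit formulas in Python and compares them with flags derived from unbounded integer arithmetic. It is not part of the PR, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def msb(x: int) -> int:
    """Bit 63 of a 64-bit value."""
    return (x >> 63) & 1

def adds_flags(a: int, b: int):
    """Return (result, N, Z, C, V) using the sign-bit formulas from the BIL."""
    r = (a + b) & MASK64
    not_r = msb(r) ^ 1
    c = (msb(a) & msb(b)) | (msb(b) & not_r) | (msb(a) & not_r)
    v = (msb(a) & msb(b) & not_r) | ((msb(a) ^ 1) & (msb(b) ^ 1) & msb(r))
    return r, msb(r), int(r == 0), c, v

def adds_flags_wide(a: int, b: int):
    """Reference: derive C and V from unbounded integer arithmetic."""
    def signed(x: int) -> int:
        return x - (1 << 64) if x >> 63 else x
    r = (a + b) & MASK64
    c = int(a + b > MASK64)                       # unsigned carry out
    v = int(signed(a) + signed(b) != signed(r))   # signed overflow
    return r, msb(r), int(r == 0), c, v
```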

SUBXr*, SUBSXrx, SUBSXrx64

Similar to ADDS.

UMADDLrr, SMADDLrr, UMSUBLrr, SMSUBLrr

Instruction: umaddl x0, w1, w2, x3
Opcode: 20 0c a2 9b
{
  R0 := R3 + extend:64[31:0[R1] * 31:0[R2]]
}

The rest are similar.

UMULHrr

Instruction: umulh x0, x1, x2
Opcode: 20 7c c2 9b
{
  R0 := high:64[pad:128[R1] * pad:128[R2]]
}
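
The widen, multiply, take-high pattern in this BIL can be modelled in a couple of lines of Python. This is a sketch with a hypothetical function name, relying on Python's unbounded integers in place of the 128-bit intermediate.

```python
MASK64 = (1 << 64) - 1

def umulh(a: int, b: int) -> int:
    """High 64 bits of the 128-bit unsigned product of two 64-bit values."""
    product = (a & MASK64) * (b & MASK64)  # plays the role of pad:128[..] * pad:128[..]
    return (product >> 64) & MASK64        # plays the role of high:64[..]
```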

ADR

Instruction: adr x0, 0xABCD
Opcode: 60 5e 05 30
{
  mem := mem with [R0, el]:u64 <- 0xABCD
}

Atomic

CASP family ⚠️

This uses the load-acquire and store-release intrinsics as described in #1458.

Instruction: caspal x0, x1, x2, x3, [x4]
Opcode: 82 fc 60 48
{
  #0 := mem[R4, el]:u128
  #1 := low:64[#0]
  #2 := high:64[#0]
  call(intrinsic:load-acquire)
  #4 := #0 = (R1.R0)
  if (#4) {
    call(intrinsic:store-release)
    mem := mem with [R4, el]:u128 <- R3.R2
  }
  R0 := #1
  R1 := #2
}
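
For readers less familiar with CASP, the data flow of the BIL above can be sketched in Python as follows. This is a simplified model under stated assumptions: memory is a dict from an address to a 128-bit value, registers are a dict from index to a 64-bit value, and the load-acquire and store-release intrinsics are left out entirely. All names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def casp(mem: dict, addr: int, regs: dict, s: int, t: int) -> None:
    """Compare-and-swap pair.

    regs[s], regs[s+1] hold the expected value (low half first);
    regs[t], regs[t+1] hold the replacement value.
    """
    old = mem[addr]
    expected = (regs[s + 1] << 64) | regs[s]
    if old == expected:
        mem[addr] = (regs[t + 1] << 64) | regs[t]
    # The compare registers always receive the value that was read.
    regs[s] = old & MASK64
    regs[s + 1] = old >> 64
```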

⚠️ The nth-reg-in-group primitive is also used to extract the individual registers from the Xa_Xb pairs.
However, its implementation prevents the following expression from reifying correctly:

(concat
  (nth-reg-in-group 'X0_X1 0)
  (nth-reg-in-group 'X0_X1 1))

The expected result is X0.X1, but printing out the result with msg gives 0x30000000000000004.
As a temporary workaround, a helper function (register-pair-concat r-pair) has been defined. It contains a large switch statement with a case for each 'Xa_Xb pair, which is not ideal.
Advice on how to resolve this would be much appreciated.

Data movement

BIL code has not been provided for most instructions in this category because of the number of instructions and the minute differences between them.

Loads:

LDR*ro*, LDR*pre, LDR*post, LDR*ui
LDRBBro*, LDRBBpre, LDRBBpost
LDRHHro*, LDRHHpre, LDRHHpost, LDRHHui
rest of LDP*pre and LDP*post, LDP*i
LDRSWui, LDRSWro*
LDURBBi, LDURHHi
LDURSB*i, LDURSH*i, LDURSWi
LDUR*i

Stores:

STR*ro*, STR*pre, STR*post
STRHHui
STRBBro*, STRBBpre, STRBBpost
STP*pre, STP*post, STP*i
STURHHi, STURBBi

Other:

EXTR*rri

Instruction: extr x0, x1, x2, #5
Opcode: 20 14 c2 93
{
  R0 := 68:5[R1.R2]
}
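
The extract-after-concat shape of this BIL corresponds to the following Python sketch (hypothetical function name; `hi` and `lo` stand for the first and second source registers).

```python
MASK64 = (1 << 64) - 1

def extr(hi: int, lo: int, lsb: int) -> int:
    """Take 64 bits starting at bit `lsb` from the 128-bit value hi:lo."""
    combined = ((hi & MASK64) << 64) | (lo & MASK64)
    return (combined >> lsb) & MASK64
```

With `lsb = 5` this selects bits 68:5 of the concatenation, matching the range shown above.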

Logical

ANDS*ri, ANDS*rs

Instruction: ands x0, x1, x2, LSL #2
Opcode: 20 08 02 ea
{
  #3 := R2 << 2
  #4 := R1 & #3
  NF := 63:63[#4]
  ZF := #4 = 0
  CF := 0
  VF := 0
  R0 := #4
}

BIC*r, BICS*rs

Instruction: bic x0, x1, x2, asr #4
Opcode: 20 10 a2 8a
{
  #3 := R2 ~>> 4
  #4 := ~#3
  R0 := R1 & #4
}

REV*r, REV16*r, REV32Xr

Note that REV16*r and REV32Xr reverse the bytes within each 16-bit or 32-bit container, respectively, rather than across the whole register.

Instruction: rev x0, x1
Opcode: 20 0c c0 da
{
  R0 :=
    7:0[R1].15:8[R1].23:16[R1].31:24[R1].39:32[R1].47:40[R1].55:48[R1].63:56[R1]
}

Instruction: rev16 w0, w1
Opcode: 20 04 c0 5a
{
  R0 := high:32[R0].23:16[R1].31:24[R1].7:0[R1].15:8[R1]
}
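
The container-wise byte reversal can be modelled in Python as below. This is a sketch with hypothetical names; it covers only the bits being reversed and does not model what happens to the upper half of the destination for the W forms.

```python
def rev_bytes(x: int, width: int, container: int) -> int:
    """Reverse byte order inside each `container`-bit chunk of a `width`-bit value."""
    nbytes = container // 8
    out = 0
    for base in range(0, width, container):
        chunk = (x >> base) & ((1 << container) - 1)
        swapped = int.from_bytes(chunk.to_bytes(nbytes, "little"), "big")
        out |= swapped << base
    return out
```

Here `rev_bytes(x, 64, 64)` corresponds to rev on an X register and `rev_bytes(x, 32, 16)` to rev16 on a W register.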

ASRV*r, LSRV*r, LSLV*r, RORV*r

Nothing special about these.

RBIT*r

Instruction: rbit w0, w1
Opcode: 20 00 c0 5a
{
  R0 :=
    0:0[R1].1:1[R1].2:2[R1].3:3[R1].4:4[R1].5:5[R1].6:6[R1].7:7[R1].8:8[R1].9:9[R1].10:10[R1].11:11[R1].12:12[R1].13:13[R1].14:14[R1].15:15[R1].16:16[R1].17:17[R1].18:18[R1].19:19[R1].20:20[R1].21:21[R1].22:22[R1].23:23[R1].24:24[R1].25:25[R1].26:26[R1].27:27[R1].28:28[R1].29:29[R1].30:30[R1].31:31[R1]
}
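
The long concatenation above is simply a bit-by-bit reversal. A minimal Python model, with a hypothetical function name, looks like this:

```python
def rbit(x: int, width: int = 32) -> int:
    """Reverse the order of the low `width` bits of x."""
    out = 0
    for i in range(width):
        bit = (x >> i) & 1
        out |= bit << (width - 1 - i)  # bit i moves to position width-1-i
    return out
```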

Special

BRK

This passes the label argument to a software-breakpoint intrinsic.

Instruction: brk 0xABCD
Opcode: a0 79 35 d4
{
  intrinsic:x0 := 0xABCD
  call(intrinsic:software-breakpoint)
}

SIMD Instructions

We use . to indicate one of B, H, S, D, Q instead of * to avoid name conflicts with existing non-SIMD macros.

Arithmetic

Here, * is also reused to stand for a number giving the element count or element size.

ADDv*i*, SUBv*i*, MULv*i*

Note: + has a higher precedence in the textual representation than ., so although the spacing in the BIL output below is misleading, the output is correct.

Instruction: add v0.8h, v1.8h, v2.8h
Opcode: 20 84 62 4e
{
  V0 := 127:112[V1] + 127:112[V2].111:96[V1] + 111:96[V2].95:80[V1] +
    95:80[V2].79:64[V1] + 79:64[V2].63:48[V1] + 63:48[V2].47:32[V1] +
    47:32[V2].31:16[V1] + 31:16[V2].15:0[V1] + 15:0[V2]
}

The rest are similar and only differ in the binary operation.
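
To show what the per-lane concatenation computes, here is a small Python sketch of a lane-wise binary operation on a 128-bit vector. The helper name and signature are hypothetical; the point is that each lane wraps independently and no carry crosses a lane boundary.

```python
import operator

def lanewise(op, a: int, b: int, esize: int, total: int = 128) -> int:
    """Apply `op` to each `esize`-bit lane of a and b, truncating per lane."""
    mask = (1 << esize) - 1
    out = 0
    for base in range(0, total, esize):
        lane = op((a >> base) & mask, (b >> base) & mask) & mask
        out |= lane << base
    return out

# add v0.8h, v1.8h, v2.8h would correspond to:
#   v0 = lanewise(operator.add, v1, v2, 16)
```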

Loads

This PR implements all of the SIMD load instructions; see the PR diff for a full list.
Instructions with interesting BIL output are listed below.

LDNP.i

As an instruction with non-temporal properties, LDNP relaxes the order of its memory accesses. This is represented as a call to a 'non-temporal-hint' intrinsic where the address is passed as a parameter.

Instruction: ldnp s0, s1, [x3, 4]
Opcode: 60 84 40 2c
{
  intrinsic:x0 := R3 + 4
  call(intrinsic:non-temporal-hint)
  V0 := high:96[V0].mem[R3 + 4, el]:u32
  intrinsic:x0 := R3 + 8
  call(intrinsic:non-temporal-hint)
  V1 := high:96[V1].mem[R3 + 8, el]:u32
}

LD..v._POST (e.g. ld2 {v0.4s, v1.4s}, [x2], x3)

Like CASP, this instruction family receives register groups from LLVM.
The BIL code performs each memory access separately to accurately model the interleaving done by the processor.
This may not be ideal for generated code size -- advice on making this level of detail toggleable would be appreciated.

Instruction: ld2 {v0.4s, v1.4s}, [x2], x3
Opcode: 40 88 c3 4c
{
  #1 := mem[R2, el]:u32
  #3 := mem[R2 + 4, el]:u32
  #5 := #1.mem[R2 + 8, el]:u32
  #7 := #3.mem[R2 + 0xC, el]:u32
  #9 := #5.mem[R2 + 0x10, el]:u32
  #11 := #7.mem[R2 + 0x14, el]:u32
  #13 := #9.mem[R2 + 0x18, el]:u32
  #15 := #11.mem[R2 + 0x1C, el]:u32
  V0 := #13
  V1 := #15
  R2 := R2 + R3
}

Similar expansions apply to the rest of the LDn family.

Logical

ANDv*i*, EORv*i*, NOTv*i*, ORRv*i*, ORNv*i*

These are done on the whole register Vn.

Instruction: not v0.16b, v1.16b
Opcode: 20 58 20 6e
{
  V0 := ~V1
}

Misc. movement

INSvi32gpr, INSvi32lane

The implementation uses bitmasks and bit shifts to insert the vector elements, but could equivalently use extract and concat.
Please advise if this is preferred.

Instruction: ins v0.s[1], v1.s[1]
Opcode: 20 24 0c 6e
{
  #1 := 63:32[V1]
  #5 := V0 & 0xFFFFFFFFFFFFFFFF00000000FFFFFFFF
  #6 := #5 | 0xFFFFFFFF00000000 & pad:128[#1] << 0x20
  V0 := #6
}

Instruction: ins v0.s[1], w1
Opcode: 20 1c 0c 4e
{
  #3 := V0 & 0xFFFFFFFFFFFFFFFF00000000FFFFFFFF
  #4 := #3 | 0xFFFFFFFF00000000 & pad:128[31:0[R1]] << 0x20
  V0 := #4
}
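
To illustrate the claim that the two formulations are equivalent, the Python sketch below implements the element insert both ways, once with masks and shifts and once by splitting and rejoining the vector. Function names are hypothetical and this is a model of the idea, not the Primus Lisp code in the PR.

```python
MASK128 = (1 << 128) - 1

def ins_mask(vd: int, elem: int, index: int, esize: int = 32) -> int:
    """Insert `elem` into lane `index` using a bitmask and a shift."""
    field = ((1 << esize) - 1) << (index * esize)
    return (vd & ~field & MASK128) | ((elem << (index * esize)) & field)

def ins_concat(vd: int, elem: int, index: int, esize: int = 32) -> int:
    """Insert `elem` into lane `index` by rejoining high part, element, low part."""
    lo_bits = index * esize
    low = vd & ((1 << lo_bits) - 1)    # bits below the lane
    high = vd >> (lo_bits + esize)     # bits above the lane
    elem &= (1 << esize) - 1
    return (high << (lo_bits + esize)) | (elem << lo_bits) | low
```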

MOVIv*i*, MOVIv*b_ns

Instruction: movi v0.4h, 0xAB
Opcode: 60 85 05 0f
{
  V0 := 0xAB00AB00AB00AB
}
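
The constant above is the immediate replicated into every element. A minimal Python sketch of that replication, with a hypothetical helper name:

```python
def replicate(imm: int, esize: int, count: int) -> int:
    """Place `imm` in each of `count` lanes of `esize` bits."""
    imm &= (1 << esize) - 1
    out = 0
    for lane in range(count):
        out |= imm << (lane * esize)
    return out
```

For movi v0.4h, 0xAB the corresponding call would be `replicate(0xAB, 16, 4)`.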

EXTv*i*

This is implemented literally as described in the ISA with extract after concat.

Instruction: ext v0.16b, v1.16b, v2.16b, 3
Opcode: 20 18 02 6e
{
  V0 := 151:24[V2.V1]
}
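
The same extract-after-concat can be written in Python as follows (a sketch; `vn` and `vm` stand for the first and second source vectors, and the function name is hypothetical).

```python
def ext(vn: int, vm: int, byte_index: int, total: int = 128) -> int:
    """Take `total` bits starting at byte `byte_index` from the concatenation vm:vn."""
    combined = (vm << total) | vn
    return (combined >> (8 * byte_index)) & ((1 << total) - 1)
```

With `byte_index = 3` this selects bits 151:24 of the concatenation, as in the BIL above.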

Store

Most of these have nearly identical implementations to the non-SIMD STP variants.

STR.ro*, STR.pre, STR.post, STR.ui

For STR.ro*:

Instruction: str q0, [x1, x2]
Opcode: 20 68 a2 3c
{
  mem := mem with [R1 + R2, el]:u128 <- V0
}

For STR.post (pre is similar):

Instruction: str q0, [x1], 123
Opcode: 20 b4 87 3c
{
  mem := mem with [R1, el]:u128 <- V0
  R1 := R1 + 0x7B
}

For STR.ui:

Instruction: str q0, [x1, 0xAB]
Opcode: 20 b0 8a 3c
{
  mem := mem with [R1 + 0xAB, el]:u128 <- V0
}

STP.pre, STP.post, STP.i

Instruction: stp q0, q1, [x2], #16
Opcode: 40 84 80 ac
{
  #3 := R2
  mem := mem with [#3, el]:u128 <- V0
  mem := mem with [#3 + 0x10, el]:u128 <- V1
  R2 := #3 + 0x10
}

STUR.i

Instruction: stur q0, [x1, 0xAB]
Opcode: 20 b0 8a 3c
{
  mem := mem with [R1 + 0xAB, el]:u128 <- V0
}

andrewj-brown · 2022-11-22T05:49:53Z

Added a few more arithmetic instructions.

ADC*r, ADCS*r

Instruction: adc x0, x1, x2
Opcode: 20 00 02 9a
{
  R0 := extend:64[CF] + R2 + R1
}

Instruction: adcs x0, x1, x2
Opcode: 20 00 02 ba
{
  #0 := R1 + R2 + extend:64[CF]
  NF := 63:63[#0]
  VF := CF & 63:63[R2] & ~63:63[#0] | ~CF & ~63:63[R2] & 63:63[#0]
  ZF := #0 = 0
  CF := CF & 63:63[R2] | 63:63[R2] & ~63:63[#0] | CF & ~63:63[#0]
  R0 := #0
}

SBC*r, SBCS*r

Instruction: sbc x0, x1, x2
Opcode: 20 00 02 da
{
  R0 := extend:64[CF] + ~R2 + R1
}
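
The SBC expression relies on the two's-complement identity x - y - (1 - C) = x + ~y + C (mod 2^64). The Python sketch below checks that identity on a few values; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def sbc_bil(x: int, y: int, carry: int) -> int:
    """SBC as written in the BIL: carry + ~y + x, truncated to 64 bits."""
    return (carry + (~y & MASK64) + x) & MASK64

def sbc_ref(x: int, y: int, carry: int) -> int:
    """SBC as usually described: x - y - (1 - carry), truncated to 64 bits."""
    return (x - y - (1 - carry)) & MASK64
```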

Instruction: sbcs x0, x1, x2
Opcode: 20 00 02 fa
{
  #0 := R1 + ~R2 + extend:64[CF]
  NF := 63:63[#0]
  VF := CF & 63:63[~R2] & ~63:63[#0] | ~CF & ~63:63[~R2] & 63:63[#0]
  ZF := #0 = 0
  CF := CF & 63:63[~R2] | 63:63[~R2] & ~63:63[#0] | CF & ~63:63[#0]
  R0 := #0
}

CLZ*r

Instruction: clz x0, x1
Opcode: 20 10 c0 da
{
  #1 := R1
  #1 := #1 | #1 >> 1
  #1 := #1 | #1 >> 2
  #1 := #1 | #1 >> 4
  #1 := #1 | #1 >> 8
  #1 := #1 | #1 >> 0x10
  #1 := #1 | #1 >> 0x20
  #1 := ~#1
  #2 := #1
  #2 := #2 - (#2 >> 1 & 0x5555555555555555)
  #2 := (#2 & 0x3333333333333333) + (#2 >> 2 & 0x3333333333333333)
  #2 := #2 + (#2 >> 4) & 0xF0F0F0F0F0F0F0F
  R0 := #2 * 0x101010101010101 >> 0x38
}
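
The CLZ lifting smears the highest set bit downwards, inverts the result so that only the leading-zero positions are set, and then counts those bits with a SWAR population count. The Python sketch below follows the same steps for the 64-bit case (hypothetical function name; the explicit masking stands in for the fixed 64-bit width of the BIL variables).

```python
MASK64 = (1 << 64) - 1

def clz64(x: int) -> int:
    """Count leading zeros of a 64-bit value via bit smearing and popcount."""
    x &= MASK64
    for shift in (1, 2, 4, 8, 16, 32):
        x |= x >> shift              # set every bit below the highest set bit
    x = ~x & MASK64                  # ones now mark the leading-zero positions
    x = x - ((x >> 1) & 0x5555_5555_5555_5555)
    x = (x & 0x3333_3333_3333_3333) + ((x >> 2) & 0x3333_3333_3333_3333)
    x = (x + (x >> 4)) & 0x0F0F_0F0F_0F0F_0F0F
    return ((x * 0x0101_0101_0101_0101) & MASK64) >> 56
```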

SUBSWrx

Instruction: subs w0, w1, w2, SXTB#0
Opcode: 20 80 22 6b
{
  #3 := 1 + extend:64[~(31:0[R2] << 0x20)] + extend:64[31:0[R1]]
  NF := 63:63[#3]
  VF := 31:31[R1] & 31:31[~(31:0[R2] << 0x20)] & ~63:63[#3] | ~31:31[R1] &
    ~31:31[~(31:0[R2] << 0x20)] & 63:63[#3]
  ZF := #3 = 0
  CF := 31:31[R1] & 31:31[~(31:0[R2] << 0x20)] | 31:31[~(31:0[R2] << 0x20)] &
    ~63:63[#3] | 31:31[R1] & ~63:63[#3]
  R0 := high:32[R0].#3
}

A lot of these are extensions/alternative versions of pre-existing macros. I haven't shown the differences between x- and w-versions.

DukMastaaa and others added 30 commits February 4, 2022 01:53

Add some bitvector functions and assert-msg macro

7278902

Add immediate decoding and logical immediate insns

86ed4d4

Add barrier instructions via the special primitive

51ad92c

add MADD, MSUB, SDIV, UDIV and conditional select

cf95551

Fix zeros and ones bitvector constructors

d2199a8

organise instructions into categories

98b94c7

move ADRP to integer arithmetic category

7abc695

remove private access from arm-bits functions

5d42b6d

Added stur instructions and fixed bug in condition-holds macro

7d6df06

Bug fixes for STUR insns

3286a9d

Merge pull request #2 from UQ-PAC/stur-instructions

6bb8453

Stur instructions

condense repeated definitions into macros

42b9e44

Merge branch 'implement-missing-aarch64-insns' of https://github.com/…

e565ccb

…UQ-PAC/bap into implement-missing-aarch64-insns

implement TBZ and TBNZ

be91fcd

separated into files

2e379ca

Merge pull request #3 from UQ-PAC/insn-cleanup

2fb075f

separated into category files

rename aarch64 files and make aarch64-helper

522bb94

add some processor state instructions

6d133e5

LLVM can't seem to disassemble ARMv8.4 instructions like RMIF, SETF8 and SETF16. Also, CFINV gets turned into MSR (register) but LLVM returns ill-formed asm...? I've commented this in aarch64-pstate.lisp.

fix typo in pstate instructions

e3a7da8

i typed is_zero with underscore instead of primitive is-zero

implement rotate-left

f4fb588

add missing documentation and rename for clarity

7bd7dbb

documentation added for macros and helper functions.

Merge branch 'master' into implement-missing-aarch64-insns

85f8648

apply upstream changes from BinaryAnalysisPlatform#1454

0231a76

Merge branch 'implement-missing-aarch64-insns' of https://github.com/…

cc4027d

…UQ-PAC/bap into implement-missing-aarch64-insns

implement CAS and friends for X registers

28dc108

llvm mnemonics most likely incorrect, will investigate why bap's llvm doesn't disassemble these insns

correct CAS mnemonic and code

469d134

i've used ` bap mc --cpu=cortex-a55 --triple=aarch64` to get the llvm mnemonic, but will need to talk to ivan about lisp context and specifying generic armv8.x instead of a specific cpu

fix typos in CAS instruction code

381d627

Merge branch 'BinaryAnalysisPlatform:master' into implement-missing-a…

0ff0df2

…arch64-insns

Merge branch 'implement-missing-aarch64-insns' of https://github.com/…

fa75ba9

…UQ-PAC/bap into implement-missing-aarch64-insns

fix CFINV, RMIF, SETF8 and SETF16 using BinaryAnalysisPlatform#1461

ed080e3

ailrst and others added 29 commits July 20, 2022 13:44

Merge branch 'aarch64-pull-request-2' into add-missing-empty-bin

4ca0f1c

Merge pull request #19 from UQ-PAC/add-missing-empty-bin

1cb9933

Miscellaneous fixes and adding instructions fix: replace lognot with lnot LDURHH, LDURSB, LDURSH, LDURSW RBIT (and reverse-bits helper) UMSUBL,SMSUBL,UMADDL,SMADDL

Merge pull request #20 from UQ-PAC/implement-LD-insns

7d8e630

Implemented all LD (multiple structres), LD (single structures), LD.R…

extract simd instructions into package

721917a

limit comment length

1f5d65a

Separate simd data movement instructions

5dd6371

remove simd folder which doesn't get read by bap

698d7eb

packages form a flat namespace (and seem to need a flat file hierarchy as well) we'll just use the aarch64-simd- prefix as a replacement for folders

finish STUR.i instructions

96dc059

move reverse-bits and helper to bits.lisp

a324ed4

add comment to bitvec-to-symbol

40841c3

use nth-reg-in-group in CASPord* except concat

e5cf036

will need to find out why primitive doesn't work for concat

Fix bug in REVnWr implementation

328914e

function overloads are not nice sometimes

fix comment length and LLVM code for BIC

3a007ab

Implemented all SIMD load instructions

92c68b0

Merged aarch64-pull-request-2 to implement-LD-insns

82a4262

Fixed intrinsic usage in LDNP

cdaf4fc

Minor comment changes

be36805

Merge branch 'master' into aarch64-pull-request-2

3250199

add ADC instruction & setup repo

a1f0852

fix flag behaviour with adc/adcs

feb8dc1

switch macros

3b7b928

add sbc and sbcs

666d9b5

switch SBCS to clear-base as SUBS, use "-" with SBC

851d126

fix brackets

d1a25f3

switch off clear-base

7ede5a8

fixes

4091cfd

add SUBSWrx and finish SBC, SBCS, ADC, ADCS

e8d35c9

added CLZXr and CLZWr

abb7f40

Merge arithmetic-pr into aarch64-pull-request-2

acfdc10
