[AArch64] Instr. with groups `HasNEON`, don't have `HasNEONorSME` and similar assigned. #2343

grahamwoodward · 2024-04-26T10:46:37Z

Running an AWS Ubuntu aarch64 VM
Linux ip-10-252-39-126 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:56:39 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Questions	Answers
OS/arch/bits	Ubuntu AArch64.
Architecture	armv8
Source of Capstone	`git clone`
Version/git commit	`24d99a9`

I'm using Capstone in a performance tool I'm trying to write (relating to retired instructions). It's only for aarch64 and I have a simple test benchmark that I'm compiling like so

aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.6-a

and

aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.5-a+sve

When compiling with armv8.6-a and running the test benchmark I'm then running my tool and asking it to show HasNEON and HasNEONorSME and it appears to give me the details indicating that the retired/decoded instruction is HasNEON but not HasNEONorSME. If it's in the group/feature HasNEON, then why not HasNEONorSME? Both are those are true?

Similarly, when compiler with 8.5-a+sve, then I see a zero count for HasSVE however I see a >0 count for HasSVEorSME...again that doesn't make sense...if the retired instruction is in the group/feature HasSVEorSME, then surely I should see a count for HasSME or a count for HasSVE?

The text was updated successfully, but these errors were encountered:

Rot127 · 2024-04-26T11:46:26Z

Likely another faulty definition in the LLVM tablegen files.
I am currently updating to LLVM 18 (#2298) and there are plenty of changes to AArch64 and SME/SVE. So it might be fixed already there.

grahamwoodward · 2024-04-26T11:53:40Z

@Rot127 Is that a commit I can pull down to my clone?

Rot127 · 2024-04-26T11:57:03Z

Not yet. Some disassembly tests still fail. But if you provide me with some test cases I would be thankful. This way I can check them before for you in the LLVM code and report back if it is indeed a LLVM bug (will take a while until it is fixed I assume) or something in CS.

grahamwoodward · 2024-04-26T12:10:38Z

Sure....my test case is simply

#define LEN 1024
unsigned long long a[LEN], b[LEN], c[LEN];

int main()
{
    while(1) {
        int i = 0;
        for (; i < LEN; i++)
            c[i] = a[i] + b[i];
    }
    return 0;
}

and then compile like
aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.6-a
and
aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.5-a+sve

An objdump shows instructions such as
628: 4ee18400 add v0.2d, v0.2d, v1.2d
and
638: 04e10000 add z0.d, z0.d, z1.d
respectively for the two compilations.

My perf tool is configuring Capstone like so

  122   /* Initialise Capstone, which we use to disassemble the instruction */
  123   if (cs_open(CS_ARCH_AARCH64, CS_MODE_ARM, &handle) != CS_ERR_OK) {
  124       fprintf(stderr, "Failed to initialise Capstone! Aborting.\n");
  125       exit(1);
  126   }
  127   cs_option(handle, CS_OPT_DETAIL, CS_OPT_ON);
  128   cs_option(handle, CS_OPT_SKIPDATA, CS_OPT_ON);

My tool is using perf events to sample retired instructions and I use Capstone to disassemble the instructions and tell me about the groups/features.

In the first case for instruction
4ee18400 add v0.2d, v0.2d, v1.2d

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

In the second case for instruction
04e10000 add z0.d, z0.d, z1.d

It appears to have a group count of 1 but not HasSME...it is instead HasSMEorSVE...so why this way and why only 1 group again?

Thanks, hope that helps

grahamwoodward · 2024-04-26T13:27:41Z

Not yet. Some disassembly tests still fail. But if you provide me with some test cases I would be thankful. This way I can check them before for you in the LLVM code and report back if it is indeed a LLVM bug (will take a while until it is fixed I assume) or something in CS.

what version is being used at present? I have llvm-14 installed and also a build of llvm-18 and the -14 version struggles with this

echo 0x00, 0x0c, 0xe2, 0x25, | llvm-mc -triple=aarch64 -disassemble -mattr=+all
'+all' is not a recognized feature for this target (ignoring feature)
        .text
<stdin>:1:1: warning: invalid instruction encoding
0x00, 0x0c, 0xe2, 0x25,

(note even without -mattr=+all it fails with the encoding) but my local build of 18 is OK

echo 0x00, 0x0c, 0xe2, 0x25, | ./llvm-mc -triple=aarch64 -disassemble -mattr=+all
        .text
        whilelo p0.d, w0, w2

Rot127 · 2024-04-27T07:15:44Z

The v5 branch is based on LLVM-7. Current next branch uses LLVM-16 for ARM, PPC, AArch64 (Alpha: LLVM-3) and the others still LLVM-7. I update ARM, PPC, AArch64 over the next weeks to LLVM-18.

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

The add instructions are defined here if I am not mistaken. And as you can see, it has only the HasNEON group assigned. Not the HasNEONorSVE. Because we generate the Capstone code from these definitions in the td files we also inherit all the flaws which come with hand-written things.

The underlying problem is, that the current AArch64 definitions need to assign groups like HasNEONorSME manually (just as it has been done in the file above). It is not derived dynamically from the fact, that it already has HasNEON assigned. It is common to find these kind of issues in hand-written td files, unfortunately.

I'll fix it with #2298 in our LLVM fork by checking during generation.

grahamwoodward · 2024-04-27T07:21:56Z

This could be true for other aarch64 instructions?

Rot127 · 2024-04-27T07:29:57Z

Yes, there is a good chance that every instruction which should be of one of the following groups, isn't assigned to them:
HasSVEorSME, HasSVE2orSME, HasSVE2orSME2, HasSVE2p1_or_HasSME, HasSVE2p1_or_HasSME2, HasSVE2p1_or_HasSME2p1, HasNEONorSME

Rot127 · 2024-04-27T07:30:46Z

But we can fix this in our LLVM fork pretty easily. If we detect during code generation that a instruction is part of HasSVE, we can just emit HasSVEorSME as well.

grahamwoodward · 2024-04-27T07:51:51Z

But we can fix this in our LLVM fork pretty easily. If we detect during code generation that a instruction is part of HasSVE, we can just emit HasSVEorSME as well.

do you have an ETA for v6?

Rot127 · 2024-04-27T08:06:23Z

#2298 though should be done in two or three weeks. If you need to use it before, check the issues listed in the PR and if they affect you.

An ETA for v6 not really. If you want to use Capstone for AArch64 definitely use the next branch.
The todo list for v6 release is still quite long, I doubt it can be released within 6 months.

grahamwoodward · 2024-04-27T08:07:13Z

#2298 though should be done in two or three weeks. If you need to use it before, check the issues listed in the PR and if they affect you.

An ETA for v6 not really. If you want to use Capstone for AArch64 definitely use the next branch. The todo list for v6 release is still quite long, I doubt it can be released within 6 months.

"If you want to use Capstone for AArch64 definitely use the next branch." pretty sure I'm using the next branch

Rot127 · 2024-04-27T08:09:45Z

Yes, I just wanted to point it out again, the difference between current next and the v5 release is huge.
Also, the work towards v6 is mostly sponsored by RizinOrg, so I won't "just stop".

grahamwoodward · 2024-04-27T08:17:23Z

Yes, I just wanted to point it out again, the difference between current next and the v5 release is huge. Also, the work towards v6 is mostly sponsored by RizinOrg, so I won't "just stop".

right so even though I'm using next, I don't have the above commit? Can I pull/check that commit out or in my tool link to it as the git submodule?

Rot127 · 2024-04-27T08:21:49Z

#2298 is the PR which updates the AArch64 module to the state of LLVM-18. When it is done it will be merged into next.

But it isn't done yet. Some instructions don't get disassembled correctly and need to be fixed. So it won't be useful to you in the current state. I can let you know, when it is in a usable state. Than you can use the branch of the PR.

grahamwoodward · 2024-04-27T12:20:09Z

@Rot127 excellent thanks very much

grahamwoodward · 2024-04-27T17:55:07Z

The v5 branch is based on LLVM-7. Current next branch uses LLVM-16 for ARM, PPC, AArch64 (Alpha: LLVM-3) and the others still LLVM-7. I update ARM, PPC, AArch64 over the next weeks to LLVM-18.

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

The add instructions are defined here if I am not mistaken. And as you can see, it has only the HasNEON group assigned. Not the HasNEONorSVE. Because we generate the Capstone code from these definitions in the td files we also inherit all the flaws which come with hand-written things.

How did you work out that that's the add?

The underlying problem is, that the current AArch64 definitions need to assign groups like HasNEONorSME manually (just as it has been done in the file above). It is not derived dynamically from the fact, that it already has HasNEON assigned. It is common to find these kind of issues in hand-written td files, unfortunately.

I'll fix it with #2298 in our LLVM fork by checking during generation.

Can you explain more about the generation and how it works? Is that a capstone thing?

Rot127 · 2024-04-28T06:12:11Z

How did you work out that that's the add?

In AArch64InstrFormats.td the basic instructions formats are defined. In this case all vector instructions with a common encoding which have three vector registers (BaseSIMDThreeSameVector).

With a debugger I checked in Capstone what the LLVM internal ID is which it gave (one of the) the instructions from above: add v0.2d, v0.2d, v1.2d = ADDv2i64.

Guessing that it inherits it, I checked where this class is inherited and contains a portion of the LLVM name:

> grep -rnI "BaseSIMDThreeSameVector" llvm/lib/Target/AArch64/ | grep v2i64
llvm/lib/Target/AArch64/AArch64InstrFormats.td:5878:  def v2i64 : BaseSIMDThreeSameVector<1, U, 0b111, opc, V128,

Checking AArch64InstrFormats.td:5878 it shows that it defines a multiclass (SIMDThreeSameVector) there, which defines multiple instructions at once (one of them the v2i64 variant). Grepping for this aggin shows us that indeed an add instruction of this type wass defined.

> grep -rnI " SIMDThreeSameVector" llvm/lib/Target/AArch64/ | grep ADD
llvm/lib/Target/AArch64/AArch64InstrInfo.td:1452:defm FCADD : SIMDThreeSameVectorComplexHSD<1, 0b111, complexrotateopodd,
llvm/lib/Target/AArch64/AArch64InstrInfo.td:5170:defm ADD     : SIMDThreeSameVector<0, 0b10000, "add", add>;
...

In short, you have to manually the inheritance paths and have a feeling how these definitions are commonly written. SO you can make good guesses where to start looking.

Can you explain more about the generation and how it works? Is that a capstone thing?

This is documented here and in the README.md of our LLVM fork.

grahamwoodward · 2024-04-29T07:21:11Z

@Rot127 this might be a daft question but why isn't LLVM included as a git submodule that's then cloned/checked out when you initially clone Capstone? Wouldn't that mean Capstone always uses the latest LLVM?

Rot127 · 2024-04-29T07:50:36Z

The TableGen backends are heavily modified to emit C code. LLVM backends only emit C++ code. So having our fork and rebasing in on top of LLVM is easier and gives the possibility of gettings PRs etc.

grahamwoodward · 2024-04-29T08:17:25Z

The TableGen backends are heavily modified to emit C code. LLVM backends only emit C++ code. So having our fork and rebasing in on top of LLVM is easier and gives the possibility of gettings PRs etc.

would it not be better if LLVM themselves maintained the TG to omit C code? presumably they don't want to?

Rot127 · 2024-04-29T08:49:21Z

We proposed this idea, but there was no real feedback or interest. At least none which translated to actions.
Also because it would require to redesign and rewrite a lot of the backends (see the discussion here: https://reviews.llvm.org/D138323). And because for LLVM internal stuff they work just fine, the priority is understandably very low.
I will push the topic again when I find time to build an argument. Because the problem is more general than just "emit C code".

Rot127 · 2024-05-16T09:04:36Z

@grahamwoodward So it turned out I was partially wrong. The groups are not missing due to faulty definitions, but because they mean something else.

According to AArch64InstrInfo.td (where the groups are defined), these HasXorY features are there because:

A subset of SVE(2) instructions are legal in Streaming SVE execution mode,
they should be enabled if either has been specified.

So an instruction gets this group only if it can be legally executed with SVE and SME.

Now, to answer your two questions:

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

The naming here is a little unlucky for our use case. In LLVM the Has... naming makes sense, because an assembler checks if a feature is present (target "has" a feature).

So it does Feature_A ==assemble==> instruction_X. But what we did here was Feature_A ==implies==> Feature_A_or_B.
Which is not the original meaning of the flags.

It appears to have a group count of 1 but not HasSME...it is instead HasSMEorSVE...so why this way and why only 1 group again?

Although this is logically valid, I'd prefer to not add this. An instruction can be defined for SME or SVE and work with the other feature enabled.
But adding both HasSME and HasSVE as flags does:

Duplicate information
Implies that the instruction is defined in both ISA extensions (which doesn't need to be the case). And LLVM doesn't tell, in which extension they are defined. Doing this just adds unnecessary manual work (which we try to avoid).

In general I think it is trivial for the user to just add a few switch/match cases to check those features. They are not that many in the end.

I hope this resolves the issue. Going to close this now. But will open another issue about documenting the groups we have. And how they should be interpreted.

grahamwoodward · 2024-05-16T10:06:53Z

@grahamwoodward So it turned out I was partially wrong. The groups are not missing due to faulty definitions, but because they mean something else.

According to AArch64InstrInfo.td (where the groups are defined), these HasXorY features are there because:

A subset of SVE(2) instructions are legal in Streaming SVE execution mode,
they should be enabled if either has been specified.

So an instruction gets this group only if it can be legally executed with SVE and SME.

Now, to answer your two questions:

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

The naming here is a little unlucky for our use case. In LLVM the Has... naming makes sense, because an assembler checks if a feature is present (target "has" a feature).

So it does Feature_A ==assemble==> instruction_X. But what we did here was Feature_A ==implies==> Feature_A_or_B. Which is not the original meaning of the flags.

It appears to have a group count of 1 but not HasSME...it is instead HasSMEorSVE...so why this way and why only 1 group again?

Although this is logically valid, I'd prefer to not add this. An instruction can be defined for SME or SVE and work with the other feature enabled. But adding both HasSME and HasSVE as flags does:

Duplicate information

Implies that the instruction is defined in both ISA extensions (which doesn't need to be the case). And LLVM doesn't tell, in which extension they are defined. Doing this just adds unnecessary manual work (which we try to avoid).

In general I think it is trivial for the user to just add a few switch/match cases to check those features. They are not that many in the end.

I hope this resolves the issue. Going to close this now. But will open another issue about documenting the groups we have. And how they should be interpreted.

Right so basically I misunderstood what those "HasXorY" groups meant?

Just because an instruction is NEON or is SVE or is SME...it doesn't mean it'll always be in HasNEON and any HasNEONor<insert other group>

What did you mean by

In general I think it is trivial for the user to just add a few switch/match cases to check those features. They are not that many in the end.
?

Rot127 · 2024-05-17T11:52:53Z

Just because an instruction is NEON or is SVE or is SME...it doesn't mean it'll always be in HasNEON and any HasNEONor

Yes, an instruction which is HasNEON is not necessarily in HasNEONorSME.

What did you mean by

In your use case you described, you can also just check for HasNEONorSME to get the information, whether an instruction is executable with an enabled NEON extension.

Rot127 added bug AArch64 Arch LLVM-generated-files LLVM Anything LLVM related labels Apr 26, 2024

Rot127 mentioned this issue Apr 26, 2024

AArch64 update to LLVM 18 #2298

Draft

7 tasks

Rot127 changed the title ~~Group/feature "issue"~~ [AArch64] Instr. with groups HasNEON, don't have HasNEONorSME and similar assigned. Apr 27, 2024

Rot127 added this to the v6 milestone Apr 27, 2024

Rot127 closed this as completed May 16, 2024

Rot127 added wontfix and removed bug labels May 16, 2024

Rot127 mentioned this issue May 16, 2024

Add documentation about LLVM imported FEATURE groups. #2359

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AArch64] Instr. with groups `HasNEON`, don't have `HasNEONorSME` and similar assigned. #2343

[AArch64] Instr. with groups `HasNEON`, don't have `HasNEONorSME` and similar assigned. #2343

grahamwoodward commented Apr 26, 2024

Rot127 commented Apr 26, 2024 •

edited

grahamwoodward commented Apr 26, 2024

Rot127 commented Apr 26, 2024

grahamwoodward commented Apr 26, 2024

grahamwoodward commented Apr 26, 2024

Rot127 commented Apr 27, 2024

grahamwoodward commented Apr 27, 2024

Rot127 commented Apr 27, 2024

Rot127 commented Apr 27, 2024

grahamwoodward commented Apr 27, 2024

Rot127 commented Apr 27, 2024 •

edited

grahamwoodward commented Apr 27, 2024

Rot127 commented Apr 27, 2024

grahamwoodward commented Apr 27, 2024

Rot127 commented Apr 27, 2024

grahamwoodward commented Apr 27, 2024

grahamwoodward commented Apr 27, 2024

Rot127 commented Apr 28, 2024

grahamwoodward commented Apr 29, 2024

Rot127 commented Apr 29, 2024 •

edited

grahamwoodward commented Apr 29, 2024

Rot127 commented Apr 29, 2024

Rot127 commented May 16, 2024

grahamwoodward commented May 16, 2024

Rot127 commented May 17, 2024

[AArch64] Instr. with groups HasNEON, don't have HasNEONorSME and similar assigned. #2343

[AArch64] Instr. with groups HasNEON, don't have HasNEONorSME and similar assigned. #2343

Comments

grahamwoodward commented Apr 26, 2024

Rot127 commented Apr 26, 2024 • edited

grahamwoodward commented Apr 26, 2024

Rot127 commented Apr 26, 2024

grahamwoodward commented Apr 26, 2024

grahamwoodward commented Apr 26, 2024

Rot127 commented Apr 27, 2024

grahamwoodward commented Apr 27, 2024

Rot127 commented Apr 27, 2024

Rot127 commented Apr 27, 2024

grahamwoodward commented Apr 27, 2024

Rot127 commented Apr 27, 2024 • edited

grahamwoodward commented Apr 27, 2024

Rot127 commented Apr 27, 2024

grahamwoodward commented Apr 27, 2024

Rot127 commented Apr 27, 2024

grahamwoodward commented Apr 27, 2024

grahamwoodward commented Apr 27, 2024

Rot127 commented Apr 28, 2024

grahamwoodward commented Apr 29, 2024

Rot127 commented Apr 29, 2024 • edited

grahamwoodward commented Apr 29, 2024

Rot127 commented Apr 29, 2024

Rot127 commented May 16, 2024

grahamwoodward commented May 16, 2024

Rot127 commented May 17, 2024

[AArch64] Instr. with groups `HasNEON`, don't have `HasNEONorSME` and similar assigned. #2343

[AArch64] Instr. with groups `HasNEON`, don't have `HasNEONorSME` and similar assigned. #2343

Rot127 commented Apr 26, 2024 •

edited

Rot127 commented Apr 27, 2024 •

edited

Rot127 commented Apr 29, 2024 •

edited