DirectX Intermediate Language

Introduction

This document presents the design of the DirectX Intermediate Language (DXIL) for GPU shaders. DXIL is intended to support a direct mapping of the HLSL programming language into Low-Level Virtual Machine Intermediate Representation (LLVM IR), suitable for consumption in GPU drivers. This version of the specification is based on LLVM 3.7 in the use of metadata syntax.

Before the low-level DXIL IR is produced, codegen generates a higher-level IR, which the optimizer then transforms into DXIL. This lowers high-level constructs, such as user-defined types, multi-dimensional arrays, matrices, and vectors, into simpler abstractions more suitable for fast JIT-ing in the driver compilers. DXIL is derived from LLVM IR.

LLVM is quickly becoming a de facto standard in modern compilation technology. The LLVM framework offers several distinct features, such as a vibrant ecosystem, complete compilation framework, modular design, and reasonable documentation. We can leverage these to achieve two important objectives.

First, unification of the shader compilation tool chain. DXIL is a contract between IR producers, such as compilers for HLSL and other domain-specific languages, and IR consumers, such as IHV driver JIT compilers or the offline XBOX shader compiler. In addition, the design provides for conversion of the legacy HLSL IL, called DXBC IL in this document, to DXIL.

Second, leveraging the LLVM ecosystem. Microsoft will publicly document DXIL to attract domain language implementers and spur innovation. Using an LLVM-based IR offers reduced entry costs for small teams, simply because small teams are likely to use LLVM and Clang as their main compilation framework. We will provide a DXIL validator to check the consistency of generated DXIL.

The following diagram shows how some of these components tie together:

HLSL   Other shading langs  DSL          DXBC IL
+      +                    +            +
|      |                    |            |
v      v                    v            v
Clang  Clang                Other Tools  dxbc2dxil
+      +                    +            +
|      |                    |            |
v      v                    v            |
+------+--------------------+---------+  |
|          High level IR              |  |
+-------------------------------------+  |
                  |                      |
                  |                      |
                  v                      |
              Optimizer <-----+ Linker   |
              +      ^             +     |
              |      |             |     |
              |      |             |     |
 +------------v------+-------------v-----v-------+
 |              Low level IR (DXIL)              |
 +------------+----------------------+-----------+
              |                      |
              v                      v
      Driver Compiler             Validator

The dxbc2dxil element in the diagram is a component that converts existing DXBC shader byte code into DXIL. The Optimizer element is a component that consumes the high level IR, verifies it is valid, optimizes it, and produces a valid DXIL form. The Validator element is a public component that verifies and signs DXIL. The Linker is a component that combines precompiled DXIL libraries with the entry function to produce a valid shader.

DXIL does not support the following HLSL features that were present in prior implementations.

Shader models 9 and below. Microsoft may implement 10level9 shader models via DXIL capability tiers.
Effects.
HLSL interfaces.
Shader compression/decompression.
Partial precision. Half data type should be used instead.
min10float type. Half data type should be used instead.
HLSL uniform parameter qualifier.
Current fxc legacy compatibility mode for old shader models (e.g., c-register binding).
PDB. Debug Information annotations are used instead.
Compute shader model cs_4_0.
DXBC label, call, fcall constructs.

The following principles are used to ease reuse with LLVM components and aid extensibility.

DXIL uses a subset of LLVM IR constructs that makes sense for HLSL.
No modifications to the core LLVM IR; i.e., no new instructions or fundamental types.
Additional information is conveyed via metadata, LLVM intrinsics or external functions.
Name prefixes: 'llvm.dx.', 'llvm.dxil.', 'dx.', and 'dxil.' are reserved.

Versioning

There are three versioning mechanisms in DXIL shaders: shader model, DXIL version, and LLVM bitcode version.

At a high-level, the shader model describes the target execution model and environment.

DXIL defines the rules for expressing Direct3D shader programs using a subset of standard LLVM IR. LLVM IR has three equivalent forms: human-readable, binary (bitcode), and in-memory. DXIL programs are encoded using a subset of LLVM IR bitcode format. This document uses only human-readable form to describe DXIL.

DXIL versioning allows for changes to the rules over time. The LLVM bitcode version is currently fixed at LLVM 3.7 for all DXIL versions.

A given DXIL version can support up to the latest shader model defined at the time that DXIL version was finalized. However, the DXIL version for a shader is typically set based on the shader model to ensure that any device supporting that particular shader model will be able to interpret the DXIL properly, without needing to know about any newer DXIL versions.

Shader Model (SM)

The shader model in DXIL is similar to DXBC shader model. A shader model specifies the execution model, the set of capabilities that shader instructions can use and the constraints that a shader program must adhere to.

The shader model is specified as a named metadata in DXIL:

!dx.shaderModel = !{ !0 }
!0 = !{ !"<shaderModelName>", i32 <major>, i32 <minor> }

The following values of <shaderModelName>, <major>, <minor> are supported:

Shader Type	shaderModelName	Minimum major, minor
Vertex shader (VS)	vs	6, 0
Hull shader (HS)	hs	6, 0
Domain shader (DS)	ds	6, 0
Geometry shader (GS)	gs	6, 0
Pixel shader (PS)	ps	6, 0
Compute shader (CS)	cs	6, 0
Mesh shader (MS)	ms	6, 5
Amplification shader (AS)	as	6, 5
DXIL library	lib	6, 3

The DXIL validator ensures that DXIL conforms to the specified shader model.
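
As a rough illustration of the table above, the following Python sketch checks a (shaderModelName, major, minor) triple against the listed minimums. The names MIN_SHADER_MODEL and is_supported_shader_model are hypothetical helpers introduced here; they are not part of DXIL or its validator, and real validation covers far more than this version check.

```python
# Minimum (major, minor) per shaderModelName, copied from the table above.
# Hypothetical helper for illustration only; not the DXIL validator.
MIN_SHADER_MODEL = {
    "vs": (6, 0), "hs": (6, 0), "ds": (6, 0), "gs": (6, 0),
    "ps": (6, 0), "cs": (6, 0), "ms": (6, 5), "as": (6, 5),
    "lib": (6, 3),
}


def is_supported_shader_model(name, major, minor):
    """Return True if (major, minor) meets the table's minimum for name."""
    minimum = MIN_SHADER_MODEL.get(name)
    if minimum is None:
        return False  # unknown shaderModelName
    # Tuples compare lexicographically: major first, then minor.
    return (major, minor) >= minimum
```

For instance, a mesh shader declared as ("ms", 6, 4) would be rejected by this check, while ("lib", 6, 3) would pass.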

DXIL version

The primary mechanism to evolve HLSL capabilities is through shader models. However, DXIL version is reserved for additional flexibility of future extensions. There are two currently defined versions: 1.0 and 1.1.

DXIL version has major and minor versions that are specified as named metadata:

!dx.version = !{ !0 }
!0 = !{ i32 <major>, i32 <minor> }

DXIL version must be declared exactly once per LLVM module (translation unit) and is valid for the entire module.

DXIL will evolve in a manner that retains backward compatibility.

DXIL 1.1 Changes

The two main features introduced in DXIL 1.1 (Shader Model 6.1) are view instancing and barycentric coordinates. Specifically, the DXIL representation changes as follows.

New Intrinsics - AttributeAtVertex, ViewID
New System Generated Value - SV_Barycentrics
New Container Part - ILDN

DXIL 1.2 Changes

RawBufferLoad and RawBufferStore DXIL operations for ByteAddressBuffer and StructuredBuffer
Denorm mode as a function attribute for float32 "fp32-denorm-mode"=<value>

LLVM Bitcode version

The current version of DXIL is based on LLVM bitcode v3.7. This encoding is necessarily implied by something outside the DXIL module.

General Issues

An important goal is to enable HLSL to be closer to a strict subset of C/C++. This has implications for DXIL design and future hardware feature requests outlined below.

Terminology

Resource refers to one of the following:

SRV - shader resource view (read-only)
UAV - unordered access view (read-write)
CBV - constant buffer view (read-only)
Sampler

Intrinsics typically refer to operations missing in the core LLVM IR. DXIL represents HLSL built-in functions (also called intrinsics) not as LLVM intrinsics, but rather as external function calls.

DXIL abstraction level

DXIL has a level of abstraction similar to a 'scalarized' DXBC. DXIL is a lower-level IR amenable to fast and robust JIT-ing in driver compilers.

In particular, the following passes are performed to lower the HLSL abstractions down to DXIL:

optimize function parameter copies
inline functions
allocate and transform shader signatures
lower matrices, optimizing intermediate storage
linearize multi-dimensional arrays and user-defined type accesses
scalarize vectors
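
To make the array linearization step concrete, here is a minimal Python sketch that maps a multi-dimensional index to a flat one. It assumes row-major order, which this section does not specify, and linearize_index is a hypothetical name introduced for illustration rather than a pass in the HLSL compiler.

```python
def linearize_index(indices, dims):
    """Map a multi-dimensional index to a flat index.

    Assumption: row-major order (last dimension varies fastest).
    """
    if len(indices) != len(dims):
        raise ValueError("rank mismatch")
    flat = 0
    for index, dim in zip(indices, dims):
        if not 0 <= index < dim:
            raise IndexError("index out of range")
        # Horner-style accumulation: scale by this dimension, then add.
        flat = flat * dim + index
    return flat
```

Under this assumption, element [1][2][3] of a 2x3x4 array lands at flat offset 1*12 + 2*4 + 3 = 23 in the resulting 1D array.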

Scalar IR

DXIL operations work with scalar quantities. Several scalar quantities may be grouped together in a struct to represent several return values, which is used for memory operations, e.g., load/store, sample, etc., that benefit from access coalescing.

Metadata, resource declarations, and debugging info may contain vectors to more closely convey source code shape to tools and debuggers.

Future versions of IR may contain vectors or grouping hints for less-than-32-bit quantities, such as half and i16.

Memory accesses

DXIL conceptually aligns with DXBC in how different memory types are accessed. Out-of-bounds behavior and various restrictions are preserved.

Indexable thread-local and groupshared variables are represented as variables and accessed via LLVM C-like pointers.

Swizzled resources, such as textures, have opaque memory layouts from a DXIL point of view. Accesses to these resources are done via intrinsics.

There are two layouts for constant buffer memory: (1) legacy, matching DXBC's layout and (2) linear layout. SM6 DXIL uses intrinsics to read cbuffer for either layout.

Shader signatures require packing and are located in a special type of memory that cannot be viewed as linear. Accesses to signature values are done via special intrinsics in DXIL. If a signature parameter needs to be passed to a function, a copy is created first in threadlocal memory and the copy is passed to the function.

Typed buffers represent memory with in-flight data conversion. Typed buffer load/store/atomics are done via special functions in DXIL with element-granularity indexing.

The following pointer types are supported:

Non-indexable thread-local variables.
Indexable thread-local variables (DXBC x-registers).
Groupshared variables (DXBC g-registers).
Device memory pointer.
Constant-buffer-like memory pointer.

The type of DXIL pointer is differentiated by LLVM addrspace construct. The HLSL compiler will make the best effort to infer the exact pointer addrspace such that a driver compiler can issue the most efficient instruction.

A pointer can come into being in a number of ways:

Global Variables.
AllocaInst.
Synthesized as a result of some pointer arithmetic.

DXIL uses 32-bit pointers in its representation.

Out-of-bounds behavior

Indexable thread-local accesses are done via LLVM pointers and have C-like OOB semantics. Groupshared accesses are also done via LLVM pointers. The origin of a groupshared pointer must be a single TGSM allocation. If a groupshared pointer is produced by an inbounds GEP instruction, it must not go out of bounds; the behavior of an OOB access through such a pointer is undefined. For a groupshared pointer produced by a regular GEP, OOB accesses behave as in DXBC: loads return 0, and stores are silently dropped.

Resource accesses keep the same out-of-bounds behavior as DXBC. Loads return 0 for OOB accesses; OOB stores are silently dropped.
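
The resource rule above (OOB loads return 0, OOB stores are dropped) can be modeled with a small Python toy. BoundsCheckedBuffer is a hypothetical class introduced only to illustrate the semantics; it says nothing about how hardware or drivers implement the check.

```python
class BoundsCheckedBuffer:
    """Toy model: OOB loads return 0, OOB stores are silently dropped."""

    def __init__(self, size):
        self._data = [0] * size

    def load(self, index):
        if 0 <= index < len(self._data):
            return self._data[index]
        return 0  # out-of-bounds load yields zero

    def store(self, index, value):
        if 0 <= index < len(self._data):
            self._data[index] = value
        # otherwise: out-of-bounds store is dropped, with no error raised
```

With a four-element buffer, storing to index 9 has no effect and loading from index 9 yields 0, while in-range accesses behave normally.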

OOB pointer accesses in SM6.0 and later have undefined (C-like) behavior. LLVM memory optimization passes can be used to optimize such accesses. Where out-of-bound behavior is desired, intrinsic functions are used to access memory.

Memory access granularity

Intrinsic and resource accesses may imply a wider access than requested by an instruction. DXIL defines memory accesses for i1, i16, i32, i64, f16, f32, f64 on thread local memory, and i32, f32, f64 for memory I/O (that is, groupshared memory and memory accessed via resources such as CBs, UAVs and SRVs).
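
The two type lists above can be summarized as a lookup. This is only a sketch of the stated rule: the set names and access_is_defined are hypothetical, and the type spellings (f16, f32, f64) follow this paragraph rather than LLVM's half/float/double.

```python
# Scalar types with defined memory accesses, as listed in the paragraph above.
THREAD_LOCAL_TYPES = {"i1", "i16", "i32", "i64", "f16", "f32", "f64"}
MEMORY_IO_TYPES = {"i32", "f32", "f64"}  # groupshared and resource memory


def access_is_defined(scalar_type, thread_local):
    """Return True if DXIL defines memory access for this scalar type.

    thread_local=True selects thread-local memory; False selects memory I/O.
    """
    allowed = THREAD_LOCAL_TYPES if thread_local else MEMORY_IO_TYPES
    return scalar_type in allowed
```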

Number of virtual values

There is no limit on the number of virtual values in DXIL. The IR is guaranteed to be in SSA form. For optimized shaders, the optimizer will run the LLVM -mem2reg pass as well as perform other memory-to-register promotions if profitable.

Control-flow restrictions

The DXIL control-flow graph must be reducible, as checked by the T1-T2 test. DXIL does not preserve the structured control flow of DXBC. Preserving the structured control-flow property would impose a significant burden on third-party tools optimizing to DXIL via LLVM, reducing the appeal of DXIL.
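
For readers unfamiliar with the T1-T2 test, the following Python sketch shows the classic formulation: repeatedly delete self-loops (T1) and merge any non-entry node that has a single predecessor into that predecessor (T2); the graph is reducible if it collapses to one node. The function is_reducible and its dictionary-based CFG encoding are assumptions made for illustration, not the validator's implementation.

```python
def is_reducible(successors, entry):
    """T1-T2 test on a CFG given as {node: iterable of successor nodes}."""
    # Keep only nodes reachable from the entry.
    reachable, stack = set(), [entry]
    while stack:
        node = stack.pop()
        if node not in reachable:
            reachable.add(node)
            stack.extend(successors.get(node, ()))
    succ = {n: set(successors.get(n, ())) & reachable for n in reachable}

    changed = True
    while changed and len(succ) > 1:
        changed = False
        # T1: delete self-loops.
        for node in succ:
            if node in succ[node]:
                succ[node].discard(node)
                changed = True
        # T2: merge a non-entry node with a unique predecessor into it.
        for node in list(succ):
            if node == entry:
                continue
            preds = [p for p in succ if node in succ[p]]
            if len(preds) == 1:
                pred = preds[0]
                succ[pred].discard(node)
                succ[pred] |= succ.pop(node)  # predecessor inherits edges
                changed = True
                break
    return len(succ) == 1
```

A diamond or a simple loop collapses to a single node. A graph in which two blocks jump into each other and both are also reachable directly from the entry does not, so it is reported as irreducible.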

DXIL allows fall-through for switch label blocks. This is a difference from DXBC, in which the fall-through is prohibited.

DXIL will not support the DXBC label and call instructions; LLVM functions can be used instead (see below). The primary uses for these are (1) HLSL interfaces, which are not supported, and (2) outlining of case-bodies in a switch statement annotated with [call], which is not a scenario of interest.

Functions

Instead of DXBC labels/calls, DXIL supports functions and call instructions. Recursion is not allowed; DXIL validator enforces this.
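
One way such a no-recursion rule could be checked is a depth-first search for cycles in the call graph. The sketch below is an assumption about a plausible approach, not a description of the DXIL validator; has_recursion and the {caller: callees} encoding are hypothetical.

```python
def has_recursion(call_graph):
    """Return True if {caller: iterable of callees} contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current chain, finished
    color = {}

    def visit(func):
        color[func] = GRAY
        for callee in call_graph.get(func, ()):
            state = color.get(callee, WHITE)
            if state == GRAY:
                return True  # back edge: callee is on the current call chain
            if state == WHITE and visit(callee):
                return True
        color[func] = BLACK
        return False

    return any(color.get(f, WHITE) == WHITE and visit(f)
               for f in list(call_graph))
```

Both direct recursion (a function calling itself) and mutual recursion (f calls g, g calls f) show up as cycles and are reported.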

The functions are regular LLVM functions. Parameters can be passed by-value or by-reference. The functions are to facilitate separate compilation for big, complex shaders. However, driver compilers are free to inline functions as they see fit.

In DXIL, only two string function attributes are permitted: 'waveops-include-helper-lanes' and 'fp32-denorm-mode'.

The attribute 'waveops-include-helper-lanes' is utilized to indicate that wave operations should consider helper lanes as active lanes.

'fp32-denorm-mode' is employed to define the denorm mode for the function. The possible values for this attribute can be 'any', 'preserve', or 'ftz'.
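
The attribute rules above amount to a small whitelist. The following Python sketch encodes them; check_string_attribute is a hypothetical helper, and because this section does not state which values 'waveops-include-helper-lanes' may take, the sketch leaves that value unconstrained.

```python
# Permitted string function attributes, per the text above.
# None means this section does not constrain the attribute's value.
ALLOWED_STRING_ATTRIBUTES = {
    "waveops-include-helper-lanes": None,
    "fp32-denorm-mode": {"any", "preserve", "ftz"},
}


def check_string_attribute(name, value=None):
    """Return True if the attribute (and its value) is permitted."""
    if name not in ALLOWED_STRING_ATTRIBUTES:
        return False
    allowed_values = ALLOWED_STRING_ATTRIBUTES[name]
    return allowed_values is None or value in allowed_values
```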

Identifiers

DXIL identifiers must conform to LLVM IR identifier rules.

Identifier mangling rules are the ones used by Clang 3.7 with the HLSL target.

The following identifier prefixes are reserved:

dx., dxil.
llvm.dx., llvm.dxil.

Address Width

DXIL will use only 32-bit addresses for pointers. Byte offsets are also 32-bit.

Shader restrictions

There is no support for the following in DXIL:

recursion
exceptions
indirect function calls and dynamic dispatch

Entry points

The dx.entryPoints metadata specifies a list of entry point records, one for each entry point. Libraries could specify more than one entry point per module but currently exist outside the DXIL specification; the other shader models must specify exactly one entry point.

For example:

define void @"\01?myfunc1@@YAXXZ"() #0 { ... }
define float @"\01?myfunc2@@YAMXZ"() #0 { ... }

!dx.entryPoints = !{ !1, !2 }

!1 = !{ void  ()* @"\01?myfunc1@@YAXXZ", !"myfunc1", !3, null, null }
!2 = !{ float ()* @"\01?myfunc2@@YAMXZ", !"myfunc2", !5, !6, !7 }

Each entry point metadata record specifies:

reference to the entry point function global symbol
unmangled name
list of signatures
list of resources
list of tag-value pairs of shader capabilities and other properties

A 'null' value specifies absence of a particular node.

Shader capabilities are properties that are additional to properties dictated by shader model. The list is organized as pairs of i32 tag, followed immediately by the value itself.
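
The record layout described above can be sketched in Python as a five-field tuple whose last field is split into (tag, value) pairs. EntryPointRecord and parse_entry_point are hypothetical names, metadata operands are modeled as plain Python values, and None stands in for a 'null' node.

```python
from collections import namedtuple

# Field order follows the list of record contents above.
EntryPointRecord = namedtuple(
    "EntryPointRecord", "function name signatures resources properties")


def parse_entry_point(record):
    """Name the five operands of an entry point record; None models 'null'."""
    if len(record) != 5:
        raise ValueError("entry point record must have five operands")
    function, name, signatures, resources, props = record
    if props is not None and len(props) % 2 != 0:
        raise ValueError("properties must be tag-value pairs")
    # Split the flat [tag0, value0, tag1, value1, ...] list into pairs.
    pairs = None if props is None else [
        (props[i], props[i + 1]) for i in range(0, len(props), 2)]
    return EntryPointRecord(function, name, signatures, resources, pairs)
```

For example, a record whose property list is [0, 16] would yield the single pair (0, 16): tag 0 followed by its value.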

Hull shader representation

The hull shader is represented as two functions, related via metadata: (1) control point phase function, which is the entry point of the hull shader, and (2) patch constant phase function.

For example:

!dx.entryPoints = !{ !1 }
!1 = !{ void ()* @"ControlPointFunc", ..., !2 }  ; shader entry record
!2 = !{ !"HS", !3 }
!3 = !{ void ()* @"PatchConstFunc", ... }        ; additional hull shader state

The patch constant function represents the original HLSL computation and is not separated into fork and join phases, as is the case in DXBC. The driver compiler may perform such separation if it is profitable for the target GPU.

During DXBC-to-DXIL conversion, the original patch constant function cannot be recovered. Instead, the instructions of each fork and join phase are 'wrapped' in a loop that runs for the corresponding phase-instance count. Thus, the fork/join instance ID becomes the loop induction variable. The LoadPatchConstant intrinsic (see below) represents a load from the DXBC vpc register.

The following table summarizes the names of intrinsic functions to load inputs and store outputs of hull and domain shaders. CP stands for Control Point, PC - for Patch Constant.

Operation	Control Point (Hull)	Patch Constant	Domain
Store Input CP	-	-	-
Load Input CP	LoadInput	LoadInput	LoadInput
Store Output CP	StoreOutput	-	-
Load Output CP	-	LoadOutputControlPoint	-
Store PC	-	StorePatchConstant	-
Load PC	-	LoadPatchConstant	LoadPatchConstant
Store Output Vertex	-	-	StoreOutput

LoadPatchConstant function in PC stage is generated only by DXBC-to-DXIL converter, to access DXBC vpc registers. HLSL compiler produces IR that references LLVM IR values directly.

Type System

Most of LLVM type system constructs are legal in DXIL.

Primitive Types

The following types are supported:

void
metadata
i1, i8, i16, i32, i64
half, float, double

SM6.0 assumes native hardware support for i32 and float types.

i8 is supported only in a few intrinsics to signify masks, enumeration constant values, or in metadata. It's not supported for memory access or computation by the shader.

HLSL min12int, min16int and min16uint data types are mapped to i16.

half and i16 are treated as the corresponding DXBC min-precision types (min16float, min16int/min16uint) in SM6.0.

The HLSL compiler optimizer treats half, i16 and i8 data as data types natively supported by the hardware; i.e., saturation, range clipping, INF/NaN are done according to the IEEE standard. Such semantics allow the optimizer to reuse LLVM optimization passes.

Hardware support for doubles is optional and is guarded by the RequiresHardwareDouble CAP bit.

Hardware support for i64 is optional and is guarded by a CAP bit.

Vectors

HLSL vectors are scalarized. They do not participate in computation; however, they may be present in declarations to convey original variable layout to tools, debuggers, and reflection.

Future DXIL may add support for <2 x half> and <2 x i16> vectors or hints for packing related half and i16 quantities.

Matrices

Matrices are lowered to vectors, and are not referenced by instructions. They may be present in declarations to convey original variable layout to tools, debuggers, and reflection.

Arrays

Instructions may reference only 1D arrays of primitive types. However, complex arrays, e.g., multidimensional arrays or user-defined types, may be present to convey original variable layout to tools, debuggers, and reflection.

User-defined types

Original HLSL UDTs are lowered and are not referenced by instructions. However, they may be present in declarations to convey original variable layout to tools, debuggers, and reflection. Some resource operations return 'grouping' UDTs that group several return values; such UDTs are immediately 'decomposed' into components that are then consumed by other instructions.

Type conversions

Explicit conversions between types are supported via LLVM instructions.

Precise qualifier

By default, all floating-point HLSL operations are considered 'fast' or non-precise. HLSL and driver compilers are allowed to refactor such operations. Non-precise LLVM instructions: fadd, fsub, fmul, fdiv, frem, fcmp are marked with 'fast' math flags.

HLSL precise type qualifier requires that all operations contributing to the value be IEEE compliant with respect to optimizations. The /Gis compiler switch implicitly declares all variables and values as precise.

Precise behavior is represented in LLVM instructions: fadd, fsub, fmul, fdiv, frem, fcmp by not having 'fast' math flags set. Each relevant call instruction that contributes to computation of a precise value is annotated with dx.precise metadata that indicates that it is illegal for the driver compiler to perform IEEE-unsafe optimizations.

Type annotations

User-defined types are annotated in DXIL to 'attach' additional properties to structure fields. For example, DXIL may contain type annotations of structures and functions for reflection purposes:

namespace MyNameSpace {
  struct MyType {
      float field1;
      int2 field2;
  };
}

float main(float col : COLOR) : SV_Target {
  .....
}

!dx.typeAnnotations = !{!3, !7}
!3 = !{i32 0, %"struct.MyNameSpace::MyType" undef, !4}
!4 = !{i32 12, !5, !6}
!5 = !{i32 6, !"field1", i32 3, i32 0, i32 7, i32 9}
!6 = !{i32 6, !"field2", i32 3, i32 4, i32 7, i32 4}
!7 = !{i32 1, void (float, float*)* @"main", !8}
!8 = !{!9, !11, !14}
!9 = !{i32 0, !10, !10}
!10 = !{}
!11 = !{i32 0, !12, !13}
!12 = !{i32 4, !"COLOR", i32 7, i32 9}
!13 = !{i32 0}
!14 = !{i32 1, !15, !13}
!15 = !{i32 4, !"SV_Target", i32 7, i32 9}
!16 = !{null, !"lib.no::entry", null, null, null}

The type/field annotation metadata hierarchy recursively mimics LLVM type hierarchy. dx.typeAnnotations is a metadata of type annotation nodes, where each node represents type annotation of a certain type:

!dx.typeAnnotations = !{!3, !7}

For each type annotation node, the first value represents the type of the annotation:

!3 = !{i32 0, %"struct.MyNameSpace::MyType" undef, !4}
!7 = !{i32 1, void (float, float*)* @"main", !8}

Idx	Type
0	Structure Annotation
1	Function Annotation

The second value represents the name, the third is a corresponding type metadata node.

Structure Annotation starts with the size of the structure in bytes, followed by the list of field annotations:

!4 = !{i32 12, !5, !6}
!5 = !{i32 6, !"field1", i32 3, i32 0, i32 7, i32 9}
!6 = !{i32 6, !"field2", i32 3, i32 4, i32 7, i32 4}

Field Annotation is a series of pairs with tag number followed by its value. Field Annotation pair is defined as follows

Idx	Type
0	SNorm
1	UNorm
2	Matrix
3	Buffer Offset
4	Semantic String
5	Interpolation Mode
6	Field Name
7	Component Type
8	Precise
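
To show how a field annotation reads against the tag table above, this Python sketch decodes a flat operand list into named properties. decode_field_annotation is a hypothetical helper, and metadata operands are modeled as plain Python integers and strings.

```python
# Tag numbers and names copied from the field annotation table above.
FIELD_ANNOTATION_TAGS = {
    0: "SNorm", 1: "UNorm", 2: "Matrix", 3: "Buffer Offset",
    4: "Semantic String", 5: "Interpolation Mode", 6: "Field Name",
    7: "Component Type", 8: "Precise",
}


def decode_field_annotation(operands):
    """Turn a flat [tag, value, tag, value, ...] list into {tag name: value}."""
    if len(operands) % 2 != 0:
        raise ValueError("expected tag/value pairs")
    return {
        FIELD_ANNOTATION_TAGS[operands[i]]: operands[i + 1]
        for i in range(0, len(operands), 2)
    }
```

Applied to the operands of !5 from the example, [6, "field1", 3, 0, 7, 9], this gives a field named "field1" at buffer offset 0 with component type 9. The same decoding of !6 gives "field2" at buffer offset 4 with component type 4.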

Function Annotation is a series of parameter annotations:

!7 = !{i32 1, void (float, float*)* @"main", !8}
!8 = !{!9, !11, !14}

Each Parameter Annotation contains Input/Output type, field annotation, and semantic index:

!9 = !{i32 0, !10, !10}
!10 = !{}
!11 = !{i32 0, !12, !13}
!12 = !{i32 4, !"COLOR", i32 7, i32 9}
!13 = !{i32 0}
!14 = !{i32 1, !15, !13}
!15 = !{i32 4, !"SV_Target", i32 7, i32 9}

Shader Properties and Capabilities

Additional shader properties are specified via tag-value pair list, which is the last element in the entry function description record.

Shader Flags

Shaders have additional flags that convey their capabilities via a tag-value pair with tag kDxilShaderFlagsTag (0), followed by an i64 bitmask integer. The bits have the following meaning:

Bit	Description
0	Disable shader optimizations
1	Disable math refactoring
2	Shader uses doubles
3	Force early depth stencil
4	Enable raw and structured buffers
5	Shader uses min-precision, expressed as half and i16
6	Shader uses double extension intrinsics
7	Shader uses MSAD
8	All resources must be bound for the duration of shader execution
9	Enable view port and RT array index from any stage feeding rasterizer
10	Shader uses inner coverage
11	Shader uses stencil
12	Shader uses intrinsics that access tiled resources
13	Shader uses relaxed typed UAV load formats
14	Shader uses Level9 comparison filtering
15	Shader uses up to 64 UAVs
16	Shader uses UAVs
17	Shader uses CS4 raw and structured buffers
18	Shader uses Rasterizer Ordered Views
19	Shader uses wave intrinsics
20	Shader uses int64 instructions
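
A bitmask like this is typically decoded by testing each bit in turn. The Python sketch below does so using the descriptions from the table above; SHADER_FLAG_BITS and decode_shader_flags are hypothetical names introduced for illustration.

```python
# Bit positions and descriptions copied from the shader flags table above.
SHADER_FLAG_BITS = {
    0: "Disable shader optimizations",
    1: "Disable math refactoring",
    2: "Shader uses doubles",
    3: "Force early depth stencil",
    4: "Enable raw and structured buffers",
    5: "Shader uses min-precision, expressed as half and i16",
    6: "Shader uses double extension intrinsics",
    7: "Shader uses MSAD",
    8: "All resources must be bound for the duration of shader execution",
    9: "Enable view port and RT array index from any stage feeding rasterizer",
    10: "Shader uses inner coverage",
    11: "Shader uses stencil",
    12: "Shader uses intrinsics that access tiled resources",
    13: "Shader uses relaxed typed UAV load formats",
    14: "Shader uses Level9 comparison filtering",
    15: "Shader uses up to 64 UAVs",
    16: "Shader uses UAVs",
    17: "Shader uses CS4 raw and structured buffers",
    18: "Shader uses Rasterizer Ordered Views",
    19: "Shader uses wave intrinsics",
    20: "Shader uses int64 instructions",
}


def decode_shader_flags(mask):
    """Return descriptions of the bits set in mask, lowest bit first."""
    return [desc for bit, desc in sorted(SHADER_FLAG_BITS.items())
            if mask & (1 << bit)]
```

A mask with bits 2 and 19 set, for example, decodes to "Shader uses doubles" and "Shader uses wave intrinsics".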

Geometry Shader

Geometry shader properties are specified via tag-value pair with tag kDxilGSStateTag (1), followed by a list of GS properties. The format of this list is the following.

Idx	Type	Description
0	i32	Input primitive (InputPrimitive enum value).
1	i32	Max vertex count.
2	i32	Primitive topology for stream 0 (PrimitiveTopology enum value).
3	i32	Primitive topology for stream 1 (PrimitiveTopology enum value).
4	i32	Primitive topology for stream 2 (PrimitiveTopology enum value).
5	i32	Primitive topology for stream 3 (PrimitiveTopology enum value).

Domain Shader

Domain shader properties are specified via tag-value pair with tag kDxilDSStateTag (2), followed by a list of DS properties. The format of this list is the following.

Idx	Type	Description
0	i32	Tessellator domain (TessellatorDomain enum value).
1	i32	Input control point count.

Hull Shader

Hull shader properties are specified via tag-value pair with tag kDxilHSStateTag (3), followed by a list of HS properties. The format of this list is the following.

Idx	Type	Description
0	MDValue	Patch constant function (global symbol).
1	i32	Input control point count.
2	i32	Output control point count.
3	i32	Tessellator domain (TessellatorDomain enum value).
4	i32	Tessellator partitioning (TessellatorPartitioning enum value).
5	i32	Tessellator output primitive (TessellatorOutputPrimitive enum value).
6	float	Max tessellation factor.

Compute Shader

Compute shader has the following tag-value properties.

Tag	Value	Description
kDxilNumThreadsTag(4)	MD list: (i32, i32, i32)	Number of threads (X,Y,Z) for compute shader.
kDxilWaveSizeTag	MD list: (i32)	Wave size the shader is compatible with (optional).

Shader Parameters and Signatures

This section formalizes how HLSL shader input and output parameters are expressed in DXIL.

HLSL signatures and semantics

Formal parameters of a shader entry function in HLSL specify how the shader interacts with the graphics pipeline. Input parameters, referred to as an input signature, specify values received by the shader. Output parameters, referred to as an output signature, specify values produced by the shader. The shader compiler maps HLSL input and output signatures into DXIL specifications that conform to hardware constraints outlined in the Direct3D Functional Specification. DXIL specifications are also called signatures.

Signature mapping is a complex process, as there are many constraints. All signature parameters must fit into a finite space of N 4x32-bit registers. For efficiency reasons, parameters are packed together in a way that does not violate specification constraints. The process is called signature packing. Most signatures are tightly packed; however, the VS input signature is not packed, as the values are coming from the Input Assembler (IA) stage rather than the graphics pipeline. Alternately, the PS output signature is allocated to align the SV_Target semantic index with the output register index.

Each HLSL signature parameter is defined via a C-like type, interpolation mode, and semantic name and index. The type defines parameter shape, which may be quite complex. Interpolation mode adds to the packing constraints, namely that parameters packed together must have compatible interpolation modes. Semantics are extra names associated with parameters for the following purposes: (1) to specify whether a parameter is a special System Value (SV) or not, (2) to link parameters to IA or StreamOut API streams, and (3) to aid debugging. Semantic index is used to disambiguate parameters that use the same semantic name, or span multiple rows of the register space.

SV semantics add specific meanings and constraints to associated parameters. A parameter may be supplied by the hardware, and is then known as a System Generated Value (SGV). Alternatively, a parameter may be interpreted by the hardware and is then known as System Interpreted Value (SIV). SGVs and SIVs are pipeline-stage dependent; moreover, some participate in signature packing and some do not. Non-SV semantics always participate in signature packing.

Most System Generated Values (SGV) are loaded using special DXIL intrinsic functions, rather than loading the input from a signature. These usually will not be present in the signature at all. Their presence may be detected by the declaration and use of the special intrinsic function itself. The exceptions to this are notable. In one case they are present and loaded from the signature instead of a special intrinsic because they must be part of the packed signature potentially passed from the prior stage, allowing the prior stage to override these values, such as for SV_PrimitiveID and SV_IsFrontFace that may be written in the Geometry Shader. In another case, they identify signature elements that still contribute to the DXBC signature for informational purposes, but will only use the special intrinsic function to read the value, such as for SV_PrimitiveID for GS input and SampleIndex for PS input.

The classification of behavior for various system values in various signature locations is described in a table organized by SemanticKind and SigPointKind. The SigPointKind is a new classification that uniquely identifies each set of parameters that may be input or output for each entry point. For each combination of SemanticKind and SigPointKind, there is a SemanticInterpretationKind that defines the class of treatment for that location.

Each SigPointKind also has a corresponding element allocation (or packing) behavior called PackingKind. Some SigPointKinds do not result in a signature at all, which corresponds to the packing kind of PackingKind::None.

Signature Points are enumerated as follows in the SigPointKind

ID	SigPoint	Related	ShaderKind	PackingKind	SignatureKind	Description
0	VSIn	Invalid	Vertex	InputAssembler	Input	Ordinary Vertex Shader input from Input Assembler
1	VSOut	Invalid	Vertex	Vertex	Output	Ordinary Vertex Shader output that may feed Rasterizer
2	PCIn	HSCPIn	Hull	None	Invalid	Patch Constant function non-patch inputs
3	HSIn	HSCPIn	Hull	None	Invalid	Hull Shader function non-patch inputs
4	HSCPIn	Invalid	Hull	Vertex	Input	Hull Shader patch inputs - Control Points
5	HSCPOut	Invalid	Hull	Vertex	Output	Hull Shader function output - Control Point
6	PCOut	Invalid	Hull	PatchConstant	PatchConstOrPrim	Patch Constant function output - Patch Constant data passed to Domain Shader
7	DSIn	Invalid	Domain	PatchConstant	PatchConstOrPrim	Domain Shader regular input - Patch Constant data plus system values
8	DSCPIn	Invalid	Domain	Vertex	Input	Domain Shader patch input - Control Points
9	DSOut	Invalid	Domain	Vertex	Output	Domain Shader output - vertex data that may feed Rasterizer
10	GSVIn	Invalid	Geometry	Vertex	Input	Geometry Shader vertex input - qualified with primitive type
11	GSIn	GSVIn	Geometry	None	Invalid	Geometry Shader non-vertex inputs (system values)
12	GSOut	Invalid	Geometry	Vertex	Output	Geometry Shader output - vertex data that may feed Rasterizer
13	PSIn	Invalid	Pixel	Vertex	Input	Pixel Shader input
14	PSOut	Invalid	Pixel	Target	Output	Pixel Shader output
15	CSIn	Invalid	Compute	None	Invalid	Compute Shader input
16	MSIn	Invalid	Mesh	None	Invalid	Mesh Shader input
17	MSOut	Invalid	Mesh	Vertex	Output	Mesh Shader vertices output
18	MSPOut	Invalid	Mesh	Vertex	PatchConstOrPrim	Mesh Shader primitives output
19	ASIn	Invalid	Amplification	None	Invalid	Amplification Shader input

Semantic Interpretations are as follows (SemanticInterpretationKind)

ID	Name	Description
0	NA	Not Available
1	SV	Normal System Value
2	SGV	System Generated Value (sorted last)
3	Arb	Treated as Arbitrary
4	NotInSig	Not included in signature (intrinsic access)
5	NotPacked	Included in signature, but does not contribute to packing
6	Target	Special handling for SV_Target
7	TessFactor	Special handling for tessellation factors
8	Shadow	Shadow element must be added to a signature for compatibility
9	ClipCull	Special packing rules for SV_ClipDistance or SV_CullDistance

Semantic Interpretations for each SemanticKind at each SigPointKind are as follows

Semantic	VSIn	VSOut	PCIn	HSIn	HSCPIn	HSCPOut	PCOut	DSIn	DSCPIn	DSOut	GSVIn	GSIn	GSOut	PSIn	PSOut	CSIn	MSIn	MSOut	MSPOut	ASIn
Arbitrary	Arb	Arb	NA	NA	Arb	Arb	Arb	Arb	Arb	Arb	Arb	NA	Arb	Arb	NA	NA	NA	Arb	Arb	NA
VertexID	SV	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
InstanceID	SV	Arb	NA	NA	Arb	Arb	NA	NA	Arb	Arb	Arb	NA	Arb	Arb	NA	NA	NA	NA	NA	NA
Position	Arb	SV	NA	NA	SV	SV	Arb	Arb	SV	SV	SV	NA	SV	SV	NA	NA	NA	SV	NA	NA
RenderTargetArrayIndex	Arb	SV	NA	NA	SV	SV	Arb	Arb	SV	SV	SV	NA	SV	SV	NA	NA	NA	NA	SV	NA
ViewPortArrayIndex	Arb	SV	NA	NA	SV	SV	Arb	Arb	SV	SV	SV	NA	SV	SV	NA	NA	NA	NA	SV	NA
ClipDistance	Arb	ClipCull	NA	NA	ClipCull	ClipCull	Arb	Arb	ClipCull	ClipCull	ClipCull	NA	ClipCull	ClipCull	NA	NA	NA	ClipCull	NA	NA
CullDistance	Arb	ClipCull	NA	NA	ClipCull	ClipCull	Arb	Arb	ClipCull	ClipCull	ClipCull	NA	ClipCull	ClipCull	NA	NA	NA	ClipCull	NA	NA
OutputControlPointID	NA	NA	NA	NotInSig	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
DomainLocation	NA	NA	NA	NA	NA	NA	NA	NotInSig	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
PrimitiveID	NA	NA	NotInSig	NotInSig	NA	NA	NA	NotInSig	NA	NA	NA	Shadow	SGV	SGV	NA	NA	NA	NA	SV	NA
GSInstanceID	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotInSig	NA	NA	NA	NA	NA	NA	NA	NA
SampleIndex	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	Shadow _41	NA	NA	NA	NA	NA	NA
IsFrontFace	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	SGV	SGV	NA	NA	NA	NA	NA	NA
Coverage	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotInSig _50	NotPacked _41	NA	NA	NA	NA	NA
InnerCoverage	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotInSig _50	NA	NA	NA	NA	NA	NA
Target	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	Target	NA	NA	NA	NA	NA
Depth	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotPacked	NA	NA	NA	NA	NA
DepthLessEqual	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotPacked _50	NA	NA	NA	NA	NA
DepthGreaterEqual	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotPacked _50	NA	NA	NA	NA	NA
StencilRef	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotPacked _50	NA	NA	NA	NA	NA
DispatchThreadID	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotInSig	NotInSig	NA	NA	NotInSig
GroupID	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotInSig	NotInSig	NA	NA	NotInSig
GroupIndex	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotInSig	NotInSig	NA	NA	NotInSig
GroupThreadID	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotInSig	NotInSig	NA	NA	NotInSig
TessFactor	NA	NA	NA	NA	NA	NA	TessFactor	TessFactor	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
InsideTessFactor	NA	NA	NA	NA	NA	NA	TessFactor	TessFactor	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
ViewID	NotInSig _61	NA	NotInSig _61	NotInSig _61	NA	NA	NA	NotInSig _61	NA	NA	NA	NotInSig _61	NA	NotInSig _61	NA	NA	NotInSig	NA	NA	NA
Barycentrics	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotPacked _61	NA	NA	NA	NA	NA	NA
ShadingRate	NA	SV _64	NA	NA	SV _64	SV _64	NA	NA	SV _64	SV _64	SV _64	NA	SV _64	SV _64	NA	NA	NA	NA	SV	NA
CullPrimitive	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NotInSig	NA	NA	NA	NA	NotPacked	NA
StartVertexLocation	NotInSig _68	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
StartInstanceLocation	NotInSig _68	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
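
Some cells in the table above carry a suffix such as _41, _50 or _61. The following Python sketch splits a cell into its interpretation name and suffix. It assumes that the suffix encodes the minimum shader model (major digit, minor digit) at which the interpretation applies; the table itself does not state this, so treat that reading as an assumption. The helper name parse_cell is hypothetical.

```python
import re

_SUFFIX = re.compile(r"_(\d)(\d)")

def parse_cell(cell):
    """Split a cell such as 'NotInSig _61' into (interpretation, suffix).

    The suffix is returned as a (major, minor) tuple, or None when absent.
    Reading it as a minimum shader model is an assumption of this sketch.
    """
    parts = cell.split()
    if not parts or len(parts) > 2:
        raise ValueError(f"unexpected cell: {cell!r}")
    if len(parts) == 1:
        return parts[0], None
    match = _SUFFIX.fullmatch(parts[1])
    if match is None:
        raise ValueError(f"unexpected suffix: {parts[1]!r}")
    return parts[0], (int(match.group(1)), int(match.group(2)))
```

For example, the SampleIndex cell at PSIn, "Shadow _41", splits into the name Shadow and the pair (4, 1), while a plain "SV" cell has no suffix.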

Below is a vertex shader example that is used for illustration throughout this section:

struct Foo {
  float a;
  float b[2];
};

struct VSIn {
  uint    vid     : SV_VertexID;
  float3  pos     : Position;
  Foo     foo[3]  : SemIn1;
  float   f       : SemIn10;
};

struct VSOut
{
  float   f       : SemOut1;
  Foo     foo[3]  : SemOut2;
  float4  pos     : SV_Position;
};

void main(in  VSIn  In,    // input  signature
          out VSOut Out)   // output signature
{
  ...
}

Signature packing must be efficient: it should use as few registers as possible, and the packing algorithm should run in reasonable time. The complication is that the problem is NP-complete, so the algorithm must resort to a heuristic.

While the details of the packing algorithm are not important at the moment, it is important to outline some concepts related to how a packed signature is represented in DXIL. Packing is further complicated by the complexity of parameter shapes induced by the C/C++ type system. In the example above, each field of the Out.foo array is itself an array, strided in memory. Allocating such strided shapes efficiently is hard. To simplify packing, the first step is to break user-defined (struct) parameters into constituent components and to make strided arrays contiguous. This preparation step enables the algorithm to operate on dense rectangular shapes, which we call signature elements. The output signature in the example above has the following elements: float Out_f, float Out_foo_a[3], float Out_foo_b[2][3], and float4 pos. Each element is characterized by its number of rows and columns. These are 1x1, 3x1, 6x1, and 1x4, respectively. The packing algorithm reduces to fitting these elements into an Nx4 register space while satisfying all packing-compatibility constraints.
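
The preparation step can be sketched in Python as follows. This is a minimal illustration, not the compiler's implementation: the Element type, the tuple encoding of struct fields, and the generated names (for example Out_pos) are all hypothetical. It flattens the VSOut parameter into dense elements and reproduces the 1x1, 3x1, 6x1 and 1x4 shapes listed above.

```python
from dataclasses import dataclass

@dataclass
class Element:
    name: str
    rows: int
    cols: int

# Hypothetical encoding: (field name, column count or nested field list, array dims)
FOO = [("a", 1, []), ("b", 1, [2])]
VS_OUT = [("f", 1, []), ("foo", FOO, [3]), ("pos", 4, [])]

def flatten(fields, prefix="Out", outer_rows=1):
    """Break struct fields into dense rows x cols signature elements."""
    elements = []
    for name, shape, dims in fields:
        rows = outer_rows
        for dim in dims:
            rows *= dim                  # array dimensions multiply into rows
        full_name = f"{prefix}_{name}"
        if isinstance(shape, list):      # nested struct: recurse into its fields
            elements += flatten(shape, full_name, rows)
        else:                            # scalar or vector leaf
            elements.append(Element(full_name, rows, shape))
    return elements

shapes = [(e.name, e.rows, e.cols) for e in flatten(VS_OUT)]
```

Here the rows of an enclosing array are multiplied into every field of the nested struct, which is what makes the strided foo[i].b[j] accesses contiguous in the flattened element.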

Signature element record

Each signature element is represented in DXIL as a metadata record.

For the example output signature above, the element records are as follows:

;  element ID, semantic name, etype, sv, s.idx, interp,  rows, cols, start row, col, ext. list
!20 = !{i32 6, !"SemOut",      i8 0, i8 0, !40,   i8 2, i32 1, i8 1, i32 1,    i8 2, null}
!21 = !{i32 7, !"SemOut",      i8 0, i8 0, !41,   i8 2, i32 3, i8 1, i32 1,    i8 1, null}
!22 = !{i32 8, !"SemOut",      i8 0, i8 0, !42,   i8 2, i32 6, i8 1, i32 1,    i8 0, null}
!23 = !{i32 9, !"SV_Position", i8 0, i8 3, !43,   i8 2, i32 1, i8 4, i32 0,    i8 0, null}

A record contains the following fields.

Idx	Type	Description
0	i32	Unique signature element record ID, used to identify the element in operations.
1	String metadata	Semantic name.
2	i8	ComponentType (enum value).
3	i8	SemanticKind (enum value).
4	Metadata	Metadata list that enumerates all semantic indexes of the flattened parameter.
5	i8	InterpolationMode (enum value).
6	i32	Number of element rows.
7	i8	Number of element columns.
8	i32	Starting row of element packing location.
9	i8	Starting column of element packing location.
10	Metadata	Metadata list of additional tag-value pairs; can be 'null' or empty.

System value semantic names always start with the prefix 'SV_', and it is illegal to start a user semantic name with this prefix. Non-SV semantic names can be ignored by drivers. Debug layers may use these names to help validate signature compatibility between stages.

The last metadata list is used to specify additional properties and future extensions.
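
As an illustration of the field layout, the following Python sketch holds the ten leading fields of a record and prints them in the textual form used above. The SigElement class and its to_metadata method are hypothetical names, and the extension list is always emitted as null in this sketch.

```python
from dataclasses import dataclass

@dataclass
class SigElement:
    elem_id: int      # field 0: unique element ID
    name: str         # field 1: semantic name
    etype: int        # field 2: ComponentType
    sv: int           # field 3: SemanticKind
    sidx_node: int    # field 4: metadata node holding the semantic indexes
    interp: int       # field 5: InterpolationMode
    rows: int         # field 6
    cols: int         # field 7
    start_row: int    # field 8
    start_col: int    # field 9

    def to_metadata(self, node):
        """Render the record as a metadata line; field 10 is emitted as null."""
        return (f'!{node} = !{{i32 {self.elem_id}, !"{self.name}", '
                f'i8 {self.etype}, i8 {self.sv}, !{self.sidx_node}, '
                f'i8 {self.interp}, i32 {self.rows}, i8 {self.cols}, '
                f'i32 {self.start_row}, i8 {self.start_col}, null}}')

position = SigElement(9, "SV_Position", 0, 3, 43, 2, 1, 4, 0, 0)
line = position.to_metadata(23)
```

Apart from column alignment, the rendered line matches record !23 in the example above.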

Signature record metadata

A shader typically has two signatures, input and output, while hull and domain shaders have an additional patch constant signature. The signatures are composed of signature element records and are attached to the shader entry metadata. The examples below clarify metadata details.

Vertex shader HLSL

Here is the HLSL of the above vertex shader. The semantic index assignment is explained in the section below:

struct Foo
{
  float a;
  float b[2];
};

struct VSIn
{
  uint    vid     : SV_VertexID;
  float3  pos     : Position;
  Foo     foo[3]  : SemIn1;
    // semantic index assignment:
    // foo[0].a     : SemIn1
    // foo[0].b[0]  : SemIn2
    // foo[0].b[1]  : SemIn3
    // foo[1].a     : SemIn4
    // foo[1].b[0]  : SemIn5
    // foo[1].b[1]  : SemIn6
    // foo[2].a     : SemIn7
    // foo[2].b[0]  : SemIn8
    // foo[2].b[1]  : SemIn9
  float   f       : SemIn10;
};

struct VSOut
{
  float   f       : SemOut1;
  Foo     foo[3]  : SemOut2;
    // semantic index assignment:
    // foo[0].a     : SemOut2
    // foo[0].b[0]  : SemOut3
    // foo[0].b[1]  : SemOut4
    // foo[1].a     : SemOut5
    // foo[1].b[0]  : SemOut6
    // foo[1].b[1]  : SemOut7
    // foo[2].a     : SemOut8
    // foo[2].b[0]  : SemOut9
    // foo[2].b[1]  : SemOut10
  float4  pos     : SV_Position;
};

void main(in  VSIn  In,    // input  signature
          out VSOut Out)   // output signature
{
  ...
}

The input signature is packed to be compatible with the IA stage. A packing algorithm must assign the following starting positions to the input signature elements:

Input element	Rows	Columns	Start row
uint VSIn.vid	1	1	0
float3 VSIn.pos	1	3	1
float VSIn.foo.a[3]	3	1	2
float VSIn.foo.b[6]	6	1	5
float VSIn.f	1	1	11
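
Because IA packing gives every element its own registers, the start rows in this table follow from a running sum of row counts. A minimal Python sketch, with the hypothetical helper name assign_ia_rows:

```python
def assign_ia_rows(elements):
    """Assign each (name, rows) element the next free register rows, in order."""
    start_rows = {}
    next_row = 0
    for name, rows in elements:
        start_rows[name] = next_row
        next_row += rows          # elements never share a register
    return start_rows

vs_in = [("vid", 1), ("pos", 1), ("foo.a", 3), ("foo.b", 6), ("f", 1)]
ia_rows = assign_ia_rows(vs_in)
```

The result maps vid, pos, foo.a, foo.b and f to start rows 0, 1, 2, 5 and 11, as in the table.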

A reasonable packing algorithm would assign the following starting positions to the output signature elements:

Output element	Rows	Columns	Start row	Start column
float VSOut.f	1	1	1	2
float VSOut.foo.a[3]	3	1	1	1
float VSOut.foo.b[6]	6	1	1	0
float4 VSOut.pos	1	4	0	0

Semantic index assignment

Semantic index assignment in DXIL is exactly the same as for DXBC. Semantic index assignment, abbreviated s.idx above, is a consecutive enumeration of all fields under the same semantic name as if the signature were packed for the IA stage. That is, given a complex signature element, e.g., VSOut's foo[3] with semantic name SemOut and starting index 2, the element is flattened into individual fields: foo[0].a, foo[0].b[0], ..., foo[2].b[1], and the fields receive consecutive semantic indexes 2, 3, ..., 10, respectively. Semantic-index pairs are used to set up the IA stage and to capture values of individual signature registers via the StreamOut API.
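
The enumeration can be sketched in Python as follows. The function name and the (field name, element count) encoding of the struct are hypothetical; the sketch handles only an array of a flat struct, which is enough for the Foo example.

```python
def assign_semantic_indexes(start_index, array_len, fields):
    """Enumerate flattened fields in declaration order.

    fields lists (field name, element count) for one struct instance.
    Returns, per field, the semantic indexes of all its flattened elements.
    """
    indexes = {name: [] for name, _ in fields}
    next_index = start_index
    for _ in range(array_len):            # foo[0], foo[1], ...
        for name, count in fields:        # .a, then .b[0], .b[1], ...
            for _ in range(count):
                indexes[name].append(next_index)
                next_index += 1
    return indexes

foo_fields = [("a", 1), ("b", 2)]
out_indexes = assign_semantic_indexes(2, 3, foo_fields)   # VSOut: SemOut2
in_indexes = assign_semantic_indexes(1, 3, foo_fields)    # VSIn: SemIn1
```

For VSOut this yields 2, 5, 8 for the a fields and 3, 4, 6, 7, 9, 10 for the b fields, which are the index lists that appear in the output signature metadata for this shader.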

DXIL for VS signatures

The corresponding DXIL metadata is presented below:

!dx.entryPoints = !{ !1 }
!1 = !{ void @main(), !"main", !2, null, null }
; Signatures: In,   Out,  Patch Constant (optional)
!2 = !{       !3,   !4,   null }

; Input signature (packed according to IA rules)
!3 = !{ !10, !11, !12, !13, !14 }
; element idx, semantic name, etype, sv, s.idx, interp,  rows, cols, start row, col, ext. list
!10 = !{i32 1, !"SV_VertexID", i8 0, i8 1, !30,   i8 0, i32 1, i8 1, i32 0,    i8 0, null}
!11 = !{i32 2, !"Position",    i8 0, i8 0, !30,   i8 0, i32 1, i8 3, i32 1,    i8 0, null}
!12 = !{i32 3, !"SemIn",       i8 0, i8 0, !32,   i8 0, i32 3, i8 1, i32 2,    i8 0, null}
!13 = !{i32 4, !"SemIn",       i8 0, i8 0, !33,   i8 0, i32 6, i8 1, i32 5,    i8 0, null}
!14 = !{i32 5, !"SemIn",       i8 0, i8 0, !34,   i8 0, i32 1, i8 1, i32 11,   i8 0, null}
; semantic index assignment:
!30 = !{ i32 0 }
!32 = !{ i32 1, i32 4, i32 7 }
!33 = !{ i32 2, i32 3, i32 5, i32 6, i32 8, i32 9 }
!34 = !{ i32 10 }

; Output signature (tightly packed according to pipeline stage packing rules)
!4 = !{ !20, !21, !22, !23 }
;  element ID, semantic name, etype, sv, s.idx, interp,  rows, cols, start row, col, ext. list
!20 = !{i32 6, !"SemOut",      i8 0, i8 0, !40,   i8 2, i32 1, i8 1, i32 1,    i8 2, null}
!21 = !{i32 7, !"SemOut",      i8 0, i8 0, !41,   i8 2, i32 3, i8 1, i32 1,    i8 1, null}
!22 = !{i32 8, !"SemOut",      i8 0, i8 0, !42,   i8 2, i32 6, i8 1, i32 1,    i8 0, null}
!23 = !{i32 9, !"SV_Position", i8 0, i8 3, !43,   i8 2, i32 1, i8 4, i32 0,    i8 0, null}
; semantic index assignment:
!40 = !{ i32 1 }
!41 = !{ i32 2, i32 5, i32 8 }
!42 = !{ i32 3, i32 4, i32 6, i32 7, i32 9, i32 10 }
!43 = !{ i32 0 }

Hull shader example

A hull shader (HS) is defined by two entry point functions: a control point (CP) function that computes control points, and a patch constant (PC) function that computes patch constant data, including the tessellation factors. The inputs to both functions are the input control points for an entire patch; therefore each element may be indexed by row and, in addition, is indexed by vertex.

Here is an example of HS entry point metadata and its signature list:

; !105 is extended parameter list containing reference to HS State:
!101 = !{ void @HSMain(), !"HSMain", !102, null, !105 }
; Signatures: In,   Out,  Patch Constant
!102 = !{     !103, !104, !204 }

The entry point record specifies: (1) CP function HSMain as the main symbol, and (2) PC function via optional metadata node !105.

CP-input signature describing one input control point:

!103 = !{ !110, !111 }
;  element ID, semantic name, etype, sv, s.idx, interp,  rows, cols, start row, col, ext. list
!110= !{i32 1, !"SV_Position", i8 0, i8 3, !130,  i8 0, i32 1, i8 4, i32 0,    i8 0, null}
!111= !{i32 2, !"array",       i8 0, i8 0, !131,  i8 0, i32 4, i8 3, i32 1,    i8 0, null}
; semantic indexing for flattened elements:
!130 = !{ i32 0 }
!131 = !{ i32 0, i32 1, i32 2, i32 3 }

Note that SV_OutputControlPointID and SV_PrimitiveID input elements are SGVs loaded through special DXIL intrinsics, and are not present in the signature at all. These have a semantic interpretation of SemanticInterpretationKind::NotInSig.

CP-output signature describing one output control point:

!104 = !{ !120, !121 }
;  element ID, semantic name, etype, sv, s.idx, interp,  rows, cols, start row, col, ext. list
!120= !{i32 3, !"SV_Position", i8 0, i8 3, !130,  i8 0, i32 1, i8 4, i32 0,    i8 0, null}
!121= !{i32 4, !"array",       i8 0, i8 0, !131,  i8 0, i32 4, i8 3, i32 1,    i8 0, null}

Hull shaders require an extended parameter that defines extra state:

; extended parameter HS State
!105 = !{ i32 3, !201 }

; HS State record defines patch constant function and other properties
; Patch Constant Function, in CP count, out CP count, tess domain, tess part, out prim, max tess factor
!201 = !{  void @PCMain(), i32 4,       i32 4,        i32 3,       i32 1,     i32 3,    float 16.0 }

PC-output signature:

!204 = !{ !220, !221, !222 }
;  element ID, semantic name,         etype,   sv, s.idx,  interp, rows, cols, start row, col, ext. list
!220= !{i32 3, !"SV_TessFactor",       i8 0, i8 25, !131,   i8 0, i32 4, i8 1, i32 0, i8 3, null}
!221= !{i32 4, !"SV_InsideTessFactor", i8 0, i8 26, !231,   i8 0, i32 2, i8 1, i32 4, i8 3, null}
!222= !{i32 5, !"array",               i8 0, i8 0,  !131,   i8 0, i32 4, i8 3, i32 0, i8 0, null}
; semantic indexing for flattened elements:
!231 = !{ i32 0, i32 1 }

Accessing signature value in operations

There are no function parameters or variables that correspond to signature elements. Instead, the loadInput and storeOutput functions are used to access signature element values in operations. The accesses are scalar.

These are the operation signatures:

; overloads: SM5.1: f16|f32|i16|i32,  SM6.0: f16|f32|f64|i8|i16|i32|i64
declare float @dx.op.loadInput.f32(
    i32,                            ; opcode
    i32,                            ; input ID
    i32,                            ; row (relative to start row of input ID)
    i8,                             ; column (relative to start column of input ID), constant in [0,3]
    i32)                            ; vertex index

; overloads: SM5.1: f16|f32|i16|i32,  SM6.0: f16|f32|f64|i8|i16|i32|i64
declare void @dx.op.storeOutput.f32(
    i32,                            ; opcode
    i32,                            ; output ID
    i32,                            ; row (relative to start row of output ID)
    i8,                             ; column (relative to start column of output ID), constant in [0,3]
    float)                          ; value to store

loadInput and storeOutput take an input or output element ID, which is the unique ID of a signature element metadata record. The row parameter is the array element row index relative to the start of the element; the register index is obtained by adding the start row of the element and the row parameter value. Similarly, the column parameter is a relative column index; the packed register component is obtained by adding the start column of the element (packed col) and the column value. Several overloads exist to access elements of different primitive types. loadInput takes an additional vertex index parameter that represents the vertex index for DS CP-inputs and GS inputs; the vertex index must be undef in other cases.
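
The address arithmetic can be sketched in Python as follows. The helper name resolve_location is hypothetical, and the range checks are additions of this sketch based on the [0,3] column range noted in the declarations above.

```python
def resolve_location(start_row, start_col, row, col):
    """Map a relative (row, col) access to a packed (register, component)."""
    if not 0 <= col <= 3:
        raise ValueError("column must be a constant in [0,3]")
    register = start_row + row
    component = start_col + col
    if component > 3:
        raise ValueError("access falls outside the 4-component register")
    return register, component

# Output element !20 (VSOut.f) starts at row 1, column 2.
f_location = resolve_location(1, 2, row=0, col=0)
# Output element !22 (VSOut.foo.b) starts at row 1, column 0; access its fifth row.
b_location = resolve_location(1, 0, row=4, col=0)
```

Since only the start row and start column enter the sum, moving an element to a different packed location changes the result without touching the row and column operands of the call.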

Signature packing

Signature elements must be packed into a space of N registers, each with four 32-bit components, according to runtime constraints. DXIL contains packed signatures. The packing algorithm is more aggressive than that for DX11. However, DXIL packing is only a suggestion to the driver implementation. Driver compilers can rearrange signature elements as they see fit, while preserving compatibility of connected pipeline stages. DXIL is designed in such a way that it is easy to 'relocate' signature elements: loadInput/storeOutput row and column indices do not need to change since they are relative to the start row/column for each element.

Signature packing types

Two pipeline stages can connect in four different ways, resulting in four packing types.

Input Assembly: VS input only
- Elements all map to unique registers, they may not be packed together.
- Interpolation mode is not used.
Connects to Rasterizer: VS output, HS CP-input/output and PC-input, DS CP-input/output, GS input/output, PS input
- Elements can be packed according to constraints.
- Interpolation mode is used and must be consistent between connecting signatures.
- While HS CP-output and DS CP-input signatures do not go through the rasterizer, they are still treated as such. The reason is the pass-through HS case, in which HS CP-input and HS CP-output must have identical packing for efficiency.
Patch Constant: HS PC-output, DS PC-input
- SV_TessFactor and SV_InsideTessFactor are the only SVs relevant here, and this is the only location where they are legal. These have special packing considerations.
- Interpolation mode is not used.
Pixel Shader Output: PS output only
- Only SV_Target maps to output register space.
- No packing is performed, semantic index corresponds to render target index.

Packing constraints

The packing algorithm is stricter and more aggressive in DXIL than in DXBC, although still compatible. In particular, array signature elements are not broken up into scalars, even if each array access can be disambiguated to a literal index. DXIL and DXBC signature packing are not identical, so linking them together into a single pipeline is not supported across compiler generations.

The row dimension of a signature element represents an index range. If constraints permit, two adjacent or overlapping index ranges are coalesced into a single index range.

Packing constraints are as follows:

- A register must have only one interpolation mode for all 4 components.
- Register components containing SVs must be to the right of components containing non-SVs.
- SV_ClipDistance and SV_CullDistance have additional constraints:
  1. May be packed together
  2. Must occupy a maximum of 2 registers (8 components)
  3. SV_ClipDistance must have linear interpolation mode
- Registers containing SVs may not be within an index range, with the exception of Tessellation Factors (TessFactors).
- If an index range R1 overlaps with a TessFactor index range R2, R1 must be contained within R2. As a consequence, outside and inside TessFactors occupy disjoint index ranges when packed.
- Non-TessFactor index ranges are combined into a larger range, if they overlap.
- SGVs must be packed after all non-SGVs have been packed. If there are several SGVs, they are packed in the order of HLSL declaration.
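
The first two constraints can be checked mechanically. The Python sketch below is a partial illustration only: it covers the single-interpolation-mode rule and the SV-to-the-right rule, ignores the remaining constraints, and uses a hypothetical dictionary layout for placed elements.

```python
from collections import defaultdict

def elem(is_sv, interp, row, col, rows, cols):
    """Hypothetical description of one placed signature element."""
    return {"is_sv": is_sv, "interp": interp,
            "row": row, "col": col, "rows": rows, "cols": cols}

def check_packing(elements):
    """Return a list of violations of the first two packing constraints."""
    registers = defaultdict(list)     # register -> [(component, is_sv, interp)]
    for e in elements:
        for r in range(e["row"], e["row"] + e["rows"]):
            for c in range(e["col"], e["col"] + e["cols"]):
                registers[r].append((c, e["is_sv"], e["interp"]))
    errors = []
    for r in sorted(registers):
        comps = registers[r]
        if len({interp for _, _, interp in comps}) > 1:
            errors.append(f"register {r}: more than one interpolation mode")
        sv = [c for c, is_sv, _ in comps if is_sv]
        non_sv = [c for c, is_sv, _ in comps if not is_sv]
        if sv and non_sv and min(sv) < max(non_sv):
            errors.append(f"register {r}: SV component left of a non-SV component")
    return errors

# Output signature of the vertex shader example: f, foo.a, foo.b, pos.
vs_out = [elem(False, 2, 1, 2, 1, 1), elem(False, 2, 1, 1, 3, 1),
          elem(False, 2, 1, 0, 6, 1), elem(True, 2, 0, 0, 1, 4)]
# A deliberately bad layout: an SV in column 0 next to a non-SV in column 1.
bad = [elem(True, 2, 0, 0, 1, 1), elem(False, 1, 0, 1, 1, 1)]
```

The example output signature passes both checks, while the deliberately bad layout violates both.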

Packing for SGVs

Non-SGV portions of two connecting signatures must match; however, SGV portions don't have to. An example would be a PS declaring SV_PrimitiveID as an input. If VS connects to PS, PS's SV_PrimitiveID value is synthesized by hardware; moreover, it is illegal to output SV_PrimitiveID from a VS. If GS connects to PS, GS may declare SV_PrimitiveID as its output.

Unfortunately, the SGV specification complicates separate compilation of connecting shaders. For example, suppose GS outputs SV_PrimitiveID, and PS inputs SV_IsFrontFace and SV_PrimitiveID, in this order. The positions of SV_PrimitiveID are then incompatible in the GS and PS signatures. Not much can be done about this ambiguity in SM5.0 and earlier; programmers have to rely on SDKLayers to catch a potential mismatch.

SM5.1 and later shaders run on the D3D12+ runtime, which uses PSO objects to describe pipeline state. Therefore, a driver compiler has access to both connecting shaders during compilation, even though the HLSL compiler does not, and it can easily resolve SGV ambiguity in signatures. For SM5.1 and later, the HLSL compiler will ensure that declared SGVs fit into the packed signature; however, it will set each SGV's start row-column location to (-1, 0), so that the driver compiler must resolve SGV placement during PSO compilation.

Shader Resources

All global resources referenced by entry points of an LLVM module are described via named metadata dx.resources, which consists of four metadata lists of resource records:

!dx.resources = !{ !1, !2, !3, !4 }

Resource lists are as follows.

Idx	Type	Description
0	Metadata	SRVs - shader resource views.
1	Metadata	UAVs - unordered access views.
2	Metadata	CBVs - constant buffer views.
3	Metadata	Samplers.

Metadata resource records

Each resource list contains resource records. Each resource record contains fields that are common for each resource type, followed by fields specific to each resource type, followed by a metadata list of tag/value pairs, which can be used to specify additional properties or future extensions and may be null or empty.

Common fields:

Idx	Type	Description
0	i32	Unique resource record ID, used to identify the resource record in createHandle operation.
1	Pointer	Pointer to a global constant symbol with the original shape of resource and element type.
2	Metadata string	Name of resource variable.
3	i32	Bind space ID of the root signature range that corresponds to this resource.
4	i32	Bind lower bound of the root signature range that corresponds to this resource.
5	i32	Range size of the root signature range that corresponds to this resource.

When the shader has reflection information, the name is the original, unmangled HLSL name. If reflection is stripped, the name is an empty string.

SRV-specific fields:

Idx	Type	Description
6	i32	SRV resource shape (enum value).
7	i32	SRV sample count.
8	Metadata	Metadata list of additional tag-value pairs.

SRV-specific tag/value pairs:

Idx	Tag	Type	Resource Type	Description
0	0	i32	Any resource, except RawBuffer and StructuredBuffer	Element type.
1	1	i32	StructuredBuffer	Element stride of StructuredBuffer, in bytes.

The symbol names for the tags are kDxilTypedBufferElementTypeTag (0) and kDxilStructuredBufferElementStrideTag (1).

UAV-specific fields:

Idx	Type	Description
6	i32	UAV resource shape (enum value).
7	i1	1 - globally-coherent UAV; 0 - otherwise.
8	i1	1 - UAV has counter; 0 - otherwise.
9	i1	1 - UAV is ROV (rasterizer ordered view); 0 - otherwise.
10	Metadata	Metadata list of additional tag-value pairs.

UAV-specific tag/value pairs:

Idx	Tag	Type	Resource Type	Description
0	0	i32	RW resource, except RWRawBuffer and RWStructuredBuffer	Element type.
1	1	i32	RWStructuredBuffer	Element stride of RWStructuredBuffer, in bytes.

The symbol names for the tags are kDxilTypedBufferElementTypeTag (0) and kDxilStructuredBufferElementStrideTag (1).

CBV-specific fields:

Idx	Type	Description
6	i32	Constant buffer size in bytes.
7	Metadata	Metadata list of additional tag-value pairs.

Sampler-specific fields:

Idx	Type	Description
6	i32	Sampler type (enum value).
7	Metadata	Metadata list of additional tag-value pairs.

The following example demonstrates SRV metadata:

; Original HLSL
; Texture2D<float4> MyTexture2D : register(t0, space0);
; StructuredBuffer<NS1::MyType1> MyBuffer[2][3] : register(t1, space0);

!1 = !{ !2, !3 }

; Scalar resource: Texture2D<float4> MyTexture2D.
%dx.types.ResElem.v4f32 = type { <4 x float> }
@MyTexture2D = external addrspace(1) constant %dx.types.ResElem.v4f32, align 16
!2 = !{ i32 0, %dx.types.ResElem.v4f32 addrspace(1)* @MyTexture2D, !"MyTexture2D",
        i32 0, i32 0, i32 1, i32 2, i32 0, null }

; Array resource: StructuredBuffer<MyType1> MyBuffer[2][3].
%struct.NS1.MyType1 = type { float, <2 x i32> }
%dx.types.ResElem.NS1.MyType1 = type { %struct.NS1.MyType1 }
@MyBuffer = external addrspace(1) constant [2 x [3 x %dx.types.ResElem.NS1.MyType1]], align 16
!3 = !{ i32 1, [2 x [3 x %dx.types.ResElem.NS1.MyType1]] addrspace(1)* @MyBuffer, !"MyBuffer",
        i32 0, i32 1, i32 6, i32 11, i32 0, null }

The type name of the variable is constructed by appending the element name (primitive, vector or UDT name) to the dx.types.ResElem prefix. The type configuration of the resource range variable conveys (1) the resource range shape and (2) the resource element type.
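
The range size field of a record follows from the declared array shape. A minimal Python sketch, with the hypothetical helper name range_size; representing an unbounded dimension as 0 and reporting it as -1 is inferred from the MyCBuffer1[][3] example under Reflection information, not from a stated rule.

```python
def range_size(dims):
    """Compute the range size for a resource declared with array dims.

    dims is [] for a scalar resource; a dimension of 0 stands for an
    unbounded dimension (assumption of this sketch), reported as -1.
    """
    size = 1
    for dim in dims:
        if dim == 0:
            return -1
        size *= dim
    return size
```

With this helper, MyTexture2D has range size 1, MyBuffer[2][3] has range size 6, and an unbounded declaration yields -1.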

Reflection information

Resource reflection data is conveyed via the resource's metadata record and global, external variable. The metadata record contains the original HLSL name, root signature range information, and the reference to the global resource variable declaration. The resource variable declaration conveys resource range shape, resource type and resource element type.

The following disassembly provides an example:

; Scalar resource: Texture2D<float4> MyTexture2D.
%dx.types.ResElem.v4f32 = type { <4 x float> }
@MyTexture2D = external addrspace(1) constant %dx.types.ResElem.v4f32, align 16
!0 = !{ i32 0, %dx.types.ResElem.v4f32 addrspace(1)* @MyTexture2D, !"MyTexture2D",
        i32 0, i32 3, i32 1, i32 2, i32 0, null }

; struct MyType2 { float4 field1; int2 field2; };
; Constant buffer: ConstantBuffer<MyType2> MyCBuffer1[][3] : register(b5, space7)
%struct.MyType2 = type { <4 x float>, <2 x i32> }
; Type reflection information (optional)
!struct.MyType2 = !{ !1, !2 }
!1 = !{ !"field1", null }
!2 = !{ !"field2", null }

%dx.types.ResElem.MyType2 = type { %struct.MyType2 }

@MyCBuffer1 = external addrspace(1) constant [0 x [3 x %dx.types.ResElem.MyType2]], align 16

!3 = !{ i32 0, [0 x [3 x %dx.types.ResElem.MyType2]] addrspace(1)* @MyCBuffer1, !"MyCBuffer1",
        i32 7, i32 5, i32 -1, null }

The reflection information can be removed from DXIL by obfuscating the resource HLSL name and resource variable name as well as removing reflection type annotations, if any.

Structure of resource operation

Operations involving shader resources and samplers are expressed via external function calls.

Below is an example for the sample method:

%dx.types.ResRet.f32 = type { float, float, float, float, i32 }

declare %dx.types.ResRet.f32 @dx.op.sample.f32(
    i32,                      ; opcode
    %dx.types.ResHandle,      ; texture handle
    %dx.types.SamplerHandle,  ; sampler handle
    float,                    ; coordinate c0
    float,                    ; coordinate c1
    float,                    ; coordinate c2
    float,                    ; coordinate c3
    i32,                      ; offset o0
    i32,                      ; offset o1
    i32,                      ; offset o2
    float)                    ; clamp

The method always returns five scalar values that are aggregated in the dx.types.ResRet.f32 type and extracted into scalars via LLVM's extractvalue right after the call. The first four elements are sample values and the last field is the status of the operation for tiled resources. Some return values may be unused, which is easily determined from the SSA form. The driver compiler is free to specialize the sample instruction to the most efficient form depending on which return values are used in computation.

If applicable, each intrinsic is overloaded on return type, e.g.:

%dx.types.ResRet.f32 = type { float, float, float, float, i32 }
%dx.types.ResRet.f16 = type { half, half, half, half, i32 }

declare %dx.types.ResRet.f32 @dx.op.sample.f32(...)
declare %dx.types.ResRet.f16 @dx.op.sample.f16(...)

Wherever applicable, the return type indicates the "precision" at which the operation is executed. For example, sample intrinsic that returns half data is allowed to be executed at half precision, assuming hardware supports this; however, if the return type is float, the sample operation must be executed in float precision. If lower-precision is not supported by hardware, it is allowed to execute a higher-precision variant of the operation.

The opcode parameter uniquely identifies the sample operation. More details can be found in the Instructions section. The value of opcode is the same for all overloads of an operation.

Some resource operations are "polymorphic" with respect to resource types, e.g., dx.op.sample.f32 operates on several resource types: Texture1D[Array], Texture2D[Array], Texture3D, TextureCUBE[Array].

Each resource/sampler is represented by a pair of i32 values. The first value is a unique (virtual) resource range ID, which corresponds to HLSL declaration of a resource/sampler. Range ID must be a constant for SM5.1 and below. The second integer is a 0-based index within the range. The index must be constant for SM5.0 and below.

Both indices can be dynamic for SM6 and later to provide flexibility in usage of resources/samplers in control flow, e.g.:

Texture2D<float4> a[8], b[8];
...
Texture2D<float4> c;
if(cond)  // arbitrary expression
  c = a[idx1];
else
  c = b[idx2];
... = c.Sample(...);

Resources/samplers used in such a way must reside in descriptor tables (cannot be root descriptors); this will be validated during shader and root signature setup.

The DXIL validator will ensure that all leaf-ranges (a and b above) of such a resource/sampler live-range have the same resource/sampler type and element type. If applicable, this constraint may be relaxed in the future. In particular, it is logical from an HLSL programmer's point of view to issue loads on compatible resource types, e.g., Texture2D, RWTexture2D, ROVTexture2D:

Texture2D<float4> a[8];
RWTexture2D<float4> b[6];
...
Texture2D<float4> c;
if(cond)  // arbitrary expression
 c = a[idx1];
else
 c = b[idx2];
... = c.Load(...);

LLVM's undef value is used for unused input parameters. For example, coordinates c2 and c3 in a dx.op.sample.f32 call for Texture2D are undef, as only two coordinates c0 and c1 are required.

If the clamp parameter is unused, its default value is 0.0f.
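
The coordinate rule can be sketched in Python as follows. The per-type coordinate counts follow the usual HLSL Sample signatures and are an assumption of this sketch rather than something this section lists; UNDEF is a stand-in for LLVM's undef, and sample_coords is a hypothetical helper name.

```python
UNDEF = None  # stands in for LLVM's undef value

# Number of meaningful coordinates per resource type (assumed from HLSL).
COORD_COUNT = {
    "Texture1D": 1, "Texture1DArray": 2,
    "Texture2D": 2, "Texture2DArray": 3,
    "Texture3D": 3,
    "TextureCUBE": 3, "TextureCUBEArray": 4,
}

def sample_coords(resource_type, coords):
    """Pad the coordinate list to the four operands c0..c3 with UNDEF."""
    needed = COORD_COUNT[resource_type]
    if len(coords) != needed:
        raise ValueError(f"{resource_type} takes {needed} coordinates")
    return list(coords) + [UNDEF] * (4 - needed)

texture2d_operands = sample_coords("Texture2D", [0.5, 0.25])
```

For Texture2D the last two operands come out as UNDEF, matching the c2 and c3 case described above.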

Resource operations are not overloaded on input parameter types. For example, dx.op.sample.f32 operation does not have an overload where coordinates have half, rather than float, data type. Instead, the precision of input arguments can be inferred from the IR via a straightforward lookup along an SSA edge, e.g.:

%c0 = fpext half %0 to float
%res = call %dx.types.ResRet.f32 @dx.op.sample.f32(..., %c0, ...)

SSA form makes it easy to infer that value %0 of type half got promoted to float. The driver compiler can tailor the instruction to the most efficient form for the target hardware.

Resource operations

This section lists resource access operations. The specification is given for float return type, if applicable. The list of all overloads can be found in the appendix on intrinsic operations.

Some general rules to interpret resource operations:

- The number of active (meaningful) return components is determined by the resource element type. Other return values must be unused; the validator ensures this.
- A GPU instruction needs status only if the status return value is used in the program, which is determined through SSA.
- Overload suffixes are specified for each resource operation.
- The type of resource determines which inputs must be defined. Unused inputs are passed typed LLVM 'undef' values. This is checked by the DXIL validator.
- Offset input parameters are i8 constants in the [-8,+7] range; the default offset is 0.

Resource operation return types

Many resource operations return several scalar values as well as status for tiled resource access. The return values are grouped into a helper structure type, as this is LLVM's way to return several values from the operation. After an operation, helper types are immediately decomposed into scalars, which are used in further computation.

The defined helper types are listed below:

%dx.types.ResRet.i8  = type { i8, i8, i8, i8, i32 }
%dx.types.ResRet.i16 = type { i16, i16, i16, i16, i32 }
%dx.types.ResRet.i32 = type { i32, i32, i32, i32, i32 }
%dx.types.ResRet.i64 = type { i64, i64, i64, i64, i32 }
%dx.types.ResRet.f16 = type { half, half, half, half, i32 }
%dx.types.ResRet.f32 = type { float, float, float, float, i32 }
%dx.types.ResRet.f64 = type { double, double, double, double, i32 }

%dx.types.Dimensions = type { i32, i32, i32, i32 }
%dx.types.SamplePos  = type { float, float }

Resource handles

Resources are identified via handles passed to resource operations. Handles are represented via an opaque type:

%dx.types.Handle     = type { i8 * }

Handles are created from a resource range ID and an index into the range:

declare %dx.types.Handle @dx.op.createHandle(
    i32,                  ; opcode
    i8,                   ; resource class: SRV=0, UAV=1, CBV=2, Sampler=3
    i32,                  ; resource range ID (constant)
    i32,                  ; index into the range
    i1)                   ; non-uniform resource index: false or true

Resource class is a constant that indicates which metadata list (SRV, UAV, CBV, Sampler) to use for property queries.

Resource range ID is an i32 constant, which is the position of the metadata record in the corresponding metadata list. Range IDs start with 0 and are contiguous within each list.

Index is an i32 value that may be a constant or a value computed by the shader.
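
The relationship between the createHandle operands and the dx.resources metadata can be sketched in Python as follows. The dictionary layout of a record and the helper name find_record are hypothetical; only the class numbering and the rule that a range ID is the record's position in its list come from the text above.

```python
SRV, UAV, CBV, SAMPLER = 0, 1, 2, 3   # resource class operand of createHandle

def find_record(dx_resources, resource_class, range_id):
    """Look up the metadata record addressed by (resource class, range ID).

    dx_resources holds the four lists (SRVs, UAVs, CBVs, Samplers); a list
    may be None when the shader declares no resources of that class.
    """
    records = dx_resources[resource_class] or []
    if not 0 <= range_id < len(records):
        raise ValueError("range ID does not name a record in this list")
    record = records[range_id]
    if record["id"] != range_id:
        # Range IDs start at 0 and are contiguous, so ID and position agree.
        raise ValueError("record ID does not match its position")
    return record

dx_resources = (
    [{"id": 0, "name": "MyTexture2D"}, {"id": 1, "name": "MyBuffer"}],
    None, None, None,
)
buffer_record = find_record(dx_resources, SRV, 1)
```

The index operand does not take part in this lookup; it selects an entry within the range that the record describes.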