Leon's Blog

分享一点有趣的技术

0%

MLIR basics: deepdive part1

image-20250422105012239

本篇文章主要基于mlir的公开talk,深入理解一下mlir中的operation,attributes以及properties机制的底层实现。本文先着重解读operation的底层实现机制。

MLIR Operation Intro

image-20250422105310560

上图来源mlir tutorial。mlir operation是MLIR系统中最基本的可执行单元,其涵盖多种不同的抽象层级的计算(按照硬件相关性分类,可以分为硬件相关和硬件无关。抽象层级上,也有诸如vector operation和memref operation)。上图已经涵盖了operation op的每个细节,详情可参考mlir operation docs

具体的使用细节,可以参考HarmonyHu MLIR技术细节整理

mlir::Operation是通用定义,包含通用的接口和属性;MulOpTransposeOpConstantOp等等是特定定义,包含特定的属性。前者可以通过llvm::dyn_cast(动态)或llvm::cast(静态)转换成后者;后者通过getOperation()转换成前者。

1
2
3
4
5
6
7
8
9
10
11
12
void processConstantOp(mlir::Operation *operation) {
ConstantOp op = llvm::dyn_cast<ConstantOp>(operation);

// This operation is not an instance of `ConstantOp`.
if (!op)
return;

// Get the internal operation instance wrapped by the smart pointer.
mlir::Operation *internalOperation = op.getOperation();
assert(internalOperation == operation &&
"these operation instances are the same");
}

特定Op可以直接访问属性;mlir::Operation无法直接访问特定Op的属性,但可以类似如下间接访问,如下:

1
2
3
4
5
6
// getAttrOfType
IntegerAttr opsetAttr = op->getAttrOfType<::mlir::Attribute>("onnx_opset").dyn_cast_or_null<IntegerAttr>();
if (opsetAttr)
opset = opsetAttr.getValue().getSExtValue();
// getAttr
mlir::Type targetType = op->getAttr("to").cast<::mlir::TypeAttr>().getValue();

operation的具体使用方面,提供了如下接口:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
OperationName getName() { return name; }
/// Remove this operation from its parent block and delete it.
void erase();
/// Remove the operation from its parent block, but don't delete it.
void remove();
/// Returns the operation block that contains this operation.
Block *getBlock() { return block; }
/// Return the context this operation is associated with.
MLIRContext *getContext();
/// Returns the closest surrounding operation that contains this operation
/// or nullptr if this is a top-level operation.
Operation *getParentOp() { return block ? block->getParentOp() : nullptr; }
/// Return the closest surrounding parent operation that is of type 'OpTy'.
template <typename OpTy> OpTy getParentOfType();
void dump();
unsigned getNumOperands();
Value getOperand(unsigned idx) { return getOpOperand(idx).get(); }
void setOperand(unsigned idx, Value value);
unsigned getNumResults();
/// Get the 'idx'th result of this operation.
OpResult getResult(unsigned idx) { return OpResult(this, idx); }
void replaceAllUsesWith(Operation *op);
void moveBefore(Operation *existingOp);
void moveBefore(Block *block, llvm::iplist<Operation>::iterator iterator);
void moveAfter(Operation *existingOp);
void moveAfter(Block *block, llvm::iplist<Operation>::iterator iterator);

MLIR Operation Internal

前一章节主要介绍mlir operation以及其使用细节,而这一章则深入挖掘operation的底层实现存储。本章主要参考mlir properties talk

首先我们先来看一下mlir的官方doc中,对于operation的底层细节的一些描述:

An Operation defines zero or more SSA Value that we refer to as the Operation results. This array of Value is actually stored in memory before the Operation itself in reverse order. That is for an Operation with 3 results we allocate the following memory layout:

[Result2, Result1, Result0, Operation] ^ this is where Operation* pointer points to.

A consequence of this is that this class must be heap allocated, which is handled by the various create methods. Each result contains:

  • one pointer to the first use (see OpOperand)
  • the type of the SSA Value this result defines.
  • the index for this result in the array. The results are defined as subclass of ValueImpl, and more precisely as the only two subclasses of OpResultImpl: InlineOpResult and OutOfLineOpResult. The former is used for the first 5 results and the latter for the subsequent ones. They differ in how they store their index: the first 5 results only need 3 bits and thus are packed with the Type pointer, while the subsequent one have an extra unsigned value and thus need more space.

An Operation also has zero or more operands: these are uses of SSA Value, which can be the results of other operations or Block arguments. Each of these uses is an instance of OpOperand. This optional array is initially tail allocated with the operation class itself, but can be dynamically moved out-of-line in a dynamic allocation as needed.

An Operation may contain optionally one or multiple Regions, stored in a tail allocated array. Each Region is a list of Blocks. Each Block is itself a list of Operations. This structure is effectively forming a tree.

Some operations like branches also refer to other Block, in which case they would have an array of BlockOperand.

An Operation may contain optionally a “Properties” object: this is a pre-defined C++ object with a fixed size. This object is owned by the operation and deleted with the operation. It can be converted to an Attribute on demand, or loaded from an Attribute.

Finally an Operation also contain an optional DictionaryAttr, a Location, and a pointer to its parent Block (if any).

总结一下,mlir operation的底层细节中,比较有趣的点:

  • 操作的结果是以逆序存储在operation之前,并且前5个result和之后可能的result存储大小不一样,这个机制类似llvm的llvm::SmallVector实现。
  • operand是尾部存储的方式存储在operation之后,旨在提供可扩展和可修改性(operand本质是对SSA value的引用,在mlir中经常会有修改引用的情况,比如replaceAllUsesWith()),同时也支持动态存储在别处。
  • 一个operation会有多个region,也是尾部存储方式存储。

针对这些有趣的点,我们后续主要挖掘operation类的具体实现,以及operation前存储和tailing存储等技术细节。

Operation类的定义

image-20250422130653462

如图所示是/include/mlir/ir/operation.h文件,截取的是operation类的定义,继承自ilist_node_with_parentTrailingObjects

ilist_node_with_parent

参考llvm member list文档,其定义如下:

image-20250422131035118

是一个双向链表,允许operation获取其prev节点和next节点,分别有如下api调用:

image-20250422131140983

这个链表属于侵入式链表,和朴素的std::list有区别,具体参见Boost侵入式链表文档。这一块比较复杂,后续再学。

Trailing Objects

Trailing Objects的原理十分简单,如下是c++中的解释:

是一个在 C++ 中的术语,指的是一个对象,它紧跟在一个类的末尾,并且其大小是动态的或不固定的。通常,尾部对象用于在类内部存储一些额外的数据,这些数据的大小在编译时是不确定的,可能根据运行时的条件或对象的使用来决定。

llvm/include/support/TrailingObjects.h头文件中,有对于这个类的详细定义。注释中给出一个如何定义trailing object类的例子。

类的定义

1
2
3
4
5
6
7
// 定义VarLenghObj,该对象有固定的int和可变的double数组
class VarLengthObj : private TrailingObjects<VarLengthObj, int, double> {
friend TrailingObjects; // 允许模板访问私有方法
unsigned NumInts, NumDoubles;

size_t numTrailingObjects(OverloadToken<int>) const { return NumInts; }
};

注意这里有一个关键点,double是尾元素,因此其长度是可变的,所以没有numTrailingObjects方法实现。

内存分配

1
2
3
size_t Size = VarLengthObj::totalSizeToAlloc<int, double>(NumInts, NumDoubles);
void *Mem = operator new(Size);
auto *Obj = new (Mem) VarLengthObj();

访问数据

1
2
int *Ints = Obj->getTrailingObjects<int>();    // 访问 int 数组
double *Doubles = Obj->getTrailingObjects<double>(); // 访问 double 数组

Operation类的实现

image-20250422133350287

这张图解释了operation的实际存储,具体如下:

  • 前16bytes是前序节点和后继节点,注意前序节点是ilist_node_base类型加一个int,用来表示哨兵。
  • 后续是block指针等信息,同时还有region,results等个数。

上述的trailing objects中,只有OpOperand是编译时不确定可变的,其他均可通过operation这64bytes信息中的num值来推导实际存储位置,如下图所示:

image-20250422133953920

该图展示了operation的getRegion()方法的底层实现原理。而唯一特殊的点就是OpOperand,是tail-allocated或是和operation存储分离的。该OpOperand的定位通过64bytes中的OperandStorage来辅助:

image-20250422134301210

该OperandStorage中存储OpOperand指针,以及总共有多少operands。添加operands,这种模式也十分容易动态扩展。

解读完OpOperand,region等信息后,operation还剩一个关键信息:result。

image-20250422134727315

result的特殊性是其存储在operation存储体的前面,没有放在trailing objects中。其设计思路参考SmallVector设计,分为OutOfLineOpResult和InlineOpResult,对于少量result情况(<=6)可以有效压缩空间。其主要思路是用3bits表示有限的index索引,而超过3bits范围,则使用int表示索引范围,使得前六个result占用16bytes,其余为24个bytes。

image-20250422134946924

从operation中获取指定的result,其实现如下,逻辑是比较清晰简单的:

image-20250422135114847

同时也支持从result获取operation这样的操作。逻辑和上图基本一致。

如下图是一张很好的operation结构总结图:

image-20250422141744357

来源于How Slow is MLIR talk

Debug实战

mlir的debug tips参考[debug博客](MLIR debug tips | Leon’s Blog)。首先打断点打在如图所示:

image-20250422140535380

我们重点关注func::FuncOp的实现。

image-20250422140614518

可以看到,OpState是指向operation实体的指针,其余均是OpTrait萃取或是interface接口,不属于本章节内容。

image-20250422141259704

可以看到,OpState是符合前几章讲解的底层实现。

至此,解读完mlir的operation的底层实现,下一章节会重点解读mlir的attributes机制的底层实现。