Deep Learning Compilers

This post collects papers on deep learning compilers, together with related material from Zhihu and other sites [1].

1 Papers on Deep Learning Compilers

1.1 Survey

  • The Deep Learning Compiler: A Comprehensive Survey

    A survey of DL compilers that summarizes their common design architecture.

  • An In-depth Comparison of Compilers for Deep Neural Networks on Hardware

    Compares the performance of several compilers, including Halide, XLA, TVM, and TC (Tensor Comprehensions).

1.2 The TVM Series

  1. OSDI'18 - TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

    Paper [2] gives a complete account of the background, goals, technical challenges, and solutions behind TVM's design. It establishes the overall TVM architecture: hardware-independent graph-level optimization, hardware-dependent operator-level optimization, and cost-model-driven search for good schedules. Existing reading notes offer a quick way into the paper; the compute/schedule split is also sketched below.
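
    A minimal sketch of the compute/schedule separation the paper describes, assuming the classic tvm.te API; the vector-add workload and tiling factor here are illustrative, not taken from the paper:

```python
import tvm
from tvm import te

# "What" to compute: a hardware-independent operator definition.
n = 1024
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# "How" to compute it: a hardware-dependent schedule.
s = te.create_schedule(C.op)
outer, inner = s[C].split(C.op.axis[0], factor=32)  # tiling factor is a tunable knob
s[C].vectorize(inner)                               # map the inner loop to SIMD lanes

print(tvm.lower(s, [A, B, C], simple_mode=True))    # inspect the generated loop nest
mod = tvm.build(s, [A, B, C], target="llvm")        # compile for a CPU target
```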

  2. MAPL'18 - Relay: A New IR for Machine Learning Frameworks

    Relay [3] is TVM's new intermediate representation: models from different frameworks are first converted into Relay, and graph optimizations are then performed on Relay. See also the Zhihu discussion 如何评价TVM的新IR(Relay)? ("How should TVM's new IR, Relay, be evaluated?").

  3. arXiv'19 - Relay: A High-Level Compiler for Deep Learning [4]

    TVM's second-generation high-level IR. It is designed like a programming language, with formal grammar rules and a let-binding mechanism. Developers from a DL background can define computation graphs in dataflow style, while researchers from a PL (programming language) background can use let bindings instead. By making each computation's scope explicit, let binding resolves the ambiguity in interpreting the AST (see the sketch after this item).

    Chinese translation of the paper: Relay: A High-Level Compiler for Deep Learning 论文翻译
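
    To make the dataflow-vs-let distinction concrete, here is a hedged sketch using the tvm.relay Python API; the tensor shape and the expression x*x + x are made up for illustration:

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(2, 2), dtype="float32")

# Dataflow style: nested calls, as a DL developer would write a graph.
y_dataflow = relay.add(relay.multiply(x, x), x)

# Let-binding style: the intermediate t = x*x gets an explicit name and scope,
# which pins down evaluation order and removes ambiguity from shared AST nodes.
t = relay.var("t", shape=(2, 2), dtype="float32")
y_let = relay.Let(t, relay.multiply(x, x), relay.add(t, x))

func = relay.Function([x], y_let)
print(tvm.IRModule.from_expr(func))  # prints the textual Relay IR
```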

1.3 Work on Auto-tuning

  1. NIPS'18 - Learning to Optimize Tensor Programs

    Paper [5], published at NIPS 2018, details the AutoTVM auto-tuning approach; it is essentially an expanded version of Section 5 of [2]. The main ideas are all described in the OSDI'18 paper; the one addition not covered by [2] is the application of transfer learning. A hedged sketch of the workflow follows this item.

    Reading notes (in Chinese): Learning to Optimize Tensor Programs 解读
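
    A hedged sketch of the AutoTVM workflow described in [5], assuming the tvm.autotvm API; the task name, the single tiling knob, and the trial budget are illustrative:

```python
import tvm
from tvm import te, autotvm

@autotvm.template("example/vecadd")  # task name is arbitrary
def vecadd(n):
    A = te.placeholder((n,), name="A")
    B = te.placeholder((n,), name="B")
    C = te.compute((n,), lambda i: A[i] + B[i], name="C")
    s = te.create_schedule(C.op)

    cfg = autotvm.get_config()
    cfg.define_split("tile_i", C.op.axis[0], num_outputs=2)  # a tunable knob
    _, inner = cfg["tile_i"].apply(s, C, C.op.axis[0])
    s[C].vectorize(inner)
    return s, [A, B, C]

task = autotvm.task.create("example/vecadd", args=(4096,), target="llvm")
tuner = autotvm.tuner.XGBTuner(task)  # learned cost model guiding the search
tuner.tune(
    n_trial=64,
    measure_option=autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(number=5),
    ),
    callbacks=[autotvm.callback.log_to_file("vecadd.log")],
)
```

    Transfer learning enters when the tuner is warm-started with tuning logs collected from earlier tasks (e.g., via the tuner's load_history facility), so the cost model does not start from scratch on each new workload.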

  2. OSDI'20 - Ansor: Generating High-Performance Tensor Programs for Deep Learning

    The Ansor paper [6] appeared at OSDI 2020. Its core goal is to generate high-performance tensor programs, which it breaks into two problems: (1) how to enlarge the search space, and (2) how to make the search itself more effective and efficient.

    To enlarge the search space, Ansor replaces TVM's template-based definition of the space with a hierarchical search that decouples high-level structure from low-level details.

    To improve search effectiveness and efficiency, Ansor changes TVM's search strategy from simulated annealing to evolutionary search (the paper's term; Chinese materials often render it as 遗传算法, genetic algorithm), which escapes local optima more reliably.

    Concretely, a schedule is split into two levels, sketch and annotation: a sketch plays the role of TVM's schedule template, so Ansor first searches for a sketch and then searches for its annotations. A hedged usage sketch follows.
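
    A hedged sketch of this flow through TVM's auto_scheduler module (the open-source incarnation of Ansor); the matmul workload, trial count, and file names are illustrative:

```python
import tvm
from tvm import te, auto_scheduler

@auto_scheduler.register_workload
def matmul(n):
    A = te.placeholder((n, n), name="A")
    B = te.placeholder((n, n), name="B")
    k = te.reduce_axis((0, n), name="k")
    C = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

# No hand-written schedule template: sketches and annotations are generated
# and searched automatically, with evolutionary search refining candidates.
task = auto_scheduler.SearchTask(func=matmul, args=(1024,), target="llvm")
task.tune(auto_scheduler.TuningOptions(
    num_measure_trials=200,
    measure_callbacks=[auto_scheduler.RecordToFile("matmul.json")],
))
sch, args = task.apply_best("matmul.json")
mod = tvm.build(sch, args, target="llvm")
```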

  3. ASPLOS'20 - FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System

    FlexTensor [7] automatically explores and optimizes schedules for tensor computations on heterogeneous systems, without requiring hand-written schedule templates.

  4. ICLR'20 - Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation

1.4 Polyhedral

The three WeChat official-account articles above introduce the basic principles of polyhedral compilation (Poly) and its applications in DL. Their author, 要术甲杰, holds a PhD in the polyhedral research field.

An article introducing the Pluto algorithm, also written by 要术甲杰.

1.5 Others

Using shared memory to implement more aggressive operator-fusion strategies (sketched below).
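
As a hedged illustration of the idea (not the cited paper's actual implementation), TVM's te scheduling can stage an intermediate tensor in GPU shared memory and compute it inside the consumer's thread block, avoiding a round trip through global memory:

```python
import tvm
from tvm import te

n = 4096
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")  # producer
C = te.compute((n,), lambda i: B[i] + 1.0, name="C")  # consumer

s = te.create_schedule(C.op)
bx, tx = s[C].split(C.op.axis[0], factor=128)
s[C].bind(bx, te.thread_axis("blockIdx.x"))
s[C].bind(tx, te.thread_axis("threadIdx.x"))

s[B].set_scope("shared")     # keep the intermediate B in shared memory
s[B].compute_at(s[C], bx)    # fuse: produce B inside C's thread block
s[B].bind(s[B].op.axis[0], te.thread_axis("threadIdx.x"))  # cooperative store

print(tvm.lower(s, [A, C], simple_mode=True))  # building requires target="cuda"
```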

Two surveys on automatic differentiation (a toy example follows).
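
As a toy complement to those surveys (not taken from them), reverse-mode AD can be sketched in a few lines of plain Python: each value remembers its parents and the local partial derivatives, and backward propagates adjoints along the recorded edges:

```python
class Value:
    """A scalar that records how it was computed, for reverse-mode AD."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents  # pairs of (parent Value, local partial derivative)

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self, seed=1.0):
        # Accumulate the adjoint, then push it to parents via the chain rule.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Value(3.0)
y = x * x + x        # y = x^2 + x
y.backward()
print(x.grad)        # dy/dx = 2x + 1 = 7.0
```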

Joint optimization across the schedule and execution phases.

A series of articles by 杨军 of Alibaba.

Using TVM to generate operators on the Sunway (神威) supercomputer.

Graph optimization in TensorFlow.

2 Other Related Documents

2.1 AI Compilers @金雪峰

2.2 A Tour of Deep Learning Compilers @知乎

2.3 TVM Code Walkthrough (@知乎 column)

2.4 Deep Learning Compilers @柳嘉强

2.5 Learning Deep Learning Compilers from Scratch

  1. Others

References

[1] 陈逢锦, “TVM及深度学习编译器相关论文 (TVM and related papers on deep learning compilers),” 2022. https://zhuanlan.zhihu.com/p/500041871 (accessed May 03, 2022).
[2] T. Chen et al., “TVM: An automated end-to-end optimizing compiler for deep learning,” in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2018), Carlsbad, CA, USA, Oct. 2018, pp. 578–594. Available: https://www.usenix.org/conference/osdi18/presentation/chen
[3] J. Roesch et al., “Relay: A new IR for machine learning frameworks,” in Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL), 2018. doi: 10.1145/3211346.3211348.
[4] J. Roesch et al., “Relay: A high-level compiler for deep learning,” 2019.
[5] T. Chen et al., “Learning to optimize tensor programs,” in Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS 2018), 2018, pp. 3393–3404.
[6] L. Zheng et al., “Ansor: Generating high-performance tensor programs for deep learning,” in 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2020), Nov. 2020, pp. 863–879. Available: https://www.usenix.org/conference/osdi20/presentation/zheng
[7] S. Zheng, Y. Liang, S. Wang, R. Chen, and K. Sheng, “FlexTensor: An automatic schedule exploration and optimization framework for tensor computation on heterogeneous system,” in Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2020), 2020. doi: 10.1145/3373376.3378508.
[8] J. Zhao et al., “AKG: Automatic kernel generation for neural processing units using polyhedral transformations,” in Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2021), 2021. doi: 10.1145/3453483.3454106.
[9] J. Zhao and P. Di, “Optimizing the memory hierarchy by compositing automatic transformations on computations and data,” in 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2020), 2020. doi: 10.1109/micro50266.2020.00044.