ml-inference

Previous blog posts overviewed the MLIR dialect hierarchy for kernel code generation (CodeGen) and zoomed in on the Linalg and Vector dialects among them. Now I will switch to discuss the runtime side a bit, in order to provide a holistic view of MLIR-based machine learning (ML) compilers. This one touches the foundation and basics, including the target landscape, runtime requirements and designs to meet thereof.
These days if you would like to learn about machine learning, there are abundant great resources on the web discussing model architectures and how to code and train them. Materials about inference, though, are generally much harder to find, especially for edge and mobile. You might ask, inference is just the forward pass of training, so how hard can it be? Actually, it faces lots of unique challenges, to the extent that we are basically solving completely different major problems. I have been working on inference at the edge for a while, so let me capture them in this blog post, by contrasting training and inference in the cloud.
2021-07-17
11 min read