Introduction to Deepytorch Inference acceleration, benefits, and model limitations

Deepytorch Inference is an AI inference accelerator from Alibaba Cloud that provides high-performance inference for PyTorch models. It significantly improves inference performance by partitioning the model's computation graph, fusing execution layers, and implementing high-performance operators (OPs). This topic describes the concepts, benefits, and supported models of Deepytorch Inference.

Introduction to Deepytorch Inference

Deepytorch Inference accelerates inference by using just-in-time (JIT) compilation to optimize deep learning models built with the PyTorch framework. You do not need to specify precision or input sizes in advance.

The following figure shows the architecture of Deepytorch Inference.

Architecture layer	Sublayer	Description
Framework layer		PyTorch Framework: The PyTorch framework component that connects to your models. PyTorch Custom Ops: Third-party PyTorch operators. These operators are not optimized and remain in the framework layer.
Deepytorch Inference acceleration	Deepytorch Inference component	TorchScript Graph Optimization Pipelines: Graph optimization and operator fusion passes that run on TorchScript. Environment Manager: Controls the execution features and optimization levels of Deepytorch Inference. Deepytorch Engine: The core execution engine. It includes key components such as Build Helper Ops, Operation Parser, Shape Tracker, Accuracy Checker, and Engine Rebuilder.
Deepytorch Inference acceleration	Operator layer	High Performance Kernel Libs: Libraries of high-performance operator kernels. Custom Plugins: Implementations of other functional operators.
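
As a rough illustration of the graph partitioning and operator fusion ideas described above, the following toy sketch splits a flat sequence of operator names into segments that an engine could take over and segments that stay in the framework, and then fuses a known pattern inside the supported segments. This is not the Deepytorch Inference implementation. The operator set `SUPPORTED_OPS`, the pattern `FUSION_PATTERN`, and all function names are hypothetical, and a real TorchScript graph is a directed graph rather than a flat list.

```python
from itertools import groupby

# Hypothetical set of operators an accelerator can handle; the real list is
# not given in this topic.
SUPPORTED_OPS = {"conv2d", "batch_norm", "relu", "linear"}

# Hypothetical fusion pattern: consecutive ops that one kernel could replace.
FUSION_PATTERN = ("conv2d", "batch_norm", "relu")


def partition(ops):
    """Split an op sequence into contiguous (is_supported, ops) segments."""
    return [
        (supported, list(group))
        for supported, group in groupby(ops, key=lambda op: op in SUPPORTED_OPS)
    ]


def fuse(ops, pattern=FUSION_PATTERN):
    """Replace each occurrence of `pattern` with a single fused op name."""
    fused, i = [], 0
    while i < len(ops):
        if tuple(ops[i:i + len(pattern)]) == pattern:
            fused.append("+".join(pattern))
            i += len(pattern)
        else:
            fused.append(ops[i])
            i += 1
    return fused


def plan(ops):
    """Fuse ops in supported segments; leave unsupported segments untouched."""
    return [
        ("engine", fuse(segment)) if supported else ("framework", segment)
        for supported, segment in partition(ops)
    ]


model_ops = ["conv2d", "batch_norm", "relu", "my_custom_op", "linear", "relu"]
for target, segment in plan(model_ops):
    print(target, segment)
```

In this sketch the custom operator forces a split: the first three operators collapse into one fused engine operator, `my_custom_op` stays in the framework, and the trailing `linear` and `relu` form a second engine segment.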

Benefits

Significantly improves inference performance

Deepytorch Inference uses compilation to reduce model inference latency, which improves the model's real-time performance and response speed.

The following table compares the inference performance of different models.

Note

The following data was measured on a single NVIDIA A10 GPU. Compared with each model's default PyTorch inference configuration, Deepytorch Inference reduces inference latency for every model listed.

Model	Input size	Deepytorch Inference (ms)	PyTorch float (ms)	Latency reduction	Source
Resnet50	1 × 3 × 224 × 224	0.47	2.92	84%	torchvision
Mobilenet-v2-100	1 × 3 × 224 × 224	0.24	2.01	88%	torchvision
SRGAN-X4	1 × 3 × 272 × 480	23.07	132.00	83%	SRGAN
YOLO-V3	1 × 3 × 640 × 640	3.87	15.70	75%	yolov3
Bert-base-uncased	1 × 128, 1 × 128	0.94	3.76	75%	transformers
Bert-large-uncased	1 × 128, 1 × 128	1.33	7.11	81%	transformers
GPT2	1 × 128	1.49	3.82	71%	transformers
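
The percentage column in the table appears to be the relative latency reduction, (baseline − accelerated) / baseline, rather than a throughput ratio. The following sketch recomputes it from the two latency columns and also prints the equivalent speedup factor. The interpretation of the column and the function names are assumptions, not something this topic states.

```python
# Latencies in milliseconds copied from the table above:
# model -> (Deepytorch Inference, PyTorch float).
LATENCIES_MS = {
    "Resnet50": (0.47, 2.92),
    "Mobilenet-v2-100": (0.24, 2.01),
    "SRGAN-X4": (23.07, 132.00),
    "YOLO-V3": (3.87, 15.70),
    "Bert-base-uncased": (0.94, 3.76),
    "Bert-large-uncased": (1.33, 7.11),
    "GPT2": (1.49, 3.82),
}


def latency_reduction(accelerated_ms, baseline_ms):
    """Fraction of the baseline latency that was removed."""
    return (baseline_ms - accelerated_ms) / baseline_ms


def speedup(accelerated_ms, baseline_ms):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_ms / accelerated_ms


for name, (fast, base) in LATENCIES_MS.items():
    print(
        f"{name:20s} "
        f"reduction={latency_reduction(fast, base):.0%} "
        f"speedup={speedup(fast, base):.1f}x"
    )
```

Under this reading, the 84% figure for Resnet50 corresponds to roughly a 6.2x speedup. The listed latencies reproduce the table's percentages for every row except GPT2, where they give about 61%, so that row may have been derived from different measurements.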

Ease of use

Deepytorch Inference does not require you to specify precision or input sizes. Because it relies on JIT compilation, it needs only minimal changes to your code, which reduces code complexity and maintenance costs.

Model support

Deepytorch Inference currently supports optimization for the models listed below.

Models that support inference acceleration

Scenario	Supported model name
Vision scenarios	alexnet, dcgan, mnasnet1_0, mobilenet_v2, mobilenet_v3_large, pytorch_stargan, resnet18, resnet50, resnext50_32x4d, shufflenet_v2_x1_0, squeezenet1_1, timm_efficientnet, timm_nfnet, timm_regnet, timm_resnest, timm_vision_transformer, timm_vovnet, vgg16, SRGAN-X4, YOLO-V3
NLP scenarios	BERT_pytorch, attention_is_all_you_need_pytorch, GPT2, bert-base-uncased, bert-large-uncased

Models that do not support inference acceleration

Models that generate weights dynamically from the input are not supported. For example, the weight demodulation operation in the StyleGAN2 model is not supported.
Models that dynamically set attributes are not supported. For example, the fasterrcnn_resnet50_fpn model in torchvision.models.detection raises an error when torch.jit.freeze is executed on it.
Models that use the conv1d operator, such as demucs, are not supported.
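
The two lists above can be turned into a simple pre-flight lookup. The following sketch is only a convenience built from the names in this topic: `check_support` and the dictionary layout are hypothetical and are not part of Deepytorch Inference, and a model that is absent from both lists is reported as unknown rather than unsupported.

```python
# Model names copied from the tables above, lowercased for matching.
SUPPORTED = {
    "vision": {
        "alexnet", "dcgan", "mnasnet1_0", "mobilenet_v2", "mobilenet_v3_large",
        "pytorch_stargan", "resnet18", "resnet50", "resnext50_32x4d",
        "shufflenet_v2_x1_0", "squeezenet1_1", "timm_efficientnet",
        "timm_nfnet", "timm_regnet", "timm_resnest",
        "timm_vision_transformer", "timm_vovnet", "vgg16", "srgan-x4",
        "yolo-v3",
    },
    "nlp": {
        "bert_pytorch", "attention_is_all_you_need_pytorch", "gpt2",
        "bert-base-uncased", "bert-large-uncased",
    },
}

# Example models named in this topic as not supported, with the stated reason.
UNSUPPORTED_REASONS = {
    "stylegan2": "weight demodulation generates weights dynamically from the input",
    "fasterrcnn_resnet50_fpn": "sets attributes dynamically; torch.jit.freeze raises an error",
    "demucs": "uses the conv1d operator",
}


def check_support(model_name):
    """Return (status, detail) for a model name, based only on this topic."""
    key = model_name.lower()
    if key in UNSUPPORTED_REASONS:
        return ("unsupported", UNSUPPORTED_REASONS[key])
    for scenario, models in SUPPORTED.items():
        if key in models:
            return ("supported", scenario)
    return ("unknown", "not listed in this topic")


for name in ("resnet50", "GPT2", "demucs", "my_model"):
    print(name, check_support(name))
```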

References

Using the Deepytorch Inference tool can significantly improve model inference performance compared to the default configuration. For instructions, see Install and use Deepytorch Inference.
You can use the Deepytorch Training tool to optimize model training and significantly improve end-to-end training performance. For more information, see What is Deepytorch Training? and Install and use Deepytorch Training.