Abstract [eng]
Computer vision algorithms have been actively developed, with progress peaking during the last decade. With the growing need to transfer pre-trained models into production environments, it has become important for architects of artificial intelligence systems to assess the inference environment in order to reach the most effective model performance. In this paper, we review the image classification task, model architectures, the model serving process, and deployment software. Furthermore, we present a benchmark specification and the results of experiments performed with EfficientNet and MobileNet family models to compare three model serving systems: TensorFlow Serving, TorchServe, and Triton Inference Server. We also examine the impact of model quantization on inference time. In our experiments, Triton Inference Server performed up to 16 times faster than TorchServe. In addition, the hourly costs of cloud instances were considered when comparing the performance of TensorFlow Serving and Triton Inference Server. Lastly, we provide recommendations for efficient image classification model inference in production.