PieSlicer: Dynamically Improving Response Time for Cloud-based CNN Inference


Abstract

Executing deep-learning inference on cloud servers enables the usage of high complexity models for mobile devices with limited resources. However, pre-execution time, the time it takes to prepare and transfer data to the cloud, is variable and can take orders of magnitude longer to complete than inference execution itself. This pre-execution time can be reduced by dynamically deciding the order of two essential steps, preprocessing and data transfer, to better take advantage of on-device resources and network conditions. In this work, we present PieSlicer, a system for making dynamic preprocessing decisions to improve cloud inference performance using linear regression models. PieSlicer then leverages these models to select the appropriate preprocessing location. We show that for image classification applications PieSlicer reduces median and 99th percentile pre-execution time by up to 50.2ms and 217.2ms respectively when compared to static preprocessing methods.
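The decision described in the abstract can be illustrated with a minimal sketch. It is not PieSlicer's implementation: the function names, the choice of raw image size as the only feature, and the bandwidth-based transfer estimate are all assumptions made here for illustration. The idea is to fit a linear model of preprocessing time for each location, then pick the location with the lower predicted pre-execution time (preprocessing plus transfer).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate a fitted (slope, intercept) model at x."""
    slope, intercept = model
    return slope * x + intercept


def choose_location(raw_bytes, preprocessed_bytes, bandwidth_bytes_per_ms,
                    device_model, cloud_model):
    """Return "device" or "cloud", whichever has the lower predicted time.

    Hypothetical cost model (an assumption, not taken from the paper):
      on-device: preprocess locally, then send the smaller preprocessed input
      in-cloud:  send the raw input, then preprocess on the server
    """
    on_device_ms = (predict(device_model, raw_bytes)
                    + preprocessed_bytes / bandwidth_bytes_per_ms)
    in_cloud_ms = (raw_bytes / bandwidth_bytes_per_ms
                   + predict(cloud_model, raw_bytes))
    return "device" if on_device_ms < in_cloud_ms else "cloud"


if __name__ == "__main__":
    # Made-up models: preprocessing time in ms as a function of raw bytes.
    device_model = (0.0001, 5.0)
    cloud_model = (0.00001, 1.0)
    for bandwidth in (100, 100_000):  # bytes per millisecond
        print(bandwidth, choose_location(1_000_000, 150_000, bandwidth,
                                         device_model, cloud_model))
```

Under these made-up numbers, a slow link favors preprocessing on the device because the smaller preprocessed input is cheaper to send, while a fast link favors sending the raw image and preprocessing in the cloud.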


BibTeX

@inproceedings{Ogden2021b,
  title = {{{PieSlicer}}: {{Dynamically Improving Response Time}} for {{Cloud}}-Based {{CNN Inference}}},
  shorttitle = {{{PieSlicer}}},
  booktitle = {Proceedings of the {{ACM}}/{{SPEC International Conference}} on {{Performance Engineering}}},
  author = {Ogden, Samuel S. and Kong, Xiangnan and Guo, Tian},
  date = {2021-04-09},
  series = {{{ICPE}} '21},
  pages = {249--256},
  publisher = {{Association for Computing Machinery}},
  location = {{New York, NY, USA}},
  doi = {10.1145/3427921.3450256},
  url = {https://doi.org/10.1145/3427921.3450256},
  urldate = {2021-09-21},
  abstract = {Executing deep-learning inference on cloud servers enables the usage of high complexity models for mobile devices with limited resources. However, pre-execution time, the time it takes to prepare and transfer data to the cloud, is variable and can take orders of magnitude longer to complete than inference execution itself. This pre-execution time can be reduced by dynamically deciding the order of two essential steps, preprocessing and data transfer, to better take advantage of on-device resources and network conditions. In this work, we present PieSlicer, a system for making dynamic preprocessing decisions to improve cloud inference performance using linear regression models. PieSlicer then leverages these models to select the appropriate preprocessing location. We show that for image classification applications PieSlicer reduces median and 99th percentile pre-execution time by up to 50.2ms and 217.2ms respectively when compared to static preprocessing methods.},
  isbn = {978-1-4503-8194-9},
  keywords = {cloud inference,mobile deep learning,performance modeling}
}