As technology advances and more organizations implement machine learning operations (MLOps), teams are looking for ways to speed up their processes. MobiDev’s AI team looked into the benefits and drawbacks of on-premises versus cloud graphics processing units (GPUs).
On-Premises GPU Options for Deep Learning
When using GPUs for on-premises implementations, you have multiple vendor options. Two popular choices are NVIDIA and AMD.
NVIDIA is a popular option, at least in part because of its CUDA toolkit and the libraries built on it. These libraries make it straightforward to set up deep learning workloads and underpin a strong machine learning community around NVIDIA products, as seen in the widespread support that DL libraries and frameworks provide for NVIDIA hardware. The downside is licensing: NVIDIA's driver license for consumer GeForce cards (GTX and RTX) restricts their deployment in data centers, which in practice steers data center installations toward the more expensive Tesla GPUs. CUDA itself still runs on the less expensive RTX and GTX hardware in workstations.
AMD provides its own libraries, known as ROCm. They are supported by TensorFlow and PyTorch and cover all major network architectures. However, support for developing new networks is limited, as is community support.
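As a rough illustration of the framework support described above, the following sketch asks PyTorch which GPU backend it can see. It assumes PyTorch may or may not be installed; the function name describe_gpu_backend is hypothetical and not something from the article.

```python
def describe_gpu_backend():
    """Return a short description of the GPU backend PyTorch can see, if any."""
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        return "PyTorch is not installed"

    if not torch.cuda.is_available():
        return "PyTorch is installed, but no GPU is visible"

    # ROCm builds of PyTorch reuse the torch.cuda namespace;
    # torch.version.hip is set on those builds and None on CUDA builds.
    backend = "ROCm" if getattr(torch.version, "hip", None) else "CUDA"
    return f"{backend} GPU: {torch.cuda.get_device_name(0)}"


print(describe_gpu_backend())
```

The same call works on NVIDIA and AMD hardware, so a check like this can go at the top of a training script regardless of vendor.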
Cloud Computing with GPUs
Cloud resources can provide pay-for-use access to GPUs in combination with optimized machine learning services. All three major providers, Microsoft Azure, AWS (Amazon Web Services), and Google Cloud, offer GPU resources along with a host of configuration options.
Microsoft Azure offers a variety of instance options for GPU access. These instances have been optimized for compute-intensive tasks, including visualization, simulations, and deep learning.
In AWS you can choose from four different options, each with a variety of instance sizes: EC2 P3, P2, G4, and G3 instances. These options let you choose between NVIDIA Tesla V100, K80, T4 Tensor Core, or M60 GPUs. You can scale up to 16 GPUs depending on the instance.
Rather than dedicated GPU instances, Google Cloud enables you to attach GPUs to your existing instances. Evgeniy Krasnokutsky, AI/ML Solution Architect at MobiDev, explains: “For example, if you are using Google Kubernetes Engine you can create node pools with access to a range of GPUs. These include NVIDIA Tesla K80, P100, P4, V100, and T4 GPUs.”
What is the Best GPU for Deep Learning Tasks in 2021?
When the time comes to choose your infrastructure, you need to decide between an on-premises and a cloud approach. Cloud resources can significantly lower the financial barrier to building a DL infrastructure because you pay only for the GPU time you use.
In contrast, on-premises infrastructures are more expensive upfront but provide you with greater flexibility. You can use your hardware for as many experiments as you want over as long a period as you want with stable costs. You also retain full control over your
However, if you do not have the necessary skills to operate on-premises resources, you should consider cloud offerings, which can be easier to scale and often come with managed options.
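The cost trade-off between upfront on-premises spending and pay-for-use cloud pricing can be roughed out as a break-even calculation. The sketch below is only illustrative: the function name and all prices are hypothetical placeholders, not figures from MobiDev or any provider.

```python
import math


def break_even_hours(hardware_cost, on_prem_hourly_cost, cloud_hourly_rate):
    """Hours of GPU use after which buying hardware is cheaper than renting.

    hardware_cost: upfront purchase price of the on-premises GPU setup
    on_prem_hourly_cost: running cost per hour (power, cooling, upkeep)
    cloud_hourly_rate: pay-for-use price per hour of a comparable cloud GPU
    """
    saving_per_hour = cloud_hourly_rate - on_prem_hourly_cost
    if saving_per_hour <= 0:
        # On-premises never catches up if it costs more per hour to run.
        return math.inf
    return hardware_cost / saving_per_hour


# Hypothetical numbers, chosen only to show the arithmetic.
hours = break_even_hours(
    hardware_cost=10_000.0,
    on_prem_hourly_cost=0.5,
    cloud_hourly_rate=3.0,
)
print(f"Break-even after about {hours:.0f} GPU-hours")
```

If your expected usage is well below the break-even point, the cloud option is likely cheaper; sustained workloads well above it favor buying hardware.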
More detailed information about the benefits and drawbacks of on-premises vs cloud GPU in 2021 can be found at: https://mobidev.biz/blog/gpu-deep-learning-on-premises-vs-cloud
MobiDev is a US-based software engineering company focused on helping visionaries create their products with ease and joy. The company invests in technology research and has years of experience building AI-powered solutions and implementing machine learning, augmented reality, and IoT.