# Deep Learning with “AWS Graviton2 + NVIDIA Tensor T4G” for as low as free* with CUDA 12.2

> \* The “as low as free” tagline is based on `g5g.xlarge` spot instance rates, which have been as low as $0.1519/hr.

# Introduction

The world we live in today relies heavily on artificial intelligence: from vacuum bots to sales support, from self-driving cars to disease detection, from recommending the content you want to consume to translating a foreign language into your native one. AI powers many of today's most successful products, and the need for an efficient, cost-effective, and scalable deep learning architecture has never been more critical.

G5g instances, powered by Amazon’s own Graviton2 processors and featuring NVIDIA T4G Tensor Core GPUs, are a cost-effective alternative to Intel- and AMD-powered instances for deploying deep learning applications.

# The Dilemma

AWS offers a robust, powerful, and cost-effective infrastructure for running artificial intelligence and deep learning workloads. One of its advantages is the option to use spot instances, which can be up to 70% cheaper than on-demand instances.

For example, the spot price of the `g5g.xlarge` instance in various “us-east” zones ranged from **$0.1519** to **$0.1720** per hour over the past three months. These rates are tempting, but at the time of writing, no official **Amazon Linux 2023** Deep Learning AMI is available for the Amazon **G5g** instance family. Setting up the environment yourself can be cumbersome: you have to find the drivers, the correct development toolchain, and a pre-compiled PyTorch module that supports the latest CUDA toolkit.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1694281084800/54ba784f-cd64-4e6c-b843-b0c0132ed617.png align="left")

Spot price history for g5g.xlarge for the last three months.
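If you want to check these numbers for your own region before committing, the AWS CLI can pull the same history. The sketch below is a minimal example, assuming a configured AWS CLI and GNU `date` (as on Amazon Linux); the helper names `spot_history` and `price_range` are hypothetical. AWS keeps spot price history for the last 90 days, which matches the three-month window above.

```bash
# Fetch up to 90 days of spot prices for one instance type in one region.
# Output: one "AvailabilityZone<TAB>SpotPrice" line per recorded price change.
spot_history() {
    local instance_type="$1" region="$2"
    aws ec2 describe-spot-price-history \
        --instance-types "$instance_type" \
        --product-descriptions "Linux/UNIX" \
        --start-time "$(date -u -d '90 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
        --region "$region" \
        --query 'SpotPriceHistory[].[AvailabilityZone,SpotPrice]' \
        --output text
}

# Read "AZ<TAB>price" lines on stdin and print the lowest and highest price seen.
price_range() {
    awk '{
        p = $2 + 0
        if (NR == 1 || p < min) min = p
        if (NR == 1 || p > max) max = p
    }
    END { if (NR > 0) printf "min=%.4f max=%.4f\n", min, max }'
}
```

Usage: `spot_history g5g.xlarge us-east-1 | price_range`. Keeping the text processing in its own function makes it easy to reuse for other instance types.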

# Navigating the Challenge — A How-To Guide

This guide aims to bridge that gap with comprehensive step-by-step instructions suitable for newcomers and seasoned data scientists alike. The goal is to let you leverage these state-of-the-art technologies at a meager cost, without the hassle of hunting down the right drivers and packages for the G5g family. At the end, we will combine all the individual steps into a single script that further streamlines the process.

## 1\. Launching an Instance

To set things up, we’ll use a `g5g.4xlarge` instance. A more powerful instance shortens the compilation time. We will launch this build instance with the AWS Command Line Interface (AWS CLI).

First, set the following environment variables:

* `REGION`: Specifies the AWS region, e.g., ‘us-east-1’.
    
* `SECURITY_GROUPS`: Your security group ID(s).
    
* `KEY_PAIR`: The name of your SSH key pair.
    
* `SUBNET`: The ID of your subnet.
    

If you are unsure about any of these variables, refer to the [security group](https://docs.aws.amazon.com/vpc/latest/userguide/security-groups.html), [key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html), and [subnet](https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html) documentation.

Once you have these values, export them as follows:

```bash
export REGION='us-east-1'
export SECURITY_GROUPS='YourFirstSecurityGroupIdsHere'
export KEY_PAIR='YourSSHKeyNameHere'
export SUBNET='YourSubnetHere'
```

Next, we need the ID of the latest **Amazon Linux 2023 AMI** for the arm64 architecture (G5g instances run on Graviton2 CPUs), so that you get the newest image every time you run this script. Store it in an environment variable named `AMI_ID`.
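One way to do this lookup is through the public SSM parameters AWS publishes for Amazon Linux. The sketch below assumes your AWS credentials allow `ssm:GetParameter`; the function name `latest_al2023_arm64_ami` is hypothetical.

```bash
# Look up the newest Amazon Linux 2023 arm64 AMI via AWS's public SSM parameter.
# G5g instances use Graviton2 (arm64) CPUs, so the arm64 variant is required.
latest_al2023_arm64_ami() {
    local region="${1:-us-east-1}"
    aws ssm get-parameter \
        --name /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64 \
        --region "$region" \
        --query 'Parameter.Value' \
        --output text
}
```

Usage: `AMI_ID=$(latest_al2023_arm64_ami "$REGION"); export AMI_ID`. Because the parameter always points at the current release, no name filters or date sorting are needed.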

Let’s launch the instance using the AMI ID we retrieved earlier by executing:

```bash
aws ec2 run-instances \
--image-id $AMI_ID \
--instance-type g5g.4xlarge \
--key-name $KEY_PAIR \
--subnet-id $SUBNET \
--security-group-ids $SECURITY_GROUPS \
--region $REGION \
--block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":20,"VolumeType":"gp3"}}]' \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=AMI-Builder}]'
```

This command launches a `g5g.4xlarge` instance from the latest Amazon Linux 2023 AMI and configures it with the security groups, key pair, and subnet provided in the environment variables. It also sets the root volume to 20 GB, leaving room for downloading libraries and compiling PyTorch.
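If you also want the instance ID and public IP for the SSH step later, you can wrap the same `run-instances` call and let the CLI wait for the instance. This is a minimal sketch, assuming the environment variables above are set; `launch_builder` is a hypothetical name. The public IP prints as `None` if your subnet does not auto-assign public addresses.

```bash
# Launch the build instance, wait until it is running, and print "<instance-id> <public-ip>".
launch_builder() {
    local instance_id public_ip
    # $SECURITY_GROUPS is left unquoted on purpose so several space-separated IDs still work.
    instance_id=$(aws ec2 run-instances \
        --image-id "$AMI_ID" \
        --instance-type g5g.4xlarge \
        --key-name "$KEY_PAIR" \
        --subnet-id "$SUBNET" \
        --security-group-ids $SECURITY_GROUPS \
        --region "$REGION" \
        --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":20,"VolumeType":"gp3"}}]' \
        --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=AMI-Builder}]' \
        --query 'Instances[0].InstanceId' --output text) || return 1

    # Block until EC2 reports the instance as running.
    aws ec2 wait instance-running --instance-ids "$instance_id" --region "$REGION" || return 1

    public_ip=$(aws ec2 describe-instances --instance-ids "$instance_id" --region "$REGION" \
        --query 'Reservations[0].Instances[0].PublicIpAddress' --output text) || return 1
    echo "$instance_id $public_ip"
}
```

Usage: `read -r INSTANCE_ID PUBLIC_IP < <(launch_builder)`.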

## 2\. Installing System Updates and Required Packages

Whether a machine is local or in the cloud, it is always good practice to keep it updated. This part installs all the updates and tools used for compilation and for running AI tasks.  
Before going gung-ho, we recommend skimming the whole guide first and checking the complete script at the end of this tutorial, which should save you a lot of trouble.

First, let’s define some essential environment variables.

```bash
CUDA_HOME=/usr/local/cuda
HOME_DIR=/home/ec2-user
```

Now, we’ll create a function called `install_utils` that carries out a series of tasks.

```bash
install_utils() {
    # Update all system packages to their latest versions
    dnf -y update
    
    # Install development tools, which include compilers and other utilities
    dnf -y groupinstall "Development Tools"
    
    # Install the packages that are specifically required for our setup
    dnf install -y openssl-devel cmake3 rust cargo
    dnf install -y amazon-efs-utils htop iotop yasm nasm jq python3-pip python-devel cronie cronie-anacron
    
    # Add CUDA paths to ec2-user's .bashrc (exported so child processes such as python3 inherit them)
    echo "export PATH=$CUDA_HOME/bin:\$PATH" | sudo tee -a $HOME_DIR/.bashrc
    echo "export LD_LIBRARY_PATH=$CUDA_HOME/lib64:\$LD_LIBRARY_PATH" | sudo tee -a $HOME_DIR/.bashrc
    
    # Configure shared libraries
    echo "/usr/local/lib" | sudo tee /etc/ld.so.conf.d/usr-local-lib.conf
    echo "/usr/local/lib64" | sudo tee -a /etc/ld.so.conf.d/usr-local-lib.conf
}
```

By running this `install_utils` function, you will have an updated OS and development tools needed in later steps.
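Note that `tee -a` appends the PATH lines every time `install_utils` runs, so re-running the setup duplicates them in `.bashrc`. A small guard avoids that; the helper `append_once` below is a hypothetical sketch, not part of the original script.

```bash
# Append a line to a file only if that exact line is not already present,
# so re-running the setup does not duplicate entries in .bashrc.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x: match whole lines, -F: fixed string (no regex), -q: no output
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

When the setup runs as root (as EC2 user data does), each `echo ... | sudo tee -a` line could become `append_once "export PATH=$CUDA_HOME/bin:\$PATH" "$HOME_DIR/.bashrc"`.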

## 3\. Install the Latest NVIDIA Drivers, CUDA 12.2 Toolkit, and CUDA Deep Neural Network Library

In this step, we will install the NVIDIA GPU driver, the CUDA 12.2 toolkit, and the CUDA Deep Neural Network (cuDNN) library. This part uses the latest driver and toolkit, released on August 29, 2023. If you are reading this later, update the URLs to the latest driver and libraries; everything else stays the same. The steps for finding the latest driver, toolkit, and library are also described below.

### Install NVIDIA GPU Driver

To download and install the NVIDIA Tesla T4G driver, run:

```bash
wget https://us.download.nvidia.com/tesla/535.104.05/NVIDIA-Linux-aarch64-535.104.05.run
sh NVIDIA-Linux-aarch64-535.104.05.run --disable-nouveau --silent
```

If everything goes smoothly, you should now have a working NVIDIA driver, which you can check by running the NVIDIA System Management Interface command `nvidia-smi` in the terminal.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1694281087225/b5efeedd-4b91-4ca0-ae70-07ae6d56be7f.png align="center")

nvidia-smi — NVIDIA System Management Interface
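For scripted checks, `nvidia-smi` can print just the fields you care about instead of the full table. This is a minimal sketch; the function name `check_gpu` is hypothetical.

```bash
# Fail clearly if the driver is missing, otherwise print GPU name, driver version and memory.
check_gpu() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: did the driver installation fail?" >&2
        return 1
    fi
    nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
}
```

Running `check_gpu` after the driver install should report the driver version you just installed.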

The latest drivers for NVIDIA Tesla T4G can be found [here](https://www.nvidia.com/Download/Find.aspx) by selecting the following options.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1694281089558/36dc8212-9ca8-4b4c-be1e-0c855c6ff610.png align="left")

For guidance on selecting the correct driver, refer to the options above.

### Install CUDA Toolkit

The next step is downloading and installing the CUDA 12.2 toolkit, which you can do with the following bash commands:

```bash
wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux_sbsa.run
sh cuda_12.2.2_535.104.05_linux_sbsa.run --silent --override \
--toolkit --samples --toolkitpath=/usr/local/cuda-12.2 \
--samplespath=$CUDA_HOME --no-opengl-libs
```

To find the latest version, visit [NVIDIA’s developer page](https://developer.nvidia.com/cuda-toolkit) and use the following selection.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1694281091521/2d354046-5d71-422d-96fa-68c5e3c2c922.png align="left")

Follow the options above to choose the right CUDA Toolkit for your setup.
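To confirm the toolkit landed where the rest of the guide expects it, you can ask `nvcc` for its release. This sketch is hypothetical (`check_cuda_toolkit` is not part of the original script); it takes the CUDA directory as an optional argument, so if `/usr/local/cuda` does not point at the toolkit installed with `--toolkitpath=/usr/local/cuda-12.2`, you can pass that path directly.

```bash
# Print the CUDA release reported by nvcc, e.g. "release 12.2".
check_cuda_toolkit() {
    local cuda_home="${1:-${CUDA_HOME:-/usr/local/cuda}}"
    if [ ! -x "$cuda_home/bin/nvcc" ]; then
        echo "nvcc not found under $cuda_home" >&2
        return 1
    fi
    # nvcc --version includes a line like "Cuda compilation tools, release 12.2, ..."
    "$cuda_home/bin/nvcc" --version | grep -o 'release [0-9.]*'
}
```

Usage: `check_cuda_toolkit` or `check_cuda_toolkit /usr/local/cuda-12.2`.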

### Install NVIDIA CUDA® Deep Neural Network Library (cuDNN)

Lastly, we’ll install the cuDNN library built for Server Base System Architecture (SBSA), NVIDIA’s target for Arm64 servers.

```bash
wget https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-8.9.4.25_cuda12-archive.tar.xz
tar -xf cudnn-linux-sbsa-8.9.4.25_cuda12-archive.tar.xz
cp -P cudnn-linux-sbsa-8.9.4.25_cuda12-archive/include/* $CUDA_HOME/include/
cp -P cudnn-linux-sbsa-8.9.4.25_cuda12-archive/lib/* $CUDA_HOME/lib64/
chmod a+r $CUDA_HOME/lib64/*
```

The latest cuDNN releases can be downloaded from [here](https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/).

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1694281093543/e6a6e90a-40d7-46ab-bb79-4ef56f7a3ed6.png align="left")

List of available cuDNN sbsa libraries for CUDA 11 and CUDA 12.
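Since cuDNN is copied in by hand rather than installed by a package manager, it helps to confirm which version ended up in `$CUDA_HOME`. cuDNN 8 records its version in `cudnn_version.h`; the sketch below reads it from there. The function name `cudnn_version` is hypothetical.

```bash
# Print the installed cuDNN version (e.g. "8.9.4") from cudnn_version.h.
cudnn_version() {
    local header="${1:-${CUDA_HOME:-/usr/local/cuda}/include/cudnn_version.h}"
    [ -r "$header" ] || { echo "cudnn_version.h not found at $header" >&2; return 1; }
    # The trailing spaces in the patterns skip derived macros such as CUDNN_VERSION.
    awk '/#define CUDNN_MAJOR /      { major = $3 }
         /#define CUDNN_MINOR /      { minor = $3 }
         /#define CUDNN_PATCHLEVEL / { patch = $3 }
         END { print major "." minor "." patch }' "$header"
}
```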

By combining all three, we will have the following function, which we will use in the final script as well.

```bash
setup_gpu() {
    wget https://us.download.nvidia.com/tesla/535.104.05/NVIDIA-Linux-aarch64-535.104.05.run
    sh NVIDIA-Linux-aarch64-535.104.05.run --disable-nouveau --silent
    
    wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux_sbsa.run
    sh cuda_12.2.2_535.104.05_linux_sbsa.run --silent --override --toolkit --samples --toolkitpath=/usr/local/cuda-12.2 --samplespath=$CUDA_HOME --no-opengl-libs
    
    wget https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-8.9.4.25_cuda12-archive.tar.xz
    tar -xf cudnn-linux-sbsa-8.9.4.25_cuda12-archive.tar.xz
    cp -P cudnn-linux-sbsa-8.9.4.25_cuda12-archive/include/* $CUDA_HOME/include/
    cp -P cudnn-linux-sbsa-8.9.4.25_cuda12-archive/lib/* $CUDA_HOME/lib64/
    chmod a+r $CUDA_HOME/lib64/*
    ldconfig
}
```
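Because updating to newer releases means editing three URLs, you might prefer to derive them from version variables. This sketch assumes NVIDIA keeps the same file-naming patterns as the 2023 releases used here, which may not hold for future versions; the variable names and `gpu_urls` are hypothetical. The CUDA runfile name embeds its own bundled driver version, so that is tracked separately from the standalone driver.

```bash
# Hypothetical version pins; override them via the environment to try newer releases.
DRIVER_VERSION="${DRIVER_VERSION:-535.104.05}"            # standalone Tesla driver
CUDA_VERSION="${CUDA_VERSION:-12.2.2}"                    # CUDA toolkit release
CUDA_RUNFILE_DRIVER="${CUDA_RUNFILE_DRIVER:-535.104.05}"  # driver version in the CUDA runfile name
CUDNN_VERSION="${CUDNN_VERSION:-8.9.4.25}"                # cuDNN release

# Print the three download URLs used by setup_gpu, one per line.
gpu_urls() {
    local cuda_major="${CUDA_VERSION%%.*}"   # "12.2.2" -> "12"
    echo "https://us.download.nvidia.com/tesla/${DRIVER_VERSION}/NVIDIA-Linux-aarch64-${DRIVER_VERSION}.run"
    echo "https://developer.download.nvidia.com/compute/cuda/${CUDA_VERSION}/local_installers/cuda_${CUDA_VERSION}_${CUDA_RUNFILE_DRIVER}_linux_sbsa.run"
    echo "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-${CUDNN_VERSION}_cuda${cuda_major}-archive.tar.xz"
}
```

Usage: `wget $(gpu_urls)` downloads all three files in one go.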

## 4\. Compiling and Installing CUDA 12.2 enabled PyTorch

Next, we will compile and install PyTorch from source with the latest CUDA support for Arm-based EC2 instances, along with all the necessary Python packages.

```bash
# Use all available vCPUs for parallel builds (CPUS is used by make below)
CPUS=$(nproc)
# Download and install ccache for faster compilation
wget https://github.com/ccache/ccache/releases/download/v4.8.3/ccache-4.8.3.tar.xz
tar -xf ccache-4.8.3.tar.xz
pushd ccache-4.8.3
cmake .
make -j $CPUS
make install
popd
# Install NumPy, a dependency for PyTorch
dnf install -y numpy
# Install Python typing extensions for better type-checking
sudo -u ec2-user pip3 install typing-extensions
# Clone PyTorch repository and install from source
git clone --recursive https://github.com/pytorch/pytorch.git
pushd pytorch
python3 setup.py install
popd
# Refresh the dynamic linker run-time bindings
ldconfig
# Install additional Python libraries for PyTorch
sudo -u ec2-user pip3 install sympy filelock fsspec networkx
```
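PyTorch's source build reads several environment variables. Exporting a few of them before `python3 setup.py install` can make the build explicit and shorter. This is a sketch under the assumption that you only target the T4G, which is a Turing GPU with compute capability 7.5; `configure_pytorch_build` is a hypothetical helper.

```bash
# Export PyTorch build settings; call this before "python3 setup.py install".
configure_pytorch_build() {
    export USE_CUDA=1
    export CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
    # Build kernels only for compute capability 7.5 (T4G) instead of many architectures.
    export TORCH_CUDA_ARCH_LIST="7.5"
    # Cap parallel compile jobs at the number of vCPUs.
    export MAX_JOBS="$(nproc)"
}
```

Restricting `TORCH_CUDA_ARCH_LIST` to a single architecture is the main time saver; the trade-off is that the resulting build will not run on GPUs of other generations.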

## 5\. Test Your Installation

After you’ve gone through the installation process, you’ll want to ensure that PyTorch and CUDA are working as expected. Run the following command to test the setup.

```bash
python3 -c "import torch; print('Using device: ', torch.device('cuda' if torch.cuda.is_available() else 'cpu'))";
```

If the output shows `cuda`, congratulations: you’ve successfully installed PyTorch with CUDA 12.2 support!
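`is_available()` only proves PyTorch can see the GPU. A slightly deeper smoke test also prints the CUDA and cuDNN versions PyTorch was built against and runs a real kernel. This is a minimal sketch; the function name `verify_torch_cuda` is hypothetical.

```bash
# Print build versions and run a small matrix multiply on the GPU.
verify_torch_cuda() {
    python3 - <<'PY'
import torch

if not torch.cuda.is_available():
    raise SystemExit("CUDA is not available to PyTorch")
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("GPU:", torch.cuda.get_device_name(0))

# A 1024x1024 matmul exercises the CUDA runtime and cuBLAS.
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b
torch.cuda.synchronize()
print("matmul OK, result shape:", tuple(c.shape))
PY
}
```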

# Complete script for effortless Setup 🪄

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1694281095562/c1aea5d5-7033-478a-bb8a-afb63a6f0a4c.png align="left")

Ready for some magic? Before getting started, ensure that your AWS CLI is properly configured. If you haven’t done this, refer to the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) to get up to speed. You will also need to gather the IDs for your [security group](https://docs.aws.amazon.com/vpc/latest/userguide/security-groups.html) and [subnet](https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html) and the name of your [key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html).

Once you have completed the necessary preparations, run the provided script. It launches a g5g.4xlarge instance with the setup script as user data, which starts the installation process at launch. The entire setup should take approximately an hour, and you can monitor its progress along the way. To begin, SSH into your newly launched instance.

```bash
ssh -i "your-key-pair.pem" ec2-user@your-instance-ip
```

Then, run the following command to monitor the installation in real-time:

```bash
tail -f /home/ec2-user/install.log
```
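If you would rather block until the setup finishes than watch the log, you can rely on the fact that EC2 user data is executed by cloud-init on Amazon Linux 2023. The sketch below waits for cloud-init to report completion over SSH; `wait_for_user_data` is a hypothetical helper, and the key file and host are the same placeholders as above.

```bash
# Block until cloud-init (which runs the user data script) reports it has finished.
wait_for_user_data() {
    local key_file="$1" host="$2"
    ssh -i "$key_file" "ec2-user@$host" 'sudo cloud-init status --wait'
}
```

Usage: `wait_for_user_data your-key-pair.pem your-instance-ip`. A non-zero exit status suggests the user data script hit an error, in which case the install log is the place to look.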

The complete script can be downloaded from [GitHub](https://gist.github.com/bilalmughal/0500f27454a508bd3552fcf03e3adadb) and is shown below.

%[https://gist.github.com/bilalmughal/0500f27454a508bd3552fcf03e3adadb] 

When everything is done, you should see the following greeting.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1694281097983/c86a4cf2-3889-421f-a6b0-de82401bdb69.png align="left")

## Using AWS Management Console

You can also use the AWS Management Console for this process. All you need to do is choose “Launch an instance” in the EC2 console, then select the right AMI, architecture, and instance type, along with the networking and security configuration you would use for any other instance. Don’t forget to increase the volume size to 20 GB as well.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1694281100543/dbc15d5a-6099-46d6-aa28-9839bf1bcce7.png align="left")

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1694281101996/e83cfb1d-7f5b-447b-aeed-1273c1e0a140.png align="left")

After selecting the right AMI, architecture, instance type, storage, and other options, configure the instance’s User Data with the custom setup commands that will run at launch.  
To add User Data, open the ‘Advanced Details’ section during the ‘Configure Instance’ stage and paste the script from the GitHub gist, the part between the ‘EOF’ markers, into the User Data text area.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1694281103445/470696bd-58d1-4059-acf7-c99c98898ab2.png align="left")

Remember, this User Data script is what automates your deep learning setup, so don’t skip this step!

## Wrapping Up

And there you have it! A one-stop solution that makes deep learning setup on an Amazon EC2 Graviton2 Arm-based instance much easier. After following these steps, you can create an AMI (Amazon Machine Image) from the instance and reuse it for deep learning tasks. You should also try spot instances for interruption-tolerant AI inference workloads, as they can save you a lot on operational costs!  
With this guide, we’ve made configuration and setup hassle-free so you can dive straight into the work that matters most to you. If you find this script as helpful as we do, we would love to hear about the projects it helps you accomplish. Feel free to share your success stories and any ingenious modifications you’ve made. Happy coding!
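Both suggestions can be scripted with the AWS CLI. This is a minimal sketch, assuming the `REGION`, `KEY_PAIR`, `SUBNET`, and `SECURITY_GROUPS` variables from step 1 and the build instance's ID; the helper names are hypothetical. Note that `create-image` reboots the instance by default, and the `image-available` waiter gives up after a fixed number of polls, so you may need to rerun it for a slow snapshot.

```bash
# Create an AMI from the finished build instance and print its ID once it is available.
create_dl_ami() {
    local instance_id="$1" name="$2" image_id
    image_id=$(aws ec2 create-image \
        --instance-id "$instance_id" \
        --name "$name" \
        --region "$REGION" \
        --query 'ImageId' --output text) || return 1
    aws ec2 wait image-available --image-ids "$image_id" --region "$REGION" || return 1
    echo "$image_id"
}

# Launch a g5g.xlarge spot instance from that AMI and print the new instance ID.
launch_spot_from_ami() {
    local image_id="$1"
    # $SECURITY_GROUPS is unquoted on purpose so several space-separated IDs still work.
    aws ec2 run-instances \
        --image-id "$image_id" \
        --instance-type g5g.xlarge \
        --instance-market-options '{"MarketType":"spot"}' \
        --key-name "$KEY_PAIR" \
        --subnet-id "$SUBNET" \
        --security-group-ids $SECURITY_GROUPS \
        --region "$REGION" \
        --query 'Instances[0].InstanceId' --output text
}
```

Usage: `AMI=$(create_dl_ami i-your-build-instance dl-base-cuda12) && launch_spot_from_ami "$AMI"`.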

### 💡 Pro Tip: Max Power, Min Price — The G5G Magic Equation!

Did you know that the `g5g.xlarge`, `g5g.2xlarge`, `g5g.4xlarge`, and `g5g.8xlarge` all have the same GPU, a single NVIDIA T4G? If more CPU power or memory doesn’t significantly improve performance for your application, you can stick with the `g5g.xlarge` and save some money!

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1694281105312/4ce4e239-9288-4620-8838-279ccadf749f.png align="left")

G5g Instance specification details.
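You can also pull these specifications straight from the EC2 API to compare sizes side by side. This is a minimal sketch assuming a configured AWS CLI; `compare_g5g` is a hypothetical name.

```bash
# Show vCPUs, memory (MiB), and GPU count for G5g sizes as a table.
compare_g5g() {
    aws ec2 describe-instance-types \
        --instance-types g5g.xlarge g5g.2xlarge g5g.4xlarge g5g.8xlarge g5g.16xlarge \
        --region "${REGION:-us-east-1}" \
        --query 'InstanceTypes[].[InstanceType, VCpuInfo.DefaultVCpus, MemoryInfo.SizeInMiB, GpuInfo.Gpus[0].Count]' \
        --output table
}
```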

## About the Author and Our Journey at Jumpshare

I have been part of the tech industry for 18 years, serving in different roles and devising engineering solutions throughout. The ever-changing landscape of the tech world and the challenges it brings excite me, especially in cloud computing and machine learning.

At [**Jumpshare**](https://jumpshare.com), where I am VP of Engineering, we have successfully turned these challenges into opportunities. We’re passionate about applying techniques like this one to make our machine learning inference tasks more cost-effective. By leveraging AWS Graviton2 and NVIDIA Tensor T4G instances, we’ve been able to drastically reduce operational costs without compromising performance.

This guide is another expression of our commitment to sharing our experience and insights with the community, as we strongly believe that democratizing technology and saving on infrastructure costs can unlock doors to innovation.

We’re always open to hearing about your own experiences and improvements on the journey towards cost-effective, high-performance deep learning.

*This article was originally published on* [*Jumpshare.com*](https://jumpshare.com/blog/deep-learning-on-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2/)

## Resources

* [https://jumpshare.com/blog/deep-learning-on-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2/](https://jumpshare.com/blog/deep-learning-on-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2/)
* [https://www.nvidia.com/Download/Find.aspx](https://www.nvidia.com/Download/Find.aspx)
* [https://developer.nvidia.com/cuda-toolkit](https://developer.nvidia.com/cuda-toolkit)
* [https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/](https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/)
* [https://instances.vantage.sh/?selected=g5g..x|g5g.x](https://instances.vantage.sh/?selected=g5g..x%7Cg5g.x)
* [https://docs.aws.amazon.com/vpc/latest/userguide/security-groups.html](https://docs.aws.amazon.com/vpc/latest/userguide/security-groups.html)
* [https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html)
* [https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html](https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html)
* [https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)