<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Tech with Mirza]]></title><description><![CDATA[Tech with Mirza - Explore AWS, deep learning, cloud computing, product management, server management, devops, logging, leadership insights, and more. Expert guides await your discovery]]></description><link>https://mirzabilal.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1698575565075/_4ZQ99HEn.png</url><title>Tech with Mirza</title><link>https://mirzabilal.com</link></image><generator>RSS for Node</generator><lastBuildDate>Mon, 20 Apr 2026 22:29:34 GMT</lastBuildDate><atom:link href="https://mirzabilal.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Fluent Bit + OpenObserve: The Ultimate Guide to Log Ingestion & Visualization]]></title><description><![CDATA[You can’t fix what you aren’t aware of. That’s why we log events to gain clear insight into what’s happening inside your system, even when you are not watching. 
Logs are one of the best ways, and sometimes the only option, to identify and fix issues....]]></description><link>https://mirzabilal.com/fluent-bit-openobserve-the-ultimate-guide-to-log-ingestion-and-visualization</link><guid isPermaLink="true">https://mirzabilal.com/fluent-bit-openobserve-the-ultimate-guide-to-log-ingestion-and-visualization</guid><category><![CDATA[Devops]]></category><category><![CDATA[backend]]></category><category><![CDATA[analytics]]></category><category><![CDATA[logging]]></category><category><![CDATA[ log-visualization]]></category><category><![CDATA[guide]]></category><category><![CDATA[step-by-step guide]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Mon, 11 Aug 2025 22:39:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755455976698/49c21e4f-78bd-4c62-be2e-4dbc098416ab.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You can’t fix what you aren’t aware of. That’s why we log events to gain clear insight into what’s happening inside your system, even when you are not watching. Logs are one of the best ways, and sometimes the only option, to identify and fix issues.</p>
<p>Generally, applications emit so much data in a live environment that extracting meaningful information from a large dataset can be exhausting — especially when a production tech stack runs multiple types of applications, each outputting data in its own format and often scattered across systems. Things get more complicated for developers or DevOps engineers who are not proficient in UNIX text-processing tools like <code>grep</code>, <code>sed</code>, <code>awk</code>, etc. Identifying the problem is only the first step, and in production, time is never on your side. In those moments, we all wish we could visualize everything and see the whole picture instantly instead of sifting through endless lines of raw logs and columns.</p>
<p>Managing or implementing such a system is no mean feat. Managed solutions cost money or require you to send data outside your infrastructure, and it’s not always easy to justify the big bucks or the security trade-offs for something that isn’t a direct source of revenue. There are so many options available, but they are either expensive or fall short. After trying a plethora of combinations, I finally found the holy grail for logging infrastructure: <a target="_blank" href="https://fluentbit.io"><strong>Fluent Bit</strong></a> and <a target="_blank" href="https://openobserve.ai"><strong>OpenObserve</strong></a>. Both of these projects are open source, have free versions, are actively maintained, can be self-hosted easily, and work great in tandem.</p>
<p>In this guide, we’ll build a complete logging pipeline, from raw Nginx logs to ingestion and visualization, using Docker Compose to set up <strong>Fluent Bit</strong> and <strong>OpenObserve</strong> step by step. By the end, you’ll have a fully functional, end-to-end logging system that you can run locally for development or adapt for production environments.</p>
<h2 id="heading-architecture-overview">Architecture Overview</h2>
<p>Nginx writes its access and error logs to disk, where Fluent Bit picks them up, processes them according to our requirements, and sends them to OpenObserve via HTTP ingestion. LogRotate automatically handles log file rotation, preventing uncontrolled growth and archiving old data according to the defined retention policy. OpenObserve stores the logs—backed by a PostgreSQL metadata store—and makes them available for search and visualization through its web interface, as shown in the diagram.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1754786001584/61db2214-ce39-44fd-8656-92b721677f57.png" alt="Architecture Overview" class="image--center mx-auto" /></p>
<h2 id="heading-setting-up-nginx-openobserve-amp-postgresql-containers">Setting up Nginx, OpenObserve &amp; PostgreSQL Containers</h2>
<p>We’ll start by spinning up three containers: <strong>Nginx</strong> for generating logs, <strong>PostgreSQL</strong> as the metadata store for OpenObserve, and <strong>OpenObserve</strong> itself for log ingestion, storage, and visualization.<br />Here’s the <code>docker-compose.yml</code> snippet:</p>
<pre><code class="lang-yaml">name: logging-infra

services:
  nginx:
    image: nginx:alpine
    container_name: nginx
    ports:
      - <span class="hljs-number">127.0</span>.<span class="hljs-number">0.1</span>:<span class="hljs-number">8080</span>:<span class="hljs-number">80</span>
    volumes:
      - ./data/logs/nginx:/var/log/nginx
    restart: always

  postgres:
    image: postgres:<span class="hljs-number">17</span>
    container_name: postgres
    ports:
      - <span class="hljs-number">127.0</span>.<span class="hljs-number">0.1</span>:<span class="hljs-number">5432</span>:<span class="hljs-number">5432</span>
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=test1234
      - POSTGRES_DB=openobserve
    restart: always

  openobserve:
    image: openobserve/openobserve:v0.<span class="hljs-number">15.0</span>-rc5
    container_name: openobserve
    ports:
      - <span class="hljs-number">127.0</span>.<span class="hljs-number">0.1</span>:<span class="hljs-number">5080</span>:<span class="hljs-number">5080</span>
    volumes:
      - ./data/openobserve:/data
    environment:
      - ZO_META_STORE=postgres
      - ZO_COMPACT_DATA_RETENTION_DAYS=<span class="hljs-number">14</span>
      - ZO_ROOT_USER_EMAIL=test@mirzabilal.com
      - ZO_ROOT_USER_PASSWORD=test1234
      - ZO_META_POSTGRES_DSN=postgres://postgres:test1234@postgres:<span class="hljs-number">5432</span>/openobserve?sslmode=disable
    depends_on:
      - postgres
</code></pre>
<blockquote>
<p>Notes:</p>
<ul>
<li><p>The Nginx container will store logs in <code>./data/logs/nginx</code>.</p>
</li>
<li><p>PostgreSQL stores OpenObserve’s metadata in <code>./data/postgres</code>.</p>
</li>
<li><p>Data retention in OpenObserve is set to <strong>14 days</strong> for this demo, but you can adjust as needed.</p>
</li>
</ul>
</blockquote>
<p>After deploying with <code>docker-compose up</code>, you’ll be able to access the Nginx demo server at <a target="_blank" href="http://localhost:8080">http://localhost:8080</a> and the OpenObserve server at <a target="_blank" href="http://localhost:5080">http://localhost:5080</a>. You can also view the logs being generated in <code>./data/logs/nginx</code>.</p>
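<p>Before wiring Fluent Bit in, you can verify that both services are answering with a few lines of Python. This is a hypothetical helper, and the <code>/healthz</code> path is an assumption about the OpenObserve version in use; adjust it if yours differs:</p>

```python
import urllib.request

def service_up(url, timeout=2):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, DNS failure, timeout, HTTP error, ...
        return False

# After `docker-compose up`, both of these should return True:
# service_up("http://localhost:8080/")         # Nginx demo server
# service_up("http://localhost:5080/healthz")  # OpenObserve (assumed path)
```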
<h2 id="heading-openobserve-http-credentials">OpenObserve HTTP credentials</h2>
<p>OpenObserve’s HTTP ingestion credentials are <strong>different</strong> from the dashboard login: the username is the same, but the password is not. This is where OpenObserve’s excellent documentation comes to the rescue. After logging into the OpenObserve dashboard, navigate to:</p>
<blockquote>
<p><strong>Data Sources → Custom tab → Fluent Bit</strong></p>
</blockquote>
<p>You’ll see a sample Fluent Bit output configuration. In that configuration, locate the <code>Passwd</code> parameter — <strong>copy its value</strong>. We’ll need this password in the next step.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1754789394448/d62908dd-9ecf-4cee-a19a-ce0e29a93668.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-setting-up-fluent-bit-container">Setting up Fluent Bit Container</h2>
<p>At this point, we have Nginx generating logs and OpenObserve ready to ingest and visualize them, but nothing is actually sending the logs over yet. This is where Fluent Bit comes in. Fluent Bit will tail Nginx log files, parse them into structured events, and forward them securely to OpenObserve using the HTTP API credentials we just retrieved.</p>
<p>Let’s add the <strong>Fluent Bit</strong> service to our <code>docker-compose.yml</code>. Now update the <code>OPENOBSERVE_API_PASSWORD</code> environment variable with the password we saved in the last step (from the <code>Passwd</code> field in OpenObserve’s Fluent Bit sample configuration). This value will be used in the Fluent Bit config files to authenticate with OpenObserve.</p>
<pre><code class="lang-yaml">  fluent-bit:
    image: fluent/fluent-bit:<span class="hljs-number">4.0</span>.<span class="hljs-number">7</span>
    container_name: fluent-bit
    volumes:
      <span class="hljs-comment"># Log volumes</span>
      - ./data/logs:/var/log/host:ro
      - ./data/logs/fluentbit-out:/fluentbit-out
      - ./data/logs/fluentbit-out/fluentbit-db:/fluentbit-db
      <span class="hljs-comment"># Mapping local config files and scripts</span>
      - ./config/fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf:ro
      - ./config/parsers.conf:/fluent-bit/etc/parsers.conf:ro
      - ./config/scripts:/fluent-bit/etc/scripts:ro
    restart: always
    environment:
      - OPENOBSERVE_API_USERNAME=test@mirzabilal.com
      - OPENOBSERVE_API_PASSWORD=T231Fquv <span class="hljs-comment"># This needs to be updated</span>
</code></pre>
<h2 id="heading-creating-fluent-bit-configurations">Creating Fluent Bit Configurations</h2>
<p>Now that our log sources, the <strong>Nginx access</strong> and <strong>error</strong> logs, are ready, we need to configure Fluent Bit to:</p>
<ol>
<li><p>Tail the log files we mounted from Nginx.</p>
</li>
<li><p>Apply the appropriate parser depending on the log type.</p>
</li>
<li><p>Send the structured events to the OpenObserve HTTP ingestion API.</p>
</li>
</ol>
<p>We’ll create two configuration files: <code>config/fluent-bit.conf</code> and <code>config/parsers.conf</code>.</p>
<h3 id="heading-1-creating-parsersconf">1. Creating <code>parsers.conf</code></h3>
<p>We will start with the Nginx <strong>access log</strong> format. Here’s a real example:</p>
<pre><code class="lang-plaintext">172.19.0.1 - - [11/Aug/2025:19:46:57 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:141.0) Gecko/20100101 Firefox/141.0" "-"
</code></pre>
<p>Our goal is to break this single log entry into structured fields:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Field</td><td>Extracted Value</td></tr>
</thead>
<tbody>
<tr>
<td><code>client_ip</code></td><td><code>172.19.0.1</code></td></tr>
<tr>
<td><code>ident</code></td><td><code>-</code></td></tr>
<tr>
<td><code>user</code></td><td><code>-</code></td></tr>
<tr>
<td><code>time</code></td><td><code>11/Aug/2025:19:46:57 +0000</code></td></tr>
<tr>
<td><code>method</code></td><td><code>GET</code></td></tr>
<tr>
<td><code>path</code></td><td><code>/</code></td></tr>
<tr>
<td><code>protocol</code></td><td><code>HTTP/1.1</code></td></tr>
<tr>
<td><code>status</code></td><td><code>304</code></td></tr>
<tr>
<td><code>size</code></td><td><code>0</code></td></tr>
<tr>
<td><code>referer</code></td><td><code>-</code></td></tr>
<tr>
<td><code>agent</code></td><td><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:141.0) Gecko/20100101 Firefox/141.0</code></td></tr>
</tbody>
</table>
</div><p>To do this, we’ll define a regex-based parser in <code>parsers.conf</code> that matches the log format and assigns each captured group a meaningful name. These field names will make it easier to query and visualize the data in OpenObserve.</p>
<p>We’ll also add a second parser to handle Nginx error logs.</p>
<p>Here’s the <code>config/parsers.conf</code>:</p>
<pre><code class="lang-plaintext">[PARSER]
    Name   nginx
    Format regex
    Regex ^(?&lt;client_ip&gt;\S+) (?&lt;ident&gt;\S+) (?&lt;user&gt;\S+) \[(?&lt;time&gt;[^\]]+)\] "(?&lt;method&gt;\S+) (?&lt;path&gt;[^"]*?) (?&lt;protocol&gt;HTTP\/[^"]+)" (?&lt;status&gt;\d{3}) (?&lt;size&gt;\d+|-) "(?&lt;referer&gt;[^\"]*)" "(?&lt;agent&gt;[^\"]*)
    Time_Key time
    Time_Format %d/%b/%Y:%H:%M:%S %z
    Time_Keep On

[PARSER]
    Name   nginx_error
    Format regex
    Regex  ^(?&lt;time&gt;[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9:]+) \[error\] \d+#\d+: \*\d+ (?&lt;message&gt;.*?), client: (?&lt;client_ip&gt;[^,]+), server: (?&lt;server&gt;[^,]+), request: "(?&lt;method&gt;\S+) (?&lt;path&gt;\S+) (?&lt;protocol&gt;[^"]+)", host: "(?&lt;host&gt;[^"]+)", referrer: "(?&lt;referer&gt;[^"]+)"
    Time_Key time
    Time_Format %Y/%m/%d %H:%M:%S
    Time_Keep On
</code></pre>
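<p>Before shipping this parser, it’s worth confirming the regex against a real log line. The following self-contained Python sketch applies the same access-log pattern, translated from Oniguruma’s <code>(?&lt;name&gt;)</code> named groups to Python’s <code>(?P&lt;name&gt;)</code> syntax, to the sample entry shown earlier:</p>

```python
import re

# Same pattern as the Fluent Bit "nginx" parser, with Python-style named groups.
NGINX_ACCESS = re.compile(
    r'^(?P<client_ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>[^"]*?) (?P<protocol>HTTP/[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)'
)

line = ('172.19.0.1 - - [11/Aug/2025:19:46:57 +0000] "GET / HTTP/1.1" 304 0 '
        '"-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:141.0) '
        'Gecko/20100101 Firefox/141.0" "-"')

fields = NGINX_ACCESS.match(line).groupdict()
print(fields["client_ip"], fields["method"], fields["status"])  # 172.19.0.1 GET 304
```

<p>Each key in <code>fields</code> matches a row of the table above, which is exactly what OpenObserve will receive as structured columns.</p>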
<h3 id="heading-2-creating-fluent-bitconf">2. Creating <code>fluent-bit.conf</code></h3>
<p>With our parsers file ready to transform access and error logs into a structured format, we can configure Fluent Bit to tail the Nginx access and error logs, tag events, and send them to OpenObserve.</p>
<p>Our <code>config/fluent-bit.conf</code> sets up two tail inputs for the mounted log files, applies the correct parser for each, enriches them with metadata (e.g., <code>log_type</code> and <code>stage</code>), and outputs the structured JSON to OpenObserve’s HTTP ingestion endpoint.</p>
<pre><code class="lang-plaintext">[SERVICE]
    flush        1
    daemon       Off
    log_level    info
    parsers_file parsers.conf
    plugins_file plugins.conf
    storage.metrics on
    Mem_Buf_Limit 10MB
    Skip_Long_Lines On

[INPUT]
    Name   tail
    Path   /var/log/host/nginx/access.log
    Tag    nginx.access
    Parser nginx
    DB     /fluentbit-db/nginx-access.db

[INPUT]
    Name   tail
    Path   /var/log/host/nginx/error.log
    Tag    nginx.error
    Parser nginx_error
    DB     /fluentbit-db/nginx-error.db

[FILTER]
    Name modify
    Match  nginx*
    Rename time @timestamp

[FILTER]
    Name   modify
    Match  nginx.access
    Add    log_type access

[FILTER]
    Name   modify
    Match  nginx.error
    Add    log_type error

[FILTER]
    Name modify
    Match *
    Add stage production 

[OUTPUT]
    Name http
    Match nginx.*
    URI /api/default/nginx/_json
    Host openobserve
    Port 5080
    tls Off
    Format json
    Json_date_key    @timestamp
    Json_date_format iso8601
    HTTP_User ${OPENOBSERVE_API_USERNAME}
    HTTP_Passwd ${OPENOBSERVE_API_PASSWORD}
    compress gzip
</code></pre>
<p>This Fluent Bit config has four sections:</p>
<ul>
<li><p><code>[SERVICE]</code> → Sets global options like buffer size, logging level, parser files, and handling long lines.</p>
</li>
<li><p><code>[INPUT]</code> → Tails Nginx access and error logs, tags them, and applies the correct parser.</p>
</li>
<li><p><code>[FILTER]</code> → Renames the <code>time</code> field to <code>@timestamp</code>, adds <code>log_type</code> based on source, and sets <code>stage</code> to <code>production</code>.</p>
</li>
<li><p><code>[OUTPUT]</code> → Sends processed logs in JSON format to OpenObserve over HTTP with authentication and gzip compression.</p>
</li>
</ul>
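<p>To sanity-check credentials and payload shape without running the whole pipeline, you can hand-build the same kind of request this <code>[OUTPUT]</code> section produces: a gzip-compressed JSON array POSTed to the <code>_json</code> endpoint with HTTP Basic authentication. A minimal Python sketch follows; the helper name is made up, and the credentials are the demo values used in this article:</p>

```python
import base64
import gzip
import json
import urllib.request

def build_ingest_request(events, host="localhost", port=5080,
                         user="test@mirzabilal.com", password="T231Fquv"):
    """Build a POST mirroring Fluent Bit's http output: gzip'd JSON + Basic auth."""
    body = gzip.compress(json.dumps(events).encode("utf-8"))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"http://{host}:{port}/api/default/nginx/_json",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

req = build_ingest_request([{"@timestamp": "2025-08-11T19:46:57+00:00",
                             "log_type": "access", "status": "304"}])
# urllib.request.urlopen(req) would send it to a running OpenObserve instance.
```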
<h2 id="heading-fluent-bit-configuration-with-lua-enhancements">Fluent Bit Configuration with Lua Enhancements</h2>
<p>With this Fluent Bit setup, we’ve configured tail inputs for both access and error logs, applied the relevant parsers, tagged and enriched events with metadata, and sent everything off to OpenObserve in JSON format.<br />A real-world refinement is to make timestamps consistent across these logs: since Nginx’s access and error logs use different formats, a Lua script can normalize both to <code>ISO 8601</code> before forwarding. This example Lua script, named <code>normalize_time.lua</code>, is available in the GitHub repository:</p>
<p><a target="_blank" href="https://github.com/bilalmughal/nginx-fluentbit-openobserve-stack/blob/main/config/scripts/normalize_time.lua">View the timestamp normalization Lua script</a></p>
<p>To use it, add a Lua filter to your Fluent Bit configuration:</p>
<pre><code class="lang-plaintext">[FILTER]
    Name    lua
    Match   *
    script  scripts/normalize_time.lua
    call    normalize_timestamp
</code></pre>
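<p>The repository’s script is written in Lua, but the transformation itself is easy to illustrate. Here is the same idea sketched in Python, converting both Nginx timestamp formats to ISO 8601. The actual Lua script may differ in details; note that the error-log format carries no UTC offset, so UTC is assumed:</p>

```python
from datetime import datetime, timezone

FORMATS = (
    "%d/%b/%Y:%H:%M:%S %z",  # access log: 11/Aug/2025:19:46:57 +0000
    "%Y/%m/%d %H:%M:%S",     # error log:  2025/08/11 19:46:57
)

def normalize_timestamp(value):
    """Return the value as an ISO 8601 string, or unchanged if unparseable."""
    for fmt in FORMATS:
        try:
            dt = datetime.strptime(value, fmt)
        except ValueError:
            continue
        if dt.tzinfo is None:  # error-log format has no UTC offset
            dt = dt.replace(tzinfo=timezone.utc)
        return dt.isoformat()
    return value

print(normalize_timestamp("11/Aug/2025:19:46:57 +0000"))  # 2025-08-11T19:46:57+00:00
```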
<h2 id="heading-full-working-example">Full Working Example</h2>
<p>If you’d like to see a complete, ready-to-run implementation of everything covered in this article, including the Nginx, Fluent Bit, and OpenObserve setup, the parsers, the Lua timestamp normalization script, and a Logrotate service that automatically rotates logs so they never grow out of control, check out the GitHub repository:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/bilalmughal/nginx-fluentbit-openobserve-stack">https://github.com/bilalmughal/nginx-fluentbit-openobserve-stack</a></div>
<p> </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755384808681/5f9099da-ef45-42db-9166-4a9cff40c53f.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755384827060/81851761-3b8b-44d6-84aa-9938a7a394ad.png" alt class="image--center mx-auto" /></p>
<p>Feel free to explore, adapt, and use it in your own projects.<br />⭐ If you find it helpful, consider giving the repo a star—it really helps others discover it and shows appreciation for the work.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>With the setup in this article, you can stream, parse, normalize, and visualize Nginx logs in real time using Fluent Bit and OpenObserve.<br />For production environments, make sure you handle sensitive credentials securely. For example:</p>
<ul>
<li><p>On AWS, use <strong>AWS Secrets Manager</strong></p>
</li>
<li><p>On GCP, use <strong>Secret Manager</strong></p>
</li>
<li><p>On Azure, use <strong>Key Vault</strong></p>
</li>
</ul>
<p>Avoid storing API keys or passwords directly in configuration files.</p>
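<p>The simplest step in that direction is to resolve the password from the environment at startup, whichever secret store injects it. A minimal sketch of the pattern (the function name is illustrative):</p>

```python
import os

def openobserve_password():
    """Read the ingestion password injected by the environment/secret store."""
    pw = os.environ.get("OPENOBSERVE_API_PASSWORD")
    if not pw:
        raise RuntimeError("OPENOBSERVE_API_PASSWORD is not set")
    return pw
```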
<p>By combining proper parsing, normalization, and secure credential management, you’ll have a robust, scalable, and secure logging pipeline ready for real-world workloads.</p>
<hr />
<h2 id="heading-more-from-the-author">More From the Author</h2>
<ul>
<li><p><strong>Are AWS Deep Learning AMIs saving you time?</strong> <em>A counterproductive approach hindering your progress!</em> Learn about the issues and solution at <a target="_blank" href="https://mirzabilal.com/why-your-aws-deep-learning-ami-is-holding-you-back-and-how-to-fix">How Your AWS Deep Learning AMI is Holding You Back</a></p>
</li>
<li><p><strong>A Cost-Effective Deep Learning AMI Deployment with Spot Instances:</strong> <em>Navigate the parallel frontier!</em> Read the full article at <a target="_blank" href="https://mirzabilal.com/deep-learning-with-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2-56d8457a6f6d">Deep Learning on AWS Graviton2 powered by Nvidia Tensor T4g</a></p>
</li>
<li><p><strong>Fine-Tune FFmpeg for Optimal Performance with detailed compilation guide:</strong> <em>Optimize your multimedia backend for optimal performance!</em>. A comprehensive guide is available at <a target="_blank" href="https://mirzabilal.com/how-to-install-ffmpeg-on-linux-from-source">Tailor FFmpeg for your needs</a>.</p>
</li>
<li><p><strong>GPU Acceleration for FFmpeg:</strong> <em>A Step-By-Step Guide with Nvidia GPU on AWS!</em> Check the complete guide at <a target="_blank" href="https://mirzabilal.com/how-to-install-ffmpeg-with-harware-accelaration-on-aws">Enable Hardware Acceleration with FFmpeg</a></p>
</li>
<li><p><strong>CPU vs GPU Benchmark for video Transcoding on AWS:</strong> <em>Debunking the CPU-GPU myth!</em> See for yourself at <a target="_blank" href="https://mirzabilal.com/cpu-vs-gpu-for-video-transcoding-challenging-the-cost-speed-myth">Challenging the Video Encoding Cost-Speed Myth</a></p>
</li>
<li><p><strong>Crafting the Team for Sustainable Success:</strong> <em>Are "Rockstars" the Ultimate Solution for a Thriving Team?</em> Explore the insights at <a target="_blank" href="https://mirzabilal.com/beyond-rockstars-crafting-the-team-for-sustainable-success">Beyond Rockstars</a></p>
</li>
<li><p><strong>How "Builders" Transform Team's Productivity:</strong> <em>Navigating Vision &amp; Ideas to Reality!</em> Discover more about Builders at <a target="_blank" href="https://mirzabilal.com/bridging-dreams-and-reality-how-builders-transform-teams">Bridging Dreams and Reality</a></p>
</li>
<li><p><strong>Mental Well-being in Tech:</strong> <em>Cultivating a Healthier Workplace for Tech Professionals</em> Explore insights: <a target="_blank" href="https://mirzabilal.com/the-dark-side-of-high-tech-success-addressing-mental-well-being-in-tech">The Dark Side of High-Tech Success!</a></p>
</li>
<li><p><strong>Freelancer to Full-Time:</strong> <em>Understanding Corporate Reluctance. Discover insights at</em> <a target="_blank" href="https://mirzabilal.com/why-businesses-hesitate-to-employ-freelancers-unveiling-the-reasons">Why Businesses Hesitate to Employ Freelancers</a></p>
</li>
<li><p><strong>Transitioning from Freelancer to Full-Time:</strong> <em>Hurdles to Overcome!</em> Check out <a target="_blank" href="https://mirzabilal.com/why-businesses-hesitate-to-employ-freelancers-unveiling-the-reasons">Why Businesses Hesitate to Employ Freelancers</a></p>
</li>
<li><p><strong>Scaling the Cloud:</strong> <em>Discover the Best Scaling Strategy for Your Cloud Infrastructure at</em> <a target="_blank" href="https://mirzabilal.com/scaling-the-cloud-vertical-and-horizontal-scaling-strategies">Vertical and Horizontal Scaling Strategies</a></p>
</li>
<li><p><strong>The Future of Efficient Backend Development:</strong> <em>Efficient Backend Development with Outerbase!</em> Discover the details at <a target="_blank" href="https://mirzabilal.com/building-a-robust-backend-in-just-30-minutes-with-outerbase">Building a Robust Backend with Outerbase</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[The Downside of Vertical Scaling GPU Instances]]></title><description><![CDATA[The artificial intelligence, machine learning, and generative AI application's growth have swelled the demand for high-performance GPU workloads. To fulfill these needs, cloud services have introduced a broad range of instances to fulfill diverse nee...]]></description><link>https://mirzabilal.com/the-downside-of-vertical-scaling-gpu-instances</link><guid isPermaLink="true">https://mirzabilal.com/the-downside-of-vertical-scaling-gpu-instances</guid><category><![CDATA[gpu intensive tasks]]></category><category><![CDATA[gpu-workloads]]></category><category><![CDATA[GPU]]></category><category><![CDATA[Hardware Acceleration]]></category><category><![CDATA[AWS Graviton Processors]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Sun, 14 Jan 2024 09:38:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755442538136/bc574928-59ea-4443-9b5d-1abf89a7a161.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The growth of artificial intelligence, machine learning, and generative AI applications has swelled the demand for high-performance GPU workloads. To meet these needs, cloud providers have introduced a broad range of instance types. For many, <a target="_blank" href="https://mirzabilal.com/scaling-the-cloud-vertical-and-horizontal-scaling-strategies#heading-vertical-scaling">vertical scaling</a> has been a quick fix to increase computational power by replacing a smaller instance with a more powerful one. However, this approach can be less effective, or even pointless, for GPU-intensive workloads.</p>
<h2 id="heading-understanding-gpu-workloads">Understanding GPU Workloads</h2>
<p>GPUs are designed for high-speed mathematical calculations. These specialized computational capabilities make them a good choice, and for some applications the only option, for handling mathematical operations on large datasets in machine learning (ML) and video editing. CPUs are built primarily for sequential task execution, whereas GPUs excel as the workhorses for deep learning, 3D rendering, and scientific computations.</p>
<h2 id="heading-the-basics-of-gpu-scaling">The Basics of GPU Scaling</h2>
<p>Instances can be scaled vertically, horizontally, or through a combination of both, as explained <a target="_blank" href="https://mirzabilal.com/scaling-the-cloud-vertical-and-horizontal-scaling-strategies">here</a>. Vertical scaling means upgrading to a larger server with more computational power, such as CPU, memory, and GPU. Horizontal scaling involves adding more instances instead of upgrading the current one, increasing resources in parallel, much like buying additional properties in the neighborhood to accommodate your growing family.</p>
<h2 id="heading-limitations-of-vertical-scaling-for-gpus">Limitations of Vertical Scaling for GPUs</h2>
<p>Vertical scaling’s effectiveness is evident when we need more CPU or memory resources, but it hits a roadblock with GPU workloads, because upgrading to a larger instance does not necessarily increase the GPU’s computational power. For example, the <a target="_blank" href="https://instances.vantage.sh/?filter=g5g%7Cg4dn&amp;sort=gpus">g5g and g4dn</a> instance families from AWS house the Nvidia T4G and Tesla T4 respectively. Scaling vertically within these families often leaves the GPU power unchanged, as the larger sizes contain the same number of GPUs, as shown in the following image. Hence vertical scaling will not address the actual bottleneck in GPU-intensive tasks.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755442594487/25197fc2-361b-46e0-a7e9-00d1101fb763.webp" alt class="image--center mx-auto" /></p>
<h2 id="heading-practical-implications">Practical Implications</h2>
<p>The implication is evident: when your application demands more GPU power, vertically scaling your instance will not deliver the required performance improvement. This inefficiency leads not only to stagnant performance but also to increased costs, as you pay for additional CPU and memory resources that your workload may never utilize.</p>
<h2 id="heading-an-alternative-to-vertical-scaling">An alternative to Vertical Scaling</h2>
<p>When the bottleneck is GPU computational capability, horizontal scaling becomes the more viable solution. By adding more instances to your cloud infrastructure, you can achieve a near-linear increase in GPU capacity. Horizontal scaling with the right kind of servers also improves your infrastructure’s resource utilization, offering a more cost-effective solution. Training an ML model on a vertically scaled server adds more CPU and memory but does not improve GPU power, so training time will not improve. Shifting to horizontal scaling, with multiple GPU instances working in tandem, can significantly reduce training time, showcasing the right approach to scaling for such tasks.</p>
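<p>A back-of-the-envelope calculation makes the contrast concrete. With purely hypothetical numbers, a GPU-bound, data-parallel training job speeds up almost linearly as instances are added, while a vertically scaled instance with the same single GPU stays at the single-instance time:</p>

```python
# Illustrative arithmetic only; the 40-hour baseline and 5% synchronization
# overhead are hypothetical, not benchmark results.
def training_hours(base_hours, instances, sync_overhead=0.05):
    """Estimated wall-clock hours with N identical GPU instances in parallel."""
    return base_hours / instances * (1 + sync_overhead * (instances - 1))

base = 40.0  # hours on a single GPU instance (hypothetical)
for n in (1, 2, 4, 8):
    print(f"{n} instance(s): {training_hours(base, n):.2f} h")
# Vertical scaling keeps the GPU count at 1, so the job still takes ~40 h.
```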
<h2 id="heading-conclusion">Conclusion</h2>
<p>While vertical scaling offers a quick performance boost in certain scenarios, it falls short for GPU instances. I highly recommend checking out <a target="_blank" href="https://mirzabilal.com/scaling-the-cloud-vertical-and-horizontal-scaling-strategies">horizontal and vertical scaling</a> for a deeper understanding of when to employ each strategy; a nuanced grasp of these scaling methods is crucial for optimizing performance and managing costs across computational scenarios. For tasks that are GPU-bound, adding more GPU power through horizontal scaling or re-architecting the solution is often the more effective path. Even when scaling horizontally, opt for the least-powered instance size, such as <strong>g5g.xlarge</strong> or <strong>g4dn.xlarge</strong>, to avoid paying for extra CPU power that is not required. By adopting the right scaling strategy, you can ensure that your infrastructure is not just robust but also cost-effective and performance-optimized for your specific computational needs.</p>
<h2 id="heading-more-from-the-author">More From the Author</h2>
<ul>
<li><p><strong>Are AWS Deep Learning AMIs saving you time?</strong> <em>A counterproductive approach hindering your progress!</em> Learn about the issues and solution at <a target="_blank" href="https://mirzabilal.com/why-your-aws-deep-learning-ami-is-holding-you-back-and-how-to-fix">How Your AWS Deep Learning AMI is Holding You Back</a></p>
</li>
<li><p><strong>A Cost-Effective Deep Learning AMI Deployment with Spot Instances:</strong> <em>Navigate the parallel frontier!</em> Read the full article at <a target="_blank" href="https://mirzabilal.com/deep-learning-with-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2-56d8457a6f6d">Deep Learning on AWS Graviton2 powered by Nvidia Tensor T4g</a></p>
</li>
<li><p><strong>Fine-Tune FFmpeg for Optimal Performance with detailed compilation guide:</strong> <em>Optimize your multimedia backend for optimal performance!</em>. A comprehensive guide is available at <a target="_blank" href="https://mirzabilal.com/how-to-install-ffmpeg-on-linux-from-source">Tailor FFmpeg for your needs</a>.</p>
</li>
<li><p><strong>GPU Acceleration for FFmpeg:</strong> <em>A Step-By-Step Guide with Nvidia GPU on AWS!</em> Check the complete guide at <a target="_blank" href="https://mirzabilal.com/how-to-install-ffmpeg-with-harware-accelaration-on-aws">Enable Hardware Acceleration with FFmpeg</a></p>
</li>
<li><p><strong>CPU vs GPU Benchmark for video Transcoding on AWS:</strong> <em>Debunking the CPU-GPU myth!</em> See for yourself at <a target="_blank" href="https://mirzabilal.com/cpu-vs-gpu-for-video-transcoding-challenging-the-cost-speed-myth">Challenging the Video Encoding Cost-Speed Myth</a></p>
</li>
<li><p><strong>Crafting the Team for Sustainable Success:</strong> <em>Are "Rockstars" the Ultimate Solution for a Thriving Team?</em> Explore the insights at <a target="_blank" href="https://mirzabilal.com/beyond-rockstars-crafting-the-team-for-sustainable-success">Beyond Rockstars</a></p>
</li>
<li><p><strong>How "Builders" Transform Team's Productivity:</strong> <em>Navigating Vision &amp; Ideas to Reality!</em> Discover more about Builders at <a target="_blank" href="https://mirzabilal.com/bridging-dreams-and-reality-how-builders-transform-teams">Bridging Dreams and Reality</a></p>
</li>
<li><p><strong>Mental Well-being in Tech:</strong> <em>Cultivating a Healthier Workplace for Tech Professionals</em> Explore insights: <a target="_blank" href="https://mirzabilal.com/the-dark-side-of-high-tech-success-addressing-mental-well-being-in-tech">The Dark Side of High-Tech Success!</a></p>
</li>
<li><p><strong>Freelancer to Full-Time:</strong> <em>Understanding Corporate Reluctance. Discover insights at</em> <a target="_blank" href="https://mirzabilal.com/why-businesses-hesitate-to-employ-freelancers-unveiling-the-reasons">Why Businesses Hesitate to Employ Freelancers</a></p>
</li>
<li><p><strong>Scaling the Cloud:</strong> <em>Discover the Best Scaling Strategy for Your Cloud Infrastructure at</em> <a target="_blank" href="https://mirzabilal.com/scaling-the-cloud-vertical-and-horizontal-scaling-strategies">Vertical and Horizontal Scaling Strategies</a></p>
</li>
<li><p><strong>The Future of Efficient Backend Development:</strong> <em>Efficient Backend Development with Outerbase!</em> Discover the details at <a target="_blank" href="https://mirzabilal.com/building-a-robust-backend-in-just-30-minutes-with-outerbase">Building a Robust Backend with Outerbase</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Scaling the Cloud: Vertical and Horizontal Scaling Strategies]]></title><description><![CDATA[In the ever-expanding digital empire, a multitude of apps and solutions emerge with the intent to simplify human lives and boost productivity. As the demand grew, cloud services have been proactive in providing a diverse array of options to cater to these...]]></description><link>https://mirzabilal.com/scaling-the-cloud-vertical-and-horizontal-scaling-strategies</link><guid isPermaLink="true">https://mirzabilal.com/scaling-the-cloud-vertical-and-horizontal-scaling-strategies</guid><category><![CDATA[cloud scalability]]></category><category><![CDATA[horizontal scaling]]></category><category><![CDATA[optimization]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[scalability]]></category><category><![CDATA[cost-optimisation]]></category><category><![CDATA[Infrastructure-as-a-Service (IaaS) Market]]></category><category><![CDATA[vertical scaling]]></category><category><![CDATA[AWS]]></category><category><![CDATA[GCP]]></category><category><![CDATA[Azure]]></category><category><![CDATA[alibaba cloud]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Sat, 06 Jan 2024 23:16:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755457174166/2fe06172-bfb5-4f70-86ee-28354a48e8de.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the ever-expanding digital empire, a multitude of apps and solutions emerge with the intent to simplify human lives and boost productivity. As demand grew, cloud services have been proactive in providing a diverse array of options to cater to these needs. Organizations strive for a cost-effective infrastructure without compromising on quality of service.</p>
<p>The need to efficiently scale infrastructure has never been greater. No infrastructure can remain stagnant; inevitably, scaling is needed to match growing demand. In this article, I will discuss horizontal and vertical scaling, delving into the details of each approach, and provide insight into which scaling strategy will best suit your specific requirements, helping you make informed decisions about your infrastructure needs.</p>
<h2 id="heading-vertical-scaling">Vertical Scaling</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755457367408/5289361c-b224-40c6-b10f-ed5498ba255e.webp" alt class="image--center mx-auto" /></p>
<p>Picture a house that meets your needs comfortably. Life is good; everything fits adequately within the available space. However, as your family grows and your lifestyle evolves, the need for additional space becomes apparent. To address this demand, you opt to extend upwards, adding more floors to meet your expanding requirements.</p>
<p>Let's draw a parallel to this housing expansion in cloud infrastructure. Imagine you have an AWS <strong>C6g.large</strong> instance with 2 vCPUs and 4 GiB of memory to process PDF documents. It has served you well for a while, but as your product gains traction, the demand for processing PDFs increases and you need more computational power. To address this demand, you replace the <strong>C6g.large</strong> instance with a larger one, like a <strong>C6g.xlarge</strong> with 4 vCPUs and 8 GiB of memory. This upgrade effectively adds more "floors" to your digital dwelling, theoretically allowing you to process twice as many PDFs as the previous instance. This practice of replacing a smaller instance with a larger one is termed "Vertical Scaling" in the cloud computing landscape.</p>
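<p>As a minimal sketch of this scale-up (the instance ID below is a placeholder, and this assumes an EBS-backed instance, since the instance type can only be changed while the instance is stopped), the swap can be done with the AWS CLI:</p>
<pre><code class="lang-bash"># Vertical scaling sketch: stop, resize, restart (placeholder instance ID)
INSTANCE_ID="i-0123456789abcdef0"
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID"
# Swap the C6g.large for a C6g.xlarge
aws ec2 modify-instance-attribute --instance-id "$INSTANCE_ID" \
  --instance-type "{\"Value\": \"c6g.xlarge\"}"
aws ec2 start-instances --instance-ids "$INSTANCE_ID"
</code></pre>
<p>Note that the instance is unavailable while stopped, which is one practical drawback of vertical scaling.</p>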
<h2 id="heading-horizontal-scaling">Horizontal Scaling</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755457375547/aff3dea2-f910-4485-a219-e97c8cebe4df.webp" alt class="image--center mx-auto" /></p>
<p>Imagine the same scenario, where your home has become insufficient for your growing needs. However, instead of stacking more floors onto the existing structure, you decide to build or buy an adjacent building. Now relate this expansion strategy to cloud computing, where your C6g.large instance can no longer keep up with the ever-increasing PDF processing workload. Instead of replacing the existing instance with a larger one, you adopt a different strategy: you add another C6g.large instance to your computing arsenal. Each instance operates independently but contributes to the overall processing power. This concept of adding replica instances is known as "Horizontal Scaling" in the cloud computing realm.</p>
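<p>In practice, horizontal scaling is usually automated. As a hedged sketch (the group name, launch template name, and subnet ID are hypothetical), an Auto Scaling group can maintain a fleet of C6g.large replicas and grow it on demand:</p>
<pre><code class="lang-bash"># Horizontal scaling sketch: a fleet of identical C6g.large replicas
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name "pdf-workers" \
  --launch-template "LaunchTemplateName=pdf-worker-template,Version=\$Latest" \
  --min-size 1 --max-size 4 --desired-capacity 2 \
  --vpc-zone-identifier "subnet-0123456789abcdef0"
# Later, add a third replica when PDF traffic spikes
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name "pdf-workers" --desired-capacity 3
</code></pre>
<p>Because each replica is independent, capacity changes here involve no downtime for the instances already running.</p>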
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Vertical and horizontal scaling both play a crucial role in optimizing computational resources. Vertical scaling, with its simplicity in enhancing power without altering workloads, offers a straightforward path to immediate performance improvements. Horizontal scaling, on the other hand, stands out in cloud computing for its remarkable flexibility, allowing resources to be adjusted easily to meet fluctuating demands. However, the most effective solutions often emerge from a hybrid approach: optimizing an instance through vertical scaling to align precisely with the targeted workload, and then employing that tailored size within a horizontal scaling framework. This combination harnesses the strengths of both vertical and horizontal scaling, providing a robust and adaptable infrastructure for a variety of computing needs.</p>
<h2 id="heading-more-from-the-author">More From the Author</h2>
<ul>
<li><p><strong>Dark Truth about AWS Deep Learning AMIs:</strong> <em>The Silent Roadblock to Your Success!</em> Learn How to Break Free at <a target="_blank" href="https://mirzabilal.com/why-your-aws-deep-learning-ami-is-holding-you-back-and-how-to-fix">How Your AWS Deep Learning AMI is Holding You Back</a></p>
</li>
<li><p><strong>Cost-Effective Deep Learning with Custom AMIs and Spot Instances:</strong> <em>Discover Cutting-Edge Insights!</em> Explore the full article: <a target="_blank" href="https://mirzabilal.com/deep-learning-with-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2-56d8457a6f6d">Deep Learning with AWS Graviton2 and Nvidia Tensor T4g</a></p>
</li>
<li><p><strong>How to Compile and Fine-Tune FFmpeg for Optimal Performance:</strong> <em>Tailoring your multimedia toolkit for best performance!</em> Follow the detailed instructions on <a target="_blank" href="https://mirzabilal.com/how-to-install-ffmpeg-on-linux-from-source">Install FFmpeg from Source</a></p>
</li>
<li><p><strong>GPU Acceleration for FFmpeg:</strong> <em>A Step-By-Step Guide with Nvidia GPU on AWS!</em> Check the complete guide at <a target="_blank" href="https://mirzabilal.com/how-to-install-ffmpeg-with-harware-accelaration-on-aws">Enable Hardware Acceleration with FFmpeg</a></p>
</li>
<li><p><strong>CPU vs GPU Benchmark for Video Transcoding on AWS:</strong> <em>Debunking the CPU-GPU myth!</em> See for yourself at <a target="_blank" href="https://mirzabilal.com/cpu-vs-gpu-for-video-transcoding-challenging-the-cost-speed-myth">Challenging the Video Encoding Cost-Speed Myth</a></p>
</li>
<li><p><strong>Crafting the Team for Sustainable Success:</strong> <em>Are "Rockstars" the Ultimate Solution for a Thriving Team?</em> Explore the insights at <a target="_blank" href="https://mirzabilal.com/beyond-rockstars-crafting-the-team-for-sustainable-success">Beyond Rockstars</a></p>
</li>
<li><p><strong>How "Builders" Transform Team's Productivity:</strong> <em>Navigating Vision &amp; Ideas to Reality!</em> Discover more about Builders at <a target="_blank" href="https://mirzabilal.com/bridging-dreams-and-reality-how-builders-transform-teams">Bridging Dreams and Reality</a></p>
</li>
<li><p><strong>Mental Well-being in Tech:</strong> <em>Cultivating a Healthier Workplace for Tech Professionals</em> Explore insights: <a target="_blank" href="https://mirzabilal.com/the-dark-side-of-high-tech-success-addressing-mental-well-being-in-tech">The Dark Side of High-Tech Success!</a></p>
</li>
<li><p><strong>Freelancer to Full-Time:</strong> <em>Understanding Corporate Reluctance!</em> Discover insights at <a target="_blank" href="https://mirzabilal.com/why-businesses-hesitate-to-employ-freelancers-unveiling-the-reasons">Why Businesses Hesitate to Employ Freelancers</a></p>
</li>
<li><p><strong>The Future of Efficient Backend Development:</strong> <em>Efficient Backend Development with Outerbase!</em> Discover the details at <a target="_blank" href="https://mirzabilal.com/building-a-robust-backend-in-just-30-minutes-with-outerbase">Building a Robust Backend with Outerbase</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Why Businesses Hesitate to Employ Freelancers]]></title><description><![CDATA[Hiring the right candidate for an organization can be a daunting task, as finding the perfect fit is never guaranteed despite a thorough hiring process. There is no mathematical equation for assessing a candidate's qualifications. Evaluating a candid...]]></description><link>https://mirzabilal.com/why-businesses-hesitate-to-employ-freelancers-unveiling-the-reasons</link><guid isPermaLink="true">https://mirzabilal.com/why-businesses-hesitate-to-employ-freelancers-unveiling-the-reasons</guid><category><![CDATA[Freelancing]]></category><category><![CDATA[jobs]]></category><category><![CDATA[hiring]]></category><category><![CDATA[organization]]></category><category><![CDATA[workfromhome]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Wed, 08 Nov 2023 10:03:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755443907672/cf6446fb-d624-4bdb-8716-de5d115f7b33.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hiring the right candidate for an organization can be a daunting task, as finding the perfect fit is never guaranteed despite a thorough hiring process. There is no mathematical equation for assessing a candidate's qualifications. Evaluating a candidate primarily hinges on two factors: "<strong>Technical Ability</strong>" and "<strong>Cultural Fit</strong>". Technical skills are easier to measure than cultural fit, especially within the few hours of the interview process.</p>
<h2 id="heading-the-guiding-principles"><strong>The Guiding Principles</strong></h2>
<p>Despite the effort invested, there's no guarantee that the selected candidate is technically equipped and a perfect fit for the team, so every hire involves a leap of faith. To mitigate hiring risks for full-time positions, we have formalized several rules. One significant rule emphasizes:</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Prefer candidates with full-time experience over long-term freelancers.</div>
</div>

<h2 id="heading-freelancers-and-challenges"><strong>Freelancers and Challenges</strong></h2>
<p>Drawing from our experience working with long-term freelancers, we have identified some common quirks among many individuals. If you're a freelancer considering a shift to full-time employment, we've pinpointed traits that should be addressed for a successful transition.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1699437190256/b6750e99-e646-46cf-a33a-d511b203e0d8.png" alt="Source: https://imgur.com/orEha77" class="image--center mx-auto" /></p>
<ol>
<li><h3 id="heading-approach-to-problem-solving">Approach to Problem Solving 🧩</h3>
<p> Freelancers often approach problem-solving differently from full-time employees. Through their extensive experience in various freelance projects, they might start embracing the "<strong>If It's Working, It's Done"</strong> mindset. This approach prioritizes immediate functionality and might disregard the best practices for task completion, potentially impacting the project's overall health, team coordination, and successful goal achievement.</p>
</li>
<li><h3 id="heading-collaboration-struggles">Collaboration Struggles 🖇️</h3>
<p> Freelancers' prolonged solo work erodes their collaboration skills, making it challenging for them to coordinate within their own team and with other teams. This lack of team experience often results in difficulties when attempting to synchronize efforts across the organization.</p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1699428096265/e9f75d9a-5319-47af-be88-dadfffb4b38a.png" alt="Source: https://cheezburger.com/16245253/cursed-and-flawed-stairs-that-are-up-to-something" class="image--center mx-auto" /></p>
<h3 id="heading-no-strings-attached-mindset">No Strings Attached Mindset 🔄</h3>
<p> Freelancers often cultivate a habit of transitioning from one project to another, nurturing a constant "<strong>Urge for Change</strong>" within themselves. Organizations looking for stability invest in candidates with the anticipation of a more enduring commitment, and are sometimes caught off guard by departures within a short span of two to three months.</p>
</li>
<li><h3 id="heading-short-sightedness">Short-Sightedness ⚙️</h3>
<p> The idea of "Be your own man" and doing everything on your own may seem powerful, but it's even more important to recognize the significance of being part of something bigger and creating things that fit flawlessly with others. The nature of freelancing sometimes demands a shorter-term focus on immediate tasks or projects, obscuring the fact that "<strong>your work will be a cog in the machine instead of the machine itself</strong>". Companies prefer candidates who take a more holistic, long-term view and contribute to the broader vision and goals of the company rather than focusing solely on short-term tasks. Forward-thinkers who foresee future needs are the architects behind groundbreaking products.</p>
</li>
<li><h3 id="heading-the-performance-metrics">The Performance Metrics 📈</h3>
<p> High-performance applications significantly influence user satisfaction, productivity, and competitive advantage in today's market. A freelancer's "<strong>quantity over quality</strong>" approach of delivering tasks quickly, with minimal effort and time, often leads to subpar, unoptimized output. Such modules adversely affect product performance, leading to customer dissatisfaction, impeded growth, and poor customer retention.</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<blockquote>
<p>"<em>Pleasure in the job puts perfection in the work.</em>"</p>
<p>-- Aristotle</p>
</blockquote>
<p>The purpose of this discussion is not to devalue the role of freelancers; rather, it is to acknowledge their significant contributions to the thriving tech industry. While freelancers bring a great deal to the table, their alignment with organizational objectives often poses challenges. Recognizing these strengths and understanding the potential discord between freelancing and long-term goals is essential for refining hiring strategies and ensuring an ideal match for both employees and organizations. Freelancers, too, should introspect on the common issues discussed here and address them, whether they pursue a full-time job or continue their freelancing career.</p>
<hr />
<h2 id="heading-more-from-the-author">More From the Author</h2>
<ul>
<li><p><strong>Dark Truth about AWS Deep Learning AMIs:</strong> <em>The Silent Roadblock to Your Success!</em> Learn How to Break Free at <a target="_blank" href="https://mirzabilal.com/why-your-aws-deep-learning-ami-is-holding-you-back-and-how-to-fix">How Your AWS Deep Learning AMI is Holding You Back</a></p>
</li>
<li><p><strong>Cost-Effective Deep Learning with Custom AMIs and Spot Instances:</strong> <em>Discover Cutting-Edge Insights!</em> Explore the full article: <a target="_blank" href="https://mirzabilal.com/deep-learning-with-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2-56d8457a6f6d">Deep Learning with AWS Graviton2 and Nvidia Tensor T4g</a></p>
</li>
<li><p><strong>How to Compile and Fine-Tune FFmpeg for Optimal Performance:</strong> <em>Tailoring your multimedia toolkit for best performance!</em> Follow the detailed instructions on <a target="_blank" href="https://mirzabilal.com/how-to-install-ffmpeg-on-linux-from-source">Install FFmpeg from Source</a></p>
</li>
<li><p><strong>GPU Acceleration for FFmpeg:</strong> <em>A Step-By-Step Guide with Nvidia GPU on AWS!</em> Check the complete guide at <a target="_blank" href="https://mirzabilal.com/how-to-install-ffmpeg-with-harware-accelaration-on-aws">Enable Hardware Acceleration with FFmpeg</a></p>
</li>
<li><p><strong>CPU vs GPU Benchmark for Video Transcoding on AWS:</strong> <em>Debunking the CPU-GPU myth!</em> See for yourself at <a target="_blank" href="https://mirzabilal.com/cpu-vs-gpu-for-video-transcoding-challenging-the-cost-speed-myth">Challenging the Video Encoding Cost-Speed Myth</a></p>
</li>
<li><p><strong>Crafting the Team for Sustainable Success:</strong> <em>Are "Rockstars" the Ultimate Solution for a Thriving Team?</em> Explore the insights at <a target="_blank" href="https://mirzabilal.com/beyond-rockstars-crafting-the-team-for-sustainable-success">Beyond Rockstars</a></p>
</li>
<li><p><strong>How "Builders" Transform Team's Productivity:</strong> <em>Navigating Vision &amp; Ideas to Reality!</em> Discover more about Builders at <a target="_blank" href="https://mirzabilal.com/bridging-dreams-and-reality-how-builders-transform-teams">Bridging Dreams and Reality</a></p>
</li>
<li><p><strong>Mental Well-being in Tech:</strong> <em>Cultivating a Healthier Workplace for Tech Professionals</em> Explore insights: <a target="_blank" href="https://mirzabilal.com/the-dark-side-of-high-tech-success-addressing-mental-well-being-in-tech">The Dark Side of High-Tech Success!</a></p>
</li>
<li><p><strong>The Future of Efficient Backend Development:</strong> <em>Efficient Backend Development with Outerbase!</em> Discover the details at <a target="_blank" href="https://mirzabilal.com/building-a-robust-backend-in-just-30-minutes-with-outerbase">Building a Robust Backend with Outerbase</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[How To Enable Hardware Acceleration on Chrome, Chromium & Puppeteer on AWS in Headless mode]]></title><description><![CDATA[Running Google Chrome with hardware acceleration in headless mode can be more challenging than it appears. We embarked on this journey with Remotion, which is an excellent framework that enables developers to "Make Videos Programmatically". On our wa...]]></description><link>https://mirzabilal.com/how-to-enable-hardware-acceleration-on-chrome-chromium-puppeteer-on-aws-in-headless-mode</link><guid isPermaLink="true">https://mirzabilal.com/how-to-enable-hardware-acceleration-on-chrome-chromium-puppeteer-on-aws-in-headless-mode</guid><category><![CDATA[google chrome browser]]></category><category><![CDATA[AWS]]></category><category><![CDATA[headless]]></category><category><![CDATA[Hardware Acceleration]]></category><category><![CDATA[GPU Acceleration]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Wed, 25 Oct 2023 11:50:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755446195213/6d3b8924-ed04-405c-8660-30dbaa7f0ecd.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Running <a target="_blank" href="https://developer.chrome.com/blog/headless-chrome/">Google Chrome</a> with hardware acceleration in headless mode can be more challenging than it appears. We embarked on this journey with <a target="_blank" href="https://remotion.dev">Remotion</a>, which is an excellent framework that enables developers to "Make Videos Programmatically". Along the way, we explored various Nvidia driver versions, including those from Nvidia's website and Ubuntu's official repositories.</p>
<p>We collaborated with the vibrant <a target="_blank" href="https://remotion.dev">Remotion</a> open-source community to find the answer for using the <a target="_blank" href="https://github.com/remotion-dev/remotion/issues/2889">GPU with Remotion for server-side rendering</a>. After encountering some setbacks along the way, we eventually made it work for Remotion. We consolidated our findings in the Remotion docs at <a target="_blank" href="https://remotion-git-gpu-on-ec2-remotion.vercel.app/docs/miscellaneous/cloud-gpu">Using the GPU in the Cloud</a> and simplified the instructions so Remotion developers can make the most of it. Although the research was done for <a target="_blank" href="https://remotion.dev">Remotion</a>, the same process will work for any application that requires GPU power, especially in headless mode.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755446212826/86563e27-1273-44c0-84cf-bb3ded479f6e.webp" alt class="image--center mx-auto" /></p>
<h3 id="heading-hardware-acceleration-in-browser-rendering"><strong>Hardware Acceleration in Browser Rendering</strong></h3>
<p>Hardware acceleration is to your browser what a "<strong>nitrous boost</strong>" is to a gasoline car. It uses the power of specialized hardware like GPUs to make your web pages load faster and videos play smoothly. Hardware acceleration speeds up the stage where the web page is drawn and displayed to you, known as rendering. This results in a smoother and snappier browsing experience.</p>
<h3 id="heading-dlami-from-marketplace">DLAMI from Marketplace?</h3>
<p>One may wonder why we didn't opt for the AWS Deep Learning AMI (DLAMI). We needed fine-grained control over the GPU drivers and utilities, and the outdated DLAMIs, with all their bloatware, were not the answer. Such control is often hard to achieve with pre-packaged AMIs, as discussed in "<a target="_blank" href="https://mirzabilal.com/why-your-aws-deep-learning-ami-is-holding-you-back-and-how-to-fix">Why Deep Learning AMI is Holding You Back</a>". Tailoring the environment to our exact requirements also helped us troubleshoot issues effectively.</p>
<p>We experimented with multiple packages to identify the key ones needed for Google Chrome to recognize the GPU. Despite these efforts, GPU recognition remained elusive, and in some instances the GPU drivers crashed even at initialization. The long and arduous journey had brought us to the end of the tunnel, yet the darkness persisted and the solution remained as elusive as ever. As we were about to call it a day, we stumbled upon <a target="_blank" href="https://github.com/puppeteer/puppeteer/issues/3637">this thread</a> in <a target="_blank" href="https://pptr.dev">Puppeteer</a>'s GitHub issues, which did not give us any hope, but did give us a new perspective and the drive to achieve our goal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696893635495/449420b3-29fb-45e1-830b-1d88ee7270f4.png" alt="Developers are facing similar issues with hardware acceleration in headless mode" class="image--center mx-auto" /></p>
<h2 id="heading-the-fresh-start"><strong>The Fresh Start</strong></h2>
<p>It was time to return to the drawing board and start afresh. We began with a clean base Ubuntu AMI on an AWS EC2 instance. After that, we downloaded and installed the Nvidia Tesla T4G drivers from Nvidia's official website.</p>
<p>Once we confirmed that the GPU was working properly using <code>nvidia-smi</code>, we installed Google Chrome and tested it for hardware acceleration. To our relief, Google Chrome finally gave a nod to hardware acceleration.</p>
<h2 id="heading-launching-an-ec2-instance">Launching an EC2 Instance</h2>
<p>Launching an EC2 instance can be achieved in various ways, with the AWS Web Console and AWS CLI being two notable options. Regardless of the method, the important parameters are the instance type and the AMI. At the time of writing, the Ubuntu AMI ID we used with our chosen <code>g4dn.xlarge</code> instance is <code>ami-053b0d53c279acc90</code>. To launch the EC2 instance via <code>aws-cli</code>, the following command can be used after replacing the placeholders with actual values.</p>
<pre><code class="lang-bash">aws ec2 run-instances \
  --instance-initiated-shutdown-behavior <span class="hljs-string">"terminate"</span> \
  --image-id <span class="hljs-string">"ami-053b0d53c279acc90"</span> \
  --instance-type <span class="hljs-string">"g4dn.xlarge"</span> \
  --key-name <span class="hljs-string">"&lt;KEY_NAME&gt;"</span> \
  --subnet-id <span class="hljs-string">"&lt;SUBNET_ID&gt;"</span> \
  --security-group-ids <span class="hljs-string">"&lt;SECURITY_GROUP_ID&gt;"</span> \
  --iam-instance-profile Arn=<span class="hljs-string">"&lt;ARN&gt;"</span> \
  --block-device-mappings <span class="hljs-string">'[
  {
    "DeviceName": "/dev/sda1",
    "Ebs": {
      "VolumeSize": 8,
      "VolumeType": "gp3"
    }
  }
]'</span>
</code></pre>
<h2 id="heading-ubuntu-upgrade">Ubuntu Upgrade</h2>
<p>The Ubuntu 22.04 AMI <code>ami-053b0d53c279acc90</code> on AWS comes with Linux Kernel <code>v5</code> (specifically <code>5.19.0-1025-aws</code>). At this point, we have two options: stick with Kernel <code>v5</code> or upgrade to <code>v6</code> (<code>v6.2.0-1013-aws</code>). We decided to upgrade. This decision is crucial, as an Nvidia driver compiled for one kernel version will not work with the other.</p>
<p>After launching the instance, SSH into it and initiate the Ubuntu package upgrade process. To update all installed packages, including the Linux kernel, execute the following one-liner.</p>
<pre><code class="lang-bash">sudo bash -c <span class="hljs-string">"apt update &amp;&amp; export DEBIAN_FRONTEND=noninteractive &amp;&amp; export NEEDRESTART_MODE=a &amp;&amp; apt upgrade -y &amp;&amp; reboot"</span>
</code></pre>
<p>We could also run <code>apt update</code>, <code>apt upgrade -y</code>, and <code>reboot</code> as separate steps, but chaining them like this installs the updates in non-interactive mode.</p>
<p>After execution, the system will reboot with Linux kernel version 6, which we can confirm with <code>uname -r</code>. Rebooting is essential when upgrading the system because the new kernel is loaded on the next boot, and the Nvidia drivers will subsequently be built for the updated version.</p>
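<p>The kernel check can also be scripted as a small guard before moving on to the driver installation. This is a sketch; the expected major version of 6 is an assumption based on the upgrade above.</p>
<pre><code class="lang-bash"># Confirm which kernel the system actually booted into
running="$(uname -r)"
echo "Running kernel: $running"
# The major version should be 6 after the upgrade above
major="${running%%.*}"
if [ "$major" -lt 6 ]; then
  echo "Warning: still on kernel v$major; the Nvidia driver will be built for this version"
fi
</code></pre>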
<h2 id="heading-nvidia-gpu-driver-installation"><strong>Nvidia GPU Driver Installation</strong></h2>
<p>The process of installing GPU drivers is straightforward once we have the correct driver for the GPU and a compatible version. Before starting the installation, we need to install two packages: <code>build-essential</code> and <code>libvulkan1</code>. The former bundles the compilation tools required to compile the Nvidia driver. The latter, while not mandatory, is required by Google Chrome, so it's a good idea to install it beforehand to enable support for the Vulkan ICD loader for Nvidia, should we require it in the future.</p>
<pre><code class="lang-bash">sudo apt install -y build-essential libvulkan1
</code></pre>
<p>After build tools are installed, we can proceed to download and install the GPU drivers using the following commands:</p>
<pre><code class="lang-bash">DRIVER_URL=<span class="hljs-string">"https://us.download.nvidia.com/tesla/535.104.12/NVIDIA-Linux-x86_64-535.104.12.run"</span>
DRIVER_NAME=<span class="hljs-string">"NVIDIA-Linux-driver.run"</span>
wget -O <span class="hljs-string">"<span class="hljs-variable">$DRIVER_NAME</span>"</span> <span class="hljs-string">"<span class="hljs-variable">$DRIVER_URL</span>"</span>
sudo sh <span class="hljs-string">"<span class="hljs-variable">$DRIVER_NAME</span>"</span> --disable-nouveau --silent
rm <span class="hljs-string">"<span class="hljs-variable">$DRIVER_NAME</span>"</span>
</code></pre>
<p>Now we can run <code>nvidia-smi</code> to confirm the GPU driver installation. The output will look similar to this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696897364039/4b71b8ea-8b10-4dcc-805b-0416203db533.png" alt="nvidia-smi output" class="image--center mx-auto" /></p>
<h3 id="heading-configuring-the-startup-service"><strong>Configuring the Startup Service</strong></h3>
<p>We learned this the hard way: running <code>nvidia-smi</code> once is needed for the proper initialization of <code>EGL</code> and <code>ANGLE</code>; Google Chrome and Chromium fail to initialize <code>EGL</code> without this preliminary step. To automate it, we will create a service that runs <code>nvidia-smi</code> at boot time using the following commands:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">'[Unit]
Description=Run nvidia-smi at system startup

[Service]
ExecStart=/usr/bin/nvidia-smi
Type=oneshot
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target'</span> | sudo tee /etc/systemd/system/nvidia-smi.service
sudo systemctl <span class="hljs-built_in">enable</span> nvidia-smi.service
sudo systemctl start nvidia-smi.service
</code></pre>
<h2 id="heading-testing-the-installation">Testing the Installation</h2>
<p>To test hardware acceleration with Google Chrome or Chromium, we can follow these steps. It's worth noting that Google Chrome's package pulls in more of the required dependencies, making it the easier choice if we plan to use the Node.js library Puppeteer.</p>
<p>On the other hand, Chromium is a lightweight option with a smaller footprint of dependencies, but we can still use Puppeteer with Chromium by installing a few extra dependencies.</p>
<h3 id="heading-chromium">Chromium</h3>
<ul>
<li><h4 id="heading-installation">Installation:</h4>
<p>  Installing Chromium is straightforward, as it is readily available in Ubuntu's official software repositories. We can install it using <code>apt</code> with the following command.</p>
</li>
</ul>
<pre><code class="lang-bash">sudo apt install -y chromium-browser
</code></pre>
<ul>
<li><h4 id="heading-testing">Testing:</h4>
<p>  Chromium is fully prepared to handle hardware acceleration tasks. We can verify its readiness by launching Chromium with the following parameters.</p>
</li>
</ul>
<pre><code class="lang-bash">chromium-browser --headless --use-gl=angle \
   --use-angle=gl-egl --use-cmd-decoder=passthrough \
   --print-to-pdf=output.pdf <span class="hljs-string">'chrome://gpu'</span>
</code></pre>
<p>To use Puppeteer with Chromium instead of Google Chrome, there are a few extra dependencies that should be installed.</p>
<pre><code class="lang-bash">sudo apt install ca-certificates fonts-liberation \
    libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 \
    libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 \
    libgbm1 libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 \
    libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 \
    libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 \
    libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 \
    libxtst6 lsb-release wget xdg-utils
</code></pre>
<h3 id="heading-google-chrome">Google Chrome</h3>
<ul>
<li><h4 id="heading-installation-1">Installation:</h4>
<p>  Google Chrome can be installed in several ways, but we will use Google's official repository for Debian. We need to add it to Ubuntu's repository list along with its public signing key.</p>
</li>
</ul>
<pre><code class="lang-bash">curl -fsSL https://dl.google.com/linux/linux_signing_key.pub | sudo gpg --dearmor -o /usr/share/keyrings/googlechrom-keyring.gpg
<span class="hljs-built_in">echo</span> <span class="hljs-string">"deb [arch=amd64 signed-by=/usr/share/keyrings/googlechrom-keyring.gpg] http://dl.google.com/linux/chrome/deb/ stable main"</span> | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt update
sudo apt install -y google-chrome-stable
</code></pre>
<ul>
<li><h4 id="heading-testing-1">Testing:</h4>
<p>  Google Chrome is ready for our hardware acceleration tasks. We can test it by launching Google Chrome with the following parameters.</p>
</li>
</ul>
<pre><code class="lang-bash">google-chrome-stable --headless --use-gl=angle \
    --use-angle=gl-egl --use-cmd-decoder=passthrough \
    --print-to-pdf=output.pdf <span class="hljs-string">'chrome://gpu'</span>
</code></pre>
<p>The above commands will generate <code>output.pdf</code>. Transfer it to your local machine to check the status of hardware acceleration in Google Chrome or Chromium. If the process went smoothly, the resulting <code>PDF</code> will look like this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696894145838/7a102263-bb0c-4703-a694-98d74172cc1a.png" alt="Chrome GPU status screenshot." class="image--center mx-auto" /></p>
<h2 id="heading-working-with-puppeteer">Working with Puppeteer</h2>
<p>To verify GPU acceleration using the Node.js library Puppeteer, follow these steps:</p>
<ul>
<li>Install Node.js, as this clean Ubuntu AMI does not include any unnecessary packages. Use the following commands:</li>
</ul>
<pre><code class="lang-bash">curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
NODE_MAJOR=18
<span class="hljs-built_in">echo</span> <span class="hljs-string">"deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_<span class="hljs-variable">$NODE_MAJOR</span>.x nodistro main"</span> | sudo tee /etc/apt/sources.list.d/nodesource.list
sudo apt update
sudo apt install -y nodejs
</code></pre>
<ul>
<li>After installing Node.js, we can install Puppeteer by running <code>npm i puppeteer</code> in our code directory. Here's a sample <code>index.js</code> file for checking hardware acceleration:</li>
</ul>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> puppeteer = <span class="hljs-built_in">require</span>(<span class="hljs-string">'puppeteer'</span>);
(<span class="hljs-keyword">async</span> () =&gt; {
  <span class="hljs-keyword">const</span> browser = <span class="hljs-keyword">await</span> puppeteer.launch({
    <span class="hljs-attr">headless</span>: <span class="hljs-literal">true</span>, 
    <span class="hljs-attr">args</span>: [<span class="hljs-string">'--use-gl=angle'</span>, <span class="hljs-string">'--use-angle=gl-egl'</span>], 
  });
  <span class="hljs-keyword">const</span> page = <span class="hljs-keyword">await</span> browser.newPage();
  <span class="hljs-keyword">await</span> page.goto(<span class="hljs-string">'chrome://gpu'</span>);
  <span class="hljs-keyword">await</span> page.waitForTimeout(<span class="hljs-number">2000</span>);
  <span class="hljs-keyword">await</span> page.screenshot({ <span class="hljs-attr">path</span>: <span class="hljs-string">'output.png'</span> });
  <span class="hljs-keyword">await</span> browser.close();
})();
</code></pre>
<ul>
<li>Run the script using <code>node index.js</code>. This will generate an <code>output.png</code> file containing hardware acceleration information.</li>
</ul>
<h2 id="heading-note-from-the-creator">Note from the Creator</h2>
<blockquote>
<p><a target="_blank" href="https://mirzabilal.com">Mirza</a> has been immensely helpful in helping the “<a target="_blank" href="https://remotion.dev">Remotion</a>” community adopt GPU-accelerated rendering.<br />He's researched and written about how to obtain, configure and run EC2 containers and headless Chrome correctly in order to take advantage of graphics acceleration.<br />This area has been especially hard to crack, and without Mirza we would not have been able to unlock huge speed boosts.</p>
<p>We are super grateful for that!</p>
<p>--<br /><a target="_blank" href="https://jonny.io">Jonny Burger</a><br />Creator / <a target="_blank" href="http://Remotion.dev">Remotion.dev</a></p>
</blockquote>
<h2 id="heading-conclusion">Conclusion</h2>
<p>With hardware acceleration now readily available in headless mode, we can harness the power of GPUs for faster and more complex rendering tasks. Our next steps involve creating a custom AMI from this instance and streamlining the process with AWS Image Builder pipelines. Additionally, we plan to extend these hardware capabilities to Docker containers, further broadening our options.</p>
<h2 id="heading-resources">Resources</h2>
<ul>
<li><p><strong>Remotion - Make videos programmatically:</strong></p>
<p>  <a target="_blank" href="https://remotion.dev">https://remotion.dev</a></p>
</li>
<li><p><strong>Remotion - Using the GPU in the cloud:</strong></p>
<p>  <a target="_blank" href="https://www.remotion.dev/docs/miscellaneous/cloud-gpu">https://www.remotion.dev/docs/miscellaneous/cloud-gpu</a></p>
</li>
<li><p><strong>Cannot enable GPU acceleration:</strong></p>
<p>  <a target="_blank" href="https://github.com/puppeteer/puppeteer/issues/3637"><em>https://github.com/puppeteer/puppeteer/issues/3637</em></a></p>
</li>
<li><p><strong>Recommend a workflow for using the GPU on a AWS instance</strong>:</p>
<p>  <a target="_blank" href="https://github.com/remotion-dev/remotion/issues/2889">https://github.com/remotion-dev/remotion/issues/2889</a></p>
</li>
<li><p><strong>Getting Started with Headless Chrome:</strong></p>
<p>  <a target="_blank" href="https://developer.chrome.com/blog/headless-chrome/"><em>https://developer.chrome.com/blog/headless-chrome/</em></a></p>
</li>
<li><p><strong>Why Your AWS Deep Learning AMI is Holding You Back and How to Fix:</strong></p>
<p>  <a target="_blank" href="https://mirzabilal.com/why-your-aws-deep-learning-ami-is-holding-you-back-and-how-to-fix"><em>https://mirzabilal.com/why-your-aws-deep-learning-ami-is-holding-you-back-and-how-to-fix</em></a></p>
</li>
<li><p><strong>Deep Learning with “AWS Graviton2 + NVIDIA Tensor T4G” for as low as free* with CUDA 12.2:</strong></p>
<p>  <a target="_blank" href="https://mirzabilal.com/deep-learning-with-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2-56d8457a6f6d"><em>https://mirzabilal.com/deep-learning-with-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2-56d8457a6f6d</em></a></p>
</li>
<li><p><strong>How to install FFmpeg with Hardware Acceleration on AWS:</strong></p>
<p>  <a target="_blank" href="https://mirzabilal.com/how-to-install-ffmpeg-with-harware-accelaration-on-aws"><em>https://mirzabilal.com/how-to-install-ffmpeg-with-harware-accelaration-on-aws</em></a></p>
</li>
<li><p><strong>CPU vs GPU for Video Transcoding - Challenging the Cost-Speed Myth:</strong></p>
<p>  <a target="_blank" href="https://mirzabilal.com/cpu-vs-gpu-for-video-transcoding-challenging-the-cost-speed-myth">https://mirzabilal.com/cpu-vs-gpu-for-video-transcoding-challenging-the-cost-speed-myth</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[CPU vs GPU for Video Transcoding: Challenging the Cost-Speed Myth]]></title><description><![CDATA[Introduction
When evaluating computational power, especially in terms of CPUs and GPUs, there’s a prevailing narrative. A general belief is, that CPUs may take longer to process, but they're cost-effective, whereas GPUs might be faster but operate at...]]></description><link>https://mirzabilal.com/cpu-vs-gpu-for-video-transcoding-challenging-the-cost-speed-myth</link><guid isPermaLink="true">https://mirzabilal.com/cpu-vs-gpu-for-video-transcoding-challenging-the-cost-speed-myth</guid><category><![CDATA[FFmpeg]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Linux]]></category><category><![CDATA[Benchmark]]></category><category><![CDATA[Cloud]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Wed, 11 Oct 2023 19:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755457757753/095c7458-c973-4fbb-a7f0-7ab0cf336a91.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>When evaluating computational power, especially in terms of CPUs and GPUs, there’s a prevailing narrative: CPUs may take longer to process but are cost-effective, whereas GPUs are faster but operate at a higher cost.</p>
<p><strong>How true is this widely accepted notion?</strong></p>
<p>To challenge this belief, we conducted a tangible, real-world assessment using AWS instances and FFmpeg for video transcoding benchmarks. We sought to determine the most cost- and time-efficient option for transcoding video and audio, and thereby save on our AWS bills.</p>
<h3 id="heading-the-selection-of-instances"><strong>The Selection of Instances</strong></h3>
<p>In our tests, we compared various AWS instance types, covering both GPUs and CPUs across Intel and AWS's silicon-based Graviton2 instances. On the GPU side, we picked instances featuring the Nvidia Tesla T4 and T4G. For CPUs, we looked at three instances: two of the same generation and size, the Intel-based <code>c7i.2xlarge</code> and the Graviton2-powered <code>c7g.2xlarge</code>, plus a third, <code>c6g.4xlarge</code>, chosen to assess the impact of more vCPUs on transcoding.</p>
<ol>
<li><strong>GPU Instances:</strong></li>
</ol>
<ul>
<li><p><code>g4dn.xlarge</code></p>
</li>
<li><p><code>g5g.xlarge</code></p>
</li>
</ul>
<ol start="2">
<li><strong>CPU Instances:</strong></li>
</ol>
<ul>
<li><p><code>c7g.2xlarge</code></p>
</li>
<li><p><code>c7i.2xlarge</code></p>
</li>
<li><p><code>c6g.4xlarge</code></p>
</li>
</ul>
<p>We made thoughtful selections for each of these instances, aiming for similar hourly costs to ensure a fair cost-to-performance comparison. Additionally, to explore the performance implications of doubling the vCPU count, we extended our benchmarks to the <code>c6g.4xlarge</code> instance.</p>
<h3 id="heading-the-process"><strong>The Process</strong></h3>
<p>Before analyzing the results, it's important to discuss and understand the types of tests that were conducted.</p>
<ol>
<li><p><strong>Downscale to 480p:</strong></p>
<p> Downscaling is the process of squeezing a video below its original resolution. It's useful for platforms or devices that cannot handle high-resolution video, or when smaller file sizes are needed. For this test, we downscale the input video to 480p (720 pixels × 480 pixels, the resolution used in our benchmark script).</p>
</li>
<li><p><strong>Resample at 720p:</strong></p>
<p> Resampling does not change the video's resolution but may alter the underlying pixel values. It can be beneficial for modifying encoding settings or applying specific filters. In this case, we resampled the video at its original resolution of 720p (1280 pixels × 720 pixels).</p>
</li>
<li><p><strong>Upscale to 1080p:</strong></p>
<p> Upscaling is the opposite of downscaling: it converts the video to a higher resolution. Upscaling ahead of time generally produces better results than stretching a smaller video at playback. In this test, we upscale the video to 1080p (1920 pixels × 1080 pixels).</p>
</li>
<li><p><strong>No Scaling:</strong></p>
<p> All the above tests used FFmpeg's scale filter; for this test, we provided no scaling filter and simply re-encoded the video.</p>
</li>
</ol>
<h3 id="heading-benchmark-details"><strong>Benchmark Details</strong></h3>
<p>To ensure objectivity, we used the same video file for every benchmark. The input video details are as follows:</p>
<ul>
<li><p><strong>Container Format:</strong> <code>mp4</code></p>
</li>
<li><p><strong>Duration:</strong> <code>01:34:40.38</code></p>
</li>
<li><p><strong>Bitrate:</strong> <code>1579 kb/s</code></p>
</li>
<li><p><strong>Video Codec:</strong> <code>h264 (High)</code></p>
</li>
<li><p><strong>Video Resolution:</strong> <code>1280x720</code> @ <code>23.98 fps</code></p>
</li>
<li><p><strong>Audio:</strong> <code>aac (LC)</code>, Sample Rate: <code>48000 Hz, 5.1</code></p>
</li>
</ul>
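<p>As a quick sanity check, the overall bitrate and duration above imply the approximate size of the input file. The short JavaScript sketch below is illustrative only: it is plain arithmetic, and a real container adds some overhead.</p>

```javascript
// Approximate file size implied by a container's overall bitrate and duration.
// bitrateKbps is the overall (audio + video + container) bitrate in kilobits/second.
function expectedSizeBytes(durationSec, bitrateKbps) {
  return (durationSec * bitrateKbps * 1000) / 8;
}

// 01:34:40.38 expressed in seconds
const durationSec = 1 * 3600 + 34 * 60 + 40.38; // 5680.38 s
const bytes = expectedSizeBytes(durationSec, 1579);
console.log((bytes / 1024 ** 3).toFixed(2) + ' GiB'); // ≈ 1.04 GiB
```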
<p>We utilized <a target="_blank" href="https://ffmpeg.org">FFmpeg</a>, a leading open-source tool for multimedia processing, to devise our benchmark script. The script supports both CPU and GPU-powered machines: it first checks whether a GPU is available, then executes the appropriate commands for video processing.</p>
<p>To execute our transcoding tasks, we used the following benchmark script:</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>

<span class="hljs-keyword">if</span> lspci | grep -i <span class="hljs-string">"NVIDIA Corporation"</span> &gt;/dev/null; <span class="hljs-keyword">then</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"System has a GPU"</span>
commands=(
    <span class="hljs-string">'ffmpeg -y -hide_banner -hwaccel cuda -hwaccel_output_format cuda -i input.mp4  -vf "scale_cuda=720:480"   -c:a copy -c:v h264_nvenc output.mp4 -benchmark'</span>
    <span class="hljs-string">'ffmpeg -y -hide_banner -hwaccel cuda -hwaccel_output_format cuda -i input.mp4  -vf "scale_cuda=1280:720"  -c:a copy -c:v h264_nvenc output.mp4 -benchmark'</span>
    <span class="hljs-string">'ffmpeg -y -hide_banner -hwaccel cuda -hwaccel_output_format cuda -i input.mp4  -vf "scale_cuda=1920:1080" -c:a copy -c:v h264_nvenc output.mp4 -benchmark'</span>
    <span class="hljs-string">'ffmpeg -y -hide_banner -hwaccel cuda -hwaccel_output_format cuda -i input.mp4                             -c:a copy -c:v h264_nvenc output.mp4 -benchmark'</span>
)
<span class="hljs-keyword">else</span> 
commands=(
    <span class="hljs-string">'ffmpeg -y -hide_banner -i input.mp4 -vf "scale=720:480"   -c:a copy -c:v libx264 output.mp4 -benchmark'</span>
    <span class="hljs-string">'ffmpeg -y -hide_banner -i input.mp4 -vf "scale=1280:720"  -c:a copy -c:v libx264 output.mp4 -benchmark'</span>
    <span class="hljs-string">'ffmpeg -y -hide_banner -i input.mp4 -vf "scale=1920:1080" -c:a copy -c:v libx264 output.mp4 -benchmark'</span>
    <span class="hljs-string">'ffmpeg -y -hide_banner -i input.mp4                       -c:a copy -c:v libx264 output.mp4 -benchmark'</span>
)
<span class="hljs-keyword">fi</span>

<span class="hljs-keyword">for</span> cmd <span class="hljs-keyword">in</span> <span class="hljs-string">"<span class="hljs-variable">${commands[@]}</span>"</span>; <span class="hljs-keyword">do</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"--------------------------------------------------------------------------------"</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"Executing: <span class="hljs-variable">$cmd</span>"</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"--------------------------------------------------------------------------------"</span>
    <span class="hljs-comment"># Use the time command to measure how long it takes to run the command</span>
    { time <span class="hljs-built_in">eval</span> <span class="hljs-string">"<span class="hljs-variable">$cmd</span>"</span>; } 2&gt;&amp;1
    rm output.mp4
<span class="hljs-keyword">done</span> | tee output_results.txt
</code></pre>
<h3 id="heading-the-findings"><strong>The Findings</strong></h3>
<p>The data from our AWS benchmarks, after executing 20 different tests across five AWS instances, painted a compelling narrative. The results showed clear differences in cost-efficiency and performance among these instances. The data extracted and processed from the benchmark results is summarized below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755457950669/815feec1-285a-4ab1-9889-cf30d5795d6a.webp" alt class="image--center mx-auto" /></p>
<p>The GPU instances, notably the AWS Graviton2-based <code>g5g.xlarge</code>, were not only faster but also more cost-effective across the various transcoding operations, compared to CPU-centric instances like <code>c7g.2xlarge</code>. The introduction of the <code>c6g.4xlarge</code>, with its doubled vCPUs, provided insight into how added computational power influences performance and cost. Interestingly, even with the extra CPUs, and despite being more expensive than the GPU-powered instances, it performed significantly worse; the GPU instances continued to offer a better balance of speed and cost. It is also worth noting that FFmpeg was built to run on multiple cores and was utilizing all of them, as the following <code>htop</code> screenshot taken during a transcoding task shows.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755458157959/0eb41fe7-08fd-4980-9b1d-2be3f3d04102.webp" alt class="image--center mx-auto" /></p>
<p>Let's visualize the benchmark results to compare the time taken and the cost of running on the different EC2 instances.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755458185291/2f185020-1788-4dd6-93c1-5a4121293000.webp" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755458222001/2376bbf5-e4a7-41cf-9023-70fba2aea4b4.webp" alt class="image--center mx-auto" /></p>
<h2 id="heading-the-winner">The Winner?</h2>
<p>From the previous charts, it's evident that the AWS Graviton2-based <code>g5g.xlarge</code> emerges as the most efficient choice. Not only does it excel in performance, it is also the most cost-effective. To further illustrate its cost advantage, let's juxtapose it with the other AWS instances to see just how economical it truly is.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755458244155/6a95641a-0207-400c-9d52-41a9a90da6ae.webp" alt class="image--center mx-auto" /></p>
<p>The bar chart offers a vivid picture of how the <code>g5g.xlarge</code> stacks up against the other AWS EC2 instances in terms of cost. When downscaling to 480p, the <code>g5g.xlarge</code> is significantly more cost-effective: the <code>c6g.4xlarge</code>, the most expensive instance across the transcoding tasks, is a whopping 370.9% more expensive for downscaling operations. For resampling at 720p, the disparity grows even more evident, with the <code>c6g.4xlarge</code> being 445% pricier than the <code>g5g.xlarge</code>. Similarly, when upscaling to 1080p, the cost of the <code>c6g.4xlarge</code> is 438.9% more than our winner's. Finally, for the 'No Scaling' operation, the <code>c6g.4xlarge</code> proves to be 446.1% more expensive.</p>
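<p>For clarity, these "X% more expensive" figures are plain relative differences against the <code>g5g.xlarge</code> baseline. A tiny helper, shown here with hypothetical costs rather than the measured benchmark values, demonstrates the calculation:</p>

```javascript
// How much more expensive cost `a` is than baseline `b`, in percent.
function percentMoreExpensive(a, b) {
  return ((a - b) / b) * 100;
}

// Hypothetical per-task costs in USD, for illustration only:
const g5gCost = 0.10;  // baseline GPU instance
const c6gCost = 0.471; // a CPU instance costing roughly 371% more
console.log(percentMoreExpensive(c6gCost, g5gCost).toFixed(1) + '%'); // ≈ 371.0%
```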
<p>In stark contrast, the <code>g4dn.xlarge</code>, the other GPU-based instance, presents minimal cost differences compared with the <code>g5g.xlarge</code>: its costs are only around 24.8% to 27% higher across the various operations, showing that even among fast GPU instances, the cost advantage is not always equally pronounced.</p>
<p>These findings underline the impressive cost efficiency of the AWS Graviton2 <code>g5g.xlarge</code>, featuring the Nvidia Tesla T4G, when placed against other popular AWS instances.</p>
<h3 id="heading-in-conclusion"><strong>In Conclusion</strong></h3>
<p>The ever-evolving realm of technology often holds narratives based on past truths, which may not hold relevance today. Our experiment underscores a crucial fact: in video transcoding, modern GPU instances aren't just faster; they also offer a more economical choice. When choosing between a CPU or GPU for cloud-based operations, it's essential to consider both performance and cost. And as demonstrated, sometimes the supposedly "faster and pricier" option can also be the most cost-effective.</p>
<hr />
<div class="hn-embed-widget" id="more-from-author"></div>]]></content:encoded></item><item><title><![CDATA[Building a Robust Backend in Just 30 Minutes with Outerbase]]></title><description><![CDATA[Introduction
In the ever-evolving landscape of software development, new tools and technologies consistently emerge. However, every once in a while, something truly groundbreaking makes its mark. Today, we're looking into one such potential innovatio...]]></description><link>https://mirzabilal.com/building-a-robust-backend-in-just-30-minutes-with-outerbase</link><guid isPermaLink="true">https://mirzabilal.com/building-a-robust-backend-in-just-30-minutes-with-outerbase</guid><category><![CDATA[Outerbase]]></category><category><![CDATA[outerbasehackathon]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Mon, 02 Oct 2023 06:03:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755454671824/4f4bf18f-8327-4096-9190-2d20b70ab085.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>In the ever-evolving landscape of software development, new tools and technologies consistently emerge. However, every once in a while, something truly groundbreaking makes its mark. Today, we're looking into one such potential innovation “<a target="_blank" href="https://outerbase.com">Outerbase</a>”, and I promise to be candid, discussing both its strengths and weaknesses.</p>
<p>When the idea for RateMyCraft germinated in my mind, I was both eager and hesitant. My eagerness stemmed from the potential of the project, but my reluctance arose from the anticipated time-consuming backend setup — setting up databases, crafting APIs, and the works. This initial phase usually demands significant time, and frustratingly, often offers little to showcase for all the effort.</p>
<p>Enter Outerbase. A tool that, I believe, has the promise to usher in a transformative shift in how many of us approach backend development. But like all tools, it has its highs and lows, which we'll explore in-depth.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755455470056/7cd2cc4a-d99a-4d48-841c-88d710d648bc.webp" alt class="image--center mx-auto" /></p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=m7LMcFRsmxY">https://www.youtube.com/watch?v=m7LMcFRsmxY</a></div>
<p> </p>
<h2 id="heading-what-is-ratemycraft">What is RateMyCraft?</h2>
<p>RateMyCraft offers users a platform to explore, rate, and review various local services, from plumbers and electricians to bakeries. Think of it as a close-knit community where people share their experiences, ensuring others can make informed choices about local services.</p>
<h2 id="heading-unraveling-outerbase-the-future-of-backend-development"><strong>Unraveling Outerbase: The Future of Backend Development</strong></h2>
<p>Before diving into the intricacies of RateMyCraft's functionalities, it's imperative to lay the foundation by understanding Outerbase – the backbone of our backend.</p>
<h3 id="heading-setting-up-a-database-with-outerbase"><strong>Setting Up a Database with Outerbase</strong></h3>
<p>The beauty of Outerbase is that it streamlines the process of initializing and managing databases. Here’s a brief walkthrough:</p>
<ol>
<li><p><strong>Database Creation</strong>: Within the Outerbase environment, you have the flexibility to connect to an existing MySQL, Postgres, or any other supported database. Alternatively, you can create a new SQLite database from scratch with just a few clicks</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755455379002/b08a8b64-d561-4c09-9eb6-00cdf13dd783.webp" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>Table Creation and Structure Definition</strong>: Once the database is in place, the next step is to define your tables. With Outerbase, this step is straightforward. Use a simple command like from Outerbase web console:</p>
<pre><code class="lang-sql"> <span class="hljs-comment">-- Table Creation for 'service_providers'</span>
 <span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> service_providers (
     <span class="hljs-keyword">id</span> <span class="hljs-built_in">INTEGER</span> PRIMARY <span class="hljs-keyword">KEY</span>,
     <span class="hljs-keyword">name</span> <span class="hljs-built_in">TEXT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
     services <span class="hljs-built_in">TEXT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
     contact_info <span class="hljs-built_in">TEXT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
     description <span class="hljs-built_in">TEXT</span>,
     city <span class="hljs-built_in">TEXT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>
 );

 <span class="hljs-comment">-- Table Creation for 'reviews'</span>
 <span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> reviews (
     <span class="hljs-keyword">id</span> <span class="hljs-built_in">INTEGER</span> PRIMARY <span class="hljs-keyword">KEY</span>,
     provider_id <span class="hljs-built_in">INTEGER</span>,
     nickname <span class="hljs-built_in">TEXT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
     <span class="hljs-keyword">content</span> <span class="hljs-built_in">TEXT</span>,
     rating <span class="hljs-built_in">INTEGER</span> <span class="hljs-keyword">CHECK</span> (rating &gt;= <span class="hljs-number">1</span> <span class="hljs-keyword">AND</span> rating &lt;= <span class="hljs-number">5</span>),
     <span class="hljs-keyword">FOREIGN</span> <span class="hljs-keyword">KEY</span> (provider_id) <span class="hljs-keyword">REFERENCES</span> service_providers(<span class="hljs-keyword">id</span>)
 );
</code></pre>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755455670501/4fb0cccd-e8b3-4ddb-9bae-492493f36072.webp" alt class="image--center mx-auto" /></p>
</li>
</ol>
<p><strong>Populating the Tables</strong>: Now that your tables are set up, you can populate them with data either by manual entry or by importing data sets. The latter is especially handy if you’re migrating from another system or have bulk data ready.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- SQL queries for generating test data can be grabbed from github repo of this project</span>
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> service_providers (<span class="hljs-keyword">name</span>, services, contact_info, description, city) 
<span class="hljs-keyword">VALUES</span> 
(<span class="hljs-string">'Baker Delight'</span>, <span class="hljs-string">'Bakery'</span>, <span class="hljs-string">'+11234567915 | baker@delight.com'</span>, <span class="hljs-string">'Experience the finest baked delicacies in Chicago.'</span>, <span class="hljs-string">'Chicago'</span>),
(<span class="hljs-string">'Chicago Car Care'</span>, <span class="hljs-string">'Car Mechanic'</span>, <span class="hljs-string">'+11234567896 | care@chicagocar.com'</span>, <span class="hljs-string">'Premium car maintenance services ensuring smooth drives every time.'</span>, <span class="hljs-string">'Chicago'</span>),
(<span class="hljs-string">'CleanSpace NY'</span>, <span class="hljs-string">'Cleaning'</span>, <span class="hljs-string">'+11234567899 | clean@space.com'</span>, <span class="hljs-string">'Revitalize your home with professional cleaning services.'</span>, <span class="hljs-string">'New York'</span>)
<span class="hljs-comment">-- Add as many services as you want</span>

<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> reviews (provider_id, nickname, <span class="hljs-keyword">content</span>, rating) <span class="hljs-keyword">VALUES</span>
(<span class="hljs-number">1</span>, <span class="hljs-string">'BakingLover'</span>, <span class="hljs-string">'The pastries are to die for!'</span>, <span class="hljs-number">5</span>),
(<span class="hljs-number">2</span>, <span class="hljs-string">'CarFan123'</span>, <span class="hljs-string">'Got my car repaired here. Top-notch service!'</span>, <span class="hljs-number">4</span>),
(<span class="hljs-number">2</span>, <span class="hljs-string">'Jess'</span>, <span class="hljs-string">'Pretty quick service. The price was reasonable.'</span>, <span class="hljs-number">4</span>),
<span class="hljs-comment">-- Add as many reviews as you want</span>
</code></pre>
<p>This whole process takes no more than 5 minutes, but so far we only have a database with test data. What about the API endpoints? Let's dive in and see if we can build the rest in the remaining 25 minutes.</p>
<h2 id="heading-the-outerbase-commands-magic-and-api-endpoints">The Outerbase commands magic and API endpoints</h2>
<p>Ever felt that building a backend is like trying to solve a jigsaw puzzle with pieces from different sets? I've been there! But are Outerbase Commands any better in this regard? My journey led me to some interesting answers. Let's explore.</p>
<p>Think of Outerbase as that friend who always has the right tools for the job. Let's dive into how this nifty tool simplified the often intricate process of creating API endpoints. For this project, we need multiple endpoints so we can present and update ratings for different service providers.</p>
<h3 id="heading-1-new-command">1. New Command</h3>
<p>Let's start by creating the command for the <code>/providers</code> endpoint. This endpoint will list all the service providers registered with <strong>RateMyCraft</strong>. To create a command, go to<br /><code>+ New</code> -&gt; <code>Commands</code></p>
<p>This will show a popup where you specify the <strong>Name</strong>, <strong>Path</strong> and <strong>Type (Get/Post/Put/Delete)</strong> for the endpoint. The following screenshot demonstrates this process.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755455350374/e144e321-59e9-4bbf-9d1f-f2838e628f50.webp" alt class="image--center mx-auto" /></p>
<h3 id="heading-2-javascript-command-node">2. Javascript Command Node</h3>
<p>The <code>/providers</code> endpoint can be invoked with or without a <code>city</code> <code>GET</code> parameter. If the <code>city</code> param is omitted, we list all service providers; if it is provided, for example <code>/providers?city=Miami</code>, we return all the services from that city, in this case <code>Miami</code>.<br />The following JavaScript code returns the <code>city</code> name, or <code>%</code> when no valid city is supplied. Let's put this code in a node and call it the "<strong>City Node</strong>".</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">userCode</span>(<span class="hljs-params"></span>) </span>{
    <span class="hljs-keyword">const</span> city = {{request.query.city}};
    <span class="hljs-keyword">if</span> (<span class="hljs-keyword">typeof</span> city !== <span class="hljs-string">'string'</span> || city.trim() === <span class="hljs-string">''</span>) {
        <span class="hljs-keyword">return</span> <span class="hljs-string">"%"</span>;
    } 
    <span class="hljs-keyword">return</span> city;
}
</code></pre>
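<p>To make the fallback behavior concrete, here is a plain-JavaScript sketch of the same logic outside Outerbase's <code>{{...}}</code> templating (the <code>cityOrWildcard</code> helper name is ours, for illustration only):</p>

```javascript
// Sketch of the City Node's fallback, outside Outerbase's {{...}} templating.
// "cityOrWildcard" is a hypothetical name used here for illustration.
function cityOrWildcard(city) {
  // Missing, non-string, or blank values fall back to "%", the SQL LIKE
  // wildcard, so the query node matches every city.
  if (typeof city !== 'string' || city.trim() === '') {
    return '%';
  }
  return city;
}

console.log(cityOrWildcard('Miami'));   // "Miami"
console.log(cityOrWildcard(undefined)); // "%"
```

<p>Either way, the node returns a value the SQL node can drop straight into its <code>LIKE</code> clause.</p>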
<h3 id="heading-3-sql-query-command-node">3. SQL Query Command Node</h3>
<p>Now we will use the city name returned from the JavaScript node in the SQL query node. The interesting thing here is that you can pass data from one node to the next: here we pass the <code>city</code> from the <strong>City Node</strong> to the <strong>Query Node</strong>. In the following code we use <code>{{city-node}}</code>, which is the placeholder for the return value of the previous node.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> 
    s.*, 
    <span class="hljs-keyword">ROUND</span>(<span class="hljs-keyword">AVG</span>(r.rating), <span class="hljs-number">1</span>) <span class="hljs-keyword">AS</span> average_rating, 
    <span class="hljs-keyword">COUNT</span>(r.rating) <span class="hljs-keyword">AS</span> total_ratings 
<span class="hljs-keyword">FROM</span> service_providers <span class="hljs-keyword">AS</span> s 
<span class="hljs-keyword">LEFT</span> <span class="hljs-keyword">JOIN</span> reviews <span class="hljs-keyword">AS</span> r <span class="hljs-keyword">ON</span> s.id = r.provider_id 
<span class="hljs-keyword">WHERE</span> s.city <span class="hljs-keyword">LIKE</span> <span class="hljs-string">'{{city-node}}'</span> 
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> s.id;
</code></pre>
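<p>Why does the <code>%</code> fallback list every provider? Because a bare <code>%</code> in a <code>LIKE</code> pattern matches any value. The following rough in-memory simulation illustrates the effect; the sample rows are hypothetical, and <code>likeMatch</code> is a deliberately simplified stand-in for SQL <code>LIKE</code>:</p>

```javascript
// Rough simulation of WHERE s.city LIKE '{{city-node}}'.
// likeMatch is a simplified stand-in for SQL LIKE: a bare "%" matches
// everything; otherwise we compare city names case-insensitively.
function likeMatch(value, pattern) {
  if (pattern === '%') return true;
  return value.toLowerCase() === pattern.toLowerCase();
}

// Hypothetical sample rows.
const providers = [
  { name: 'Cupcake Corner', city: 'Miami' },
  { name: 'Fix-It Electronics', city: 'New York' },
];

console.log(providers.filter(p => likeMatch(p.city, '%')).length);     // 2 (all rows)
console.log(providers.filter(p => likeMatch(p.city, 'Miami')).length); // 1
```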
<p>The command will eventually look like the following screenshot, and with that, the <code>/providers</code> endpoint is finished. You can access it by simply sending a request to <code>https://thoughtless-amber.cmd.outerbase.io/providers</code>. <code>thoughtless-amber</code> is the unique ID and will change for each project.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696107163836/516ba3a1-bcd8-463c-979e-52d36d7d44be.png" alt="Providers command" class="image--center mx-auto" /></p>
<h3 id="heading-other-endpoints">Other endpoints</h3>
<p>Since <code>/providers</code> was our first command it took a little longer, about 10 minutes, but now we have a basic understanding of Outerbase commands, the different options, and where to find what. From here on, all we need to do is focus on the logic and write it in Outerbase.</p>
<p>Now we will implement the remaining endpoints.</p>
<h4 id="heading-retrieving-reviews">Retrieving Reviews</h4>
<p>This endpoint shares the same two-node structure: the first node validates and formats the data, and the second runs the query. Since Outerbase is still in beta, the ability to return early from an intermediate node is not yet available, although it is documented to work as follows: if <code>provider_id</code> is not given, we would return a <code>400</code>. For now, however, a command cannot return from a middle node. Once this feature is released, it will give developers a lot more control over what a command can do.</p>
<ul>
<li><p><strong>Endpoint</strong>: <code>/reviews</code></p>
</li>
<li><p><strong>Nodes</strong>:</p>
<pre><code class="lang-javascript">  <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">userCode</span>(<span class="hljs-params"></span>) </span>{
      <span class="hljs-keyword">const</span> provider_id = {{request.query.provider_id}};
      <span class="hljs-keyword">if</span> (<span class="hljs-keyword">typeof</span> provider_id !== <span class="hljs-string">'string'</span> || provider_id.trim() === <span class="hljs-string">''</span>) {
          <span class="hljs-keyword">return</span> {
              <span class="hljs-attr">status</span>: <span class="hljs-number">400</span>,
            <span class="hljs-attr">error</span>: <span class="hljs-string">"missing provider_id from request body"</span>
          }
      } 
      <span class="hljs-keyword">return</span> provider_id;
  }
</code></pre>
<pre><code class="lang-sql">  <span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> reviews <span class="hljs-keyword">WHERE</span> provider_id = <span class="hljs-string">'{{provider-node}}'</span>;
</code></pre>
</li>
<li><p><strong>Testing</strong>: After this is done your <code>/reviews</code> endpoint is ready and can be tested using the following example</p>
</li>
</ul>
<pre><code class="lang-bash">curl -X GET <span class="hljs-string">"https://thoughtless-amber.cmd.outerbase.io/reviews?provider_id=2"</span>
</code></pre>
<h4 id="heading-searching-service-providers">Searching Service Providers</h4>
<p>To demonstrate the simplicity and power of Outerbase commands, we will use only a single SQL query node for search.</p>
<p>Or we can say that "<strong>SQL Query ≈ API Endpoint</strong>" let that sink in. 🤯</p>
<ul>
<li><p><strong>Endpoint</strong>: <code>/search</code></p>
</li>
<li><p><strong>Nodes:</strong></p>
<pre><code class="lang-sql">  <span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> service_providers 
  <span class="hljs-keyword">WHERE</span> 
    <span class="hljs-keyword">name</span> <span class="hljs-keyword">LIKE</span> <span class="hljs-string">'%{{request.query.query}}%'</span> 
    <span class="hljs-keyword">OR</span> services <span class="hljs-keyword">LIKE</span> <span class="hljs-string">'%{{request.query.query}}%'</span>;
</code></pre>
</li>
<li><p><strong>Testing</strong>: After this is done your <code>/search</code> endpoint is ready and can be tested using the following example</p>
</li>
<li><pre><code class="lang-bash">      curl -X GET <span class="hljs-string">"https://thoughtless-amber.cmd.outerbase.io/search?query=Beauty"</span>
</code></pre>
</li>
</ul>
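<p>The single-node search above can be pictured as a substring match across two columns. Here is a hedged JavaScript sketch of what the <code>'%{{request.query.query}}%'</code> pattern effectively does; the sample rows and the <code>searchProviders</code> helper are hypothetical, for illustration only:</p>

```javascript
// Sketch of the search query: '%q%' in LIKE is a case-insensitive (for ASCII,
// by SQLite's default) substring match on either name or services.
function searchProviders(rows, query) {
  const q = query.toLowerCase();
  return rows.filter(r =>
    r.name.toLowerCase().includes(q) || r.services.toLowerCase().includes(q)
  );
}

// Hypothetical sample rows.
const rows = [
  { name: 'Glow Salon', services: 'Beauty, Hair' },
  { name: 'Fix-It', services: 'Electronics Repair' },
];

console.log(searchProviders(rows, 'Beauty').length); // 1
console.log(searchProviders(rows, 'fix').length);    // 1
```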
<h4 id="heading-fetching-provider-details">Fetching Provider Details</h4>
<p>Some features need to fetch a single provider's details; for that, we will create the following endpoint.</p>
<ul>
<li><p><strong>Endpoint</strong>: <code>/provider</code></p>
</li>
<li><p><strong>Nodes:</strong></p>
<pre><code class="lang-sql">  <span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> service_providers <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span> = <span class="hljs-string">'{{request.query.id}}'</span>
</code></pre>
</li>
<li><p><strong>Testing</strong>: After this, your <code>/provider</code> endpoint is ready and can be tested using the following example:</p>
<pre><code class="lang-bash">  curl -X GET <span class="hljs-string">"https://thoughtless-amber.cmd.outerbase.io/provider?id=1"</span>
</code></pre>
</li>
</ul>
<h4 id="heading-adding-a-review">Adding a Review</h4>
<p>A rating system without the ability to add a review is incomplete, so let's create an endpoint for that as well. For this endpoint, we will select <code>POST</code> as the command type.</p>
<ul>
<li><p><strong>Endpoint:</strong> <code>/review/add</code></p>
</li>
<li><p><strong>Nodes:</strong></p>
<pre><code class="lang-sql">  <span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> reviews(provider_id, nickname, <span class="hljs-keyword">content</span>, rating) <span class="hljs-keyword">VALUES</span>({{request.body.provider_id}}, {{request.body.nickname}}, {{request.body.content}}, {{request.body.rating}})
</code></pre>
</li>
<li><p><strong>Testing</strong>: Now <code>/review/add</code> endpoint is ready and can be tested using the following example</p>
</li>
</ul>
<pre><code class="lang-bash">curl -X POST <span class="hljs-string">"https://thoughtless-amber.cmd.outerbase.io/review/add"</span> \
-H <span class="hljs-string">"Content-Type: application/json"</span> \
-d <span class="hljs-string">'{"nickname":"Mirza","rating":"5","content":"The best cupcakes in town","provider_id":"1"}'</span>
</code></pre>
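<p>Once early returns from intermediate nodes land (the beta limitation noted earlier), a validation node could guard this <code>INSERT</code>. Here is a hedged, plain-JavaScript sketch of such a check, using the field names from the <code>curl</code> example; <code>validateReview</code> is an invented helper, not an Outerbase API:</p>

```javascript
// Hedged sketch: a validation node you could place before the INSERT.
// "validateReview" is an invented helper, not an Outerbase API; the field
// names come from the curl example for this endpoint.
function validateReview(body) {
  const required = ['provider_id', 'nickname', 'content', 'rating'];
  for (const key of required) {
    if (typeof body[key] !== 'string' || body[key].trim() === '') {
      return { status: 400, error: `missing ${key} from request body` };
    }
  }
  const rating = Number(body.rating);
  if (!Number.isInteger(rating) || rating < 1 || rating > 5) {
    return { status: 400, error: 'rating must be an integer between 1 and 5' };
  }
  return body; // valid: pass the payload through to the SQL node
}

console.log(validateReview({ provider_id: '1', nickname: 'Mirza', content: 'The best cupcakes in town', rating: '5' }));
```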
<h4 id="heading-adding-a-service-provider">Adding a Service Provider</h4>
<p>New service providers can register themselves, or users can simply add the last service provider they used to RateMyCraft.</p>
<ul>
<li><p><strong>Endpoint</strong>: <code>/provider/add</code></p>
</li>
<li><p><strong>Nodes:</strong></p>
<pre><code class="lang-sql">  <span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> service_providers (<span class="hljs-keyword">name</span>, services, contact_info, description, city) <span class="hljs-keyword">VALUES</span> ({{request.body.name}}, {{request.body.services}},{{request.body.contact_info}},{{request.body.description}}, {{request.body.city}});
</code></pre>
</li>
<li><p><strong>Testing</strong>: Now <code>/provider/add</code> endpoint is ready and you can add the provider using the following template.</p>
<pre><code class="lang-bash">  curl -X POST <span class="hljs-string">"https://thoughtless-amber.cmd.outerbase.io/provider/add"</span> \
  -d <span class="hljs-string">'{"name":"Mirza",
  "contact_info":"mirza@example.com | +1234232232",
  "city":"New York",
  "description": "We provide all kind of Electronics repairs",
  "services":"Electronics Repair"
  }'</span>
</code></pre>
</li>
</ul>
<h4 id="heading-update-service-provider">Update Service Provider</h4>
<p>Users or admins also need the ability to update any field of a service provider; for that, we will create our first <code>PUT</code> request. Due to a current limitation in Outerbase, we first fetch the provider's current values, then replace only the fields the client has asked to update, and keep the rest as they are.</p>
<ul>
<li><strong>Endpoint</strong>: <code>/provider/update</code></li>
</ul>
<p><strong>Nodes:</strong></p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> service_providers <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span> = {{request.body.id}};
</code></pre>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">userCode</span>(<span class="hljs-params"></span>) </span>{
    <span class="hljs-keyword">const</span> requestBody = {
        <span class="hljs-attr">id</span>: {{request.body.id}},
        <span class="hljs-attr">name</span>: {{request.body.name}},
        <span class="hljs-attr">services</span>: {{request.body.services}},
        <span class="hljs-attr">city</span>: {{request.body.city}},
        <span class="hljs-attr">contact_info</span>: {{request.body.contact_info}},
        <span class="hljs-attr">description</span>: {{request.body.description}}
    };

    <span class="hljs-keyword">const</span> current_values = <span class="hljs-built_in">JSON</span>.parse({{data-node}});
    <span class="hljs-keyword">const</span> items = current_values?.response?.items;

    <span class="hljs-keyword">if</span> (!items || !<span class="hljs-built_in">Array</span>.isArray(items) || items.length === <span class="hljs-number">0</span>) {
        <span class="hljs-keyword">return</span> <span class="hljs-string">"No items available"</span>;
    }
    <span class="hljs-keyword">let</span> item = items[<span class="hljs-number">0</span>];
    <span class="hljs-comment">// Update item properties if the corresponding request body values are valid</span>
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">let</span> key <span class="hljs-keyword">in</span> requestBody) {
        <span class="hljs-keyword">if</span> (<span class="hljs-keyword">typeof</span> requestBody[key] === <span class="hljs-string">'string'</span> &amp;&amp; requestBody[key].trim() !== <span class="hljs-string">''</span>) {
            item[key] = requestBody[key];
        }
    }
    <span class="hljs-keyword">return</span> item;
}
</code></pre>
<pre><code class="lang-sql"><span class="hljs-keyword">UPDATE</span> service_providers 
<span class="hljs-keyword">SET</span> 
    <span class="hljs-keyword">name</span> = {{params-node.name}},
    services = {{params-node.services}},
    contact_info = {{params-node.contact_info}},
    description = {{params-node.description}},
    city = {{params-node.city}}
<span class="hljs-keyword">WHERE</span> 
    <span class="hljs-keyword">id</span> = {{params-node.id}};
</code></pre>
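<p>To sanity-check the merge step above: only non-empty string fields from the request body should overwrite the current row. A standalone sketch of that behavior follows; the sample data is hypothetical, and <code>mergeProvider</code> is our own name for the loop inside the middle node:</p>

```javascript
// Sanity check for the merge step: only non-empty string fields from the
// request body overwrite the current row. "mergeProvider" is our own name
// for the loop inside the middle node; the sample data is hypothetical.
function mergeProvider(current, requestBody) {
  const item = { ...current };
  for (const key in requestBody) {
    if (typeof requestBody[key] === 'string' && requestBody[key].trim() !== '') {
      item[key] = requestBody[key];
    }
  }
  return item;
}

const current = { id: 1, name: 'Mirza', city: 'New York', services: 'Electronics Repair' };
const body = { id: '1', city: 'Miami', name: '' }; // empty name must be ignored
console.log(mergeProvider(current, body).city); // "Miami"
console.log(mergeProvider(current, body).name); // "Mirza"
```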
<h4 id="heading-deleting-a-service-provider">Deleting a Service Provider</h4>
<p>To remove a service provider, we will use a <code>DELETE</code> request.</p>
<ul>
<li><p><strong>Endpoint</strong>: <code>/provider/delete</code></p>
</li>
<li><p><strong>Command:</strong></p>
</li>
</ul>
<pre><code class="lang-sql"><span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">FROM</span> service_providers <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span> = {{request.body.id}};
</code></pre>
<p>Did you see how fast creating these Outerbase commands was? Or rather, these API endpoints 😊. These last seven endpoints took barely 15 minutes. The only thing that could slow you down or consume more time is the current beta state; the Outerbase team still has a lot of ground to cover, but the idea is worth the wait.</p>
<p>The APIs that form the backbone of <strong>RateMyCraft</strong> are ready and deployed in barely 30 minutes. The disclaimer here is that I have spent quite some time with Outerbase and knew what would work out of the box and what would require workarounds. It might take you longer the first time, but once you are familiar with the tool and understand your problem, you can do it in even less.</p>
<h2 id="heading-where-is-the-dashboard">Where is the Dashboard?</h2>
<p>In applications like RateMyCraft, a dashboard isn't just an accessory—it's the nerve center. It's where you'd typically view data summaries, gain insights, and make decisions. Now, traditionally, creating an intuitive dashboard involves layers of design, development, and countless hours of tweaking. But what if there was a shortcut? Another functionality of Outerbase is the <strong>integration of EZQL</strong>.</p>
<p>With Outerbase's EZQL, the very platform can serve as your dashboard! This isn't about cutting corners; it's about optimizing processes. Instead of toggling between multiple tools, you can query directly and get real-time insights, all within Outerbase.</p>
<p>For instance, need to know the top-rated car mechanics in Chicago? All you need to do is ask the Outerbase magician</p>
<blockquote>
<p>Get the best mechanic from Chicago based on average ratings</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755455300221/8cf0dce9-8007-4308-a060-1cb67179e77a.webp" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755455272027/02aec9fc-9c8c-40e6-8d1e-e5c0065afad6.webp" alt class="image--center mx-auto" /></p>
<p>Isn't it magic? 🪄 You can do so many things with Outerbase. Not only you, but your non-technical co-founder can access everything without any technical know-how, and without waiting for developers to finish the dashboard and add the specific data they need. It's that simple! With EZQL, you're not just building an application; you're crafting an experience, both for your users and for yourself as a developer.</p>
<h2 id="heading-the-final-product">The Final Product</h2>
<p>After harnessing the powerful capabilities of Outerbase and weaving them into the backend of RateMyCraft, the outcome is nothing short of spectacular. But as they say, a picture is worth a thousand words. So, let me show you!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755455242704/70bc751a-a7b2-4e04-a901-763647b81e0a.webp" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755455080481/6921247a-a6e9-4d92-b8aa-8ea38d6c29d1.webp" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755455012893/7b24b4ba-f697-4fc5-8133-f8fb31780cb4.webp" alt class="image--center mx-auto" /></p>
<p>As for the aesthetic and user-friendly frontend, I utilized Vuetify—a fantastic framework for Vue.js. It played a pivotal role in giving life to the visuals and interactivity of RateMyCraft. However, getting deep into the intricacies of Vuetify is beyond the scope of this article. Today, our spotlight remains firmly on the magic and efficiency that Outerbase brought to the table. But if you are interested you can check the code in <a target="_blank" href="https://github.com/bilalmughal/rate-my-craft">GitHub repo</a> for this project.</p>
<p>The fusion of Outerbase's backend prowess with a polished frontend showcases what modern tools can achieve when used innovatively.</p>
<h2 id="heading-the-pros-and-cons-of-outerbase"><strong>The Pros and Cons of Outerbase</strong></h2>
<p>When navigating the seas of software development, it's essential to have a balanced view of the tools we use. Outerbase, despite its groundbreaking capabilities, is no exception. It's worth noting that Outerbase is currently in beta, which means some bumps in the road are to be expected. That said, let's weigh the advantages against the challenges.</p>
<h3 id="heading-pros">Pros:</h3>
<ol>
<li><p><strong>Responsive Team that Listens</strong>: There's immense value in having a team behind a product that's not just technically adept but is also receptive to feedback. This means issues get addressed, and user suggestions often shape the tool's evolution.</p>
</li>
<li><p><strong>Growing Community on Discord</strong>: A robust community is a sign of a product's potential. With discussions, troubleshooting, and shared experiences, new users find a supportive ecosystem ready to assist.</p>
</li>
<li><p><strong>Intuitive UI</strong>: Outerbase's user interface stands out for its clean design and user-centric approach. Even those without a solid technical background find it easy to navigate and understand.</p>
</li>
<li><p><strong>Flexibility of Commands</strong>: The ability to craft commands using multiple programming languages provides developers with a versatile toolkit, ensuring they can tackle diverse challenges head-on.</p>
</li>
</ol>
<h3 id="heading-cons">Cons:</h3>
<ol>
<li><p><strong>Incomplete Documentation</strong>: As with many beta products, the documentation isn't exhaustive. While foundational topics are covered, some nuanced or advanced features lack comprehensive guides or have no documentation at all.</p>
</li>
<li><p><strong>Broken Features</strong>: Being in beta means not everything is polished to perfection. Users might stumble upon features that don't work as expected. However, with the responsive team behind Outerbase, these issues are often addressed swiftly.</p>
</li>
<li><p><strong>Absence of an Inbuilt Authentication System</strong>: Security is paramount, especially in backend operations. The lack of a built-in authentication system means developers might need to integrate third-party solutions or build custom ones, which can be time-consuming.</p>
</li>
</ol>
<h2 id="heading-the-future">The Future?</h2>
<p>Outerbase has nailed down many things exceptionally well, but as with any evolving product, there's always room for enhancements. As a staunch fan of <strong>OpenAPI specifications</strong>, I'd relish the option to import them, translating directly into Outerbase commands. Each endpoint from the specification could manifest as an Outerbase command, with distinct steps or nodes defined through specialized '<strong>vendor-specific extensions</strong>'. For comparison, AWS employs <code>x-amazon-apigateway-integration</code>. In a similar vein, Outerbase could introduce its proprietary extension. I believe this concept harmonizes perfectly with Outerbase's vision of rapid backend development. Implementing such a feature would mark a significant leap towards realizing the effortless backend platform that Outerbase aspires to become.</p>
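<p>To make the idea concrete, here is a purely hypothetical sketch of what such an import could look like. <code>x-outerbase-command</code> and all of its fields are invented for illustration, modeled loosely on <code>x-amazon-apigateway-integration</code>; Outerbase supports nothing like this today:</p>

```yaml
# Hypothetical only: "x-outerbase-command" is an imagined vendor extension.
paths:
  /providers:
    get:
      summary: List service providers, optionally filtered by city
      x-outerbase-command:
        nodes:
          - type: javascript   # would map to a JS command node
            name: city-node
          - type: sql          # would map to a SQL query node
            name: query-node
```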
<h2 id="heading-conclusion">Conclusion</h2>
<p>This is surely the future of backend development, and some product is going to rule this space. Will it be Outerbase? That depends on whether the Outerbase team can deliver on their promises.</p>
<p>Like any tool, Outerbase has its strengths and areas for improvement. However, its potential shines through, and as it matures out of beta, many of the current challenges are likely to be ironed out.</p>
<p>Outerbase, with its revolutionary approach to backend operations, enabled the creation of a robust backend in just 30 minutes. For fellow developers out there, this is a testament to the capabilities of modern-day tools. Embrace them, and the sky's the limit!</p>
<h2 id="heading-resources">Resources</h2>
<ul>
<li><p><strong>Outerbase:</strong> <a target="_blank" href="https://outerbase.com">https://outerbase.com</a></p>
</li>
<li><p><strong>Documentation:</strong> <a target="_blank" href="https://docs.outerbase.com">https://docs.outerbase.com</a></p>
</li>
<li><p><strong>Source Code:</strong> <a target="_blank" href="https://github.com/bilalmughal/rate-my-craft">github.com/bilalmughal/rate-my-craft</a></p>
</li>
<li><p><strong>Live Demo:</strong> <a target="_blank" href="https://ratemycraft.mirzabilal.com">ratemycraft.mirzabilal.com</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[How to install FFmpeg with Hardware Acceleration on AWS]]></title><description><![CDATA[Introduction
Video is not just a story; it is the storyteller. It weaves narratives, captures moments, and connects us in ways words alone cannot. The ever-increasing demand for video content in marketing, streaming and OTT platforms, social media, a...]]></description><link>https://mirzabilal.com/how-to-install-ffmpeg-with-harware-accelaration-on-aws</link><guid isPermaLink="true">https://mirzabilal.com/how-to-install-ffmpeg-with-harware-accelaration-on-aws</guid><category><![CDATA[nvenc]]></category><category><![CDATA[hardware-encoding]]></category><category><![CDATA[FFmpeg]]></category><category><![CDATA[video streaming]]></category><category><![CDATA[Linux]]></category><category><![CDATA[amazon-linux-2023]]></category><category><![CDATA[#howtos]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Mon, 25 Sep 2023 10:09:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755446632894/6bae6fb1-df31-41fa-a479-454f142391a0.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction"><strong>Introduction</strong></h2>
<p>Video is not just a story; it is the storyteller. It weaves narratives, captures moments, and connects us in ways words alone cannot. The ever-increasing demand for video content in marketing, streaming and OTT platforms, social media, and mobile video consumption has increased the demand for bandwidth and server capacity. Cloud media processing has revolutionized the media industry, enhancing accessibility to video content beyond previous boundaries. The continuously increasing appetite for multimedia content and the growing prominence of deep learning have driven industry leaders like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform to offer GPU-enabled instances tailored for parallel processing excellence.</p>
<p>FFmpeg is an indispensable multimedia innovation champion, consistently delivering eminence in audio-video processing and transcoding. If you are new to FFmpeg or need to set up FFmpeg for optimal performance on a CPU, you can refer to the first article of this series, <a target="_blank" href="https://mirzabilal.com/how-to-install-ffmpeg-on-linux-from-source">How to install FFmpeg on Linux from Source</a>, where we discussed FFmpeg and provided a step-by-step guide to making the most of it on the CPU.</p>
<p>Now we will take a step further and look into how FFmpeg can leverage the power of hardware acceleration, which can significantly reduce processing time and help deliver content to users faster than ever before. This how-to guide will show you how to set up FFmpeg for transcoding and other multimedia tasks with hardware acceleration.</p>
<h2 id="heading-problem"><strong>Problem</strong></h2>
<p>Traditional CPU-based video processing on cloud servers can be time-consuming and inefficient. As videos increase in resolution and size, the processing demand grows exponentially. Even with multi-core CPU configurations, the computational overhead can cause extended processing times, leading to delays in workflows and increased costs. Given cloud infrastructure providers charge per instance-hour, the longer your server runs, the more expensive your bill becomes. Failing to embrace GPU-powered machines means underutilizing their potential, especially since GPUs are specifically designed for parallel computational tasks. Thus, there is a pressing need to harness the GPU's power to optimize FFmpeg's performance and make the most out of it.</p>
<h2 id="heading-the-remedy">The Remedy</h2>
<h3 id="heading-know-your-hardware">Know your Hardware</h3>
<p>The first step is to figure out the GPU for your required architecture. For the scope of this guide, we will focus on <strong>Nvidia Tesla T4</strong> and <strong>Nvidia Tesla T4G</strong> available with AWS Graviton2 <code>G4g</code> and Intel-based <code>G4dn</code> instances on AWS, but you can follow this guide for any Nvidia GPU by installing the desired driver yourself or by following <a target="_blank" href="https://mirzabilal.com/deep-learning-with-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2-56d8457a6f6d#heading-install-nvidia-gpu-driver">this guide</a> and getting the appropriate driver from the <a target="_blank" href="https://www.nvidia.com/Download/Find.aspx">Nvidia driver download center</a>.</p>
<h3 id="heading-update-and-install-system-utilities">Update and Install System Utilities</h3>
<p>The following snippet updates your existing libraries and other utilities. It uses some variables that are defined in the final script.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"Installing utilities..."</span>
dnf -y update
dnf -y groupinstall <span class="hljs-string">"Development Tools"</span>
dnf install -y openssl-devel cmake3 amazon-efs-utils htop iotop yasm nasm jq
</code></pre>
<h3 id="heading-setup-nvidia-gpu-cuda-and-cudnn">Setup Nvidia GPU, CUDA, and CUDNN</h3>
<p>As the scope of this article is to demonstrate a working GPU-powered FFmpeg on AWS, the following snippet checks whether this is a Graviton2-powered <code>G4g</code> instance, which is based on the <code>ARM/AArch64</code> architecture, or an Intel-based <code>G4dn</code> instance. Once detected, the script installs the appropriate drivers for the architecture.</p>
<pre><code class="lang-bash"><span class="hljs-keyword">if</span> [ <span class="hljs-string">"<span class="hljs-subst">$(uname -m)</span>"</span> = <span class="hljs-string">"aarch64"</span> ]; <span class="hljs-keyword">then</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"System is running on ARM / AArch64"</span>
    DRIVE_URL=<span class="hljs-string">"https://us.download.nvidia.com/tesla/535.104.05/NVIDIA-Linux-aarch64-535.104.05.run"</span>
    CUDA_SDK_URL=<span class="hljs-string">"https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux_sbsa.run"</span>
    CUDNN_ARCHIVE_URL=<span class="hljs-string">"https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-8.9.5.29_cuda12-archive.tar.xz"</span>
<span class="hljs-keyword">else</span>
    DRIVE_URL=<span class="hljs-string">"https://us.download.nvidia.com/tesla/535.104.05/NVIDIA-Linux-x86_64-535.104.05.run"</span>
    CUDA_SDK_URL=<span class="hljs-string">"https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux.run"</span>
    CUDNN_ARCHIVE_URL=<span class="hljs-string">"https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.5.29_cuda12-archive.tar.xz"</span>
<span class="hljs-keyword">fi</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Setting up GPU..."</span>
DRIVER_NAME=<span class="hljs-string">"NVIDIA-Linux-driver.run"</span>
wget -O <span class="hljs-string">"<span class="hljs-variable">$DRIVER_NAME</span>"</span> <span class="hljs-string">"<span class="hljs-variable">$DRIVE_URL</span>"</span>
TMPDIR=<span class="hljs-variable">$LOCAL_TMP</span> sh <span class="hljs-string">"<span class="hljs-variable">$DRIVER_NAME</span>"</span> --disable-nouveau --silent

CUDA_SDK=<span class="hljs-string">"cuda-linux.run"</span>
wget -O <span class="hljs-string">"<span class="hljs-variable">$CUDA_SDK</span>"</span> <span class="hljs-string">"<span class="hljs-variable">$CUDA_SDK_URL</span>"</span>
TMPDIR=<span class="hljs-variable">$LOCAL_TMP</span> sh <span class="hljs-string">"<span class="hljs-variable">$CUDA_SDK</span>"</span> --silent --override --toolkit --samples --toolkitpath=<span class="hljs-variable">$USR_LOCAL_PREFIX</span>/cuda-12.2 --samplespath=<span class="hljs-variable">$CUDA_HOME</span> --no-opengl-libs

CUDNN_ARCHIVE=<span class="hljs-string">"cudnn-linux.tar.xz"</span>
EXTRACT_PATH=<span class="hljs-string">"<span class="hljs-variable">$SRC_DIR</span>/cudnn-extracted"</span>
mkdir -p <span class="hljs-string">"<span class="hljs-variable">$EXTRACT_PATH</span>"</span>

wget -O <span class="hljs-string">"<span class="hljs-variable">$CUDNN_ARCHIVE</span>"</span> <span class="hljs-string">"<span class="hljs-variable">$CUDNN_ARCHIVE_URL</span>"</span>
tar -xJf <span class="hljs-string">"<span class="hljs-variable">$CUDNN_ARCHIVE</span>"</span> -C <span class="hljs-string">"<span class="hljs-variable">$EXTRACT_PATH</span>"</span>
CUDNN_INCLUDE=$(find <span class="hljs-string">"<span class="hljs-variable">$EXTRACT_PATH</span>"</span> -<span class="hljs-built_in">type</span> d -name <span class="hljs-string">"include"</span> -<span class="hljs-built_in">print</span> -quit)
CUDNN_LIB=$(find <span class="hljs-string">"<span class="hljs-variable">$EXTRACT_PATH</span>"</span> -<span class="hljs-built_in">type</span> d -name <span class="hljs-string">"lib"</span> -<span class="hljs-built_in">print</span> -quit)
cp -P <span class="hljs-string">"<span class="hljs-variable">$CUDNN_INCLUDE</span>"</span>/* <span class="hljs-variable">$CUDA_HOME</span>/include/
cp -P <span class="hljs-string">"<span class="hljs-variable">$CUDNN_LIB</span>"</span>/* <span class="hljs-variable">$CUDA_HOME</span>/lib64/
chmod a+r <span class="hljs-variable">$CUDA_HOME</span>/lib64/*
ldconfig
</code></pre>
<p>By now, you should have a working system with an Nvidia device driver which can be checked by running <code>nvidia-smi</code> in the terminal.</p>
<h3 id="heading-install-ffmpeg-prerequisites"><strong>Install FFmpeg Prerequisites</strong></h3>
<p>Before diving into FFmpeg's installation, it's crucial to understand and set up its dependencies, so that FFmpeg can handle most common processing tasks across different formats, and so that you understand how to expand or limit FFmpeg's scope for your multimedia workload.</p>
<h4 id="heading-system-dependencies">System Dependencies</h4>
<p>FFmpeg is a great tool for text processing and text rendering in videos as well. To enable this feature, we need to install the libraries used for text manipulation, embedding subtitles, and other fancy operations like embedding your brand's watermark in a video. These are essential system libraries that FFmpeg requires.</p>
<pre><code class="lang-bash">dnf install -y freetype-devel fribidi-devel harfbuzz-devel fontconfig-devel bzip2-devel
</code></pre>
<h4 id="heading-installing-audio-amp-video-codecs">Installing Audio &amp; Video Codecs</h4>
<p>At its core, FFmpeg functions like a versatile framework, ready to be extended with various plugins and modules. One of the primary ways this extensibility shines is through its support for many audio and video codecs. These codecs are the building blocks that allow FFmpeg to decode and encode media in various formats.</p>
<p>Now we will install the individual codecs. FFmpeg is designed so that anyone can write a plugin, such as a video codec, and have FFmpeg pick it up at build time.</p>
<ol>
<li><p><strong>ffnvcodec</strong>: Since the target instance is GPU-powered, <code>ffnvcodec</code> enables FFmpeg to leverage NVIDIA GPU acceleration, significantly speeding up video processing tasks for <strong>AV1, H.264</strong> and <strong>HEVC</strong>/<strong>H.265</strong>.</p>
</li>
<li><p><strong>LIBASS</strong>: FFmpeg can use <code>LIBASS</code> as its subtitle renderer to handle subtitles in various video formats.</p>
</li>
<li><p><strong>LIBAOM (AV1 Codec Library)</strong>: FFmpeg uses LIBAOM to decode and encode <code>AV1</code> video when you want software (CPU-based) transcoding for <code>AV1</code>.</p>
</li>
<li><p><strong>Other Codecs</strong>: These include libraries for various audio and video formats (e.g., <code>libmp3lame</code> for MP3 audio, <code>opus</code> for the Opus audio codec, and <code>libogg</code> for the Ogg container). In this build, we will also enable software encoding of <code>H.264</code> and <code>HEVC/H.265</code> using the <code>x264</code> and <code>x265</code> libraries.</p>
</li>
</ol>
<p>By ensuring all these prerequisites are installed and correctly set up, FFmpeg can provide its robust set of features, and users can harness the desired power out of this great utility.</p>
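<p>To see how this pick-and-choose model translates into a build, here is a small illustrative helper (the variable names are ours, not part of FFmpeg) that turns a codec list into the corresponding <code>--enable-*</code> configure flags:</p>

```shell
# Illustrative: assemble ./configure codec flags from a space-separated list.
CODECS="libaom libass libmp3lame libopus libvorbis libx264 libx265"
CODEC_FLAGS=""
for codec in $CODECS; do
  CODEC_FLAGS="$CODEC_FLAGS --enable-$codec"
done
echo "$CODEC_FLAGS"
```

<p>Dropping a codec from the list, or adding a new one, then becomes a one-word change rather than a hunt through a long configure invocation.</p>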
<h3 id="heading-installing-ffmpeg">Installing FFmpeg</h3>
<p>Finally, we will compile GPU-accelerated FFmpeg from source, with support enabled for all the codecs and libraries we compiled or installed in the previous steps.</p>
<pre><code class="lang-bash">wget https://ffmpeg.org/releases/ffmpeg-snapshot.tar.bz2 &amp;&amp;
tar -jxf ffmpeg-snapshot.tar.bz2 &amp;&amp;
<span class="hljs-built_in">pushd</span> ffmpeg &amp;&amp;
PKG_CONFIG_PATH=<span class="hljs-string">"<span class="hljs-variable">$USR_LOCAL_PREFIX</span>/lib/pkgconfig:/usr/lib64/pkgconfig:/usr/share/pkgconfig:/usr/lib/pkgconfig"</span> \
    ./configure \
    --prefix=<span class="hljs-string">"<span class="hljs-variable">$USR_LOCAL_PREFIX</span>"</span> --disable-static --enable-shared \
    --extra-cflags=<span class="hljs-string">"-I<span class="hljs-variable">$USR_LOCAL_PREFIX</span>/include <span class="hljs-variable">$NVIDIA_CFLAGS</span>"</span> \
    --extra-ldflags=<span class="hljs-string">"-L<span class="hljs-variable">$USR_LOCAL_PREFIX</span>/lib <span class="hljs-variable">$NVIDIA_LDFLAGS</span>"</span> \
    --extra-libs=<span class="hljs-string">'-lpthread -lm'</span> --bindir=<span class="hljs-string">"<span class="hljs-variable">$USR_LOCAL_PREFIX</span>/bin"</span> \
    --enable-gpl --enable-libaom --enable-libass --enable-libfdk-aac \
    --enable-libfreetype --enable-libmp3lame --enable-libopus \
    --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 \
    --enable-nonfree --enable-openssl <span class="hljs-variable">$NVIDIA_FFMPEG_OPTS</span> &amp;&amp;
make -j <span class="hljs-variable">$CPUS</span> &amp;&amp;
make install
</code></pre>
<h3 id="heading-verifying-gpu-acceleration-support-in-ffmpeg">Verifying GPU Acceleration Support in FFmpeg</h3>
<p>Once you've set up FFmpeg with GPU acceleration capabilities, it's a good idea to ensure that it recognizes and can utilize the GPU for tasks like video encoding and decoding. The command below checks for the presence of NVIDIA's NVENC (NVIDIA Video Encoder) in the list of supported codecs within FFmpeg:</p>
<pre><code class="lang-bash">ffmpeg -hide_banner -codecs | grep nvenc
</code></pre>
<p>If you see the output codecs with the <code>nvenc</code> label (e.g., <code>h264_nvenc</code>, <code>hevc_nvenc</code>, <code>av1_nvenc</code>), it indicates that FFmpeg has been correctly configured to support GPU-accelerated encoding for those specific codecs using NVIDIA's hardware.</p>
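<p>For reference, the matched lines look roughly like the illustrative sample below (exact formatting varies between FFmpeg versions); this is what the <code>grep nvenc</code> filter keys on:</p>

```shell
# Illustrative sample of an `ffmpeg -codecs` line from an NVENC-enabled build.
SAMPLE_LINE=" DEV.LS h264  H.264 / AVC (encoders: libx264 h264_nvenc )"
if printf '%s\n' "$SAMPLE_LINE" | grep -q nvenc; then
  echo "nvenc encoder present"
fi
```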
<h2 id="heading-the-complete-script">The Complete Script</h2>
<p>Building upon the various segments we've discussed, here's the consolidated FFmpeg installation script along with the Nvidia GPU installation. This script has been thoroughly tested on <strong>Amazon Linux 2023</strong> and <strong>Ubuntu 22.04</strong> running on AWS <code>G5g</code> instances. It's not just limited to those; its flexibility is its strength. Whether you're on Debian, Red Hat, CentOS, Fedora, openSUSE, or even the sleek Alpine, this script is crafted to serve both <strong>ARM/aarch64</strong> and <strong>x86_64</strong> architectures seamlessly. Dive in and experience the universality of our FFmpeg installation script, bringing the power of multimedia to every corner of the Linux world.</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="cb89936bc947fa727a8ec66e3ddf768a"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/bilalmughal/cb89936bc947fa727a8ec66e3ddf768a" class="embed-card">https://gist.github.com/bilalmughal/cb89936bc947fa727a8ec66e3ddf768a</a></div><p> </p>
<h2 id="heading-hands-on-converting-h264-to-h265-with-gpu-acceleration">Hands-On: Converting H.264 to H.265 with GPU Acceleration</h2>
<p>In this section, we'll take a hands-on approach to show you how to convert an H.264 encoded video to H.265 using the power of NVIDIA's GPU acceleration. This way we can test the installation and experience the power of GPU acceleration in action. To perform this conversion, use the following command:</p>
<pre><code class="lang-bash">ffmpeg -y -hide_banner -hwaccel cuvid -c:v h264_cuvid -i input.mp4 -vf <span class="hljs-string">"scale_cuda=1920:1080"</span> -c:v hevc_nvenc output.mp4
</code></pre>
<p>In this command:</p>
<ul>
<li><p><code>-hwaccel cuvid</code> and <code>-c:v h264_cuvid</code> enable GPU-based decoding of the H.264 input.</p>
</li>
<li><p><code>-vf "scale_cuda=1920:1080"</code> scales the video using GPU acceleration.</p>
</li>
<li><p><code>-c:v hevc_nvenc</code> instructs FFmpeg to encode the output video using the H.265 codec with NVIDIA's GPU acceleration.</p>
</li>
</ul>
<p>After executing the command, you'll get <code>output.mp4</code>, a <code>1080p</code> <strong>H.265</strong>-encoded video.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>By integrating FFmpeg with GPU hardware acceleration on AWS, or any other infrastructure provider or architecture, you not only optimize costs and reduce processing times but also future-proof your workflow for higher-resolution media processing tasks down the line. Compiling from source also addresses the problems of outdated libraries and generalized packages that are not fully optimized for your architecture, as discussed in "<a target="_blank" href="https://mirzabilal.com/how-to-install-ffmpeg-on-linux-from-source">How to install FFmpeg on Linux from source</a>" and, in part, in <a target="_blank" href="https://mirzabilal.com/why-your-aws-deep-learning-ami-is-holding-you-back-and-how-to-fix">"Why your AWS Deep Learning AMI is Holding You Back and How to Fix It"</a>. The evolution of multimedia is ceaseless, but with tools like FFmpeg combined with AWS's GPU prowess and cost-effectiveness, you are well-equipped to handle whatever challenges come next. Whether you're a media company looking to scale or a hobbyist aiming for maximum efficiency, GPU-accelerated FFmpeg on AWS is a game-changer.</p>
<div class="hn-embed-widget" id="resources"></div>]]></content:encoded></item><item><title><![CDATA[Bridging Dreams and Reality: How “Builders” Transform Teams]]></title><description><![CDATA[Introduction
In the previous discussion,"Beyond Rockstars: Crafting the Team for Sustainable Success", we established the significance of Rockstars, who drive innovation, and Doers, who ensure steady execution. However, an essential ingredient to bui...]]></description><link>https://mirzabilal.com/bridging-dreams-and-reality-how-builders-transform-teams</link><guid isPermaLink="true">https://mirzabilal.com/bridging-dreams-and-reality-how-builders-transform-teams</guid><category><![CDATA[teambuilding]]></category><category><![CDATA[Rockstar]]></category><category><![CDATA[Product Management]]></category><category><![CDATA[success]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Tue, 19 Sep 2023 08:00:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1694896178670/1dc322ef-d309-4a68-8405-b8791c106267.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>In the previous discussion,<a target="_blank" href="https://mirzabilal.com/beyond-rockstars-crafting-the-team-for-sustainable-success">"Beyond Rockstars: Crafting the Team for Sustainable Success"</a>, we established the significance of Rockstars, who drive innovation, and Doers, who ensure steady execution. However, an essential ingredient to building a truly resilient, adaptive, and high-performing team is often overlooked - the "Builder." As promised in my last post, we shall dive deep into understanding this integral role and its transformative impact on team dynamics.</p>
<h2 id="heading-bridging-the-gap-the-builders-role">Bridging the Gap: The Builder's Role</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694906167783/4f7426e6-2174-459c-8949-40b2ac413327.jpeg" alt class="image--center mx-auto" /></p>
<p>Imagine a construction site: The architects are like rockstars, envisioning skyscrapers. Doers are the laborers who lay brick after brick, but who ensures that the architect's dream is realized without overwhelming the workforce? That's the "Builder". A Builder doesn't just construct; they understand, adapt, and create pathways. They fill the gap between the high-level ideas of Rockstars and the pragmatic approach of Doers, ensuring seamless communication and collaboration.</p>
<h2 id="heading-the-unique-qualities-of-a-builder">The Unique Qualities of a Builder</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694908931658/7654ce3b-4e92-4e38-b252-6cce83740b09.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-adaptive-vision">Adaptive Vision</h3>
<p>Builders may not have the creative spark of Rockstars, but they can visualize the bigger picture and break it into actionable steps for Doers to execute effectively. A Builder sees both the visionary and the practical aspects of a project and lays out a plan accordingly. This roadmap aligns with the broader vision and is meticulously structured to guide Doers in their execution.</p>
<blockquote>
<p>"You don't have to be a chef to spot a great steak."</p>
</blockquote>
<h3 id="heading-hands-on-approach">Hands-On Approach</h3>
<p>Builders aren't just strategic thinkers but possess a pragmatic touch; they are involved in the ground realities of a project. Their strength lies in their ability to focus on specifics, although not to the same extent as Doers. This hands-on involvement ensures they comprehend the intricacies and challenges firsthand, enabling them to guide both Rockstars and Doers. They ensure that Rockstar's vision is achievable and that Doers have a clear, efficient path to realize it. To put it succinctly:</p>
<blockquote>
<p>"They may not create the complete piece, but they know all the tools and methods very well."</p>
</blockquote>
<h3 id="heading-the-communicator">The Communicator</h3>
<p>Imagine two individuals from the same town, living just next door to each other, but they speak entirely different languages. They might have so much in common, so much to share, yet they're worlds apart simply because they can't communicate.</p>
<p>In the world of team dynamics, this is where a Builder shines. A Builder can speak the languages of both Rockstars and Doers, bridging that communicative divide. They ensure that the lofty ideas of Rockstars and the grounded approach of Doers are not just two parallel lines, but intersecting paths towards a common goal. Their unique position allows them to interpret, refine, and convey messages seamlessly across the team spectrum, ensuring that nothing is "lost in translation" and that all members are truly in sync.</p>
<h2 id="heading-identifying-the-builder-in-your-team">Identifying the Builder in Your Team</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694906358097/4efafef0-b089-40e5-a975-5229d31c78a6.jpeg" alt class="image--center mx-auto" /></p>
<p>Often, the Builder might not be overtly obvious, overshadowed by the aura of Rockstars or the diligent presence of Doers. However, recognizing and fostering this role can be the cornerstone to a project's success. Here are some signs and characteristics to help identify the Builder in your team:</p>
<ol>
<li><p><strong>Mediator in Discussions:</strong> They often mediate during team disagreements, ensuring a middle ground is found.</p>
</li>
<li><p><strong>Clarity in Vision:</strong> While they may not be the source of every idea, they can understand and clarify the big picture, breaking it down into actionable steps.</p>
</li>
<li><p><strong>Hands-on Yet Strategic:</strong> They aren't just about strategy or execution alone. They are often involved in ground-level tasks but can swiftly shift to a bird's eye view when required.</p>
</li>
<li><p><strong>Versatile Skills:</strong> While they might not be the king in any single domain, they possess a broad range of skills and can understand the nuances of multiple disciplines.</p>
</li>
<li><p><strong>People's person:</strong> They resonate with the concerns of both Rockstars and Doers, acting as a bridge for communication and understanding.</p>
</li>
</ol>
<p>Recognizing the Builder is essential. Once identified, their skills can be honed, and their presence can significantly enhance your team's synergy.</p>
<h2 id="heading-the-builders-blueprint">The Builder's Blueprint</h2>
<p>Builders hold a pivotal role in modern team dynamics. Their presence streamlines the workflow, ensuring the entire team operates efficiently. They possess the unique ability to anticipate issues from both Rockstars and Doers, addressing them preemptively before they grow into more serious problems. This proactive approach cultivates a harmonious environment, fostering a culture of collaboration and mutual respect.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694909149868/1630125d-220d-40df-98f9-d68e793a3eca.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-maximizing-utilization-of-builders">Maximizing Utilization of Builders</h2>
<p>For teams to truly harness the strengths of Builders, a few strategies can be employed.</p>
<h3 id="heading-empowerment">Empowerment</h3>
<p>Empower the builders with the authority to make decisions. This autonomy not only garners respect from other team members but also positions them as effective mediators.</p>
<h3 id="heading-in-the-loop">In the loop</h3>
<p>It's also crucial to maintain frequent check-ins with them. By doing so, teams can ensure that Builders remain aligned with the visionary aspirations of Rockstars while staying grounded to the practical challenges Doers face.</p>
<h3 id="heading-evolution">Evolution</h3>
<p>In an ever-evolving industry landscape, continuous learning is paramount. Keeping Builders updated with industry trends ensures they remain effective bridges between ideation and execution.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>By understanding and implementing these strategies, teams can optimize the contributions of Builders, resulting in a cohesive, efficient, and innovative work environment.<br />An orchestra, no matter how talented, is incomplete without its conductor. Similarly, a team, no matter how balanced with Rockstars and Doers, is enhanced by the presence of Builders. They ensure that the melody of innovation and the rhythm of execution synchronize in harmony.<br />Do you see a Builder within your team, or maybe you recognize some of these traits within yourself? How has their presence (or absence) shaped the dynamics and efficiency of your projects? I'd love to hear about your experiences.</p>
<h2 id="heading-look-forward-to-the-future">Look Forward to the Future</h2>
<p>Our exploration into team dynamics doesn't end here. In the coming article, we'll discuss optimizing team interactions, nurturing the growth of each role, and the challenges that lie ahead. As always, thank you for your support and engagement. The best teams are yet to be built, and together, we'll discover the blueprint. Stay tuned!</p>
]]></content:encoded></item><item><title><![CDATA[How to install FFmpeg on Linux from Source]]></title><description><![CDATA[Introduction
It was probably 2006 when I first heard about FFmpeg, and I was amazed by its capabilities. FFmpeg is a go-to solution for transcoding, and video manipulation, from trimming to burning the subtitles, adding a watermark, and more. Since F...]]></description><link>https://mirzabilal.com/how-to-install-ffmpeg-on-linux-from-source</link><guid isPermaLink="true">https://mirzabilal.com/how-to-install-ffmpeg-on-linux-from-source</guid><category><![CDATA[FFmpeg]]></category><category><![CDATA[Ubuntu]]></category><category><![CDATA[Linux]]></category><category><![CDATA[#howtos]]></category><category><![CDATA[how-to]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Mon, 18 Sep 2023 22:00:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755443460317/2dd86c8a-b238-43ca-83f1-a60e25cf195c.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-introduction"><strong>Introduction</strong></h3>
<p>It was probably 2006 when I first heard about FFmpeg, and I was amazed by its capabilities. FFmpeg is a go-to solution for transcoding and video manipulation, from trimming to burning in subtitles, adding a watermark, and more. Since its first release, FFmpeg has come a long way and has become the industry leader for multimedia workloads on desktop apps, the web, and especially the backend.
In this article, we will discuss why you need to customize FFmpeg to match your requirements, and how you can compile FFmpeg from source, along with the various essential libraries and codecs needed for video processing, on different Linux distributions from Ubuntu to RedHat-based systems.</p>
<h3 id="heading-problem">Problem</h3>
<p>Installing FFmpeg directly from a Linux distribution's default repositories may seem like a no-brainer, but this method is filled with potential drawbacks. For one, these repositories might not offer the most recent versions of FFmpeg and essential codecs or libraries, possibly exposing users to security risks and preventing them from accessing the latest features. Moreover, these default builds can be burdened with unnecessary codecs and libraries that many users may never use, or you might be stranded in a scenario where the package has everything except the one library you need.
But the challenges don't stop here; in today's diverse device ecosystem, which spans across ARM and x86_64 architectures, a one-size-fits-all build from the repository might not be optimized for a specific device with a specific instruction set. This is especially concerning for multimedia tasks that demand real-time processing, and you want to make the most out of the hardware you have.
Furthermore, for young developers or those new to the Linux landscape, crafting a custom FFmpeg build tailored to their requirements can be a daunting experience. The sea of options, configurations, and dependencies can be overwhelming, leading them to settle for suboptimal, generic builds rather than extracting the true potential of FFmpeg.</p>
<h3 id="heading-why-compile-from-source">Why Compile from Source?</h3>
<p>To overcome the problems discussed above, for newbies and experts alike, we need comprehensive guides and resources that bridge the knowledge gap and empower the next generation of developers to confidently customize their multimedia tools. The following are the notable advantages of using the source code as your starting point.</p>
<ol>
<li><p><strong>Customization</strong>: By compiling from the source, you can choose the exact features and codecs you want, leading to a personalized FFmpeg build tailored to your needs.</p>
</li>
<li><p><strong>Smaller Footprint</strong>: When you include only the components you need, the resultant build will often be leaner and require less disk space.</p>
</li>
<li><p><strong>Access to the Latest Code</strong>: Official channels might not always have the latest version of FFmpeg. Compiling from source ensures you're working with the most recent codebase.</p>
</li>
<li><p><strong>Performance Optimizations</strong>: Building from source code can enable certain optimizations specific to your machine's architecture and instruction set, potentially enhancing performance.</p>
</li>
<li><p><strong>Greater Learning Opportunity</strong>: Compiling software from a source provides a deeper understanding of the software and its dependencies.</p>
</li>
</ol>
<h3 id="heading-installing-dependencies">Installing Dependencies</h3>
<p>Before we jump into the installation process, it's essential to set up our system with the necessary dependencies.</p>
<h4 id="heading-on-ubuntu">On Ubuntu:</h4>
<pre><code class="lang-bash">sudo apt update
sudo apt install -y build-essential yasm cmake libtool libc6 libc6-dev unzip wget
</code></pre>
<h4 id="heading-on-redhat-based-distributions">On RedHat-based distributions:</h4>
<pre><code class="lang-bash">sudo yum groupinstall <span class="hljs-string">"Development Tools"</span>
sudo yum install -y yasm cmake libtool unzip wget
</code></pre>
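<p>If you are scripting the setup across distributions, a small detection sketch (assuming the usual package-manager binaries are on <code>PATH</code>) can pick the right install command for you:</p>

```shell
# Detect the package manager; fall back to yum on older RedHat-based systems.
if command -v apt-get > /dev/null 2>&1; then
  PKG_MGR="apt-get"
elif command -v dnf > /dev/null 2>&1; then
  PKG_MGR="dnf"
else
  PKG_MGR="yum"
fi
echo "Detected package manager: $PKG_MGR"
```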
<h3 id="heading-installing-dependencies-and-codes">Installing Dependencies and Codecs</h3>
<p>Once your build environment is set up, the next critical step is to install the various codecs and libraries that FFmpeg relies upon or that you wish to utilize for specific multimedia tasks. Remember, one of the key benefits of building from source is the ability to customize your FFmpeg installation, ensuring it's streamlined to your needs; here you can pick, drop, or even add codecs and libraries as you wish.</p>
<p>In the final script, we have provided a comprehensive list of dependencies and libraries, ensuring that you have a broad spectrum of multimedia capabilities at your fingertips.</p>
<h3 id="heading-installing-ffmpeg">Installing FFmpeg</h3>
<p>You can obtain the latest source code from FFmpeg's official website or use <code>git</code>:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://git.ffmpeg.org/ffmpeg.git ffmpeg
<span class="hljs-built_in">cd</span> ffmpeg
</code></pre>
<h3 id="heading-configuration">Configuration</h3>
<p>This step is crucial. The <code>./configure</code> script allows you to select which features and codecs to include. For a basic setup, you can run:</p>
<pre><code class="lang-bash">./configure \
            --prefix=<span class="hljs-string">"<span class="hljs-variable">$USR_LOCAL_PREFIX</span>"</span> \
            --disable-static --enable-shared \
            --extra-cflags=<span class="hljs-string">"-I<span class="hljs-variable">$USR_LOCAL_PREFIX</span>/include <span class="hljs-variable">$NVIDIA_CFLAGS</span>"</span> \
            --extra-ldflags=<span class="hljs-string">"-L<span class="hljs-variable">$USR_LOCAL_PREFIX</span>/lib <span class="hljs-variable">$NVIDIA_LDFLAGS</span>"</span> \
            --extra-libs=<span class="hljs-string">'-lpthread -lm'</span> \
            --bindir=<span class="hljs-string">"<span class="hljs-variable">$USR_LOCAL_PREFIX</span>/bin"</span> \
            --enable-gpl --enable-libaom --enable-libass \
            --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame \
            --enable-libopus --enable-libvorbis --enable-libvpx \
            --enable-libx264 --enable-libx265 --enable-nonfree \
            --enable-openssl
</code></pre>
<p>However, for a more customized setup, use flags like <code>--enable-codec</code>, <code>--disable-codec</code>, <code>--enable-libx264</code>, etc. Check <code>./configure --help</code> for a full list of options.</p>
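<p>Note that <code>$USR_LOCAL_PREFIX</code>, <code>$NVIDIA_CFLAGS</code>, and <code>$NVIDIA_LDFLAGS</code> come from the full script shared at the end of this article. If you are running the configure line by hand, plausible defaults (our assumptions; adjust to your setup) would be:</p>

```shell
# Assumed defaults for the variables referenced by ./configure above.
USR_LOCAL_PREFIX="/usr/local"   # install prefix for binaries and libraries
NVIDIA_CFLAGS=""                # left empty on CPU-only builds
NVIDIA_LDFLAGS=""               # likewise; set these only when CUDA is present
echo "prefix=$USR_LOCAL_PREFIX"
```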
<h3 id="heading-compilation-amp-installation">Compilation &amp; Installation</h3>
<p>Once configured, compile and install FFmpeg:</p>
<pre><code class="lang-bash">make -j$(nproc)
sudo make install
</code></pre>
<p>This will use all your CPU cores for the compilation (<code>-j$(nproc)</code>) and then install the software.</p>
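<p>On a shared machine you may want to leave a core free instead of saturating them all. A minimal sketch, assuming GNU coreutils' <code>nproc</code> is available:</p>

```shell
# Use all CPU cores minus one (but at least one) for the build.
TOTAL=$(nproc)
if [ "$TOTAL" -gt 1 ]; then
  JOBS=$((TOTAL - 1))
else
  JOBS=1
fi
echo "building with $JOBS parallel jobs"
```

<p>Then invoke the build as <code>make -j "$JOBS"</code>.</p>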
<h3 id="heading-verification">Verification</h3>
<p>To verify your FFmpeg installation:</p>
<pre><code class="lang-bash">ffmpeg -version
</code></pre>
<h3 id="heading-wrap-up-amp-the-magic-script">Wrap up &amp; the magic script:</h3>
<p>By now, you should have a basic understanding of how to customize FFmpeg. If so, that was our goal. If you have any confusion, leave a comment and I'll be happy to discuss. To streamline everything and every step mentioned here, there's a script for you to simply run, and it will handle the entire process.</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="83de56f470bf5b91a7c4424cce4071ac"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/bilalmughal/83de56f470bf5b91a7c4424cce4071ac" class="embed-card">https://gist.github.com/bilalmughal/83de56f470bf5b91a7c4424cce4071ac</a></div><p> </p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>Now that we've gone through the process, I'm curious: Where do you plan to use your custom-tailored FFmpeg? Are you working on an innovative multimedia project, developing a web application, or just experimenting with video processing for personal use? FFmpeg's versatility excels in a multitude of applications, from large-scale video platforms to hobbyist projects. Share your plans or current FFmpeg projects in the comments below. Let's learn from one another and discover even more unique use cases for this powerful tool. Remember, the world of multimedia is vast, and with FFmpeg, the possibilities are endless.</p>
]]></content:encoded></item><item><title><![CDATA[The Dark Side of High-Tech Success:  Addressing Mental Well-being in Tech]]></title><description><![CDATA[Introduction
In an era of rapid technological advancement, Software engineers and other IT professionals are in high demand. They might be earning good salaries but these lucrative financial opportunities do not come for free, and a hidden cost often...]]></description><link>https://mirzabilal.com/the-dark-side-of-high-tech-success-addressing-mental-well-being-in-tech</link><guid isPermaLink="true">https://mirzabilal.com/the-dark-side-of-high-tech-success-addressing-mental-well-being-in-tech</guid><category><![CDATA[Mental Health]]></category><category><![CDATA[wellness]]></category><category><![CDATA[leadership]]></category><category><![CDATA[industry]]></category><category><![CDATA[mentalhealth]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Wed, 13 Sep 2023 10:00:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1694277504230/16d7c0e1-29c7-4590-a02e-62b65ca433ff.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>In an era of rapid technological advancement, software engineers and other IT professionals are in high demand. They might be earning good salaries, but these lucrative financial opportunities do not come for free, and a hidden cost often goes unnoticed - the toll on their mental health. IT professionals face mental health issues because of the intense work stress in this ever-demanding industry. In this article, we will build a case for the responsibility of industry leaders and organizational leadership in addressing the so-far neglected human side of the industry.</p>
<h2 id="heading-the-wealth-well-being-paradox">The Wealth-Well-being Paradox</h2>
<p>Software developers and IT professionals are currently among the most highly compensated in the job market. However, the seemingly attractive salaries can obscure a challenging work environment characterized by stress, long days, and looming deadlines. This paradoxical situation raises significant concerns about the mental well-being of those working in this industry.</p>
<h2 id="heading-mental-health-issues-in-the-tech-industry">Mental Health Issues in the Tech Industry</h2>
<p>The tech industry is notorious for its work culture that promotes continuous productivity and sacrifices personal well-being for the sake of meeting unrealistic targets at times. As a result, software developers and IT experts face a host of mental health challenges:</p>
<ol>
<li><p><strong>Burnout:</strong> Physical and emotional exhaustion from a demanding work environment leads to burnout, which reduces productivity, produces feelings of detachment, and diminishes one's sense of personal achievement.</p>
</li>
<li><p><strong>Isolation:</strong> Long hours and remote work arrangements can lead to feelings of isolation; the resulting lack of social interaction breeds loneliness and deteriorating mental health.</p>
</li>
<li><p><strong>Anxiety and Depression:</strong> The constant desire to perform at their best and the fear of being surpassed by peers weigh heavily on professionals' minds. The stigma of being perceived as weak or incapable prevents employees from seeking assistance, exacerbating the condition.</p>
</li>
<li><p><strong>Imposter Syndrome:</strong> A lot of tech professionals feel like they're not good enough or are under-achieving, even when they succeed. This persistent feeling of "I could do more" increases anxiety and negatively impacts self-esteem.</p>
</li>
</ol>
<h2 id="heading-the-role-of-industry-leaders">The Role of Industry Leaders</h2>
<p>Industry leaders and organizational leadership have a crucial role to play in addressing the mental health crisis among their subordinates and in setting an example for the industry to follow:</p>
<ol>
<li><p><strong>Championing Work-Life Balance:</strong> It's essential for leaders to promote and model a balanced work-life routine. By motivating staff to take regular breaks, use their holidays, and establish clear boundaries between work and personal time, they can significantly alleviate stress within the team.</p>
</li>
<li><p><strong>Work-Life Boundaries in Remote Settings:</strong> While offering flexible work arrangements like remote work and adaptable hours can help ease stress among tech-sector professionals, it's vital to recognize and address the blurred lines between work and personal time that often come with remote work. Leaders should emphasize the importance of drawing clear boundaries to maintain a healthy work-life balance even when working from home.</p>
</li>
<li><p><strong>De-stigmatize Mental Health:</strong> Industry leaders should actively work to de-stigmatize mental health issues. Creating an open and supportive environment where employees feel comfortable discussing their struggles and seeking help is essential, and talking about mental health should not be considered a taboo.</p>
</li>
<li><p><strong>Mandatory Mental Health Sessions:</strong> Organizations should prioritize and invest in mental health support programs. Offering counseling services, stress management workshops, and other mental health resources is crucial. Employees should be required, and even rewarded, to attend these sessions to ensure they're equipped to handle the pressures of their roles.</p>
</li>
<li><p><strong>Manage Workload:</strong> Leadership should assess and manage workloads to prevent burnout and avoid setting unrealistic goals and targets. Distributing tasks evenly, setting realistic deadlines, and avoiding excessive overtime can contribute to a healthier work environment.</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The software development and IT sectors are lucrative, but they're also tough on mental well-being. We can't ignore the growing mental health strain that professionals in these areas are grappling with. It's essential that industry bigwigs and company leaders genuinely listen and act. They need to foster a caring work environment, offer solid mental health support, and tear down the stigma around discussing mental challenges. Recognizing and acting on these issues is the only way to make sure our tech gurus aren't just well-paid, but also mentally healthy and happy. After all, true success in the tech world should strike a balance between a good paycheck and peace of mind.</p>
<hr />
<blockquote>
<p>"Almost everything will work again if you unplug it for a few minutes, including you." <strong>- Anne Lamott</strong></p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Beyond Rockstars: Crafting the Team for Sustainable Success]]></title><description><![CDATA[Introduction
If you possess prior experience working in industries such as technology or any other field, it is probable that you have encountered certain colloquialisms, such as "Ninjas," "Wizards," or "Rockstars." These terms are used to identify e...]]></description><link>https://mirzabilal.com/beyond-rockstars-crafting-the-team-for-sustainable-success</link><guid isPermaLink="true">https://mirzabilal.com/beyond-rockstars-crafting-the-team-for-sustainable-success</guid><category><![CDATA[teamwork]]></category><category><![CDATA[teambuilding]]></category><category><![CDATA[team]]></category><category><![CDATA[work]]></category><category><![CDATA[leadership]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Mon, 11 Sep 2023 08:00:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1694425312741/578bfc0c-fb43-4648-a83c-84dc5c1d5169.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>If you have worked in the technology industry, or almost any other field, you have probably encountered colloquialisms such as "Ninjas," "Wizards," or "Rockstars." These terms are used to single out exceptional performers. While having a team of such talented individuals can offer significant advantages and sound great on paper, building a team entirely of "Rockstars" in the real world may be counterproductive. I have worked for around 18 years, occupying various roles from Junior Software Developer to VP of Engineering. Based on my good and bad experiences, I will explore why an ideal team isn't made solely of Rockstars or just Doers but goes beyond individual members' technical abilities. An efficient team requires a balanced mix of precisely calibrated and functioning components, much like a well-tuned automobile engine.</p>
<h2 id="heading-the-rockstars-team-high-octane-but-unstable">The Rockstars Team: High-Octane But Unstable</h2>
<p>Imagine a car with a combustion engine made entirely of the most potent combustion cylinders. The concept sounds spectacular on paper - each cylinder, a Rockstar in its own right, promises to deliver unparalleled performance. However, in actuality, this engine would soon turn into a nightmare. The extreme power generated would be too much for any vehicle to handle, leading to an unstable, inefficient, and ultimately unworkable machine. A team full of such remarkably capable individuals may generate many brilliant ideas. Still, the risk of conflicting egos, misaligned objectives, and lack of focus on routine but necessary tasks could derail the team's success.</p>
<h3 id="heading-drawbacks-of-a-rockstar-only-team">Drawbacks of a Rockstar-Only Team</h3>
<ol>
<li><p><strong>Conflicting Ideas:</strong> When everyone is a visionary, aligning the team to a common goal becomes challenging.</p>
</li>
<li><p><strong>Ego Clashes:</strong> Rockstars are accustomed to being in the spotlight, and having multiple such personalities in a team can lead to conflicts.</p>
</li>
<li><p><strong>Neglected Fundamentals:</strong> While everyone pursues the next big thing, routine but essential tasks that keep the project moving may be overlooked.</p>
</li>
</ol>
<h2 id="heading-the-doers-team-stable-but-stagnant">The Doers Team: Stable But Stagnant</h2>
<p>Let's consider a car whose wheels are its only components. While wheels are essential for any vehicle to move, a car made only of wheels won't have the power or functionality to go anywhere fast or effectively. Likewise, a team of Doers might excel in day-to-day tasks and bring stability to a project. Yet, it may lack the creative spark and drive to innovate and evolve, which is essential for taking the next step in product growth.</p>
<h3 id="heading-drawbacks-of-a-doer-only-team">Drawbacks of a Doer-Only Team</h3>
<ol>
<li><p><strong>Lack of Innovation:</strong> Doers are excellent at following instructions but have little to no will or capacity to innovate.</p>
</li>
<li><p><strong>Dependency on Guidance:</strong> This type of team tends to be passive and waits for instructions instead of taking the initiative to solve problems independently.</p>
</li>
<li><p><strong>Resistance to Change:</strong> Such teams can resist adopting new methods or technologies, ultimately hindering progress.</p>
</li>
</ol>
<h2 id="heading-the-perfect-recipe-mixing-rockstars-and-doers">The Perfect Recipe - Mixing Rockstars and Doers</h2>
<p>A balanced team comprises diverse skill sets, talents, and personalities. A perfectly designed car needs cylinders for power and wheels for movement. The perfect team needs both exceptional performers and diligent workers to function effectively.</p>
<h3 id="heading-components-of-a-mixed-team">Components of a Mixed Team</h3>
<ol>
<li><p><strong>Domain-specific Rockstars:</strong> Depending on the size and nature of your organization, find a Rockstar or two for each crucial domain - be it front-end development, back-end architecture, or data science.</p>
</li>
<li><p><strong>Reliable Doers:</strong> Populate the rest of the team with Doers who excel in execution. They will be the ones to implement the vision set forth by the Rockstars.</p>
</li>
<li><p><strong>Interdisciplinary Understanding:</strong> Encourage mutual respect and understanding. Rockstars should appreciate the execution skills of Doers, and Doers should respect the vision of Rockstars.</p>
</li>
</ol>
<h2 id="heading-managing-the-mix-the-role-of-leadership">Managing the Mix - The Role of Leadership</h2>
<p>As a leader, your role is to assemble this balanced team and ensure its components work harmoniously. Just like a car's engine and wheels need a skilled driver at the helm, a mixed team needs strong leadership to steer it towards its objectives.</p>
<ol>
<li><p><strong>Setting Expectations:</strong> Define roles and responsibilities and ensure everyone understands their specific and collective goals.</p>
</li>
<li><p><strong>Encouraging Collaboration:</strong> Facilitate open communication between Rockstars and Doers. Promote an environment where each can learn from the other.</p>
</li>
<li><p><strong>Monitoring and Tuning:</strong> Periodically review team performance. If the team is leaning too much toward innovation and venturing into impractical or unrealistic ideas at the expense of execution, or vice versa, recalibrate.</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In the quest for building a high-performing team, it's easy to be seduced by the allure of stacking your team with exceptional individuals. While the potential for innovation and high performance is tempting, it's crucial to remember that too much of a good thing can be harmful. Conversely, a team comprising executors may get the job done but could struggle with stagnation. Therefore, like a well-engineered car, a balanced mix of Rockstars for driving innovation and Doers for steady execution is often the most effective route to long-term success.</p>
<h2 id="heading-whats-next">What's Next?</h2>
<p>So far, we've discussed the essential roles of Rockstars and Doers in a successful team. But what if there was another element - one that could make all the difference? Is your team truly complete? In our upcoming discussion, we will delve deeper into the composition of an ideal team and the importance of another element that will redefine how you approach team building.</p>
<hr />
<p>Thank you for taking the time to read this article. Your interest and support are highly appreciated. I am eager to share more insights with you in the future. Please stay tuned for more updates.</p>
]]></content:encoded></item><item><title><![CDATA[Why Your AWS Deep Learning AMI is Holding You Back and How to Fix]]></title><description><![CDATA[If you're exploring your options for Deep Learning on AWS, you've likely considered using Deep Learning AMIs (Amazon Machine Images) to simplify your setup. Although pre-configured environments can be a good starting point and look like a no-brainer,...]]></description><link>https://mirzabilal.com/why-your-aws-deep-learning-ami-is-holding-you-back-and-how-to-fix</link><guid isPermaLink="true">https://mirzabilal.com/why-your-aws-deep-learning-ami-is-holding-you-back-and-how-to-fix</guid><category><![CDATA[AWS]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[NVIDIA]]></category><category><![CDATA[cuda]]></category><category><![CDATA[ec2]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Sat, 09 Sep 2023 16:37:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1694277041524/172e7510-80c9-418d-9bb7-b46429d9075f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you're exploring your options for Deep Learning on AWS, you've likely considered using Deep Learning AMIs (Amazon Machine Images) to simplify your setup. Although pre-configured environments can be a good starting point and look like a no-brainer, but they have several limitations that will haunt you in the long run.</p>
<h2 id="heading-the-bloatware-problem">The Bloatware Problem</h2>
<p>Deep Learning Amazon Machine Images (DLAMIs) come pre-installed with a plethora of applications, frameworks, and libraries. You will not need many of them in your production environment, and sometimes not even in development.</p>
<h2 id="heading-outdated-drivers-and-toolkits">Outdated Drivers and Toolkits</h2>
<p>Your favorite deep learning framework released a new version that offers a valuable addition to your application. You are eager to start using it, but unfortunately discover that your toolkit and drivers are outdated, dampening your enthusiasm. Now you're locked into older drivers, an older toolkit, and an older framework, missing out on the new feature you were so excited about.</p>
<h2 id="heading-dependency-hell">Dependency Hell</h2>
<p>Installing required modules or libraries for your application can be challenging with these DLAMIs. You may encounter an issue where the module you are attempting to install requires version 2 of <code>XYZ</code>, but you only have version <code>1.5</code> installed. This should be resolved by simply updating <code>XYZ</code>. However, upon attempting that, you may find that another application or library, <code>ABC</code>, requires the older <code>XYZ</code>. When you try to remove <code>ABC</code>, which your application does not need, you discover to your surprise that yet another application depends on it, and this chain of dependencies seems never-ending.</p>
<h2 id="heading-limited-architecture-support">Limited Architecture Support</h2>
<p>Suppose you want to leverage cost-effective instances like <code>g5g.xlarge</code> for deep learning inference. In that case, you're out of luck, because no Deep Learning AMIs support them, or the only option available ships an older OS or outdated build tools. For ARM-based instances especially, your choices are minimal. For example, the only DLAMI available for the mentioned instance family is the NVIDIA DLAMI, built on top of the older Ubuntu 20.04.</p>
<h2 id="heading-solution">Solution</h2>
<p>Frustrated with these limitations ourselves, we've developed an automated, customizable script that sets up a high-performing deep learning environment on AWS EC2. The script downloads the latest NVIDIA drivers, the CUDA 12.2 toolkit, and the cuDNN library. It uses the latest <strong>Amazon Linux 2023</strong> as its base AMI, which offers several advantages, including support for the Linux 6 kernel and more recent versions of GCC and other build tools and utilities. The script also clones PyTorch and compiles it from source to ensure you have the latest CUDA device support.</p>
<h2 id="heading-performance-and-cost-benefits">Performance and Cost Benefits</h2>
<p>The customization allows for a lean, performance-optimized setup with a minimal footprint. Because the script compiles PyTorch from source after cloning the official repository, you get hardware-specific optimizations and up-to-date code, resulting in better performance and security compared to a pre-built PyTorch module. And if your compute tasks can tolerate interruptions, or you can design your application with failover in mind, you can take advantage of spot instances, which are incredibly cost-effective at as low as <code>$0.152</code> per hour these days.</p>
<h2 id="heading-want-the-full-step-by-step-guide-dive-in-here">Want the Full Step-By-Step Guide? Dive In Here!</h2>
<p>For a comprehensive guide addressing these problems and access to this game-changing script, check out our complete guide at <a target="_blank" href="https://jumpshare.com/blog/deep-learning-on-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2/">Deep Learning on AWS Graviton2, NVIDIA Tensor T4G for as Low as Free with CUDA 12.2</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Deep Learning with “AWS Graviton2 + NVIDIA Tensor T4G” for as low as free* with CUDA 12.2]]></title><description><![CDATA[* The “as low as free” tagline is based on *g5g.xlarge* spot instance rates, which have been as low as $0.1519/hr.

Introduction
The world we live in today heavily relies on artificial intelligence. From vacuum bots to sales support, from self-drivin...]]></description><link>https://mirzabilal.com/deep-learning-with-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2-56d8457a6f6d</link><guid isPermaLink="true">https://mirzabilal.com/deep-learning-with-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2-56d8457a6f6d</guid><category><![CDATA[AWS]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[cuda]]></category><category><![CDATA[NVIDIA]]></category><category><![CDATA[Cloud Computing]]></category><dc:creator><![CDATA[Mirza Bilal]]></dc:creator><pubDate>Mon, 04 Sep 2023 08:02:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1694281107265/f8995248-8f13-4cfc-9910-0a5f52b188e9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>* The “as low as free” tagline is based on <code>*g5g.xlarge*</code> spot instance rates, which have been as low as $0.1519/hr.</p>
</blockquote>
<h1 id="heading-introduction">Introduction</h1>
<p>The world we live in today heavily relies on artificial intelligence. From vacuum bots to sales support, from self-driving cars to disease detection, from finding the content you want to consume to translating from a foreign language to your native one. AI is behind every great product out there, and the need for an efficient, cost-effective, and scalable deep learning architecture has never been more critical.</p>
<p>The G5g instances, powered by Amazon’s own Graviton2 processor and featuring NVIDIA T4G Tensor Core GPUs, are a cost-effective alternative to Intel- and AMD-powered instances for deploying deep learning applications.</p>
<h1 id="heading-the-dilemma">The Dilemma</h1>
<p>AWS offers a robust, powerful, cost-effective architecture for running artificial intelligence and deep learning tasks. One of the advantages is the option to use spot instances, which can be far more cost-effective, at times up to 70% cheaper than on-demand instances.</p>
<p>For example, the spot pricing history for the <code>g5g.xlarge</code> instance in various “us-east” zones ranged from <strong>$0.1720</strong> to <strong>$0.1519</strong> per hour for the past three months. These rates are tempting, but at the time of writing, no official <strong>Amazon Linux 2023</strong> Deep learning AMI is available for the Amazon <strong>G5g</strong> instances family. Setting up the environment can be cumbersome: finding drivers, the correct dev toolchain, and a pre-compiled PyTorch module supporting the latest DL toolkit.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694281084800/54ba784f-cd64-4e6c-b843-b0c0132ed617.png" alt /></p>
<p>Spot price history for g5g.xlarge for the last three months.</p>
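<p>Spot prices change constantly, so it is worth checking them yourself before launching. One way to do this (a sketch using the AWS CLI; it assumes a configured CLI and GNU <code>date</code>, as found on Amazon Linux) is:</p>

```bash
# Query the last three months of spot prices for g5g.xlarge in us-east-1.
# Requires AWS credentials with ec2:DescribeSpotPriceHistory permission.
aws ec2 describe-spot-price-history \
    --instance-types g5g.xlarge \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u -d '3 months ago' +%Y-%m-%dT%H:%M:%S)" \
    --region us-east-1 \
    --query 'SpotPriceHistory[].[AvailabilityZone,SpotPrice,Timestamp]' \
    --output table
```

<p>The per-availability-zone breakdown helps you pick the cheapest zone to request spot capacity in.</p>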
<h1 id="heading-navigating-the-challenge-a-how-to-guide">Navigating the Challenge — A How-To Guide</h1>
<p>This guide aims to bridge the gap by offering comprehensive step-by-step instructions suitable for newcomers and seasoned data scientists alike. The goal is to enable you to leverage these state-of-the-art technologies at a meager cost, without the hassle of finding the right drivers and packages for the G5g family. Eventually, we will compile all the individual steps into a single script that further streamlines the process.</p>
<h2 id="heading-1-launching-an-instance">1. Launching an Instance</h2>
<p>For setting up an instance, we’ll use a <code>g5g.4xlarge</code> instance. The idea behind using a more powerful instance is to accelerate compilation. We will launch the build instance with the AWS Command Line Interface (AWS CLI).</p>
<p>First, set the following environment variables:</p>
<ul>
<li><p><code>REGION</code>: Specifies the AWS region, e.g., ‘us-east-1’.</p>
</li>
<li><p><code>SECURITY_GROUPS</code>: Your security group ID(s).</p>
</li>
<li><p><code>KEY_PAIR</code>: The name of your SSH key pair.</p>
</li>
<li><p><code>SUBNET</code>: The ID of your subnet.</p>
</li>
</ul>
<p>If you are unsure about any of these values, refer to the <a target="_blank" href="https://docs.aws.amazon.com/vpc/latest/userguide/security-groups.html">security group,</a> <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html">keypair,</a> and <a target="_blank" href="https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html">subnets</a> documentation.</p>
<p>Once you have these values, you can set the variables like this:</p>
<pre><code class="lang-bash">export REGION='us-east-1'
export SECURITY_GROUPS='YourFirstSecurityGroupIdsHere'
export KEY_PAIR='YourSSHKeyNameHere'
export SUBNET='YourSubnetHere'
</code></pre>
<p>Next, we need to find the latest <strong>Amazon Linux 2023 AMI ID</strong> so you will get the latest AMI every time you run this script. The following command will fetch the AMI ID and store it as <code>AMI_ID</code>.</p>
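<p>One way to do this (a sketch; it relies on the public SSM parameter path that AWS maintains for Amazon Linux 2023 arm64 images, so verify the path against current AWS documentation) is to query the SSM Parameter Store:</p>

```bash
# Fetch the latest Amazon Linux 2023 arm64 (Graviton) AMI ID from the
# public SSM parameter AWS maintains, and export it as AMI_ID.
export AMI_ID=$(aws ssm get-parameters \
    --names /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64 \
    --region $REGION \
    --query 'Parameters[0].Value' \
    --output text)
echo "Latest AL2023 arm64 AMI: $AMI_ID"
```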
<p>Let’s launch the instance using the AMI ID we retrieved earlier by executing:</p>
<pre><code class="lang-bash">aws ec2 run-instances \
--image-id <span class="hljs-variable">$AMI_ID</span> \
--instance-type g5g.4xlarge \
--key-name <span class="hljs-variable">$KEY_PAIR</span> \
--subnet-id <span class="hljs-variable">$SUBNET</span> \
--security-group-ids <span class="hljs-variable">$SECURITY_GROUPS</span> \
--region <span class="hljs-variable">$REGION</span> \
--block-device-mappings <span class="hljs-string">'[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":20,"VolumeType":"gp3"}}]'</span> \
--tag-specifications <span class="hljs-string">'ResourceType=instance,Tags=[{Key=Name,Value=AMI-Builder}]'</span>
</code></pre>
<p>This command initiates a <code>g5g.4xlarge</code> instance with the Latest Amazon Linux 2023 AMI ID. It also configures the instance to use the specified security groups, key pair, and subnet we provided in environment variables. We’ve also attached 20 GB of storage to the root device for downloading different libraries and PyTorch compilation.</p>
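<p>If you want to script the follow-up steps, one option (a sketch; the variable names are illustrative, and the storage and tag flags from the command above are omitted here for brevity) is to capture the instance ID from the same call and wait for the instance to come up:</p>

```bash
# Same run-instances call as above, but with a --query/--output filter
# added so the new instance's ID is captured in a variable.
INSTANCE_ID=$(aws ec2 run-instances \
    --image-id $AMI_ID \
    --instance-type g5g.4xlarge \
    --key-name $KEY_PAIR \
    --subnet-id $SUBNET \
    --security-group-ids $SECURITY_GROUPS \
    --region $REGION \
    --query 'Instances[0].InstanceId' \
    --output text)

# Block until the instance reaches the "running" state, then print its
# public IP, which is what you will SSH into later.
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID" --region "$REGION"
aws ec2 describe-instances --instance-ids "$INSTANCE_ID" --region "$REGION" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' --output text
```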
<h2 id="heading-2-installing-system-updates-and-required-packages">2. Installing System Updates and Required Packages</h2>
<p>When setting up any machine, be it local or in the cloud, it is always good practice to keep it updated. This part installs all the updates and the tools used for compilation and for running AI tasks.<br />But before going gung-ho, we recommend skimming the whole guide first and checking the complete script at the end of this tutorial, which should save you a lot of trouble.</p>
<p>First, let’s define some essential environment variables.</p>
<pre><code class="lang-bash">CUDA_HOME=/usr/<span class="hljs-built_in">local</span>/cuda
HOME_DIR=/home/ec2-user
</code></pre>
<p>Now, we’ll create a function called <code>install_utils</code> that carries out a series of tasks.</p>
<pre><code class="lang-bash"><span class="hljs-function"><span class="hljs-title">install_utils</span></span>() {
    <span class="hljs-comment"># Update all system packages to their latest versions</span>
    dnf -y update

    <span class="hljs-comment"># Install development tools, which include compilers and other utilities</span>
    dnf -y groupinstall <span class="hljs-string">"Development Tools"</span>

    <span class="hljs-comment"># Install the packages that are specifically required for our setup</span>
    dnf install -y openssl-devel cmake3 rust cargo
    dnf install -y amazon-efs-utils htop iotop yasm nasm jq python3-pip python-devel cronie cronie-anacron

    <span class="hljs-comment"># Add necessary paths to the .bashrc file</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"PATH=<span class="hljs-variable">$CUDA_HOME</span>/bin:\$PATH"</span> | sudo tee -a <span class="hljs-variable">$HOME_DIR</span>/.bashrc
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"LD_LIBRARY_PATH=<span class="hljs-variable">$CUDA_HOME</span>/lib64:\$LD_LIBRARY_PATH"</span> | sudo tee -a <span class="hljs-variable">$HOME_DIR</span>/.bashrc

    <span class="hljs-comment"># Configure shared libraries</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"/usr/local/lib"</span> | sudo tee /etc/ld.so.conf.d/usr-local-lib.conf
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"/usr/local/lib64"</span> | sudo tee -a /etc/ld.so.conf.d/usr-local-lib.conf
}
</code></pre>
<p>By running this <code>install_utils</code> function, you will have an updated OS and development tools needed in later steps.</p>
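<p>To apply it, the function can then be invoked with its output logged, for example (the log path here is an assumption, chosen to match the one the final script tails):</p>

```bash
# Run the updates/tooling setup as root and append everything to a log
# file that can be followed later over SSH with `tail -f`.
install_utils 2>&1 | tee -a /home/ec2-user/install.log
```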
<h2 id="heading-3-install-latest-nvidia-drivers-cuda-122-toolkit-and-cuda-deep-neural-network-library">3. Install Latest NVIDIA Drivers, CUDA 12.2 Toolkit, and Cuda Deep Neural Network library:</h2>
<p>In this step, we will install the NVIDIA GPU driver, the latest CUDA 12.2 toolkit, and the CUDA Deep Neural Network (cuDNN) library. This part uses the latest driver and toolkit released on August 29, 2023. If you are reading this later, you can update the URLs to the latest driver and libraries; everything else will be the same. Steps to find the latest driver, toolkit, and library are also mentioned below.</p>
<h3 id="heading-install-nvidia-gpu-driver"><strong>Install NVIDIA GPU Driver</strong></h3>
<p>To download and install the NVIDIA Tesla T4G driver, execute:</p>
<pre><code class="lang-bash">wget https://us.download.nvidia.com/tesla/535.104.05/NVIDIA-Linux-aarch64-535.104.05.run
sh NVIDIA-Linux-aarch64-535.104.05.run --disable-nouveau --silent
</code></pre>
<p>If everything goes smoothly, you should have a working NVIDIA driver, which can be checked by running the NVIDIA System Management Interface command <code>nvidia-smi</code> in the terminal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694281087225/b5efeedd-4b91-4ca0-ae70-07ae6d56be7f.png" alt class="image--center mx-auto" /></p>
<p>nvidia-smi — NVIDIA System Management Interface</p>
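<p>For scripted sanity checks, <code>nvidia-smi</code> can also emit machine-readable output instead of the table above; a minimal sketch:</p>

```bash
# Print just the GPU name and driver version as bare CSV, which is easier
# to assert on in automation than the default table layout.
nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
```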
<p>The latest drivers for NVIDIA Tesla T4G can be found <a target="_blank" href="https://www.nvidia.com/Download/Find.aspx">here</a> by selecting the following options.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694281089558/36dc8212-9ca8-4b4c-be1e-0c855c6ff610.png" alt /></p>
<p>For guidance on selecting the correct driver, refer to the options above.</p>
<h3 id="heading-install-cuda-toolkit">Install CUDA Toolkit</h3>
<p>The next step involves downloading and installing the CUDA 12.2 toolkit, which can be done by running the following bash commands:</p>
<pre><code class="lang-bash">wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux_sbsa.run
sh cuda_12.2.2_535.104.05_linux_sbsa.run --silent --override \
--toolkit --samples --toolkitpath=/usr/<span class="hljs-built_in">local</span>/cuda-12.2 \
--samplespath=<span class="hljs-variable">$CUDA_HOME</span> --no-opengl-libs
</code></pre>
<p>To find the latest version, visit <a target="_blank" href="https://developer.nvidia.com/cuda-toolkit">NVIDIA’s developer page</a> and use the following selection.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694281091521/2d354046-5d71-422d-96fa-68c5e3c2c922.png" alt /></p>
<p>Follow the options above to choose the right CUDA Toolkit for your setup.</p>
<h3 id="heading-install-nvidia-cudar-deep-neural-network-library-cudnn">Install NVIDIA CUDA® Deep Neural Network library (cuDNN):</h3>
<p>Lastly, we’ll install the cuDNN library for “Server Base System Architecture (SBSA)”.</p>
<pre><code class="lang-bash">wget https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-8.9.4.25_cuda12-archive.tar.xz
tar -xf cudnn-linux-sbsa-8.9.4.25_cuda12-archive.tar.xz
cp -P cudnn-linux-sbsa-8.9.4.25_cuda12-archive/include/* <span class="hljs-variable">$CUDA_HOME</span>/include/
cp -P cudnn-linux-sbsa-8.9.4.25_cuda12-archive/lib/* <span class="hljs-variable">$CUDA_HOME</span>/lib64/
chmod a+r <span class="hljs-variable">$CUDA_HOME</span>/lib64/*
</code></pre>
<p>Latest cuDNN can be downloaded from <a target="_blank" href="https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/">here</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694281093543/e6a6e90a-40d7-46ab-bb79-4ef56f7a3ed6.png" alt /></p>
<p>List of available cuDNN sbsa libraries for CUDA 11 and CUDA 12.</p>
<p>By combining all three, we will have the following function, which we will use in the final script as well.</p>
<pre><code class="lang-bash"><span class="hljs-function"><span class="hljs-title">setup_gpu</span></span>() {
    wget https://us.download.nvidia.com/tesla/535.104.05/NVIDIA-Linux-aarch64-535.104.05.run
    sh NVIDIA-Linux-aarch64-535.104.05.run --disable-nouveau --silent

    wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux_sbsa.run
    sh cuda_12.2.2_535.104.05_linux_sbsa.run --silent --override --toolkit --samples --toolkitpath=/usr/<span class="hljs-built_in">local</span>/cuda-12.2 --samplespath=<span class="hljs-variable">$CUDA_HOME</span> --no-opengl-libs

    wget https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-8.9.4.25_cuda12-archive.tar.xz
    tar -xf cudnn-linux-sbsa-8.9.4.25_cuda12-archive.tar.xz
    cp -P cudnn-linux-sbsa-8.9.4.25_cuda12-archive/include/* <span class="hljs-variable">$CUDA_HOME</span>/include/
    cp -P cudnn-linux-sbsa-8.9.4.25_cuda12-archive/lib/* <span class="hljs-variable">$CUDA_HOME</span>/lib64/
    chmod a+r <span class="hljs-variable">$CUDA_HOME</span>/lib64/*
    ldconfig
}
</code></pre>
<h2 id="heading-4-compiling-and-installing-cuda-122-enabled-pytorch">4. Compiling and Installing CUDA 12.2 enabled PyTorch</h2>
<p>Next, we will compile and install PyTorch from source with the latest CUDA support for ARM-based EC2 instances, along with all the necessary Python packages.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Download and install ccache for faster compilation</span>
wget https://github.com/ccache/ccache/releases/download/v4.8.3/ccache-4.8.3.tar.xz
tar -xf ccache-4.8.3.tar.xz
<span class="hljs-built_in">pushd</span> ccache-4.8.3
cmake .
CPUS=$(nproc) <span class="hljs-comment"># use all available CPU cores</span>
make -j <span class="hljs-variable">$CPUS</span>
make install
<span class="hljs-built_in">popd</span>
<span class="hljs-comment"># Install NumPy, a dependency for PyTorch</span>
dnf install -y numpy
<span class="hljs-comment"># Install Python typing extensions for better type-checking</span>
sudo -u ec2-user pip3 install typing-extensions
<span class="hljs-comment"># Clone PyTorch repository and install from source</span>
git <span class="hljs-built_in">clone</span> --recursive https://github.com/pytorch/pytorch.git
<span class="hljs-built_in">pushd</span> pytorch
python3 setup.py install
<span class="hljs-built_in">popd</span>
<span class="hljs-comment"># Refresh the dynamic linker run-time bindings</span>
ldconfig
<span class="hljs-comment"># Install additional Python libraries for PyTorch</span>
sudo -u ec2-user pip3 install sympy filelock fsspec networkx
</code></pre>
<h2 id="heading-5-test-your-installation">5. Test Your Installation</h2>
<p>After you’ve gone through the installation process, you’ll want to ensure that PyTorch and CUDA are working as expected. Run the following command to test the setup.</p>
<pre><code class="lang-bash">python3 -c <span class="hljs-string">"import torch; print('Using device: ', torch.device('cuda' if torch.cuda.is_available() else 'cpu'))"</span>;
</code></pre>
<p>If the device returned is ‘cuda’, then congratulations, you’ve successfully installed PyTorch with the latest CUDA support!</p>
<h1 id="heading-complete-script-for-effortless-setup">Complete script for effortless Setup 🪄</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694281095562/c1aea5d5-7033-478a-bb8a-afb63a6f0a4c.png" alt /></p>
<p>Ready for some magic? Before getting started, ensure that your AWS CLI is properly configured. If you haven’t done this, refer to the <a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html">AWS documentation</a> to get up to speed. You will also need to gather the IDs for your <a target="_blank" href="https://docs.aws.amazon.com/vpc/latest/userguide/security-groups.html">security group</a> and <a target="_blank" href="https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html">subnet</a> and the name of your <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html">key pair</a>.</p>
<p>Once you have completed the necessary preparations, run the provided script. This will launch a g5g.4xlarge instance pre-loaded with user data, which initiates the installation process upon launch. The entire setup process should take approximately an hour to complete. However, you can monitor the progress as it goes. To begin, SSH into your newly launched instance.</p>
<pre><code class="lang-bash">ssh -i <span class="hljs-string">"your-key-pair.pem"</span> ec2-user@your-instance-ip
</code></pre>
<p>Then, run the following command to monitor the installation in real-time:</p>
<pre><code class="lang-bash">tail -f /home/ec2-user/install.log
</code></pre>
<p>The complete script can be downloaded from <a target="_blank" href="https://gist.github.com/bilalmughal/0500f27454a508bd3552fcf03e3adadb">GitHub</a> and is shown below.</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="0500f27454a508bd3552fcf03e3adadb"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/bilalmughal/0500f27454a508bd3552fcf03e3adadb" class="embed-card">https://gist.github.com/bilalmughal/0500f27454a508bd3552fcf03e3adadb</a></div><p> </p>
<p>Once everything is done, you should see the following greeting.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694281097983/c86a4cf2-3889-421f-a6b0-de82401bdb69.png" alt /></p>
<h2 id="heading-using-aws-management-console">Using AWS Management Console</h2>
<p>You can also use the AWS Management Console for this process. All you need to do is choose “Launch an instance” from the EC2 console and then select the right AMI, architecture, and instance type, along with the networking and security configurations you would use when launching any other instance. Don’t forget to increase the volume size to 20 GB as well.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694281100543/dbc15d5a-6099-46d6-aa28-9839bf1bcce7.png" alt /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694281101996/e83cfb1d-7f5b-447b-aeed-1273c1e0a140.png" alt /></p>
<p>After selecting the right AMI, architecture, instance type, storage, and other options, configure your instance’s User Data by adding custom setup commands that will run during launch.<br />To add User Data, expand the ‘Advanced Details’ section during the ‘Configure Instance’ stage and paste the portion of the script from the GitHub repository between the ‘EOF’ markers into the User Data text area.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694281103445/470696bd-58d1-4059-acf7-c99c98898ab2.png" alt /></p>
<p>Remember, this User Data script is what automates your deep learning setup, so don’t skip this step!</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>And there you have it! A one-stop solution to make your deep learning setup on an Amazon EC2 Graviton2 ARM-based instance much easier. After following these steps, you can create an AMI (Amazon Machine Image) and reuse it for deep-learning tasks. You should also try out spot instances for your interruptible AI inference workloads, as they could save you a lot on operational costs!<br />With this guide, we made configuration and setup hassle-free so you can dive straight into the work that matters most to you. If you find this script as helpful as we do, we would love to hear about the exciting projects it’s helping you accomplish. Feel free to share your success stories and any ingenious modifications you’ve made. Happy coding!</p>
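<p>As a rough sketch, baking the configured instance into an AMI and later launching an interruptible spot instance from it could look like this. The instance ID, image ID, and name below are placeholders, not values from this guide.</p>
<pre><code class="lang-bash"># Placeholder IDs; replace with your own instance ID and the resulting AMI ID.
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "graviton2-dl-cuda" \
  --description "Graviton2 + T4G deep learning setup"

# Later, launch an interruptible spot instance from that AMI.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type g5g.xlarge \
  --instance-market-options 'MarketType=spot'
</code></pre>
<p>Baking an AMI means the hour-long installation runs only once; every subsequent spot instance boots ready to serve inference.</p>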
<h3 id="heading-pro-tip-max-power-min-price-the-g5g-magic-equation">💡 Pro Tip: Max Power, Min Price — The G5G Magic Equation!</h3>
<p>Did you know the <code>g5g.xlarge</code>, <code>g5g.2xlarge</code>, <code>g5g.4xlarge</code>, and <code>g5g.8xlarge</code> all have the same GPU power? If increasing the CPU power or adding more memory doesn’t significantly improve performance for your application, you can stick with the <code>g5g.xlarge</code> to save some money!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694281105312/4ce4e239-9288-4620-8838-279ccadf749f.png" alt /></p>
<p>G5g Instance specification details.</p>
<h2 id="heading-about-the-author-and-our-journey-at-jumpshare">About the Author and Our Journey at Jumpshare</h2>
<p>I have been part of the tech industry for 18 years, serving in different roles and devising different engineering solutions throughout. The ever-changing landscape of the tech world and the challenges it brings excite me, especially in the areas of cloud computing and machine learning.</p>
<p>At <a target="_blank" href="https://jumpshare.com"><strong>Jumpshare</strong></a><strong>,</strong> where I hold the position of VP of Engineering, we have successfully turned these challenges into opportunities. We’re passionate about implementing techniques like this to make our machine learning inference tasks more cost-effective. By leveraging the power of AWS Graviton2 and NVIDIA T4G Tensor Core GPU instances, we’ve been able to drastically reduce operational costs without compromising performance.</p>
<p>This guide is another effort to share our experience and insights with the community, as we strongly believe that democratizing technology and saving costs on infrastructure can unlock doors to innovation.</p>
<p>We’re always open to hearing about your own experiences and improvements on the journey towards cost-effective, high-performance deep learning.</p>
<p><em>This article was originally published on</em> <a target="_blank" href="https://jumpshare.com/blog/deep-learning-on-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2/"><em>Jumpshare.com</em></a></p>
<h2 id="heading-resources">Resources</h2>
<p><a target="_blank" href="https://jumpshare.com/blog/deep-learning-on-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2/">https://jumpshare.com/blog/deep-learning-on-aws-graviton2-nvidia-tensor-t4g-for-as-low-as-free-with-cuda-12-2/</a><br /><a target="_blank" href="https://www.nvidia.com/Download/Find.aspx">https://www.nvidia.com/Download/Find.aspx</a><br /><a target="_blank" href="https://developer.nvidia.com/cuda-toolkit">https://developer.nvidia.com/cuda-toolkit</a><br /><a target="_blank" href="https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/">https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/</a><br /><a target="_blank" href="https://instances.vantage.sh/?selected=g5g..x%7Cg5g.x">https://instances.vantage.sh/?selected=g5g..x|g5g.x</a><br /><a target="_blank" href="https://docs.aws.amazon.com/vpc/latest/userguide/security-groups.html">https://docs.aws.amazon.com/vpc/latest/userguide/security-groups.html</a><br /><a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html">https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html</a><br /><a target="_blank" href="https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html">https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html</a><br /><a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html">https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html</a></p>
]]></content:encoded></item></channel></rss>