Saturday, August 10, 2019

Autoscaling: Azure HDInsight Cluster


Introduction

Scaling an HDInsight Spark cluster out and in is often required because variable workloads are executed at specific intervals. This strategy helps optimize cost and Azure resource usage. In this article we will discuss the various options available to perform autoscaling.


From Azure Portal

Schedule-based HDI cluster scale out/in is possible from the Azure portal itself: in the HDI cluster settings, select “Cluster Size”.

HDI Spark now has an “Enable autoscale” feature available in preview mode. There are two options available under it: 1) Load-based, and 2) Schedule-based.

Load-Based


Simply put, the cluster is autoscaled based on the number of CPU cores and the amount of memory required to complete the pending jobs. If the required CPU cores and memory exceed what is currently available, Autoscale scales the cluster up, and when demand drops it scales back down accordingly.
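
If you want to see these pending-resource signals for yourself, the standard YARN ResourceManager REST API exposes the underlying cluster metrics. The sketch below is illustrative only; the gateway URL pattern is an assumption on my part, and the cluster name and admin login are placeholders:

# Placeholder cluster name -- replace with your own; curl will prompt
# for the cluster login (admin) password.
CLUSTER=mycluster
# /ws/v1/cluster/metrics is the standard YARN ResourceManager endpoint;
# it reports fields such as availableMB, availableVirtualCores and
# appsPending -- the same numbers load-based autoscale reacts to.
curl -sS -u admin "https://$CLUSTER.azurehdinsight.net/yarnui/ws/v1/cluster/metrics"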

Schedule-Based

With schedule-based autoscaling, we configure the autoscale schedule in the portal, specifying the days, times, and target number of worker nodes.

Once the schedule is configured, autoscaling will happen at the specified times.


Conclusion

This example showcases one particular way to autoscale an HDInsight cluster from the Azure portal itself. There are various other custom approaches to achieve similar benefits using the Azure CLI or PowerShell; we will discuss those options in the next post.
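
As a small taste, here is a minimal Azure CLI sketch for a one-off manual resize. The resource group and cluster names are placeholders, and the worker-node count parameter has been renamed across CLI versions, so check az hdinsight resize --help on your version:

# Scale the cluster to 4 worker nodes (names below are placeholders).
az hdinsight resize \
    --resource-group my-resource-group \
    --name my-hdi-cluster \
    --workernode-count 4

The equivalent PowerShell cmdlet is Set-AzHDInsightClusterSize; wiring either of these into a scheduler such as Azure Automation gives you a custom schedule-based scale.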

Wednesday, August 7, 2019

Docker - mounting an external disk

In this short tutorial, I will briefly explain how to attach an external drive to a Docker container. I stumbled on this issue after I had already attached and mounted an external drive to my AWS Linux VM instance. The external drive is supposed to hold all my data (a kind of data disk) so that I can keep my OS disk isolated.

The actual issue was that when I tried to run the Docker image with the current working directory bind-mounted, using:
sudo docker run -i -t --rm -v $(pwd):/remote:rw [image_name] /bin/bash
This error was thrown:
docker: Error response from daemon: error while creating mount source path '/remotedata/ws': mkdir /remote: read-only file system
Here, '/remotedata/ws' is a folder on the mounted external drive (the drive itself is mounted at '/remotedata'), and '/remote' is the path it is mapped to inside the container.

Soon I realized that the problem was that I had mounted the external disk at a different mount point, /remotedata in my case, directly under the root. To solve this, I remounted the external drive under '/mnt/remotedata'. After that, I also had to change the directory permissions to "766" using chmod, and lastly I made the current user the owner of the folder '/mnt/remotedata'.
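
For reference, the fix amounted to something like the following. The device name /dev/xvdf is an assumption for an AWS attached volume; confirm yours with lsblk first:

# List block devices to confirm the external disk's device name
# (/dev/xvdf below is an assumption).
lsblk
# Remount the external drive under /mnt instead of /remotedata.
sudo mkdir -p /mnt/remotedata
sudo mount /dev/xvdf /mnt/remotedata
# Relax the directory permissions and make the current user the owner.
sudo chmod 766 /mnt/remotedata
sudo chown $(whoami) /mnt/remotedata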

Hence, running the command once again:
sudo docker run -i -t --rm -v $(pwd):/remote:rw [image_name] /bin/bash
successfully started the container, as desired. Here, $(pwd) resolves to '/mnt/remotedata'.
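
As a quick, purely illustrative sanity check, you can write a file through the bind mount and confirm it lands on the external disk (this assumes the image provides the touch utility):

cd /mnt/remotedata
# Create a file inside the container via the /remote bind mount...
sudo docker run --rm -v $(pwd):/remote:rw [image_name] touch /remote/mount-test
# ...and verify it shows up on the external disk on the host.
ls -l /mnt/remotedata/mount-test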

Just to conclude:
The issue "docker: Error response from daemon: error while creating mount source path '/remotedata/ws': mkdir /remote: read-only file system" was resolved by mounting the external disk under '/mnt' and by fixing the permissions and ownership of the mount point.
