Saturday, August 10, 2019

Autoscaling: Azure HDInsight Cluster


Introduction

Scaling an HDInsight Spark cluster out and in is often required because variable workloads are executed at specific intervals. This strategy helps optimize cost and Azure resource usage. In this article we will discuss the various options available to perform autoscaling.


From Azure Portal

Schedule-based scale out/in of an HDI cluster is possible from the Azure portal itself: from the HDI settings, select “Cluster Size”.

HDI Spark now has an “Enable Autoscale” feature available in preview mode. There are two options available under it: 1) Load-based, and 2) Schedule-based.

Load-Based


Put simply, it autoscales the cluster based on the amount of CPU cores and memory required to complete the pending jobs. If the CPU cores and memory required exceed those currently available, it triggers autoscaling (up/down) accordingly.

Schedule-Based

With schedule-based autoscaling, we need to configure the autoscale schedule as displayed:




Once the schedule is configured, autoscaling will happen as specified:


Conclusion

This example showcases one particular way to autoscale an HDInsight cluster from the Azure portal itself. There are various other custom approaches that achieve similar benefits using the Azure CLI or PowerShell; we will discuss those options in the next post.
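As a quick preview of the CLI route, a cluster's worker-node count can also be changed with a single command. This is a hedged sketch, not a full walkthrough: the resource-group and cluster names are placeholders, and the exact parameter name may vary across Azure CLI versions.

```shell
# Manually resize an HDInsight cluster's worker nodes via the Azure CLI.
# "my-rg" and "my-hdi-cluster" are placeholder names -- substitute your own.
# (Older CLI versions used --target-instance-count instead of --workernode-count.)
az hdinsight resize \
    --resource-group my-rg \
    --name my-hdi-cluster \
    --workernode-count 5
```

Scheduling a command like this (for example from a cron job or an Azure Automation runbook) is one way to emulate schedule-based autoscaling outside the portal.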

Wednesday, August 7, 2019

Docker - mounting an external disk

This is a short tutorial in which I will briefly explain how to attach an external drive to a Docker container. I stumbled on an issue after I had attached and mounted an external drive to my AWS Linux VM instance. This external drive is supposed to hold all my data (a data disk/drive of sorts) so that I can keep my OS disk isolated.

The actual issue was that when I tried to run the Docker image in a container with the current directory mounted, using:
sudo docker run -i -t --rm -v $(pwd):/remote:rw [image_name] /bin/bash
This error was thrown:
docker: Error response from daemon: error while creating mount source path '/remotedata/ws': mkdir /remote: read-only file system
Here, '/remotedata/ws' is a folder on my mounted external drive (with the drive mounted at '/remotedata'), and '/remote' is the path it maps to inside the container.

I soon realized the problem was that I had mounted the external disk at a different mount point, /remotedata in my case. So to solve the problem, I mounted the external drive again under '/mnt/remotedata'. After that, I also had to change the directory permissions to "766" using chmod, and lastly I made the current user the default owner of '/mnt/remotedata'.
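The fix described above can be sketched as the following sequence of commands. This is an illustrative outline, not a verbatim transcript: the device name /dev/xvdf is a placeholder (check yours with lsblk), and the image name must be substituted.

```shell
# Remount the data disk under /mnt, then fix permissions and ownership
# so Docker can create the bind-mount source path.
sudo umount /remotedata
sudo mkdir -p /mnt/remotedata
sudo mount /dev/xvdf /mnt/remotedata   # /dev/xvdf is a placeholder device

sudo chmod 766 /mnt/remotedata
sudo chown "$USER" /mnt/remotedata

# Run the container again from the new mount point.
cd /mnt/remotedata
sudo docker run -i -t --rm -v "$(pwd)":/remote:rw [image_name] /bin/bash
```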

Hence, once again running the command as:
sudo docker run -i -t --rm -v $(pwd):/remote:rw [image_name] /bin/bash
successfully started the container as desired. Here, $(pwd) resolves to '/mnt/remotedata'.

Just to conclude here:
The issue "docker: Error response from daemon: error while creating mount source path '/remotedata/ws': mkdir /remote: read-only file system" was resolved by remounting the external disk under '/mnt' and by modifying the permissions and ownership.

Wednesday, April 24, 2019

Azure Service Fabric & Azure Kubernetes Service: Comparative Analysis

Hello folks! In today's blog I will highlight the key differences between Microsoft Azure's two widely popular container orchestration services. Please remember that these differences are as observed at the time of writing; cloud services keep evolving, and the points described here may not remain accurate after some time.
So let's start rolling ---- 
For those who don't yet know Azure Service Fabric (ASF) and Azure Kubernetes Service (AKS), I strongly recommend looking at the starting points below to get to know these services at a high level.


Azure Service Fabric, from Microsoft's documentation:

Trust a proven platform for mission-critical applications
Focus on building applications and business logic, and let Azure solve the hard distributed systems problems such as reliability, scalability, management, and latency. Service Fabric is an open source project and it powers core Azure infrastructure as well as other Microsoft services such as Skype for Business, Intune, Azure Event Hubs, Azure Data Factory, Azure Cosmos DB, Azure SQL Database, Dynamics 365, and Cortana. Designed to deliver highly available and durable services at cloud-scale, Azure Service Fabric intrinsically understands the available infrastructure and resource needs of applications, enabling automatic scale, rolling upgrades, and self-healing from faults when they occur.

Choose from a variety of productive programming models and languages including .NET Core 2.0, C#, and Java to build your microservice and container-based applications.



Azure Kubernetes Service, from Microsoft's documentation:

Deploy and manage Kubernetes with ease, scale and run applications with confidence, secure your Kubernetes environment, and accelerate containerized application development.


Brief Overview

Azure Service Fabric - 

  • High affinity with Microsoft's products and tools (e.g., Visual Studio).
  • It is an orchestration engine and a framework for building microservices, mostly using .NET. It also includes and supports its own programming models.
  • Applications do not necessarily have to be hosted in containers.
  • Does not support as-is deployment of traditional ASP.NET applications.
  • It is typically a PaaS offering.
  • Supports hybrid deployments, running Service Fabric applications on Azure and on-premises.

Azure Kubernetes Service - 

  • It is both a PaaS and an IaaS offering.
  • The configuration is a bit more complex because you have to define all components in your cluster, such as load balancers and endpoints. In Service Fabric, more of this is done for you automatically.
  • Store images in Docker Hub or Azure Container Registry and deploy to your preferred targets.
  • When you use an AKS cluster, the Kubernetes master nodes are hidden from you and operated by Microsoft; the only VM resources you see are the worker nodes, which are still IaaS and need to be managed and updated as usual. In AKS you pay only for the worker nodes; the managed part is free.
  • It can easily be used for lift-and-shift architectural styles and large-scale projects.
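The image workflow mentioned above (push to a registry, deploy to AKS) can be sketched roughly as follows. This is a hedged outline rather than an official procedure; the registry name, image name, resource group, cluster name, and manifest file are all placeholders.

```shell
# Build an image, push it to Azure Container Registry, and deploy to AKS.
# "myregistry", "myapp", "my-rg", "my-aks-cluster", and deploy.yaml
# are placeholder names -- substitute your own.
az acr login --name myregistry
docker build -t myregistry.azurecr.io/myapp:v1 .
docker push myregistry.azurecr.io/myapp:v1

# Point kubectl at the AKS cluster and apply a manifest referencing the image.
az aks get-credentials --resource-group my-rg --name my-aks-cluster
kubectl apply -f deploy.yaml
```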


Key Similarities

Native Cloud Features


  • Open-source, cloud-native foundation standards
  • Powerful native cluster orchestration, cluster management, auto-scaling, and self-healing
  • Open-source interoperability (when using Service Fabric with .NET Core)
  • Azure cloud-native features: Log Analytics, managed service identity, encryption

Architectural Patterns

  • Microservice and event-driven architectures
  • Multi-tenancy, orchestration, discovery, advanced clustering
  • Lift-and-shift architectural styles and large-scale projects
  • Suited for large-scale, complex production apps

DevOps, Language & Tools

  • Standard support for widely used DevOps methods and tooling, e.g., Jenkins and Helm
  • Support for remote debugging


Comparative Analysis

The following is a gist of the overall key differences identified and experienced so far while using these two:


Feature | Azure Service Fabric | Azure Kubernetes Service
Cloud feature | Native dev framework: "stateful/stateless reactive" & "12-factor applications" | It's a container orchestrator
  | PaaS: infrastructure is abstracted | IaaS & PaaS: low-level control over infrastructure (resource quotas, infra placement)
Legacy migration | Refactor legacy .NET applications; unsupported NuGet packages | Rewrite / refactor / lift-and-shift
Containerization | Containers are just guest executables | Containers are the primary need
Language/platform and tools | Primarily open-source support for .NET Core | Language agnostic
DevOps | Affinity with MS tools and stack, e.g., Visual Studio | Integration with a wide range of open-source tools and extensions
Vendor support | Indirect vendor lock-in; SF is not a managed service with any other cloud provider | Supported by all leading cloud providers
Community support | Limited and MS-dependent | Extensive and mature
Practical challenges | Remote debugging is tedious; inaccessible logs to diagnose errors and issues; coordination with MS to raise and resolve issues (service tickets and difficult SLAs); inconsistent service upgrades from local to portal |

Conclusion

In this article, I have tried to explain the key differences and basic similarities between these two container orchestration services (or frameworks) provided by Microsoft Azure. I hope this helps you decide between the two depending on the technical and architectural needs of your application. Feel free to leave suggestions in the comments section below.

Wednesday, January 30, 2019

Quick steps to setup Ubuntu GUI & RDP from Windows Machine

Introduction

This post will mainly discuss provisioning an Ubuntu machine on Amazon AWS, setting up a GUI on the machine, and finally being able to RDP into it from a Windows machine.

The approach really helped me cut the cost of running my machine learning experiments by using an Ubuntu (Linux) machine rather than a Windows machine. It reduced the bill by a whopping 50%.

Step 1: Provision an Ubuntu machine on AWS

As explained in my earlier post, we can acquire an Ubuntu machine on Amazon AWS by following the documentation. The tutorial also helps with the process of connecting to the new machine using the PuTTY tool.

Step 2: Setup GUI on Ubuntu

Once we have successfully logged in and connected to the Ubuntu machine, execute the following commands in sequence:

Command 1: sudo apt update
Command 2: sudo apt upgrade 
Command 3: sudo sed -i 's/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
Command 4: sudo /etc/init.d/ssh restart 
Command 5: sudo passwd ubuntu (this will prompt for a password reset for user 'ubuntu'. Remember it!)
Command 6: sudo apt install xrdp xfce4 xfce4-goodies tightvncserver
Command 7: echo xfce4-session > /home/ubuntu/.xsession 
Command 8: sudo cp /home/ubuntu/.xsession /etc/skel 
Command 9:  sudo sed -i '0,/-1/s//ask-1/' /etc/xrdp/xrdp.ini 
Command 10: sudo service xrdp restart 
Command 11: reboot


Step 3: Configure and Save Connections from Putty

Enable tunneling from PuTTY to map localhost port 8888 (just an example) to the Ubuntu machine's RDP port 3389. Here we also need to use the private IP of the remote machine and map its port 3389 to the tunnel. Save the connection so it can be reused to connect to the machine later.
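For those who prefer the command line over the PuTTY GUI, the equivalent tunnel can be opened with OpenSSH. This is a hedged sketch: the key file, private IP, and public hostname are placeholders to be replaced with your instance's values.

```shell
# Forward local port 8888 to the remote machine's RDP port 3389,
# using the instance's private IP as the tunnel target.
# key.pem, 172.31.x.x, and the public hostname are placeholders.
ssh -i key.pem -L 8888:172.31.x.x:3389 ubuntu@ec2-public-hostname.compute.amazonaws.com
```

While this session stays open, an RDP client pointed at localhost:8888 reaches the remote desktop, just as with the saved PuTTY configuration.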



Step 4: Connecting Via Windows machine

Connect to the remote machine again by loading the saved PuTTY configuration above.
Once connected via PuTTY, use RDP (Run -> mstsc.exe) to connect to the remote machine by entering localhost:8888 in the ‘Computer’ field. This is the port I used to enable tunneling via the local computer.





Once connected, it will prompt for the ‘ubuntu’ password which we configured in Step 2, Command 5. Upon successful login, we will be connected to the Ubuntu GUI from our Windows machine via RDP.



Conclusion

Once I had set up my Ubuntu GUI machine, I installed Visual Studio Code as an IDE for my experiments and development, and later installed my ML Docker image to get a fully functional Python and TensorFlow development environment.
