Pankaj Jainani Blog: January 2019

Wednesday, January 30, 2019

Quick steps to setup Ubuntu GUI & RDP from Windows Machine

Introduction

This post covers provisioning an Ubuntu machine on Amazon AWS, setting up a GUI on it, and finally connecting to it over RDP from a Windows machine.

This approach helped me cut the cost of my Machine Learning experiments by running them on an Ubuntu (Linux) machine rather than on a Windows machine. In my case it reduced the bill by a whopping 50%.

Step 1: Provision an Ubuntu machine on AWS

As explained in my earlier post, we can acquire an Ubuntu machine on Amazon AWS by following the AWS documentation. The same tutorial also explains how to connect to the new machine using the PuTTY tool.

Step 2: Setup GUI on Ubuntu

Once we have logged in and are connected to the Ubuntu machine, execute the following commands in sequence:

Command 1: sudo apt update

Command 2: sudo apt upgrade

Command 3: sudo sed -i 's/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config

Command 4: sudo /etc/init.d/ssh restart

Command 5: sudo passwd ubuntu (this prompts you to set a new password for user 'ubuntu'. Remember it!)

Command 6: sudo apt install xrdp xfce4 xfce4-goodies tightvncserver

Command 7: echo xfce4-session > /home/ubuntu/.xsession

Command 8: sudo cp /home/ubuntu/.xsession /etc/skel

Command 9: sudo sed -i '0,/-1/s//ask-1/' /etc/xrdp/xrdp.ini

Command 10: sudo service xrdp restart

Command 11: sudo reboot
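Commands 3 and 9 are in-place text substitutions, which are hard to read at a glance. The Python sketch below mirrors what the two sed expressions do, so their effect can be checked on sample text before touching the real files. The function names and the xrdp.ini excerpt are illustrative assumptions, not copied from an actual machine.

```python
import re


def enable_password_auth(sshd_config):
    """Mirror of Command 3: sed 's/^PasswordAuthentication no/PasswordAuthentication yes/'.

    re.MULTILINE makes ^ match at the start of every line, as sed does line by
    line, so a commented-out "#PasswordAuthentication no" is left untouched.
    """
    return re.sub(r"^PasswordAuthentication no", "PasswordAuthentication yes",
                  sshd_config, flags=re.MULTILINE)


def ask_for_port(xrdp_ini):
    """Mirror of Command 9: GNU sed '0,/-1/s//ask-1/'.

    The 0,/-1/ address covers the lines up to the first one containing "-1",
    and the empty pattern in s//ask-1/ reuses /-1/, so only the first "-1"
    in the whole file becomes "ask-1".
    """
    return xrdp_ini.replace("-1", "ask-1", 1)


# Illustrative excerpt of an older xrdp.ini session block (assumed content)
SAMPLE_XRDP_INI = (
    "[xrdp1]\n"
    "name=sesman-Xvnc\n"
    "lib=libvnc.so\n"
    "username=ask\n"
    "password=ask\n"
    "ip=127.0.0.1\n"
    "port=-1\n"
)

if __name__ == "__main__":
    print(ask_for_port(SAMPLE_XRDP_INI))
```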

Step 3: Configure and Save Connections from Putty

In PuTTY, enable SSH tunneling (Connection > SSH > Tunnels) from local port 8888 (just an example) to the Ubuntu machine's RDP port, 3389. As the destination, use the private IP of the remote machine with port 3389 (i.e. <private-ip>:3389). Save the session so you can reuse it to connect to the machine later.
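For readers who prefer a command line to the PuTTY dialog, the same tunnel can be expressed with the `-L` option of the OpenSSH `ssh` client. The Python sketch below only builds and prints that command to make the port mapping explicit. The host name, private IP and key file are placeholders, and note that OpenSSH expects the `.pem` key downloaded from AWS rather than PuTTY's `.ppk` file.

```python
def ssh_tunnel_command(public_host, private_ip, key_file,
                       local_port=8888, remote_port=3389, user="ubuntu"):
    """Build an OpenSSH command equivalent to the PuTTY tunnel in Step 3.

    -L local_port:private_ip:remote_port forwards connections made to
    localhost:local_port on the Windows machine, through the SSH session,
    to private_ip:remote_port (the xrdp port) as seen from the instance.
    -N keeps the session open for forwarding without starting a remote shell.
    """
    return [
        "ssh",
        "-i", key_file,
        "-N",
        "-L", f"{local_port}:{private_ip}:{remote_port}",
        f"{user}@{public_host}",
    ]


if __name__ == "__main__":
    # Placeholder values: replace with your instance's public DNS name,
    # private IP and key file.
    cmd = ssh_tunnel_command("ec2-203-0-113-10.compute-1.amazonaws.com",
                             "172.31.0.10", "my-key.pem")
    print(" ".join(cmd))
```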

Step 4: Connecting Via Windows machine

Connect to the remote machine again by loading (“Load”) the PuTTY session saved above.

While the PuTTY session is open, start the Remote Desktop client (Run -> mstsc.exe) and enter localhost:8888 in the ‘Computer’ field. 8888 is the local port I used for the tunnel in Step 3.
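Before launching mstsc.exe, it can help to confirm that the local end of the tunnel is actually listening. The small Python sketch below tries a TCP connection to localhost:8888. It only proves that PuTTY's local listener is up, not that xrdp on the remote side is running; the function name and the 3-second timeout are my own choices.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Tunnel listener is up: connect mstsc.exe to localhost:8888")
    else:
        print("Nothing is listening on localhost:8888; is the PuTTY session open?")
```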

Once connected, it will prompt for the password of user ‘ubuntu’ that we set in Step 2, Command 5. After a successful login, we are connected to the Ubuntu GUI from Windows via RDP.

Conclusion

Once my Ubuntu GUI machine was set up, I installed Visual Studio Code as the IDE for my experiments and development, and later installed my ML Docker image to get a fully functional Python and TensorFlow development environment.

Thursday, January 24, 2019

TensorFlow Docker setup on Ubuntu

Introduction

Continuing my series of experiments on setting up a TensorFlow development environment, in this post I will cover:

How to set up a TensorFlow Docker development environment on an Ubuntu machine?

This is one step beyond what I did to set up TensorFlow on a Windows machine, alongside the robust Visual Studio Code as the IDE for development.

Following are the steps which I used for the quick setup:

STEP 1: ACQUIRE THE UBUNTU MACHINE

I quickly acquired the latest Ubuntu Linux machine through my AWS account. You can use this quick tutorial from the AWS documentation to “Launch Instance” of type Ubuntu.

STEP 2: SETTING UP DOCKER

Once we have the machine, the next step is to set up Docker. This requires running the commands described in the tutorial: Get Docker CE for Ubuntu.

STEP 3: CREATE A DOCKER IMAGE

In the present working directory (PWD), create a file named Dockerfile (without any extension), copy the content below into it and save the file.

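FROM python:3.6
# Install git and unzip in one RUN so a cached "apt-get update" layer cannot go stale
RUN apt-get update -y && apt-get install -y git unzip
WORKDIR /remote
VOLUME /remote
ENV REPO ""
# update pip and install the build tools
RUN pip install pip --upgrade
RUN pip install wheel
RUN pip install -U pip virtualenv
# Create the virtualenv outside /remote: /remote is a volume, and the
# -v $(pwd):/remote bind mount used at run time would hide anything built there
RUN virtualenv --system-site-packages -p python /opt/tensorflow
# "sh activate" only affects a throwaway subshell, so put the virtualenv first
# on PATH instead; this applies to later RUN steps and to the running container
ENV PATH="/opt/tensorflow/bin:$PATH"
RUN pip install --upgrade pip && \
pip install --upgrade numpy && \
pip install --upgrade scipy && \
pip install --upgrade opencv-python && \
pip install --upgrade matplotlib && \
pip install --upgrade pandas && \
pip install --upgrade scikit-learn && \
pip install --upgrade scikit-image && \
pip install --upgrade tensorflow && \
pip install --upgrade keras && \
pip list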

Refer to the code in the repo link.

STEP 4: Build the Docker Image

Once we have the Dockerfile, build the image from the same directory (the trailing . is the build context):
$ sudo docker build -t di-ubuntu-py3-tf .

STEP 5: RUN THE DOCKER IMAGE INSIDE A CONTAINER

So far we have built the TensorFlow Docker image on the Ubuntu machine; the next step is to allocate compute resources to it by running the image inside a container, using the command below:

$ sudo docker run -i -t --rm -v "$(pwd)":/remote:rw di-ubuntu-py3-tf /bin/bash

The above command runs the TensorFlow Docker image inside a new container and opens a bash shell in it. The options used in the command do the following:

-t: assigns a pseudo-TTY (terminal) inside the new container
-i: keeps the standard input (STDIN) of the container open, allowing an interactive session
--rm: automatically removes the container when its process exits
-d: (not used above) runs the container detached, in the background
-v: mounts a HOST DIRECTORY on the Ubuntu machine (here the current directory) onto a CONTAINER DIRECTORY (here /remote, as defined in the Docker image); the :rw suffix makes the mount read-write

STEP 6: INSIDE THE CONTAINER

Once the container has started, we have a fully capable TensorFlow development environment on the Ubuntu (Linux) machine. The /remote directory, which the Dockerfile also defines as the container's working directory, is mapped to the current directory ($(pwd)) on the host machine. Files written to this mapped volume persist on the host even after the container is terminated and removed.
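A quick way to confirm that the image really contains the packages listed in the Dockerfile is a small script saved in the host directory and run from the container shell (for example as python check_env.py inside /remote). The sketch below is hypothetical: check_env.py is not part of the original repo. It imports each package by its import name, which differs from the pip name for opencv-python (cv2), scikit-learn (sklearn) and scikit-image (skimage).

```python
import importlib

# Import names of the packages installed by the Dockerfile
PACKAGES = ["numpy", "scipy", "cv2", "matplotlib", "pandas",
            "sklearn", "skimage", "tensorflow", "keras"]


def check_packages(names):
    """Map each import name to its __version__, "unknown", or None if it fails to import."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # ImportError, or any other error raised while importing
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in check_packages(PACKAGES).items():
        print("{:<12} {}".format(name, version or "NOT IMPORTABLE"))
```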

Conclusion

Quickly acquiring an Ubuntu machine from the AWS console and setting up Docker on top of it makes it easy to run various open-source Deep Learning framework images on the fly. Here, I demonstrated the process using my own custom-configured Docker image, which gives me a lot of flexibility and control over my development environment.

Wednesday, January 23, 2019

Preparing Visual Studio Code for TensorFlow Development

Introduction

Visual Studio Code is a popular open-source IDE distributed by Microsoft. It is a powerful tool that supports a wide variety of code development on almost all platforms, and its extensions can be installed to support the entire end-to-end development life-cycle.

Google’s TensorFlow is a well-known Deep Learning library, originally available to Python developers.

This article describes how to prepare a TensorFlow development environment in Visual Studio Code on a Windows VM.

Prerequisites

Download and install Visual Studio Code
Python: at the time of writing, TensorFlow supports versions 3.4, 3.5 and 3.6 (64-bit version)
Pip: installed as an option with Python
VirtualEnv: to set up a virtual environment for TF
Download and install Visual C++ 2015 Redistributable Update 3 from this URL: https://www.microsoft.com/en-us/download/details.aspx?id=53587

Setup Steps

Download and install Visual Studio Code on the Windows 64-bit machine. The download URL and installation instructions are available on the VS Code site.
Open VS Code and create a workspace folder. Also create a dummy Python file called tf_test.py.
Once the file is created, VS Code will prompt you to install Python and the Python extension for VS Code. Install Python 3.6 from the official download site (skip this if Python is already installed).
Allow VS Code to install and enable the necessary extensions for Python, e.g. Pylint.
TensorFlow installation: refer to the [url]
Install virtualenv:
PS C:\Program Files\Python36> pip3 install -U pip virtualenv
Create a new TensorFlow virtual environment:
PS C:\Program Files\Python36> virtualenv --system-site-packages -p python ./tensorflow
Activate the TensorFlow virtualenv:
PS C:\Program Files\Python36> ./tensorflow/Scripts/activate
(tensorflow) PS C:\Program Files\Python36>
Install TensorFlow inside the virtualenv:
(tensorflow) PS C:\Program Files\Python36> pip install --upgrade tensorflow
Execute the Python script as:
(tensorflow) PS C:\Program Files\Python36> python <filePath>\tf_test.py

Lastly, to deactivate and exit the virtual environment, use: deactivate
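The post leaves tf_test.py as a dummy file. One possible minimal content for it is sketched below: it reports the Python version, whether it is running inside a virtual environment, and which TensorFlow version (if any) can be imported. This content is my own assumption, not the original file.

```python
"""tf_test.py: a minimal smoke test for the TensorFlow virtualenv (assumed content)."""
import importlib
import sys


def in_virtualenv():
    """True when running inside a virtual environment.

    Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    releases make sys.prefix differ from sys.base_prefix.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def tensorflow_version():
    """Return the installed TensorFlow version, or None if it cannot be imported."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return None
    return tf.__version__


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("Inside a virtualenv:", in_virtualenv())
    version = tensorflow_version()
    print("TensorFlow:", version if version else "not installed in this environment")
```

Run inside the activated environment, it should report True for the virtualenv check; run with the system Python, it reports False.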

Conclusion

This simple tutorial for setting up TensorFlow on a Windows machine should help developers who are not yet comfortable training Deep Learning models on an Ubuntu (or any Linux) machine.

Saturday, January 19, 2019

Ensemble Approach - Stacking

This is going to be a series of tutorials describing various Ensemble techniques and approaches, with a very high-level idea of how each technique can be implemented in Python. A few experiments are performed on the famous Iris dataset, where the task is to classify the plant species from its key attributes, namely sepal length, sepal width, petal length and petal width.

To begin, I am going to start with a simple Stacking example. Here I will be using a self-generated random dataset with two input variables, X1 and X2, and an output variable, Y.

To start the experiment, let's perform the basic steps to set up the notebook:

Import the required Python libraries for the experiment

import os

import numpy as np
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

Import the dataset, add the column names and shuffle the dataset:

# Process the Iris dataset
dataset = pd.read_csv("../input/iris-dataset/iris.data.csv")
dataset.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
dataset = dataset.sample(frac=1).reset_index(drop=True)
# Process the random dataset
random = pd.read_csv("../input/randomdata/random-data.csv")

Create train and test split

data_x = random.iloc[:,0:3]
data_y = random.iloc[:,3]
train_x, test_x, train_y, test_y = train_test_split(data_x, data_y, test_size = 0.20,random_state = 1001)

Implementing - Stacking

Stacking is a technique in which we pick a base model (Model1) and split the training set into K folds. The base model is trained on K-1 of the folds and then makes predictions on the remaining, K-th fold. This is repeated K times, each time holding out a different fold for validation, so that every row of the training set receives exactly one out-of-fold prediction.
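The key property of this K-fold scheme is that the K-1 training folds and the held-out fold never overlap, and every training row is held out exactly once. The sketch below checks both properties with scikit-learn's StratifiedKFold on 20 toy labels; the toy labels and the helper name are illustrative.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def out_of_fold_counts(y, n_fold=5, seed=1001):
    """Count how often each training row is used as a validation row."""
    folds = StratifiedKFold(n_splits=n_fold, shuffle=True, random_state=seed)
    counts = np.zeros(len(y), dtype=int)
    # split() only uses X for its number of rows, so a dummy array is enough
    x_dummy = np.zeros((len(y), 1))
    for train_idx, val_idx in folds.split(x_dummy, y):
        # The K-1 training folds and the held-out fold never share a row
        assert np.intersect1d(train_idx, val_idx).size == 0
        counts[val_idx] += 1
    return counts


y_toy = np.array([0, 1] * 10)  # 20 toy labels, 10 per class
print(out_of_fold_counts(y_toy))  # every entry should be 1
```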

The approach is better demonstrated by the experiment below. The stacking function takes 5 arguments: the base model, the training features, the training labels, the test features and the number of folds. Stratified K-folds are used to create the K splits, so each fold keeps the class proportions of the labels. In each iteration, the model is fit on K-1 splits and predicts the K-th split (which acts as a validation set), and this is repeated K times. After the K iterations, the same base model is refit on the full training set and used to predict the test set once.

The entire process is then repeated for the next base model, Model2, resulting in an entirely new set of predictions for the train set and the test set.

Method Definition:

def stacking(model, train, y, test, n_fold):
    # Stratified K-fold keeps the class proportions of y in every fold; recent
    # scikit-learn versions raise an error if random_state is set without shuffle=True
    folds = sklearn.model_selection.StratifiedKFold(n_splits=n_fold, shuffle=True, random_state=1001)
    # One out-of-fold prediction per training row, stored at that row's position
    train_pred = np.zeros(len(train))
    for train_indices, val_indices in folds.split(train, y.values):
        x_train, x_val = train.iloc[train_indices], train.iloc[val_indices]
        y_train = y.iloc[train_indices]

        model.fit(X=x_train, y=y_train)
        train_pred[val_indices] = model.predict(x_val)

    # Refit on the full training set and predict the test set once,
    # so test_pred has exactly one prediction per test row
    model.fit(X=train, y=y)
    test_pred = model.predict(test)

    return test_pred, train_pred

Model 1

model1 = DecisionTreeClassifier(random_state=1)
test_pred1,train_pred1 = stacking(model = model1,n_fold = 5,
train = train_x,test = test_x,y = train_y)
train_pred1 = pd.DataFrame(train_pred1).astype(int)
test_pred1 = pd.DataFrame(test_pred1).astype(int)

Model 2

model2 = KNeighborsClassifier()
test_pred2,train_pred2 = stacking(model = model2,n_fold = 5,
train = train_x,test = test_x,y = train_y)
train_pred2 = pd.DataFrame(train_pred2).astype(int)
test_pred2 = pd.DataFrame(test_pred2).astype(int)

Once we have the out-of-fold predictions on the training set and the predictions on the test set from both base models, we use them as a new set of features for a third model, Model3 (stacking the results of the above two). Model3 is trained on the stacked training-set predictions and then predicts on the stacked test-set predictions to give the final predictions. Below is the code to implement the same.

df_final_train = pd.concat([train_pred1,train_pred2], axis=1)

df_final_test = pd.concat([test_pred1, test_pred2], axis=1)

Model 3

model3 = DecisionTreeClassifier(random_state=1)

model3.fit(X=df_final_train, y=train_y)

pred = model3.predict(df_final_test.reset_index(drop=True))

model3.score(df_final_test.reset_index(drop=True), test_y.reset_index(drop=True))
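For comparison, scikit-learn (version 0.22 and later) ships a StackingClassifier that performs the same out-of-fold procedure internally: it builds the meta-features from cross-validated predictions and refits the base models on the full training set for the final predictions. The sketch below assumes a synthetic two-feature dataset from make_classification as a stand-in for random-data.csv, which is not available here; stack_method="predict" makes it stack predicted labels, as the manual version above does.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for random-data.csv: two inputs (X1, X2) and a binary Y
X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1001)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.20, random_state=1001)

stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=1)),  # Model1
                ("knn", KNeighborsClassifier())],                  # Model2
    final_estimator=DecisionTreeClassifier(random_state=1),        # Model3
    cv=5,                    # out-of-fold predictions from 5 stratified folds
    stack_method="predict",  # stack predicted labels, not probabilities
)
stack.fit(train_x, train_y)
print("Stacked accuracy:", stack.score(test_x, test_y))
```

By default (stack_method="auto") StackingClassifier stacks predicted probabilities where the base model provides them, which gives the meta-model more information than hard labels.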

Conclusion

This example shows a basic, simple way to implement ensemble stacking using simple base models. The approach combines the predictive power of several simple base models to make better predictions.
In the next tutorial we will look at another basic ensemble approach, closely related to stacking, called blending.
