Azure Machine Learning Studio allows you to build and deploy predictive machine learning experiments easily, with just a few drags and drops.
The performance of machine learning models can be evaluated with a number of metrics commonly used in machine learning and statistics, all available through the Studio. Supervised machine learning problems such as regression, binary classification and multi-class classification can be evaluated in two ways:
- Train-test split evaluation
- Cross validation
Train-test evaluation
In AzureML Studio you can perform train-test evaluation with a simple experiment setup. The "Score Model" module makes predictions for a portion of the original dataset. Normally the dataset is divided into two parts: the majority is used for training, while the rest is used for testing the trained model.

Train-test split
You can use the "Split Data" module to split the data, choosing whether you want a randomized split or not. In most cases a randomized split works better, but if the dataset has a periodic distribution (for example, time series data), never use a randomized split; use the regular split instead.
A stratified split lets you split the dataset according to the values in a key column, which makes the test set less biased.
- Pros –
- Easy to implement and interpret.
- Less time-consuming to execute.
- Cons –
- If the dataset is small, holding out a portion for testing decreases the accuracy of the predictive model.
- If the split is not random, the evaluation metrics are inaccurate.
- Can lead to over-fitted predictive models.
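For readers working outside the Studio, the two kinds of split above can be sketched in scikit-learn. This is an illustrative stand-in, not the Studio module itself; scikit-learn's bundled breast cancer dataset is assumed here:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Randomized 70/30 split; random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Stratified split: preserves the class ratio in both parts,
# analogous to the stratified option of the "Split Data" module.
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
```

For time series data, where a shuffled split leaks future information into training, a plain (non-randomized) split of the ordered rows is the equivalent of the regular split mentioned above.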
Cross-validation
To overcome the pitfalls of train-test split evaluation, cross-validation comes in handy. In cross-validation, instead of holding out a single portion of the dataset to generate evaluation metrics, the whole dataset is used to assess the accuracy of the model.

k-fold cross validation
We split our data into k subsets and train on k-1 of them, holding out the remaining subset for testing. We repeat this so that each subset serves as the test set exactly once. This is called k-fold cross-validation.
- Pros –
- Generates more realistic evaluation metrics.
- Reduces the risk of over-fitting models.
- Cons –
- Takes more time to evaluate, because more calculations have to be done.
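The k-fold procedure above can be sketched in a few lines of scikit-learn. This is an assumed illustration: logistic regression is just a placeholder classifier, not the algorithm used in the Studio experiment:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 10-fold cross-validation: each fold is held out once for testing
# while the model is trained on the remaining nine folds.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=10)

print(scores.mean())  # mean accuracy over the 10 held-out folds
```

Every row ends up in a test fold exactly once, which is why the averaged score is a more realistic estimate than a single held-out split.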
Cross-validation with a parameter sweep
I would say the easiest approach is to use the "Tune Model Hyperparameters" module to identify the best predictive model, and then use the "Cross Validate Model" module to check its reliability.
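A rough sketch of the same sweep-then-validate workflow in scikit-learn, under some assumptions: `GridSearchCV` stands in for "Tune Model Hyperparameters", `cross_val_score` approximates "Cross Validate Model", and `MLPClassifier` plays the role of the two-class neural network:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Sweep the hidden-layer size, scoring each candidate with 5-fold CV.
pipe = make_pipeline(StandardScaler(),
                     MLPClassifier(max_iter=2000, random_state=0))
grid = GridSearchCV(
    pipe,
    param_grid={"mlpclassifier__hidden_layer_sizes": [(10,), (50,)]},
    cv=5)
grid.fit(X, y)

# Cross-validate the winning parameter set to check its reliability.
scores = cross_val_score(grid.best_estimator_, X, y, cv=5)
print(scores.mean())
```

Note that the sweep itself already uses cross-validation internally, so the final `cross_val_score` is a sanity check on the chosen parameter set rather than an unbiased estimate.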
In my sample experiment I've used the breast cancer dataset available in AzureML Studio, which is commonly used for binary classification.
The dataset consists of 683 rows. I used both train-test split evaluation and cross-validation to generate the evaluation metrics. Note that the whole dataset is used to train the model in the cross-validation case, while the train-test split uses only 70% of the dataset for training the predictive model.
A two-class neural network was used as the binary classification algorithm, and its parameters were swept to find the optimal predictive model.
Observing the outputs, cross-validation reports a mean accuracy of 0.9736 for the model trained on the whole dataset, while the train-test evaluation reports an accuracy of 0.985! So, does that mean training with less data increased the accuracy? Not at all! The cross-validation evaluation provides more realistic metrics for the trained model by testing it on the maximum number of data points.
Take-away – Always try to use cross-validation for evaluating predictive models rather than going for a simple train-test split.
You can access the experiment in the Cortana Intelligence Gallery through this link –
https://gallery.cortanaintelligence.com/Experiment/Breast-Cancer-data-Cross-Validation
The users you invite should have a Microsoft account or a work/school account from Azure Active Directory. Two user access levels can be defined: "Users" and "Owners".
You'll not be able to copy multiple experiments with a single click. If you have such a scenario, use PowerShell scripts as instructed in this
For me this is one of the most useful options, and you can use it in two ways: make the experiment fully public, or make it accessible only through a shared link. If you share the experiment publicly, it will be listed in the Cortana Intelligence Gallery.
If you want to share an experiment only with your peer group, publishing it as an "unlisted" experiment is the best way. Users can open the experiment in their own AzureML Studio. This option can also be used to migrate your experiments between different workspaces as well as between different Azure regions. Only the users who have the link you shared can view or use the experiment.
When we have a series of data points indexed in time order, we can define that as a "Time Series". Most commonly, a time series is a sequence taken at successive, equally spaced points in time. Monthly rainfall data and the temperature readings of a certain place are some examples of time series.
Data science is one of the most trending buzzwords in the industry today. Obviously, you have to have a great deal of experience with data analytics, and an understanding of different data science problems and their solutions, to become a good data scientist.






Classification is one of the most popular applications of machine learning. Classifying spam mails, classifying pictures, and classifying news articles into categories are some well-known examples where machine learning classification algorithms are used.



With the power of the cloud, we're going to play with data now!
You can use AzureML absolutely for free, but if you want to deploy a web service and take on serious tasks, you have to go for an appropriate subscription. If you have an MSDN subscription, you can use it here.
In the portal, go to New -> Data + Analytics -> Machine Learning.
Now you are there! Click New -> Blank Experiment!


