In our journey of getting familiar with Azure Machine Learning, Microsoft's cloud-based machine learning platform, we have already discussed the very first steps of getting started.
When you open the online studio in your favorite web browser, you’ll be directed to create a blank experiment. Let’s start with that.
On the left-hand side of the studio, you can see the pre-built modules that you can use to develop your experiments. If they are not enough for your case, you can use R or Python scripts in your experiment.
With Azure ML Studio, you can build and deploy models for almost all types of machine learning problems. The algorithms you can use for classification, regression and clustering are listed in the AML cheat sheet, which you can download here: http://download.microsoft.com/download/A/6/1/A613E11E-8F9C-424A-B99D-65344785C288/microsoft-machine-learning-algorithm-cheat-sheet-v6.pdf
Let’s take a look at the sections into which the modules are categorized. If you want to find a specific module, just type its name into the search box.
Saved datasets – Here you can find a set of sample datasets that you can use for experiments. Most of the popular machine learning datasets, like the “iris dataset”, are available here. If you want your own dataset in the studio, you can upload it here.
Trained models – These are the models you get as output after training on your data with an appropriate algorithm and methodology. They can be used later to build another experiment or a web service.
Data Format Conversions – The data coming into and going out of an experiment can be converted into the desired format using the modules in this section. If you wish to convert the output of your experiment to ARFF format (which is supported in Weka) or to a CSV file, you can use the modules here.
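To make the idea of a format conversion concrete, here is a minimal Python sketch that turns CSV text into ARFF text. It is not what the Studio modules run internally. The function name csv_to_arff and the rule "a column is numeric if every value parses as a float" are assumptions made purely for illustration.

```python
import csv
import io


def _is_number(value):
    """Return True if the string can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation="data"):
    """Convert CSV text (with a header row) into ARFF text.

    Assumption: a column is NUMERIC if all of its values parse as floats;
    otherwise it becomes a nominal attribute listing its distinct values.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]

    lines = ["@RELATION " + relation, ""]
    for i, name in enumerate(header):
        column = [record[i] for record in records]
        if all(_is_number(v) for v in column):
            lines.append("@ATTRIBUTE %s NUMERIC" % name)
        else:
            values = ",".join(sorted(set(column)))
            lines.append("@ATTRIBUTE %s {%s}" % (name, values))

    lines += ["", "@DATA"]
    lines += [",".join(record) for record in records]
    return "\n".join(lines)


sample = "sepal_length,species\n5.1,setosa\n7.0,versicolor\n"
arff = csv_to_arff(sample, relation="iris")
print(arff)
```

Note that this sketch does not quote attribute names or values containing spaces, which a real ARFF writer would need to handle.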
Data input & output – Azure ML can get data directly from various sources. You can use an Azure SQL database, Azure Blob storage or a Hive query to get the data. Fetching data from a local SQL Server is still in preview (as of August 2016).
Data transformation – Data transformation tasks like normalization, clipping etc. can be done using the modules listed in this section. You can also use SQL queries to do the data transformations if you want.
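If normalization and clipping are new to you, the following plain Python sketch shows what the two operations do to a column of numbers. The function names, the sample values and the choice of min-max scaling are assumptions for illustration; the Studio modules offer their own options.

```python
def clip(values, lower, upper):
    """Force every value into the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [18, 25, 40, 95, 200]  # 200 is an obvious outlier (hypothetical data)
clipped = clip(ages, 18, 100)
normalized = min_max_normalize(clipped)

print(clipped)
print(normalized)
```

Clipping first keeps a single outlier from squeezing all the other values towards zero when the column is rescaled.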
Feature Selection – Appropriate feature selection can improve the accuracy of your machine learning model considerably. There are three different methods, “Filter Based Feature Selection”, “Fisher Linear Discriminant Analysis” and “Permutation Feature Importance”, that you can use according to your requirement.
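The idea behind filter based selection is simple: score each feature against the target on its own, then keep the best-scoring ones. Here is a rough sketch in plain Python that uses the absolute Pearson correlation as the score. The function names and the toy data are hypothetical, and this is only an illustration of the technique, not the module's implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def select_top_features(features, target, k):
    """Rank features by |correlation with target| and keep the top k names."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data
features = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [42, 38, 44, 39, 41],
    "constant": [7, 7, 7, 7, 7],
}
exam_score = [52, 58, 65, 71, 80]

selected, scores = select_top_features(features, exam_score, k=1)
print(selected)
```

Because each feature is scored independently of any model, this kind of filter is cheap to run, but it can miss features that are only useful in combination.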
Machine Learning – In this section you can find the modules built for training machine learning models, evaluating their accuracy etc. Most of the popular machine learning algorithms used for classification, clustering and regression problems are listed here as modules. You can change the parameters of each module yourself, or you can use the Tune Model Hyperparameters module to tune the experiment for the optimal output.
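To give a feel for what tuning hyperparameters means, here is a small grid search sketch in plain Python: try every parameter combination, score each one on validation data, and keep the best. The toy k-nearest-neighbours model, the data and all function names are assumptions for illustration; this does not claim to reproduce what the Tune Model Hyperparameters module does internally.

```python
import itertools


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, validation, k):
    """Fraction of validation points that the k-NN model labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical toy data: (feature, label); the point (3, 1) is a noisy label.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
validation = [(2.8, 0), (3.4, 0), (6.5, 1), (5.2, 1)]

best_params, best_score = grid_search(
    {"k": [1, 3, 5]},
    lambda k: accuracy(train, validation, k),
)
print(best_params, best_score)
```

With k=1 the noisy training point misleads the model, while larger values of k smooth it out, which is exactly the kind of effect a parameter sweep is meant to reveal.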
OpenCV library Modules – ML is widely used in image recognition. Azure ML includes the Pretrained Cascade Image Classification module, which is trained to identify images containing front-facing human faces.
Python language modules – Python is one of the most widely used languages in data mining and machine learning applications. With Azure ML Studio you can execute your own Python script using this module. More than 200 common Python libraries are supported in Azure ML right now.
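A script for the Python module is built around an entry-point function named azureml_main, which receives up to two pandas DataFrames and returns its result as a sequence. The sketch below follows that convention; the petal_area column and the sample data are hypothetical, and the last lines are only a local smoke test that you would not need inside the studio.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    """Entry point called by the module with up to two input DataFrames."""
    # Work on a copy so the input frame is left untouched.
    result = dataframe1.copy()

    # Hypothetical transformation: derive a new feature from two columns.
    result["petal_area"] = result["petal_length"] * result["petal_width"]

    # The result is returned as a sequence, here a one-element tuple.
    return result,


# Local smoke test with made-up data, run outside the studio.
sample = pd.DataFrame({"petal_length": [1.4, 4.7], "petal_width": [0.2, 1.4]})
(out,) = azureml_main(sample)
print(out)
```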
R language modules – Like Python, R is one of the favorite statistical languages among data scientists. You can use your favorite R scripts and train models with R using these modules. Most R packages are supported in Azure ML. If a package is not there, you can import it for the experiment. (Unfortunately, there are some limitations here: some R packages, such as rJava and openNLP, are not yet supported in Azure ML as of August 2016.)
Statistical Functions – If you want to apply mathematical functions to the data or perform statistical operations, you can find the modules for that here. A basic descriptive statistical analysis of the dataset can also be performed using these modules.
Text Analytics – Machine learning models can be used for text analytics. Azure ML Studio includes modules for text preprocessing (removing stop words, punctuation marks, white spaces etc.), named entity recognition (a pre-trained module) and many more. The Vowpal Wabbit learning system library is also included among the modules.
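The preprocessing steps mentioned above can be sketched in a few lines of plain Python. The tiny stop-word list and the function name preprocess are assumptions for illustration only; real text pipelines use much longer, language-specific stop-word lists.

```python
import string

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def preprocess(text):
    """Lowercase, strip punctuation, collapse whitespace, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()  # split() also collapses repeated whitespace
    return [token for token in tokens if token not in STOP_WORDS]


tokens = preprocess("The  quick brown fox, in a hurry, jumps over the lazy dog!")
print(tokens)
```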
Web service – One of the most notable advantages of Azure ML is the ability to deploy an experiment as a web service. Here are the web service input and output modules that you can use in the experiments you have built.
Deprecated – This section holds older modules for tasks such as assigning data to clusters, binning, quantizing data and cleaning missing data.
Building Azure ML experiments and deploying web applications with them is not that hard.
This is one of the best step-by-step guides for that task from MSDN.
In the coming posts, we will discuss interesting Azure ML applications and hacks for building your predictive models.
Play with the tool and leave your experience as comments below. 🙂