How To Determine Which Machine Learning Technique Is Right For You?

Machine Learning is a vast field with many techniques available to a practitioner. This post is about how to navigate this space and apply the right methods to your problem.

What is Machine Learning?

Tom Mitchell provides a very apt definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

For a program that learns to play a game:

E = the experience of playing many games.

T = the task of playing an individual game.

P = the probability that the program will win the next game.

For example, a machine playing Go was able to beat one of the world’s best Go players. Earlier machines depended on humans to provide the example learning set. But in this instance, the machine was able to play against itself and learn the basic Go techniques that way.

Machine Learning techniques are broadly classified as follows:

Supervised Learning: A set of problems where there is a relationship between input and output. Given a data set where we already know the correct output, we can train a machine to derive this relationship and use the resulting model to predict outcomes for previously unseen data points. These problems are broadly classified as “regression” and “classification” problems.

  • Regression: When we try to predict results within a continuous output, meaning we try to map input variables to some continuous function. E.g., given a picture of a person, predicting the person's age.
    1. Gradient Descent – also called steepest descent, an optimization technique that repeatedly steps in the direction of the negative gradient (the steepest decrease of the cost function) to reach a local or global minimum. This technique is often used in machine learning applications to calculate the coefficients when fitting a regression curve to a training data set. Using these coefficients, the program can then predict a continuous-valued output for any new data presented to it.
    2. Normal Equation – \[\theta=(X^TX)^{-1}X^Ty\] The closed-form least-squares solution for the regression coefficients \(\theta\). It comes from solving the set of simultaneous equations derived from a large number of observation equations, where \(X\) holds the input observations and \(y\) the known outputs.
    3. Neural Networks: A system of connected nodes loosely modeled on our brains (biological neural networks). Such systems learn the model coefficients by observing real-life data and, once tuned, can be used to predict outputs for unseen data, i.e. observations outside the training set.
  • Classification: When we try to predict results in a discrete output, i.e. map input variables into discrete categories. E.g., given a patient with a tumor, predicting whether it is benign or malignant. Types of classification algorithms:
    1. Large Margin Classification
    2. Kernels
    3. Support Vector Machines
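To make the two regression techniques above concrete, here is a minimal sketch that fits a straight line y = theta0 + theta1 * x in two ways: with the normal equation and with gradient descent. The function names, the toy data, the learning rate and the step count are all illustrative assumptions, not part of any library.

```python
def normal_equation(xs, ys):
    """Closed-form least squares for y = theta0 + theta1 * x."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy];
    # the 2x2 matrix is inverted by hand (assumes det != 0).
    det = n * sxx - sx * sx
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (n * sxy - sx * sy) / det
    return theta0, theta1


def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Iteratively minimise the mean squared error of the same model."""
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of (1 / 2n) * sum(error^2).
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        # Step against the gradient, i.e. downhill on the cost surface.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Toy data generated from y = 1 + 2x.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
print(normal_equation(xs, ys))   # (1.0, 2.0)
print(gradient_descent(xs, ys))  # approaches the same coefficients
```

The normal equation gives the answer in one step but needs a matrix inverse, which gets expensive with many input variables; gradient descent needs a suitable learning rate and many iterations but scales to larger problems.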

 

Unsupervised Learning: A set of problems where the correct output is not known in advance. Instead, we derive structure from the data, for example by clustering it based on relationships among its variables. With unsupervised learning there is no feedback based on the prediction results.

 

  • Clustering: It is the process of dividing a set of input data into (possibly overlapping) subsets, where the elements of each subset are considered related by some similarity measure. Take a collection of data and find a way to automatically group items that are similar or related across different variables. E.g., the clustering of news stories on the Google News home page.
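As a rough sketch of grouping by a similarity measure, the snippet below runs plain k-means on a handful of 2-D points: pick k centroids, assign every point to its nearest centroid, recompute the centroids, and repeat until they stop moving. The function name `k_means`, the sample points, and the choice to seed the centroids with the first k points are assumptions made for illustration.

```python
def k_means(points, k, max_iterations=100):
    """Plain k-means on 2-D points using squared Euclidean distance."""
    # Assumption: the first k points seed the centroids.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                         for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0),
          (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

On this data the three points near the origin end up in one cluster and the three points near (8.5, 9) in the other. Real data usually needs a better seeding strategy and several restarts.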

Some classic graph clustering algorithms are the following:

  1. Kernel K-means: Select k data points from the input as centroids and assign each data point to its nearest centroid; recompute the centroid of each cluster and repeat until the centroids no longer change. The kernel variant measures distances in a kernel-induced feature space.
  2. K-spanning tree: Obtain the minimum spanning tree (MST) of an input graph; removing the k-1 highest-weight edges from the MST results in k clusters.
  3. Shared nearest neighbor: Obtain the shared nearest neighbor (SNN) graph of the input graph; removing edges from the SNN graph with weight less than τ results in groups of non-overlapping vertices.
  4. Betweenness centrality based: Betweenness centrality quantifies the degree to which a vertex (or edge) occurs on the shortest paths between all other pairs of nodes; repeatedly removing the vertex or edge with the highest betweenness splits the graph into clusters.
  5. Highly connected components: Repeatedly remove a minimum cut, i.e. the minimum set of edges whose removal disconnects the graph, until each remaining piece is a highly connected subgraph (HCS).
  6. Maximal clique enumeration: A clique is a subgraph C of graph G with edges between all pairs of its nodes; a maximal clique is a clique that is not part of a larger clique. The maximal cliques are enumerated and treated as clusters.
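The k-spanning tree idea from the list above can be sketched in a few lines: build the MST with Kruskal's algorithm, drop its k-1 heaviest edges, and read off the connected components. The function names and the example graph are hypothetical, and the sketch assumes a connected, undirected graph whose edge weights are distances (larger means less similar).

```python
def _find(parent, node):
    """Union-find lookup with path halving."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def k_spanning_tree_clusters(num_nodes, edges, k):
    """edges: list of (weight, u, v) tuples; returns k lists of node ids."""
    # Kruskal: scan edges by increasing weight, keep those joining two trees.
    parent = list(range(num_nodes))
    mst = []
    for weight, u, v in sorted(edges):
        root_u, root_v = _find(parent, u), _find(parent, v)
        if root_u != root_v:
            parent[root_u] = root_v
            mst.append((weight, u, v))
    # The MST edges are already in ascending order, so dropping the last
    # k - 1 of them removes the heaviest ones.
    kept = mst[:len(mst) - (k - 1)]
    # The connected components of what remains are the clusters.
    parent = list(range(num_nodes))
    for _, u, v in kept:
        parent[_find(parent, u)] = _find(parent, v)
    clusters = {}
    for node in range(num_nodes):
        clusters.setdefault(_find(parent, node), []).append(node)
    return sorted(clusters.values())


# Two tight triangles (0-1-2 and 3-4-5) joined by one long edge 2-3.
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (10, 2, 3),
         (1, 3, 4), (2, 4, 5), (3, 3, 5)]
print(k_spanning_tree_clusters(6, edges, k=2))  # [[0, 1, 2], [3, 4, 5]]
```

With k=2 the long bridge edge is the one that gets removed, which separates the two triangles.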

 

  • Non-Clustering: Allows you to find structure in a chaotic environment.
    1. Reinforcement Learning: Software agents automatically determine the ideal behavior to maximize performance.
    2. Recommender Systems: An information filtering system that seeks to predict a user’s preference for an item by watching and learning from the user’s behavior.
    3. Natural Language Processing: A field that deals with machine interaction with human languages. Specifically, it addresses the following three challenges: speech recognition, understanding, and response generation.
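To illustrate the recommender idea above, here is a small sketch of one common approach, user-based collaborative filtering: a user's unknown rating for an item is estimated from the ratings of other users, weighted by how similar their past ratings are. The function names, the users, the items and the ratings are all made up for the example, and cosine similarity is only one of several possible similarity measures.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(a[item] ** 2 for item in shared))
    norm_b = sqrt(sum(b[item] ** 2 for item in shared))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    similarity_sum = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += similarity * other_ratings[item]
        similarity_sum += similarity
    # None means nobody comparable has rated the item yet.
    return weighted_sum / similarity_sum if similarity_sum else None


# Hypothetical ratings on a 1 to 5 scale.
ratings = {
    "alice": {"matrix": 5, "titanic": 1},
    "bob": {"matrix": 5, "titanic": 1, "inception": 5},
    "carol": {"matrix": 1, "titanic": 5, "inception": 1},
}
print(predict_rating(ratings, "alice", "inception"))
```

Alice's tastes match Bob's much more closely than Carol's, so the estimate lands nearer to Bob's rating of 5 than to Carol's rating of 1.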

 

And finally, remember that the 7 essential steps in accomplishing your machine learning project are the following:

  • Gathering the data
  • Preparing the data
  • Choosing a Model
  • Training your Model
  • Evaluating your Model parameters
  • Hyperparameter tuning
  • And finally, prediction
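The steps above can be walked through end to end on a toy problem. The sketch below uses a k-nearest-neighbours classifier written in plain Python; the data is synthetic and deliberately easy (two well-separated groups of points), and every name in it (`gather_data`, `prepare_data`, `predict`, `accuracy`) is an assumption chosen for the example rather than a library API.

```python
import random
from collections import Counter


def gather_data(rng, n_per_class=30):
    """Step 1: synthetic 2-D points around (0, 0) and (5, 5)."""
    data = []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            point = (center + rng.uniform(-1, 1),
                     center + rng.uniform(-1, 1))
            data.append((point, label))
    return data


def prepare_data(data, rng):
    """Step 2: shuffle, then split 60/20/20 into train/validation/test."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def predict(train, point, k):
    """Steps 3 and 4: k-NN 'training' just stores the data; classify by vote."""
    neighbours = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2
        + (item[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, examples, k):
    """Step 5: fraction of examples classified correctly."""
    correct = sum(predict(train, point, k) == label
                  for point, label in examples)
    return correct / len(examples)


rng = random.Random(0)
train, validation, test = prepare_data(gather_data(rng), rng)
# Step 6: pick the hyperparameter k that scores best on the validation set.
best_k = max((1, 3, 5), key=lambda k: accuracy(train, validation, k))
test_accuracy = accuracy(train, test, best_k)
# Step 7: predict the class of a new, unseen point.
prediction = predict(train, (4.8, 5.1), best_k)
print(best_k, test_accuracy, prediction)
```

Keeping a separate validation set for tuning and a test set for the final check matters on real data: a model tuned and scored on the same examples will look better than it is.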

 
