Practical Machine Learning
Chapters 4 - 10
Review continued


Chapter 4 Machine Learning Tools, Libraries, and Frameworks

This chapter opens with an outline of the current landscape for Machine Learning tools. This is followed by an overview of 5 of the more common tools, namely: Apache Mahout, R, Julia, Python and Apache Spark. For each tool, details are provided on how to install and configure it, how it integrates with Hadoop, its basic syntax, example usage, and its specific advantages.

This chapter provides a useful overview of the current state of Machine Learning tools. Helpful instructions are provided to get you up-and-running with each tool. I do wonder why we are looking at five different tools, though, since there is not enough detail provided to become proficient with any of them.

Chapter 5 Decision Tree based learning

The first four chapters have provided general background information. This chapter and those that follow, through to chapter 10, look at implementing the specific Machine Learning algorithms discussed previously (e.g. classification), using the previously discussed tools (e.g. Julia).

The chapter opens with an overview of decision trees (definition, terminology, purpose etc). Next, a simple decision tree is built and its limitations are noted. Measures of uncertainty are described, and means of pruning trees to reduce over-fitting are discussed. Various decision tree algorithms are discussed (e.g. CART, C5.0) with the aid of helpful diagrams. Some specialized trees are briefly described (e.g. Random Forest), again with helpful diagrams. The chapter ends rather abruptly, with a page about implementing decision trees, which contains links to decision tree example code in each of the 5 Machine Learning tools given in chapter 4 (i.e. Mahout, R, Spark, Python, Julia).
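To give a flavour of what "measures of uncertainty" means here, the following is a minimal Python sketch of two measures commonly used when splitting decision tree nodes, Shannon entropy and Gini impurity. It is my own illustration rather than code from the book, and the function names and sample labels are assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


# Hypothetical node containing an even mix of two classes.
mixed = ["yes", "yes", "no", "no"]
print(entropy(mixed))  # 1.0
print(gini(mixed))     # 0.5
```

Both measures are at their maximum for an even mix of classes and fall to zero for a node that contains a single class, which is why a split that lowers them is preferred.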

This chapter provides a useful overview of what decision trees are, their terminology, advantages, problems/solutions, and types. It would have been much more useful to provide a step-by-step walkthrough of at least some of the code examples, rather than just a link.

Chapter 6 Instance and Kernel Methods Based Learning

This chapter opens with a look at Instance-based Learning, which stores training data that is subsequently used for prediction. Both lazy and eager learning are described. There’s a brief look at some algorithms (e.g. Nearest Neighbor, Radial basis functions), before looking at a real-world use case solved using the KNN (k-Nearest Neighbor) algorithm.
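As an illustration of the lazy, instance-based idea, here is a small Python sketch of k-Nearest Neighbor classification: the training data is simply stored, and all the work happens at prediction time. This is not the book's use case; the data, the function name, and the choice of Euclidean distance with majority voting are assumptions.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs.
    """
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_predict(train, (1.1, 1.0)))  # a
print(knn_predict(train, (5.1, 5.0)))  # b
```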

The chapter next looks at Kernel methods-based learning algorithms, which take two inputs and return a measure of their similarity. There is a brief look at various algorithms (e.g. Support Vector Machines [SVM]), before looking at a real-world use case solved using the SVM algorithm.
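The "two inputs in, similarity out" idea can be sketched with the Gaussian (RBF) kernel, a kernel widely used with SVMs. I am assuming this kernel purely for illustration; the function name and the `gamma` parameter are my own choices, not taken from the book.

```python
from math import exp


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)


print(rbf_kernel((1.0, 2.0), (1.0, 2.0)))  # 1.0 for identical inputs
print(rbf_kernel((0.0, 0.0), (3.0, 4.0)))  # close to 0 for distant inputs
```

The value is 1 for identical inputs and decays towards 0 as the inputs move apart.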

I’m not sure why the two algorithms were included in the same chapter. In both detailed use cases, a step-by-step code walkthrough would have been useful instead of a link to code. Although the chapter contains plenty of math formulae, they are not discussed in any detail.

Chapter 7 Association Rules based learning

Association rules based learning is concerned with discovering associations that can be used for classification, and subsequent prediction. The chapter opens by briefly defining association rules, and then looks at the Apriori algorithm, illustrated with a step-by-step example, before highlighting the disadvantages of the Apriori algorithm. Next, the more efficient FP-growth algorithm is discussed, again with a step-by-step example. The chapter ends with links to code that implements the Apriori and FP-growth algorithms using each of the 5 tools.
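For readers who want to see the shape of Apriori in code, below is a compact Python sketch of its frequent-itemset stage (count, keep what meets the minimum support, join, prune). It is my own simplified version under stated assumptions: support is an absolute count, the shopping-basket data is made up, and rule generation is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in at least
    `min_support` transactions (an absolute count, by assumption)."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
print(result[frozenset({"bread", "milk"})])      # 2
print(frozenset({"milk", "butter"}) in result)   # False
```

The repeated passes over the transactions in the counting step are the main cost of this approach, which is the kind of disadvantage FP-growth is designed to avoid.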



Chapter 8 Clustering based learning

Clustering based learning is used to identify related groups (clusters) of data. The chapter opens with a look at the different types of clustering (e.g. Hierarchical, Partitional), before looking at the k-means clustering algorithm in detail, including its advantages and disadvantages. The importance of choosing the right number of clusters is discussed. The chapter ends with links to code that implements the k-means clustering algorithm using each of the 5 tools.
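A rough Python sketch of the k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) is shown below. It is an illustration only: the data is invented, and seeding the centroids with the first k points is a simplifying assumption that keeps the example deterministic.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=10):
    """Plain k-means; returns (centroids, clusters)."""
    centroids = list(points[:k])  # naive seeding, assumed for simplicity
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
    return centroids, clusters


# Hypothetical 2-D data with two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, clusters = kmeans(points, k=2)
print([len(c) for c in clusters])  # [3, 3]
```

Note that `k` has to be supplied up front, which is why choosing the right number of clusters matters so much.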

Chapter 9 Bayesian learning

Bayesian learning relates to the probability of data belonging to a given group. The chapter opens with a look at what Bayesian learning is, and a short statistics overview is provided. Bayes’ theorem is then discussed, before providing a deeper look at the Naive Bayes algorithm and some of its variations (e.g. Bernoulli classifiers). The chapter ends with links to code that implements the Naive Bayes classifier algorithm using each of the 5 tools.
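Bayes’ theorem itself is short enough to show as a worked example in Python. The spam-filter framing and all of the probabilities below are made-up assumptions of mine, chosen only to show how a prior is updated by evidence.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """P(H | E) via Bayes' theorem.

    prior             -- P(H)
    likelihood        -- P(E | H)
    likelihood_if_not -- P(E | not H)
    """
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)  # P(E)
    return likelihood * prior / evidence


# Hypothetical numbers: 20% of mail is spam, a given word appears in
# 60% of spam messages and in 5% of non-spam messages.
p_spam_given_word = posterior(prior=0.2, likelihood=0.6, likelihood_if_not=0.05)
print(round(p_spam_given_word, 2))  # 0.75
```

A Naive Bayes classifier applies the same update across many features, under the "naive" assumption that the features are independent given the class.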

Chapter 10 Regression based learning

Regression based learning aims to discover the relationship between two or more variables. The chapter opens with a look at what regression analysis involves, with a further look into statistics (e.g. variance, covariance, correlation). Next, various regression methods are discussed (e.g. multiple, Poisson). The chapter ends with links to code that implements the linear regression algorithm using each of the 5 tools. Generally, the various formulae are briefly explained, and some examples are provided. However, a math background would make the chapter easier to understand.
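To show how the statistics mentioned above feed into regression, here is a minimal Python sketch of simple (one-variable) least-squares linear regression, where the slope is the covariance of x and y divided by the variance of x. The function name and the data are my own assumptions, not an example from the book.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors in covariance and variance cancel, so sums suffice.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```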




Last Updated ( Saturday, 28 November 2020 )