Intro books to machine learning

This page lists some good intro books to machine learning and artificial intelligence.

The author states that "intuition is what you use when you don't have enough data". He shows, heuristically, how intuition is gradually being taken out of the analysis of big data and replaced by algorithms that teach themselves to make the data speak for themselves.

"All learning starts with some knowledge" (a quote from Hume that the author invokes), and from Hume we also know there is a problem with induction: the particular cannot prove the universal. The trick is to get from the data (the particular) to the universal, and the author explains in detail the five general ways we learn and shows how they work in practice. The five ways are:

  • Symbolic (think: rational thought – math & logic),
  • Connectionist (modeled on the neural networks of the brain),
  • Bayesian (probabilities – nothing is certain and all is contingent),
  • Evolutionary (genetic algorithms), and
  • Analogy (similar cases).
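To make one of these concrete: the Bayesian way treats learning as updating a prior belief with evidence. A minimal sketch, using made-up numbers for a diagnostic-test example (not from the book):

```python
# Bayesian updating: posterior = prior * likelihood / evidence.
# Hypothetical numbers: a condition with 1% prevalence, a test with
# 90% sensitivity and a 5% false-positive rate.
def bayes_update(prior, sensitivity, false_positive_rate):
    """Posterior probability of the hypothesis given a positive test."""
    evidence = prior * sensitivity + (1 - prior) * false_positive_rate
    return prior * sensitivity / evidence

posterior = bayes_update(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(round(posterior, 3))  # → 0.154: a positive test raises 1% belief to about 15%
```

Nothing is certain and all is contingent: even a fairly accurate test only moves the belief to 15%, because the prior was so low.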


Dr. Bishop's Pattern Recognition and Machine Learning is another great book about machine learning. See below for some reasons:

  • Some negative reviews say the book is math heavy, but that is not really true: the essential math concepts required are covered in Chapter 2, and everything else builds on them. Machine learning is math.
  • Basic concepts are repeated when they are needed, and that is what makes the book so useful: continuous reinforcement.
  • The basic ideas are explained again every time they are used. Yes, this takes a few extra lines and makes the material a bit redundant, but it reinforces the foundations on which everything is built. We do not need to chase definitions in endless loops or resort to Google; everything is right there, stated clearly.
  • Consistent use of a small vocabulary and a few central ideas: all techniques are boiled down to fundamental ideas, which are developed very clearly early on, and we are told that the rest of the book will grow out of them. In Chapters 1 and 2, Dr. Bishop lays down the fundamentals of maximum likelihood and Bayesian models and of linear models, explains inference and decision, and builds on these principles throughout.
  • The book often clearly illustrates the big picture, its relation to the basic ideas, and the essential roles those ideas play in more advanced topics. Not many authors are able to achieve this.
  • As for the negative comments about "too much theory, not enough practice": true, there is no Python code in the book. But a practical text is for the advanced user. For beginners and intermediate readers it is better to understand the fundamentals first; otherwise we would probably fall into the common trap of trying several different models on our data and averaging them. If you are looking for code, just go to scikit-learn.
  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer. Download the book PDF (corrected 12th printing, Jan 2017) (2017 version pdf backup)
  • Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper (This version of the NLTK book is updated for Python 3 and NLTK 3.)
  • An introduction to evaluation metrics can be found in Chapter 6 (here).
  • Chapter 8: Evaluation in information retrieval (pdf, html)
  • Chapter 17: Hierarchical clustering (pdf, html)
  • Seni, G., & Elder, J. F. (2010). Ensemble methods in data mining: improving accuracy through combining predictions. Synthesis Lectures on Data Mining and Knowledge Discovery, 2(1), 1-126. (pdf)

Pages 26-28 have a pretty good and concise introduction to cross-validation.
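The idea behind cross-validation can be sketched in a few lines: split the data into k folds, hold each fold out in turn for testing, and train on the rest, so every example is used for testing exactly once. A minimal index-level sketch in pure Python (an illustration, not the book's code):

```python
# k-fold cross-validation split: yields (train_indices, test_indices)
# pairs so that each of the n examples appears in exactly one test fold.
def kfold_indices(n, k):
    # Distribute any remainder across the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

folds = list(kfold_indices(n=10, k=5))
print(len(folds))  # → 5
print(folds[0])    # → ([2, 3, 4, 5, 6, 7, 8, 9], [0, 1])
```

In practice one would fit the model on each training split, score it on the matching test split, and average the k scores; scikit-learn's `KFold` does the same bookkeeping.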



  • Vapnik, V. N. (1999). An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5), 988-999. (Cited by 35415 as of 9/28/2018, pdf)
  • Chawla, N. V. (2009). Data mining for imbalanced datasets: An overview. In Data Mining and Knowledge Discovery Handbook (pp. 875-886). Springer, Boston, MA. (Cited by 735 as of 9/30/2018, pdf)

2. Performance Measure

2.1 ROC Curves

2.2 Precision and Recall
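Both measures come straight from the confusion counts: precision = TP / (TP + FP) asks how many of the flagged positives were real, and recall = TP / (TP + FN) asks how many of the real positives were flagged. A minimal sketch with made-up counts:

```python
# Precision: of the items we flagged positive, how many really were?
# Recall: of the items that really are positive, how many did we flag?
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(p)            # → 0.8
print(round(r, 3))  # → 0.667
```

On imbalanced datasets these counts are far more informative than plain accuracy, since a classifier that ignores the minority class can still score a high accuracy.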

3. Sampling Strategies


  • Gunale, K., & Mukherji, P. (2018). Indoor human fall detection system based on automatic vision using computer vision and machine learning algorithms. Journal of Engineering Science and Technology, 13(8), 2587-2605. (pdf)

This paper has a pretty good explanation of the SVM, decision tree, KNN, and gradient boosting algorithms.
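Of those, KNN is simple enough to sketch directly: classify a query point by a majority vote among its k nearest training examples. A toy pure-Python version (an illustration of the general algorithm, not the paper's implementation):

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Sort all training points by Euclidean distance to the query.
    dists = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Vote among the k closest and return the most common label.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Two toy clusters with hypothetical labels "a" and "b".
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))  # → a
print(knn_predict(points, labels, (5.5, 5.5)))  # → b
```

The only design choices are the distance metric and k; real implementations add spatial indexes (k-d trees, ball trees) so the nearest-neighbor search is not a linear scan.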

  • Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine learning, 20(3), 273-297. (Cited by 32334 as of 11/4/2018, PDF)
  • Hofmann, T. (1999, July). Probabilistic latent semantic analysis. In Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence (pp. 289-296). Morgan Kaufmann Publishers Inc. (Cited by 2442 as of 11/4/2018, PDF)
  • Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84. (Cited by 2794 as of 11/4/2018, PDF, Slides)
  • Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022. (Cited by 24764 as of 11/4/2018, PDF)
  • Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National academy of Sciences, 101(suppl 1), 5228-5235. (Cited by 4884 as of 11/4/2018, PDF)
  • Röder, M., Both, A., & Hinneburg, A. (2015, February). Exploring the space of topic coherence measures. In Proceedings of the eighth ACM international conference on Web search and data mining (pp. 399-408). ACM. (Cited by 151 as of 11/4/2018, PDF)
  • Newman, D., Lau, J. H., Grieser, K., & Baldwin, T. (2010, June). Automatic evaluation of topic coherence. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 100-108). Association for Computational Linguistics. (Cited by 459 as of 11/4/2018, PDF)