
How to pick a Data Science book

A couple of months ago I planned to review only 20 books at once. But once I started writing these lines, I got lost in an endless new hobby: collecting and reading anything related to data science. So I decided to come back to this post later and write down what I learned (and am still learning) from this practice.

Choosing a data science book is one of the important steps in learning properly from the experts in the field. The book doesn’t have to be labeled as a data science book; it can cover one of the field’s many branches.

In this short review, I cover several points that can help you choose the data science book that fits your needs. Our needs keep changing with time, so if you look at your library, whether on your bookshelf or on your tablet, you will notice how your interests have evolved. It feels like looking at an old picture of yourself from years ago, and it may even bring a little smile to your face.

It is not hard to find a book on anything nowadays. Whatever you want to learn, someone, somewhere in the world, has covered it in a book, a chapter of a book, or maybe a few lines…

So the important thing is to decide carefully whether to pick a specific book, because picking the wrong one wastes both money and time.

Sometimes the outline is exactly what you hoped for, but as you dig into the chapters you realize that the author just scratched the surface. This has happened to me before, and I am writing this post to spare you the same experience.

This is a very quick checklist that will help you choose an interesting book, or at least prioritize properly.

  • Check the author’s bio: Knowing who wrote the book, including their background, research, and main interests, already indicates the level of detail to expect. But new authors deserve a chance too, so don’t make this point decisive.
  • Read the introduction carefully: Most books sold online offer the introduction for free. Don’t hesitate to download a preview and read the introductory pages carefully. Most of the time, authors describe not only the context in which the book was written but also what each chapter covers.
  • Favor books with independent chapters: This is a personal preference; a technical book is not a novel. Although it is important to learn gradually, from the simpler aspects to the more complicated ones, a book with more or less independent chapters gives you freedom over the order in which you learn. A book’s outline reflects the author’s personal perspective, which is certainly helpful, but it was designed with a specific mindset that is not necessarily yours, so you may want to keep some distance between the way you want to learn and the way others think you should.
  • Do you have a bookstore close by? Although you can find everything online nowadays, going to a bookstore gives you the chance to physically handle what you need. Sometimes I change my mind and get another book after quickly browsing some key chapters and getting the feeling that the book is not what I actually need.
  • Read online reviews: And don’t trust all of them. Remember that reviews are subjective, but they give you a general idea of what people think of the book you want to get, and as we say: don’t judge a book by its cover :) I usually trust Amazon’s reviews; people leave insightful comments and criticisms for the authors and usually raise valid points that are worth taking into consideration.

Books of interest

There are tons of great books in data science. In this review, I list some of the most important ones, in my opinion. Please add a comment with a reference to any book not covered in this post that you think is worth adding to the list.

Here is a list of the books I’ve read and that I really like (order doesn’t mean anything):

Book Title | Author(s)
Doing Data Science | Cathy O’Neil and Rachel Schutt
Docker in Action | Jeff Nickoloff
The Art Of R Programming | Norman Matloff
Introducing Data Science | Davy Cielen and Arno Meysman
Learning Predictive Analytics with Python | Ashish Kumar
Data Structures and Algorithms in Python | Michael T. Goodrich and Roberto Tamassia
Amazon Web Services in Action | Andreas Wittig and Michael Wittig
Spark for Python Developers | Amit Nandi
Machine Learning : A probabilistic perspective | Kevin P. Murphy
Real World Machine Learning | Henrik Brink and Joseph Richards
iPython Interactive Computing and Visualization Cookbook | Cyrille Rossant
Mastering Machine Learning with scikit-learn | Gavin Hackeling
Python Data Science Cookbook | Gopi Subramanian
Building Machine Learning Systems with Python | Willi Richert and Luis Pedro Coelho
Hadoop The Definitive Guide | Tom White
Statistical Learning with Sparsity | Trevor Hastie and Robert Tibshirani
The Elements of Statistical Learning | Trevor Hastie and Robert Tibshirani
Fluent Python | Luciano Ramalho
Thoughtful Machine Learning | Matthew Kirk
Machine Learning with R Cookbook | Yu-Wei, Chiu (David Chiu)
Docker in Practice | Ian Miell and Aidan Hobson Sayers
Data Science and Big Data Analytics | EMC Education Services
Mastering Object-Oriented Python | Steven F. Lott
Machine Learning with Spark | Nick Pentreath
Machine Learning for Hackers | Drew Conway and John Myles White
Data Science for Business | Foster Provost and Tom Fawcett
Developing Analytic Talent | Vincent Granville
Think Python : How to Think Like a Computer Scientist | Allen B. Downey
Python Algorithms | Magnus Lie Hetland
Python Cookbook | David Beazley and Brian K. Jones
Testing Python | David Sale
Programming Collective Intelligence | Toby Segaran
Data Analysis with open source tools | Philipp K. Janert
Python in a Nutshell: A Desktop Quick Reference | Alex Martelli, Anna Ravenscroft, Steve Holden
Python Machine Learning | Sebastian Raschka
The Art of Data Science | Roger D. Peng, Elizabeth Matsui
Machine Learning: The Art and Science of Algorithms that Make Sense of Data | Peter Flach
Modern Python CookBook | Steven F. Lott
Ensemble Methods: Foundations and Algorithms | Zhi-Hua Zhou

Detailed review

Reviewing a bunch of books at once is a tough exercise. I am bringing all these books together in a single post because their concepts and theories overlap; the challenging part is that most of the time they are presented and explained differently, with varied vocabulary. Below is an attempt to picture the ideal checklist to have before investing in a data science book. Also remember that you will never get enough from a single book: the data science field is very complicated and cannot be contained in one.

In the following, I pick my top 5 books from this list for each criterion.

Book size / Length (number of pages)

The length of a book really depends on the content being discussed. Although length is not a measure of quality, we will assume here that more content means more substance and knowledge shared. I am ranking the top 5 books based on the amount of content shared.

[Figure: top 5 books by length]

Writing Style

Communicating science is very challenging. We can’t please everyone, and a lot depends on the target audience. Some people, though, are gifted at communicating complex concepts in an easy and clear manner. A good structure that rolls out explanations in an organized, well-studied manner also helps produce a gradual learning curve. I am ranking the top 5 books of the list above in terms of writing style.

[Figure: top 5 books by writing style]

Structure

Teaching data science is not simple, but it is not that difficult either: we only need to know how to structure the content so that the information is retained. In that respect, there are perhaps two main ways of doing this. One is an organization into independent modules, where the book covers concepts that are not closely related but could all be part of a data science analysis pipeline; because the concepts are treated separately, they can be read in any order. The other is to structure the content in increasing order of complexity, as we see in most books that teach, for example, regression: they start with the most basic forms and add more and more variations and levels of complexity until they reach the most elaborate forms. Here is my ranking of the top 5 books of this list.

[Figure: top 5 books by structure]

Content

How much is too much? Where to start? What subjects to cover and what to skip? These are probably a few of the questions that come to the mind of anyone trying to write a book on data science. Some authors choose to cover a very specific area in detail; when we look at their academic profiles, we can see a connection between their own research and their writing. Most of the time, these authors do not write general data science books but rather write about part of their research. In that regard, their target audience is very narrow. On the other hand, some authors take a more general view of teaching data science, focusing on the fundamentals and the global picture rather than on details. Books like these usually cover regression, classification, and using modules in R or Python to perform data analysis.
This is only a very subjective view drawn from the resources in my own personal library. It doesn’t mean it is a trend, but this is how I see, for now, 99% of the books I have come across in the last few years.

Judging the book by its cover?

Most people use this expression to tell you not to. I won’t do the same. Do we judge a book by its cover? Definitely: we need to, we have to. Of course, I am not talking about the book’s external cover, but about what you can read in the preface and in the first introductory paragraphs, where authors usually give details about the chapters of the book. Sometimes authors deviate from their initial vision of what the book will look like in the end, which is normal: the field moves fast, and so do their thoughts. A good book always respects the initial vision.

Depth of explanation

How far can the authors go in their explanations? I think this is closely related to many of the points I’ve made in this post: there is a correlation between depth, content, structure, and length. The depth of explanation is what makes a good author. It is a matter of dosage: releasing enough information for you to absorb the knowledge, and especially the kind of information that will stick in your brain for a very long time. The author’s skills and initial training play a very important role here, as authors must master the math behind the subject being discussed. That mastery lets them go deep into the details while keeping the right dosage, so the reader does not lose sight of the general frame of the book.

Code Explanation

Yes, notebooks, please! With the popularity of tools like Jupyter notebooks, you can write an entire book as a collection of notebooks, with a bit of transformation according to the editor’s requirements, of course. Code is important, but not necessary. If the main purpose of the book is to explain how specific approaches, algorithms, and methods work in the background, a good idea I see a lot of authors adopt is reimplementing an algorithm from scratch. Although a lot of people would say, “Why bother, we have a module for that”, I would simply advise them to pick another book, as they are reading the wrong one. Reimplementing lets you appreciate the amount of work put into these libraries, which are optimized for scalability (and should definitely be used in production). Depending on the context, some books are just meant to teach you how to use a specific library or package; these are most often referred to as cookbooks. For this kind of book, authors rely on notebooks (also shared on GitHub or other versioning platforms as support for their books), and depending on the author, you will find enough code explanation to show you the common and best practices for mastering a specific subject.
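To make the "from scratch" idea concrete, here is a minimal sketch of the kind of exercise such books walk through: fitting a one-variable linear regression with the closed-form least-squares formulas, using only the Python standard library. It is not taken from any of the books listed, and the function names (fit_simple_linear_regression, predict) are hypothetical names chosen for illustration.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Hypothetical helper for illustration; a real project would normally
    rely on an optimized library instead.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    x_bar, y_bar = mean(xs), mean(ys)

    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0, 5.0, 7.0, 9.0]  # points lying exactly on y = 2x + 1
    slope, intercept = fit_simple_linear_regression(xs, ys)
    print(f"slope={slope}, intercept={intercept}")
    print(f"prediction at x=5: {predict(5.0, slope, intercept)}")
```

Writing even these few lines by hand shows what a library call hides: input validation, the degenerate case where every x is the same, and the link between slope, covariance, and variance. For anything beyond a learning exercise, the optimized libraries remain the better choice.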

For my own taste, the content classification matches the structure classification exactly. Again, this is a personal classification; feel free to comment on or discuss your preferences below this post.

[Figure: top 5 ranking]

Conclusion

This is a very subjective classification. I might come back and edit or extend this post as I continue to read more books. If you have a book suggestion, please comment below and let’s discuss.

By the time I finished this post and edited the content, my library was already 20% bigger than when I started writing, so I think I will be editing this post quite often. Make sure you subscribe below to stay informed about new reviews and additions.