21 Essential Books for Data Scientists


They used to say ‘an apple a day keeps the doctor away’, but we say ‘a chapter a day keeps the doctor away’. Okay, maybe that’s a slight exaggeration, but over the past couple of decades there has been mounting scientific evidence to support the health benefits of reading. Here, we’ve compiled a list of essential reading material for data scientists.

Reading books improves your mental and physical health. Scientific studies suggest that reading strengthens your brain, enriches your vocabulary (which is super helpful for improving soft skills, because the better you express yourself, the more effectively you communicate), increases your ability to empathise, helps reduce stress (apparently, it works faster than other relaxation methods, such as listening to music), and makes you appear... sexier! Do we need to say more?

Currently, one of the hottest scientific topics is longevity, and committed bookworms may live longer. A study by researchers at Yale University suggests that book readers live longer... so they can read more! ‘Compared to non-book readers, book readers had a 4-month survival advantage at the point of 80% survival. Book readers also experienced a 20% reduction in risk of mortality over the 12 years of follow up compared to non-book readers.’

So, there you have it – reading is good for you. And it doesn't matter what you read – as long as you enjoy it. We really hope the following suggestions will be included in your all-time-favourite reads.

Books on Technical Data Science for Beginners

As a data scientist you’ll come across different types of data – images, videos, written text, spoken words, and, obviously, numerical values. If all this sounds a little confusing, a good way to start is by acquainting yourself with some of the fundamental concepts because you’ll be able to apply them to almost any discipline and data type.

Data Science from Scratch: First Principles with Python by Joel Grus

In this book, you will learn the fundamentals of linear algebra, statistics, and probability as well as how to explore and clean your dataset. You will also get to know the fundamentals of machine learning by implementing linear regression, logistic regression, decision trees, and neural network models. Finally, you will get an introduction to recommender systems, natural language processing, and network analysis.

Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python by Peter Bruce and Andrew Bruce

Data science is based on statistical methods, yet not every data scientist has formal statistical training. This is a perfect guide, with practical examples, to applying statistical methods to data science and avoiding their misuse.

The book's numerous examples are provided in both R and Python.

Introduction to Linear Algebra by Gilbert Strang

Linear algebra is a field of mathematics that is required for a deeper understanding of machine learning. It is the study of lines and planes, vector spaces, and the mappings between them, known as linear transformations. If you want to learn linear algebra, you need to check this book out as well as the great additional resources that it provides.

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

This is a practical guidebook on implementing machine learning and deep learning algorithms using the Scikit-Learn, TensorFlow, and Keras libraries. It explores several kinds of models, including support vector machines, decision trees, random forests, neural networks, and ensemble methods.

Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

This is another great book with practical examples and implementations of deep learning algorithms. The best thing about this book is that it is also intended for those who are not familiar with deep learning and machine learning. It is also freely available online.

Deep Learning for Coders with fastai and PyTorch by Jeremy Howard and Sylvain Gugger

This is a hands-on guide on how to implement deep learning algorithms from scratch in computer vision, natural language processing, and tabular data using PyTorch. The authors also teach you how to improve the accuracy, speed, and reliability of your models and how to turn them into web applications.

Non-technical Books for Data Scientists

If you’re new to data science, before you run any code, you’ll probably want to get a better understanding of how to approach any data science problem. This will help you to avoid the main mistakes, such as immediately recommending that people eat curly fries to become smarter. Here are some non-technical books that will help you become familiar with statistics, probability, and other useful concepts.

Weapons of Math Destruction by Cathy O’Neil

How would you feel if you were fired on the basis of a random number generator or if you weren’t hired because the interview test contained illegal questions about your mental health? Mathematician Cathy O’Neil explains the risks of using unregulated and biased big data algorithms in a variety of fields, including insurance, education, and advertising, and how this can lead to decisions that amplify inequality and affect socially vulnerable individuals.

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz

Do parents treat their sons differently from their daughters? Can you game the stock market? How many people actually read the books they buy? The premise of this book is that by using big data we can learn what people actually think, what they really want and what they really do.

Naked Statistics: Stripping the Dread from the Data by Charles Wheelan

How does Netflix know which movies you’ll like? What is causing the rising incidence of autism? This book shows how using the right data and well-chosen statistical tools and models can help us answer such questions and more by focusing solely on the underlying intuitions and concepts that drive statistical analysis.

The Black Swan by Nassim Taleb

Nassim Taleb expertly illustrates how we fool ourselves into thinking we know more than we actually do and how we forget to take into consideration what we don’t know. This is a universal book on probabilities and the impact of highly improbable events.

The Signal and the Noise: Why So Many Predictions Fail – but Some Don't by Nate Silver

Why do many of the predictions people make fail to happen? Nate Silver argues that the reason is that most of us make a poor distinction between probability and uncertainty – we often mistake more confident predictions for more accurate ones. So what can we do better to separate a true signal from noisy data?

Artificial Intelligence: A Guide for Thinking Humans by Melanie Mitchell

Humans dream of superintelligent machines, but how intelligent are the best AI programs really, and how do they work? Melanie Mitchell separates science fiction from current AI achievements and provides a clear picture of what the AI field has accomplished so far and how much further it has to go.

Books for Data Scientists on Developing Soft Skills

Some time ago we wrote an article on how, in addition to technical skills, soft skills are important in the data science world. The good news is that you can improve your soft skills by reading books. What follows is a selection of somewhat unusual reads that are definitely worthy of your attention.

Infinite Jest by David Foster Wallace

This is probably the hippest book out there. Many people claim to have read it, but of those, few actually completed it. David Foster Wallace gave his all when writing Infinite Jest. While reading, you will not only improve your patience and attention to detail (when you turn the pages in search of the footnote numbers, you’ll understand what I’m talking about), but also experience storytelling at its best. Infinite Jest challenges you to learn to read in a new way and turns reading into meditation.

We recommend watching this video as well!

Waking Up by Sam Harris

Talking of meditation, I often raised my eyebrows when my meditation teacher started talking about ‘connecting with the universe’, ‘opening up to the universe’ or ‘embracing my inner energy’. As an atheist, I found it difficult to find meditation techniques that did not have either a religious or a pseudoscientific element (such as ‘karma’). Then Sam Harris shared his mindfulness technique on the Waking Up app, and wrote the book Waking Up about spirituality without religion. This accessible book gives the word spirituality a whole different meaning.

Why We Sleep by Matthew Walker

Next comes Matthew Walker on the subject of a more balanced life, with his book Why We Sleep. While some of his claims are debated by the academic community, the Lithuanian neuroscientist Dr. Laura Bojarskaitė, who studies sleep, assured me a while ago that the book is well worth reading to get a better understanding of sleep.

Thinking, Fast and Slow by Daniel Kahneman; Behave: The Biology of Humans at Our Best and Worst by Robert Sapolsky; The Idiot Brain by Dean Burnett; The Brain That Changes Itself by Norman Doidge

In order to make better decisions, it is necessary to understand how our brain works and what it is capable of, how we make decisions, and why we act in a certain way. I suggest you start with Thinking, Fast and Slow by Daniel Kahneman. There’s no denying it's a really hard psychology read, and some of its claims have been debated in the academic community, but it’s worth persevering with to understand your own behavior patterns. And then you can move on to some easier ‘self-help’ reads: Behave: The Biology of Humans at Our Best and Worst by Robert Sapolsky, The Idiot Brain by Dean Burnett, and The Brain That Changes Itself by Norman Doidge.

The Demon-Haunted World by Dr Carl Sagan and The Skeptics' Guide To the Universe by Dr Steven Novella, Bob Novella, Cara Santa Maria, Jay Novella, Evan Bernstein

How can we tell what's real in a world that is increasingly full of the fake? Well, it's complicated but possible. You can start with The Demon-Haunted World by the legendary astronomer and unsurpassed science communicator Carl Sagan. It’s true that some things in it are outdated (for example, back then the dominant media were TV and print), but its clear-headed, constructive approach to reasoning makes this book timeless. While The Skeptics' Guide To the Universe might not be as powerful or poetic as The Demon-Haunted World, it’s nevertheless a great book to help us recognise our biases and logical fallacies, and it does a really good job of pointing out the things that hinder critical thinking.

Feynman by Jim Ottaviani and Leland Myrick

Richard Feynman contributed to one of the greatest and, at the same time, most terrible creations of humankind, the atomic bomb, the legacy of which continues to haunt us today in the context of Russia's war in Ukraine. None of this was lost on Feynman, and he only reluctantly accepted the Nobel Prize awarded to him in 1965. What he was really good at was teaching and inspiring curiosity. There’s a reason why his textbooks are still bestsellers decades after they were first published. Jim Ottaviani and Leland Myrick’s biographical graphic novel offers you an insight into Feynman's world in one evening's read.

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World by Dr. Bruce Schneier

For every data scientist and anyone else who uses the Internet, cryptographer and writer Dr. Bruce Schneier's book Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World is crucial reading. How often do you think about the moral aspects of data collection?

Author: Goda Raibyte