The Data Scientist.

“A data scientist is someone who can obtain, scrub, explore, model, and interpret data, blending hacking, statistics, and machine learning. Data scientists not only are adept at working with data, but appreciate data itself as a first-class product.”

Hilary Mason


Do we all know who a scientist is? A scientist is a person who conducts scientific research to advance knowledge in an area of interest. In our case, the area of interest is data, hence a Data Scientist. Data science is a relatively new field of the 21st century. This is the age of information, and data is produced in large amounts: according to research conducted in 2020, we produce over 2.4 quintillion bytes of data in a single day.

With this amount of data, the need arose for a specialized person: the Data Scientist. Today we take a look at what it is like to be a data scientist and what the world of data entails.

The Data.

For Data Scientists to exist, there must be data. What is data? Data is the representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by a human or electronic machine. In simple terms, data is the raw material that becomes information once it is processed.

Data is produced by even the simplest tasks we perform, from sending a tweet on Twitter or creating a Facebook post to the records collected during cancer treatment. With this data, we can answer simple and even complex questions, and some of those answers end up saving lives.

Data Science.

Data science involves the use of scientific methods, processes, analysis, algorithms, and systems to extract information from data and create insights. It is the art of telling a story from raw data. It involves thinking of a problem, getting the right data for the problem, cleaning the data, pre-processing the data, modeling the data, visualizing the data, finding the relationships in the data, and telling a story from the data that solves the problem.

The Scientist.

“Being a data scientist is not only about data crunching. It’s about understanding the business challenge, creating some valuable actionable insights to the data, and communicating their findings to the business.”

— Jean-Paul Isson.

The work of a Data Scientist is shown in the flow chart, which lays out the tasks that a data scientist performs.

To define who a Data Scientist is, we shall use three words.

  1. Collect
  2. Analyze
  3. Interpret

What is it that is collected, analyzed, and then interpreted? Data. Data is the playground of a Data Scientist.

Collecting the right data.

Collecting data is the first task. Data contains a lot, and it can tell stories. By the time we collect the data, we must already have thought of the problem we intend to solve. Therefore, we need the right data to solve our specific problem. Collecting data involves gathering, storing, accessing, and using the original information.

Data falls into two types, based on quality and quantity: qualitative data and quantitative data. Qualitative data is descriptive information about characteristics that are difficult to define or measure and cannot be expressed numerically. Quantitative data is numerical information that can be measured or counted.
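To make the distinction concrete, here is a minimal Python sketch. It assumes a small made-up survey stored as a list of dictionaries; the records and the helper name `split_columns` are hypothetical and exist only for illustration. A column is treated as quantitative when every value in it is a number, and as qualitative otherwise.

```python
from numbers import Number

# Hypothetical survey records, invented purely for illustration.
records = [
    {"name": "Amina", "age": 29, "city": "Nairobi", "income": 1200.0},
    {"name": "Brian", "age": 34, "city": "Kisumu", "income": 950.5},
    {"name": "Cynthia", "age": 41, "city": "Mombasa", "income": 1430.0},
]


def split_columns(rows):
    """Return (quantitative, qualitative) column names for a list of dicts."""
    quantitative, qualitative = [], []
    for column in rows[0]:
        values = [row[column] for row in rows]
        # bool is a subclass of int, so exclude it from the numeric check.
        is_numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        (quantitative if is_numeric else qualitative).append(column)
    return quantitative, qualitative


quantitative, qualitative = split_columns(records)
print("Quantitative:", quantitative)
print("Qualitative:", qualitative)
```

In real projects the rule is rarely this simple. A phone number or a postal code is stored as digits but behaves like qualitative data, so the type of a column is only a first hint.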

Analyze the Data.

Data analysis is a process that involves cleaning the data, preprocessing the data, and modeling the data. A Data Scientist inspects, cleans, transforms, and models the data to discover useful information and draw conclusions that support the decisions they make.
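The inspect, clean, transform, and model steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are invented, the columns (advertising spend and sales) are hypothetical, and a straight line fitted by ordinary least squares stands in for whatever model a real project would call for.

```python
from statistics import mean

# Hypothetical raw rows as they might arrive from a CSV export:
# (advertising spend, sales), both as text. An empty string marks a missing value.
raw = [
    ("1.0", "2.1"),
    ("2.0", "3.9"),
    ("", "5.0"),
    ("3.0", "6.2"),
    ("4.0", ""),
    ("5.0", "9.8"),
]

# Inspect: count the rows that have at least one missing field.
missing = sum(1 for row in raw if "" in row)

# Clean: keep only the complete rows.
clean = [row for row in raw if "" not in row]

# Transform: convert the text fields to numbers.
data = [(float(x), float(y)) for x, y in clean]

# Model: fit y = slope * x + intercept by ordinary least squares.
x_bar = mean(x for x, _ in data)
y_bar = mean(y for _, y in data)
slope = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum(
    (x - x_bar) ** 2 for x, _ in data
)
intercept = y_bar - slope * x_bar


def predict(x):
    """Estimate sales for a given spend using the fitted line."""
    return slope * x + intercept


print(f"rows dropped: {missing}, rows kept: {len(data)}")
print(f"sales = {slope:.3f} * spend + {intercept:.3f}")
```

Dropping incomplete rows is the simplest cleaning choice, not necessarily the best one. With more data, filling in missing values or investigating why they are missing may be the better decision.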

The need for data analysis.

Data analysis enables both prediction and knowledge discovery.

Data analysis helps us understand customers’ needs, which makes marketing campaigns more customer-oriented and improves customer satisfaction.

Data analysis enables us to discover and understand patterns in the data. Knowing the relationship between past and existing data helps us make more effective decisions.
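One simple way to measure the relationship between past and existing data is a correlation coefficient. The sketch below assumes made-up monthly sales figures for the same six months in two consecutive years; the numbers and the helper name `pearson` are hypothetical. It computes the Pearson correlation by hand, along with the year-over-year growth for each month.

```python
from math import sqrt
from statistics import mean

# Hypothetical monthly sales for the same six months in two consecutive years.
past = [100, 120, 90, 150, 130, 170]
current = [110, 135, 95, 160, 150, 185]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - x_bar) ** 2 for x in xs))
    spread_y = sqrt(sum((y - y_bar) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


r = pearson(past, current)

# Relative change of each month compared with the same month a year earlier.
growth = [c / p - 1 for p, c in zip(past, current)]

print(f"correlation between past and current sales: {r:.2f}")
print("year-over-year growth per month:", [f"{g:.0%}" for g in growth])
```

A coefficient close to 1 suggests that months that were strong last year tend to be strong this year as well. It describes a pattern only; it does not explain what causes it.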

Interpret the Data.

This is the last task a data scientist undertakes. The whole point of getting a data set is to get insights from it. According to datapine, “Data interpretation refers to the implementation of processes through which data is reviewed to arrive at an informed conclusion. The interpretation of data assigns a meaning to the information analyzed and determines its signification and implications.”

Data visualization using Python tools and libraries is an important task that helps one see the relationships in the data in graphical form.


In conclusion, a data scientist is a person who can obtain, scrub, explore, model, and interpret data by blending programming, statistics, and machine learning. A data scientist appreciates data as a product and works on solving a problem with the data at hand.

I hope you liked our article. Share your feedback in the comments.





Africa Data School


Intensive training for a career in artificial intelligence and machine learning.
