Professional Certificate in Data Science

  • Course platform: edX
  • Offered by: Harvard University
  • Level: Beginner
  • Price: Paid Course (paid Certificate)
  • Class length: 1 year, 5 months (2-3 hrs/week)

Harvard Data Science Certificate Program

About Data Science

Data science is a branch of computer science dealing with capturing, processing, and analyzing data to gain new insights about the systems being studied. Data scientists deal with vast amounts of information from different sources and in different contexts, so the processing they must do is usually unique to each study, utilizing custom algorithms, artificial intelligence (AI), machine learning, and human interpretation. It’s a broad field expanding rapidly across many industries, including medicine, astronomy, meteorology, marketing, sociology, visual effects, and much more.

Importance of Data Science

Science is based on gathering evidence and interpreting it to draw logical conclusions. This principle has served civilization well enough to enable trans-Atlantic flights, telephony, disease treatments, and the landing of rovers on the surface of Mars. In the modern world, data is being gathered at an ever-growing rate: data about lifestyle habits, dietary preferences, music choices, purchasing habits, energy consumption, weather systems, migratory patterns, seismic activity, flight times, and much more. Computers are everywhere, so there is an almost constant stream of input into this pool of big data.

What will you learn from this course?

This program is organized by HarvardX, so we refer to it here as the Harvard Data Science course for clarity. In it, you will learn:

  • Fundamental R programming skills.
  • Statistical concepts such as probability, inference, and modeling, and how to apply them in practice.
  • The tidyverse, including data visualization with ggplot2 and data wrangling with dplyr.
  • Essential tools for practicing data scientists, such as Unix/Linux, git and GitHub, and RStudio.
  • How to implement machine learning algorithms.
  • Fundamental data science concepts in depth, through motivating real-world case studies.

About the Professional Certificate Program:

HarvardX requires individuals who enroll in its courses on edX to abide by the terms of the edX honor code. HarvardX will take appropriate corrective action in response to violations of the edX honor code, which may include dismissal from the HarvardX course, revocation of any certificates received for the HarvardX course, or other remedies as circumstances warrant. No refunds will be issued in the case of corrective action for such violations. Enrollees who are taking HarvardX courses as part of another program will also be governed by the academic policies of those programs.

About the instructors:

Rafael Irizarry

-Professor of Biostatistics at Harvard University

Rafael Irizarry is a Professor of Biostatistics at the Harvard T.H. Chan School of Public Health and a Professor of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute. For the past 15 years, Dr. Irizarry’s research has focused on the analysis of genomics data. During this time, he has also taught several classes, all related to applied statistics.

There are 9 courses in this Harvard Data Science certificate program:

1. Data Science: R Basics

The demand for skilled data science practitioners in industry, academia, and the government is rapidly growing. The Harvard Data Science Series prepares you with the required knowledge base and skills to tackle real-world data analysis challenges.

In this course, you will learn:

  • R Basics, Functions, and Data Types- You will learn about R’s functions and data types.
  • Vectors and Sorting- You will learn to operate on vectors and use advanced functions such as sorting.
  • Indexing, Data Manipulation, and Plots- You will learn to wrangle, analyze, and visualize data.
  • Programming Basics- You will learn to use general programming features such as ‘if-else’ conditionals and ‘for’ loops.

2. Data Science: Visualization

The growing availability of informative datasets and software tools has led to increased reliance on data visualizations across many industries, academia, and government. Data visualization provides a powerful way to communicate data-driven findings, motivate analyses, or detect flaws.

In this course, you will cover the following points:

  • Introduction to Data Visualization and Distributions- You will be introduced to data visualization and distributions in R.
  • Introduction to ggplot2- You will learn how to use ggplot2 to create plots.
  • Summarizing with dplyr- You will learn how to summarize data using dplyr.
  • Gapminder- You will see examples of ggplot2 and dplyr in action with the Gapminder dataset.
  • Data Visualization Principles- You will learn general principles to guide you in developing effective data visualizations.

3. Data Science: Probability

Probability theory is the mathematical foundation of statistical inference, which is indispensable for analyzing data affected by chance, and thus essential for data scientists.

This course covers the following points:

  • Discrete Probability- You will learn about the basic principles of probability related to categorical data using card games as examples.
  • Continuous Probability- You will learn about the basic principles of probability related to numeric and continuous data.
  • Random Variables, Sampling Models, and the Central Limit Theorem- You will learn about random variables (numeric outcomes resulting from random processes) and the Central Limit Theorem, which applies to large sample sizes.
  • The Big Short- You will learn how interest rates are determined.
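
As a rough illustration of the Central Limit Theorem mentioned in the list above, the following is a minimal sketch and not course material. It is written in Python for brevity, whereas the course itself uses R. The die-rolling setup, the function name `sample_means`, and the sample sizes are all assumptions chosen for the example.

```python
import random
import statistics


def sample_means(n, trials, seed=0):
    """Return `trials` averages, each taken over `n` simulated fair die rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.randint(1, 6) for _ in range(n))
        for _ in range(trials)
    ]


# One die roll has mean 3.5 and standard deviation sqrt(35/12), about 1.71.
# For averages of n = 100 rolls, the theorem predicts an approximately normal
# distribution centered at 3.5 with standard error 1.71 / sqrt(100), about 0.171.
means = sample_means(n=100, trials=2000)
print("mean of averages:", round(statistics.fmean(means), 3))
print("spread of averages:", round(statistics.stdev(means), 3))
```

The printed values should land close to 3.5 and 0.171, even though a single die roll is far from normally distributed.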

4. Data Science: Inference and Modeling

Statistical inference and modeling are indispensable for analyzing data affected by chance, and thus essential for data scientists. In this course, you will learn these key concepts through a motivating case study on election forecasting.

In this course, you will learn the following:

  • Parameters and Estimates- You will learn how to estimate population parameters.
  • The Central Limit Theorem in Practice- You will apply the central limit theorem to assess how close a sample estimate is to the population parameter of interest.
  • Confidence Intervals and p-Values- You will learn how to calculate confidence intervals and learn about the relationship between confidence intervals and p-values.
  • Statistical Models- You will learn about statistical models in the context of election forecasting.
  • Bayesian Statistics- You will learn about Bayesian statistics by looking at examples from rare disease diagnosis and baseball.
  • Election Forecasting- You will learn about election forecasting, building on what you’ve learned in the previous sections about statistical modeling and Bayesian statistics.
  • Association Tests- You will learn how to use association and chi-squared tests to perform inference for binary, categorical, and ordinal data through an example looking at research funding rates.
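
To give a sense of the rare disease diagnosis example mentioned in the list above, here is a minimal sketch of Bayes' rule. It is written in Python for brevity, whereas the course itself uses R. The function name and the prevalence, sensitivity, and specificity figures are hypothetical values picked for illustration, not numbers taken from the course.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Probability of having the disease given a positive test (Bayes' rule)."""
    true_pos = sensitivity * prevalence                # P(positive and diseased)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(positive and healthy)
    return true_pos / (true_pos + false_pos)


# Hypothetical numbers: 1 in 1,000 people has the disease, and the test is
# 99% sensitive and 99% specific.
p = posterior_given_positive(prevalence=0.001, sensitivity=0.99, specificity=0.99)
print(f"P(disease | positive test) = {p:.3f}")
```

With these assumed inputs the posterior probability is only about 9%, because healthy people vastly outnumber sick ones and so false positives outnumber true positives.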

5. Data Science: Productivity Tools

A typical data analysis project may involve several parts, each including several data files and different scripts with code. Keeping all these organized can be challenging.

In this course, you will learn:

  • Installing Software- You will learn how to install R, RStudio, and git, create a GitHub account, and connect these tools to each other.
  • Unix- You will learn the basics of the file system, the terminal, and Unix commands, and conceptually how these commands work within your filesystem.
  • Reproducible Reports- You will learn the tools to create beautiful and easy-to-edit data science reports.
  • Git and GitHub- You will learn to clone and create version-controlled GitHub repositories using the command line.
  • Advanced Unix- You will learn other Unix commands that will increase your productivity as a data scientist.

6. Data Science: Wrangling

In a data science project, the data is rarely readily available in a convenient form. It is more likely to be in a file, a database, or extracted from documents such as web pages, tweets, or PDFs. In these cases, the first step is to import the data into R and tidy it using the tidyverse packages.

In this course, you will learn the following:

  • Data Import- You will learn how to import various types of data into R.
  • Tidy Data- You will learn how to convert data from raw into a tidy form.
  • String Processing- You will learn how to process strings using regular expressions (regex).
  • Dates, Times, and Text Mining- You will learn how to work with dates and times as file formats and how to mine text for analysis.

7. Data Science: Linear Regression

Linear regression is commonly used to quantify the relationship between two or more variables. It is also used to adjust for confounding. This course, part of the Professional Certificate Program in Data Science, covers how to implement linear regression and adjust for confounding in practice using R.

This course covers the following points:

  • Introduction to Linear Regression- You will learn the basics of linear regression through this course’s motivating example, the data-driven approach used to construct baseball teams.
  • Linear Models- You will learn about linear models, least squares estimates, multivariate regression, and several useful features of R.
  • Confounding- You will learn about confounding and several reasons that correlation is not the same as causation, such as spurious correlation, outliers, reversing cause and effect, and confounders.

8. Data Science: Machine Learning

In this course, you will learn how to use R to build a movie recommendation system using the basics of machine learning, the science behind the most popular and successful data science techniques.

In this course, you will cover the following points:

  • Introduction to Machine Learning- You will be introduced to some of the terminology and concepts you will need going forward.
  • Machine Learning Basics- You will learn how to start building a machine learning algorithm using training and test data sets and the importance of conditional probabilities for machine learning.
  • Linear Regression for Prediction, Smoothing, and Working with Matrices- You will learn why linear regression is a useful baseline approach but is often insufficiently flexible for more complex analyses, and how to smooth noisy data.
  • Distance, kNN, Cross-Validation, and Generative Models- You will learn different types of discriminative and generative approaches for machine learning algorithms.
  • Classification with More than Two Classes and the Caret Package- You will learn how to overcome the curse of dimensionality using methods that adapt to higher dimensions and how to use the caret package to implement many different machine learning algorithms.
  • Model Fitting and Recommendation Systems- You will learn how to apply machine learning algorithms.

9. Data Science: Capstone

This course is very different from the previous courses in the series. Unlike the rest of the courses in the Professional Certificate Program, you will receive much less guidance from the instructors. You will show what you’ve learned so far by working independently on data science projects of your own.

This course consists of two projects:

  • MovieLens Project (all learners)- Open to all learners. You will take a short preparatory quiz to familiarize yourself with the dataset and then complete a project using a dataset from MovieLens.
  • Choose-Your-Own Project (Verified learners only)- Open to Verified learners only. You will work on your own project using a dataset of your choosing.


There are no prerequisites for the first course, but the later courses assume knowledge from the prior courses in the series.


Specification: Professional Certificate in Data Science

  • Course platform: edX
  • Level: Beginner
  • Class length: <2 years
  • Program details: Professional certificate
  • Enrollment: Paid Course (paid certificate)
  • Course subjects: Data Analytics, Data Science, Data Science with R, Machine Learning
  • Offered by: Harvard University
