# Data Science Specialization by JHU

**#38** in category Data Science

Learner rating | 9.0 |
---|---|
Content rating | 9.3 |

The Data Science Specialization covers the concepts and tools for the data science pipeline. In the Capstone Project, you build a data product using real-world data.

## About Data Science Specialization

The data science specialization covers the concepts and tools you’ll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you’ll apply the skills learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.

To learn more about how Coursera’s Specializations work, refer to the linked page or visit our FAQ section.

## Syllabus

This data science specialization consists of ten courses, developed and taught by leading professors.

### 1. The Data Scientist Toolbox (Content rating: 96%)

- An introductory course to the main tools and ideas in the data scientist’s toolbox.
- The course provides an overview of the data, questions, and tools that data analysts and data scientists work with.
- There are two components to this course: The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools used in the program, such as version control, Markdown, Git, GitHub, R, and RStudio.

### 2. Data Science Specialization: Programming in R (94%)

- Programming in R and using R for effective data analysis.
- Installing and configuring the necessary programming software.
- Describe generic programming language concepts implemented in a high-level statistical language.
- The course covers practical issues in statistical computing, which include programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting on R code.

### 3. Getting and Cleaning Data (90%)

- The course covers how to obtain data from the web, APIs, databases, and colleagues in various formats.
- Basics of data cleaning and how to make data “tidy.”
- Components of a complete data set including raw data, processing instructions, codebooks, and processed data.
- Overall, the course will cover the basics needed for collecting, cleaning, and sharing data.

### 4. Data Science Specialization: Exploratory Data Analysis (94%)

Exploratory techniques help inform the development of more complex statistical models. They are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data.

- This course covers the essential exploratory techniques for summarizing data.
- It covers in detail the plotting systems in R and some of the basic principles of constructing data graphics.
- It also covers some of the common multivariate statistical techniques used to visualize high-dimensional data.

### 5. Reproducible Research (92%)

- This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner.
- Reproducible research is the idea that data analyses, and more generally scientific claims, are published with their data and software code so that others may verify the findings and build upon them.
- Reproducibility makes an analysis more sensible and useful because the analytical data and code are available.
- This course will focus on statistical analysis tools that allow one to publish data analyses in a single document, so that others can easily execute the same analysis and obtain the same results.

### 6. Statistical Inference (90%)

- Statistical inference is the process of drawing conclusions about populations or scientific truths from data.
- There are many modes of performing inference, including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses.
- Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference.
- A practitioner can easily be left in a debilitating maze of techniques, philosophies, and nuance.
- This course presents the fundamentals of inference in a practical approach to getting things done.
- After taking this course, students will understand the broad directions of statistical inference and use this understanding to make informed choices in analyzing data.

### 7. Regression Models (92%)

Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit.

- This course covers regression analysis, least squares, and inference using regression models.
- Special cases of the regression model, ANOVA and ANCOVA, are covered as well.
- Analysis of residuals and variability is also investigated.
- The course will cover modern thinking on model selection and novel uses of regression models, including scatterplot smoothing.
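To make the least squares and residual ideas listed above concrete, here is a minimal sketch in Python. It is not taken from the course materials (the course itself uses R). The function names `fit_line` and `residuals` and the small data set are made up for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the ratio of the x-y co-variation to the variation in x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Observed outcome minus fitted value for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data, invented for this example only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
res = residuals(xs, ys, intercept, slope)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

With an intercept in the model, the least squares residuals sum to zero (up to floating-point rounding), which is one of the basic checks used when investigating residuals.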

### 8. Data Science Specialization: Practical Machine Learning (88%)

One of the most common tasks performed by data scientists and data analysts is prediction and machine learning.

- This course will cover the basic components of building and applying prediction functions with practical applications.
- It provides a basic grounding in concepts such as training and test sets, overfitting, and error rates.
- The course introduces a range of model-based and algorithmic machine learning methods, including regression, classification trees, Naive Bayes, and random forests.
- Overall, the course will cover the complete process of building prediction functions, including data collection, feature creation, algorithms, and evaluation.
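The concepts of training and test sets, overfitting, and error rates mentioned above can be sketched with a few lines of Python. This is an illustration under stated assumptions, not course material. The data are simulated, the helper names are hypothetical, and the classifier is a one-nearest-neighbor rule, chosen here only because it memorizes its training data and so makes overfitting easy to see. It is not one of the methods listed in the syllabus.

```python
import random


def make_data(n, rng):
    """Simulated points: label is 1 when x > 5, with about 20% of labels flipped as noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        label = int(x > 5)
        if rng.random() < 0.2:
            label = 1 - label
        data.append((x, label))
    return data


def split(data, test_fraction, rng):
    """Shuffle a copy of the data and return (training set, test set)."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, x):
    """Predict the label of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def error_rate(train, points):
    """Fraction of points whose predicted label differs from the true label."""
    wrong = sum(predict_1nn(train, x) != label for x, label in points)
    return wrong / len(points)


rng = random.Random(0)  # fixed seed so the run is repeatable
data = make_data(200, rng)
train, test = split(data, 0.25, rng)

train_error = error_rate(train, train)  # each point is its own nearest neighbor
test_error = error_rate(train, test)    # honest estimate on held-out data
print(f"training error: {train_error:.2f}, test error: {test_error:.2f}")
```

Because each training point is its own nearest neighbor, the training error is zero even though the labels are noisy. Only the held-out test error shows how well the rule predicts new data.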

### 9. Developing Data Products (94%)

A data product is the production output from a statistical analysis. Data products automate complex analysis tasks or use technology to expand the utility of a data-informed model, algorithm, or inference.

- This course covers creating data products using Shiny, R packages, and interactive graphics.
- The course focuses on the statistical fundamentals of creating a data product that tells a story about data to a mass audience.

### 10. Data Science Specialization Capstone Project (95%)

- The capstone project class allows students to create a usable, public data product that they can use to show their skills to potential employers.
- Projects will be drawn from real-world problems and conducted with industry, government, and academic partners.

*Note: Your review matters*

*If you have already taken this course, kindly post your review in our reviews section. It would help others get useful information and better insight into the course offered.*

## FAQ

- What is Coursera’s specialization?
- About our policies and review criteria.
- How can you choose and compare online courses?
- How to add Courses to your Wishlist?
- You can suggest courses to add to our website.

## Specification

- Coursera
- Johns Hopkins University
- Microdegree
- Self-paced
- Beginner
- 3+ Months
- Free Trial (Paid Course & Certificate)
- English
- Data Analysis, Data Science, Data Science with 'R', Machine Learning

There are no reviews yet.