About this course
In this course, part of the Professional Certificate Program in Data Science, you will cover several standard steps of the data wrangling process like importing data into R, tidying data, string processing, HTML parsing, working with dates and times, and text mining. Rarely are all these wrangling steps necessary in a single analysis, but a data scientist will likely face them all at some point.
Very rarely is data easily accessible in a data science project. It’s more likely for the data to be in a file, a database, or extracted from documents such as web pages, tweets, or PDFs. In these cases, the first step is to import the data into R and tidy it using the tidyverse package. The process of converting data from its raw form to tidy form is called data wrangling.
This process is a critical step for any data scientist. Knowing how to wrangle and clean data will enable you to uncover insights that would otherwise remain hidden.
This is the sixth course in the HarvardX Data Science Professional Certificate Series. It is strongly recommended that you take the first five courses in the series before taking this course. At a minimum, you should have taken Data Science: R Basics.
Do I have to take the courses in sequence?
The courses in the HarvardX Data Science Professional Certificate are designed to be taken in the following order:
- R Basics
- Visualization
- Probability
- Inference and Modeling
- Productivity Tools
- Wrangling
- Linear Regression
- Machine Learning
- Capstone
Each subsequent course assumes familiarity with the content in the preceding courses. Depending on your experience with data science generally and R specifically, you may be able to take the courses out of sequence if you choose.
What will you learn from this Data Wrangling course?
- Importing data into R from different file formats.
- Web scraping.
- How to tidy data using the tidyverse to better facilitate analysis.
- String processing with regular expressions (regex).
- Wrangling data using dplyr.
- How to work with dates and times as data formats.
- Text mining.
Introduction and Welcome
- Welcome to Data Science: Wrangling!
- Important Pre-Course Survey
Section 1: Data Import
1.1: Data Import
Section 2: Tidy Data
2.1: Reshaping Data
2.2: Combining Tables
2.3: Web Scraping
Section 3: String Processing
3.1: String Processing Part 1
3.2: String Processing Part 2
3.3: String Processing Part 3
Section 4: Dates, Times, and Text Mining
4.1: Dates, Times, and Text Mining
Comprehensive Assessment and Course Wrap-up
- Comprehensive Assessment: Puerto Rico Hurricane Mortality
- Harvard University
- Online Course
- 1-3 Months
- Free Course (Affordable Certificate)
- Data Analysis, Data Science, Data Science with 'R', Machine Learning, Probability, Web Scraping