Data Exploration
One of the first and most important steps after obtaining a dataset is to explore it thoroughly, using a combination of summary statistics and visualizations. Careful data exploration will help you:
- Understand the structure and contents of the dataset,
- Discover interesting and sometimes unexpected patterns and trends,
- Identify potential sources of problems (e.g., errors, biases, or other obstacles to later analysis),
- Formulate meaningful questions to ask using the data, and
- Choose the most appropriate path of analysis.
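As a small illustration of how summary statistics can reveal a potential problem, the sketch below computes a few basic descriptive statistics for a short list of numbers. It assumes Python with only the standard library `statistics` module. The data values and the helper name `summarize` are hypothetical and chosen for illustration, with one value deliberately planted as a data-entry error.

```python
import statistics

# Hypothetical toy measurements; 999.0 stands in for a data-entry error.
values = [12.1, 11.8, 12.4, 12.0, 999.0, 11.9, 12.2]

def summarize(xs):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "n": len(xs),
        "min": min(xs),
        "max": max(xs),
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "stdev": statistics.stdev(xs),
    }

summary = summarize(values)
for name, value in summary.items():
    print(f"{name}: {round(value, 2)}")

# A mean far above the median, together with an extreme maximum,
# suggests a suspicious value worth checking before further analysis.
if summary["mean"] > 2 * summary["median"]:
    print("Warning: mean is much larger than median; check for outliers or errors.")
```

Because the median is far less sensitive to extreme values than the mean, a large gap between the two is one quick signal that a dataset may contain errors or outliers.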
In the next section, we will cover a variety of data exploration techniques, starting with descriptive (i.e., summary) statistics and then moving on to data visualization (i.e., graphing and plotting).