Data Exploration

One of the first and most important steps after obtaining a dataset is to explore it thoroughly, using a combination of summary statistics and visualizations. Careful data exploration will help you do the following:

  1. Understand the dataset,
  2. Discover interesting, sometimes unexpected patterns and trends,
  3. Identify potential sources of problems (e.g., errors, biases, and other obstacles to later analysis),
  4. Formulate meaningful questions to ask using the data, and
  5. Choose the most appropriate path of analysis.

In the next section, we will cover a variety of data exploration techniques, starting with descriptive (i.e., summary) statistics and then moving on to data visualization (i.e., graphing/plotting).
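
As a small preview of what descriptive statistics look like in practice, here is a minimal sketch in Python using only the standard library. The `temperatures` values and the `summarize` helper are made up for illustration; they are not part of any particular dataset or library.

```python
import statistics

# Hypothetical sample: daily temperatures in degrees Celsius (made-up values).
temperatures = [21.5, 22.0, 19.8, 23.1, 22.0, 35.0, 20.4]


def summarize(values):
    """Return a few common descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = summarize(temperatures)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this made-up sample the mean (about 23.4) sits above the median (22.0), which hints that a single high value (35.0) is pulling the average up. Spotting this kind of potential error or outlier is one of the things exploration is meant to do.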