Welcome!

This is the in-progress working draft of a revamp of the STAT 240 course notes. Significant changes have been made to the order of topics, degree of coverage, and examples employed. Currently, this is intended to supplement rather than replace the existing body of STAT 240 notes.

Prerequisites/scope

For these notes, no prior R or computer science knowledge is assumed; everything is taught from scratch.

As you consult these notes, please keep in mind that the aim is NOT to teach you everything you need to know about each topic, but rather to equip you with a foundational understanding and encourage you to learn and explore further on your own. As such, we will typically cover only the basic usage of most operations and demonstrate a few key examples, leaving the details for you to practice.

Also note that some bonus or aside content considered advanced knowledge may occasionally be mentioned in passing for the sake of completeness, but it is outside the scope of what you need to know.

How to use this book

Here are a few tips on how to get the most out of this book.

Organization

These notes are loosely organized into the following order of topics:

Setup,
R crash course, to rapidly bring you up to speed on basic R usage,
Data exploration, to introduce you to data exploration in R,
Data transformation, to demonstrate common data cleaning techniques,
Probability theory (in progress), to introduce basic probability theory,
Inference (to be added soon), to teach foundational inference techniques, specifically:
1. Inference on means,
2. Inference on proportions,
3. Inference in regression.

There are also some appendices with additional info:

Datasets: info on the sources and preprocessing done for data set examples,
Cheat sheets: list of cheat sheets for various packages/programs used in the notes.

Notes layout

First, note the table of contents on the left and chapter navigation bar on the right of each page. Use these to quickly navigate around the notes. Also note the search bar in the top corner; use this to search and highlight keywords across the entire site. On smaller phone screens these elements may collapse, but they should be fully visible on wider laptop/tablet screens.

The notes pages are mostly composed of paragraphs of text (like this one) and code chunks (see below). You will also occasionally encounter additional reference links, embedded images, footnotes with extra info, tables, and other elements.

The block below is a code chunk. Code chunks are frequently annotated with comments. Note that you can copy the contents of a chunk using the clipboard icon in its corner. Also note that functions automatically link to their help pages, which contain usage notes, argument explanations, and examples.

# this is a code chunk; lines starting with # are comments
# R code in here will be run and output shown below
print("Hello world!")

[1] "Hello world!"
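As one more minimal sketch of the same idea, assuming only base R (the variable name `x` is hypothetical and chosen just for illustration), a comment can also follow code on the same line:

```r
# a comment on its own line is ignored when the chunk runs
x <- 2 + 3  # a comment can also follow code on the same line
print(x)
```

Running this chunk should print `[1] 5`, since only the code before each `#` is evaluated.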

Important notes, often warning you against common mistakes/errors, will appear in yellow alert boxes.

Tips on improving your R understanding or optimizing your workflow will appear in green alert boxes.

Source code

These notes are open-sourced on GitHub, built using bookdown, and served by GitHub Pages, which together provide a convenient, easily editable, and reproducible workflow. Each page has a “View source” link in the right-side navbar, if you want to see what’s under the hood.

Note the code base is primarily written in R Markdown syntax, which may include ordinary text, markdown code, YAML headers, R chunks, knitr tweaks, \(\LaTeX\) formulae, and pandoc elements (especially fenced divs and braced attributes). In some auxiliary files, you may even find HTML/CSS/jQuery snippets. These are obviously not made for you to read/understand, so browse at your own curiosity.

Contributing

We work hard to avoid errors, but alas nothing is perfect! If you notice any errors, please consider contributing a suggestion! You can do this in 2 ways. (Note: both ways require a GitHub account, so make sure to sign in or register first!¹)

Directly propose a change in a GitHub pull request:
1. On the page with the error, click “Edit this page” in the right-side navbar.
2. If this is your first time contributing to this project, you will be asked to “Fork this repository”, i.e. make a copy.
3. After forking, make your edits in the text editor window that appears and click “Commit changes…”. Make sure to add a brief, descriptive title, as well as any additional necessary details in the description box. Note the description box supports markdown syntax.
4. Next, click “Propose changes”, then click “Create pull request”. Again, make sure you have a good title and description, then click “Create pull request” again. Make sure to leave “Allow edits by maintainers” checked, so I can modify your edit if I want!
5. You can check the status of your pull request (PR) in the PR tab of the repo.
  - If the PR looks perfect, I may immediately merge it.
  - If the PR is good but not perfect, I may make further comments/edits before eventually merging.
  - If the PR isn’t up to par for some reason, I may discuss it more, ask follow-up questions, or simply close it. If I close your PR, don’t be discouraged! You’re welcome to make further contributions; just make sure you understand why I didn’t merge it and try to make a better PR next time!
If that seems like too much work, you can also simply raise an issue and point me to the error. Note since this is more work for me, it will usually have lower priority than a well-written PR, which can be easily merged with a single click.

Acknowledgements

This is a good time to acknowledge people who have made contributions. Bret Larget is the original creator of STAT 240 and author of the first set of STAT 240 notes, which is a primary source of inspiration for many aspects of these notes. Cameron Jones has also agreed to help write some practice materials as these notes evolve. Beyond that, thanks to @jennamotto1 for also contributing to the repo (make a successful PR to get your name on this list!).

Future ideas

Below is a list of additional ideas for future improvements to these notes, to be considered for implementation at an unspecified future time (not to be prioritized over finishing the first-pass writeup).

dark mode?
add exercises to each page
automagic index generator using _common.R?
glossary?
DT datatable fancy printouts??
