Summarizing Categorical Data

class date

September 2, 2022

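library(tidyverse)  # loads ggplot2 for plotting
library(infer)      # provides the gss data set
data(gss)

# Side-by-side bars: counts of respondents by party, split by degree status
ggplot(gss, aes(x = partyid, 
                fill = college)) +
    geom_bar(position = "dodge")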

# Normalized stacked bars: within each party, the share with and without a degree
ggplot(gss, aes(x = partyid, 
                fill = college)) +
    geom_bar(position = "fill") +
    labs(y = "proportion")

Question 1

Which of the following is FALSE regarding a contingency table?

( ) It has the levels of one variable along the top of the columns and the levels of a second variable down the left side of the rows.
( ) It displays the counts that occur in every combination of levels of two categorical variables.
( ) It is used to summarize the information found in a data frame.
(X) It is interchangeable with a data frame.
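
As a rough illustration of what a contingency table looks like, the sketch below cross-tabulates two categorical variables from the gss data used in this worksheet. It assumes the infer package is installed and relies only on base R's table(); the object name party_by_college is made up for this example. Note that the table holds one count per combination of levels, while the data frame holds one row per respondent, so the table cannot stand in for the data frame.

```r
library(infer)  # provides the gss data set used in this worksheet
data(gss)

# Cross-tabulate party identification (rows) against degree status (columns).
# droplevels() removes factor levels that never occur in the data, so they
# do not appear as all-zero rows.
party_by_college <- table(droplevels(gss$partyid), gss$college,
                          dnn = c("partyid", "college"))
party_by_college

# Append row and column totals (the marginal counts) to the table
addmargins(party_by_college)
```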

Question 2

The following questions use data from the General Social Survey, which asks a representative sample of Americans a range of questions about their opinions and lifestyle. One question asked whether the respondent had a college degree. Another asked which political party the respondent identifies with: dem (Democrat), ind (Independent), rep (Republican), or other.

Question 2.1

What was the most frequently cited party that respondents identified with?

( ) dem (X) ind ( ) rep ( ) other ( ) no degree ( ) degree

Question 2.2

True or False: most of the survey respondents had a college degree.

( ) True (X) False
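
Questions 2.1 and 2.2 each concern a single variable, so they can also be checked against one-variable (marginal) counts. The following is a minimal sketch, assuming the infer package is installed; the name college_props is made up for this example.

```r
library(infer)  # provides the gss data set used in this worksheet
data(gss)

# Marginal counts of party identification, most common first
sort(table(gss$partyid), decreasing = TRUE)

# Marginal proportions of degree status across all respondents
college_props <- prop.table(table(gss$college))
college_props
```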

Question 3

The same data can be re-expressed as a normalized stacked bar chart.

Which of the following statements is true?

(X) This plot shows a weak association between party id and whether or not someone has a college degree.
( ) This plot shows absolutely no association between party id and whether or not someone has a college degree.
( ) This plot shows a very strong association between party id and whether or not someone has a college degree.
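
The heights in a normalized stacked bar chart are conditional proportions: within each party, the share of respondents with and without a degree. One way to compute the same numbers as a table is sketched below, assuming the infer package is installed; the names tab and row_props are made up for this example. If the rows look similar across parties, the association is weak; if they differ sharply, it is strong.

```r
library(infer)  # provides the gss data set used in this worksheet
data(gss)

# Counts for every combination of party and degree status
tab <- table(droplevels(gss$partyid), gss$college,
             dnn = c("partyid", "college"))

# margin = 1 divides each row by its row total, giving the proportion
# with and without a degree within each party
row_props <- prop.table(tab, margin = 1)
round(row_props, 2)
```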