Data Analysis Exercises 1 (3)

Source: 六九路网

Question 1

Consider the table below describing a data set of individuals who have registered to volunteer at a public school. Which of the choices below lists categorical variables?

Your Answer Score

number of siblings and year born Incorrect 0.00

name and number of siblings

annual income and phone number

phone number and name

Total 0.00 / 1.00

Question Explanation

This question refers to the following learning objective(s):

Identify variables as numerical and categorical.

• If a variable is numerical, further classify it as continuous or discrete based on whether the variable can take on an infinite number of values or only non-negative whole numbers, respectively.

• If variable is categorical, determine if it is ordinal based on whether or not the levels have a natural ordering.

Question 2

The General Social Survey conducted annually in the United States asks how many friends people have and how they would rate their happiness level (very happy, pretty happy, not too happy). In order to evaluate the relationship between these two variables, a researcher calculates the average number of friends for people who categorize themselves as very happy, pretty happy, and not too happy. Which of the following correctly identifies the variables used in the study as explanatory and response?

Your Answer Score Explanation

explanatory: number of friends; response: happiness level (categorical with 3 levels) Incorrect 0.00

Explanation: Having more friends might cause people to be happier [...] people to have more friends. So we can’t easily [...] explanatory and which the response based on wh[...] However in this particular analysis the happiness [...] we first divide the data into groups based on this [...] statistics of number of friends of people who fall [...] number of friends is the response variable.

explanatory: number of friends; response: very happy, pretty happy, not too happy

explanatory: very happy, pretty happy, not too happy; response: number of friends

explanatory: happiness level (categorical with 3 levels); response: number of friends

Total 0.00 / 1.00

Question Explanation

This question refers to the following learning objective(s):

Identify the explanatory variable in a pair of variables as the variable suspected of affecting the other, however note that labeling variables as explanatory and response does not guarantee that the relationship between the two is actually causal, even if there is an association identified between the two variables.

Question 3

Past research suggests that students who study with fewer distractions (internet, cell phone, etc.) tend to get higher grades. Which of the following is the best scenario for being able to generalize this finding to the population of all students?

Your Answer Score Explanation

None of the students in the sample has any misdemeanors; their answers can’t be trusted.

A student list for the college is obtained and students are randomly selected from the list, and all selected students participate in the study. Correct 1.00

Explanation: Random sampling [...] the population a[...] simple random s[...]

A survey is emailed to all registered students, and the results are based on the sample of returned surveys.

Sample only includes students who are in classes that the researcher teaches.

Total 1.00 / 1.00

Question Explanation

This question refers to the following learning objective(s):

Classify a study as observational or experimental, and determine whether the study’s results can be generalized to the population and whether they suggest correlation or causation.

• If random sampling has been employed in data collection, the results should be generalizable to the target population.

• If random assignment has been employed in study design, the results suggest causality.

Question 4

A school district is considering whether it will no longer allow students to park at school after two recent accidents where students were severely injured. As a first step, they survey parents of high school students by mail, asking them whether or not the parents would object to this policy change. Of 5,799 surveys that go out, 1,209 are returned. Of these 1,209 surveys that were completed, 926 agreed with the policy change and 283 disagreed. Which of the following statements is the most plausible?

Your Answer Score

It is possible that 80% of the parents of high school students disagree with the policy change.

The survey is unlikely to have any bias because all parents were mailed a survey. Incorrect 0.00

The school district has strong support from parents to move forward with the policy approval.

Total 0.00 / 1.00

Question Explanation

This question refers to the following learning objective(s):

Question confounding variables and sources of bias in a given study.
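
The counts in Question 4 can be checked with a few lines of arithmetic. The sketch below uses variable names chosen only for illustration. It computes the response rate and, under the worst-case assumption that every non-respondent disagrees (an assumption made here, not something the question states), the largest share of all surveyed parents who could disagree.

```python
# Counts taken from Question 4.
surveys_sent = 5799
surveys_returned = 1209
agreed = 926
disagreed = 283

# Share of mailed surveys that came back.
response_rate = surveys_returned / surveys_sent

# Share of respondents who agreed with the policy change.
agree_among_respondents = agreed / surveys_returned

# Worst case (an illustrative assumption): every parent who did
# not respond disagrees with the policy change.
non_respondents = surveys_sent - surveys_returned
max_disagree_share = (disagreed + non_respondents) / surveys_sent

print(f"response rate: {response_rate:.1%}")
print(f"agreement among respondents: {agree_among_respondents:.1%}")
print(f"largest possible share disagreeing: {max_disagree_share:.1%}")
```

Because only about one in five surveys was returned, a disagreement share of 80% among all parents remains arithmetically possible, which is why non-response bias matters here.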

Question 5

As part of a statistics project, Andrea would like to collect data on household size in her city. To do so, she asks each person in her statistics class for the size of their household, and reports that her sample is a simple random sample. However, this is not a simple random sample. Which of the following is the best reasoning for why this is not a random sample that is appropriate for this research question?

Your Answer

Andrea did not block for any variables that might influence the response.

Andrea asked everybody in her class instead of asking her classmates to volunteer.

In this investigation of household size, each household represents a case. Andrea incorrectly sampled individuals instead of households.

Andrea did not use a random number table to randomize the order in which she collected the students’ responses, so the sample cannot be random.

Total

Question Explanation

This question refers to the following learning objective(s):

Distinguish between simple random, stratified, and cluster sampling, and recognize the benefits and drawbacks of choosing one sampling scheme over another.

Question 6

True or False: Stratified sampling allows for controlling for possible confounders in the sampling stage, while blocking allows for controlling for such variables during random assignment.

Your Answer Score Explanation

False Incorrect 0.00

Explanation: Stratifying and blocking both allow for controlling for potential confounders [...] study design. We stratify when we sample (divide population into strata [...] stratum), and block in the process of random assignment (divide sample [...] from within each block to treatment groups).

True

Total 0.00 / 1.00

Question Explanation

This question refers to the following learning objective(s):

Identify the four principles of experimental design and recognize their purposes:

• control any possible confounders,

• randomize into treatment and control groups,

• replicate by using a sufficiently large sample or repeating the experiment,

• block any variables that might influence the response.

Question 7

Which of the below data sets has the lowest standard deviation? You do not need to calculate the exact standard deviations to answer this question.

Your Answer Score Explanation

100, 100, 100, 100, 100, 100, 101 Correct 1.00

Explanation: The dataset with the most repeated observations [...] lowest standard deviation.

0,1,2,3,4,5,6

0, 25, 50, 100, 125, 150, 1000

0,1,3,3,3,5,6

Total 1.00 / 1.00

Question Explanation

This question refers to the following learning objective(s):

Note that there are three commonly used measures of center and spread:

• center: mean (the arithmetic average), median (the midpoint), mode (the most frequent observation)

• spread: standard deviation (variability around the mean), range (max-min), interquartile range (middle 50% of the distribution)
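
Although Question 7 does not require exact values, the comparison is easy to confirm numerically. The following sketch computes the sample standard deviation of the four answer choices with Python's `statistics.stdev`. The labels A to D are introduced here only for illustration and do not appear in the quiz.

```python
from statistics import stdev

# The four answer choices from Question 7 (labels are illustrative).
datasets = {
    "A": [100, 100, 100, 100, 100, 100, 101],
    "B": [0, 1, 2, 3, 4, 5, 6],
    "C": [0, 25, 50, 100, 125, 150, 1000],
    "D": [0, 1, 3, 3, 3, 5, 6],
}

# Sample standard deviation (n - 1 in the denominator) of each data set.
spreads = {label: stdev(values) for label, values in datasets.items()}

# Label of the data set with the smallest spread.
lowest = min(spreads, key=spreads.get)

for label, sd in spreads.items():
    print(f"{label}: sd = {sd:.2f}")
print("lowest standard deviation:", lowest)
```

The data set in which six of the seven values are identical has almost no variability around its mean, so it has the smallest standard deviation.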

Question 8

True or False: The statistic mean/median (mean divided by median) can be used as a measure of skewness (either right or left). If this statistic is less than 1, the distribution is most likely left skewed.

Your Answer Score Explanation

False

True Correct 1.00

Explanation: In a left skewed distribution the median is greater than the mean [...] mean/median to be less than 1.

Total 1.00 / 1.00

Question Explanation

This question refers to the following learning objective(s):

Identify the shape of a distribution as symmetric, right skewed, or left skewed, and unimodal, bimodal, multimodal, or uniform.
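
The mean/median ratio from Question 8 can be illustrated on two small samples. Both samples below are hypothetical values made up for this sketch, and the helper name `mean_median_ratio` is likewise an assumption. The ratio is only meaningful for data with a positive median.

```python
from statistics import mean, median

def mean_median_ratio(values):
    """Return mean divided by median (assumes a positive median)."""
    return mean(values) / median(values)

# Hypothetical samples: one with a long left tail, one with a long right tail.
left_skewed = [1, 7, 8, 8, 9, 9, 10]
right_skewed = [1, 2, 2, 3, 3, 4, 10]

ratio_left = mean_median_ratio(left_skewed)
ratio_right = mean_median_ratio(right_skewed)

print(f"left-skewed sample:  mean/median = {ratio_left:.3f}")
print(f"right-skewed sample: mean/median = {ratio_right:.3f}")
```

In the left-skewed sample the low value pulls the mean below the median, so the ratio falls under 1. In the right-skewed sample the high value pulls the mean above the median, so the ratio exceeds 1.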

Question 9

Based on the relative frequency histogram below, which of the following statements is supported by the plot?

Your Answer Score Explanation

The IQR of the distribution is roughly 10.

The mean of the distribution is smaller than its median.

The distribution is multimodal.

It is not possible to estimate the median without knowing the sample size.

There are no outliers in the distribution. Incorrect 0.00

Explanation: Using the relative frequency histogram, we can tell that 10% of [...] bin), 40% are between 5 and 10, 20% are between 10 and 15, a[...] Q1 is in the second bin (between 5 and 10) and Q3 is in the fourth [...] confirms that the IQR is roughly 10. Using this same approach w[...] second bin, therefore we don’t need to know the sample size [...] observations more than 1.5×IQR below the first quartile, but th[...] 1.5×IQR above the third quartile, therefore there are indeed outliers [...]

Total 0.00 / 1.00

Question Explanation

This question refers to the following learning objective(s):

Use histograms and box plots to visualize the shape, center, and spread of numerical distributions, and intensity maps for visualizing the spatial distribution of the data.

Question 10

A recent housing survey was conducted to determine the price of a typical home in a city that is mostly middle-class, with one very expensive suburb. The mean price of a house in this city is roughly $650,000. Which of the following statements is most likely to be true?

Your Answer Score Explanation

There are about as many houses in this city that cost more than $650,000 than less than this amount.

Majority of houses in this city cost less than $650,000. Correct 1.00

Explanation: Since the city is mostly middle-class, with one [...] expect the distribution to be right skewed, and [...] than the median. Since 50% of observations fall [...] observations (i.e. majority) will cost less than $[...]

We need to know the standard deviation to answer this question.

Majority of houses in this city cost more than $650,000.

Total 1.00 / 1.00

Question Explanation

This question refers to the following learning objective:

Define a robust statistic (e.g. median, IQR) as a statistic that is not heavily affected by skewness and extreme outliers, and determine when such statistics are more appropriate measures of center and spread compared to other similar statistics.
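
The reasoning in Question 10 can be made concrete with a toy example. The ten prices below (in thousands of dollars) are entirely hypothetical and were chosen only so that the mean comes out to 650; they are not data from the survey. The sketch compares the mean with the median and counts the share of homes priced below the mean.

```python
from statistics import mean, median

# Hypothetical prices in thousands of dollars: nine mid-priced homes
# and one very expensive home. These values are invented for illustration.
prices = [300, 320, 350, 380, 400, 420, 450, 480, 500, 2900]

avg_price = mean(prices)
mid_price = median(prices)

# Fraction of homes that cost less than the mean price.
share_below_mean = sum(p < avg_price for p in prices) / len(prices)

print(f"mean = {avg_price}, median = {mid_price}")
print(f"share of homes below the mean: {share_below_mean:.0%}")
```

A single extreme value moves the mean far above the median, while the median barely changes. This is the sense in which the median is a robust statistic and the mean is not.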

Question 11

Phi Delta Kappa (PDK) is an international professional organization for educators that, in collaboration with Gallup, has been conducting polls on the public’s attitudes toward the public schools since 1969. The following was one of the questions on the 2011 poll:

”Most teachers in the nation now belong to unions or associations that bargain over salaries, working conditions, and the like. Has unionization, in your opinion, helped, hurt, or made no difference in the quality of public school education in the United States?”

The respondents’ answers broken down by party affiliation are shown below. Which of the following statements is most justified by these data?

Your Answer Score Explanation

14% of Republicans and 58% of Democrats think that teachers belonging to unions or bargaining associations helped the quality of public school education in the United States.

A histogram or a box plot would be useful for investigating if distribution of opinion on teachers belonging to unions or bargaining associations varies by political party affiliation.

The results of the survey suggest a relationship between opinion on teachers belonging to unions or bargaining associations and political party affiliation. Correct 1.00

Explanation: 35/290 ≈ 12% of Republicans, 146/341 [...] 20% of Independents think that teachers [...] associations helped the quality of public [...] Since there is considerable differences b[...] of the survey suggest a relationship betw[...] unions or bargaining associations and po[...]

The results of the survey suggest that opinion on teachers belonging to unions or bargaining associations and political party affiliation appear to be independent.

Total 1.00 / 1.00

Question Explanation

This question refers to the following learning objective(s):

Use contingency tables and segmented bar plots or mosaic plots to assess the relationship between two categorical variables.
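
The two fractions that survive in the Question 11 explanation can be turned into percentages directly. In this sketch the first fraction is the one the explanation attributes to Republicans. The party label for the second fraction, 146/341, is cut off in the text, so it is labeled generically here rather than guessed. The contingency table itself is not reproduced in the text, so no other cells are used.

```python
# Fractions quoted in the explanation for Question 11.
# Each tuple is (respondents answering "helped", group total).
quoted_fractions = {
    "Republicans": (35, 290),
    "second group (label cut off in the text)": (146, 341),
}

# Conditional proportion answering "helped" within each group.
proportions = {
    group: helped / total
    for group, (helped, total) in quoted_fractions.items()
}

for group, share in proportions.items():
    print(f"{group}: {share:.0%}")
```

Comparing such within-group proportions is how a contingency table is read for association: if opinion and party were independent, the proportions would be roughly equal across groups.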

Question 12

In 1948, Austin Bradford Hill designed a study to test a new treatment for tuberculosis. At the beginning of the study there was no evidence on whether it would be any better or worse than bed rest. He randomly assigned some patients who volunteered to be a part of this study to receive the treatment Streptomycin, an antibiotic. The other patients received only bed rest as the control group. Hill then observed the patients’ outcomes: which patients died and which recovered. The results of the study are shown below.

We use the following simulation to test whether there is a difference between the recovery rates under the two treatments: We write “died” on 18 index cards and “survived” on 89 index cards to indicate whether or not a patient died. Next, we shuffle the cards and deal them into two groups of 52 and 55, for control and treatment, respectively. We then calculate the simulated difference between the recovery rates in the Streptomycin and control groups (p̂Streptomycin − p̂Control), and record this value. We repeat this simulation 100 times. The histogram below shows the distribution of the simulated differences between the recovery rates in these 100 simulations.

Which of the following is correct? Choose all that apply (there are multiple correct answers).

Your Answer Score Explanation

The conclusion of this study is generalizable to all tuberculosis patients. Correct 0.11

Explanation: Since the sample is comprised of volunteers, we [...] tuberculosis patients.

The alternative hypothesis should be that there is a difference between the recovery rates under the two treatments. Correct 0.11

Explanation: The evidence could go either way so we should c[...] two treatments.

Streptomycin treatment appears to be effective in treating tuberculosis since the observed difference in recovery rates would be considered unusual based on the simulation results. Incorrect 0.00

Explanation: The observed difference betwe[...] is p̂Streptomycin − p̂Control = 51/55 − [...] There is 1 simulation where the simulated differe[...] two sided hypothesis test, the p-value is 0.01×[...] low.

Based on this study we can conclude a causal relationship between Streptomycin and better tuberculosis recovery rate. Correct 0.11

Explanation: Also, since this is an experiment we can deduce [...]

Streptomycin treatment does not appear to be effective in treating tuberculosis since the observed number of deaths in the treatment group would not be considered unusual based on the simulation results. Incorrect 0.00

Explanation: The observed difference betwe[...] is p̂Streptomycin − p̂Control = 51/55 − [...] There is 1 simulation where the simulated differe[...] two sided hypothesis test, the p-value is 0.01×[...] low, and hence we would reject the null hypothe[...] suggest a difference between the two treatment[...]

The alternative hypothesis is that the Streptomycin treatment is more effective than bed rest. Incorrect 0.00

Explanation: The evidence could go either way so we should c[...] two treatments.

If Streptomycin and bed rest are equally effective in curing tuberculosis, the probability of observing a difference in the recovery rates at least as high as the one observed is 2%. Incorrect 0.00

Explanation: The observed difference betwe[...] is p̂Streptomycin − p̂Control = 51/55 − [...] There is 1 simulation where the simulated differe[...] two sided hypothesis test, the p-value is 0.01×[...]

The difference between the survival rates in the control and treatment groups appears to be simply due to chance. Correct 0.11

Explanation: The observed difference betwe[...] is p̂Streptomycin − p̂Control = 51/55 − [...] There is 1 simulation where the simulated differe[...] two sided hypothesis test, the p-value is 0.01×[...] low, and hence not due to chance.

Hill’s study is observational. Correct 0.11

Explanation: No, this is an experiment.

Total 0.56 / 1.00

Question Explanation

This question refers to the following learning objective:

Note that an observed difference in sample statistics suggesting dependence between variables may be due to random chance, and that we need to use hypothesis testing to determine if this difference is too large to be attributed to random chance. Set up null and alternative hypotheses for testing for independence between variables, and evaluate the data support for these hypotheses using a simulation technique.
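
The card-shuffling procedure described in Question 12 can be sketched in code. The results table is not reproduced in the text, so two numbers used below are derived rather than quoted: 51 survivors out of 55 in the Streptomycin group comes from the explanation fragments, and 38 survivors out of 52 in the control group follows from 18 total deaths (55 − 51 = 4 deaths under treatment, 18 − 4 = 14 under control, 52 − 14 = 38). The function name, the fixed seed, and the use of 10,000 shuffles instead of the 100 in the question are choices made for this illustration, not part of the original study.

```python
import random

def simulate_differences(n_died=18, n_survived=89,
                         n_control=52, n_treatment=55,
                         n_sims=10_000, seed=0):
    """Shuffle outcome cards and return simulated differences in
    survival rates (treatment minus control) under independence."""
    assert n_died + n_survived == n_control + n_treatment
    rng = random.Random(seed)
    cards = ["died"] * n_died + ["survived"] * n_survived
    diffs = []
    for _ in range(n_sims):
        rng.shuffle(cards)
        control = cards[:n_control]        # first 52 cards
        treatment = cards[n_control:]      # remaining 55 cards
        p_control = control.count("survived") / n_control
        p_treatment = treatment.count("survived") / n_treatment
        diffs.append(p_treatment - p_control)
    return diffs

# Observed difference; 38/52 is derived from the card counts (see above).
observed = 51 / 55 - 38 / 52

diffs = simulate_differences()

# Two-sided p-value: share of simulated differences at least as
# extreme (in absolute value) as the observed one.
p_value = sum(abs(d) >= abs(observed) - 1e-12 for d in diffs) / len(diffs)

print(f"observed difference: {observed:.3f}")
print(f"two-sided simulated p-value: {p_value:.4f}")
```

A small p-value means a difference this large rarely arises from shuffling alone, which is the basis for rejecting the null hypothesis of independence between treatment and outcome.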
