Introduction
This post is for the week 1 assignment of the Coursera course Data Analysis Tools by Wesleyan University, the second course in the Data Analysis and Interpretation specialization.
I will use the NESARC (National Epidemiologic Survey on Alcohol and Related Conditions) dataset. All participants are American adults.
“The NESARC is a representative sample of the United States population and 43,093 Americans participated in the first Wave of the survey. During Wave 2, an attempt to re-interview all 43,093 of these respondents will be made. The target population of the NESARC is the non-institutionalized household population, 18 years and older, residing in the United States including the District of Columbia, Alaska, and Hawaii.” – according to the webpage of Population Studies Center, University of Michigan
Research question
Among American adults who have used cocaine, does the average amount of cocaine used per day during their period of heaviest use differ between groups with different histories of parental drug problems?
The null hypothesis: the mean amount of daily cocaine use is the same in all groups.
The alternative hypothesis: the mean amount of daily cocaine use is not the same in all groups (at least one group mean differs).
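In symbols, writing μ1, μ2 and μ3 for the population mean of S3BQ7B in groups 1, 2 and 3 of PARENTSGROUP, the hypotheses and the one-way ANOVA F statistic used to test them can be stated as follows. Here y_gi is the grams per day reported by respondent i in group g, n_g is the size of group g, ȳ_g is that group's mean, ȳ is the overall mean and N is the total number of observations. Under the null hypothesis F follows an F distribution with 2 and N - 3 degrees of freedom; with the 50 observations mentioned in the Limitation section, that would be 2 and 47.

```latex
\begin{aligned}
H_0 &: \mu_1 = \mu_2 = \mu_3 \\
H_a &: \mu_i \neq \mu_j \text{ for at least one pair } i \neq j \\
F &= \frac{\sum_{g=1}^{3} n_g\,(\bar{y}_g - \bar{y})^2 \,/\, (3 - 1)}
         {\sum_{g=1}^{3} \sum_{i=1}^{n_g} (y_{gi} - \bar{y}_g)^2 \,/\, (N - 3)}
\end{aligned}
```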
Codebook
The response variable I am going to use is S3BQ7B (the number of grams of cocaine usually used in a day when using cocaine the most).
Original variables:
The explanatory variable is parents' history of drug problems, represented by the secondary variable PARENTSGROUP, for which I defined three groups:
group 1: both mother and father never had drug problems
group 2: one parent had drug problems while the other parent never did
group 3: both father and mother had drug problems
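The text does not name the original NESARC variables behind PARENTSGROUP, so the following is a hypothetical sketch of how the three groups could be derived. It assumes one answer per parent coded 1 = had drug problems, 2 = did not, and any other value = unknown; these codes and the function name `parents_group` are illustrative assumptions, not taken from the codebook or the author's code.

```python
# Hypothetical response codes (an assumption, not taken from the codebook):
# 1 = yes, the parent had drug problems; 2 = no; anything else = unknown/missing.
YES, NO = 1, 2


def parents_group(father_code, mother_code):
    """Return the PARENTSGROUP value (1, 2 or 3) for one respondent,
    or None when either parent's answer is unknown or missing."""
    if father_code not in (YES, NO) or mother_code not in (YES, NO):
        return None
    # Count how many parents had drug problems (True counts as 1)
    problems = (father_code == YES) + (mother_code == YES)
    # 0 parents -> group 1, exactly 1 parent -> group 2, both parents -> group 3
    return problems + 1


if __name__ == "__main__":
    for father, mother in [(2, 2), (1, 2), (2, 1), (1, 1), (9, 1)]:
        print(father, mother, parents_group(father, mother))
```

Respondents with an unknown answer for either parent get `None` and would be excluded from the ANOVA, which is one way missing data can shrink the analysed sample.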
Original variables:
Code
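The original code is not reproduced in this text. As a stand-in, here is a minimal standard-library sketch of the calculation a one-way ANOVA performs: it compares the variation between group means with the variation within groups. The function name `one_way_anova_f` and the toy numbers are illustrative assumptions, not the author's code or NESARC data.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers).

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up toy numbers for illustration only; these are NOT NESARC data.
    toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    f_stat, df1, df2 = one_way_anova_f(toy)
    print(round(f_stat, 6), df1, df2)  # 13.0 2 6
```

The p-value is the upper tail of the F distribution with (df_between, df_within) degrees of freedom, for example `scipy.stats.f.sf(f_stat, df_between, df_within)`. Library routes such as `scipy.stats.f_oneway(*groups)`, or a statsmodels fit like `smf.ols('S3BQ7B ~ C(PARENTSGROUP)', data=df).fit()` (where `smf` is `statsmodels.formula.api` and `df` is a placeholder DataFrame) with its `fvalue` and `f_pvalue` attributes, report F and p directly.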
Results
We can see that p = 0.6104 > 0.05, so the differences between the group means are not statistically significant, and the null hypothesis cannot be rejected. Among those who had used cocaine, the amount of cocaine used per day did not differ significantly between people with different family histories of parental drug problems.
Post hoc test:
From the table above, we can also see that none of the pairwise differences between group means is statistically significant.
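The post does not say which post hoc procedure produced the table; Tukey's HSD (available as `statsmodels.stats.multicomp.pairwise_tukeyhsd`) is one common choice. The sketch below only illustrates why some correction is needed with three groups: three pairwise comparisons, each at 0.05 and assumed independent, give roughly a 14% chance of at least one false positive. The Bonferroni adjustment shown is a simpler, more conservative alternative, not necessarily what was used here; the function names are hypothetical.

```python
from math import comb


def pairwise_comparisons(k):
    """Number of distinct pairs of groups that a post hoc test compares."""
    return comb(k, 2)


def familywise_error_rate(k, alpha=0.05):
    """Probability of at least one false positive if every pair of k groups is
    tested separately at level alpha, assuming the tests are independent."""
    return 1 - (1 - alpha) ** pairwise_comparisons(k)


def bonferroni_alpha(k, alpha=0.05):
    """Per-comparison level that keeps the family-wise error rate at or below alpha."""
    return alpha / pairwise_comparisons(k)


if __name__ == "__main__":
    k = 3  # three PARENTSGROUP categories
    print(pairwise_comparisons(k))             # 3
    print(round(familywise_error_rate(k), 4))  # 0.1426
    print(round(bonferroni_alpha(k), 4))       # 0.0167
```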
Limitation
The sample size is too small: only 50 observations were used, because most respondents had missing data on these questions. With such a small sample, I probably shouldn't have used ANOVA. This is something to keep in mind for future research: always check whether the assumptions of ANOVA are met before running it.
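Formal checks exist, for example `scipy.stats.levene` for equal variances or inspecting the residuals for approximate normality. As a lighter, hedged sketch, the hypothetical function below screens groups with two common rules of thumb: at least 30 observations per group, and a largest-to-smallest standard deviation ratio below 2. Both thresholds are assumptions chosen for illustration, not requirements stated in this post.

```python
from statistics import stdev


def screen_anova_assumptions(groups, min_n=30, max_sd_ratio=2.0):
    """Informal screening before a one-way ANOVA.

    groups: dict mapping group label -> list of observations.
    min_n and max_sd_ratio are rules of thumb, not formal tests.
    Returns a list of warning strings (empty if nothing was flagged).
    """
    warnings = []
    for label, values in groups.items():
        if len(values) < min_n:
            warnings.append(f"group {label}: only {len(values)} observations")
    # stdev needs at least two observations per group
    sds = [stdev(v) for v in groups.values() if len(v) >= 2]
    if len(sds) < len(groups):
        warnings.append("some group has fewer than 2 observations")
    elif min(sds) == 0 or max(sds) / min(sds) >= max_sd_ratio:
        warnings.append(
            f"largest/smallest standard deviation ratio is at least {max_sd_ratio:g}"
        )
    return warnings


if __name__ == "__main__":
    # Toy numbers for illustration only, not NESARC data
    toy = {1: [1, 2, 3], 2: [2, 4, 6], 3: [1, 2, 3, 4]}
    for warning in screen_anova_assumptions(toy):
        print(warning)
```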