Research Project (5): Run ANOVA in SAS


This post is for the week 1 assignment of the Coursera course Data Analysis Tools by Wesleyan University. It is the second course in the Data Analysis and Interpretation specialization.

I will use the NESARC dataset. Participants are all American adults.

“The NESARC is a representative sample of the United States population and 43,093 Americans participated in the first Wave of the survey. During Wave 2, an attempt to re-interview all 43,093 of these respondents will be made. The target population of the NESARC is the non-institutionalized household population, 18 years and older, residing in the United States including the District of Columbia, Alaska, and Hawaii.” – according to the webpage of Population Studies Center, University of Michigan

Research question

Among American adults who have used cocaine, does the average amount of cocaine used per day, during the period of heaviest use, differ between groups defined by their parents’ history of drug problems?

The null hypothesis: The mean amount of daily cocaine use is the same in all groups.

The alternative hypothesis: The mean amount of daily cocaine use is not the same in all groups.


The response variable I am going to use is S3BQ7B (the number of grams of cocaine usually used in a day when using cocaine the most).

Original variables:



The explanatory variable here is parents’ history of drug use, represented by secondary variable PARENTSGROUP. I defined three groups:

group 1: both mother and father never had drug problems

group 2: one parent had drug problems while the other parent never did

group 3: both father and mother had drug problems
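
The grouping above could be built in a DATA step along these lines. This is only a sketch: the original variable names are not shown in this post, so MOMDRUG and DADDRUG are hypothetical stand-ins for the two parent items, the coding 1 = yes / 2 = no is an assumption, and work.nesarc and work.cocaine are made-up dataset names.

```sas
/* Sketch only: MOMDRUG / DADDRUG are hypothetical variable names,
   and the coding 1 = yes, 2 = no is an assumption. */
data work.cocaine;
    set work.nesarc;   /* hypothetical input dataset */

    /* Treat any other code (e.g. "unknown") as missing. */
    if MOMDRUG not in (1, 2) then MOMDRUG = .;
    if DADDRUG not in (1, 2) then DADDRUG = .;

    /* group 1: neither parent; group 3: both parents; group 2: exactly one */
    if MOMDRUG = 2 and DADDRUG = 2 then PARENTSGROUP = 1;
    else if MOMDRUG = 1 and DADDRUG = 1 then PARENTSGROUP = 3;
    else if MOMDRUG in (1, 2) and DADDRUG in (1, 2) then PARENTSGROUP = 2;
    /* otherwise PARENTSGROUP stays missing */

    /* Keep only respondents with a usable response and a known group. */
    if not missing(S3BQ7B) and not missing(PARENTSGROUP);
run;
```

Respondents with an unknown answer for either parent are left out of all three groups rather than being guessed into one.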

Original variables:






We can see that p = 0.6104 > 0.05, so the differences between the group means are not statistically significant and the null hypothesis cannot be rejected. Among those who had used cocaine, the mean amount of daily cocaine use did not differ significantly between people with different family histories of parental drug problems.
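
For reference, a one-way ANOVA of this kind can be requested roughly as follows. This is a minimal sketch rather than the exact program behind the result above: work.cocaine is a hypothetical dataset name, and it is assumed to contain S3BQ7B and PARENTSGROUP with missing values already handled.

```sas
/* Sketch: one-way ANOVA of daily cocaine use by parent-history group.
   work.cocaine is a hypothetical dataset name. */
proc anova data=work.cocaine;
    class PARENTSGROUP;              /* categorical explanatory variable */
    model S3BQ7B = PARENTSGROUP;     /* quantitative response = group */
run;
quit;
```

The p-value for the overall test appears in the Pr > F column of the ANOVA table in the output.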

Post hoc test:


From the table above, we can also see that no pair of group means differs significantly.
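
Pairwise comparisons like these come from a MEANS statement added to the ANOVA step. The sketch below uses Tukey's test as an assumption, since the post does not say which post hoc procedure was used (DUNCAN is another option the same statement accepts); work.cocaine is again a hypothetical dataset name.

```sas
/* Sketch: same one-way model, plus pairwise post hoc comparisons.
   TUKEY is an assumed choice; work.cocaine is a hypothetical dataset. */
proc anova data=work.cocaine;
    class PARENTSGROUP;
    model S3BQ7B = PARENTSGROUP;
    means PARENTSGROUP / tukey;      /* compare every pair of group means */
run;
quit;
```

Post hoc results are normally only interpreted when the overall F test is significant, which was not the case here.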


The sample size is too small: only 50 observations were used, because the majority of respondents were missing data for these questions. With such a small sample size, I arguably shouldn’t have used ANOVA. This is something to keep in mind for future research: always examine whether the assumptions of ANOVA are met first.
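
Two quick checks that could be run before the ANOVA are sketched below: the number of usable observations per group, and Levene's test for equal variances across groups. This is an illustration under assumptions, not part of the original analysis; work.cocaine is a hypothetical dataset name.

```sas
/* Sketch: basic checks before trusting the ANOVA.
   work.cocaine is a hypothetical dataset name. */

/* How many usable observations fall in each group? */
proc freq data=work.cocaine;
    tables PARENTSGROUP;
run;

/* Levene's test for homogeneity of variance across groups. */
proc glm data=work.cocaine;
    class PARENTSGROUP;
    model S3BQ7B = PARENTSGROUP;
    means PARENTSGROUP / hovtest=levene;
run;
quit;
```

Very small or very unequal group counts, or a significant Levene's test, would both be reasons to treat the ANOVA result with caution.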


