
ML ideabook


A notebook for Coursera Machine Learning course ideas


  1. Testing a moderator

    I picked BLOOD/NATURAL FATHER EVER AN ALCOHOLIC OR PROBLEM DRINKER (coded as S2DQ1) as a candidate moderator. Common sense says it is unrelated to tobacco dependence. Let’s test it. …


  2. Pearson correlation test

    This week is much easier: no post-hoc test is needed, so the code is much simpler. …


  3. Chi-square test

    Based on the samples, I decided to test whether drug use differs significantly among ethnic groups. Here’s the code: …


  4. Drug vs ethnicity

    This week starts easy: I chose to investigate whether ethnicity affects the youngest age of cannabis use. I used cannabis instead of amphetamines because of sample size. Here’s the code: …


  5. Plot

    The final week. Not much to explain; I simply print the plots. Check it here. I used proc freq for the frequency distribution plots, since the code doesn’t make sense with proc univariate. …


  6. Tiny tiny edit

    This week is quite easy: a small modification to last week’s work does the job. Let’s take a look at my code. …


  7. Distribution unexpected

    I was not expecting this to happen. I thought they would be normally distributed. Here’s the code: …


  8. Rich junkies?!

    Question selection

    For some reason, the study of drug abuse attracts me, so I chose the NESARC dataset. From my own experience, it seems the reason a person starts using drugs isn’t simple. Was it for fun? Or was life too hard without some kind of relief? And unlike tobacco, drugs can be significantly harder to get. Many factors are involved, such as income, drug use, etc. …