My Profile Photo

ML ideabook


A notebook for Coursera Machine Learning course ideas


Chi-q test

Based on the samples, I decided to research if there’s significant difference among different ethnics on drug use. Here’s the code:

ods html close;
ods html;
libname worklib 'e:\Course\'; 
proc import
	datafile="E:\SPM\Coursera_datasets\nesarc_pds.csv"
	out=worklib.nesarc_pds;
	delimiter=',';
run;
data worklib.working;
	set worklib.nesarc_pds;
	where ETHRACE2A=1 or ETHRACE2A=2;
	if DGSTATUS=2 then DGSTATUS=1;
run;
proc sort; 	by idnum;	
proc freq;	tables DGSTATUS*ETHRACE2A / chisq;
run;
data worklib.working;
	set worklib.nesarc_pds;
	where ETHRACE2A=1 or ETHRACE2A=3;
	if DGSTATUS=2 then DGSTATUS=1;
run;
proc sort; 	by idnum;	
proc freq;	tables DGSTATUS*ETHRACE2A / chisq;
run;
data worklib.working;
	set worklib.nesarc_pds;
	where ETHRACE2A=1 or ETHRACE2A=4;
	if DGSTATUS=2 then DGSTATUS=1;
run;
proc sort; 	by idnum;	
proc freq;	tables DGSTATUS*ETHRACE2A / chisq;
run;
data worklib.working;
	set worklib.nesarc_pds;
	where ETHRACE2A=1 or ETHRACE2A=5;
	if DGSTATUS=2 then DGSTATUS=1;
run;
proc sort; 	by idnum;	
proc freq;	tables DGSTATUS*ETHRACE2A / chisq;
run;
data worklib.working;
	set worklib.nesarc_pds;
	where ETHRACE2A=2 or ETHRACE2A=3;
	if DGSTATUS=2 then DGSTATUS=1;
run;
proc sort; 	by idnum;	
proc freq;	tables DGSTATUS*ETHRACE2A / chisq;
run;
data worklib.working;
	set worklib.nesarc_pds;
	where ETHRACE2A=2 or ETHRACE2A=4;
	if DGSTATUS=2 then DGSTATUS=1;
run;
proc sort; 	by idnum;	
proc freq;	tables DGSTATUS*ETHRACE2A / chisq;
run;
data worklib.working;
	set worklib.nesarc_pds;
	where ETHRACE2A=2 or ETHRACE2A=5;
	if DGSTATUS=2 then DGSTATUS=1;
run;
proc sort; 	by idnum;	
proc freq;	tables DGSTATUS*ETHRACE2A / chisq;
run;
data worklib.working;
	set worklib.nesarc_pds;
	where ETHRACE2A=3 or ETHRACE2A=4;
	if DGSTATUS=2 then DGSTATUS=1;
run;
proc sort; 	by idnum;	
proc freq;	tables DGSTATUS*ETHRACE2A / chisq;
run;
data worklib.working;
	set worklib.nesarc_pds;
	where ETHRACE2A=3 or ETHRACE2A=5;
	if DGSTATUS=2 then DGSTATUS=1;
run;
proc sort; 	by idnum;	
proc freq;	tables DGSTATUS*ETHRACE2A / chisq;
run;
data worklib.working;
	set worklib.nesarc_pds;
	where ETHRACE2A=4 or ETHRACE2A=5;
	if DGSTATUS=2 then DGSTATUS=1;
run;
proc sort; 	by idnum;	
proc freq;	tables DGSTATUS*ETHRACE2A / chisq;
run;

And the output is here

And the letter table:

Group ing ETHRACE2A
A   1
B   2
C   3
D   4
B D 5

According to the result, the drug use between Asian/Native Hawaiian/Pacific Islander, Not Hispanic or Latino and Hispanic or Latino is not significantly different, but among other every two ethnics, it’s different. Be sure that statistically significant is only a part - we have to associate the result with real life too.