4. Non-parametric Tests with R

Batur Şeker

Jan 30, 2021

Dataset used: BankChurners.csv

This story is a continuation of the previous article in this series.

# Get the working directory
getwd()

# Set the working directory (change the path for your own machine)
setwd("C:\\Users\\batur\\Desktop\\R Tutorial")

# Read the CSV data file and store it as a data frame
bankChurnersData <- read.csv(file = "BankChurners.csv")

# Drop columns 22 and 23
df <- bankChurnersData[-c(22:23)]

# Encode Attrition_Flag as a factor (binary variable)
df$Attrition_Flag <- factor(df$Attrition_Flag, levels = c("Attrited Customer", "Existing Customer"))

# Encode Gender as a factor (binary variable)
df$Gender <- factor(df$Gender, levels = c("M", "F"))

# Encode Education_Level as an ordered factor (ordinal variable)
df$Education_Level <- factor(df$Education_Level, ordered = TRUE, levels = c("Unknown", "Uneducated", "High School", "College", "Graduate", "Post-Graduate", "Doctorate"))

# Encode Marital_Status as a factor (nominal variable)
df$Marital_Status <- factor(df$Marital_Status, levels = c("Married", "Single", "Unknown", "Divorced"))

# Encode Income_Category as an ordered factor (ordinal variable);
# the level strings must match the CSV values exactly, e.g. "$40K - $60K" with a plain hyphen
df$Income_Category <- factor(df$Income_Category, ordered = TRUE, levels = c("Unknown", "Less than $40K", "$40K - $60K", "$60K - $80K", "$80K - $120K", "$120K +"))

# Encode Card_Category as an ordered factor (ordinal variable)
df$Card_Category <- factor(df$Card_Category, ordered = TRUE, levels = c("Blue", "Silver", "Gold", "Platinum"))

# First 100 rows
df_4 <- head(df, 100)

# Last 100 rows
df_6 <- tail(df, 100)
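factor() silently turns any value that does not exactly match one of the supplied levels into NA. The level strings above therefore have to match the raw text in BankChurners.csv character for character. The minimal sketch below shows the effect of a single mistyped level. It uses a few hypothetical income strings that are not read from the file, and income_raw, income_ok and income_bad are illustrative names only.

```r
# Hypothetical raw values, written the way Income_Category stores them
income_raw <- c("Less than $40K", "$40K - $60K", "$120K +", "Unknown")

income_levels <- c("Unknown", "Less than $40K", "$40K - $60K",
                   "$60K - $80K", "$80K - $120K", "$120K +")

# Correct levels: every value is matched and the factor is ordered
income_ok <- factor(income_raw, ordered = TRUE, levels = income_levels)

# One level typed without spaces, so "$40K - $60K" no longer matches and becomes NA
bad_levels <- replace(income_levels, 3, "$40K-$60K")
income_bad <- factor(income_raw, ordered = TRUE, levels = bad_levels)

print(sum(is.na(income_ok)))        # 0
print(sum(is.na(income_bad)))       # 1
# Ordered factors support comparisons such as <
print(income_ok[1] < income_ok[3])  # TRUE
```

Checking sum(is.na(...)) right after each factor() call is a cheap way to catch such mismatches before running any test.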

1. Chi-Squared Test
To use this test, normal distribution and equality of variances are not required.

1.1. Normality Test with Shapiro-Wilk
1. Determine Hypotheses
H0: The residuals are normally distributed.
HA: The residuals are not normally distributed.

2. Select α (Significance Level)
α = 0.05

3. Test Statistic
# Shapiro-Wilk normality test on the residuals of Credit_Limit ~ Income_Category
model_5 <- lm(Credit_Limit ~ Income_Category, data = df_4)
shapiro.test(residuals(model_5))

data: residuals(model_5)
W = 0.94116, p-value = 0.000227
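The normality checks in this article call shapiro.test() on residuals(lm(...)) with a single factor as the predictor. For such a model, the fitted value of each row is the mean of its group. The residuals are therefore simply each value minus its group mean. The sketch below checks this on a small hypothetical data frame; toy, y and g are illustrative names, not columns of BankChurners.csv.

```r
# Hypothetical data: 9 values in 3 groups
toy <- data.frame(
  y = c(2100, 3500, 2800, 5200, 4100, 6000, 9000, 7500, 8200),
  g = factor(rep(c("A", "B", "C"), each = 3))
)

model <- lm(y ~ g, data = toy)

# Each value minus the mean of its own group
group_centered <- toy$y - ave(toy$y, toy$g)

# The two vectors agree (up to floating-point error)
print(all.equal(unname(residuals(model)), group_centered))  # TRUE

# Shapiro-Wilk test on the residuals, as in the article
sw <- shapiro.test(residuals(model))
print(sw$p.value)
```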

4. Make Decision
p <= 0.05: H0 is rejected
p > 0.05: H0 is not rejected
p-value = 0.000227
Decision: p <= 0.05, so H0 is rejected.
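Every section in this article applies the same decision rule. A hypothetical helper such as decide() below (it is not part of base R) makes the rule explicit. The p-values passed to it are the ones reported in this article.

```r
# Hypothetical helper: compare a p-value with the significance level alpha
decide <- function(p_value, alpha = 0.05) {
  if (p_value <= alpha) "H0 is rejected" else "H0 is not rejected"
}

print(decide(0.000227))  # "H0 is rejected"      (Shapiro-Wilk test in 1.1)
print(decide(0.2473))    # "H0 is not rejected"  (paired Wilcoxon test in 4.2)
```

Because every htest object returned by these tests has a p.value element, the helper can also be called directly, as in decide(shapiro.test(x)$p.value).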

5. Interpret
The residuals are not normally distributed.

Note: Since we continue with a non-parametric test, there is no need to check equality of variances with Bartlett's test.

1.2. Chi-Squared Test
1. Determine Hypotheses
H0: The variables are independent; there is no relationship between the two categorical variables.
HA: The variables are dependent; there is a relationship between the two categorical variables.

2. Select α (Significance Level)
α = 0.05

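3. Test Statistic
# With a numeric left-hand side, xtabs() sums Credit_Limit within each Income_Category level
tab <- xtabs(Credit_Limit ~ Income_Category, data = df_4)
# Chi-squared test on the resulting table
chisq.test(tab)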

data: tab
X-squared = 420884, df = 5, p-value < 2.2e-16

4. Make Decision
p <= 0.05: H0 is rejected
p > 0.05: H0 is not rejected
p-value < 2.2e-16
Decision: p <= 0.05, so H0 is rejected.

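5. Interpret
H0 is rejected, but this result does not show a relationship between two categorical variables. Credit_Limit is numeric, so xtabs(Credit_Limit ~ Income_Category) sums the credit limits within each income category. chisq.test() on that one-way table then runs a goodness-of-fit test against equal proportions (hence df = 5 for the six income categories) instead of a test of independence. Because the cells hold dollar totals rather than counts, the very large X-squared value is not a meaningful measure of association either. A test of independence between two categorical variables needs a two-way table of counts, for example xtabs(~ Income_Category + Card_Category, data = df_4).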
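For comparison, here is a minimal sketch of a chi-squared test of independence between two categorical variables. When the left-hand side of the xtabs() formula is left empty, xtabs() counts the rows for each combination of levels. The data frame toy and its 100 customers are hypothetical and not taken from BankChurners.csv. With the real data, the analogous call would be xtabs(~ Gender + Attrition_Flag, data = df_4).

```r
# Hypothetical data: 100 customers classified by two categorical variables
toy <- data.frame(
  Gender = rep(c("M", "F", "M", "F"), times = c(30, 20, 10, 40)),
  Attrition_Flag = rep(c("Attrited", "Attrited", "Existing", "Existing"),
                       times = c(30, 20, 10, 40))
)

# Two-way contingency table of counts (no left-hand side in the formula)
tab2 <- xtabs(~ Gender + Attrition_Flag, data = toy)
print(tab2)

# Chi-squared test of independence; for a 2 x 2 table R applies
# Yates' continuity correction by default
result <- chisq.test(tab2)
print(result)

# Degrees of freedom: (rows - 1) * (columns - 1) = 1
print(result$parameter)
# Expected counts under H0: row total * column total / grand total
print(result$expected)
```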

2. One-sample Wilcoxon Signed Rank Test
To use this test, normal distribution and equality of variances are not required.

2.1. Normality Test with Shapiro-Wilk
1. Determine Hypotheses
H0: Credit_Limit is normally distributed.
HA: Credit_Limit is not normally distributed.

2. Select α (Significance Level)
α = 0.05

3. Test Statistic
# Shapiro-Wilk normality test on the raw Credit_Limit values
shapiro.test(df_4$Credit_Limit)

data: df_4$Credit_Limit
W = 0.80145, p-value = 2.745e-10

4. Make Decision
p <= 0.05: H0 is rejected
p > 0.05: H0 is not rejected
p-value = 2.745e-10
Decision: p <= 0.05, so H0 is rejected.

5. Interpret
Credit_Limit is not normally distributed.

Note: With a single sample there are no variances to compare, so Bartlett's test is not needed.

2.2. One-sample Wilcoxon Signed Rank Test
1. Determine Hypotheses
H0: The population median equals the theoretical value mu (here 150).
HA: The population median is not equal to mu.

2. Select α (Significance Level)
α = 0.05

3. Test Statistic
# One-sample Wilcoxon signed rank test against mu = 150
wilcox.test(df_4$Credit_Limit, mu = 150)

data: df_4$Credit_Limit
V = 5050, p-value < 2.2e-16
alternative hypothesis: true location is not equal to 150
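The V statistic reported by wilcox.test() is the sum of the ranks of |x - mu| over the observations that lie above mu. Observations exactly equal to mu are dropped first. The sketch below recomputes V by hand on six hypothetical credit limits. It also shows why V = 5050 above is the maximum for 100 observations: when every value lies above mu, V equals n(n + 1) / 2.

```r
# Hypothetical credit limits (not from BankChurners.csv) and the value under H0
x  <- c(120, 3400, 2500, 100, 5100, 1500)
mu <- 150

d <- x - mu
d <- d[d != 0]                  # wilcox.test() drops zero differences
r <- rank(abs(d))               # rank the absolute differences
V_manual <- sum(r[d > 0])       # sum of ranks where x > mu

res <- wilcox.test(x, mu = mu)  # exact p-value: n is small and there are no ties
V_r <- unname(res$statistic)

print(c(manual = V_manual, wilcox = V_r))  # both 18
# Largest possible V for these 6 differences (all above mu)
print(length(d) * (length(d) + 1) / 2)     # 21
```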

4. Make Decision
p <= 0.05: H0 is rejected
p > 0.05: H0 is not rejected
p-value < 2.2e-16
Decision: p <= 0.05, so H0 is rejected.

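5. Interpret
The median credit limit is not equal to 150. In fact, V = 5050 = 100 * 101 / 2 is the largest value V can take with 100 observations, which means every credit limit in df_4 is above 150.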

3. Unpaired two-sample Wilcoxon rank sum test (equivalent to the Mann-Whitney U test)
To use this test, normal distribution and equality of variances are not required.

3.1. Normality Test with Shapiro-Wilk
1. Determine Hypotheses
H0: The residuals are normally distributed.
HA: The residuals are not normally distributed.

2. Select α (Significance Level)
α = 0.05

3. Test Statistic
# Shapiro-Wilk normality test on the residuals of Credit_Limit ~ Gender
model_6 <- lm(Credit_Limit ~ Gender, data = df_4)
shapiro.test(residuals(model_6))

data: residuals(model_6)
W = 0.8861, p-value = 3.32e-07

4. Make Decision
p <= 0.05: H0 is rejected
p > 0.05: H0 is not rejected
p-value = 3.32e-07
Decision: p <= 0.05, so H0 is rejected.

5. Interpret
The residuals are not normally distributed.

Note: Since we continue with a non-parametric test, there is no need to check equality of variances with Bartlett's test.

3.2. Unpaired two-sample Wilcoxon rank sum test (equivalent to the Mann-Whitney U test)
1. Determine Hypotheses
H0: The medians of the two independent groups are the same.
HA: The medians of the two independent groups are different.

2. Select α (Significance Level)
α = 0.05

3. Test Statistic
# Unpaired two-sample Wilcoxon rank sum test
# (equivalent to the Mann-Whitney U test)
wilcox.test(Credit_Limit ~ Gender, data = df_4)

data: Credit_Limit by Gender
W = 1536, p-value = 0.0005135
alternative hypothesis: true location shift is not equal to 0
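In R, the W statistic of the two-sample test is the Mann-Whitney U for the first group. It can be computed in two equivalent ways:

- The sum of that group's ranks in the pooled sample, minus n1(n1 + 1) / 2.
- With no ties, the number of (first, second) pairs in which the first-group value is larger.

With the formula interface, the first group is the first factor level. Gender is encoded with levels c("M", "F") earlier in this article, so W here counts pairs in favour of "M". The sketch below checks both forms on six hypothetical customers.

```r
# Hypothetical data; Gender uses the same level order as the article (M first)
toy <- data.frame(
  Credit_Limit = c(5000, 2500, 9000, 2000, 3000, 4000),
  Gender = factor(c("M", "M", "M", "F", "F", "F"), levels = c("M", "F"))
)

m <- toy$Credit_Limit[toy$Gender == "M"]
f <- toy$Credit_Limit[toy$Gender == "F"]

# Rank-sum form: ranks of the M values in the pooled sample, minus n1(n1 + 1)/2
pooled_ranks <- rank(c(m, f))
W_ranks <- sum(pooled_ranks[seq_along(m)]) - length(m) * (length(m) + 1) / 2

# Pair-counting form: number of (M, F) pairs where the M value is larger
W_pairs <- sum(outer(m, f, ">"))

W_r <- unname(wilcox.test(Credit_Limit ~ Gender, data = toy)$statistic)
print(c(ranks = W_ranks, pairs = W_pairs, wilcox = W_r))  # all 7
```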

4. Make Decision
p <= 0.05: H0 is rejected
p > 0.05: H0 is not rejected
p-value = 0.0005135
Decision: p <= 0.05, so H0 is rejected.

5. Interpret
The median credit limits of male and female customers are different.

4. Paired Samples Wilcoxon Signed Rank Test
To use this test, normal distribution and equality of variances are not required. Normally, this test compares two related groups of samples, such as the same customers measured before and after some event. My dataset contains no related data of this kind. Purely to demonstrate the test, I therefore pair the first 100 values of the Credit_Limit column with the last 100 values. These pairs are arbitrary, so the result below says nothing about real before/after changes.

4.1. Normality Test with Shapiro-Wilk
1. Determine Hypotheses
H0: The sample is normally distributed.
HA: The sample is not normally distributed.

2. Select α (Significance Level)
α = 0.05

3. Test Statistic
# First 100 and last 100 credit limits, treated as "before" and "after"
before <- df_4$Credit_Limit
after <- df_6$Credit_Limit
# Shapiro-Wilk normality test on the first sample
shapiro.test(before)

data: before
W = 0.80145, p-value = 2.745e-10

# Shapiro-Wilk normality test on the second sample
shapiro.test(after)

data: after
W = 0.79485, p-value = 1.726e-10

4. Make Decision
p <= 0.05: H0 is rejected
p > 0.05: H0 is not rejected
p-values = 2.745e-10 (before) and 1.726e-10 (after)
Decision: both p-values are <= 0.05, so H0 is rejected for both samples.

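5. Interpret
Neither sample is normally distributed.

Note: There is no need to check equality of variances with Bartlett's test. For a paired design, the normality assumption of the paired t-test applies to the differences before - after, so shapiro.test(before - after) would be the more direct check.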

4.2. Paired Samples Wilcoxon Signed Rank Test
1. Determine Hypotheses
H0: The median of the paired differences (before - after) is 0; the two related groups do not differ.
HA: The median of the paired differences is not 0; the two related groups differ.

2. Select α (Significance Level)
α = 0.05

3. Test Statistic
# Paired-samples Wilcoxon signed rank test
wilcox.test(before, after, paired = TRUE)

data: before and after
V = 2143, p-value = 0.2473
alternative hypothesis: true location shift is not equal to 0
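A paired Wilcoxon signed rank test is the one-sample signed rank test applied to the differences before - after, with mu = 0. The sketch below checks this equivalence on six hypothetical pairs that do not come from BankChurners.csv. Both calls return the same V statistic and the same p-value.

```r
# Hypothetical paired measurements (not from BankChurners.csv)
before <- c(5200, 3100, 8700, 4400, 6100, 2900)
after  <- c(4800, 3250, 8000, 4100, 6900, 2650)

paired_res <- wilcox.test(before, after, paired = TRUE)
diff_res   <- wilcox.test(before - after, mu = 0)

# Same V statistic and same p-value
print(c(paired = unname(paired_res$statistic),
        one_sample = unname(diff_res$statistic)))  # both 14
print(c(paired = paired_res$p.value, one_sample = diff_res$p.value))
```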

4. Make Decision
p <= 0.05: H0 is rejected
p > 0.05: H0 is not rejected
p-value = 0.2473
Decision: p > 0.05, so H0 is not rejected.

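5. Interpret
There is not enough evidence that the two related groups differ: the median of the paired differences is not significantly different from 0.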

5. Kruskal-Wallis Test
To use this test, normal distribution and equality of variances are not required.

5.1. Normality Test with Shapiro-Wilk
1. Determine Hypotheses
H0: The residuals are normally distributed.
HA: The residuals are not normally distributed.

2. Select α (Significance Level)
α = 0.05

3. Test Statistic
# Shapiro-Wilk normality test on the residuals of Credit_Limit ~ Income_Category
model_7 <- lm(Credit_Limit ~ Income_Category, data = df_4)
shapiro.test(residuals(model_7))

data: residuals(model_7)
W = 0.94116, p-value = 0.000227

4. Make Decision
p <= 0.05: H0 is rejected
p > 0.05: H0 is not rejected
p-value = 0.000227
Decision: p <= 0.05, so H0 is rejected.

5. Interpret
The residuals are not normally distributed.

Note: Since we continue with a non-parametric test, there is no need to check equality of variances with Bartlett's test.

5.2. Kruskal-Wallis Test
1. Determine Hypotheses
H0: The medians of all groups are the same (the test is typically used with three or more groups).
HA: At least one group median differs from the others.

2. Select α (Significance Level)
α = 0.05

3. Test Statistic
# Kruskal-Wallis rank sum test across the income categories
kruskal.test(Credit_Limit ~ Income_Category, data = df_4)

data: Credit_Limit by Income_Category
Kruskal-Wallis chi-squared = 31.449, df = 5, p-value = 7.636e-06
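When there are no ties, the Kruskal-Wallis statistic is H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1). Here R_i is the rank sum of group i in the pooled sample, n_i is the size of that group, and N is the total sample size. kruskal.test() additionally divides by a correction factor when there are ties. H is compared with a chi-squared distribution with (number of groups - 1) degrees of freedom, which is why the output above shows df = 5 for the six income categories. The sketch below recomputes H on nine hypothetical values in three groups; toy and Group are illustrative names.

```r
# Hypothetical data: three groups of three values, no ties
toy <- data.frame(
  Credit_Limit = c(2100, 3500, 2800, 5200, 4100, 6000, 9000, 7500, 8200),
  Group = factor(rep(c("A", "B", "C"), each = 3))
)

N  <- nrow(toy)
r  <- rank(toy$Credit_Limit)        # ranks in the pooled sample
Ri <- tapply(r, toy$Group, sum)     # rank sum per group
ni <- tapply(r, toy$Group, length)  # size of each group

H_manual <- 12 / (N * (N + 1)) * sum(Ri^2 / ni) - 3 * (N + 1)

res <- kruskal.test(Credit_Limit ~ Group, data = toy)
print(c(manual = H_manual, kruskal = unname(res$statistic)))  # both 7.2
print(res$parameter)  # df = 3 groups - 1 = 2
```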

4. Make Decision
p <= 0.05: H0 is rejected
p > 0.05: H0 is not rejected
p-value = 7.636e-06
Decision: p <= 0.05, so H0 is rejected.

5. Interpret
The median credit limit of at least one income category differs from the others.
