Title: | Predictive Power of Linear and Tree Modeling |
---|---|
Description: | Runs generalized and multinomial logistic (GLM and MLM) models, as well as random forest (RF), bagging (BAG), and boosting (BOOST) models. The package makes it easy to print predictive outcomes for the selected data and data splits. |
Authors: | Seyma Kalay <[email protected]> |
Maintainer: | Seyma Kalay <[email protected]> |
License: | GPL-3 |
Version: | 3.8.0 |
Built: | 2025-02-18 05:44:05 UTC |
Source: | https://github.com/seymakalay/pomodoro |
Bagging Model
BAG_Model(Data, xvar, yvar)
Data | The name of the dataset. |
xvar | X variables. |
yvar | Y variable. |
Decision trees suffer from high variance: if we split the training data set randomly into two parts and fit a decision tree to each part, the results might be quite different. Bagging is an ensemble procedure that reduces the variance and increases the prediction accuracy of a statistical learning method by fitting it to many training sets from the population, giving $\hat{f}^{1}(x), \hat{f}^{2}(x), \ldots, \hat{f}^{B}(x)$. Since we cannot have multiple training sets, we generate $B$ different bootstrapped training data sets from a single training data set, fit one tree to each, giving $\hat{f}^{*1}(x), \hat{f}^{*2}(x), \ldots, \hat{f}^{*B}(x)$, and take a majority vote over the $B$ trees. Bagging for a classification problem is therefore defined as

$$\hat{f}_{bag}(x) = \arg\max_{k} \sum_{b=1}^{B} \mathbf{1}\{\hat{f}^{*b}(x) = k\},$$

where $k$ indexes the classes.
The output from BAG_Model.
yvar <- c("Loan.Type")
sample_data <- sample_data[c(1:750),]
xvar <- c("sex", "married", "age", "havejob", "educ", "political.afl", "rural", "region", "fin.intermdiaries", "fin.knowldge", "income")
BchMk.BAG <- BAG_Model(sample_data, c(xvar, "networth"), yvar)
BchMk.BAG$Roc$auc
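The majority-vote idea behind bagging can be sketched in base R. This is only an illustration under stated assumptions: the simulated data, the one-split "stump" learner, and the helper names `fit_stump` and `predict_stump` are hypothetical and are not the code behind BAG_Model.

```r
# Illustrative sketch of bagging by majority vote (not the BAG_Model internals).
set.seed(1)
n <- 200
x <- runif(n)
# Simulated two-class outcome, stored as character labels "0" / "1".
y <- ifelse(x + rnorm(n, sd = 0.2) > 0.5, "1", "0")

# Hypothetical base learner: a single split on x chosen to minimise training error.
fit_stump <- function(x, y) {
  cuts <- sort(unique(x))
  err <- sapply(cuts, function(cut) mean(ifelse(x > cut, "1", "0") != y))
  cuts[which.min(err)]
}
predict_stump <- function(cut, x) ifelse(x > cut, "1", "0")

# Fit one stump to each of B bootstrapped training sets.
B <- 25
cuts <- replicate(B, {
  idx <- sample(n, replace = TRUE)
  fit_stump(x[idx], y[idx])
})

# n x B matrix of votes, then the most frequent class per observation.
votes <- sapply(cuts, predict_stump, x = x)
bag_pred <- apply(votes, 1, function(v) names(which.max(table(v))))
accuracy <- mean(bag_pred == y)
```

Each column of `votes` holds the predictions of one bootstrapped learner, and `bag_pred` is the class that receives the most votes in each row.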
Combined Performance of the Data Splits
Combined_Performance(Sub.Est.Mdls)
Sub.Est.Mdls | A list of the models estimated on the data splits of exog, whose total performance is computed. |
The output from Combined_Performance.
sample_data <- sample_data[c(1:750),]
yvar <- c("Loan.Type")
xvar <- c("sex", "married", "age", "havejob", "educ", "political.afl", "rural", "region", "fin.intermdiaries", "fin.knowldge", "income")
CCP.RF <- Estimate_Models(sample_data, yvar, xvec = xvar, exog = "political.afl", xadd = c("networth", "networth_homequity", "liquid.assets"), type = "RF", dnames = c("0","1"))
Sub.CCP.RF <- list(Mdl.1 = CCP.RF$EstMdl$`D.1+networth`, Mdl.0 = CCP.RF$EstMdl$`D.0+networth`)
CCP.NoCCP.RF <- Combined_Performance(Sub.CCP.RF)
Results of the Each Data and Data Splits
Estimate_Models(DataSet, yvar, exog = NULL, xvec, xadd, type, dnames)
DataSet | The name of the dataset. |
yvar | Y variable. |
exog | A vector to be subtracted from the calculation. |
xvec | A vector of the variables to be used. |
xadd | An additional vector of variables to be used. |
type | The model type; can be RF, GLM, MLM, BAG, or GBM. |
dnames | The unique values of exog. |
The output from Estimate_Models.
sample_data <- sample_data[c(1:750),]
m2.xvar0 <- c("sex","married","age","havejob","educ","rural","region","income")
CCP.RF <- Estimate_Models(sample_data, yvar = c("Loan.Type"), exog = "political.afl", xvec = m2.xvar0, xadd = "networth", type = "RF", dnames = c("0","1"))
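The example passes dnames = c("0","1"), the unique values of exog, which suggests that the data are divided into one subset per value. How Estimate_Models does this internally is not documented here, so the following is only a guess at the idea using base R `split()`; the toy data frame and the `D.` naming pattern are assumptions for illustration.

```r
# Illustrative sketch: one data split per unique value of an exog-like column.
# The toy data and the "D." prefix are assumptions, not package internals.
toy <- data.frame(
  political.afl = c(0, 1, 1, 0, 1, 0),
  income        = c(10, 20, 15, 30, 25, 12)
)

# Unique values of the splitting variable, as character labels.
dnames <- as.character(sort(unique(toy$political.afl)))

# split() returns a list with one data frame per value, in sorted order.
splits <- split(toy, toy$political.afl)
names(splits) <- paste0("D.", dnames)

# Number of rows in each split.
split_sizes <- sapply(splits, nrow)
```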
Gradient Boosting Model
GBM_Model(Data, xvar, yvar)
Data | The name of the dataset. |
xvar | X variables. |
yvar | Y variable. |
Unlike bagging trees, boosting does not use bootstrap sampling; rather, each tree is fit using information from the previous trees. The event probability of a stochastic gradient boosting model is given by

$$\hat{\pi}_i = \frac{1}{1 + \exp[-f(x_i)]},$$

where $f(x_i)$ is in the range $(-\infty, \infty)$, and the initial estimate of the model is

$$f^{(0)}_i = \log\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right),$$

where $\hat{\pi}$ is the estimated sample proportion of a single class from the training set.
The output from GBM_Model.
yvar <- c("Loan.Type")
sample_data <- sample_data[c(1:120),]
xvar <- c("sex", "married", "age", "havejob", "educ", "political.afl", "rural", "region", "fin.intermdiaries", "fin.knowldge", "income")
BchMk.GBM <- GBM_Model(sample_data, c(xvar, "networth"), yvar)
BchMk.GBM$finalModel
BchMk.GBM$Roc$auc
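The link between the boosting score and the event probability described for this model can be checked numerically. The sketch below assumes a 0/1 outcome vector; the helper names `event_prob` and `init_log_odds` are hypothetical and do not belong to the package.

```r
# Illustrative helpers (hypothetical names, not part of the package).

# Map a real-valued score f(x) to an event probability in (0, 1).
event_prob <- function(f) 1 / (1 + exp(-f))

# Initial estimate: the sample log-odds of the event class.
init_log_odds <- function(y) {
  p_hat <- mean(y)            # sample proportion of the class coded 1
  log(p_hat / (1 - p_hat))
}

# Toy training outcome with 3 events out of 8 observations.
y <- c(1, 0, 0, 1, 0, 0, 0, 1)
f0 <- init_log_odds(y)
p0 <- event_prob(f0)          # transforms the initial score back to a probability
```

Transforming the initial score back gives the sample proportion itself, so before any tree is added the model predicts the class frequency for every observation.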
Generalized Linear Model
GLM_Model(Data, xvar, yvar)
Data | The name of the dataset. |
xvar | X variables. |
yvar | Y variable. |
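Let $y$ be the vector of the response variable, access to credit, for the $n$ applicants, such that $y_i = 1$ if applicant $i$ has access to credit and $y_i = 0$ otherwise. Furthermore, let $\mathbf{x} = (x_{ij})$, where $i = 1, \ldots, n$ indexes the applicants and $j = 1, \ldots, p$ indexes their characteristics. With $\pi_i = P(y_i = 1 \mid \mathbf{x}_i)$, the log-odds can be defined as

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \mathbf{x}_i \beta = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij},$$

where $\beta_0$ is the intercept, $\beta = (\beta_1, \ldots, \beta_p)$ is a $p \times 1$ vector of coefficients, and $\mathbf{x}_i$ is the $i$-th row of $\mathbf{x}$.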
The output from GLM_Model.
yvar <- c("multi.level")
sample_data <- sample_data[c(1:750),]
xvar <- c("sex", "married", "age", "havejob", "educ", "political.afl", "rural", "region", "fin.intermdiaries", "fin.knowldge", "income")
BchMk.GLM <- GLM_Model(sample_data, c(xvar, "networth"), yvar)
BchMk.GLM$finalModel
BchMk.GLM$Roc$auc
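The log-odds relation of the logistic model can be verified with base R's `glm()`. The data below are simulated and the coefficient values are arbitrary assumptions; the sketch does not use GLM_Model and makes no claim about how that function fits its model.

```r
# Illustrative check of the log-odds formula with stats::glm (simulated data).
set.seed(42)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)

# Arbitrary "true" coefficients, chosen only for the simulation.
eta <- -0.5 + 1.2 * x1 - 0.8 * x2
y   <- rbinom(n, 1, plogis(eta))

fit  <- glm(y ~ x1 + x2, family = binomial)
beta <- coef(fit)                       # beta[1] is the intercept

# Log-odds: intercept plus the linear combination of the predictors.
log_odds <- as.numeric(beta[1] + cbind(x1, x2) %*% beta[-1])

# Inverting the log-odds recovers the fitted event probabilities.
pi_hat <- 1 / (1 + exp(-log_odds))
```

Here `log_odds` matches `predict(fit, type = "link")` and `pi_hat` matches `fitted(fit)`, which is the formula above applied row by row.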
Multinomial Logistic Model
MLM_Model(Data, xvar, yvar)
Data | The name of the dataset. |
xvar | X variables. |
yvar | Y variable. |
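The multinomial model is a generalization of the logistic model and can be defined as

$$P(y^{i} = 1 \mid x, w) = \frac{\exp\left((w^{i})^{\top} x\right)}{\sum_{j=1}^{h} \exp\left((w^{j})^{\top} x\right)},$$

where $y$ represents the class labels ("1-of-h" encoding) predicted on the basis of an input vector $x$; in our case $y$ is the loan type ("Formal Loan", "Informal Loan", "Both Loan", and "No Loan"). Furthermore, $y^{i} = 1$ if $x$ belongs to class $i$ and $y^{i} = 0$ otherwise. For $i \in \{1, \ldots, h\}$, the weight vector $w^{i}$ corresponds to class $i$. We set $w^{h} = 0$, so the parameters to be learned are the weight vectors $w^{i}$ for $i \in \{1, \ldots, h-1\}$. The class probabilities must satisfy

$$\sum_{i=1}^{h} P(y^{i} = 1 \mid x, w) = 1.$$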
The output from MLM_Model.
yvar <- c("Loan.Type")
sample_data <- sample_data[c(1:750),]
xvar <- c("sex", "married", "age", "havejob", "educ", "political.afl", "rural", "region", "fin.intermdiaries", "fin.knowldge", "income")
BchMk.MLM <- MLM_Model(sample_data, c(xvar, "networth"), yvar)
BchMk.MLM$finalModel
BchMk.MLM$Roc$auc
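The class probabilities of a multinomial logistic model with one reference class can be computed by hand. In this sketch the weight matrix `W`, the input `x`, and the helper `class_probs` are made-up illustrations, with the weight vector of the last class fixed at zero as described for this model.

```r
# Illustrative sketch: multinomial class probabilities with w^h fixed at 0.
# W, x and class_probs are hypothetical; they are not taken from MLM_Model.
class_probs <- function(x, W) {
  # W holds one weight vector per column for classes 1..(h-1);
  # the reference class h contributes a score of 0.
  scores <- c(drop(crossprod(W, x)), 0)
  exp(scores) / sum(exp(scores))
}

# Two predictors (rows) and h - 1 = 3 learned weight vectors (columns),
# so h = 4 classes, like the four loan types.
W <- matrix(c(0.5, -1.0,
              0.2,  0.3,
              1.0, -0.4), nrow = 2)
x <- c(1, 2)

p <- class_probs(x, W)

# Log-ratio of class 1 to the reference class equals the linear score of class 1.
log_ratio_1 <- log(p[1] / p[4])
```

Because the reference score is zero, `log_ratio_1` equals `sum(W[, 1] * x)`, and the four probabilities sum to one by construction.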
Random Forest
RF_Model(Data, xvar, yvar)
Data | The name of the dataset. |
xvar | X variables. |
yvar | Y variable. |
Bagging considers all $p$ predictors at each split. A random forest instead considers at each split only a fresh random sample of $m_{try}$ of the $p$ predictors, so that a majority of the predictors is not considered at any given split; we usually set $m_{try} \approx \sqrt{p}$. Random forests, which de-correlate the trees by considering $m_{try} \approx \sqrt{p}$, show an improvement over bagged trees ($m_{try} = p$).
The output from RF_Model.
sample_data <- sample_data[c(1:750),]
yvar <- c("Loan.Type")
xvar <- c("sex", "married", "age", "havejob", "educ", "political.afl", "rural", "region", "fin.intermdiaries", "fin.knowldge", "income")
BchMk.RF <- RF_Model(sample_data, c(xvar, "networth"), yvar)
BchMk.RF
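The per-split sampling of predictors can be illustrated with the twelve predictor names used in the example. Rounding the square root down is an assumption made here for the sketch; whether RF_Model uses this value, or tunes it, is not stated in this documentation.

```r
# Illustrative sketch of drawing a fresh predictor subset at each split.
set.seed(7)
predictors <- c("sex", "married", "age", "havejob", "educ", "political.afl",
                "rural", "region", "fin.intermdiaries", "fin.knowldge",
                "income", "networth")
p <- length(predictors)

# Assumed rule of thumb: about sqrt(p) candidates, rounded down.
m_try <- floor(sqrt(p))

# Each split draws its own random subset of candidate predictors.
candidates_split1 <- sample(predictors, m_try)
candidates_split2 <- sample(predictors, m_try)
```

With 12 predictors only 3 are eligible at any one split, which is what keeps the individual trees from all relying on the same strong predictor.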
Sample data for analysis.
A dataset containing information on access to credit.
sample_data
A data_frame with 53940 rows and 10 variables:
hhid, household id number
swgt, survey weight
region, factor with 3 levels: west, east, and center
No.Loan, if the household has no loan
Formal, if the household has a formal loan
Both, if the household has both types of loan
Informal, if the household has an informal loan
sex, if the household has male
Loan.Type, factor with 4 levels giving the type of the loan
multi.level, factor with 2 levels indicating whether the household has access to a loan or not
...