This white paper discusses the Multi Step Logistic Model, a non-optimization technique that may be among the most useful when the cost of maximization/minimization is high, i.e. when the proportions of 1s and 0s are distinctly different and the goal (tagging actual 1 as 1 and actual 0 as 0) can be improved by only a few percent at the cost of heavy misclassification of the other class.
IMPROVING PREDICTIVE POWER OF BINARY
RESPONSE MODEL USING MULTI STEP LOGISTIC
APPROACH
Sandeep Das*
Senior Consultant, Analytics
Genpact India, DLF IT Park, Tower 1, 7th & 8th Floor,
8 Major Arterial Road, New Town Rajarhat, Kolkata - 700156, India
April 2009
* Corresponding author: Tel: +91-9836268676, E-mail address: sandeep.das1@genpact.com
ABSTRACT:
This paper discusses a methodology called "Multi Step Logistic Regression" to improve the
predictive power of the binary logistic regression model in terms of a higher Hit/Miss ratio. A
'Hit' is defined as a correct classification/tagging and a 'Miss' as a wrong classification,
both obtained from the cross tabulation of actual vs. predicted tagging. In this approach, after
choosing the final logistic model, the model-building population is segregated into two
parts, predicted 1 and predicted 0, by selecting a cutoff on the predicted probability
distribution. For the predicted 1 group, the parameter estimates are re-estimated, keeping the
same variables that were significant in the initial model. The user may choose to introduce new
variables in each iteration and keep them in the model according to their significance. These
steps are repeated iteratively until there is a good cost-benefit reason to stop. The
conventional (single step) logistic method does not help to tackle a situation where the
proportions of 1s and 0s are distinctly different or the cost of misallocation is high. To
tackle such a situation, we discuss this alternative approach. This paper aims to improve the
concentration in the Hit cells with a tolerable, regulated increase (rather than an alarming
one) in the concentration of misclassification compared to the Single Step approach.
KEY WORDS: Multistep Logistic, Binary Response Model, Improving predictive power of
Probability of Default (PD) model.
PAPER TYPE: Data Analysis in Banking and Financial Services
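The iterative procedure outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the names `fit_logistic` and `multi_step_logistic` are hypothetical, the model is fitted with plain gradient ascent instead of a statistical package, the stopping rule (`max_steps`, `min_group`) stands in for the cost-benefit judgment the abstract leaves to the user, and it is assumed that an observation is finally tagged 1 only if it stays in the predicted 1 group through every step. No new variables are introduced between steps.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_one(w, row):
    """Predicted probability of 1 for one row; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * x for wj, x in zip(w[1:], row)))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient ascent on the log-likelihood (illustrative only)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = target - predict_one(w, row)
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def multi_step_logistic(X, y, cutoff, max_steps=3, min_group=20):
    """Refit on the predicted 1 group at each step, using the same variables."""
    active = list(range(len(X)))  # rows still in the predicted 1 group
    models = []
    for _ in range(max_steps):
        ys = [y[i] for i in active]
        # Assumed stopping rule: too few rows, or only one class left.
        if len(active) < min_group or len(set(ys)) < 2:
            break
        w = fit_logistic([X[i] for i in active], ys)
        models.append(w)
        # Rows below the cutoff move to predicted 0 and are not revisited.
        active = [i for i in active if predict_one(w, X[i]) >= cutoff]
    tags = [0] * len(X)
    if models:
        for i in active:
            tags[i] = 1
    return tags, models

# Synthetic data (an assumption for the demo; not from the paper).
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(300)]
y = [1 if random.random() < sigmoid(-1.5 + 1.5 * r[0] - 1.0 * r[1]) else 0
     for r in X]

tags_single, _ = multi_step_logistic(X, y, cutoff=0.3, max_steps=1)
tags_multi, models = multi_step_logistic(X, y, cutoff=0.3, max_steps=3)
print("predicted 1, single step:", sum(tags_single))
print("predicted 1, multi step: ", sum(tags_multi), "after", len(models), "fits")
```

With `max_steps=1` the function reduces to the conventional single step model, so the two calls give a like-for-like comparison. In this sketch later steps can only move observations out of the predicted 1 group, so the multi step run never tags more 1s than the single step run.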
1. INTRODUCTION:
In the standard binary logistic model building exercise, we first finalize the choice of a model.
We then choose a cutoff on the predicted probability distribution to define Predicted 1 and
Predicted 0. However, we have no control over the Hit/Miss ratio associated with the choice of
cutoff, because we cannot regulate the migration of cell elements between the Hit and Miss cells
by the choice of cutoff alone. In this paper, we discuss Multi Step Logistic as a possible way
to help improve the Hit/Miss ratio of the model. Industry verticals where this could be
applicable include Risk, Credit Risk, Bankruptcy forecasting, etc. This methodology can also be
used in other domains where we want to predict dichotomous response variables such as Yes/No or
Good/Bad, and where one of the categories has a very high or very low share of the population.
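To make the Hit and Miss cells concrete, the short sketch below cross-tabulates actual vs. predicted tags at a few cutoffs. The function names, the toy scores, and the reading of the Hit/Miss ratio as (total hits) / (total misses) are assumptions made for illustration; the paper does not spell out the formula at this point.

```python
def hit_miss_table(actual, predicted_prob, cutoff):
    """Cross-tabulate actual vs. predicted tags; keys are (actual, predicted)."""
    table = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for y, p in zip(actual, predicted_prob):
        tag = 1 if p >= cutoff else 0
        table[(y, tag)] += 1
    return table

def hit_miss_ratio(table):
    """Assumed definition: correctly tagged cases divided by wrongly tagged cases."""
    hits = table[(1, 1)] + table[(0, 0)]
    misses = table[(1, 0)] + table[(0, 1)]
    return hits / misses if misses else float("inf")

# Toy data (hypothetical): 3 actual 1s among 10 observations.
actual = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.8, 0.2, 0.5]

for cutoff in (0.3, 0.5, 0.7):
    table = hit_miss_table(actual, probs, cutoff)
    print(cutoff, table, round(hit_miss_ratio(table), 2))
```

Moving the cutoff only shifts observations between a Hit cell and a Miss cell in the same row of the table (for example from Predicted 1 to Predicted 0 among the actual 0s), which is the limited control over cell migration described above.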
2. BACKGROUND:
Let's take a standard logistic scorecard building scenario in which we are modeling for Bad; in
the end, this essentially generates the predicted probability distribution of Bad. In many
business scenarios, we find that the proportions of Bad and Good are distinctly different. For
example, if we build a response model for a mailing campaign, only very few actual responses are
expected to be available. In such a situation, the stability of the logistic model is itself
questionable. Here, we either increase the number of responses by biased sampling or use
specific algorithms such as 'zero inflated models', 'modeling rare events' or sometimes 'neural
network' logic to improve the model's predictive power. These algorithms are complex and not
user friendly to implement. In this paper, we discuss a methodology that is suitable when we
want to increase the Hit/Miss ratio compared to that of Single Step logistic regression. A
Single Step logistic approach is the standard ...