
Variable Reduction in Logistic & Choosing the correct transformations

White Paper Published By: Genpact

This technique saves time by searching through many transformations of each variable to find the form that best fits the dependent variable. Compared with a widely accepted method from a reputed firm, we have seen the new technique perform very well.



Tags : 
genpact, data mining, statistical use, predictive/targeting modeling, finance & risk management, retail analytics, enterprise applications, data management

Genpact
Published:  Oct 15, 2008
Type:  White Paper
Length:  6 pages

Arijit Das, Senior Consultant, Genpact, Kolkata, INDIA

ABSTRACT: This paper presents a tool for dealing with the problem of plenty, where there is a large number of candidate variables and the regression function is logistic. The method works through a large number of transformations of each variable and helps you choose the best form of each one, while quickly reducing the number of variables to be considered. The aim is a robust, statistically sound model that stands the test of time.

UNDERLINED BENEFITS: The method discussed here, which chooses transformations and reduces the many transformed variables to the vital few, has proved very effective in the highly competitive US card environment, where external data from agencies and internal data are both robust and plentiful. The technique saves time and searches through various transformations to find the best fit of a variable to the dependent variable. Compared with a widely accepted method from a reputed firm, we have seen this new technique perform very well (Figure 1).
Figure 1

INTRODUCTION: As students of statistics, we know that logistic regression, given the form of its equation, poses two major problems: first, choosing the right variables, and second, getting around multicollinearity (MC). This paper is concerned with cases where there are a huge number of observations and a huge number of variables, typically the problem of plenty. In such situations MC is not a problem at all. As Kent Leahy aptly says, "A common solution to get rid of MC, therefore, has been to delete one or more of the offending collinear model variables or to use factor or principal components analysis to reduce the amount of redundant variation present in the data. MC, however, is not always harmful, and deleting a variable or variables under such circumstances can be the real problem. Unfortunately, this is not well understood by many in the industry, even among those with substantial statistical backgrounds." It should be appreciated that if there were no correlation between predictors, this form of regression would reduce to a mere series of bivariate regressions. It is the relationships between variables that actually give life to logistic regression.

FIT THE CURVE TO THE DATA: When the number of variables is very high, choosing the elite few is a problem, and factor or principal component analysis is not useful here. What I propose is this: if we know that logistic is the functional form for the predicted variable, which is often the case in marketing problems with response (1/0) data, then the best way to find out which variables are significant is to use logistic regression itself. For this, I have built simple code using well-known SAS options that can be used as an effective tool. It has delivered very high impact projects and has performed extremely well in very competitive credit card acquisition and lifecycle campaigns.
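The idea of letting logistic regression itself judge a variable can be illustrated with a minimal SAS sketch. This is not the author's code, which this excerpt does not show in full. It assumes the dataset sample and the variables response and var2 that the paper names in its next step, and the output dataset name fit_var2 is hypothetical.

```sas
/* Sketch only: fit response on a single candidate variable.        */
/* Assumes dataset SAMPLE with a 1/0 RESPONSE and a numeric VAR2.   */
proc logistic data=sample descending;  /* DESCENDING models P(response = 1) */
    model response = var2;
    /* Save the tests of the global null hypothesis                 */
    /* (likelihood ratio, score and Wald chi-squares).              */
    ods output GlobalTests=fit_var2;   /* fit_var2 is a hypothetical name */
run;

/* Inspect the saved chi-square statistics for this variable. */
proc print data=fit_var2;
run;
```

Repeating the fit for each candidate and comparing the chi-square statistics gives a rough ranking of how well each variable fits the response on its own.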
STEP TO EFFECTIVE TRANSFORMATIONS: This step takes a variable and creates 19 transformations of it, including squares, cubes, their roots, inverses, logarithms, and some sine and cosine transformations. The idea is to make the list as exhaustive as possible; these are transformations that I have seen come out significant in the projects and models I have built. Consider a dataset called sample that holds all the variables, with the Y variable called response. In the first step, make the transformations of the variable concerned, say var2 (Ref: Figure 2).

/* Part One of Code */
data test1;
    set sample(keep = response var2);
    var2_sq   = var2**2;                 /*squared*/
    var2_cu   = var2**3;                 /*cubed*/
    var2_sqrt = sqrt(var2);              /*square root*/
    var2_curt = var2**.3333;             /*cube root*/
    var2_log  = log(max(.0001,var2));    /*log*/
    var2_exp  = exp(max(.0001,var2));    /*exponent*/
    var2_tan  = tan(var2);               /*tangent*/
    var2_sin  = sin(var2);               /*sine*/
    var2_cos  = cos(var2);               /*cosine*/
    var2_inv  = 1/max(.0001,var2);       /*inverse*/
    var2_sqi  = 1/max(.0001,var2**2);    /*squared inverse*/
    var2_cui  = 1/max(.0001,var2**3);    /*cubed inverse*/
    var2_sqri = 1/max(.0001,sqrt(var... [download for more]
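One way to compare the transformed versions of a variable is PROC LOGISTIC's SELECTION=SCORE option, which ranks candidate models by their score chi-square. This is a sketch under assumptions, not the continuation of the paper's code, which is truncated above. It assumes test1 was created as in the excerpt and uses only the transformations that are fully visible there.

```sas
/* Sketch only: rank the visible transformations of var2 by how well */
/* each one alone fits the response. Assumes TEST1 exists as above.  */
proc logistic data=test1 descending;   /* DESCENDING models P(response = 1) */
    model response = var2 var2_sq var2_cu var2_sqrt var2_curt
                     var2_log var2_exp var2_tan var2_sin var2_cos
                     var2_inv var2_sqi var2_cui
          / selection=score   /* rank models by score chi-square      */
            start=1 stop=1    /* consider one-variable models only    */
            best=5;           /* list the five highest-scoring models */
run;
```

Observations with a missing value in any listed variable are excluded from the analysis, so transformations that produce missing values (for example sqrt of a negative var2) shrink the sample used for the comparison. It is worth checking the number of observations used before trusting the ranking.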
