How to choose which transformation to use while using Regression Model

White Paper Published By: Genpact

This white paper discusses techniques for choosing one transformation of a variable among many candidates when building a regression model. It also discusses sanity checks.



Tags : 
regression model, transformation of variables, genpact, software development, c++, database development

Genpact
Published:  Nov 20, 2008
Type:  White Paper
Length:  22 pages

Sandeep Das, Senior Consultant, Genpact, Kolkata, INDIA

Abstract: Transforming variables is common practice when building a model. In real-life scenarios it sometimes becomes evident that a variable needs to be transformed to improve model performance. One can try whatever transformation one wants, but it should be logical and meaningful in business terms, and the value it adds should be significant compared with using no transformation: if the transformation adds only marginal explanatory power, the original variable is recommended. This paper discusses some techniques for choosing one transformation among many, along with the sanity checks we need to go through.

The Concept: Take a variable X and let f be the transformation function applied to it, so the transformed variable is f(X). The first-order derivative f' of this function can be positive or negative. For example, with the square transformation f' > 0 (for positive values of X), whereas with the inverse transformation f' < 0 (provided all values of X are nonzero).

Example and Interpretation of Transformation:

Variable: Outstanding Balance (Nature: Continuous)
  Log: A change in the log of a variable can be interpreted as the rate of change in outstanding balance when the variable changes by 1 unit.
  Square Root: For variables like income or balance, which can take huge values, it helps to scale down the huge values.
  Inverse: Not recommended, as the outstanding balance variable may take the value zero for a number of accounts.

Variable: Utilization (Nature: Percentage)
  Log: Makes the transformed value negative if utilization is a proper fraction, i.e. if the percentage is less than 100%.
  Square Root: Helps to skew the distribution.
  Inverse: Helps to assign higher weight to lower values.

Variable: Inquiries (Nature: Number)
  Log: Helps to scale down fluctuations and range.
  Square Root: Effect depends on whether the value of the variable is less than, equal to, or greater than 1.
  Inverse: Helps to assign higher weight to lower values.
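As a rough illustration of the notes above, the following Python sketch applies the log, square root, and inverse transformations to a handful of values for each variable. The sample values and the helper names (`safe_log`, `safe_sqrt`, `safe_inverse`, `transform_all`) are assumptions made up for this example and do not come from the paper.

```python
import math

# Hypothetical sample values, chosen only to illustrate the notes above.
samples = {
    "outstanding_balance": [0.0, 250.0, 12000.0, 480000.0],
    "utilization": [0.05, 0.40, 0.95, 1.20],  # 1.20 means 120% utilization
    "inquiries": [0.0, 1.0, 4.0, 9.0],
}

def safe_log(x):
    """Natural log; returns None where it is undefined (x <= 0)."""
    return math.log(x) if x > 0 else None

def safe_sqrt(x):
    """Square root; returns None where it is undefined (x < 0)."""
    return math.sqrt(x) if x >= 0 else None

def safe_inverse(x):
    """1/x; returns None where it is undefined (x == 0)."""
    return 1.0 / x if x != 0 else None

TRANSFORMS = {"log": safe_log, "sqrt": safe_sqrt, "inverse": safe_inverse}

def transform_all(values):
    """Apply every transformation to a list of values."""
    return {name: [f(v) for v in values] for name, f in TRANSFORMS.items()}

def fmt(v):
    return "undefined" if v is None else f"{v:.4g}"

for variable, values in samples.items():
    print(variable)
    for name, out in transform_all(values).items():
        print(f"  {name:8s}", ", ".join(fmt(v) for v in out))
```

With these values, the log of any utilization below 100% comes out negative, the inverse is undefined for a zero balance, and the square root raises values below 1 while lowering values above 1.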
The Problem: In theory we can take as many types of transformation as we like. If we have K transformations in mind and N variables, we generate N x K derived variables. The question then becomes how to cut this huge list down, that is, how to choose the best among them.

Way Out - Proposed Solution: To solve this problem we first need to decide the basis on which to reduce the set of transformations, and that depends on the type of modeling we are doing.

Step 1: The following questions need to be answered.

Q1: Objective - Transformation of Y (Dependent) or X (Independent)
This is crucial, as the objective of a transformation generally differs for the dependent variable (Y) and the independent variables (X). When we transform Y, in most cases the objective is to minimize fluctuation, scale values down (or up), or tackle a skewed distribution. In this case it is advisable to choose the transformation by looking at the univariate distribution alone. Box-Cox helps to arrive at a transformation whose result approximately follows the normal distribution. When choosing a transformation for independent variables, by contrast, bivariate or multivariate analysis should take priority over univariate analysis, since the objective there is normally to choose a transformation that increases the predictive power of the model, in other words one that minimizes the residual. For continuous independent variables, we can look at PROC GAM plots to see whether non-linear or piecewise linear fits are appropriate. This sometimes also improves the fit. However, the challenge is to unearth the actual pattern and make business sense of it.

Q2: Nature of Variable
The X's can be discrete or continuous. If a variable is discrete, binning or creating dummy variables is the only option, whereas for continuous variables we can try out different transformations. Before choosing a transformation for a continuous variable, it is advisable to perform pre-modeling steps such as outlier detection and missing value treatment.
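For the dependent variable, the Box-Cox idea mentioned under Q1 can be sketched as a grid search over the power parameter lambda that maximizes the profile log-likelihood of a normal model for the transformed Y. This is a minimal sketch under stated assumptions: Y must be strictly positive, the grid from -2 to 2 in steps of 0.1 is an arbitrary choice, and the data and function names (`boxcox`, `profile_loglik`, `best_lambda`) are made up for illustration. The paper itself only names Box-Cox and does not give a procedure.

```python
import math

def boxcox(y, lam):
    """Box-Cox transform of one strictly positive value."""
    if abs(lam) < 1e-12:
        return math.log(y)          # limiting case lambda = 0
    return (y ** lam - 1.0) / lam

def profile_loglik(ys, lam):
    """Profile log-likelihood of lambda, assuming the transformed Y is normal."""
    n = len(ys)
    zs = [boxcox(y, lam) for y in ys]
    mean = sum(zs) / n
    var = sum((z - mean) ** 2 for z in zs) / n
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(y) for y in ys)

def best_lambda(ys, grid=None):
    """Pick the lambda on the grid with the highest profile log-likelihood."""
    if grid is None:
        grid = [i / 10.0 for i in range(-20, 21)]  # assumed grid: -2.0 .. 2.0
    return max(grid, key=lambda lam: profile_loglik(ys, lam))

# Made-up right-skewed Y whose logarithm is evenly spread and symmetric.
ys = [math.exp(0.5 * k) for k in range(1, 21)]
lam = best_lambda(ys)
print("chosen lambda:", lam)
```

A chosen lambda near 0 points to the log transformation, near 0.5 to the square root, and near 1 to leaving Y untransformed. The result should still be checked against the univariate distribution and business sense, as the text advises.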
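For an independent variable, one way to sketch the screening of candidate transformations is to score each candidate by a bivariate measure against Y and keep a transformation only when it beats the untransformed variable by more than a marginal amount. The use of absolute Pearson correlation as the score, the `min_gain` threshold of 0.02, the candidate set, and the toy data are all assumptions for illustration. The paper does not prescribe them.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# K candidate transformations (assumed set); "identity" is the original variable.
CANDIDATES = {
    "identity": lambda x: x,
    "log": math.log,
    "sqrt": math.sqrt,
    "inverse": lambda x: 1.0 / x,
    "square": lambda x: x * x,
}

def choose_transformation(xs, ys, min_gain=0.02):
    """Return (chosen_name, scores) for one strictly positive variable.

    Scores are |Pearson r| between f(X) and Y. The original variable is
    kept unless the best transformation improves on it by at least
    min_gain, a hypothetical threshold for "more than marginal" value.
    """
    scores = {name: abs(pearson([f(x) for x in xs], ys))
              for name, f in CANDIDATES.items()}
    best = max(scores, key=scores.get)
    if scores[best] - scores["identity"] < min_gain:
        return "identity", scores
    return best, scores

# Toy data: Y depends on log(X) plus a small fixed disturbance (made up).
xs = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, -0.1]
ys = [3.0 * math.log(x) + e for x, e in zip(xs, noise)]

chosen, scores = choose_transformation(xs, ys)
print("chosen:", chosen)
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"  {name:8s} |r| = {s:.3f}")
```

Running the same function over each of the N variables reduces the N x K derived variables to at most one per variable. A multivariate check on the final model would still be needed, since a bivariate score ignores the other predictors.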
