Hard Hat Area: Myths and Pitfalls of Data Mining
White Paper Published By: SPSS
The intrepid data miner runs many risks, including being buried under mountains of data or disappearing along with the "mysterious disappearing terabyte." This article outlines some risks, debunks some myths, and attempts to provide some protective "hard hats" for data miners in the technology sector.
Published: Jun 30, 2009
Type: White Paper
Length: 8 pages
Executive brief
Hard Hat Area: Myths and Pitfalls of Data Mining
By Tom Khabaza
Director of Data Mining, SPSS
Table of contents
Introduction
Myths and misconceptions about data mining
Pitfalls of data mining and how to avoid them
Conclusion
SPSS is a registered trademark and the other SPSS products named are trademarks of SPSS Inc. All other names are trademarks of their respective owners. © 2007 SPSS Inc. All rights reserved. HHAEB-1207
Introduction
The intrepid data miner runs many risks, including being buried under mountains of data or disappearing along with the "mysterious disappearing terabyte." Myths and misconceptions create their own risks and need to be debunked. This article outlines some risks, debunks some myths, and attempts to provide some protective "hard hats" for data miners.
It's critical to understand that data mining is a business process: a way of finding patterns in your data that provide insight you can use to conduct your business more effectively. Data mining also makes predictions to guide customer interactions and other business decisions.
Myths and misconceptions about data mining
Myth #1: Data mining is done in the lab, by a technology expert
Data mining uses advanced technology, and its workings, particularly those of modeling techniques, are unlikely to be understood by the wider IT community. Does this mean that data mining should take place in the laboratory and be conducted only by those who understand every nuance of the technology that is involved?
Quite the opposite is true, because data mining is a business process in which business knowledge is of paramount importance: the value of data mining is realized only when the results are put to use in business operations.
When performed without business knowledge, data mining can produce nonsensical or useless results (see pitfall #4, below), so it is essential that data mining be performed by someone with extensive knowledge of the business problem. Very seldom is this the same person who has extensive knowledge of the data mining technology. It is the responsibility of data mining tool providers to ensure that tools are accessible to business users.
Of equal importance is the need to deploy results into the business and put them to use. Data miners should plan at the start of a project how their results will fit into business operational processes. Organizations should acquire infrastructure that enables them to deploy data mining results efficiently across the organization, and tool providers should ensure that their tools fit easily into this infrastructure.
Myth #2: Data mining is all about algorithms
A businessperson attending a typical data mining conference or reading its proceedings might form the impression that data mining is all about advanced data analysis algorithms. This misconception might be summarized as follows: "All you need for data mining is good algorithms. The better your algorithms, the better your data mining. Advancing the effectiveness of data mining means advancing our knowledge of algorithms."
To hold this view is to misunderstand the data mining process. Data mining is a process consisting of many elements: formulating business goals; mapping business goals to data mining goals; acquiring, understanding, and pre-processing the data; evaluating and presenting the results of analysis; and deploying these results to achieve business benefits.
This is not to minimize the importance of new or improved data mining algorithms. The problem occurs when data miners focus too much on the algorithms and ignore the other 90-95 percent of the data mining process.
The consequences of this misconception can be disastrous for a data mining project, possibly res... [download for more]