Overview and Concepts of Data Warehousing and Business Intelligence:
Why reporting and analysing data - Raw data to valuable information - Lifecycle of data - What is Business Intelligence - BI and DW in today’s perspective - What is data warehousing - The building blocks: Defining features - Data warehouses and data marts - Overview of the components - Metadata in the data warehouse - Need for data warehousing - Basic elements of data warehousing - Trends in data warehousing.
The Architecture of BI and DW:
BI and DW architectures and their types - Relation between BI and DW - OLAP (Online Analytical Processing) definitions - Difference between OLAP and OLTP - Dimensional analysis - What are cubes? - Drill-down and roll-up - Slice-and-dice or rotation - OLAP models - ROLAP versus MOLAP - Defining schemas: stars, snowflakes and fact constellations.
Introduction to data mining (DM):
Motivation for Data Mining - Data Mining: Definition and Functionalities - Classification of DM Systems - DM task primitives - Integration of a Data Mining system with a Database or a Data Warehouse - Issues in DM - KDD Process.
Data Pre-processing:
Why pre-process data? - Data cleaning: Missing values, Noisy data - Data integration and transformation - Data reduction: Data cube aggregation, Dimensionality reduction - Data compression - Numerosity reduction - Data Mining Primitives, Languages and System Architectures: Task-relevant data - Kind of knowledge to be mined - Discretization and concept hierarchy.
Concept Description and Association Rule Mining:
What is concept description? - Data generalization and summarization-based characterization - Attribute relevance - Class comparisons. Association Rule Mining: Market basket analysis - Basic concepts - Finding frequent item sets: Apriori algorithm - Generating rules - Improved Apriori algorithm - Incremental ARM - Associative classification - Rule mining.
Classification and Prediction:
What is classification and prediction? - Issues regarding classification and prediction. Classification methods: Decision tree, Bayesian classification, Rule-based, CART, Neural network. Prediction methods: Linear and nonlinear regression, Logistic regression. Introduction to DM tools such as DBMiner, WEKA and DTREG.
Data Mining for Business Intelligence Applications:
Data mining for business applications such as Balanced Scorecard, Fraud Detection, Clickstream Mining, Market Segmentation, the retail industry, the telecommunications industry, banking and finance, and CRM. Data Analytics Life Cycle: Introduction to Big Data Business Analytics - State of the practice in analytics - Role of data scientists - Key roles for a successful analytics project - Main phases of the life cycle - Developing core deliverables for stakeholders.
Question Paper Pattern:
• The question paper will have TEN questions.
• Each full question will carry 20 marks.
• There will be 02 full questions (with a maximum of four sub-questions) from each module.
• Each full question will have sub-questions covering all the topics under a module.
• The students will have to answer FIVE full questions, selecting one full question from each module.
Textbooks
1. J. Han, M. Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann.
2. M. Kantardzic, “Data Mining: Concepts, Models, Methods, and Algorithms”, John Wiley & Sons Inc.
3. Paulraj Ponniah, “Data Warehousing Fundamentals”, John Wiley.
4. M. Dunham, “Data Mining: Introductory and Advanced Topics”, Pearson Education.
5. G. Shmueli, N.R. Patel, P.C. Bruce, “Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner”, Wiley India.
Course Outcomes:
At the end of the course, students will be able to:
CO1: Analyse the concepts of data warehousing, Business Intelligence and OLAP.
CO2: Demonstrate data pre-processing techniques and apply association rule mining algorithms.
CO3: Apply various classification algorithms and evaluate classifiers for the given problem.
CO4: Analyse data mining for various business intelligence applications for the given problem.
CO5: Apply classification and regression techniques for the given problem.