Introduction to Data Warehousing:
Heterogeneous information, Integration problem. Warehouse architecture. Data warehousing, Warehouse vs DBMS. Aggregations: SQL and Aggregations, Aggregation functions and Grouping. Data Warehouse Models and OLAP Operations: Decision support; Data Marts, OLAP vs OLTP. Multi-dimensional data model. Dimensional Modelling. ROLAP vs MOLAP; Star and snowflake schemas; the MOLAP cube; roll-up, slicing, and pivoting.
Issues in Data Warehouse Design:
Design issues - Monitoring, Wrappers, Integration, Data cleaning, Data loading, Materialised views, Warehouse maintenance, OLAP servers and Metadata. Building Data Warehouses: Conceptual data modeling, Entity-Relationship (ER) modeling and Dimensional modeling. Data warehouse design using ER approach. Aspects of building data warehouses.
Introducing Data Mining:
KDD Process, Problems and Techniques, Data Mining Applications, Prospects for the Technology. CRISP-DM Methodology: Approach, Objectives, Documents, Structure, Binding to Contexts, Phases, Tasks, and Outputs. Data Mining Inputs and Outputs: Concepts, Instances, Attributes. Kinds of Learning, Kinds of Attributes and Preparing Inputs. Knowledge representations – Decision tables and Decision trees, Classification rules, Association rules, Regression trees & Model trees and Instance-Level representations.
Data Mining Algorithms:
One-R, Naïve Bayes Classifier, Decision trees, Decision rules, Association Rules, Regression, K-Nearest Neighbour Classifiers.
Evaluating Data Mining Results:
Issues in Evaluation; Training and Testing Principles; Error Measures, Holdout, Cross Validation. Comparing Algorithms; Taking costs into account and Trade-offs in the Confusion Matrix.
Course outcomes:
At the end of the course the student will be able to:
Question paper pattern:
The SEE question paper will be set for 100 marks, and the marks scored will be proportionately reduced to 60 (for example, a score of 80 out of 100 becomes 48 out of 60).
Textbooks
1. Fundamentals of Data Warehouses. M. Jarke, M. Lenzerini, Y. Vassiliou, P. Vassiliadis (ed.). Springer-Verlag, 1999.
2. Data Mining: Concepts and Techniques. J. Han and M. Kamber. Morgan Kaufmann, 2000.
Reference Books
1. The Data Warehouse Toolkit. Ralph Kimball. Wiley, 1996.
2. Principles of Data Mining. D. Hand, H. Mannila and P. Smyth. MIT Press, 2001.
3. Data Mining: Introductory and Advanced Topics. M. H. Dunham. Prentice Hall, 2003.