18CS641 Data Mining and Data Warehousing syllabus for CS




Module-1 Data Warehousing & modeling 8 hours

Data Warehousing & modeling:

Basic Concepts: Data Warehousing: A multitier Architecture, Data warehouse models: Enterprise warehouse, Data mart and virtual warehouse, Extraction, Transformation and loading, Data Cube: A multidimensional data model, Stars, Snowflakes and Fact constellations: Schemas for multidimensional Data models, Dimensions: The role of concept Hierarchies, Measures: Their Categorization and computation, Typical OLAP Operations

Textbook 2: Ch. 4.1, 4.2

Module-2 Data warehouse implementation & Data mining 8 hours

Data warehouse implementation & Data mining:

Efficient Data Cube computation: An overview, Indexing OLAP Data: Bitmap index and join index, Efficient processing of OLAP Queries, OLAP server Architectures: ROLAP versus MOLAP versus HOLAP. Introduction: What is data mining, Challenges, Data Mining Tasks, Data: Types of Data, Data Quality, Data Preprocessing, Measures of Similarity and Dissimilarity.

Textbook 2: Ch. 4.4

Textbook 1: Ch. 1.1, 1.2, 1.4, 2.1 to 2.4

Module-3 Association Analysis 8 hours

Association Analysis:

Association Analysis: Problem Definition, Frequent Itemset Generation, Rule Generation, Alternative Methods for Generating Frequent Itemsets, FP-Growth Algorithm, Evaluation of Association Patterns.

Textbook 1: Ch. 6.1 to 6.7 (excluding 6.4)

Module-4 Classification 8 hours

Classification:

Decision Tree Induction, Methods for Comparing Classifiers, Rule-Based Classifiers, Nearest-Neighbor Classifiers, Bayesian Classifiers.

Textbook 1: Ch. 4.3, 4.6, 5.1, 5.2, 5.3

Module-5 Clustering Analysis 8 hours

Clustering Analysis:

Overview, K-Means, Agglomerative Hierarchical Clustering, DBSCAN, Cluster Evaluation, Density-Based Clustering, Graph-Based Clustering, Scalable Clustering Algorithms.

Textbook 1: Ch. 8.1 to 8.5, 9.3 to 9.5

 

Course Outcomes:

The student will be able to:

  • Identify data mining problems and implement a data warehouse.
  • Write association rules for a given data pattern.
  • Choose between classification and clustering solutions.

 

Question Paper Pattern:

  • The question paper will have ten questions.
  • Each full question carries 20 marks.
  • There will be 2 full questions (with a maximum of four sub questions) from each module.
  • Each full question will have sub questions covering all the topics under a module.
  • The students will have to answer 5 full questions, selecting one full question from each module.

 

Textbooks:

1. Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson, First Impression, 2014.

2. Jiawei Han, Micheline Kamber, Jian Pei: Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann Publishers, 2012.

 

Reference Books:

1. Sam Anahory, Dennis Murray: Data Warehousing in the Real World, Pearson, Tenth Impression, 2012.

2. Michael J. Berry, Gordon S. Linoff: Mastering Data Mining, Wiley, Second Edition, 2012.

Last Updated: Tuesday, January 24, 2023