Introduction
What is Data Science? - Big Data and Data Science hype - Current landscape of perspectives - Skill sets needed, Statistical Inference - Populations and samples - Statistical modeling, probability distributions, fitting a model - Introduction to R
Exploratory Data Analysis and the Data Science Process
Basic tools (plots, graphs and summary statistics) of EDA - Philosophy of EDA - The Data Science Process - Case Study, Three Basic Machine Learning Algorithms - Linear Regression - k-Nearest Neighbors (k-NN) - k-means, Data Wrangling
Feature Generation and Feature Selection (Extracting Meaning from Data)
Feature Selection algorithms – Filters; Wrappers; Decision Trees; Random Forests, Recommendation Systems, Dimensionality Reduction - Singular Value Decomposition - Principal Component Analysis
Mining Social-Network Graphs
Social networks as graphs - Clustering of graphs - Direct discovery of communities in graphs - Partitioning of graphs - Neighbourhood properties in graphs
Data Visualization
Basic principles, ideas and tools for data visualization, Examples of inspiring (industry) projects, Data Science and Ethical Issues - Discussions on privacy, security, ethics - A look back at Data Science - Next-generation data scientists
Course outcomes:
At the end of the course the student will be able to:
1. Apply data science and related skill sets.
2. Understand statistical inference, probability distributions, statistical modeling and model fitting.
3. Apply R to carry out basic statistical modeling and analysis.
4. Apply exploratory data analysis (EDA) in data science.
5. Apply the data science process.
Question paper pattern:
The SEE question paper will be set for 100 marks and the marks scored will be proportionately reduced to 60.
Textbooks
1. Doing Data Science: Straight Talk from the Frontline. Cathy O'Neil and Rachel Schutt. O'Reilly Media, 2014.
2. Mining of Massive Datasets, v2.1. Jure Leskovec, Anand Rajaraman and Jeffrey Ullman. Cambridge University Press, 2014.
Reference Books
1. Machine Learning: A Probabilistic Perspective. Kevin P. Murphy. MIT Press, 2012.
2. Data Science for Business. Foster Provost and Tom Fawcett. O'Reilly Media, 2013.
3. Foundations of Data Science. Avrim Blum, John Hopcroft and Ravindran Kannan. Cambridge University Press, 2020.
4. Data Mining and Analysis: Fundamental Concepts and Algorithms. Mohammed J. Zaki and Wagner Meira Jr. Cambridge University Press, 2014.