Introduction to scikit-learn Python package, Iris data set. Getting and processing data: CSV files, Pandas package, Feature selection, Online data sources.
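A minimal, dependency-free sketch of getting and processing CSV data follows. pandas' `read_csv` is the more usual route; this version uses only the standard `csv` module. It parses a few illustrative rows laid out like the Iris data set and keeps only selected feature columns, which is a very simple form of feature selection. The column names and values are assumptions for illustration, not an official copy of the data set.

```python
import csv
import io

# Hypothetical CSV text in the layout of the Iris data set:
# four numeric measurements followed by a species label.
IRIS_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def load_rows(text, features):
    """Parse CSV text into (feature_vector, label) pairs, keeping only the chosen columns."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        x = [float(record[name]) for name in features]  # convert the selected columns to numbers
        rows.append((x, record["species"]))
    return rows


# Simple feature selection: keep only the two petal measurements.
petal_rows = load_rows(IRIS_CSV, ["petal_length", "petal_width"])
print(petal_rows[0])  # ([1.4, 0.2], 'setosa')
```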
Data visualization using Matplotlib and Plotly. Supervised and unsupervised learning.
Regression:
Simple linear regression, Multiple linear regression, Decision tree, Random forests.
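Simple linear regression fits y = b0 + b1*x by ordinary least squares: b1 = Sxy / Sxx and b0 = mean(y) - b1*mean(x). The minimal pure-Python sketch below assumes small in-memory lists. scikit-learn's `LinearRegression` fits the same kind of model. The data points are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sxy and Sxx: centred cross-product and centred sum of squares
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict_linear(intercept, slope, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


b0, b1 = fit_simple_linear_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(b0, b1)                      # 1.0 2.0
print(predict_linear(b0, b1, 6))   # 13.0
```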
Classification:
Logistic regression, K-nearest neighbours, Decision tree classification, Random forests classification.
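K-nearest neighbours classifies a point by majority vote among its k closest training examples. The plain-Python sketch below assumes Euclidean distance and an odd k to reduce ties. scikit-learn's `KNeighborsClassifier` is the library counterpart. The petal measurements are hypothetical values chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (petal_length, petal_width) training points
TRAIN = [
    ([1.4, 0.2], "setosa"),
    ([1.3, 0.2], "setosa"),
    ([1.5, 0.3], "setosa"),
    ([4.7, 1.4], "versicolor"),
    ([4.5, 1.5], "versicolor"),
    ([4.9, 1.5], "versicolor"),
]

print(knn_predict(TRAIN, [1.6, 0.3]))  # setosa
print(knn_predict(TRAIN, [4.6, 1.3]))  # versicolor
```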
Clustering:
Goals and uses of clustering, K-means clustering, Anomaly detection, Association rule learning.
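K-means clustering can be illustrated with Lloyd's algorithm. Each point is assigned to its nearest centroid, each centroid is recomputed as the mean of its members, and the two steps repeat until the centroids stop moving. This sketch assumes the naive choice of the first k points as initial centroids. scikit-learn's `KMeans` uses k-means++ initialization by default. The points are invented for illustration.

```python
import math


def k_means(points, k, iterations=100):
    """Lloyd's algorithm for k-means; returns (centroids, labels)."""
    centroids = [list(p) for p in points[:k]]  # naive initialisation (assumption)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid unchanged
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels


POINTS = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.0, 1.5), (9.0, 8.0)]
centroids, labels = k_means(POINTS, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```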
Artificial neural networks:
Definition, Example, Potential and constraints.
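As one small example of an artificial neural network, a single neuron with a step activation (a perceptron) can learn the logical AND function using the perceptron learning rule. This is a hypothetical teaching sketch of one neuron, not a multi-layer network. Integer weights and a learning rate of 1 are assumptions chosen so that the arithmetic stays exact.

```python
def step(z):
    """Step activation: fire (1) when the weighted input reaches the threshold 0."""
    return 1 if z >= 0 else 0


def perceptron_predict(weights, bias, x):
    """Output of a single neuron: step(w . x + b)."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, epochs=20, lr=1):
    """Perceptron learning rule: nudge weights by lr * error * input after each mistake."""
    n = len(samples[0][0])
    weights = [0] * n
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - perceptron_predict(weights, bias, x)
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every sample classified correctly in this epoch
            break
    return weights, bias


AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND)
print([perceptron_predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]
```

A single neuron like this can only separate classes with a straight line, which is one of the constraints that motivates multi-layer networks.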
Course outcomes (Course Skill Set)
At the end of the course the student will be able to:
1. Use online data sources to solve problems
2. Solve statistical problems and interpret the results
3. Use data visualization and graphical representation for decision making
4. Solve problems using artificial neural networks
Assessment Details (both CIE and SEE)
Continuous Internal Examination (CIE)
Three tests (preferably in MCQ pattern with 20 questions), each of 20 marks (duration 01 hour)
1. First test at the end of the 5th week of the semester
2. Second test at the end of the 10th week of the semester
3. Third test at the end of the 15th week of the semester
Two assignments, each of 10 marks
1. First assignment at the end of the 4th week of the semester
2. Second assignment at the end of 9th week of the semester
Quiz/Group discussion/Seminar: any two of the three, suitably planned to attain the COs and POs, for 20 marks (duration 01 hour)
The sum of the marks of the three tests, two assignments, and quiz/seminar/group discussion will be out of 100 marks and shall be scaled down to 50 marks.
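The CIE arithmetic works out as 3 × 20 + 2 × 10 + 20 = 100 marks, which are then scaled to 50. The small sketch below assumes the scaling is proportional, meaning the total is multiplied by 50/100. The function name and the sample marks are hypothetical.

```python
def cie_marks(tests, assignments, quiz_seminar):
    """Combine CIE components (out of 100) and scale proportionally to 50 (assumed rule)."""
    if len(tests) != 3 or any(not 0 <= t <= 20 for t in tests):
        raise ValueError("expected three test scores, each between 0 and 20")
    if len(assignments) != 2 or any(not 0 <= a <= 10 for a in assignments):
        raise ValueError("expected two assignment scores, each between 0 and 10")
    if not 0 <= quiz_seminar <= 20:
        raise ValueError("quiz/seminar/group discussion score must be between 0 and 20")
    total = sum(tests) + sum(assignments) + quiz_seminar  # out of 100
    return total * 50 / 100


print(cie_marks([15, 18, 17], [9, 8], 16))  # 41.5
```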
Semester End Examinations (SEE)
Suggested Learning Resources:
Books
1. Peters Morgan, Data Analysis with Python, AI Sciences, 2016.
2. Wes McKinney, Python for Data Analysis, O’Reilly Media,