21CV481 Data Cleaning and Preparation with Python Pandas syllabus for CV




Module-1 Introduction to Pandas 0 hours

Introduction to Pandas – Panel data structure, Series, DataFrame, indices, data types of columns, sorting, copying.

 

Indexing and selecting data:

Different choices for indexing, Attribute access, slicing, selection by label, selection by position, selection by callable, Boolean indexing.
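The selection styles listed above can be sketched on a small table. This is a minimal illustration, not course material: the column names and values are hypothetical and were invented for the example.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the selection styles.
df = pd.DataFrame(
    {"site": ["S1", "S2", "S3"], "load_kn": [722, 804, 838]},
    index=["a", "b", "c"],
)

by_attribute = df.load_kn                # attribute access to a column
by_label = df.loc["b", "site"]           # selection by label
by_position = df.iloc[0, 1]              # selection by position (row 0, column 1)
by_slice = df.loc["a":"b"]               # label slices include both endpoints
by_callable = df.loc[lambda d: d["load_kn"] > 800]  # selection by callable
by_boolean = df[df["load_kn"] < 800]                # Boolean indexing
```

Note that a label slice such as `df.loc["a":"b"]` includes the end label, whereas a positional slice with `iloc` excludes the end position.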

Module-2 MultiIndex and advanced indexing 0 hours

MultiIndex and advanced indexing.

Merge, join, concatenate and compare DataFrames; reshaping and pivot tables.
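As a rough sketch of the combining and reshaping topics in this module, the example below merges, concatenates, and pivots small tables. All table names, column names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical tables, invented for illustration only.
samples = pd.DataFrame({"site": ["S1", "S2", "S3"], "strength": [25.0, 30.0, 28.0]})
sites = pd.DataFrame({"site": ["S1", "S2"], "zone": ["north", "south"]})

# Left merge keeps every row of `samples`; unmatched sites get NaN for `zone`.
merged = samples.merge(sites, on="site", how="left")

# Concatenate two frames row-wise and rebuild a clean 0..n-1 index.
extra = pd.DataFrame({"site": ["S4"], "strength": [32.0]})
stacked = pd.concat([samples, extra], ignore_index=True)

# Pivot long data (one row per site and day) into a wide table.
long = pd.DataFrame(
    {
        "site": ["S1", "S1", "S2", "S2"],
        "day": [7, 28, 7, 28],
        "strength": [18.0, 25.0, 20.0, 30.0],
    }
)
wide = long.pivot_table(index="site", columns="day", values="strength")
```

In `wide`, each site becomes one row and each distinct `day` value becomes a column, which yields a two-level row index (a MultiIndex) if more than one key is passed to `index`.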

Module-3 Working with text data 0 hours

Working with text data; working with missing data.
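A minimal sketch of both topics, assuming a small table with untidy strings and gaps. The records are hypothetical and exist only to show the calls.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records with untidy text and gaps.
df = pd.DataFrame(
    {"material": ["  Steel", "concrete ", None], "grade": [415.0, np.nan, 30.0]}
)

# Vectorised string methods live under the .str accessor; missing values stay missing.
df["material"] = df["material"].str.strip().str.lower()

missing_per_column = df.isna().sum()          # count gaps in each column
filled = df.fillna({"material": "unknown"})   # fill one column, leave the other untouched
complete = df.dropna()                        # keep only rows with no gaps
```

Whether to fill or drop missing values depends on the analysis; the sketch shows both options side by side rather than recommending one.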

Module-4 Grouping 0 hours

Grouping:

Splitting an object into groups, Iterating through groups, Selecting a group, Aggregation, Transformation, Filtration.
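The six grouping operations named above can be sketched in a few lines. The data is hypothetical; the point is the difference in output shape between aggregation, transformation, and filtration.

```python
import pandas as pd

# Hypothetical measurements, invented for illustration only.
df = pd.DataFrame(
    {
        "zone": ["north", "north", "south", "south", "east"],
        "flow": [10.0, 14.0, 8.0, 12.0, 5.0],
    }
)

groups = df.groupby("zone")                   # splitting an object into groups

for name, frame in groups:                    # iterating through groups
    print(name, len(frame))

north = groups.get_group("north")             # selecting a group
mean_flow = groups["flow"].mean()             # aggregation: one value per group
# Transformation: result has the same length and index as the original frame.
centered = df["flow"] - groups["flow"].transform("mean")
# Filtration: keep only the rows belonging to groups with more than one row.
multi_row = groups.filter(lambda g: len(g) > 1)
```

Aggregation shrinks the data to one row per group, transformation preserves the original shape, and filtration returns a subset of the original rows.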

Module-5 Time series / date functionality 0 hours

Time series / date functionality.

Time deltas, plotting, handling large datasets.
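A short sketch of date ranges, time deltas, and one common approach to large files (reading in chunks). The series values and the in-memory CSV are hypothetical stand-ins; plotting is left out to keep the example free of extra dependencies.

```python
import io

import pandas as pd

# Hypothetical daily readings over six days.
idx = pd.date_range("2023-01-01", periods=6, freq="D")
levels = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

span = idx[-1] - idx[0]                        # subtracting timestamps gives a Timedelta
later = idx[0] + pd.Timedelta(hours=36)        # adding a Timedelta shifts a timestamp
three_day_mean = levels.resample("3D").mean()  # downsample to 3-day bins

# Chunked reading: process a large CSV piece by piece instead of loading it whole.
# An in-memory string stands in for a large file here.
csv_text = "x\n" + "\n".join(str(i) for i in range(10))
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["x"].sum()
```

With `chunksize` set, `read_csv` yields DataFrames of at most that many rows, so only one chunk needs to be held in memory at a time.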

 

Course outcome (Course Skill Set)

At the end of the course the student will be able to:

1. Perform operations on data structures and carry out data manipulation

2. Develop solutions using matrix methods

3. Manage and maintain large databases

 

Assessment Details (both CIE and SEE)

  • The weightage of Continuous Internal Evaluation (CIE) is 50% and for Semester End Exam (SEE) is 50%.
  • The minimum passing mark for the CIE is 40% of the maximum marks (20 marks out of 50).
  • A student shall be deemed to have satisfied the academic requirements and earned the credits allotted to each subject/course if the student secures not less than 35% (18 marks out of 50) in the Semester End Examination (SEE), and a minimum of 40% (40 marks out of 100) in the sum total of the CIE (Continuous Internal Evaluation) and SEE (Semester End Examination) taken together.

Continuous Internal Evaluation (CIE)

Three Tests (preferably in MCQ pattern with 20 questions) each of 20 Marks (duration 01 hour)

1. First test at the end of the 5th week of the semester

2. Second test at the end of the 10th week of the semester

3. Third test at the end of the 15th week of the semester

 

Two assignments each of 10 Marks

1. First assignment at the end of the 4th week of the semester

2. Second assignment at the end of the 9th week of the semester

 

Quiz/Group discussion/Seminar, any two of three suitably planned to attain the COs and POs for 20 Marks (duration 01 hour)

The sum of the total marks of the three tests, two assignments, and quiz/seminar/group discussion will be out of 100 marks and shall be scaled down to 50 marks.

 

Semester End Examinations (SEE)

  • The SEE paper shall be set for 50 questions, each of 01 mark.
  • The pattern of the question paper is MCQ (multiple choice questions). The time allotted for SEE is 01 hour.
  • The student has to secure a minimum of 35% of the maximum marks meant for SEE.

 

Suggested Learning Resources:

Books

1. Pandas documentation at https://pandas.pydata.org/pandas-docs/stable/

2. Wes McKinney, Python for Data Analysis, 2nd ed., O’Reilly Media, 2017.

3. Matt Harrison, Learning the Pandas Library, 2016.

Last Updated: Tuesday, January 24, 2023