Students Performance project

This week, we’ll use the following dataset:


!wget https://github.com/alexeygrigorev/datasets/raw/refs/heads/master/jamb_exam_results.csv

Let’s open the dataset with pandas:

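import pandas as pd

# Load the file downloaded above (/content is the default working directory in Google Colab)
df_raw = pd.read_csv("/content/jamb_exam_results.csv")
df_raw.head(5)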
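If you want to try read_csv without downloading the file, here is a minimal sketch that parses a few rows from an in-memory string. It keeps only five of the columns, copies two rows from the head() output, and assumes (we have not checked the raw file) that the missing parent education level is stored as an empty field:

```python
import io

import pandas as pd

# Two rows copied from the head() output, restricted to five columns.
# ASSUMPTION: a missing value is an empty field in the raw CSV.
sample_csv = """JAMB_Score,Study_Hours_Per_Week,Attendance_Rate,Parent_Education_Level,Student_ID
192,22,78,Tertiary,1
207,14,88,,2
"""

# read_csv accepts any file-like object, so StringIO stands in for the file;
# by default an empty field is parsed as NaN
sample = pd.read_csv(io.StringIO(sample_csv))
print(sample)
print(sample.dtypes)
```

The numeric columns come back as int64, and the text column as object with one NaN.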
JAMB_Score Study_Hours_Per_Week Attendance_Rate Teacher_Quality Distance_To_School School_Type School_Location Extra_Tutorials Access_To_Learning_Materials Parent_Involvement IT_Knowledge Student_ID Age Gender Socioeconomic_Status Parent_Education_Level Assignments_Completed
0 192 22 78 4 12.4 Public Urban Yes Yes High Medium 1 17 Male Low Tertiary 2
1 207 14 88 4 2.7 Public Rural No Yes High High 2 15 Male High NaN 1
2 182 29 87 2 9.6 Public Rural Yes Yes High Medium 3 20 Female High Tertiary 2
3 210 29 99 2 2.6 Public Urban No Yes Medium High 4 22 Female Medium Tertiary 1
4 199 12 98 3 8.8 Public Urban No Yes Medium Medium 5 22 Female Medium Tertiary 1

Preparing data

In previous chapters, we learned several data preparation techniques; we’ll use them again in this project.

Let’s make the column names lowercase

Lowercase names are easier to type and keep consistent when we refer to the features:

df_raw = df_raw.rename(str.lower, axis='columns')
df_raw.columns
#Index(['jamb_score', 'study_hours_per_week', 'attendance_rate',
#       'teacher_quality', 'distance_to_school', 'school_type',
#       'school_location', 'extra_tutorials', 'access_to_learning_materials',
#       'parent_involvement', 'it_knowledge', 'student_id', 'age', 'gender',
#       'socioeconomic_status', 'parent_education_level',
#       'assignments_completed'],
#      dtype='object')
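The same result can also be reached through the string accessor on the column index. The sketch below compares both approaches on an empty frame that has three of the dataset’s original column names (the frame itself is only illustrative):

```python
import pandas as pd

# Illustrative empty frame with three of the original column names
frame = pd.DataFrame(columns=["JAMB_Score", "Study_Hours_Per_Week", "Student_ID"])

# Option 1 (used above): rename() applies str.lower to every column label
renamed = frame.rename(str.lower, axis="columns")

# Option 2: vectorized string method on the columns Index, assigned in place
frame.columns = frame.columns.str.lower()

print(list(renamed.columns))  # ['jamb_score', 'study_hours_per_week', 'student_id']
print(list(frame.columns))    # ['jamb_score', 'study_hours_per_week', 'student_id']
```

rename() returns a new DataFrame, while assigning to .columns modifies the existing one. Which to use is mostly a matter of taste.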

Removing irrelevant columns

The student_id column is only an identifier and says nothing about a student’s performance, so we remove it:

df = df_raw.drop(["student_id"], axis=1)
df.head(5)
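drop() does not modify df_raw; it returns a new DataFrame. The sketch below shows this on a small illustrative frame (the values come from the head() output) and uses the columns= keyword, which is equivalent to passing a list with axis=1:

```python
import pandas as pd

# Illustrative two-row frame using a few of the lowercase column names
sample_raw = pd.DataFrame({
    "jamb_score": [192, 207],
    "study_hours_per_week": [22, 14],
    "student_id": [1, 2],
})

# Same as sample_raw.drop(["student_id"], axis=1)
sample = sample_raw.drop(columns=["student_id"])

print(list(sample.columns))      # ['jamb_score', 'study_hours_per_week']
print(list(sample_raw.columns))  # ['jamb_score', 'study_hours_per_week', 'student_id']
```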
jamb_score study_hours_per_week attendance_rate teacher_quality distance_to_school school_type school_location extra_tutorials access_to_learning_materials parent_involvement it_knowledge age gender socioeconomic_status parent_education_level assignments_completed
0 192 22 78 4 12.4 Public Urban Yes Yes High Medium 17 Male Low Tertiary 2
1 207 14 88 4 2.7 Public Rural No Yes High High 15 Male High NaN 1
2 182 29 87 2 9.6 Public Rural Yes Yes High Medium 20 Female High Tertiary 2
3 210 29 99 2 2.6 Public Urban No Yes Medium High 22 Female Medium Tertiary 1
4 199 12 98 3 8.8 Public Urban No Yes Medium Medium 22 Female Medium Tertiary 1

Looking for missing values and filling them with zeros

Our models can’t handle missing values, so first we count them in each column:

df.isna().sum()
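Here is a minimal sketch of both steps, counting missing values and then filling them with zeros. It runs on an illustrative two-row frame that copies the NaN in parent_education_level from the head() output; it is not the full dataset:

```python
import numpy as np
import pandas as pd

# Illustrative rows mirroring the head() output; row 1 has no parent_education_level
sample = pd.DataFrame({
    "jamb_score": [192, 207],
    "gender": ["Male", "Male"],
    "parent_education_level": ["Tertiary", np.nan],
})

# Count missing values per column
print(sample.isna().sum())

# Replace every missing value with 0; fillna returns a new DataFrame
sample_filled = sample.fillna(0)
print(sample_filled.isna().sum().sum())  # 0
```

parent_education_level is a text column, so after filling it holds a mix of strings and the number 0. If the categorical columns are one-hot encoded later, 0 simply becomes its own "missing" category. A descriptive label such as "unknown" would be an alternative choice.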