Student Performance project

This week, we’ll use the following dataset:

!wget https://github.com/alexeygrigorev/datasets/raw/refs/heads/master/jamb_exam_results.csv

Let’s load the dataset with the pandas package:

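import pandas as pd

# /content is the default working directory in Google Colab, where !wget saved the file
df_raw = pd.read_csv("/content/jamb_exam_results.csv")
df_raw.head(5)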

	JAMB_Score	Study_Hours_Per_Week	Attendance_Rate	Teacher_Quality	Distance_To_School	School_Type	School_Location	Extra_Tutorials	Access_To_Learning_Materials	Parent_Involvement	IT_Knowledge	Student_ID	Age	Gender	Socioeconomic_Status	Parent_Education_Level	Assignments_Completed
0	192	22	78	4	12.4	Public	Urban	Yes	Yes	High	Medium	1	17	Male	Low	Tertiary	2
1	207	14	88	4	2.7	Public	Rural	No	Yes	High	High	2	15	Male	High	NaN	1
2	182	29	87	2	9.6	Public	Rural	Yes	Yes	High	Medium	3	20	Female	High	Tertiary	2
3	210	29	99	2	2.6	Public	Urban	No	Yes	Medium	High	4	22	Female	Medium	Tertiary	1
4	199	12	98	3	8.8	Public	Urban	No	Yes	Medium	Medium	5	22	Female	Medium	Tertiary	1
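To see how `pd.read_csv` turns raw CSV text into the frame above, here is a minimal sketch that parses a small hypothetical excerpt built from four columns of the first three rows displayed above, held in memory with `io.StringIO` instead of the downloaded file. It assumes the missing `Parent_Education_Level` value is stored as an empty field in the raw file, which pandas reads as `NaN`:

```python
import io

import pandas as pd

# Hypothetical CSV excerpt: four columns from the first three rows shown above.
# ASSUMPTION: the missing Parent_Education_Level in row 1 is an empty field in the raw file.
csv_text = """JAMB_Score,Study_Hours_Per_Week,School_Type,Parent_Education_Level
192,22,Public,Tertiary
207,14,Public,
182,29,Public,Tertiary
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)  # (3, 4)
print(sample["JAMB_Score"].dtype)  # int64: whole-number columns are parsed as integers
print(sample["Parent_Education_Level"].isna().tolist())  # [False, True, False]
```

The empty field is what shows up as `NaN` in the second row of `head(5)`.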

Preparing data

In previous chapters, we learned several data-preparation techniques, which we’ll also apply in this project.

Let’s make the column names lowercase

Lowercase names are easier to type and keep the feature names consistent:

df_raw = df_raw.rename(str.lower, axis='columns')
df_raw.columns
#Index(['jamb_score', 'study_hours_per_week', 'attendance_rate',
#       'teacher_quality', 'distance_to_school', 'school_type',
#       'school_location', 'extra_tutorials', 'access_to_learning_materials',
#       'parent_involvement', 'it_knowledge', 'student_id', 'age', 'gender',
#       'socioeconomic_status', 'parent_education_level',
#       'assignments_completed'],
#      dtype='object')
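An equivalent way to lowercase the headers is the vectorized `.str.lower()` method on the column index. The sketch below uses a tiny hypothetical DataFrame with two of the original column names, so it runs without the dataset:

```python
import pandas as pd

# Tiny hypothetical frame reusing two of the original column names.
demo = pd.DataFrame({"JAMB_Score": [192, 207], "Student_ID": [1, 2]})

# Option 1 (used above): rename() applies str.lower to every column label
# and returns a new DataFrame.
renamed = demo.rename(str.lower, axis="columns")

# Option 2: vectorized string method on the column Index, assigned in place.
demo.columns = demo.columns.str.lower()

print(list(renamed.columns))  # ['jamb_score', 'student_id']
print(list(demo.columns))     # ['jamb_score', 'student_id']
```

Both give the same labels; `rename` leaves the original object untouched, while assigning to `.columns` modifies it directly.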

Removing irrelevant columns

The student_id column is only a row identifier and tells us nothing about a student’s performance, so we remove it:

df = df_raw.drop(["student_id"], axis=1)
df.head(5)

	jamb_score	study_hours_per_week	attendance_rate	teacher_quality	distance_to_school	school_type	school_location	extra_tutorials	access_to_learning_materials	parent_involvement	it_knowledge	age	gender	socioeconomic_status	parent_education_level	assignments_completed
0	192	22	78	4	12.4	Public	Urban	Yes	Yes	High	Medium	17	Male	Low	Tertiary	2
1	207	14	88	4	2.7	Public	Rural	No	Yes	High	High	15	Male	High	NaN	1
2	182	29	87	2	9.6	Public	Rural	Yes	Yes	High	Medium	20	Female	High	Tertiary	2
3	210	29	99	2	2.6	Public	Urban	No	Yes	Medium	High	22	Female	Medium	Tertiary	1
4	199	12	98	3	8.8	Public	Urban	No	Yes	Medium	Medium	22	Female	Medium	Tertiary	1
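`drop` returns a new DataFrame and leaves `df_raw` unchanged, which is why the result is assigned to `df`. A minimal sketch on a hypothetical two-row frame, showing the `columns=` keyword (equivalent to listing the labels with `axis=1`) and confirming that the original frame keeps its column:

```python
import pandas as pd

# Hypothetical two-row frame that still contains the identifier column.
frame = pd.DataFrame({"jamb_score": [192, 207], "student_id": [1, 2], "age": [17, 15]})

# columns=[...] is equivalent to drop([...], axis=1).
trimmed = frame.drop(columns=["student_id"])

print(list(trimmed.columns))          # ['jamb_score', 'age']
print("student_id" in frame.columns)  # True: the original frame is not modified
```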

Looking for missing values and filling them with zeros

Our models can’t handle missing values, so first we count them in each column:

df.isna().sum()
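The second half of this step, filling the gaps with zeros, can be done with `fillna(0)`. Here is a minimal sketch on a hypothetical frame that copies `jamb_score` and `parent_education_level` from the five rows shown earlier, where the second row is `NaN`. The text column is created with `dtype="object"` so the example behaves the same across pandas versions:

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt: values from the first five rows displayed earlier.
sample = pd.DataFrame({
    "jamb_score": [192, 207, 182, 210, 199],
    "parent_education_level": pd.Series(
        ["Tertiary", np.nan, "Tertiary", "Tertiary", "Tertiary"], dtype="object"
    ),
})

# Count missing values per column: 0 for jamb_score, 1 for parent_education_level.
missing = sample.isna().sum()
print(missing)

# Replace every missing value with 0; fillna returns a new DataFrame.
filled = sample.fillna(0)

print(filled.isna().sum().sum())  # 0
print(filled["parent_education_level"].tolist())  # ['Tertiary', 0, 'Tertiary', 'Tertiary', 'Tertiary']
```

Note that filling a text column with `0` leaves a number among strings in that column; a string placeholder such as `"Unknown"` would keep it uniformly textual, if that suits the model better.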