Note for Masters in Learning Analytics students: if you have not taken a Python course, this activity can be used to demonstrate your proficiency in Python.
Note for 6191 students: this activity is not required or graded, but may be useful if you took or last used Python a while ago, or have relatively little Python experience.
Note for LASER BEAM students: this activity is a required module activity.

For this assignment, you will run and modify Python code using Google Colab, which can be accessed via the link below.

Google Colab Starter Code

Each question in this assignment refers to sections of output and results from this code. Some analyses will require you to modify or add to the code, so please read and follow all instructions for each question and throughout the provided Colab code. The questions here are meant to help you check your work and ensure that the code and analyses are running properly.

This assignment is meant to introduce you to the coding environment and to explore data that was collected in a digital learning platform. While you are not asked to write or modify large portions of code yourself in this assignment, please note that you may need to do so in future assignments. Be sure to open the starter code and download any required data before continuing.
The starter code can be accessed via the following link: Google Colab Starter Code

Data Descriptives

Run the "Data Descriptives" code cells to load the dataset. To make sure that everything loaded successfully: what is the first "student_id" in the dataset?
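As a rough illustration of this kind of check, the sketch below loads a tiny made-up table with pandas and reads the first "student_id". Everything except the "student_id" column name is an assumption for illustration: the values, the other column names, and the use of an in-memory CSV are hypothetical, and the starter code may load the real data differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for the course dataset. Only the "student_id"
# column name comes from the assignment; all values are invented.
csv_text = """student_id,num_logins,minutes_active,completed
S001,4,35.5,1
S002,9,120.0,1
S003,1,5.25,0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Quick sanity checks that the load worked: size, first rows, first ID.
print(df.shape)
print(df.head())

first_student_id = df["student_id"].iloc[0]
print(first_student_id)
```

In the Colab notebook, the same `.head()` and `.iloc[0]` checks would be applied to whatever DataFrame the "Data Descriptives" cells create.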
Descriptive Statistics

Run the "Descriptive Statistics" code cells (note that these require you to make some small additions to the code). Among all the features, two stand out as having particularly large standard deviations and maximum values. What is the largest Max value (rounded to two decimal places)?
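One possible way to produce and scan such a summary is sketched below with pandas `describe()`. The feature names and values are hypothetical and are not taken from the course dataset; the additions required in the starter code may look different.

```python
import pandas as pd

# Hypothetical features; names and values are invented for illustration.
df = pd.DataFrame({
    "num_logins": [4, 9, 1, 6],
    "minutes_active": [35.5, 120.0, 5.25, 980.75],
    "pages_viewed": [12, 30, 3, 18],
})

# describe() reports count, mean, std, min, quartiles, and max per column.
# Transposing gives one row per feature, which is easier to sort and scan.
summary = df.describe().T

# Features ordered by standard deviation, largest first.
by_spread = summary.sort_values("std", ascending=False)
print(by_spread[["std", "max"]])

# Largest maximum across all features, rounded to two decimal places.
largest_max = round(summary["max"].max(), 2)
feature_with_largest_max = summary["max"].idxmax()
print(feature_with_largest_max, largest_max)
```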
Feature Correlations

Run the "Feature Correlations" code cells and examine the resulting correlation matrix. Using the default threshold of 0.6, how many feature pairs emerge as being highly correlated?
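The idea of counting highly correlated pairs can be sketched as follows. This is a minimal example on invented data. It assumes that "highly correlated" means an absolute Pearson correlation strictly above the threshold; the starter code may define it differently (for example, using `>=` or signed values).

```python
from itertools import combinations

import pandas as pd

# Hypothetical features; names and values are invented for illustration.
df = pd.DataFrame({
    "num_logins": [1, 2, 3, 4, 5, 6],
    "pages_viewed": [2, 4, 6, 8, 10, 13],  # rises almost in step with num_logins
    "quiz_attempts": [3, 1, 3, 1, 3, 1],
    "forum_posts": [3, 1, 2, 2, 1, 3],
})

threshold = 0.6  # default threshold named in the question

# Pearson correlation between every pair of columns.
corr = df.corr()

# Visit each unordered pair once and keep those above the threshold.
# Assumption: the comparison uses the absolute value of the correlation.
high_pairs = [
    (a, b, corr.loc[a, b])
    for a, b in combinations(corr.columns, 2)
    if abs(corr.loc[a, b]) > threshold
]

for a, b, r in high_pairs:
    print(f"{a} ~ {b}: r = {r:.3f}")
print(len(high_pairs), "highly correlated pair(s)")
```

Using `combinations` avoids counting each pair twice (once above and once below the diagonal) and skips the trivial correlation of each feature with itself.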
Feature Selection and Logistic Regression

First, modify the "Feature Selection" code cell to remove one feature from each of the highly correlated pairs identified in the previous analysis. Then, run the "Logistic Regression" code cell and examine the output. If you are unfamiliar with the performance metrics reported, note that these and others will be introduced in a later module (for each of these metrics, higher scores indicate better performance). Once you have examined the logistic regression output, continue to the next question.
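The overall workflow (drop one feature from a correlated pair, then fit a logistic regression) might look roughly like the sketch below. It uses scikit-learn on synthetic data. The feature names, the "completed" label, the train/test split, and the choice of accuracy and AUC as metrics are all assumptions for illustration; the starter code may use different features, settings, and metrics.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical data: "pages_viewed" is nearly a copy of "num_logins".
rng = np.random.default_rng(0)
n = 200
num_logins = rng.normal(size=n)
df = pd.DataFrame({
    "num_logins": num_logins,
    "pages_viewed": 2 * num_logins + rng.normal(scale=0.1, size=n),
    "forum_posts": rng.normal(size=n),
})
# Hypothetical binary outcome driven by two of the features plus noise.
signal = num_logins + 0.5 * df["forum_posts"] + rng.normal(scale=0.5, size=n)
df["completed"] = (signal > 0).astype(int)

# Feature selection: drop one member of each highly correlated pair.
features_to_drop = ["pages_viewed"]
excluded = set(features_to_drop) | {"completed"}
selected_features = [c for c in df.columns if c not in excluded]

X_train, X_test, y_train, y_test = train_test_split(
    df[selected_features],
    df["completed"],
    test_size=0.25,
    random_state=0,
    stratify=df["completed"],
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Two common metrics; for both, higher is better.
accuracy = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"accuracy = {accuracy:.3f}, AUC = {auc:.3f}")
```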
Logistic Regression

Now that you have tried this yourself, try running the code again by modifying (and re-running) the feature selection and logistic regression code cells to utilize the following selected feature set:
Which of these features is found to be most predictive (i.e., has the highest-magnitude coefficient) of students' assignment completion?
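To illustrate what "highest-magnitude coefficient" means, the sketch below ranks a set of made-up coefficients by absolute value. The feature names and numbers are hypothetical and are not results from the assignment; in the notebook, the coefficients would come from the fitted model instead.

```python
import pandas as pd

# Hypothetical coefficients, invented for illustration only.
coefficients = pd.Series({
    "num_logins": 0.42,
    "forum_posts": -1.10,
    "minutes_active": 0.05,
})

# Magnitude ignores the sign: a large negative coefficient counts as
# strongly predictive, just in the opposite direction.
order = coefficients.abs().sort_values(ascending=False).index
ranked = coefficients.reindex(order)
print(ranked)

most_predictive = coefficients.abs().idxmax()
print(most_predictive)
```

Note that comparing coefficient magnitudes is most meaningful when the features are on similar scales (for example, after standardization).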
Going Deeper

Try re-introducing the features that you removed earlier. What would the performance results and coefficients look like if you had not removed the highly correlated features? Which feature set resulted in the highest model performance? Did the direction (positive or negative) and magnitude of the coefficients align with your expectations?

You do not need to answer these questions here, but do take a moment to consider them before proceeding to complete the assignment. It is important not only to observe the results of an analysis, but also to consider what the results reveal about the subject matter.