Prediction System for Traffic Collisions in Toronto

[Code] [Dataset] [PPT]

This project builds a predictive model on the Killed or Seriously Injured (KSI) collision dataset published by the Toronto Police Service. The goal is to predict, from an individual's own attributes, whether that person's outcome in a collision is fatal or non-fatal.

The main challenge lies in the dataset itself: the fatal/non-fatal label describes the accident as a whole, not the individual person involved.


A traffic accident may involve multiple individuals, possibly in different vehicles. How, then, does the dataset decide whether an accident is fatal? Typically, an accident is classified as fatal if at least one person dies in it. Because we want a model that predicts each individual's own outcome, we need to exclude the individuals who are labeled fatal only because the accident was fatal, even though they themselves did not die.
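A minimal sketch of this filtering step, assuming one row per person, an accident-level class column (`ACCLASS`, assumed) and a person-level injury column (`INJURY`, assumed). The column names and values are our reading of the public KSI schema and should be checked against the actual file; `drop_mislabeled_fatal` and `person_label` are hypothetical helpers. It is written in plain Python; with pandas the same filter would be a boolean mask over the two columns.

```python
from typing import Dict, List

Row = Dict[str, str]


def drop_mislabeled_fatal(rows: List[Row]) -> List[Row]:
    """Keep only rows whose label is trustworthy at the person level.

    Assumed columns (verify against the real KSI file):
      ACCLASS: accident-level class, e.g. "Fatal" or "Non-Fatal Injury"
      INJURY:  person-level injury, e.g. "Fatal", "Major", "Minor", "None"
    A row is dropped when the accident is fatal but this person did not die.
    """
    kept = []
    for row in rows:
        accident_fatal = row["ACCLASS"] == "Fatal"
        person_fatal = row["INJURY"] == "Fatal"
        if accident_fatal and not person_fatal:
            continue  # labeled fatal only because someone else died
        kept.append(row)
    return kept


def person_label(row: Row) -> int:
    """After filtering, the accident class agrees with the person's outcome.

    Returns 1 for fatal, 0 for non-fatal.
    """
    return 1 if row["ACCLASS"] == "Fatal" else 0


if __name__ == "__main__":
    sample = [
        {"ACCLASS": "Fatal", "INJURY": "Fatal"},
        {"ACCLASS": "Fatal", "INJURY": "None"},  # survivor of a fatal crash
        {"ACCLASS": "Non-Fatal Injury", "INJURY": "Major"},
    ]
    cleaned = drop_mislabeled_fatal(sample)
    print([person_label(r) for r in cleaned])  # [1, 0]
```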

Binning several categories into one
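The write-up does not list which categories were merged, so this is only a sketch of the general idea under assumed inputs: an explicit mapping folds related labels into one bin, and any label seen fewer than `min_count` times falls into "Other". The road-surface values and the `bin_categories` helper are illustrative, not taken from the project.

```python
from collections import Counter
from typing import Dict, List, Optional


def bin_categories(
    values: List[str],
    mapping: Optional[Dict[str, str]] = None,
    min_count: int = 1,
    other: str = "Other",
) -> List[str]:
    """Collapse many category labels into fewer bins.

    1. Apply an explicit mapping (e.g. several snow/ice labels -> "Winter").
    2. Any resulting label seen fewer than `min_count` times becomes `other`.
    """
    mapping = mapping or {}
    mapped = [mapping.get(v, v) for v in values]
    counts = Counter(mapped)
    return [v if counts[v] >= min_count else other for v in mapped]


# Hypothetical road-surface labels, for illustration only.
surface = ["Dry", "Wet", "Loose Snow", "Packed Snow", "Ice", "Dry", "Spilled liquid"]
winter = {"Loose Snow": "Winter", "Packed Snow": "Winter", "Ice": "Winter"}
print(bin_categories(surface, winter, min_count=2))
# ['Dry', 'Other', 'Winter', 'Winter', 'Winter', 'Dry', 'Other']
```

Merging sparse categories this way keeps one-hot encodings small and avoids levels that appear too rarely for a model to learn from.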
Feature selection
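The write-up does not say which selection method was used. As one common filter method (an assumption, not necessarily the project's choice), this sketch scores each categorical feature with a Pearson chi-square statistic against the fatal/non-fatal label and keeps the top `k`. The helper names and toy columns are hypothetical; scikit-learn's `SelectKBest` with `chi2` is a library route to a similar filter.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def chi2_statistic(feature: Sequence[str], labels: Sequence[int]) -> float:
    """Pearson chi-square statistic of a categorical feature vs. the class label.

    Larger values mean the feature's categories are spread more unevenly
    across the classes, i.e. the feature says more about the label.
    """
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_tot = Counter(feature)
    y_tot = Counter(labels)
    stat = 0.0
    for f, f_count in f_tot.items():
        for y, y_count in y_tot.items():
            expected = f_count * y_count / n
            observed = joint.get((f, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_features(
    columns: Dict[str, List[str]], labels: List[int], k: int
) -> List[Tuple[str, float]]:
    """Return the k (name, score) pairs with the highest chi-square scores."""
    scores = [(name, chi2_statistic(vals, labels)) for name, vals in columns.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Toy data: LIGHT separates the classes perfectly, NOISE carries no signal.
cols = {
    "LIGHT": ["Dark", "Dark", "Daylight", "Daylight"],
    "NOISE": ["a", "b", "a", "b"],
}
print(rank_features(cols, [1, 1, 0, 0], k=1))  # [('LIGHT', 4.0)]
```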
Dealing with imbalanced data
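The write-up does not specify how the imbalance was handled. This sketch shows plain random oversampling of the minority (fatal) class, one standard option; SMOTE and class weights are common alternatives. `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(
    samples: Sequence[T], labels: Sequence[int], seed: int = 0
) -> Tuple[List[T], List[int]]:
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest one.

    Apply this to the training split only, so that copies of the same row
    never end up in both the training and the test data.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


x = ["p1", "p2", "p3", "p4", "p5"]
y = [0, 0, 0, 0, 1]  # one fatal case among five people
x_bal, y_bal = random_oversample(x, y)
print(Counter(y_bal))  # 4 rows of each class
```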
Comparing five different models
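The write-up does not name the five models, so this sketch only shows a fair comparison harness: every candidate is fit on the same stratified training split and scored on the same test split with the same metric. The fit-returns-predict interface and the two toy models (`majority_baseline`, `threshold_rule`) are hypothetical stand-ins for the real classifiers.

```python
import random
from typing import Callable, Dict, List

# A "model" is a fit function that returns a predict function (hypothetical interface).
Model = Callable[[List[list], List[int]], Callable[[list], int]]


def stratified_split(X, y, test_frac=0.3, seed=0):
    """Split per class so train and test keep a similar fatal/non-fatal ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids, seq):
        return [seq[i] for i in ids]

    return pick(train_idx, X), pick(train_idx, y), pick(test_idx, X), pick(test_idx, y)


def compare_models(models: Dict[str, Model], X, y, metric) -> Dict[str, float]:
    """Fit each model on the same training split and score it on the same test split."""
    X_tr, y_tr, X_te, y_te = stratified_split(X, y)
    results = {}
    for name, fit in models.items():
        predict = fit(X_tr, y_tr)
        results[name] = metric(y_te, [predict(row) for row in X_te])
    return results


# --- Hypothetical stand-ins for the real classifiers ---
def majority_baseline(X, y):
    majority = max(set(y), key=y.count)
    return lambda row: majority


def threshold_rule(X, y):
    # Predict fatal when feature 0 exceeds its training mean.
    mean = sum(row[0] for row in X) / len(X)
    return lambda row: int(row[0] > mean)


def fatal_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    return tp / positives if positives else 0.0


X = [[i] for i in range(20)]
y = [int(i >= 15) for i in range(20)]
models = {"majority": majority_baseline, "threshold": threshold_rule}
print(compare_models(models, X, y, fatal_recall))  # {'majority': 0.0, 'threshold': 1.0}
```

The majority baseline is worth keeping in any such comparison: on imbalanced data it can score high accuracy while never predicting a fatal outcome.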
Confusion matrix
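For reference, a confusion matrix for this binary task counts the four combinations of true and predicted class. This sketch treats fatal as the positive class (an assumed orientation) and derives precision, recall and accuracy from the counts; with fatal cases rare, recall on the fatal class is usually more telling than accuracy. `confusion_matrix` and `summary` are hypothetical helper names.

```python
from typing import Dict, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Counts for a binary task where 1 = fatal (positive) and 0 = non-fatal."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def summary(cm: Dict[str, int]) -> Dict[str, float]:
    """Precision, recall (sensitivity) and accuracy from the four counts."""

    def ratio(a: int, b: int) -> float:
        return a / b if b else 0.0

    total = sum(cm.values())
    return {
        "precision": ratio(cm["TP"], cm["TP"] + cm["FP"]),
        "recall": ratio(cm["TP"], cm["TP"] + cm["FN"]),
        "accuracy": ratio(cm["TP"] + cm["TN"], total),
    }


cm = confusion_matrix([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
print(cm)  # {'TP': 1, 'FN': 1, 'FP': 1, 'TN': 3}
print(summary(cm))
```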
ROC curve
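An ROC curve plots the true-positive rate against the false-positive rate as the decision threshold on a model's predicted fatality score is lowered; the area under it (AUC) summarizes ranking quality in one number. This sketch computes both from scratch, assuming the model outputs one score per person and that both classes are present. `roc_points` and `auc` are hypothetical names; scikit-learn's `roc_curve` and `roc_auc_score` are the usual library equivalents.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold from the highest score down.

    Assumes y_true contains at least one positive and one negative.
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


pts = roc_points([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(auc(pts))  # 0.75
```

The 0.75 matches the pairwise reading of AUC: of the four (fatal, non-fatal) pairs, three have the fatal case scored higher.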
Results of the model comparison