In this lab exercise, you will learn about Decision Tree, a popular machine learning algorithm. You will use this classification algorithm to build a model from historical data on regions and their total cases. You will then use the trained decision tree to predict the Risk Index of a region.
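To give a feel for what a decision tree does before you open the notebook, here is a minimal sketch in plain Python. It is not the notebook's code: the rows, the Low/Medium/High labels, and the helper names (gini, best_threshold, build_tree, predict) are all made up for illustration, and it splits on a single total-cases column using Gini impurity, which is one common splitting criterion.

```python
from collections import Counter

# Toy, made-up rows: (total_cases, risk_index). This is NOT the lab's dataset.
ROWS = [
    (120, "Low"), (300, "Low"), (450, "Low"),
    (900, "Medium"), (1500, "Medium"), (2100, "Medium"),
    (5200, "High"), (8000, "High"), (9700, "High"),
]

def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_threshold(rows):
    """Return the total_cases threshold with the lowest weighted Gini, or None."""
    values = sorted({cases for cases, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # candidate split halfway between neighbours
        left = [label for cases, label in rows if cases <= threshold]
        right = [label for cases, label in rows if cases > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, threshold)
    return None if best is None else best[1]

def build_tree(rows):
    """Recursively split until a node is pure; leaves are plain label strings."""
    labels = [label for _, label in rows]
    threshold = best_threshold(rows)
    if len(set(labels)) == 1 or threshold is None:
        return Counter(labels).most_common(1)[0][0]
    left = [r for r in rows if r[0] <= threshold]
    right = [r for r in rows if r[0] > threshold]
    return {"threshold": threshold, "left": build_tree(left), "right": build_tree(right)}

def predict(tree, total_cases):
    """Walk from the root to a leaf and return the label found there."""
    while isinstance(tree, dict):
        tree = tree["left"] if total_cases <= tree["threshold"] else tree["right"]
    return tree

tree = build_tree(ROWS)
print(tree)
print(predict(tree, 200), predict(tree, 1800), predict(tree, 7000))  # Low Medium High
```

The printed dictionary is the learned tree: each inner node holds a threshold on total cases, and each leaf holds a risk label. The notebook in this lab most likely relies on a machine learning library rather than hand-written code like this, so treat the sketch only as a picture of the idea.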
In Cloud Pak for Data, click the Assets tab at the top. Under Asset types, expand Source code and select Notebook.
You will see the three notebooks listed. This exercise uses the Region-All-Decision-Tree.ipynb notebook.
Click the three-dot menu and select Edit to get started.
The notebook should look similar to the one shown below.
Before running the notebook, you need to add the S3 connection to the notebook.
Verify that the dataframe name in the generated code snippet is data_df_1.
Click on Cell and select Run All to run the notebook.
The run takes some time to complete, so please be patient.
Once the notebook has finished running, you can observe the following in it:
Decision Tree Model Accuracy: The accuracy of the model is 86.63%.
Decision Tree Visualization: The trained decision tree is displayed in the notebook.
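If you are wondering how an accuracy figure like the one above is usually computed, it is the share of test rows whose predicted label matches the actual label. The sketch below shows the arithmetic in plain Python. The accuracy helper and both label lists are made up for illustration; they are not taken from the notebook, which may compute the metric through a library call instead.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Made-up labels for illustration only; these are NOT the lab's results.
actual = ["Low", "Low", "Medium", "High", "High", "Medium", "Low", "High"]
predicted = ["Low", "Medium", "Medium", "High", "High", "Medium", "Low", "High"]

score = accuracy(actual, predicted)
print(f"Decision Tree Model Accuracy: {score:.2%}")  # 7 of 8 correct -> 87.50%
```

An accuracy of 86.63% therefore means that roughly 87 out of every 100 held-out regions received the correct Risk Index from the model.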
You have successfully completed this lab exercise. You can proceed to the next step.