The area under the receiver operating characteristic curve (AUROC) is an indicator of the performance of a binary classification model. The receiver operating characteristic (ROC) curve plots the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold varies.
For more information on TPR and FPR, you can check here. Brief descriptions of these two statistics, which are derived from the confusion matrix, are as follows.
- True Positive Rate (TPR): the probability that the model predicts TRUE for a case that is actually TRUE, i.e. TP / (TP + FN). It is also known as recall, sensitivity, or hit rate.
- False Positive Rate (FPR): 1 − Specificity, where specificity is the probability that the model predicts FALSE for a case that is actually FALSE. Equivalently, FPR = FP / (FP + TN). It is also called the false alarm rate.
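To make the two definitions concrete, here is a minimal sketch in plain Python that counts the four confusion-matrix cells and derives TPR, FPR, and specificity from them. The function name `rates` and the toy labels are hypothetical illustrations, not part of CLICK AI; 1 stands for TRUE and 0 for FALSE.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR, specificity) for binary labels/predictions where 1 = TRUE, 0 = FALSE."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # actual TRUE, predicted TRUE
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # actual TRUE, predicted FALSE
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # actual FALSE, predicted TRUE
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # actual FALSE, predicted FALSE
    tpr = tp / (tp + fn)          # recall / sensitivity / hit rate
    specificity = tn / (tn + fp)
    fpr = fp / (fp + tn)          # equals 1 - specificity
    return tpr, fpr, specificity


# Hypothetical toy data: 4 actual TRUE cases followed by 4 actual FALSE cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(rates(y_true, y_pred))  # (0.75, 0.25, 0.75)
```

Here 3 of the 4 actual TRUE cases are caught (TPR 0.75), and 1 of the 4 actual FALSE cases raises a false alarm (FPR 0.25).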
If you plot FPR on the x-axis and TPR on the y-axis for every threshold, you get a curve like the one below.
When building a classification model, you set a probability threshold between 0 and 1 that decides whether a prediction is True or False. The lower the threshold for classifying a case as True, the higher the TPR, but the FPR rises as well. Conversely, the higher the threshold, the lower the FPR, but the TPR falls as well. You therefore need to choose a threshold that balances TPR and FPR according to the nature of the data and the purpose of the classification.
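The trade-off can be seen by sweeping the threshold over a fixed set of predicted probabilities. This is a minimal sketch assuming a model that outputs a probability of TRUE per case and a rule "predict TRUE when score >= threshold"; the function `tpr_fpr_at`, the scores, and the chosen thresholds are hypothetical.

```python
def tpr_fpr_at(y_true, scores, threshold):
    """Predict TRUE when the score is >= threshold, then return (TPR, FPR)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives


# Hypothetical labels (1 = TRUE) and the model's predicted probability of TRUE.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

for threshold in (0.8, 0.5, 0.25):
    tpr, fpr = tpr_fpr_at(y_true, scores, threshold)
    print(f"threshold={threshold}: TPR={tpr}, FPR={fpr}")
```

Lowering the threshold from 0.8 to 0.25 raises the TPR from 0.5 to 1.0, but the FPR also climbs from 0.0 to 0.5.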
For example, when testing for a highly contagious disease such as Ebola or COVID-19, you should raise the TPR by lowering the threshold, even though this also raises the FPR. A single missed case (a false negative) can be critical, because one infected patient can spread the disease quickly, so in such cases the TPR deserves more weight.
A model's ROC can be drawn as a graph like the example above, but comparing several models by looking back and forth between their graphs is difficult. It is easier to summarize each model's ROC curve as a single number: the area under the curve, called the AUC (Area Under the Curve). As the name says, it is the area beneath the ROC curve, and because each model has exactly one such value, it can be used to compare multiple models directly.
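One way to compute the area is to build the ROC points by using every distinct score as a threshold and then apply the trapezoidal rule. The sketch below assumes the same "score >= threshold means TRUE" rule; the function names and toy data are hypothetical, and libraries such as scikit-learn provide tested equivalents (`roc_curve`, `roc_auc_score`).

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR), using each distinct score as a threshold (score >= t -> TRUE)."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted TRUE
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((fp / negatives, tp / positives))
    return points  # the lowest threshold predicts everything TRUE, ending at (1.0, 1.0)


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels (1 = TRUE) and predicted probabilities of TRUE.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
print(auc_trapezoid(roc_points(y_true, scores)))  # 0.8125
```

Because lowering the threshold can only add predicted TRUE cases, the FPR values come out in non-decreasing order, which is what the trapezoidal sum needs.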
The larger the area, the higher the TPR the model achieves at any given FPR, so the higher the AUROC (Area Under the ROC curve), i.e. the closer to 1, the better the model. At the other extreme, a model that cannot separate the classes at all has an AUROC of 0.5: its ROC curve is the diagonal straight line with a slope of 1, meaning it distinguishes positive from negative cases no better than random guessing. An AUROC below 0.5 means the model does separate the classes, but in the reversed direction. For such a model, you can interpret a TRUE result as FALSE and a FALSE result as TRUE.
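The 0.5 baseline and the "reversed classes" reading follow from an equivalent definition of AUROC: the probability that a randomly chosen TRUE case receives a higher score than a randomly chosen FALSE case, with ties counted as one half. The sketch below uses that pairwise definition; the function name `auroc_rank` and the data are hypothetical.

```python
def auroc_rank(y_true, scores):
    """AUROC as P(score of a random TRUE case > score of a random FALSE case); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

print(auroc_rank(y_true, scores))                # 0.8125: a useful model
print(auroc_rank(y_true, [-s for s in scores]))  # 0.1875: same model with its ranking reversed
print(auroc_rank(y_true, [0.5] * len(scores)))   # 0.5: a model that cannot tell the classes apart
```

Reversing the ranking turns an AUROC of 0.8125 into 1 − 0.8125 = 0.1875, which is why a model scoring below 0.5 can be rescued by swapping its TRUE and FALSE outputs.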
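F-Beta is also an indicator of the performance of a binary classification model. F-Beta is a weighted harmonic mean of Recall and Precision and is calculated as follows:

$$F_\beta = (1 + \beta^2)\,\frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$$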
Here, β indicates how much weight is put on Recall relative to Precision (Recall is treated as β times as important as Precision), and it can be chosen according to the characteristics of the data and the purpose of classification. If Recall and Precision are equally important, β is 1 (the familiar F1 score). If Recall is more important, β is greater than 1, and if Precision is more important, β is between 0 and 1.
When Recall is more important (when you need to reduce false negatives), as in testing for infectious diseases, you should choose a larger β.
Conversely, when Precision is more important (when you need to reduce false positives), as in video recommendation, you should choose a smaller β. This is because the list of videos the model recommends should contain many videos the user is interested in and few that are not of interest.
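The effect of β can be checked numerically. The sketch below implements the standard F-beta formula and compares two hypothetical models: one with high recall (precision 0.5, recall 1.0) and one with high precision (precision 1.0, recall 0.5). The function name `f_beta` and the precision/recall values are illustrative assumptions, not values reported by CLICK AI.

```python
def f_beta(precision, recall, beta):
    """Standard F-beta score: weighted harmonic mean of precision and recall (beta > 1 favours recall)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical models: (precision, recall)
high_recall = (0.5, 1.0)     # e.g. a screening test with a low threshold
high_precision = (1.0, 0.5)  # e.g. a conservative recommender

for beta in (0.5, 1, 2):
    print(
        f"beta={beta}: high-recall model {f_beta(*high_recall, beta):.4f}, "
        f"high-precision model {f_beta(*high_precision, beta):.4f}"
    )
```

With β = 1 both models tie. With β = 2 the high-recall model scores higher, matching the infectious-disease case. With β = 0.5 the high-precision model wins, matching the video-recommendation case.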
You can find the selected model's AUROC and F-beta values in the 'details' tab of CLICK AI's automated machine learning platform. Through the 'See details' tab, you can view additional indicators and information about the selected model.