# auc sklearn with practical example

The auc sklearn is a method for assessing a binary classifier's quality. It measures the area under the ROC curve, known as the "AUC", to quantify how well a supervised classifier can distinguish between the positive and negative classes. The AUC ranges from 0 to 1: a score of 0.5 indicates a useless classification model (no better than random guessing), while a value of 1 indicates perfect predictions. The AUC is a useful, essential tool for a data scientist, both as a performance measure of a classifier's quality and as a guide for model improvement.
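As a minimal sketch of the idea (the labels and scores below are made up purely for illustration), sklearn's roc_auc_score takes the true labels and the predicted scores and returns the area under the ROC curve:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical labels and scores, just to illustrate the API
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# 1.0 would mean every positive is ranked above every negative;
# 0.5 would mean the ranking is no better than random guessing
print(roc_auc_score(y_true, y_scores))  # 0.75
```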

Having trouble with sklearn? You might want to check out this article: **ModuleNotFoundError: No module named 'sklearn'**.

## auc sklearn

There are a few things to note about auc sklearn. First, the AUC is not a measure of accuracy at a single operating threshold; it summarizes ranking quality across all possible thresholds, and it should be evaluated on a held-out test set rather than the training set. Second, the AUC is not always reliable, especially if the dataset is small. Third, it can be misleading when the classes are heavily imbalanced.
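To see the small-dataset caveat in action, here is a quick sketch (my own illustration, not part of the original example; the sample sizes and seeds are arbitrary) that evaluates the same kind of model on repeated small splits and shows how much the AUC estimate can fluctuate:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# A deliberately small dataset
X, y = make_classification(n_samples=200, n_classes=2, n_features=40, random_state=37)

scores = []
for seed in range(10):
    # With only ~40 test samples, each AUC estimate is noisy
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

print(f"AUC over 10 splits: min={min(scores):.2f}, max={max(scores):.2f}")
```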

Despite its limitations, the AUC in sklearn is a valuable tool for assessing a classifier's quality. It is a good starting point for improving a classifier and can help data scientists identify areas for improvement.

## Example: auc sklearn

Now let's take a look at an example to understand the concept behind auc sklearn.

### Importing necessary Python machine learning libraries

We need to import make_classification, train_test_split, roc_curve, roc_auc_score, and matplotlib for this example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
import matplotlib.pyplot as plt
```

### Creating arbitrary data for the auc sklearn example

We can create some imaginary data for this example using sklearn. We call it arbitrary data, and we generate it with sklearn's make_classification method, shown below.

#### Generating an arbitrary dataset with two classes
```python
X, y = make_classification(n_samples=2000, n_classes=2, n_features=40, random_state=37)
```

### Splitting the dataset into train and test sets
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=37)
```

### Testing the arbitrary dataset using a machine learning classifier

To calculate the AUC with sklearn in practice, we can create a machine learning model and train it on our dataset. Then we can use the predicted probabilities to see how auc sklearn is used.

#### Importing the logistic regression model from sklearn

I'm going to import the logistic regression model from sklearn's linear_model module.

```python
from sklearn.linear_model import LogisticRegression
```

Since we are talking about logistic regression, you might want to check out this article about linear regression: **Least Squares Regression Line**.

#### Creating the logistic regression model
```python
LogRegModel = LogisticRegression()
```

#### Training the logistic regression model with the arbitrary data
```python
LogRegModel.fit(X_train, y_train)
```

#### Predicted values
```python
# predict_proba returns one column per class; column 1 holds the
# probability of the positive class
pred_val = LogRegModel.predict_proba(X_test)
```

#### Computing the ROC curve for our logistic regression model and returning the FPR, TPR, and threshold values
```python
fpr, tpr, threshold = roc_curve(y_test, pred_val[:, 1], pos_label=1)
```
##### Calculating the curve where TPR = FPR (true positive rate = false positive rate)
```python
# A constant prediction yields the no-skill baseline, the diagonal where TPR = FPR
random_pred_val = [0 for i in range(len(y_test))]
p_fpr1, p_tpr1, _ = roc_curve(y_test, random_pred_val, pos_label=1)
```

#### Calculating the AUC score
```python
auc_score = roc_auc_score(y_test, pred_val[:, 1])
```
##### Let's print the AUC score
```python
print(auc_score)  # ~0.95 for this dataset
```
\"print
print(auc_score)<\/figcaption><\/figure>\n\n\n\n

Here we have an AUC score of 0.95, which is very close to 1.

### The code so far
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
import matplotlib.pyplot as plt

# Generate an arbitrary dataset with two classes
X, y = make_classification(n_samples=2000, n_classes=2, n_features=40, random_state=37)

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=37)

# Import the logistic regression model from sklearn's linear_model module
from sklearn.linear_model import LogisticRegression

# Create the logistic regression machine learning model
LogRegModel = LogisticRegression()

# Train the model with the training data
LogRegModel.fit(X_train, y_train)

# Calculate the predicted probabilities and assign them to a variable
pred_val = LogRegModel.predict_proba(X_test)

# ROC curve for the logistic regression model
fpr, tpr, threshold = roc_curve(y_test, pred_val[:, 1], pos_label=1)

# Calculate the no-skill baseline where TPR = FPR
random_pred_val = [0 for i in range(len(y_test))]
p_fpr1, p_tpr1, _ = roc_curve(y_test, random_pred_val, pos_label=1)

# Calculate the AUC score
auc_score = roc_auc_score(y_test, pred_val[:, 1])

# Print the AUC score
print(auc_score)
```

### Plotting the auc sklearn curve (ROC curve)

Finally, we can plot the ROC curve. I'm going to use the matplotlib visualization library to plot both ROC curves: the one built from the predicted probabilities, and the baseline where TPR is equal to FPR. Let's write the code responsible for the matplotlib functions now.

#### Using the seaborn style
```python
# Note: newer matplotlib versions renamed this style to 'seaborn-v0_8'
plt.style.use('seaborn')
```

#### Plotting the ROC curves
```python
plt.plot(fpr, tpr, linestyle='--', color='red', label='Logistic Regression')
plt.plot(p_fpr1, p_tpr1, linestyle='--', color='black')
```

#### Setting the plot title
```python
plt.title('ROC curve plot')
```

#### Setting the x label
```python
plt.xlabel('False Positive Rate/FPR')
```

#### Setting the y label
```python
plt.ylabel('True Positive Rate/TPR')
```

#### Adding the plot legend
```python
plt.legend(loc='best')
```

#### Saving the figure with a filename and resolution
```python
plt.savefig('ROC', dpi=300)
```

#### Showing the plot
```python
plt.show()
```
\"ROC
ROC curve<\/figcaption><\/figure>\n\n\n\n

With this ROC curve, we can see the trade-off between the true positive rate and the false positive rate. The closer the curve hugs the top-left corner of the plot, the better the model is at picking out relevant data points without letting in too many irrelevant ones. A curve that stays near the black diagonal indicates a model that is barely better than random guessing. Generally speaking, models with an AUC score greater than 0.8 are considered very good at their job, while those near or under 0.5 should be investigated for possible improvements, or replaced with a new algorithm altogether, before they're used again. Ours is 0.95, which is considered a great AUC score.
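As a short follow-up sketch (this goes beyond the original example; it reuses the fpr, tpr, and threshold arrays that roc_curve returned above), one common way to turn the ROC analysis into a concrete decision is to pick the threshold that maximizes TPR minus FPR, sometimes called Youden's J statistic:

```python
import numpy as np

# Reuses fpr, tpr, threshold from roc_curve() above
best_idx = np.argmax(tpr - fpr)  # Youden's J: largest gap above the diagonal
print(f"Suggested threshold: {threshold[best_idx]:.3f} "
      f"(TPR={tpr[best_idx]:.2f}, FPR={fpr[best_idx]:.2f})")
```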
