Friday 5 July 2024

For beginners: What is accuracy? What is precision in Artificial Neural Networks?

  


 


Accuracy

Accuracy is the proportion of correct predictions (both true positives and true negatives) out of the total number of predictions. It is a general measure of how well the model is performing.

 

Formula:

 

Accuracy = Number of Correct Predictions  /  Total Number of Predictions

 

For a binary classification problem:

 

Accuracy = (TP + TN)  /  (TP + TN + FP + FN)

 

Where:

  • TP (True Positives): Correctly predicted positive samples
  • TN (True Negatives): Correctly predicted negative samples
  • FP (False Positives): Incorrectly predicted positive samples
  • FN (False Negatives): Incorrectly predicted negative samples

 

Precision

Precision is the proportion of true positive predictions out of all positive predictions made by the model. It focuses on the accuracy of the positive predictions.

 

Formula:

Precision = TP  /  (TP + FP)

Precision is particularly useful in scenarios where the cost of false positives is high. It tells us how many of the predicted positive instances are actually positive.

 

Example

Let's consider an example to illustrate accuracy and precision:

  • TP (True Positives): 40
  • TN (True Negatives): 30
  • FP (False Positives): 10
  • FN (False Negatives): 20

 

Accuracy:

Accuracy = (TP + TN)  /  (TP + TN + FP + FN)

Accuracy = (40 + 30)  /  (40 + 30 + 10 + 20) = 70  /  100 = 0.70

So, the accuracy is 70%.

 

Precision:

Precision = TP  /  (TP + FP) = 40  /  (40 + 10) = 40  /  50 = 0.80

So, the precision is 80%.
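These two calculations can be reproduced with a few lines of Python. Below is a minimal sketch using scikit-learn, with label arrays built to contain exactly the counts above (TP = 40, TN = 30, FP = 10, FN = 20); the arrays are illustrative, not real model output:

# Verify the worked example with scikit-learn
import numpy as np
from sklearn.metrics import accuracy_score, precision_score

# 40 TP, 30 TN, 10 FP, 20 FN (illustrative values from the example above)
y_true = np.array([1] * 40 + [0] * 30 + [0] * 10 + [1] * 20)
y_pred = np.array([1] * 40 + [0] * 30 + [1] * 10 + [0] * 20)

print(accuracy_score(y_true, y_pred))   # 0.7
print(precision_score(y_true, y_pred))  # 0.8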

 

Importance in Neural Networks

 

  • Accuracy is useful when you need a general measure of how well your model is performing across all classes.
  • Precision is critical in situations where the cost of false positives is high, such as in medical diagnosis (where you don't want to wrongly diagnose a healthy person as sick).

 

Using both metrics together gives a more comprehensive view of a model's performance, particularly in imbalanced datasets where one class may dominate.

 

Accuracy in Neural Networks

 

  • General Measure of Performance: Accuracy gives a straightforward, overall measure of how well the model is performing by calculating the proportion of correct predictions (both true positives and true negatives) out of all predictions.
  • Limitation: While accuracy is useful, it can be misleading in certain situations, especially with imbalanced datasets.

 

Precision in Neural Networks

 

  • Critical in High Cost of False Positives: Precision measures the proportion of true positive predictions out of all positive predictions. It is especially important when the cost of false positives is high. For example:
    • Medical Diagnosis: Misdiagnosing a healthy person as sick (false positive) can lead to unnecessary stress, additional tests, and treatments.
    • Spam Detection: Marking a legitimate email as spam (false positive) can cause users to miss important messages.

 

Imbalanced Datasets

 

An imbalanced dataset is one where the classes are not equally represented. For example, in a medical dataset, you might have 99% healthy patients and 1% sick patients. This imbalance can cause issues with model evaluation and performance.

 

Example Scenario

  • Dataset: 1000 samples, 990 healthy (negative class) and 10 sick (positive class).
  • Model: A model that predicts every patient as healthy will have 99% accuracy (990/1000), but it never correctly identifies a sick patient (see the short sketch below).
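A minimal sketch of this baseline using scikit-learn metrics on synthetic labels (990 negatives, 10 positives; no real model involved):

# The "always predict healthy" baseline described above
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([0] * 990 + [1] * 10)  # 990 healthy, 10 sick
y_pred = np.zeros(1000, dtype=int)       # predict "healthy" for every patient

print(accuracy_score(y_true, y_pred))                    # 0.99
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0 -- no positive predictions at all
print(recall_score(y_true, y_pred))                      # 0.0 -- no sick patient is found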

 

Using Both Accuracy and Precision

 

  • Comprehensive View: By using both accuracy and precision, you get a more nuanced understanding of your model's performance. This is particularly valuable in imbalanced datasets where one class dominates.
    • Accuracy: Shows the overall correctness of the model.
    • Precision: Ensures that when the model predicts a positive class, it is likely correct.

 

Example to Illustrate

Imagine a fraud detection system:

  • Imbalanced Dataset: 10,000 transactions, 9,900 legitimate (negative class) and 100 fraudulent (positive class).
  • High Accuracy but Low Precision: A model might have high accuracy by predicting most transactions as legitimate, but it would miss many fraudulent transactions and have low precision.
  • Improved Model: By focusing on precision, the model improves its ability to correctly identify fraudulent transactions, even if the overall accuracy drops slightly.

 

Practical Tips

  • Balance Metrics: Always consider multiple metrics (accuracy, precision, recall, F1 score) to get a complete picture of model performance.
  • Handle Imbalance: Techniques like resampling (over-sampling the minority class or under-sampling the majority class), using different evaluation metrics (like the F1 score), or applying algorithms designed to handle imbalance can help; a small resampling sketch follows below.
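As a simple illustration of the resampling idea, here is a sketch of random over-sampling of the minority class with plain NumPy. X and y are a hypothetical feature matrix and label vector whose sizes mirror the 990/10 example above:

# Random over-sampling of the minority class (illustrative data)
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 2))            # hypothetical features
y = np.array([0] * 990 + [1] * 10)        # heavily imbalanced labels

minority_idx = np.where(y == 1)[0]
extra = rng.choice(minority_idx, size=980, replace=True)  # sample with replacement

X_balanced = np.vstack([X, X[extra]])
y_balanced = np.concatenate([y, y[extra]])

print(np.bincount(y_balanced))            # [990 990] -- classes are now balanced

Libraries such as imbalanced-learn, or the class_weight argument available in Keras and scikit-learn, offer the same idea with less manual work.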

Summary

Accuracy and precision are both critical metrics in neural networks:

  • Accuracy: Useful for a general measure of performance, but can be misleading in imbalanced datasets.
  • Precision: Essential when the cost of false positives is high and provides a clearer picture of positive prediction reliability.
  • Imbalanced Datasets: Common in real-world scenarios, requiring careful handling and consideration of multiple metrics to ensure robust model evaluation.

 

Using a combination of these metrics provides a more comprehensive understanding of how well a model is truly performing, especially in situations where one class significantly outnumbers another.

 

Wednesday 3 July 2024

Underfitting, overfitting, and fitting in detail

  

 

 

 

Underfitting Model

 

Underfit model:

# Imports (assuming tf.keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Define a more complex neural network model
model = Sequential([
    Dense(64, activation='relu', input_shape=(2,)),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])
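The post shows only the model definition; the compile/fit calls are not included. A setup consistent with the training log below (binary cross-entropy, accuracy metric, 100 epochs, a held-out validation split) might look like this sketch -- the optimizer choice, batch size, and the X_train/y_train variables are assumptions:

# Assumed training setup (not shown in the original post)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(X_train, y_train,        # X_train/y_train: the 2-D toy data
                    epochs=100,
                    validation_split=0.2,    # produces the val_loss / val_accuracy columns
                    verbose=1)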

 

Learning:

Epoch 98/100

15/15 [==============================] - 0s 5ms/step - loss: 0.7819 - accuracy: 0.5833 - val_loss: 0.6862 - val_accuracy: 0.5000

Epoch 99/100

15/15 [==============================] - 0s 4ms/step - loss: 0.7416 - accuracy: 0.6111 - val_loss: 0.6839 - val_accuracy: 0.5000

Epoch 100/100

15/15 [==============================] - 0s 5ms/step - loss: 0.8744 - accuracy: 0.5278 - val_loss: 0.6917 - val_accuracy: 0.5000

2/2 [==============================] - 0s 0s/step

 

Precision: 0.00

 

Output plots: circlOUT00.png, circleOUT0.png


Notes:

Changes made to increase precision:

  • Increased the number of layers to three hidden layers.
  • Increased the number of neurons in each layer.
  • Added dropout layers to prevent overfitting.
  • Increased the number of epochs to 100.

You can adjust the parameters further if needed, such as the number of layers, neurons, and dropout rates, to see if you can achieve even better performance.

 

Overfitting Model

Overfit Model:

# Imports (assuming tf.keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Define a neural network model with more layers and neurons to overfit
model = Sequential([
    Dense(512, activation='relu', input_shape=(2,)),
    Dense(512, activation='relu'),
    Dense(512, activation='relu'),
    Dense(512, activation='relu'),
    Dense(1, activation='sigmoid')
])

 

Notes

To demonstrate overfitting, we can create a model that is too complex relative to the amount of data and the problem at hand. This can be done by significantly increasing the number of neurons and layers, removing dropout and batch normalization, and reducing the training data size. Overfitting occurs when the model performs very well on the training data but poorly on the validation/test data.
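A sketch of that setup, assuming the toy dataset is a circular two-class problem such as sklearn's make_circles (the original data-generation code is not shown in the post); the noise, factor, and optimizer settings are illustrative:

# Induce overfitting: large model, little training data
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=1000, noise=0.1, factor=0.5, random_state=42)

# Keep only 10% of the samples for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.1, random_state=42, stratify=y)

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=200, validation_split=0.2, verbose=1)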

 

Learning

Epoch 199/200

3/3 [==============================] - 0s 48ms/step - loss: 0.4401 - accuracy: 0.7889 - val_loss: 2.1862 - val_accuracy: 0.6000

Epoch 200/200

3/3 [==============================] - 0s 40ms/step - loss: 0.4413 - accuracy: 0.8000 - val_loss: 2.2906 - val_accuracy: 0.6000

29/29 [==============================] - 0s 4ms/step

 

Precision: 0.70

Accuracy: 0.67

4889/4889 [==============================] - 20s 4ms/step

 

Output plots: circleOUTfinal13overfit.png, circleOUTfinal4overfit.png


Explanation

Model Complexity:

Increased the number of layers and neurons per layer significantly.

Removed dropout and batch normalization to make the model more likely to overfit.

 

Training Data Reduction:

Reduced the training data to 10% to further induce overfitting.

 

Training and Evaluation:

Trained the model for 200 epochs.

Evaluated on the test data and calculated precision and accuracy.

 

Visualization:

Plotted training and validation accuracy/loss to show the overfitting behavior.

Plotted the decision boundary and classifications to visually demonstrate overfitting.

 

This setup should result in a model that performs very well on the training data but poorly on the validation and test data, illustrating overfitting.

 

 

Fitting Model

Fit model:

# Imports (assuming tf.keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization

# Define a neural network model with batch normalization
model = Sequential([
    Dense(128, activation='relu', input_shape=(2,)),
    BatchNormalization(),
    Dropout(0.5),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')
])

 

The results indicate some improvement but also highlight that capturing the circular decision boundary with the current model might still be challenging. Let's try a few more strategies to improve the model's performance (a sketch of the corresponding compile/fit changes follows the list):

 

  • Increase the number of epochs: Train the model for more epochs.
  • Adjust the learning rate: Sometimes a different learning rate helps the model converge better.
  • Use a different optimizer: Experiment with different optimizers like RMSprop or Nadam.
  • Batch normalization: Add batch normalization layers to help stabilize training.
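A sketch of how these adjustments could be applied to the model above; the exact learning rate and batch size used in the post are not shown, so the values here are illustrative:

# Assumed compile/fit settings for the batch-normalized model
from tensorflow.keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(learning_rate=0.001),  # different optimizer / learning rate
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(X_train, y_train,
                    epochs=400,              # train for more epochs
                    validation_split=0.2,
                    verbose=1)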

 

Learning:

Epoch 399/400

23/23 [==============================] - 0s 10ms/step - loss: 0.4219 - accuracy: 0.8361 - val_loss: 0.4337 - val_accuracy: 0.8500

Epoch 400/400

23/23 [==============================] - 0s 10ms/step - loss: 0.4398 - accuracy: 0.8306 - val_loss: 0.4113 - val_accuracy: 0.8500

7/7 [==============================] - 0s 3ms/step

 

Precision: 0.93

Accuracy: 0.86

4864/4864 [==============================] - 15s 3ms/step

 

Output plots: circleOUTfinal1.png, circleOUTfinal2.png


Explanation

Batch Normalization:

Added after each dense layer to help stabilize and speed up the training process.

 

Optimizer:

Switched to RMSprop, which can sometimes lead to better convergence for certain problems.

 

Increased Epochs:

Increased to 400 epochs to give the model more time to learn the patterns in the data.

 

Precision and Accuracy:

Precision and accuracy are calculated to evaluate the model’s performance.

 

Visualization:

The decision boundary and true/predicted classifications are plotted to visually assess the model’s performance.

 

This approach should improve the model's ability to capture the circular decision boundary and provide better overall classification performance.

 


Wednesday 26 June 2024

Overview of CBOW

CBOW is a neural network model used to learn word embeddings. The goal is to predict a target word from the surrounding context words within a window.

Detailed Steps

  1. Context Window:
    • Definition: The context window is the span of words around a target word that you consider for prediction.
    • Example: Consider the sentence "The quick brown fox". If "quick" is the target word and the window size is 2, the context words are "The" and "brown". If the window size was 4, the context words would include "The", "brown", and "fox".
  2. Input Representation:
    • One-Hot Encoding: Convert each context word into a one-hot vector. A one-hot vector is a binary vector of the size of the vocabulary with all elements set to 0 except for the element corresponding to the word, which is set to 1.
    • Example: If the vocabulary is ["The", "quick", "brown", "fox"], the one-hot encoding for "The" would be [1, 0, 0, 0], and for "brown", it would be [0, 0, 1, 0].
  3. Projection Layer:
    • Average One-Hot Vectors: Combine the one-hot vectors of the context words by averaging them.
    • Example: If "The" and "brown" are the context words, average their one-hot vectors:
      • [1, 0, 0, 0] (for "The")
      • [0, 0, 1, 0] (for "brown")
      • Average: [0.5, 0, 0.5, 0]
    • Projection to Hidden Layer: Multiply this averaged vector by a weight matrix (W1), which maps the input space to a hidden layer of neurons. The result is the hidden layer representation.
      • If W1 is a matrix of size V × N (where V is the vocabulary size and N is the number of neurons in the hidden layer), and the averaged one-hot vector is treated as a 1 × V row vector, the hidden layer representation is computed as h = avg_one_hot_vector × W1, a 1 × N vector (equivalently, the average of the W1 rows for the context words).
  4. Output Layer:
    • From Hidden to Output: Multiply the hidden layer vector by another weight matrix (W2), which maps the hidden layer back to the vocabulary size.
      • If W2 is a matrix of size N × V, the output layer scores (u) are computed as u = h × W2, a 1 × V vector of scores.
  5. Softmax:
    • Convert Scores to Probabilities: Apply the softmax function to the output scores to convert them into a probability distribution over the vocabulary. The softmax function ensures that all the probabilities sum to 1.
      • y_pred(i) = e^(u_i) / Σ_j e^(u_j), where u_i is the score for word i in the vocabulary.

Given Scores:

Assume we have scores for four words in our vocabulary:

  • u(The) = 2.0
  • u(quick) = 1.0
  • u(brown) = 0.1
  • u(fox) = 0.5

Steps:

  1. Exponentials of Scores:
    • e^2.0 ≈ 7.389
    • e^1.0 ≈ 2.718
    • e^0.1 ≈ 1.105
    • e^0.5 ≈ 1.649
  2. Sum of Exponentials:
    • Sum = 7.389 + 2.718 + 1.105 + 1.649 ≈ 12.861
  3. Softmax Probabilities:
    • Probability(The) = 7.389 / 12.861 ≈ 0.574
    • Probability(quick) = 2.718 / 12.861 ≈ 0.211
    • Probability(brown) = 1.105 / 12.861 ≈ 0.086
    • Probability(fox) = 1.649 / 12.861 ≈ 0.128
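The same arithmetic can be checked with a few lines of NumPy:

# Softmax over the example scores
import numpy as np

u = np.array([2.0, 1.0, 0.1, 0.5])       # scores for "The", "quick", "brown", "fox"
probs = np.exp(u) / np.sum(np.exp(u))    # softmax

print(np.round(probs, 4))   # [0.5745 0.2114 0.0859 0.1282] -- matches the hand calculation above
print(probs.sum())          # 1.0 (up to floating-point error)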

 

  6. Loss Function:
    • Measure Prediction Accuracy: Use a loss function to measure how far the predicted probabilities (from softmax) are from the actual distribution (where the target word has a probability of 1 and all other words have a probability of 0).
    • Negative Log Likelihood: The loss is typically computed using the negative log likelihood of the true word given the predicted probabilities:
      • loss = −log(y_pred[target_word])
  7. Backpropagation:
    • Adjust Weights: Use backpropagation to adjust the weights (W1 and W2) in the network. This involves computing the gradient of the loss with respect to each weight and updating the weights in the direction that reduces the loss.
    • Gradient Descent: The weights are updated using gradient descent, where each weight is adjusted by subtracting the product of the learning rate and the gradient of the loss with respect to that weight (a toy NumPy sketch of one full CBOW training step follows below).
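To tie the steps together, here is a minimal NumPy sketch of one full CBOW training step on the running example. It is illustrative only (not the original word2vec implementation); the hidden size, weight initialization, and learning rate are arbitrary choices:

# One CBOW training step from scratch (toy example)
import numpy as np

vocab = ["The", "quick", "brown", "fox"]
V, N = len(vocab), 3                       # vocabulary size, hidden-layer size
word2idx = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(V, N))    # input -> hidden (rows become word vectors)
W2 = rng.normal(scale=0.1, size=(N, V))    # hidden -> output

context, target = ["The", "brown"], "quick"

# Forward pass
x = np.zeros(V)
for w in context:                          # averaged one-hot context vector
    x[word2idx[w]] += 1.0 / len(context)
h = x @ W1                                 # hidden representation (1 x N)
u = h @ W2                                 # scores over the vocabulary (1 x V)
y_pred = np.exp(u) / np.sum(np.exp(u))     # softmax

loss = -np.log(y_pred[word2idx[target]])   # negative log likelihood

# Backward pass and gradient-descent update
e = y_pred.copy()
e[word2idx[target]] -= 1.0                 # dLoss/du
dW2 = np.outer(h, e)                       # dLoss/dW2 (N x V)
dW1 = np.outer(x, e @ W2.T)                # dLoss/dW1 (V x N)

learning_rate = 0.1
W1 -= learning_rate * dW1
W2 -= learning_rate * dW2

print(float(loss))                         # loss before the update

Repeating this step over a large corpus is what gradually shapes the rows of W1 into useful word vectors.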

Summary

In CBOW, the model learns to predict a target word based on the average representation of the surrounding context words. Through multiple iterations over a large corpus, the weights in the network (which ultimately form the word vectors) are adjusted to minimize the prediction error. These learned word vectors capture semantic relationships between words, enabling various natural language processing tasks.

Summary of CBOW

In the Continuous Bag of Words (CBOW) model, the goal is to predict a target word using the surrounding context words. Here's a more detailed breakdown:

1.      Context and Target Words:

    • Context Words: These are the words surrounding the target word within a defined window size.
    • Target Word: This is the word the model tries to predict based on the context words.

2.      Input Representation:

    • Each context word is converted into a one-hot vector, which is a binary vector of the size of the vocabulary with a single 1 indicating the word's position in the vocabulary and 0s elsewhere.

3.      Projection Layer:

    • The one-hot vectors of the context words are averaged.
    • This averaged vector is then multiplied by a weight matrix (W1) to produce a hidden layer representation, which captures the combined context information.

4.      Output Layer:

    • The hidden layer representation is multiplied by another weight matrix (W2) to produce scores for all words in the vocabulary.

5.      Softmax Function:

    • The scores are passed through the softmax function, which converts them into probabilities. These probabilities indicate the likelihood of each word in the vocabulary being the target word.

6.      Loss Function:

    • The model uses a loss function (typically negative log likelihood) to measure the difference between the predicted probabilities and the actual target word.
    • The loss is minimized through backpropagation, adjusting the weight matrices (W1 and W2) to improve predictions over time.

7.      Training Process:

    • The model is trained on a large corpus of text. Through multiple iterations (epochs), the weights are continuously updated to reduce the prediction error.

8.      Word Vectors:

    • The rows of the weight matrix W1 (after training) become the word vectors.
    • These vectors encode semantic relationships between words, such that words with similar meanings have similar vectors.

9.      Applications:

    • The learned word vectors can be used in various natural language processing (NLP) tasks, such as text classification, sentiment analysis, and machine translation, because they capture meaningful patterns and relationships in the data (see the short gensim sketch below).
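For real corpora these vectors are usually trained with an existing library rather than from scratch. A minimal sketch using gensim (assuming gensim 4.x, where sg=0 selects CBOW and sg=1 would select skip-gram); the toy sentences and parameter values are illustrative:

# Training CBOW word vectors with gensim
from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox"],
             ["the", "lazy", "brown", "dog"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, epochs=100)

vec = model.wv["fox"]                        # learned word vector (50 dimensions)
print(model.wv.most_similar("fox", topn=2))  # nearest words by cosine similarity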

Key Takeaway

The CBOW model learns to predict a word based on its context, and through this process, it generates word vectors that encapsulate semantic relationships. These word vectors are valuable for numerous NLP applications, as they represent words in a way that reflects their meanings and relationships to other words in the vocabulary.