Handwritten Character Digit Classification using Neural Network


Utkarsh Nigam

Introduction

Once upon a time, when we asked simpler questions like “What is the problem?”, structured datasets were used to report numbers. Fast forward a few decades and we have more complex questions to answer, like “Why is this problem happening?”, and with complex problems come complex, often unstructured, datasets. Neural networks come to the rescue, letting machines learn from this complexity and resolve it for us, scaling with each level of complexity.

One such complex problem is handwriting recognition. Imagine writing by hand on paper or a tablet and having it translated into typed text: no more retyping! Imagine not racking your brain to decipher a doctor’s handwriting. Imagine a child with dysgraphia, a condition that results in poor handwriting, no longer struggling in the classroom.

A handwriting recognition tool, which classifies text from an image, can make all of this possible. The tool described here has a graphical user interface in which a user writes an English word freehand, in uppercase letters, inside a canvas, and the model running at the backend recognizes the word. It uses a Multi-Layer Perceptron (MLP) classifier with the Adam solver and the logistic sigmoid activation function, and reaches 95.6% accuracy on the test set.



Dataset

The dataset used was created by the National Institute of Standards and Technology (NIST). NIST Special Database 19 contains roughly 0.7 million sample PNG images. The current model was trained only on uppercase letters (A-Z). The following table lists the number of observations per character class in the database:

Table 1: Number of Observations per Character

A: 7,010     Q: 2,566     g: 3,839     w: 2,699
B: 4,091     R: 4,536     h: 9,713     x: 2,820
C: 2,792     S: 23,827    i: 2,788     y: 5,088
D: 4,945     T: 10,927    j: 1,920     z: 2,726
E: 5,420     U: 14,146    k: 2,562     0: 34,803
F: 10,203    V: 4,951     l: 16,937    1: 38,049
G: 2,575     W: 5,026     m: 2,634     2: 34,184
H: 3,271     X: 2,731     n: 12,856    3: 35,293
I: 13,179    Y: 2,359     o: 2,761     4: 33,432
J: 3,962     Z: 2,698     p: 2,401     5: 31,067
K: 2,473     a: 11,196    q: 3,115     6: 34,037
L: 5,390     b: 5,551     r: 15,934    7: 35,796
M: 10,027    c: 11,315    s: 2,698     8: 33,884
N: 9,149     d: 11,421    t: 20,793    9: 33,720
O: 28,680    e: 28,299    u: 2,837
P: 9,277     f: 2,493     v: 2,854

Preprocessing

In the original dataset, each character occupies a 128 × 128-pixel raster (Fig. 1a). To avoid heavy computation, the images were resized to 56 × 56 pixels (Fig. 1b). The canvas was then reduced to 28 × 28 pixels by removing the padding (Fig. 1c), which gives 28 × 28 = 784 features per sample. Each character was labelled sequentially from “A” to “Z”.
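The preprocessing steps above can be sketched in plain Python. This is a minimal illustration under assumptions the article does not state: nearest-neighbour resampling stands in for whatever resize method was actually used, “removing the padding” is taken to mean a fixed centre crop of 14 pixels on each side of the 56 × 56 image, and images are nested lists of pixel intensities rather than NumPy arrays. The names `resize_nearest`, `center_crop`, `preprocess` and `LABELS` are hypothetical.

```python
import string

# Hypothetical label map for sequential labelling: "A" -> 0, ..., "Z" -> 25.
LABELS = {ch: i for i, ch in enumerate(string.ascii_uppercase)}


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of pixel values (assumed method)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def center_crop(img, size):
    """Keep the central size x size window, dropping an equal border (assumed padding removal)."""
    top = (len(img) - size) // 2
    left = (len(img[0]) - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def preprocess(img):
    """128 x 128 raster -> 56 x 56 -> 28 x 28 -> flat list of 784 features."""
    small = resize_nearest(img, 56, 56)
    cropped = center_crop(small, 28)
    return [px for row in cropped for px in row]
```

Because SD19 characters sit roughly in the middle of a large white raster, a fixed centre crop is the simplest reading of “removing the padding”; a bounding-box crop followed by rescaling would be an equally plausible alternative.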

Figure 1: (a) Original 128 x 128-pixel raster as obtained from the NIST database, (b) 56 x 56-pixel raster after resizing the image, (c) 28 x 28-pixel raster after removing the padding from the resized image

The Python package ‘tkinter’ was used to create the canvas-like user interface. Once a user writes a word in the canvas, the tool converts the image into a 2-D NumPy array and traverses it column by column, looking for a filled pixel to mark the beginning of a letter. For words, the tool keeps scanning until it finds a run of blank columns wide enough, relative to the letters, to mark the beginning of the next character. This lets the tool distinguish a small break within a letter from the gap before the next letter.
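The column-wise scan described above can be sketched as follows. It assumes ink pixels are non-zero and the background is zero, and uses nested lists in place of the NumPy array (the same loop works over the columns of a 2-D NumPy array). The article’s “significant relative blank space” is replaced here by a fixed minimum gap, `min_gap`, which is a hypothetical parameter; the actual tool may scale the gap relative to letter width. Gaps shorter than `min_gap` are treated as breaks inside a single letter.

```python
def column_has_ink(image, col, threshold=0):
    """True if any pixel in this column is brighter than the background."""
    return any(row[col] > threshold for row in image)


def segment_columns(image, min_gap=3):
    """Return (start, end) column spans, end exclusive, one per detected letter.

    min_gap is an assumed stand-in for the 'significant blank space' rule.
    """
    width = len(image[0])
    spans = []
    start = None      # first inked column of the current letter
    blank_run = 0     # consecutive blank columns seen since the last ink
    for col in range(width):
        if column_has_ink(image, col):
            if start is None:
                start = col          # a new letter begins here
            blank_run = 0            # short gaps inside a letter are forgiven
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # Wide enough gap: close the letter at its last inked column.
                spans.append((start, col - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        spans.append((start, width - blank_run))
    return spans
```

Each returned span can then be cut out of the canvas array and passed through the same preprocessing as the training images before classification.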

Figure 2: User-Input Interpretation by the Model

Figure 3: User Interface

Challenges

Several challenges arose while designing and building this tool, as described below:

#1: Tuning the model’s hyperparameters across different combinations on the full dataset demanded heavy computational power. To work within the limits of the personal computer used to build this model, the dataset was split into batches of five characters (i.e., letters A-E, F-J, etc.), and the model was trained and tested on these batches.
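A minimal sketch of the batching idea, assuming samples are stored as `(features, label)` pairs with single-letter labels. Because 26 is not a multiple of five, this version leaves Z in a batch of its own; the article does not say how the last letter was handled. `letter_batches` and `filter_batch` are hypothetical helpers.

```python
import string


def letter_batches(letters=string.ascii_uppercase, size=5):
    """Split the label set into consecutive batches of `size` letters."""
    return [letters[i:i + size] for i in range(0, len(letters), size)]


def filter_batch(samples, batch):
    """Keep only (features, label) pairs whose label belongs to this batch."""
    allowed = set(batch)
    return [(x, y) for x, y in samples if y in allowed]
```

Each batch from `letter_batches()` (A-E, F-J, and so on) can then be filtered out of the full dataset and used for a separate, cheaper round of tuning.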

#2: An imbalance in the number of observations among the characters complicated the training and testing phases (for example, Table 1 lists 10,203 images for F but only 2,473 for K). To overcome this, a script was created that split each character’s images into train and test sets individually and then merged them, ensuring a well-balanced dataset.
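The per-character split can be sketched as a stratified split: group samples by label, split each group separately, then merge. The 20% test fraction and the fixed random seed are assumptions, and `per_class_split` is a hypothetical name. Note that this keeps each letter’s share the same in the train and test sets; it does not by itself equalize the number of images per letter.

```python
import random
from collections import defaultdict


def per_class_split(samples, test_fraction=0.2, seed=0):
    """Split each label's samples separately, then merge the pieces."""
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append((features, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])    # this letter's test share
        train.extend(group[n_test:])   # the rest goes to training
    # Shuffle the merged sets so letters are interleaved.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```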

#3: The English alphabet contains letters that look very similar to each other (e.g., B and P, D and O). During training and testing, the model struggled to classify these letters accurately. To minimize misclassification, the model was trained and tuned separately for these letters using a larger dataset.


Neural Network Model Configuration

For this tool, a Multi-Layer Perceptron (MLP) classifier was trained using backpropagation. The configuration of the neural network is as follows:

  • Hidden Layer Size: (100,100,100) i.e., 3 hidden layers with 100 neurons in each
  • Activation Function: logistic sigmoid, returns f(x) = 1 / (1 + exp(-x))
  • Solver for weight optimization: stochastic gradient-based optimizer (“Adam”)
  • Early Stopping (to avoid overfitting): True


Figure 4: Model Architecture

Results

Table 2: Results Summary

Number of samples (test set): 74,491
Correctly classified: 71,227
Accuracy: 95.6%

Character   Attempts   Correctly Classified   Accuracy   Misclassified With
A           2,774      2,696                  97.2%      B, H, K, N, R, X
B           1,734      1,572                  90.7%      A, D, E, G, H, R, S
C           4,682      4,544                  97.1%      E, G, L, O
D           2,027      1,776                  87.6%      B, O, P, Q
E           2,288      2,115                  92.4%      B, C, F, G, K, S
F           233        210                    90.4%      E, P, T
G           1,152      1,050                  91.1%      B, C, E, O, Q
H           1,444      1,278                  88.5%      A, B, K, N, R
I           224        196                    87.4%      J, L, T, Z
J           1,699      1,589                  93.5%      I, T, Z
K           1,121      1,008                  90.0%      A, E, H, M, N, R, X, Y
L           2,317      2,255                  97.3%      C, I
M           2,467      2,337                  94.7%      K, N, W
N           3,802      3,606                  94.9%      A, H, K, M, R
O           11,565     11,338                 98.0%      C, D, G, Q
P           3,868      3,778                  97.7%      D, F, R
Q           1,162      1,009                  86.8%      D, G, O
R           2,313      2,144                  92.7%      A, B, H, K, N, P
S           9,684      9,481                  97.9%      B, E
T           4,499      4,430                  98.5%      F, I, J
U           5,802      5,658                  97.5%      V, W
V           836        810                    96.8%      U, W, Y
W           2,157      2,017                  93.5%      M, U, V
X           1,254      1,163                  92.7%      A, K, Y
Y           2,172      2,055                  94.6%      K, X
Z           1,215      1,162                  95.6%      I, J
Figure 5: Accuracy per character
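Accuracy here is correctly classified divided by attempts: overall, 71,227 / 74,491 ≈ 0.956, i.e. 95.6%. The sketch below shows how a per-character table like the one above could be derived from `(true, predicted)` label pairs on the test set. `accuracy` and `per_class_report` are hypothetical helpers, not the author’s script.

```python
from collections import Counter, defaultdict


def accuracy(correct, attempts):
    """Fraction of attempts that were classified correctly."""
    return correct / attempts


def per_class_report(pairs):
    """Map each true label to (attempts, correct, accuracy, confused-with list)."""
    attempts, correct = Counter(), Counter()
    confused = defaultdict(set)
    for true, pred in pairs:
        attempts[true] += 1
        if pred == true:
            correct[true] += 1
        else:
            confused[true].add(pred)  # record which letter it was mistaken for
    return {
        label: (attempts[label], correct[label],
                accuracy(correct[label], attempts[label]),
                sorted(confused[label]))
        for label in sorted(attempts)
    }


print(f"{accuracy(71_227, 74_491):.1%}")  # 95.6%
```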

Future Expansion 

While this tool serves as a base model for bridging the communication gap, more work remains. Currently, the model can recognize letters and words, but with further development it could be extended to process phrases and paragraphs. Additionally, the UI/UX can be developed further so that the tool can be used by a wider audience.


Copyright Analytics India Magazine Pvt Ltd
