Identifying Cancer Characteristics Utilizing Handwriting Method

HOD, Department of Computer Engineering, JSPM’s Rajarshi Shahu College of Engineering, Savitribai Phule University of Pune, India. Department of Computer Engineering, JSPM’s Rajarshi Shahu College of Engineering, Savitribai Phule University of Pune, India. Department of Computer Engineering, JSPM’s Rajarshi Shahu College of Engineering, Savitribai Phule University of Pune, India. Department of Computer Engineering, JSPM’s Rajarshi Shahu College of Engineering, Savitribai Phule University of Pune, India. Department of Computer Engineering, JSPM’s Rajarshi Shahu College of Engineering, Savitribai Phule University of Pune, India.


Introduction
In handwriting analysis, various characteristics are taken into consideration when analysing and matching a handwriting sample. Line quality gives an idea of the strength, thickness and flow with which the letters are formed, for example whether the letters are shaky, flowing or very thick. Letter spacing is the amount of space left between individual letters. The dimensions of the handwriting are analysed based on the height, width and size of the letters. While writing, a person may lift the pen between letters or keep the letters connected, and may pause before starting a new letter. Pen lifts and connecting strokes capture these separations, for instance whether capital letters are connected to the lowercase letters that follow. Stroke characteristics describe how the writer begins and ends words: does the writer end a letter with a curl, an upstroke or a downstroke? Pen pressure is also analysed, including where in the writing the pressure is greatest. Slant describes the direction in which each letter leans: to the left, to the right, or not at all. Most people tend to slant their writing to the right. Finally, a writer may tend to write above, below or on the line, which is referred to as the baseline behaviour of the writer.
All activities, including writing, start in the brain. Like all other actions, the act of writing depends on the central nervous system [1]. The brain sends impulses to the hand through nerve signals, producing the motor act. Graphology is a pseudoscience based on a combination of psychoanalysis and neuroscience, rooted in the subconscious mind. Although handwriting is produced with a pen, its movement is governed by the central nervous system in a process that is usually unconscious but highly revealing [2]. Handwriting is closely tied to impulses from the brain and can therefore be reliably used to predict the physical, emotional and mental health of an individual [3]. Handwriting analysis is used to detect disturbances in the subject's handwriting.
This paper follows the same handwriting-analysis procedure to identify individuals whose handwriting features are comparable to those of cancer patients. The first step is to train the computer using machine learning. Collecting a digital handwriting sample from a person and having the computer make a prediction from that sample is an economical and convenient method. An individual provides a digital handwriting sample on a tablet, from which raw data are recorded. The data are uploaded to a computer, features are estimated with the help of image processing techniques, and the characteristics of the writer are predicted.
Digital handwriting analysis consists of five standard test cases, through which data samples are collected from each individual. These five test cases are important because all features depend on the raw measurements they produce, such as pressure, x coordinate, y coordinate and time spent on the tablet. From these measurements, 11 different features are extracted, which are important because the characteristics of cancer patients lie in them.
These samples are used as input to support vector machine (SVM), Naïve Bayes and k-nearest neighbours (KNN) classifiers. The proposed algorithm is simple and easy to implement.

Proposed System
This paper proposes a system that uses digitized handwriting analysis to identify cancer characteristics. The proposed system consists of four modules. Each individual completes a standard draw-and-write test for handwriting analysis on a Wacom Intuos tablet. Each set contains 100 handwriting samples. Five files are obtained per individual, one for each test case of the standard test. Features are then extracted from these files and used to train and test a suitable classifier, whose accuracy in identifying cancer characteristics is then measured.

Tablet Analysis
This research uses handwriting samples collected digitally from different individuals through a standard test on a Wacom tablet. The data comprise handwriting samples from 100 individuals who were diagnosed with cancer and another 100 handwriting samples from individuals who were not diagnosed with cancer. The standard test consists of five test cases, each focusing on a different aspect of handwriting. Each participant was asked to complete the five tests on the tablet in running hand. Test 1 consists of copying intersecting pentagons [4]. Test 2 consists of writing three words in block letters. Test 3 is to write a sentence on the tablet. Test 4 is to draw an analog clock showing the time 5:15 without writing the numbers. Test 5 is to draw a house with a tree and a person. Each test produces one file containing five parameters: (i) timestamp, (ii) count in microseconds, (iii) X coordinate, (iv) Y coordinate and (v) pressure. Each record in a file gives the values of these parameters at that instant of the test.
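As a minimal sketch of how one such file might be read, assume each record is a comma-separated line holding the five parameters in the order listed above. The actual export format of the tablet software is not specified here, and the names PenSample and parse_records are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PenSample:
    """One tablet reading: the five parameters recorded per instant."""
    timestamp: str    # wall-clock timestamp as written by the recorder
    count_us: int     # elapsed count in microseconds
    x: float          # X coordinate of the pen tip
    y: float          # Y coordinate of the pen tip
    pressure: float   # pen pressure (0 is assumed to mean the pen is lifted)


def parse_records(text: str) -> List[PenSample]:
    """Parse comma-separated records, skipping blank lines.

    Assumed layout (not confirmed by the source):
    timestamp,count_us,x,y,pressure
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        timestamp, count_us, x, y, pressure = line.split(",")
        samples.append(
            PenSample(timestamp.strip(), int(count_us),
                      float(x), float(y), float(pressure))
        )
    return samples


# Made-up records, for illustration only.
example = """
10:15:00.000,0,120.0,340.0,0.42
10:15:00.010,10000,121.5,341.0,0.47
10:15:00.020,20000,123.0,343.5,0.00
"""
records = parse_records(example)
```

Keeping each reading as a typed record makes the later feature computations independent of the file layout, so only parse_records would need to change if the real format differs.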

Feature Extraction
Feature extraction is a dimensionality-reduction technique applied to high-dimensional input data. The reduced output is a transformation of the high-dimensional input, represented as a feature vector [5]. In our case, the 7 important factors on which the proposed system bases its prediction of cancer characteristics are taken as the features. These features are extracted from the 5 files generated by the tests and are used for the handwriting analysis, each representing a distinct characteristic developed by cancer patients.
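The reduction from a raw pen trace to a feature vector can be sketched as follows. The features computed here (duration, mean pressure, path length, pen lifts, mean speed) are illustrative assumptions and are not the feature set of the proposed system. The function name extract_features and the tuple layout are likewise hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One reading: (count_us, x, y, pressure).
# Pressure 0 is assumed to mean the pen is lifted.
Reading = Tuple[int, float, float, float]


def extract_features(readings: List[Reading]) -> Dict[str, float]:
    """Reduce a raw pen trace to a small feature vector (illustrative only)."""
    if len(readings) < 2:
        raise ValueError("need at least two readings")

    # Total time between the first and last reading, in seconds.
    duration_s = (readings[-1][0] - readings[0][0]) / 1_000_000

    # Mean pressure over readings where the pen touches the surface.
    touching = [p for _, _, _, p in readings if p > 0]
    mean_pressure = sum(touching) / len(touching) if touching else 0.0

    # Path length: summed distance between consecutive points
    # (this includes movement made while the pen is lifted).
    pairs = list(zip(readings, readings[1:]))
    path_length = sum(math.hypot(b[1] - a[1], b[2] - a[2]) for a, b in pairs)

    # A pen lift is a transition from positive pressure to zero pressure.
    pen_lifts = sum(1 for a, b in pairs if a[3] > 0 and b[3] == 0)

    mean_speed = path_length / duration_s if duration_s > 0 else 0.0

    return {
        "duration_s": duration_s,
        "mean_pressure": mean_pressure,
        "path_length": path_length,
        "pen_lifts": float(pen_lifts),
        "mean_speed": mean_speed,
    }


# Made-up trace, for illustration only.
trace = [
    (0, 0.0, 0.0, 0.5),
    (10000, 3.0, 4.0, 0.7),
    (20000, 3.0, 4.0, 0.0),
    (30000, 6.0, 8.0, 0.6),
]
features = extract_features(trace)
```

Each test file would yield one such dictionary, and the values from the five files could then be concatenated into a single feature vector per individual.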

Classification
One of the main steps in building a classifier is training it on a database of handwriting samples obtained from cancer patients. In a preliminary study of numerous classification models, linear support vector machines (SVMs) and Naïve Bayes (binary) classification stood out, though not by a large margin. Linear SVMs and Naïve Bayes have been used widely across classification, regression and ranking applications in science, medicine and engineering, and are known for their strong empirical performance. In all subsequent work, linear SVMs and Naïve Bayes were used to train and test the models. The classifier was set up so that the two classes of samples are associated with the two outputs (desirable or undesirable) of the classifier.
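To make the binary Naïve Bayes step concrete, a minimal sketch is given below. It assumes a Gaussian likelihood for each feature, since the features are continuous. The variant actually used is not stated here, and the class name, the feature values and the labels are all made up for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for small, dense feature vectors."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)

        self.priors: Dict[int, float] = {}
        self.means: Dict[int, List[float]] = {}
        self.variances: Dict[int, List[float]] = {}
        for label, rows in grouped.items():
            n = len(rows)
            self.priors[label] = n / len(X)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            # A small constant keeps every variance strictly positive.
            self.variances[label] = [
                sum((v - m) ** 2 for v in col) / n + 1e-9
                for col, m in zip(columns, means)
            ]
            self.means[label] = means
        return self

    def _log_posterior(self, row: Sequence[float], label: int) -> float:
        # log prior plus the sum of per-feature Gaussian log-likelihoods
        # (features are treated as conditionally independent).
        log_prob = math.log(self.priors[label])
        for value, mean, var in zip(row, self.means[label], self.variances[label]):
            log_prob += -0.5 * math.log(2 * math.pi * var)
            log_prob -= (value - mean) ** 2 / (2 * var)
        return log_prob

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [
            max(self.priors, key=lambda label: self._log_posterior(row, label))
            for row in X
        ]


# Made-up two-feature vectors; 1 and 0 stand for the two classes.
X_train = [[0.30, 1.9], [0.35, 2.1], [0.32, 2.0],
           [0.60, 1.0], [0.65, 1.2], [0.62, 1.1]]
y_train = [1, 1, 1, 0, 0, 0]

model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[0.31, 2.0], [0.63, 1.05]])
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied together.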

Experiment and Result
To accurately evaluate the performance of the trained classifiers on handwriting samples, tenfold cross-validation was used. This procedure delivers an unbiased estimate of classifier performance on unseen data. In this experiment, 60 handwriting samples were collected from each group, cancer patients and non-cancer patients, giving 120 samples in total. The data were split in an 80-20 ratio for training and testing. The training portion (80%) consists of 96 samples, on which an accuracy of 84% was obtained. The remaining 20% of the dataset, 24 samples, was used for testing, and an accuracy of 92% was obtained. The accuracy thus rose from 84% on the training data to 92% on the test data, which is a respectable result.
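The split arithmetic can be checked with a short sketch: 96 training samples plus 24 test samples give 120 in total, and 20% of 120 is 24. How the tenfold cross-validation was combined with the hold-out split is not stated, so the sketch assumes the folds are formed inside the training portion. The function names and the fixed random seed are hypothetical.

```python
import random
from typing import List, Tuple


def train_test_split_indices(
    n_samples: int, test_fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and hold out a fraction of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(
    n_samples: int, k: int, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (training, validation) pairs of positions 0..n_samples-1."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled positions into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits


# 120 samples in total, split 80-20 into 96 training and 24 test samples.
train_idx, test_idx = train_test_split_indices(120, 0.2)

# Tenfold cross-validation inside the training portion (an assumption);
# the returned positions index into train_idx, not the full dataset.
cv_splits = k_fold_indices(len(train_idx), 10)
```

With 96 training samples and ten folds, six folds hold 10 samples and four hold 9, so every training sample is used for validation exactly once.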

Performance Measure
The performance of the trained classifier on the database was assessed by calculating its accuracy. Accuracy is a standard metric for classifier performance: it measures the fraction of handwriting samples that are classified correctly, so higher accuracy indicates better output.
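The metric amounts to dividing the number of correct predictions by the number of samples. A minimal sketch follows, in which the label lists are made up for illustration and the function name accuracy is hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels; 1 and 0 stand for the two classes.
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]
score = accuracy(y_true, y_pred)  # 6 of the 8 predictions match
```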

Results
The SVM classifier made correct predictions for about 77 out of 100 writing samples, which corresponds to an accuracy of 76.6%, whereas the Naïve Bayes classifier made correct predictions for about 99 out of 100 writing samples, which gives an accuracy of 99.6%.

Conclusion
Evaluating handwriting samples to recognize cancer characteristics of the writer is considerably beneficial in the current scenario. The procedure detects characteristics that indicate the condition of the writer. It would be useful for finding out whether a writer may be suffering from cancer. If so, the necessary treatment or support can be provided so that they can overcome it.