Educational Training For Processing Invoice Of Vendor Identification And Payments Using Python-Tesseract

The aim of the project is to recognize invoices and receipts from various vendors through automated invoice processing, using various educational learning tools. Automated invoice processing is far better than manual invoice processing: it saves a significant amount of time and money, creating efficiencies and increasing the accuracy of the captured data. The invoices are read from scanned receipts using Python-tesseract, an optical character recognition (OCR) tool for Python that recognizes and reads the text embedded in images. Python-tesseract extracts key information, such as the bill or invoice number and the amount, from every receipt, and the extracted invoices and the total amount of all receipts given by vendors are imported into the database.

Extracting information from the receipt is also a big task. If the receipt contains the noise mentioned above, it is difficult to automate the invoice processing. Image processing and correction software makes automation easier, but extracting the text and converting the image to a string remains difficult. This job is done by an Optical Character Recognition (OCR) tool, the most common means of extracting the text embedded in an image such as a receipt. By using these tools, a company can avoid printing and paper costs, large amounts of lost time, and errors caused by humans.

Invoice processing is used not only in companies but also in taxation. Compared with manual invoice processing, automated invoice processing has many advantages, so it is now widely used to save man-hours, labor, transparency costs, paper costs, and printing costs. Automated invoice processing helps a company win more trade and build its reputation, and even small companies are shifting from manual to automated invoice processing.

Receipts are often noisy, which makes it difficult to extract information from them, so OCR tools and extraction tools are combined to automate the invoice processing. Image processing is required to deal with noisy text, faded text, and other disturbances, and is called the preprocessing step of automated invoice processing. Preprocessing methods such as noise removal, grey-scaling, and thresholding are mainly used to remove noisy text and disturbances. This step corrects the image and improves its contrast, so that the invoice processing can be automated easily, with accurate output and greater efficiency.
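As a rough illustration of the preprocessing step described above, the sketch below applies grey-scaling, noise removal, and thresholding with OpenCV. The function name `preprocess_receipt`, the median-filter kernel size, and the use of Otsu thresholding are assumptions made for illustration; the paper does not state which specific filters were used.

```python
def preprocess_receipt(path, blur_kernel=3):
    """Return a binarized version of the receipt image at `path`.

    Hypothetical helper: the filter choices are illustrative assumptions.
    """
    import cv2  # imported here so the module loads even without OpenCV

    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {path}")

    # Grey-scaling: collapse the three colour channels into one.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Noise removal: a small median filter suppresses speckle noise.
    denoised = cv2.medianBlur(gray, blur_kernel)

    # Thresholding: Otsu's method picks the black/white cut-off automatically.
    _, binary = cv2.threshold(
        denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )
    return binary
```

The returned array can be passed directly to an OCR call. Heavily faded receipts may need a different filter or an adaptive threshold instead.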
An Optical Character Recognition (OCR) tool extracts and reads the text embedded in an image, such as a scanned document, and converts the textual data in the scanned copy to text. OCR is achieved in two steps: first, the text in the scanned image is detected; second, the detected text is read, or recognized. In this project the invoices are processed using Python-tesseract, an OCR tool, which is used to automate the invoice processing.
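The two steps, detection and recognition, are both visible in the output of pytesseract's `image_to_data`, which reports a bounding box and the recognized text for each word. The sketch below is a minimal example under stated assumptions: the helper names `confident_words` and `read_words` and the confidence cut-off of 60 are hypothetical and do not come from the paper.

```python
def confident_words(data, min_conf=60.0):
    """Keep recognised words whose confidence is at least `min_conf`.

    `data` is a dict shaped like the output of
    pytesseract.image_to_data(..., output_type=Output.DICT).
    Returns (text, (left, top, width, height)) tuples.
    """
    words = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])  # -1 marks rows that are not words
        if text.strip() and conf >= min_conf:
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            words.append((text, box))
    return words


def read_words(path, min_conf=60.0):
    """Run Tesseract on an image file and return the confident words."""
    import pytesseract  # requires the Tesseract binary to be installed
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(path), output_type=pytesseract.Output.DICT
    )
    return confident_words(data, min_conf)
```

The bounding boxes correspond to the detection step and the strings to the recognition step. Filtering on confidence is one simple way to discard text read from smudges or stains.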

AIM AND OBJECTIVE
Automated invoice processing helps a company process its invoices more efficiently. First, a company or organization receives invoices or receipts from its vendors. These receipts must then be scanned without margin errors or any other mistakes. The scanned receipts are given as input to the Optical Character Recognition tool, Python-tesseract, which displays the invoice number and the amount paid to the vendor. The vendor can therefore be paid on time, based on more accurate data. The benefits include:

1. Reduced paper and printing costs.
2. Time saved on manually entering financial data into the systems.
3. Improved accuracy of the data.
4. Fewer errors in supplier payments, such as overpayments and duplicated payments.
5. Less time spent on the phone dealing with supplier queries.
By automating invoice processing, errors caused by humans are minimized, and the time taken to write down all the details in the receipts is reduced. Automation improves the accuracy of the data, eliminates the risk of missing invoices, and reduces paper and printing costs.
In earlier days all accounting tasks were handled manually, and as a result there were many errors. With today's technology, there is no reason to rely on email and spreadsheets to handle invoicing needs.
Manual invoice processing carries a high risk of human error and may not allow all invoices to be paid by the payment date. It requires a large amount of time, man-hours, paper, and printing, and the accuracy of the resulting data is poor.

OpenCV offers a large number of algorithms. These algorithms can be used for various tasks, such as detecting and recognizing faces, identifying objects, classifying human actions in videos, tracking camera movements, tracking moving objects, extracting 3D models of objects, stitching images together to produce a single high-resolution image of an entire scene, finding similar images in a database, removing excess light, color, or flash from an image, and following eye movements in a video. Several thousand people use OpenCV, and it is widely used by companies, governments, and researchers. Companies that use it include Microsoft, Google, Honda, IBM, Yahoo, Sony, and Intel, and alongside these well-established companies there are many startups that use the library, such as VideoSurf and Applied Minds [4].

Matplotlib is a library that is used for visualization in Python.

Matplotlib [5] is a multiplatform visualization library used for plotting in Python, and it is particularly well suited to plotting 2D arrays. It is built on NumPy arrays and was introduced by John Hunter in 2002. Matplotlib gives visual access to large amounts of data in an easily digestible form and offers several plot types, such as scatter, histogram, line, and bar plots. It is used in many applications for plotting and displaying images, which makes it a natural choice whenever data needs to be visualized.
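The paper does not say what was plotted with Matplotlib. As one plausible use, the sketch below aggregates invoice amounts per vendor and draws them as a bar chart. The function names, the vendor data format, and the output file name are all assumptions made for illustration.

```python
from collections import defaultdict


def totals_by_vendor(records):
    """Sum invoice amounts per vendor from (vendor, amount) pairs."""
    totals = defaultdict(float)
    for vendor, amount in records:
        totals[vendor] += amount
    return dict(totals)


def plot_totals(totals, out_path="totals.png"):
    """Draw a bar chart of per-vendor totals and save it to `out_path`."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(totals.keys()), list(totals.values()))
    ax.set_xlabel("Vendor")
    ax.set_ylabel("Total amount")
    ax.set_title("Invoice totals per vendor")
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

Inside a Jupyter notebook the `Agg` backend line can be dropped so that the figure is shown inline.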

FLOWCHART AND EXPLANATION
There are various steps involved in automating the invoices to recognize the invoice number and total amount.

FIGURE 2. Flow chart of the code written for the process
These steps are: collecting the receipts, scanning them without errors, preprocessing the images, and passing them through Python-tesseract to obtain the invoice number and total amount of the receipts given by the vendors, as shown in Fig. 2. Before running the code, all the required software must be installed. First, create a folder to store the receipts of a vendor [7], [8], [9]. Then open Anaconda and launch Jupyter Notebook. In the window that opens, select Desktop, check that the created folder is present, and click New at the top right to create a Python 3 notebook in which to write the code. Begin by writing the code that imports all the required libraries, such as regular expressions and cv2. Then write the code to load the image and, using Tesseract, convert the image to text. From the converted text, find the invoice number and total amount for the vendor with the appropriate code. Run the program after each block of code; the output is then displayed.
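The last two steps, converting the image to text and finding the invoice number and total amount, might look like the sketch below. The regular expressions assume labels such as "Invoice No: INV-1042" and "Total Amount: $1,250.00". Real receipts vary widely, so both patterns and both function names are illustrative assumptions rather than the code used in the paper.

```python
import re

# Assumed label formats; the invoice number must contain at least one digit.
INVOICE_RE = re.compile(
    r"\binvoice\s*(?:no\.?|number|#)?\s*[:#]?\s*([A-Z0-9-]*\d[A-Z0-9-]*)",
    re.IGNORECASE,
)
# \b keeps "Subtotal" from being mistaken for "Total".
TOTAL_RE = re.compile(
    r"\btotal(?:\s+amount)?\s*[:=]?\s*[$€£]?\s*(\d[\d,]*(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_fields(text):
    """Return (invoice_number, total) from OCR text; None for a missing field."""
    invoice = INVOICE_RE.search(text)
    total = TOTAL_RE.search(text)
    number = invoice.group(1) if invoice else None
    amount = float(total.group(1).replace(",", "")) if total else None
    return number, amount


def process_receipt(path):
    """Convert a receipt image to text with Tesseract, then extract the fields."""
    import pytesseract  # requires the Tesseract binary to be installed
    from PIL import Image

    text = pytesseract.image_to_string(Image.open(path))
    return extract_fields(text)
```

Keeping the text parsing separate from the OCR call makes it possible to check the patterns against sample text before running them on scanned images.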

RESULTS
Fig. 3 shows the output of the software: the invoice values extracted from all receipts, and the total amount of all receipts given by vendors, imported into the database. Automation processes the invoices to recognize the invoice number and total amount in less time and at lower cost. By using Python-tesseract to automate invoice processing, the accuracy of the captured data is improved and errors are minimized, creating efficiencies. In earlier days all accounting tasks were done manually, and as a result there were many errors. With automated invoice processing, error rates are minimized, preventing error-related issues such as delayed approvals and time spent searching for and correcting mistakes. Automation also improves the accuracy of the data, eliminates the risk of missing invoices, and reduces paper and printing costs.
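The paper does not name the database that receives the results. Assuming SQLite purely for illustration, the import step and the running total could be sketched as follows. The table layout and the function name `store_invoices` are hypothetical.

```python
import sqlite3


def store_invoices(conn, rows):
    """Insert (vendor, invoice_number, total) rows and return the grand total."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS invoices (
               vendor TEXT NOT NULL,
               invoice_number TEXT NOT NULL,
               total REAL NOT NULL,
               UNIQUE (vendor, invoice_number)
           )"""
    )
    # INSERT OR IGNORE skips rows that repeat an existing (vendor, invoice_number).
    conn.executemany(
        "INSERT OR IGNORE INTO invoices (vendor, invoice_number, total) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    (grand_total,) = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM invoices"
    ).fetchone()
    return grand_total
```

The uniqueness constraint is a design choice aimed at the duplicated-payment problem: a receipt scanned twice is stored, and counted, only once.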
FIGURE 1. Block Diagram representing the text embedded from the image

FIGURE 3. Result shown by all receipts in software

DISCUSSION AND CONCLUSIONS
Automation processes the invoices to recognize the invoice number and total amount in less time and at lower cost. By using Python-tesseract to automate invoice processing, the accuracy of the captured data is improved and errors are minimized, creating efficiencies. In earlier days all accounting tasks were done manually, and as a result there were many errors. With automated invoice processing, error rates are minimized, preventing error-related issues such as delayed approvals and time spent searching for and correcting mistakes. Automation also improves the accuracy of the data, eliminates the risk of missing invoices, and reduces paper and printing costs.
Supported operating systems: Windows 7 (32- or 64-bit), Windows 8 (32- or 64-bit), Windows 10 (32- or 64-bit).
Supported development environment: Python, Anaconda, Tesseract, OpenCV, Matplotlib.

Optical Character Recognition (OCR) is a tool used to obtain the printed or handwritten text embedded in a scanned image. An OCR tool consists of both hardware and software, which together enable it to capture the information embedded in the scanned image or receipt and to read, or recognize, the text it contains. OCR involves two steps: detecting the text in the scanned copy of the image or receipt, and recognizing that text.

First, the OCR tool requires a scanned copy of the receipt or image. It then converts the scanned image or receipt to black and white and analyzes it to recognize the text and characters it contains, as shown in fig. 2.4.1.2. There are two methods, or algorithms, for recognizing the characters and text in the scanned image: pattern recognition and feature detection. In pattern recognition, the OCR program is supplied with examples of text in various fonts and formats, and these are compared with the text in the scanned image or receipt. In feature detection, the program uses rules that describe how specific letters, numbers, and characters are written, and these rules are compared with the letters, numbers, and characters found in the scanned image or receipt.

OCR has several advantages. It saves a significant amount of time, reduces human error, and minimizes effort. It can store a large number of images or receipts and compress the files. A scanned image or receipt can be edited, and scanned images, documents, and files can be searched at any time [1].

Python-tesseract is an OCR tool, used here to process vendor invoices and recognize the invoice number and total amount. Other OCR tools are available, but in this project Python-tesseract is preferred as more efficient and as more accurate at recognizing the text in a scanned image or receipt. Tesseract can be accessed from many programming languages; this project uses Python. Python-tesseract recognizes the text embedded in the scanned image or receipt, as shown in fig. 2.4.2, and it can also recognize the text in very large documents or files [2].

OpenCV stands for Open Source Computer Vision. It is a library used for image processing [3], and it is an open-source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications. Because it is open source, its code can be edited or modified to suit particular requirements. OpenCV provides around two thousand five hundred algorithms for performing various tasks.
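To make the pattern-recognition method from the OCR description above concrete, the toy sketch below compares a black-and-white glyph with stored templates and picks the closest one. The 3x5 bitmaps and the function names are invented for illustration. This is not how Tesseract works internally; it only shows the idea of matching a scanned character against known examples.

```python
# Toy 3x5 glyph templates ("1" = ink, "0" = background).
TEMPLATES = {
    "1": ("010",
          "110",
          "010",
          "010",
          "111"),
    "7": ("111",
          "001",
          "010",
          "010",
          "010"),
    "0": ("111",
          "101",
          "101",
          "101",
          "111"),
}


def similarity(glyph, template):
    """Fraction of pixels on which two same-sized bitmaps agree."""
    pairs = [(g, t)
             for g_row, t_row in zip(glyph, template)
             for g, t in zip(g_row, t_row)]
    return sum(g == t for g, t in pairs) / len(pairs)


def recognise(glyph):
    """Return the template character that best matches `glyph`."""
    return max(TEMPLATES, key=lambda ch: similarity(glyph, TEMPLATES[ch]))
```

A glyph with one noisy pixel still matches the right template because the remaining pixels agree, which is also why cleaning the image during preprocessing improves recognition.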