Our Main Challenges in Document Classification
A retail bank approached MindCraft for help with document classification. The bank has an incoming queue of documents that have been scanned or photographed with a camera or cell phone. Before the information can be processed, OCRed, and stored, each document needs to be classified by type, because different types of content call for different processing. Some documents can easily be captured field by field and OCRed. Others, such as handwritten ones, need manual tagging before they are stored. The types of documents vary, for example:
- a regular printed letter
- a handwritten document containing a table
- a mixed-type document
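
To make the role of the classification step concrete, here is a minimal sketch of how a predicted document type could decide the downstream processing path. The names `DocType`, `Route`, and `route_document` are hypothetical and used only for illustration. Sending mixed-type documents to manual tagging is an assumption, since the description above only states how printed and handwritten content are handled.

```python
from enum import Enum


class DocType(Enum):
    """Document types from the list above (hypothetical labels)."""
    PRINTED = "printed"
    HANDWRITTEN = "handwritten"
    MIXED = "mixed"


class Route(Enum):
    """Downstream processing paths (hypothetical labels)."""
    FIELD_OCR = "field_ocr"            # capture fields, then OCR
    MANUAL_TAGGING = "manual_tagging"  # a person tags the document before storage


def route_document(doc_type: DocType) -> Route:
    """Pick a processing path from the predicted document type."""
    if doc_type is DocType.PRINTED:
        return Route.FIELD_OCR
    # Handwritten content needs manual tagging, as described in the text.
    # Assumption: mixed-type documents take the same conservative path.
    return Route.MANUAL_TAGGING


if __name__ == "__main__":
    for doc_type in DocType:
        print(doc_type.value, "->", route_document(doc_type).value)
```

In this sketch the classifier itself is left out: whatever model predicts the type only has to return one of the `DocType` values for the routing to work.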