Thesis
Semi-automated indexing of handwritten tabular forms
Washington State University
Master of Science (MS), Washington State University
2012
Handle:
https://hdl.handle.net/2376/103072
Abstract
The massive amount of information in historical handwritten documents spread throughout the world is beyond the reach of the general public because it has not been indexed, that is, converted into a digital symbolic representation. Currently, thousands of volunteers donate their time to manually index old documents to make them available. Not only does this approach depend on volunteers, but the amount of work still exceeds what even these thousands of volunteers can complete. In this research, we present a method that assists a user indexing U.S. census records by automatically indexing as much of the document as possible. We do this by detecting the tabular structure of the record, extracting the content of the cells, and then using unsupervised handwriting recognition algorithms to decipher the cell content. We present a novel approach to classifying the contents of the census record using a self-organizing map machine learning algorithm. We compare the results obtained with various machine learning algorithms. We also present a user interface that indexers can use to correct classification errors. Because the majority of the information in the documents is classified automatically, indexers simply need to review the automatic suggestions and fill in the gaps, saving hours of tedious manual indexing.
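As a rough illustration of the self-organizing map idea named in the abstract, the following minimal Python sketch trains a small one-dimensional map on unlabeled feature vectors and then assigns each vector to its best-matching unit. It is not the thesis's implementation: the two-dimensional feature vectors, the function names `train_som` and `best_matching_unit`, the map size, and the learning-rate and radius schedules are all assumptions chosen for the example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_som(samples, n_units=4, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D self-organizing map on unlabeled feature vectors.

    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Initialize each unit from a randomly chosen sample.
    weights = [list(rng.choice(samples)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 1e-3)   # neighborhood shrinks over time
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(weights, x)
            for i in range(n_units):
                # Gaussian neighborhood on the 1-D grid of units.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                weights[i] = [w + lr * h * (v - w)
                              for w, v in zip(weights[i], x)]
    return weights


# Hypothetical 2-D features for two kinds of cell content (made-up values).
samples = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (0.1, 0.1),
           (0.9, 1.0), (1.0, 0.9), (0.95, 0.95), (0.9, 0.9)]
weights = train_som(samples)
for s in samples:
    print(s, "-> unit", best_matching_unit(weights, s))
```

In a setting like the one the abstract describes, each map unit would act as an automatically discovered class of cell content, and an indexer would review or correct the suggested assignments rather than transcribe every cell by hand.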
Details
- Title
- Semi-automated indexing of handwritten tabular forms
- Creators
- Russell S. Jensen
- Contributors
- Robert R. Lewis (Degree Supervisor)
- Awarding Institution
- Washington State University
- Academic Unit
- Electrical Engineering and Computer Science, School of
- Theses and Dissertations
- Master of Science (MS), Washington State University
- Publisher
- Washington State University; [Pullman, Washington]
- Identifiers
- 99900525176701842
- Language
- English
- Resource Type
- Thesis