Economic Review

Abstract

Vol. 73, No. 1, pp. 15-28 (2022)

“Textizing Statistical Tables using OCR at Scale”
Yutaka Arimoto (Institute of Economic Research, Hitotsubashi University)

This study describes the requirements and methods for textizing statistical tables using OCR (optical character recognition) at scale. A major challenge in textizing statistical tables with OCR is analyzing the table layout with high accuracy. I develop a Python toolkit, ocrstats, which supports the task by providing batch processing, automation of routine processes, use of external OCR, and table layout analysis with practical accuracy. I also share practical tips learnt from textizing the Japan Imperial Statistical Yearbook using ocrstats.