Abstract: |
This paper presents an application of neural network models to predictive
classification for data quality control. Our aim is to identify data affected
by measurement error in the Bank of Italy’s business surveys. We build an
architecture consisting of three feed-forward networks for variables related
to employment, sales and investment, respectively. The networks are trained on
input matrices extracted from the error-free final survey database for the
2003 wave and then subjected to stochastic transformations reproducing known error
patterns. A binary indicator of unit perturbation is used as the output
variable. The networks are trained with the Resilient Propagation learning
algorithm. On the training and validation sets, correct predictions occur in
about 90 per cent of the records for employment, 94 per cent for sales, and 75
per cent for investment. On independent test sets, the respective shares
average 92, 80 and 70 per cent. On our data, neural networks perform much
better as classifiers than logistic regression, one of the most popular
competing methods. They appear to provide a valid means of
improving the efficiency of the quality control process and, ultimately, the
reliability of survey data. |
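
The data-construction step described in the abstract (perturbing error-free records and recording a binary flag for each perturbed unit) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `perturb_records`, the `error_rate` and `scale` parameters, the single error pattern (one variable per affected unit multiplied by a constant, as in a unit-of-measure slip), and the synthetic lognormal data. It does not reproduce the paper's actual transformations, survey variables, or network training.

```python
import numpy as np


def perturb_records(clean, error_rate=0.3, scale=1000.0, seed=0):
    """Return (X, y): a perturbed copy of `clean` and a 0/1 flag per row.

    Hypothetical error pattern: in each affected row, one randomly chosen
    variable is multiplied by `scale`.
    """
    rng = np.random.default_rng(seed)
    X = np.array(clean, dtype=float)  # np.array copies, so `clean` is untouched
    n_rows, n_cols = X.shape

    # Draw which units (rows) receive an error; this is the output variable.
    y = (rng.random(n_rows) < error_rate).astype(int)

    # For each perturbed unit, pick one variable and rescale it.
    hit_rows = np.flatnonzero(y)
    hit_cols = rng.integers(0, n_cols, size=hit_rows.size)
    X[hit_rows, hit_cols] *= scale
    return X, y


# Synthetic stand-in for an error-free survey extract (not real data).
clean = np.random.default_rng(1).lognormal(mean=3.0, sigma=1.0, size=(200, 4))
X, y = perturb_records(clean)
```

The pair `(X, y)` would then serve as input matrix and target for any binary classifier, whether a feed-forward network or a logistic regression baseline.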