Hugues Annoye
Cédric Heuchenne
Alessandro Beretta
(2025).
Statistical matching using autoencoders-canonical correlation analysis, kernel canonical correlation analysis and multi-output multilayer perceptron.
Knowledge-Based Systems.
Large amounts of data are gathered every day, through surveys and other sources. Many users need variables that come from different data sources, which creates a need for methods to combine them. Statistical matching is a recognized practice for combining such data sets. In this paper, we investigate Autoencoders-Canonical Correlation Analysis (A-CCA) and extend it to statistical matching. A-CCA is an extension of Kernel Canonical Correlation Analysis (KCCA) that reduces the need for kernels, with the added benefit of dimensionality reduction. It can also be regarded as an extension of Deep Canonical Correlation Analysis (DCCA), providing enhanced flexibility that makes it well suited for statistical matching. The method is designed to handle various variable types, sampling weights and incompatibilities among categorical variables. We compare its performance with that of methods based on KCCA or on a Multi-output Multilayer Perceptron (MMLP), using the 2017 Belgian Statistics on Income and Living Conditions (SILC). We divide this data set into two parts and treat them as if they came from two different sources.
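The evaluation design described in the last sentence of the abstract can be illustrated with a minimal sketch: a single data set is split at random into two files, and each file keeps the shared variables plus one block of variables that is hidden in the other. The function name, the variable names (`age`, `region`, `income`, `rent`), the 50/50 split and the toy records are all assumptions made for illustration; they are not taken from the paper or from the SILC data.

```python
import random


def split_into_two_sources(records, common, only_a, only_b, seed=0):
    """Randomly split records into two files that mimic separate sources.

    File A keeps the common variables plus `only_a`;
    file B keeps the common variables plus `only_b`.
    (Hypothetical helper, not the authors' procedure.)
    """
    rng = random.Random(seed)
    shuffled = records[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2  # assumed 50/50 split
    file_a = [{k: r[k] for k in common + only_a} for r in shuffled[:half]]
    file_b = [{k: r[k] for k in common + only_b} for r in shuffled[half:]]
    return file_a, file_b


# Toy records with hypothetical variable names (not the SILC variables).
records = [
    {"age": 20 + i, "region": i % 3, "income": 1000 + 50 * i, "rent": 400 + 10 * i}
    for i in range(10)
]

file_a, file_b = split_into_two_sources(
    records, common=["age", "region"], only_a=["income"], only_b=["rent"]
)
```

In a setup like this, the hidden values are still known to the analyst, so they could serve as a reference when judging how well a matching method recovers them.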
Field: scientific publication