An unconstrained statistical matching algorithm for combining individual and household level geo-specific census and survey data

Mohammad-Reza Namazi-Rad, Robert TANTON, David Steel, Sumonkanti Das

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

The Population Census is an important source of statistical information in most countries that is capable of producing reliable estimates of population characteristics for small geographic areas. One limitation of a census is that there are many population characteristics that cannot be collected due to respondent burden or cost. This means that statistical agencies have to conduct population-based surveys to provide social, economic and demographic characteristics for a target population which are not captured by a large-scale census. These surveys are usually capable of producing direct estimates at the national level and for high-level regions but often cannot produce reliable estimates for smaller areas. Due to the increasing demand for comprehensive statistical information not only at the national level but also for sub-national domains, there is a wide discussion in the literature about the use of statistical techniques that combine survey with census data to provide more detailed, finer-level estimates. Where censuses and sample surveys are based on the same reporting units, statistical matching techniques can be employed to link the records from survey and census data where exact matching of reporting units is impossible due to confidentiality restrictions. These techniques can then provide the detailed social, economic and demographic information required for small areas. An approach is developed in this paper in which a close-to-reality synthetic population of individuals and households is generated from available census tables using an iterative proportional updating (IPU) method. Statistical matching using a nearest neighbour method is then used to impute survey data to the individuals and households in the synthetic population. To evaluate this approach, 2011 Bangladesh census data is used to generate a district-specific synthetic population of individuals and households. Matching is then performed by imputing the nearest possible records among the 2011 Bangladesh Demographic and Health Survey to estimate the wealth index for each household within the synthetic population. The results show that the method presented in this paper helps to achieve more representative estimates (compared with direct survey estimates), particularly for areas with small sample sizes where not many population units with different socio-demographic characteristics are included.
Original language: English
Pages (from-to): 3-14
Number of pages: 12
Journal: Computers, Environment and Urban Systems
Volume: 63
DOI: 10.1016/j.compenvurbsys.2016.11.003
Publication status: Published - May 2017

Fingerprint

census
population characteristics
Bangladesh
social economics
demographic survey
health survey
household
economics
district
cost
demand
method
costs
health

Cite this

@article{e7738acae7854c5289f80174e05ab22d,
title = "An unconstrained statistical matching algorithm for combining individual and household level geo-specific census and survey data",
abstract = "The Population Census is an important source of statistical information in most countries that is capable of producing reliable estimates of population characteristics for small geographic areas. One limitation of a census is that there are many population characteristics that cannot be collected due to respondent burden or cost. This means that statistical agencies have to conduct population based surveys to provide social, economic and demographic characteristics for a target population which are not captured by a large-scale census. These surveys are usually capable of producing direct estimates at the national level and high level regions but often cannot produce reliable estimates for smaller areas. Due to the increasing demand for comprehensive statistical information not only at the national level but also for sub-national domains, there is a wide discussion in the literature about the use of statistical techniques that combine survey with census data to provide more detailed, finer-level estimates. Where censuses and sample surveys are based on the same reporting units, statistical matching techniques can be employed to link the records from survey and census data where exact matching of reporting units is impossible due to confidentiality restrictions. These techniques can then provide the detailed social, economic and demographic information required for small areas. An approach is developed in this paper in which a close-to-reality synthetic population of individuals and households is generated from available census tables using an iterative proportional updating (IPU) method. Statistical matching using a nearest neighbour method is then used to impute survey data to the individuals and households in the synthetic population. To evaluate this approach, 2011 Bangladesh census data is used to generate a district-specific synthetic population of individuals and households. 
Matching is then performed by imputing the nearest possible records among the 2011 Bangladesh Demographic and Health Survey to estimate the wealth index for each household within the synthetic population. The results show that using the method presented in this paper helps with achieving more representative estimates (comparing with direct survey estimates) particularly for areas with small sample sizes where not many population units with different socio-demographic characteristics are included.",
keywords = "Imputation, K-nearest neighbours, Pseudo census, Small area estimation, Spatial microsimulation, Synthetic population",
author = "Mohammad-Reza Namazi-Rad and Robert TANTON and David Steel and Sumonkanti Das",
year = "2017",
month = "5",
doi = "10.1016/j.compenvurbsys.2016.11.003",
language = "English",
volume = "63",
pages = "3--14",
journal = "Computers, Environment and Urban Systems",
issn = "0198-9715",
publisher = "Elsevier Limited",

}

An unconstrained statistical matching algorithm for combining individual and household level geo-specific census and survey data. / Namazi-Rad, Mohammad-Reza; TANTON, Robert; Steel, David; Das, Sumonkanti.

In: Computers, Environment and Urban Systems, Vol. 63, 05.2017, p. 3-14.


TY - JOUR

T1 - An unconstrained statistical matching algorithm for combining individual and household level geo-specific census and survey data

AU - Namazi-Rad, Mohammad-Reza

AU - TANTON, Robert

AU - Steel, David

AU - Das, Sumonkanti

PY - 2017/5

Y1 - 2017/5

AB - The Population Census is an important source of statistical information in most countries that is capable of producing reliable estimates of population characteristics for small geographic areas. One limitation of a census is that there are many population characteristics that cannot be collected due to respondent burden or cost. This means that statistical agencies have to conduct population based surveys to provide social, economic and demographic characteristics for a target population which are not captured by a large-scale census. These surveys are usually capable of producing direct estimates at the national level and high level regions but often cannot produce reliable estimates for smaller areas. Due to the increasing demand for comprehensive statistical information not only at the national level but also for sub-national domains, there is a wide discussion in the literature about the use of statistical techniques that combine survey with census data to provide more detailed, finer-level estimates. Where censuses and sample surveys are based on the same reporting units, statistical matching techniques can be employed to link the records from survey and census data where exact matching of reporting units is impossible due to confidentiality restrictions. These techniques can then provide the detailed social, economic and demographic information required for small areas. An approach is developed in this paper in which a close-to-reality synthetic population of individuals and households is generated from available census tables using an iterative proportional updating (IPU) method. Statistical matching using a nearest neighbour method is then used to impute survey data to the individuals and households in the synthetic population. To evaluate this approach, 2011 Bangladesh census data is used to generate a district-specific synthetic population of individuals and households. 
Matching is then performed by imputing the nearest possible records among the 2011 Bangladesh Demographic and Health Survey to estimate the wealth index for each household within the synthetic population. The results show that using the method presented in this paper helps with achieving more representative estimates (comparing with direct survey estimates) particularly for areas with small sample sizes where not many population units with different socio-demographic characteristics are included.

KW - Imputation

KW - K-nearest neighbours

KW - Pseudo census

KW - Small area estimation

KW - Spatial microsimulation

KW - Synthetic population

UR - http://www.scopus.com/inward/record.url?scp=85007284251&partnerID=8YFLogxK

UR - http://www.mendeley.com/research/unconstrained-statistical-matching-algorithm-combining-individual-household-level-geospecific-census

U2 - 10.1016/j.compenvurbsys.2016.11.003

DO - 10.1016/j.compenvurbsys.2016.11.003

M3 - Article

VL - 63

SP - 3

EP - 14

JO - Computers, Environment and Urban Systems

JF - Computers, Environment and Urban Systems

SN - 0198-9715

ER -