IJAM: Volume 37, No. 2 (2024)

DOI: 10.12732/ijam.v37i2.3

 

AN INTRODUCTION TO MACHINE LEARNING METHODS IN SAMPLE SURVEYS

 

Pankaj Das

 

Division of Sample Surveys

ICAR-Indian Agricultural Statistics Research Institute

New Delhi - 110012, INDIA

 

Abstract. Machine learning is revolutionizing sample surveys by improving how survey data are collected, analyzed, and used. It combines advanced statistical techniques with computational algorithms to enhance survey sampling methods and data quality. Machine learning algorithms can optimize survey sample design by identifying relevant variables, detecting patterns, and constructing efficient sampling strategies. They also assist in preprocessing and cleaning survey data: automatically detecting errors, imputing missing values, and handling outliers. Moreover, machine learning enables predictive modeling and estimation in sample surveys, using large-scale data to build models that predict outcomes, estimate population parameters, and uncover complex relationships among variables. Integrating machine learning into survey practice leads to more efficient and informative surveys, which benefits decision-making across many domains. Overall, machine learning has the potential to transform sample surveys, enabling more accurate predictions and estimates and making surveys more effective. This study describes current applications of machine learning in sample surveys and their potential future applications.
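The abstract mentions using machine learning predictions to estimate population parameters. As a minimal sketch (an illustration, not the method of this paper), the Python code below shows a model-assisted difference estimator of a population mean under simple random sampling without replacement. It assumes that an auxiliary variable x is known for every population unit and uses a k-nearest-neighbours regressor as the machine learning model. The function names, the choice k = 5, and the simulated population are all hypothetical.

```python
import random


def knn_predict(train_x, train_y, x, k=5):
    """Predict y at x as the mean response of the k nearest sampled units.

    Hypothetical helper: uses a single (1-D) auxiliary variable and absolute distance.
    """
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def model_assisted_mean(pop_x, sample_idx, sample_y, k=5):
    """Model-assisted (difference) estimator of the population mean of y.

    Assumes simple random sampling without replacement, so every unit has
    inclusion probability pi = n / N, and x is known for all N population units.
    """
    N, n = len(pop_x), len(sample_idx)
    pi = n / N
    sample_x = [pop_x[i] for i in sample_idx]
    # Model predictions for every population unit (the "synthetic" part)
    preds = [knn_predict(sample_x, sample_y, x, k) for x in pop_x]
    synthetic_total = sum(preds)
    # Horvitz-Thompson estimate of the total residual corrects for model error
    correction = sum((y - preds[i]) / pi for i, y in zip(sample_idx, sample_y))
    return (synthetic_total + correction) / N


# Usage on a simulated (hypothetical) population with a nonlinear x-y relation
random.seed(1)
N = 1000
pop_x = [random.uniform(0, 10) for _ in range(N)]
pop_y = [2.0 * x + x ** 0.5 + random.gauss(0, 1) for x in pop_x]
sample_idx = random.sample(range(N), 100)
sample_y = [pop_y[i] for i in sample_idx]

est = model_assisted_mean(pop_x, sample_idx, sample_y)
true_mean = sum(pop_y) / N
print(f"estimate = {est:.3f}, true mean = {true_mean:.3f}")
```

The residual correction keeps the estimator approximately design-unbiased even when the model fits poorly; a better model mainly reduces variance. Under the same structure, `knn_predict` could be swapped for another regressor, such as a regression tree or random forest.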

 

 


 

How to cite this paper?
DOI: 10.12732/ijam.v37i2.3
Source: International Journal of Applied Mathematics
ISSN printed version: 1311-1728
ISSN on-line version: 1314-8060
Year: 2024
Volume: 37
Issue: 2


 


 

(c) 2010-2024, Academic Publications, Ltd. https://www.diogenes.bg/ijam/