An investigation of the imputation techniques for missing values in ordinal data enhancing clustering and classification analysis validity

Alam S; Ayub MS; Arora S; Khan MA

dc.citation.volume	9
dc.contributor.author	Alam S
dc.contributor.author	Ayub MS
dc.contributor.author	Arora S
dc.contributor.author	Khan MA
dc.date.accessioned	2024-06-27T01:37:53Z
dc.date.available	2024-06-27T01:37:53Z
dc.date.issued	2023-12
dc.description.abstract	Missing data can significantly impact dataset integrity and suitability, leading to unreliable statistical results, distortions, and poor decisions. The presence of missing values in data introduces inaccuracies in clustering and classification and compromises the reliability and validity of such analyses. This study investigates multiple imputation techniques specifically designed for handling missing values in ordinal data commonly encountered in surveys and questionnaires. Quantitative approaches are used to evaluate different imputation methods on various datasets with varying missing value percentages. By comparing the performance of imputation techniques using clustering metrics and algorithms (e.g., k-means, Partitioning Around Medoids), the study provides valuable insights for selecting appropriate imputation methods for accurate data analysis. Furthermore, the study examines the impact of imputed values on classification algorithms, including k-Nearest Neighbors (kNN), Naive Bayes (NB), and Multilayer Perceptron (MLP). Results demonstrate that the decision tree method is the most effective approach, closely aligning with the original data and achieving high accuracy. In contrast, random number imputation performs poorly, indicating limited reliability. This study advances the understanding of handling missing values and emphasizes the need to address this issue to enhance data analysis integrity and validity.
dc.description.confidential	false
dc.edition.edition	December 2023
dc.identifier.citation	Alam S, Ayub MS, Arora S, Khan MA. (2023). An investigation of the imputation techniques for missing values in ordinal data enhancing clustering and classification analysis validity. Decision Analytics Journal. 9.
dc.identifier.doi	10.1016/j.dajour.2023.100341
dc.identifier.eissn	2772-6622
dc.identifier.elements-type	journal-article
dc.identifier.number	100341
dc.identifier.pii	S2772662223001819
dc.identifier.uri	https://mro.massey.ac.nz/handle/10179/70028
dc.language	English
dc.publisher	Elsevier Inc
dc.publisher.uri	https://www.sciencedirect.com/science/article/pii/S2772662223001819
dc.relation.isPartOf	Decision Analytics Journal
dc.rights	(c) 2023 The Author/s
dc.rights	CC BY-NC-ND 4.0
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	Classification
dc.subject	Clustering
dc.subject	Imputation
dc.subject	Ordinal data
dc.subject	Partitioning Around Medoids
dc.subject	Multilayer Perceptron
dc.title	An investigation of the imputation techniques for missing values in ordinal data enhancing clustering and classification analysis validity
dc.type	Journal article
pubs.elements-id	483964
pubs.organisational-group	Other

Files

Original bundle

Name: Published version.pdf
Size: 7.5 MB
Format: Adobe Portable Document Format
Description: 483964 PDF.pdf

License bundle

Name: license.txt
Size: 9.22 KB
Format: Plain Text
Description:

Collections

Journal Articles