An investigation of the imputation techniques for missing values in ordinal data enhancing clustering and classification analysis validity

dc.citation.volume: 9
dc.contributor.author: Alam S
dc.contributor.author: Ayub MS
dc.contributor.author: Arora S
dc.contributor.author: Khan MA
dc.date.accessioned: 2024-06-27T01:37:53Z
dc.date.available: 2024-06-27T01:37:53Z
dc.date.issued: 2023-12
dc.description.abstract: Missing data can significantly impact dataset integrity and suitability, leading to unreliable statistical results, distortions, and poor decisions. The presence of missing values in data introduces inaccuracies in clustering and classification and compromises the reliability and validity of such analyses. This study investigates multiple imputation techniques specifically designed for handling missing values in ordinal data commonly encountered in surveys and questionnaires. Quantitative approaches are used to evaluate different imputation methods on various datasets with varying missing value percentages. By comparing the performance of imputation techniques using clustering metrics and algorithms (e.g., k-means, Partitioning Around Medoids), the study provides valuable insights for selecting appropriate imputation methods for accurate data analysis. Furthermore, the study examines the impact of imputed values on classification algorithms, including k-Nearest Neighbors (kNN), Naive Bayes (NB), and Multilayer Perceptron (MLP). Results demonstrate that the decision tree method is the most effective approach, closely aligning with the original data and achieving high accuracy. In contrast, random number imputation performs poorly, indicating limited reliability. This study advances the understanding of handling missing values and emphasizes the need to address this issue to enhance data analysis integrity and validity.
dc.description.confidential: false
dc.edition.edition: December 2023
dc.identifier.citation: Alam S, Ayub MS, Arora S, Khan MA. (2023). An investigation of the imputation techniques for missing values in ordinal data enhancing clustering and classification analysis validity. Decision Analytics Journal. 9.
dc.identifier.doi: 10.1016/j.dajour.2023.100341
dc.identifier.eissn: 2772-6622
dc.identifier.elements-type: journal-article
dc.identifier.number: 100341
dc.identifier.pii: S2772662223001819
dc.identifier.uri: https://mro.massey.ac.nz/handle/10179/70028
dc.language: English
dc.publisher: Elsevier Inc
dc.publisher.uri: https://www.sciencedirect.com/science/article/pii/S2772662223001819
dc.relation.isPartOf: Decision Analytics Journal
dc.rights: (c) 2023 The Author/s
dc.rights: CC BY-NC-ND 4.0
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Classification
dc.subject: Clustering
dc.subject: Imputation
dc.subject: Ordinal data
dc.subject: Partitioning Around Medoids
dc.subject: Multilayer Perceptron
dc.title: An investigation of the imputation techniques for missing values in ordinal data enhancing clustering and classification analysis validity
dc.type: Journal article
pubs.elements-id: 483964
pubs.organisational-group: Other
Files
Original bundle
Name: Published version.pdf
Size: 7.5 MB
Format: Adobe Portable Document Format
Description: 483964 PDF.pdf