On the perils of categorizing responses

Doi: 10.20982/tqmp.05.1.p035

Lemon, Jim
Keywords: Statistics, Conversion to a categorical scale
Tools: R
The assumptions underlying the categorization of numeric measurements are examined, and it is concluded that some numeric data that are measured by categories might better be obtained by direct estimates. Statistical tests are performed on artificially generated data drawn from normal, triangular, and empirically measured distributions, and on various categorizations of these data. It is shown that categorization can markedly affect the outcome of significance tests, in some cases leading to both Type I and Type II errors. When high local densities of values are numerically separated by categorization, test statistics can be substantially inflated relative to those computed from the uncategorized values. It is recommended that response categorization be subjected to the same critical analysis as data transformation techniques such as arbitrary dichotomization.
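The comparison described above can be sketched in R, the tool listed for this article. This is a minimal illustration and not the author's code: the sample size, the mean difference of 0.3, the cut points, and the helper name `categorize` are all assumptions chosen for the example. It runs the same two-sample test on simulated normal values, on a five-category binning of them, and on a median split.

```r
# Illustrative sketch only: sample size, means, and cut points are
# assumptions, not values taken from the article.
set.seed(42)

n <- 100
group <- factor(rep(c("A", "B"), each = n))
# Two normal samples whose means differ by an assumed 0.3 standard deviations
score <- c(rnorm(n, mean = 0), rnorm(n, mean = 0.3))

# Hypothetical helper: bin values at the given cut points and
# return the integer category codes (1, 2, ...)
categorize <- function(x, breaks) {
  as.integer(cut(x, breaks = c(-Inf, breaks, Inf)))
}

# Welch t-test on the uncategorized values
raw_test <- t.test(score ~ group)

# The same test after a five-category binning and after a median split
five_cat <- categorize(score, breaks = c(-1.5, -0.5, 0.5, 1.5))
two_cat  <- categorize(score, breaks = median(score))

five_test <- t.test(five_cat ~ group)
two_test  <- t.test(two_cat ~ group)

# Collect test statistics and p-values side by side
results <- data.frame(
  data = c("uncategorized", "five categories", "median split"),
  t = unname(c(raw_test$statistic, five_test$statistic, two_test$statistic)),
  p = unname(c(raw_test$p.value, five_test$p.value, two_test$p.value))
)
print(results)
```

Changing the cut points or the seed and rerunning shows how far the test statistic and p-value can move when only the categorization changes, which is the sensitivity the abstract warns about.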

