Alter, U., & Counsell, A. (2021, June). Determining negligible associations in regression. doi: 10.17605/OSF.IO/W96XE. Altman, D. G., & Bland, J. M. (1995). Statistics notes: Absence of evidence is not evidence of absence. BMJ, 311(7003), 485–485. doi: 10.1136/bmj.311.7003.485. Amrhein, V., Korner-Nievergelt, F., & Roth, T. (2017). The earth is flat (p > 0.05): Significance thresholds and the crisis of unreplicable research. PeerJ, 5, e3544. doi: 10.7717/peerj.3544. Bakbergenuly, I., & Kulinskaya, E. (2018). Meta-analysis of binary outcomes via generalized linear mixed models: A simulation study. BMC Medical Research Methodology, 18(1). doi: 10.1186/s12874-018-0531-9. Bartolucci, A. A., Tendera, M., & Howard, G. (2011). Meta-analysis of multiple primary prevention trials of cardiovascular events using aspirin. The American Journal of Cardiology, 107(12), 1796–1801. doi: 10.1016/j.amjcard.2011.02.325. Bauer, P., & Kieser, M. (1996). A unifying approach for confidence intervals and testing of equivalence and difference. Biometrika, 83(4), 934–937. doi: 10.1093/biomet/83.4.934. Berg, N. (2004). No-decision classification: An alternative to testing for statistical significance. The Journal of Socio-Economics, 33(5), 631–650. doi: 10.1016/j.socec.2004.09.036. Berger, J. O. (1985). Statistical decision theory and bayesian analysis. Springer-Verlag. doi: 10.1007/978- 1- 4757-4286-2. Berger, R. L., & Hsu, J. C. (1996). Bioequivalence trials, intersection-union tests and equivalence confidence sets. Statistical Science, 11(4), 283–319. doi: 10.1214/ss/1032280304. Biecek, P., et al. (2024a). An experimental study on the rashomon effect of balancing methods in imbalanced classification. arXiv preprint arXiv:2405.01557. Biecek, P., et al. (2024b). Performance is not enough: The story told by a rashomon quartet. Journal of Computational and Graphical Statistics. Cahill, K., Lindson-Hawley, N., Thomas, K. H., Fanshawe, T. R., & Lancaster, T. (2016). Nicotine receptor partial agonists for smoking cessation. Cochrane Database of Systematic Reviews. doi: 10.1002/14651858.CD006103.pub7. Campbell, H. (2020). Equivalence testing for standardized effect sizes in linear regression. arXiv preprint arXiv:2004.01757. doi: 10.48550/arXiv.2004.01757. Campbell, H., & Gustafson, P. (2018). Conditional equivalence testing: An alternative remedy for publication bias. PloS one, 13(4), e0195145. doi: 10.1371/journal.pone.0195145. Campo, M., & Lichtman, S. W. (2008). Interpretation of research in physical therapy: Limitations of null hypothesis significance testing. Journal of Physical Therapy Education, 22(1), 43–48. doi: 10.1097/00001416-200801000-00007. Cecato, J. F., Martinelli, J. E., Izbicki, R., Yassuda, M. S., & Aprahamian, I. (2016). A subtest analysis of the Montreal cognitive assessment (MoCA): which subtests can best discriminate between healthy controls, mild cognitive impairment and Alzheimer’s disease? International Psychogeriatrics, 28(5), 825–832. doi: 10.1017/S1041610215001982. Citrome, L. (2011). The tyranny of the p-value: Effect size matters. Klinik Psikofarmakoloji Bulteni-Bulletin of Clinical Psychopharmacology, 21(2), 91–92. doi: 10.5455/bcp.20110706020600. Cohen, J. (1992). Things I have learned (so far). Annual Convention of the American Psychological Association, 98th, Aug, 1990, Boston, MA, US; Presented at the aforementioned conference. doi: 10.1037/0003-066X.45.12.1304. Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003. doi: 10.1037/0003-066x.49.12.997. Cohen, J. (2013, May). Statistical power analysis for the behavioral sciences (2nd ed.) [Chapter 2, pg. 20]. Routledge. doi: 10.4324/9780203771587. Coscrato, V., Izbicki, R., & Stern, R. B. (2020). Agnostic tests can control the type I and type II errors simultaneously. Brazilian Journal of Probability and Statistics, 34(2), 230–250. doi: 10.1214/19-BJPS431. Counsell, A., & Cribbie, R. A. (2015). Equivalence tests for comparing correlation and regression coefficients. British Journal of Mathematical and Statistical Psychology, 68(2), 292–309. doi: 10.1111/bmsp.12045. Da Silva, G. M., Esteves, L. G., Fossaluza, V., Izbicki, R., & Wechsler, S. (2015). A Bayesian decision-theoretic approach to logically-consistent hypothesis testing. Entropy, 17(10), 6534–6559. doi: 10.3390/e17106534. da Silva Teixeira, R., Nazareth, I. F., de Paula, L. C., do Nascimento Duque, G. P., & Colugnati, F. A. B. (2022). Adherence to computational technologies for the treatment of smoking cessation: Systematic review and meta-analysis. International Journal of Mental Health and Addiction. doi: 10.1007/s11469-022-00839-5. Davies, P. L., Kovac, A., & Meise, M. (2009). Nonparametric regression, confidence regions and regularization. The Annals of Statistics, 37(5B), 2597–2625. doi: 10.1214/07-AOS575. Diez, D. M., Barr, C. D., & Cetinkaya-Rundel, M. (2012). Open-intro statistics. OpenIntro Boston, MA, USA. doi: 10.5070/t573020084. Diggle, P. J., & Chetwynd, A. G. (2011). Statistics and scientific method: An introduction for students and researchers. Oxford University Press. doi: 10.1093/acprof:oso/9780199543182.001.0001. Dixon, P. M., & Pechmann, J. H. K. (2005). A statistical test to show negligible trend. Ecology, 86(7), 1751–1756. doi: 10.1890/04-1343. Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78–87. Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70(3), 193. doi: 10.1037/h0044139. Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press. doi: 10.1017/CBO9780511761676. Esteves, L. G., Izbicki, R., Stern, J. M., & Stern, R. B. (2016). The logical consistency of simultaneous agnostic hypothesis tests. Entropy, 18(7), 256. doi: 10 . 3390/e18070256. Esteves, L. G., Izbicki, R., Stern, J. M., & Stern, R. B. (2023). Logical coherence in bayesian simultaneous three-way hypothesis tests. International Journal of Approximate Reasoning, 152, 297–309. doi: 10.1016/j.ijar.2022.10.019. Esteves, L. G., Izbicki, R., Stern, J. M., & Stern, R. B. (2019). Pragmatic hypotheses in the evolution of science. Entropy, 21(9), 883. doi: 10.3390/e21090883. Faber, J., & Fonseca, L. M. (2014). How sample size influences research outcomes. Dental Press Journal of Orthodontics, 19(4), 27–29. doi: 10.1590/2176-9451.19.4.027-029.ebo. Fidler, F., Singleton Thorn, F., Barnett, A., Kambouris, S., & Kruger, A. (2018). The epistemic importance of establishing the absence of an effect. Advances in Methods and Practices in Psychological Science, 1(2), 237–244. doi: 10.1177/2515245918770407. Fossaluza, V., Izbicki, R., da Silva, G. M., & Esteves, L. G. (2017). Coherent hypothesis testing. The American Statistician, 71(3), 242–248. doi: 10 . 1080/00031305.2016.1237893. Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning. Springer. Friedman, L. M., Furberg, C. D., DeMets, D. L., Reboussin, D. M., & Granger, C. B. (2015). Fundamentals of clinical trials. Springer. doi: 10.1007/978-3-319-18539-2. Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1), 2. doi: 10.1037/a0024338. Genovese, C. R., & Wasserman, L. (2005). Confidence sets for nonparametric wavelet regression. The Annals of Statistics, 33(2). doi: 10.1214/009053605000000011. Gill, J. (1999). The insignificance of null hypothesis significance testing. Political research quarterly, 52(3), 647–674. doi: 10.1177/106591299905200309. Goeman, J. J., Solari, A., & Stijnen, T. (2010). Three-sided hypothesis testing: Simultaneous testing of superiority, equivalence and inferiority. Statistics in medicine, 29(20), 2117–2125. doi: 10.1002/sim.4002. Good, I. J. (2009). Some logic and history of hypothesis testing. In Good thinking: The foundations of probability and its applications (pp. 129–148). Dover Publications. Goulet-Pelletier, J.-C., & Cousineau, D. (2018). A review of effect sizes and their confidence intervals, part i: The cohen’sd family. The Quantitative Methods for Psychology, 14(4), 242–265. doi: 10.20982/tqmp.14.4.p242. Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, p values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350. doi: 10.1007/s10654-016-0149-3. Gross, J. H. (2014). Testing what matters (if you must test at all): A context-driven approach to substantive and statistical significance. American Journal of Political Science, 59(3), 775–788. doi: 10.1111/ajps.12149. Gupta, S. S., & Huang, D.-Y. (1981). Multiple statistical decision theory: Recent developments. Springer New York. doi: 10.1007/978-1-4612-5925-1. Hansen, S., & Rice, K. (2023). Coherent tests for interval null hypotheses. The American Statistician, 77(1), 20–28. doi: 10.1080/00031305.2022.2050299. Hays, W. L. (1963). Statistics for psychologists. Holt, Rinehart &. Winston. Hobbs, B. P., & Carlin, B. P. (2007). Practical bayesian design and analysis for drug and device clinical trials. Journal of Biopharmaceutical Statistics, 18(1), 54–80. doi: 10.1080/10543400701668266. Izbicki, R., & Esteves, L. G. (2015). Logical consistency in simultaneous statistical test procedures. Logic Journal of the IGPL, 23(5), 732–758. doi: 10.1093/jigpal/jzv027. Izbicki, R., Fossaluza, V., Hounie, A. G., Nakano, E. Y., & de Braganca Pereira, C. A. (2012). Testing allele homogeneity: The problem of nested hypotheses. BMC genetics, 13, 1–11. doi: 10.1186/1471-2156-13-103. Jefferson, T., Dooley, L., Ferroni, E., Al-Ansary, L. A., van Driel, M. L., Bawazeer, G. A., Jones, M. A., Hoffmann, T. C., Clark, J., Beller, E. M., et al. (2023). Physical interventions to interrupt or reduce the spread of respiratory viruses. Cochrane database of systematic reviews, (1). doi: 10.1002/14651858.CD006207.pub6. Jeffreys, H. (1961). Theory of probability (Third). Oxford. doi: 10.1093/oso/9780198503682.001.0001. Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis (5th ed.). Prentice Hall. Jones, L. V., & Tukey, J. W. (2000). A sensible formulation of the significance test. Psychological methods, 5(4), 411. doi: 10.1037/1082-989x.5.4.411. Julious, S. A. (2004). Sample sizes for clinical trials with normal data. Statistics in medicine, 23(12), 1921–1986. doi: 10.1002/sim.1783. Kadane, J. B. (2016). Beyond hypothesis testing. Entropy, 18(5), 199. doi: 10.3390/e18050199. Kass, R. E. (1993). Bayes factors in practice. Journal of the Royal Statistical Society. Series D (The Statistician), 42(5), 551–560. doi: 10.2307/2348679. Kelter, R. (2020). Bayesian alternatives to null hypothesis significance testing in biomedical research: A non-technical introduction to bayesian inference with jasp. BMC Medical Research Methodology, 20(1), 1–12. doi: 10.1186/s12874-020-00980-6. Keren, G., & Lewis, C. (1993). A handbook for data analysis in the behavioral sciences: Methodological issues. L. Erlbaum Associates. doi: 10.4324/9781315799582. Keysers, C., Gazzola, V., & Wagenmakers, E.-J. (2020). Using bayes factor hypothesis testing in neuroscience to establish evidence of absence. Nature neuroscience, 23(7), 788–799. doi: 10.1038/s41593-020-0660-4. Kirk, R. E. (2007). Effect magnitude: A different focus. Journal of statistical planning and inference, 137(5), 1634–1646. doi: 10.1016/j.jspi.2006.09.011. Kruschke, J. K. (2010). Bayesian data analysis. Wiley Interdisciplinary Reviews: Cognitive Science, 1(5), 658–676. doi: 10.1002/wcs.72. Kruschke, J. K. (2018). Rejecting or accepting parameter values in bayesian estimation. Advances in methods and practices in psychological science, 1(2), 270–280. doi: 10.1177/2515245918771304. Kruschke, J. K., & Liddell, T. M. (2018). The bayesian new statistics: Hypothesis testing, estimation, metaanalysis, and power analysis from a bayesian perspective. Psychonomic bulletin & review, 25, 178–206. doi: 10.3758/s13423-016-1221-4. Kutner, M., Nachtsheim, C. J., Neter, J., & Wasserman, L. (2005). Applied linear statistical models (5th ed.). McGraw-Hill Irwin. Lakens, D. (2017). Equivalence tests: A practical primer for t tests, correlations, and meta-analyses. Social psychological and personality science, 8(4), 355–362. doi: 10.1177/1948550617697177. Lakens, D. (2022). Improving your statistical inferences. doi: 10.5281/ZENODO.6409077. Lakens, D., Scheel, A. M., & Isager, P. M. (2018). Equivalence testing for psychological research: A tutorial. Advances in Methods and Practices in Psychological Science, 1(2), 259–269. doi: 10.1177/2515245918770963. Lassance, R. F. L., Izbicki, R., & Stern, R. B. (2025). Adding imprecision to hypotheses: A bayesian framework for testing practical significance in nonparametric settings. International Journal of Approximate Reasoning, 178, 109332. doi: 10.1016/j.ijar.2024.109332. Lassance, R. F. L., Stern, J. M., & Stern, R. B. (2024). Nonparametric fbst for validating linear models. doi: 10.48550/ARXIV.2406.15608. Lavine, M., & Schervish, M. J. (1999). Bayes factors: What they are and what they are not. The American Statistician, 53(2), 119–122. doi: 10.1080/00031305.1999.10474443. Lecoutre, B., & Poitevineau, J. (2022). The significance test controversy revisited. In The significance test controversy revisited: The fiducial bayesian alternative (pp. 41–54). Springer. doi: 10.1007/978-3-662-65705-8. Lehmann, E. L. (1957). A theory of some multiple decision problems, i. The Annals of Mathematical Statistics, 28(1), 1–25. doi: 10.1214/aoms/1177707034. Leung, J. T., Barnes, S. L., Lo, S. T., & Leung, D. Y. (2020). Non-inferiority trials in cardiology: What clinicians need to know. Heart, 106(2), 99–104. doi: 10.1136/heartjnl-2019-315772. Makowski, D., Ben-Shachar, M., & L¨udecke, D. (2019). Bayestestr: Describing effects and their uncertainty, existence and significance within the bayesian framework. Journal of Open Source Software, 4(40), 1541. doi: 10.21105/joss.01541. Mara, C. A., & Cribbie, R. A. (2012). Paired-samples tests of equivalence. Communications in Statistics-Simulation and Computation, 41(10), 1928–1943. doi: 10.1080/03610918.2011.626545. Mayo, D. G. (2018). Statistical inference as severe testing: How to get beyond the statistics wars. Cambridge University Press. doi: 10.1017/9781107286184. Mayo, D. G., & Spanos, A. (2006). Severe testing as a basic concept in a Neyman–Pearson philosophy of induction. The British Journal for the Philosophy of Science. doi: 10.1093/bjps/axl003. Meeker, W. Q., & Escobar, L. A. (1995). Teaching about approximate confidence regions based on maximum likelihood estimation. The American Statistician, 49(1), 48–53. doi: 10.1080/00031305.1995.10476112. Mehra, M. R., Desai, S. S., Ruschitzka, F., & Patel, A. N. (2020). Retracted: Hydroxychloroquine or chloroquine with or without a macrolide for treatment of covid-19: A multinational registry analysis. The Lancet. doi: 10.1016/s0140-6736(20)31180-6. Mehrabi, N., Morstatter, F., et al. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), 1–35. Meyners, M. (2012). Equivalence tests–a review. Food quality and preference, 26(2), 231–245. doi: 10.1016/j .foodqual.2012.05.003. Migon, H., Gamerman, D., & Louzada, F. (2014). Statistical inference: An integrated approach, second edition. CRC Press. doi: 10.1201/b17229. Mitchell, M., Shankar, S., et al. (2021). Ai fairness: A review of bias mitigation techniques. Advances in Neural Information Processing Systems. Neyman, J. (1957). The use of the concept of power in agricultural experimentation. Journal of the Indian Society of Agricultural Statistics, IX, 9–17. Neyman, J. (1976). Tests of statistical hypotheses and their use in studies of natural phenomena. Communications in statistics – Theory and methods, 5(8), 737–751. doi: 10.1080/03610927608827392. Park, B., Balakrishnan, S., & Wasserman, L. (2023). Robust universal inference for misspecified models. doi: 10.48550/ARXIV.2307.04034. Patriota, A. G. (2013). A classical measure of evidence for general null hypotheses. Fuzzy Sets and Systems, 233, 74–88. doi: 10.1016/j.fss.2013.03.007. Pearson, K. (1920). Notes on the history of correlation. Biometrika, 13(1), 25–45. doi: 10.1093/biomet/13.1.25. Pereira, C. A. d. B., & Stern, J. M. (1999). Evidence and credibility: Full bayesian significance test for precise hypotheses. Entropy, 1(4), 99–110. doi: 10.3390/e1040099. Phadia, E. G. (2016). Prior processes and their applications. Springer International Publishing. doi: 10.1007/978-3-319-32789-1. Pike, H. (2019). Statistical significance should be abandoned, say scientists. BMJ, 364. doi: 10.1136/bmj.l1374. Rainey, C. (2014). Arguing for a negligible effect. American Journal of Political Science, 58(4), 1083–1091. doi: 10.1111/ajps.12102. Rice, K. M., & Krakauer, C. A. (2023). Three-decision methods: A sensible formulation of significance tests—and much else. Annual Review of Statistics and Its Application, 10. doi: 10.1146/annurev-statistics-033021-111159. Robins, J., & van der Vaart, A. (2006). Adaptive nonparametric confidence sets. The Annals of Statistics, 34(1). doi: 10.1214/009053605000000877. Robinson, A. P., Duursma, R. A., & Marshall, J. D. (2005). A regression-based equivalence test for model validation: Shifting the burden of proof. Tree physiology, 25(7), 903–913. doi: 10.1093/treephys/25.7.903. Roth, M., Tym, E., Mountjoy, C. Q., Huppert, F. A., Hendrie, H., Verma, S., & Goddard, R. (1986). CAMDEX: A standardised instrument for the diagnosis of mental disorder in the elderly with special reference to the early detection of dementia. The British journal of psychiatry, 149(6), 698–709. doi: 10.1192/bjp.149.6.698. Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic bulletin & review, 16, 225–237. doi: 10.3758/PBR.16.2.225. Scheffé, H. (1999). The analysis of variance (Vol. 72). John Wiley & Sons. Schervish, M. J. (1995). Theory of statistics. Springer New York. doi: 10.1007/978-1-4612-4250-5. Schuirmann, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of pharmacokinetics and biopharmaceutics, 15, 657–680. doi: 10.1007/bf01068419. Smith, R. J. (2020). P>. 05: The incorrect interpretation of “not significant” results is a significant problem. American journal of physical anthropology, 172(4), 521–527. doi: 10.1002/ajpa.24092. Stead, L. F., Perera, R., Bullen, C., Mant, D., Hartmann-Boyce, J., Cahill, K., & Lancaster, T. (2012). Nicotine replacement therapy for smoking cessation. Cochrane Database of Systematic Reviews. doi: 10.1002/14651858.cd000146.pub4. Stern, J. M., Izbicki, R., Esteves, L. G., & Stern, R. B. (2017). Logically-consistent hypothesis testing and the hexagon of oppositions. Logic Journal of the IGPL, 25(5), 741–757. doi: 10.1093/jigpal/jzx024. Sullivan, G. M., & Feinn, R. (2012). Using effect size—or why the p value is not enough. Journal of graduate medical education, 4(3), 279–282. doi: 10.4300/JGME-D-12-00156.1. Trafimow, D., Amrhein, V., Areshenkoff, C. N., Barrera-Causil, C. J., Beh, E. J., Bilgi¸c, Y. K., Bono, R., Bradley, M. T., Briggs, W. M., Cepeda-Freyre, H. A., et al. (2018). Manipulating the alpha level cannot cure significance testing. Frontiers in Psychology, 9. doi: 10.3389/fpsyg.2018.00699. Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1–2. doi: 10.1080/01973533.2015.1012991. Tryon, W. W. (2001). Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests. Psychological methods, 6(4), 371. doi: 10.1037/1082-989x.6.4.371. Tukey, J. W. (1953). The problem of multiple comparisons. Multiple comparisons. Vaughan, G. M., & Corballis, M. C. (1969). Beyond tests of significance: Estimating strength of effects in selected anova designs. Psychological bulletin, 72(3), 204. doi: 10.1037/h0027878. Walker, E., & Nowacki, A. S. (2011). Understanding equivalence and noninferiority testing. Journal of general internal medicine, 26, 192–196. doi: 10.1007/s11606-010-1513-8. Wang, X.-g., & Shen, H. C. (1999). Multiple hypothesis testing method for decision making. Proceedings 1999 IEEE International Conference on Robotics and Automation (Cat. No. 99CH36288C), 3, 2090–2095. doi: 10.1109/ROBOT.1999.770415. Wang, Y., Devji, T., Carrasco-Labra, A., King, M. T., Terluin, B., Terwee, C. B., Walsh, M., Furukawa, T. A., & Guyatt, G. H. (2023). A step-by-step approach for selecting an optimal minimal important difference. BMJ, 381. doi: 10.1136/bmj-2022-073822. Wasserman, L. (2013). All of statistics: A concise course in statistical inference. Springer Science & Business Media. doi: 10.1007/978-0-387-21736-9. Wasserstein, R. L., & Lazar, N. A. (2016). The asa statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133. doi: 10.1080/00031305.2016.1154108. Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond p < 0.05. The American Statistician, 73(sup1), 1–19. doi: 10.1080/00031305.2019.1583913. Weber, R., & Popova, L. (2012). Testing equivalence in communication research: Theory and application. Communication methods and measures, 6(3), 190–213. doi: 10.1080/19312458.2012.703834. Wellek, S. (2010). Testing statistical hypotheses of equivalence and noninferiority. CRC press. doi: 10.1201/EBK1439808184. Wellek, S., & Blettner, M. (2012). Establishing equivalence or non-inferiority in clinical trials: Part 20 of a series on evaluation of scientific publications. Deutsches ¨Arzteblatt International, 109(41), 674. doi: 10.3238/arztebl.2012.0674. Westlake, W. J. (1976). Symmetrical confidence intervals for bioequivalence trials. Biometrics, 741–744. doi: 10.2307/2529259. Yang, C., Bartolucci, A. A., & Cui, X. (2015). Multigroup equivalence analysis for high-dimensional expression data. Cancer Informatics, 14, CIN–S17304. doi: 10.4137/cin.s17304. Zhao, G. (2016). Considering both statistical and clinical significance. International Journal of Statistics and Probability, 5(5), 16. doi: 10.5539/ijsp.v5n5p16. Zhao, Y., Caffo, B. S., & Ewen, J. B. (2022). B-value and empirical equivalence bound: A new procedure of hypothesis testing. Statistics in Medicine, 41(6), 964–980. doi: 10.1002/sim.9298. Zöller, M.-A., & Huber, M. F. (2022). A survey on automated machine learning. ACM Computing Surveys, 54(6), 1–34.