Automatic test data generation to improve fault-localization based on causal-statistical analysis

Document Type: Persian Original Article

Authors

1 Iran University of Science and Technology, Tehran, Iran.

2 Associate Professor/Iran University of Science and Technology, Tehran, Iran.

Abstract

Statistical software fault localization approaches depend heavily on the program inputs and become unstable as the input data changes. Generating appropriate test data therefore plays an essential role in the quality of the fault localization process. This paper presents an approach that improves program fault localization by generating new test data. A minimized test suite is used to determine the faulty execution path, and a fault-suspiciousness score is computed for each statement on that path. First, the suspicious statements in the faulty path are determined. To this end, the conditions of the faulty execution path are negated one at a time, from the end of the path to the beginning, and test data for each resulting path is created using the Z3 solver. The program under test is then executed with the generated test data using the concolic testing technique. The fault-suspicious branch is determined according to whether the program execution passes or fails. As a result, the region of statements to which the causal-statistical approach is applied is minimized. The proposed approach is evaluated on four projects from the Defects4J benchmark. The results show that 75% of faults are localized by examining at most 1% of the program's source code. Compared to related work, the results improve by 17.98%. Moreover, the mean number of statements examined for fault localization decreases by 16.78% in the worst case.
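
The path-negation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`solve`, `negate`, `generate_tests`), the example path conditions, and the input domain are all hypothetical, and a brute-force search over a small integer range stands in for the Z3 solver so that the example has no external dependencies.

```python
from itertools import product


def solve(constraints, domain=range(-10, 11), arity=2):
    """Brute-force stand-in for an SMT solver such as Z3 (assumption).

    Returns the first input tuple in a small integer domain that
    satisfies every constraint, or None if no such tuple exists.
    """
    for candidate in product(domain, repeat=arity):
        if all(c(*candidate) for c in constraints):
            return candidate
    return None


def negate(cond):
    """Return a predicate that is true exactly when cond is false."""
    return lambda *args: not cond(*args)


def generate_tests(path_conditions, **solver_kwargs):
    """Negate the branch conditions of a failing path from last to first.

    For position i, keep conditions 0..i-1 unchanged, negate condition i,
    and ask the solver for an input that follows the alternative branch.
    """
    tests = []
    for i in range(len(path_conditions) - 1, -1, -1):
        constraints = path_conditions[:i] + [negate(path_conditions[i])]
        tests.append((i, solve(constraints, **solver_kwargs)))
    return tests


# Hypothetical failing path with three branch conditions over inputs x, y.
failing_path = [
    lambda x, y: x > 0,
    lambda x, y: y > x,
    lambda x, y: x + y == 7,
]

tests = generate_tests(failing_path)
for index, inputs in tests:
    print(f"negated condition {index}: inputs {inputs}")
```

Each generated input would then be run against the program under test; whether that run passes or fails indicates which branch remains suspicious. In a real setting the lambdas would be replaced by symbolic constraints collected during concolic execution and handed to an SMT solver.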

Keywords

