[1] Rajpurkar P, Zhang J, Lopyrev K, Liang P. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016 Nov (pp. 2383-2392).
[2] Joshi M, Choi E, Weld DS, Zettlemoyer L. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017 Jul (pp. 1601-1611).
[3] Dunn M, Sagun L, Higgins M, Guney VU, Cirik V, Cho K. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine. arXiv preprint arXiv:1704.05179. 2017 Apr 18.
[4] Welbl J, Stenetorp P, Riedel S. Constructing Datasets for Multi-hop Reading Comprehension Across Documents. Transactions of the Association for Computational Linguistics. 2018;6:287-302.
[5] Talmor A, Berant J. Repartitioning of the ComplexWebQuestions Dataset. arXiv preprint arXiv:1807.09623. 2018 Jul 25.
[6] Talmor A, Herzig J, Lourie N, Berant J. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) 2019 Jun (pp. 4149-4158).
[7] Dua D, Wang Y, Dasigi P, Stanovsky G, Singh S, Gardner M. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) 2019 Jun (pp. 2368-2378).
[8] Khashabi D, Chaturvedi S, Roth M, Upadhyay S, Roth D. Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) 2018 Jun (pp. 252-262).
[9] Dalvi B, Huang L, Tandon N, Yih WT, Clark P. Tracking State Changes in Procedural Text: A Challenge Dataset and Models for Process Paragraph Comprehension. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) 2018 Jun (pp. 1595-1604).
[10] Weston J, Bordes A, Chopra S, Rush AM, Van Merriënboer B, Joulin A, Mikolov T. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. In 4th International Conference on Learning Representations, ICLR 2016, 2016.
[11] Yang Z, Qi P, Zhang S, Bengio Y, Cohen W, Salakhutdinov R, Manning CD. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018 (pp. 2369-2380).
[12] Chen W, Zha H, Chen Z, Xiong W, Wang H, Wang WY. HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data. In Findings of the Association for Computational Linguistics: EMNLP 2020 2020 Nov (pp. 1026-1036).
[13] Mihaylov T, Clark P, Khot T, Sabharwal A. Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018 (pp. 2381-2391).
[14] Clark C, Gardner M. Simple and Effective Multi-Paragraph Reading Comprehension. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018 Jul (pp. 845-855).
[15] Qiu L, Xiao Y, Qu Y, Zhou H, Li L, Zhang W, Yu Y. Dynamically Fused Graph Network for Multi-hop Reasoning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019 Jul (pp. 6140-6150).
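[16] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) 2019 Jun (pp. 4171-4186).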
[17] Tu M, Huang K, Wang G, Huang J, He X, Zhou B. Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents. In Proceedings of the AAAI Conference on Artificial Intelligence 2020 Apr 3 (Vol. 34, No. 05, pp. 9073-9080).
[18] Zheng C, Kordjamshidi P. SRLGRN: Semantic Role Labeling Graph Reasoning Network. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020 Nov (pp. 8881-8891).
[19] Nishida K, Nishida K, Nagata M, Otsuka A, Saito I, Asano H, Tomita J. Answering while Summarizing: Multi-task Learning for Multi-hop QA with Evidence Extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019 Jul (pp. 2335-2345).
[20] Min S, Zhong V, Zettlemoyer L, Hajishirzi H. Multi-hop Reading Comprehension through Question Decomposition and Rescoring. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019 Jul (pp. 6097-6109).
[21] Wu B, Zhang Z, Zhao H. Graph-free Multi-hop Reading Comprehension: A Select-to-Guide Strategy. arXiv preprint arXiv:2107.11823. 2021.
[22] He Y, Gorinski PJ, Staliunaite I, Stenetorp P. Graph Attention with Hierarchies for Multi-hop Question Answering. arXiv preprint arXiv:2301.11792. 2023.
[23] Yin Z, Wang Y, Hu X, Wu Y, Yan H, Zhang X, ... Qiu X. Rethinking Label Smoothing on Multi-hop Question Answering. In Proceedings of the 22nd Chinese National Conference on Computational Linguistics 2023 Aug (pp. 611-623).
[24] He P, Liu X, Gao J, Chen W. DeBERTa: Decoding-enhanced BERT with Disentangled Attention. In 9th International Conference on Learning Representations 2021 May.
[25] Kipf TN, Welling M. Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations, ICLR 2017, 2017 Apr.
[26] Hamilton W, Ying Z, Leskovec J. Inductive Representation Learning on Large Graphs. Advances in Neural Information Processing Systems. 2017;30.