A Novel Hybrid Approach to Finding Meaningful Basis Vectors for Explicit Representation of Word Vectors

Document Type: Persian Original Article

Authors

1 School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran.

2 Associate Professor of Computer Engineering, Iran University of Science and Technology, Tehran, Iran.

Abstract

The main goal of this study is to represent semantic word vectors explicitly and with low dimensionality. The challenge is to find a limited number of meaningful basis vectors for producing explicit semantic word vectors in such a way that reducing the dimensionality does not cause a large drop in accuracy. In this study, we present a hybrid approach to finding meaningful basis vectors. First, we obtain N basis vectors with each of three proposed methods: (1) a word similarity-to-word frequency ratio criterion, (2) a feature selection method based on comparing distance matrices, and (3) a binary weighting method based on the particle swarm optimization (PSO) algorithm. Then, to draw on the strengths of methods 1 and 2 equally, we form the first combined basis by merging half of the basis vectors chosen by the word similarity-to-word frequency ratio criterion with half of those chosen by the feature selection method. Next, we take the context words assigned a weight of 1 by the binary weighting (BPSO) method as common basis vectors and add them to the first combined basis, producing the second combined basis. Each resulting basis vector is meaningful and corresponds to an informative context word, so the explicit word vectors built on these bases are interpretable. We train the proposed approach on the UkWaC corpus and evaluate it on the word similarity task. Both the first and second combined bases improve accuracy, with the larger gain coming from the first combined basis. Evaluating the explicit word vectors built with the first combined basis shows that, despite reducing the dimensionality from 5000 to 1511, the Spearman correlation coefficient on the MEN, RG-65, and SimLex-999 test sets increases by 2.47%, 7.39%, and 0.52%, respectively.
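To make the two combination steps concrete, here is a minimal Python sketch. It assumes each of the first two methods yields a ranked list of candidate context words and that the BPSO method yields a 0/1 weight per context word; all function and variable names are illustrative, not the paper's implementation.

```python
# A sketch of the two combination steps described in the abstract
# (names and data layout are illustrative assumptions).

def first_combined_basis(ratio_ranked, fs_ranked, n):
    """Merge half of the basis words from each of the two rankings.

    ratio_ranked: context words ranked by the word similarity-to-word
                  frequency ratio criterion (best first).
    fs_ranked:    context words ranked by the distance-matrix-based
                  feature selection method (best first).
    n:            target number of basis vectors.
    """
    basis = list(ratio_ranked[: n // 2])
    seen = set(basis)
    # Fill the remaining slots from the feature-selection ranking,
    # skipping words already chosen so every basis word is unique.
    for word in fs_ranked:
        if len(basis) == n:
            break
        if word not in seen:
            basis.append(word)
            seen.add(word)
    return basis


def second_combined_basis(first_basis, bpso_weights):
    """Add every context word the BPSO run assigned weight 1."""
    selected = [w for w, bit in bpso_weights.items() if bit == 1]
    return list(dict.fromkeys(list(first_basis) + selected))  # deduplicate, keep order


if __name__ == "__main__":
    ratio_ranked = ["music", "city", "food", "animal"]
    fs_ranked = ["animal", "color", "music", "sport"]
    bpso_weights = {"music": 1, "city": 0, "sport": 1}
    b1 = first_combined_basis(ratio_ranked, fs_ranked, 4)
    print(b1)                                       # ['music', 'city', 'animal', 'color']
    print(second_combined_basis(b1, bpso_weights))  # the same list plus 'sport'
```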
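The binary weights used above would come from a binary PSO run over the candidate context words. Below is a minimal sketch of one position/velocity update in standard binary PSO (the discrete variant of Kennedy and Eberhart), with one bit per candidate word; the fitness function and the hyperparameter values are not specified in the abstract, so they are left as assumptions here.

```python
import numpy as np

def bpso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One update of standard binary PSO.

    x         : current binary position, one bit per candidate context word
    v         : current real-valued velocity, same shape as x
    pbest     : best position this particle has found so far
    gbest     : best position the whole swarm has found so far
    w, c1, c2 : inertia and acceleration coefficients (illustrative values)
    """
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    prob = 1.0 / (1.0 + np.exp(-v))                    # sigmoid maps velocity to a probability
    x = (np.random.rand(*x.shape) < prob).astype(int)  # resample each bit with that probability
    return x, v
```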
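Finally, the reported Spearman scores follow the standard word similarity protocol: rank-correlate cosine similarities of the explicit vectors with human ratings. A minimal sketch, assuming the vectors are stored as a dict of NumPy arrays and a test set (MEN, RG-65, or SimLex-999) is a list of (word1, word2, human score) triples:

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def word_similarity_score(vectors, pairs):
    """Spearman correlation between model and human similarity scores."""
    model_scores, human_scores = [], []
    for w1, w2, gold in pairs:
        if w1 in vectors and w2 in vectors:  # skip out-of-vocabulary pairs
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(gold)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho
```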
