Research Output
Exploiting various word embedding models for query expansion in microblog
  Microblogs, especially Twitter, make it easy to communicate with others in real time and are treated as a valuable information source. With the growing volume of tweets, it would be valuable to extract essential information from them. However, due to Twitter's length constraint, users typically use unfamiliar short forms, ambiguous expressions, Twitter-specific syntax, and URLs to convey their brief thoughts. These aspects cause a severe vocabulary mismatch problem and make it difficult to perform effective information retrieval (IR) on Twitter. In this paper, we propose a query expansion method that improves the initial queries with expansion terms that reflect the user’s intent effectively. To select effective candidate expansion terms, we exploit various word embedding models, including Word2Vec, GloVe, and fastText, trained on different local and external corpora. Our ensemble word embedding approach helps to extract effective contextual features of terms. Next, we rank the candidate terms by the mean cosine similarity score of each query-term pair and use the top-ranked terms to augment the initial query. We performed experiments on the TREC Microblog 2011-2012 test sets, which cover the TREC Tweets2011 corpus. Experimental results demonstrate the efficacy of our query expansion method over other competitive approaches.
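
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how scores from the different embedding models are combined, so the sketch assumes a plain average of cosine similarities over all models and all query terms. The function names (`cosine`, `rank_candidates`, `expand_query`), the cutoff `k`, and the tiny two-dimensional vectors standing in for trained Word2Vec, GloVe, or fastText models are all hypothetical.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0


def rank_candidates(query_terms, candidates, models):
    """Rank candidates by mean cosine similarity to the query terms.

    `models` is a list of word -> vector mappings, one per embedding model.
    Assumption: similarities are averaged over every (model, query term)
    pair in which both words are in the model's vocabulary.
    """
    scores = {}
    for cand in candidates:
        sims = [
            cosine(vectors[q], vectors[cand])
            for vectors in models
            if cand in vectors
            for q in query_terms
            if q in vectors
        ]
        if sims:
            scores[cand] = sum(sims) / len(sims)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def expand_query(query_terms, candidates, models, k=2):
    """Append the k top-ranked candidates that are not already in the query."""
    ranked = rank_candidates(query_terms, candidates, models)
    new_terms = [term for term, _ in ranked if term not in query_terms][:k]
    return list(query_terms) + new_terms


# Toy stand-ins for two trained embedding models (hypothetical values).
model_a = {"quake": [1.0, 0.0], "earthquake": [0.9, 0.1],
           "tremor": [0.8, 0.3], "pizza": [0.0, 1.0]}
model_b = {"quake": [1.0, 0.2], "earthquake": [1.0, 0.1],
           "tremor": [0.7, 0.4], "pizza": [-0.1, 1.0]}

expanded = expand_query(["quake"], ["earthquake", "tremor", "pizza"],
                        [model_a, model_b], k=2)
print(expanded)
```

With real models, each dictionary would be replaced by the vocabulary lookup of a trained embedding, and the candidate list would come from the retrieval pipeline rather than being hard-coded.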

Citation

Ahmed, S., Chy, A. N., & Ullah, M. Z. (2020, December). Exploiting various word embedding models for query expansion in microblog. Presented at 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC), Kuching, Malaysia

Keywords

microblog search, query expansion, contextual information, Word2Vec, GloVe, fastText
