Exploiting various word embedding models for query expansion in microblog

Research Output

Microblogs, especially Twitter, make it easier to communicate with others in real time and are treated as a valuable information source. With the increasing number of tweets, it would be valuable to extract essential information from such diverse content. However, due to the length constraint on Twitter, users typically rely on unfamiliar short forms, ambiguous expressions, Twitter-specific syntax, and URLs to convey their thoughts briefly. All of these aspects cause a severe vocabulary mismatch problem and make it difficult to perform effective information retrieval (IR) on Twitter. In this paper, we propose a query expansion method that improves the initial queries with expansion terms that reflect the user's intent effectively. To select effective candidate expansion terms, we exploit various word embedding models, including Word2Vec, GloVe, and fastText, trained on different local and external corpora. Our ensemble word embedding approach helps to extract effective contextual features of terms. Next, we rank the candidate terms by the mean cosine similarity score of each query-term pair and use the top-ranked terms to augment the initial query. We performed experiments on the TREC Microblog 2011-2012 test sets, which cover the TREC Tweets2011 corpus. Experimental results demonstrate the efficacy of our query expansion method over other competitive approaches.
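The ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors, the names `MODEL_A`, `MODEL_B`, `rank_expansion_terms`, and `expand_query`, and the choice to average the cosine similarity over every query term and every embedding model are all assumptions made for illustration. In practice the vectors would come from trained Word2Vec, GloVe, or fastText models rather than hand-written dictionaries.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_expansion_terms(query_terms, candidates, models, top_k=2):
    """Rank candidates by mean cosine similarity to the query terms.

    Assumption: the mean is taken over all (model, query term) pairs in
    which both the query term and the candidate have a vector.
    """
    scored = []
    for cand in candidates:
        sims = []
        for vectors in models:
            if cand not in vectors:
                continue  # this model has no vector for the candidate
            for q in query_terms:
                if q in vectors:
                    sims.append(cosine(vectors[q], vectors[cand]))
        if sims:
            scored.append((cand, sum(sims) / len(sims)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def expand_query(query_terms, candidates, models, top_k=2):
    """Append the top-ranked candidate terms to the original query."""
    ranked = rank_expansion_terms(query_terms, candidates, models, top_k)
    return list(query_terms) + [term for term, _ in ranked]


# Hypothetical toy "models": term -> vector. Each model has its own
# dimensionality, as independently trained embeddings usually would.
MODEL_A = {
    "flood": [1.0, 0.0],
    "rescue": [0.9, 0.1],
    "water": [0.8, 0.2],
    "pizza": [0.0, 1.0],
}
MODEL_B = {
    "flood": [0.0, 1.0, 0.0],
    "rescue": [0.1, 0.9, 0.0],
    "water": [0.3, 0.7, 0.0],
    "pizza": [1.0, 0.0, 0.0],
}

expanded = expand_query(["flood"], ["pizza", "water", "rescue"], [MODEL_A, MODEL_B])
print(expanded)
```

Because similarities are computed inside each model and only the scores are averaged, the models do not need to share a vector space or dimensionality, which is one plausible reason to combine them at the score level.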

Date:

31 December 2020
Publication Status:

Published
DOI:

10.1109/R10-HTC49770.2020.9357016
Funders:

Historic Funder (pre-Worktribe)

http://researchrepository.napier.ac.uk/output/3011061

Citation

Ahmed, S., Chy, A. N., & Ullah, M. Z. (2020, December). Exploiting various word embedding models for query expansion in microblog. Presented at 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC), Kuching, Malaysia

Authors

Dr Md Zia Ullah

Lecturer
School of Computing Engineering and the Built Environment

0131 455 4761

M.Ullah@napier.ac.uk

Keywords

microblog search, query expansion, contextual information, Word2Vec, GloVe, fastText

Available Documents

Files currently unavailable for download; please contact repository@napier.ac.uk to request a copy.
