Discriminative bi-term topic model for headline-based social news clustering

Research Output

Social news is becoming increasingly popular. News organizations and popular journalists increasingly use social media to broadcast news. The major challenge in social news clustering is that the textual content is only a headline, which is much shorter than the full text. Previous work showed that the bi-term topic model (BTM) is effective in modeling short texts such as tweets. Its drawback, however, is that all non-stop terms are treated equally when forming bi-terms. In this paper, a discriminative bi-term topic model ($d$-BTM) is presented, which tries to exclude less indicative bi-terms by discriminating topical terms from general and document-specific ones. Experiments on TDT4 and Reuters-21578 show that, using headlines alone, the $d$-BTM model is able to induce latent topics that are nearly as good as those generated by LDA using the full news text as evidence. The major contribution of this work is an empirical study of the reliability of topic modeling using news headlines alone.
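To make the notion of a bi-term concrete, the following is a minimal sketch of how bi-terms might be formed from a single headline, assuming the usual BTM definition of a bi-term as an unordered pair of words co-occurring in the same short text. The stop-word list, the function name `extract_biterms`, and the `topical_terms` filter are illustrative assumptions; in particular, the filter is only a hypothetical stand-in for the discrimination step, whose actual procedure is not described in this abstract.

```python
from itertools import combinations

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "and", "as"}


def extract_biterms(headline, topical_terms=None):
    """Return the unordered word pairs (bi-terms) found in one headline.

    If topical_terms is given, only pairs built entirely from those terms
    are kept. This filter is a hypothetical stand-in for the discrimination
    step; the abstract does not specify how topical terms are identified.
    """
    # Lowercase, split on whitespace, and drop stop words.
    tokens = [t for t in headline.lower().split() if t not in STOP_WORDS]
    if topical_terms is not None:
        tokens = [t for t in tokens if t in topical_terms]
    # Sort each pair so that (a, b) and (b, a) count as the same bi-term.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


headline = "Markets rally as central bank cuts rates"

# Plain BTM view: every pair of non-stop terms becomes a bi-term.
all_biterms = extract_biterms(headline)

# Discriminative view (sketch): keep only pairs of assumed topical terms.
filtered = extract_biterms(headline, topical_terms={"markets", "bank", "rates"})
```

With six non-stop terms, the unfiltered headline yields every possible pair, while the filtered version keeps only the pairs among the assumed topical terms. This illustrates why restricting the vocabulary shrinks the set of bi-terms passed to the topic model.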

Date: 07 April 2015

Publication Status: Published

Funders: Historic Funder (pre-Worktribe)

http://researchrepository.napier.ac.uk/output/1792890

Citation

Xia, Y., Tang, N., Hussain, A., & Cambria, E. (2015). Discriminative bi-term topic model for headline-based social news clustering. In Proceedings of the Twenty-Eighth International Florida Artificial Intelligence Research Society Conference, (311-316)