Research Output
A novel multiple kernel fuzzy topic modeling technique for biomedical data
  Background: Text mining in the biomedical field has received much attention and regarded as the important research area since a lot of biomedical data is in text format. Topic modeling is one of the popular methods among text mining techniques used to discover hidden semantic structures, so called topics. However, discovering topics from biomedical data is a challenging task due to the sparsity, redundancy, and unstructured format. Methods: In this paper, we proposed a novel multiple kernel fuzzy topic modeling (MKFTM) technique using fusion probabilistic inverse document frequency and multiple kernel fuzzy c-means clustering algorithm for biomedical text mining. In detail, the proposed fusion probabilistic inverse document frequency method is used to estimate the weights of global terms while MKFTM generates frequencies of local and global terms with bag-of-words. In addition, the principal component analysis is applied to eliminate higher-order negative effects for term weights. Results: Extensive experiments are conducted on six biomedical datasets. MKFTM achieved the highest classification accuracy 99.04%, 99.62%, 99.69%, 99.61% in the Muchmore Springer dataset and 94.10%, 89.45%, 92.91%, 90.35% in the Ohsumed dataset. The CH index value of MKFTM is higher, which shows that its clustering performance is better than state-of-the-art topic models. Conclusion: We have confirmed from results that proposed MKFTM approach is very efficient to handles to sparsity and redundancy problem in biomedical text documents. MKFTM discovers semantically relevant topics with high accuracy for biomedical documents. Its gives better results for classification and clustering in biomedical documents. MKFTM is a new approach to topic modeling, which has the flexibility to work with a variety of clustering methods.

  • Type:

    Article

  • Date:

    12 July 2022

  • Publication Status:

    Published

  • DOI:

    10.1186/s12859-022-04780-1

  • ISSN:

    1471-2105

  • Funders:

    Technology Development Program of MSS; National Research Foundation of Korea

Citation

Rashid, J., Kim, J., Hussain, A., Naseem, U., & Juneja, S. (2022). A novel multiple kernel fuzzy topic modeling technique for biomedical data. BMC Bioinformatics, 23(1), Article 275. https://doi.org/10.1186/s12859-022-04780-1

Authors

Keywords

Topic modeling, Medical data, Multiple kernel fuzzy topic modeling, MKFTM, Classification, Clustering

Monthly Views:

Available Documents
  • pdf

    A novel multiple kernel fuzzy topic modeling technique for biomedical data

    1MB

    Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

  • Downloadable citations

    HTML BIB RTF