Topic Modeling with BERT


The key steps in BERTopic modeling are as follows.

  • Use “Sentence Embedding” models to embed the sentences of the article
  • Reduce the dimensionality of embedding using UMAP
  • Cluster these documents (in the reduced dimensions) using HDBSCAN
  • Use c-TF-IDF to extract keywords for each cluster, along with their frequency and IDF.
  • MMR (Maximal Marginal Relevance): keep each topic's keywords relevant while reducing redundancy among them. How many words in a topic can represent the topic?
  • Intertopic Distance Map
  • Use a similarity matrix (heatmap) and a dendrogram (hierarchical map) to visualize the topics and keywords.
  • Track the traction of each topic over time. Some topics may become irrelevant, while traction for others may be increasing or decreasing.
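
The steps above map fairly directly onto BERTopic's pluggable components. The following is a minimal sketch, not a tuned configuration: the embedding model name `all-MiniLM-L6-v2`, the UMAP and HDBSCAN settings, and the helper names `build_topic_model` and `fit_and_visualize` are illustrative assumptions.

```python
def build_topic_model(diversity=0.3, min_cluster_size=15):
    """Assemble a BERTopic pipeline, one component per step listed above."""
    # Imports are local so this sketch can be loaded even where the heavy
    # dependencies (bertopic, umap-learn, hdbscan) are not installed.
    from bertopic import BERTopic
    from bertopic.representation import MaximalMarginalRelevance
    from bertopic.vectorizers import ClassTfidfTransformer
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    return BERTopic(
        # Step 1: sentence embeddings (model name is an assumption)
        embedding_model=SentenceTransformer("all-MiniLM-L6-v2"),
        # Step 2: dimensionality reduction
        umap_model=UMAP(n_neighbors=15, n_components=5, min_dist=0.0,
                        metric="cosine", random_state=42),
        # Step 3: clustering of the reduced embeddings
        hdbscan_model=HDBSCAN(min_cluster_size=min_cluster_size,
                              metric="euclidean", prediction_data=True),
        # Step 4: bag-of-words per cluster, weighted with c-TF-IDF
        vectorizer_model=CountVectorizer(stop_words="english"),
        ctfidf_model=ClassTfidfTransformer(),
        # Step 5: MMR to diversify the keywords of each topic
        representation_model=MaximalMarginalRelevance(diversity=diversity),
    )


def fit_and_visualize(docs, timestamps=None):
    """Fit the model and build the visualizations mentioned in the list."""
    topic_model = build_topic_model()
    topics, probs = topic_model.fit_transform(docs)
    figures = {
        "intertopic_distance": topic_model.visualize_topics(),
        "similarity_heatmap": topic_model.visualize_heatmap(),
        "hierarchy": topic_model.visualize_hierarchy(),
    }
    if timestamps is not None:
        # One timestamp per document is expected here.
        over_time = topic_model.topics_over_time(docs, timestamps)
        figures["topics_over_time"] = topic_model.visualize_topics_over_time(over_time)
    return topic_model, topics, figures
```

`fit_and_visualize(docs)` expects a list of strings. With a very small corpus the clustering may yield too few topics for the visualizations to render, so a few hundred documents or more is a safer starting point.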

Installation
#

# Install from PyPI (sentence-transformers is included by default):

pip install bertopic

# To use BERTopic with other embedding models, install one or more of the optional extras:

# Choose an embedding backend (no spaces inside the brackets; the quotes keep shells such as zsh from expanding them)
pip install "bertopic[flair,gensim,spacy,use]"

# Topic modeling with images
pip install "bertopic[vision]"
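
After installing, it can be handy to confirm which packages are actually importable in the current environment. This is a small sketch using only the standard library; the mapping from extras to import names (for example `use` to `tensorflow_hub`) is an assumption, and `installed_modules` is a hypothetical helper, not part of BERTopic.

```python
from importlib.util import find_spec

# Assumed import names for the optional extras; adjust if your setup differs.
OPTIONAL_BACKENDS = {
    "flair": "flair",
    "gensim": "gensim",
    "spacy": "spacy",
    "use": "tensorflow_hub",
}


def installed_modules(module_names):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    return {name: find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    report = installed_modules(["bertopic", *OPTIONAL_BACKENDS.values()])
    for module, ok in report.items():
        print(f"{module}: {'installed' if ok else 'missing'}")
```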

Supported Topic Modelling Techniques
#

BERTopic supports a wide range of topic modeling techniques, including the following.

  • Guided
  • Supervised
  • Semi-supervised
  • Manual
  • Multi-topic distributions
  • Hierarchical
  • Class-based
  • Dynamic
  • Online/Incremental
  • Multimodal
  • Multi-aspect
  • Text Generation/LLM
  • Merge Models
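
Several of these techniques are exposed as constructor arguments or methods on a `BERTopic` model. The sketch below wraps a few of them in small helper functions; the helper names are hypothetical, and the arguments shown (such as `nr_bins=20`) are illustrative defaults rather than recommendations.

```python
def guided_model(seed_topic_list):
    """Guided: nudge topics toward hand-picked seed words (a list of word lists)."""
    from bertopic import BERTopic  # local import: heavy dependency
    return BERTopic(seed_topic_list=seed_topic_list)


def fit_semi_supervised(docs, labels):
    """Semi-supervised: pass known labels as y, using -1 for unlabeled documents."""
    from bertopic import BERTopic
    return BERTopic().fit(docs, y=labels)


def hierarchical_topics(topic_model, docs):
    """Hierarchical: compute how the fitted topics merge into parent topics."""
    return topic_model.hierarchical_topics(docs)


def class_based_topics(topic_model, docs, classes):
    """Class-based: topic representations per class, with one class label per document."""
    return topic_model.topics_per_class(docs, classes=classes)


def dynamic_topics(topic_model, docs, timestamps, nr_bins=20):
    """Dynamic: topic representations per time bin, with one timestamp per document."""
    return topic_model.topics_over_time(docs, timestamps, nr_bins=nr_bins)


def merge_models(models):
    """Merge models: combine several fitted BERTopic models into one."""
    from bertopic import BERTopic
    return BERTopic.merge_models(models)
```

The helpers that take a `topic_model` expect a model that has already been fitted on `docs`.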

Related Resources
#

Tools in BERTopic
#

[Figure: Tools in BERTopic]

Best Topic Modeling Tool in BERTopic
#

[Figure: Best tools in BERTopic]

BERTopic Model Building
#

[Figure: BERTopic model building]

Application
#

  • arXiv dataset (1.7M+ STEM papers)
  • Images/photographs
  • Historical Documents
  • News articles
