NLP Analysis of Medical Questions
Analyzed structured text data from 23,000+ patient questions collected from 12 NIH sites, using NLTK and Scikit-Learn. Reduced dimensionality with PCA and t-SNE, clustered the questions with K-Means and DBSCAN, and modeled topics with LDA. Found three question topics (Symptoms, Visits, and Disorders), with potential for more granular subtopics.
Features
- NLTK and Scikit-Learn for text analysis
- PCA and t-SNE for dimensionality reduction
- K-Means and DBSCAN for clustering
- LDA for topic modeling