Clustering Analysis of Disease Outbreaks
Extracted disease and location data from the unstructured text of 650 news headlines, using GeoNamesCache to resolve location names. Clustered the locations with DBSCAN, using an implementation of great-circle distance to account for the Earth's curvature. Identified 3 potential budding pandemics by analyzing the prevalence of diseases in the resulting 11 clusters.
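The great-circle metric and its use with DBSCAN can be sketched as follows. This is a minimal illustration rather than the project's actual code: it assumes the haversine form of the great-circle distance, coordinates given as (latitude, longitude) in degrees, and scikit-learn's DBSCAN. The function names and the `eps_km` and `min_samples` defaults are hypothetical placeholders, not the values used in the analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, in kilometers


def great_circle_distance(coord1, coord2, radius=EARTH_RADIUS_KM):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = math.radians(coord1[0]), math.radians(coord1[1])
    lat2, lon2 = math.radians(coord2[0]), math.radians(coord2[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def cluster_coordinates(coords, eps_km=400.0, min_samples=3):
    """Cluster (lat, lon) pairs with DBSCAN using the metric above.

    eps_km and min_samples are illustrative assumptions only.
    Returns one label per point; -1 marks noise.
    """
    # Imported here so the distance function works without scikit-learn.
    import numpy as np
    from sklearn.cluster import DBSCAN

    model = DBSCAN(eps=eps_km, min_samples=min_samples,
                   metric=great_circle_distance)
    return model.fit_predict(np.asarray(coords, dtype=float))
```

Passing a Python callable as `metric` keeps the example short but is slow on large inputs; for 650 headlines that is unlikely to matter.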
Features
- Disease and location data extraction from unstructured text
- GeoNamesCache for geolocation data
- DBSCAN clustering with Great Circle Distance
- Pandemic identification analysis
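
One way to picture the pandemic identification step is to count disease mentions per cluster and report the most frequent one. The sketch below is an assumption about how such an analysis could look, not the project's method: the function name `disease_prevalence`, the `(cluster_label, disease)` record layout, and the convention that label -1 means DBSCAN noise are all introduced here for illustration.

```python
from collections import Counter, defaultdict


def disease_prevalence(records):
    """Summarize the most frequent disease in each cluster.

    records: iterable of (cluster_label, disease_name) pairs (hypothetical layout).
    Returns {label: (top_disease, mention_count, share_of_cluster_mentions)}.
    """
    counts = defaultdict(Counter)
    for label, disease in records:
        if label == -1:  # skip points DBSCAN marked as noise
            continue
        counts[label][disease] += 1

    summary = {}
    for label, counter in counts.items():
        disease, n = counter.most_common(1)[0]
        summary[label] = (disease, n, n / sum(counter.values()))
    return summary
```

A cluster dominated by a single disease would then be a candidate for closer inspection; the threshold for calling it a potential pandemic is left open here because the source does not state one.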