Clustering Analysis of Disease Outbreaks

Extracted disease and location data from 650 unstructured news headlines using GeoNamesCache. Clustered the locations with DBSCAN, using an implementation of great-circle distance to account for the Earth's curvature. Identified 3 potential budding pandemics by analyzing the prevalence of diseases across the 11 resulting clusters.
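
The clustering step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the haversine formula stands in for the great-circle distance implementation, and the function names, `eps_km=250`, and `min_samples=3` are assumptions chosen for the example. It assumes scikit-learn is installed and that each headline has already been resolved to a (latitude, longitude) pair in degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def great_circle_distance(a, b, radius_km=EARTH_RADIUS_KM):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


def cluster_locations(coords, eps_km=250, min_samples=3):
    """Label each (lat, lon) pair with a DBSCAN cluster id (-1 means noise).

    eps_km and min_samples are hypothetical values, not the project's settings.
    """
    from sklearn.cluster import DBSCAN  # imported here so the metric is usable alone

    model = DBSCAN(eps=eps_km, min_samples=min_samples,
                   metric=great_circle_distance)
    return model.fit_predict(coords)
```

Passing the distance function as `metric` lets `eps` be expressed directly in kilometres, so the neighbourhood radius keeps the same physical meaning at every latitude, which plain Euclidean distance on degrees would not.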

Features