COVID Authorship Community Detection with Machine Learning

Early grad-school experiment: mapped COVID-19 co-authorship to see how different clustering methods reveal the research network. Built a large collaboration graph, compared Infomap clusters to GraphSAGE embeddings, and visualized the differences.

Python Machine Learning Data Viz GraphSAGE Infomap t-SNE

What I explored

  • Parsed LitCovid data into a co-author network of ~300K authors and 2M+ edges using Pandas and NetworkX.
  • Ran Infomap for community labels and, separately, GraphSAGE embeddings with t-SNE/DBSCAN to see where the two methods align or diverge.
  • Visualized clusters to compare tight algorithmic communities versus looser embedding neighborhoods.
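
A minimal sketch of the graph-building step, assuming each paper has already been reduced to a list of author names. The `coauthor_edges` helper and the toy records are hypothetical and use only the standard library; the notebooks in the repo use Pandas and NetworkX and may differ in detail.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """Count how many papers each unordered pair of authors shares."""
    weights = Counter()
    for authors in papers:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        unique = sorted(set(authors))
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical toy records standing in for author lists parsed from LitCovid.
papers = [
    ["Ada", "Ben", "Cy"],
    ["Ada", "Ben"],
    ["Dee", "Eli"],
]

edges = coauthor_edges(papers)
for (a, b), w in sorted(edges.items()):
    print(a, b, w)
```

The weighted pairs can then be loaded into NetworkX, for example with `Graph.add_weighted_edges_from`, before running community detection.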

Takeaways

  • The two methods surface different shapes of the network, a useful reminder to test multiple approaches before interpreting clusters.
  • Notebooks keep everything reproducible; see the GitHub repo.
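
One way to put a number on "align or diverge" is the Rand index: the fraction of author pairs that both labelings treat the same way (together in both, or apart in both). This is an illustrative sketch, not the comparison used in the project, which was visual; the `rand_index` helper and the label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree (same vs. different cluster)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    # With fewer than two items there are no pairs to disagree on.
    return agree / total if total else 1.0


# Hypothetical labels for four authors; cluster ids need not match across methods.
infomap_labels = [0, 0, 1, 1]
embedding_labels = [5, 5, 7, 7]
print(rand_index(infomap_labels, embedding_labels))
```

The loop is quadratic in the number of authors, so it suits a sample rather than the full ~300K-node graph; scikit-learn's `adjusted_rand_score` is a chance-corrected alternative that scales better.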

Clustering visualization

t-SNE projection of GraphSAGE embeddings, colored by Infomap community