Snowmass 2021 LOIs

Exploring the Snowmass 2021 LOIs graphically

View the Project on GitHub gordonwatts/snowmass-loi-words

Statistics

Some basic statistics:

Calculating similarity

A word vector is built for each document, and a simple vector similarity is then calculated between each pair of documents. The higher the similarity, the more words the two documents have in common. When two documents are identical, their similarity is exactly one. If you calculate the full matrix of pairwise similarities between all documents (and ignore the diagonal, where each document is compared with itself), you get the following:

Document Similarity
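As a rough illustration of the similarity calculation described above, here is a minimal sketch. It assumes plain word counts as the word vector and cosine similarity as the measure, which is consistent with identical documents scoring one but is not confirmed as the method used by this project. The function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def word_vector(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors (assumed measure)."""
    # Counter returns 0 for words missing from b, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares no words with anything
    return dot / (norm_a * norm_b)


def pairwise_similarity(docs):
    """Similarity for every unordered pair of documents, skipping the diagonal."""
    vectors = {name: word_vector(text) for name, text in docs.items()}
    return {
        (a, b): cosine_similarity(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


if __name__ == "__main__":
    # Hypothetical stand-ins for LOI text, for illustration only.
    sample = {
        "loi_a": "dark matter direct detection",
        "loi_b": "dark matter direct detection",
        "loi_c": "neutrino beam upgrade",
    }
    for pair, score in pairwise_similarity(sample).items():
        print(pair, round(score, 3))
```

Only the pairs above the diagonal are computed, since the measure is symmetric and each document compared with itself always scores one.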

Every pair of documents to the right of the duplicates line is classified as a duplicate in the list below, and every pair to the right of the updates line is classified as an update.
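The two lines act as cut-offs on the similarity score. The sketch below shows one way such a classification could work. The threshold values are hypothetical placeholders, not the values used for the plot, and it assumes that the duplicates line sits at a higher similarity than the updates line, so a pair past both is reported only as a duplicate.

```python
# Hypothetical cut-offs, chosen only for illustration.
DUPLICATE_THRESHOLD = 0.99
UPDATE_THRESHOLD = 0.90


def classify_pair(similarity,
                  duplicate_threshold=DUPLICATE_THRESHOLD,
                  update_threshold=UPDATE_THRESHOLD):
    """Label a document pair by where its similarity falls relative to the cut-offs."""
    if similarity >= duplicate_threshold:
        return "duplicate"
    if similarity >= update_threshold:
        return "update"
    return "distinct"


def group_pairs(similarities):
    """Split a {(doc_a, doc_b): similarity} mapping into duplicate and update lists."""
    groups = {"duplicate": [], "update": []}
    for pair, score in similarities.items():
        label = classify_pair(score)
        if label in groups:
            groups[label].append(pair)
    return groups
```

Pairs labeled "distinct" are dropped, so the result holds only the pairs that would be listed under Duplicates and Updates.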

Duplicates

These LOIs are duplicates of each other.

Updates

These LOIs might be updates of each other.