The first two years of medical school, condensed to word cloud form

George Marzloff
Dec 18, 2016


There is a saying that learning in medical school is like “drinking from a firehose.” I have also heard estimates that one learns [insert some large number here] of new words in medical school, but they were never substantiated by data. While in school at Ross University, I started thinking about ways to analyze what I had learned and to answer “What is medical school like?” asked by family and friends on holiday breaks. I realized I could use my own lecture notes as a data source.


In 2010, I developed a flashcard website to review key facts from every lecture I attended in school. Time was scarce during the semester, so the site had to be simple to maintain: I took notes as text files during every lecture, with each line representing one flashcard question and its answer, and the web application parsed each text file into flashcards. Skipping the intermediate step of taking traditional lecture notes and then translating them into flashcards made my studying more efficient.
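As a rough illustration of the one-line-per-flashcard idea, here is a minimal Python sketch. The article does not say which language the web application used or how a question was separated from its answer, so the `|` delimiter and the names `Flashcard` and `parse_flashcards` are assumptions made for this example only.

```python
from typing import NamedTuple


class Flashcard(NamedTuple):
    question: str
    answer: str


def parse_flashcards(text: str, delimiter: str = "|") -> list[Flashcard]:
    """Turn note text into flashcards, one card per non-empty line.

    The delimiter is a hypothetical choice; the original file format
    is not described in the article.
    """
    cards = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and lines that do not contain a question/answer pair.
        if not line or delimiter not in line:
            continue
        question, answer = line.split(delimiter, 1)
        cards.append(Flashcard(question.strip(), answer.strip()))
    return cards


notes = "What nerve innervates the diaphragm? | Phrenic nerve\n\nstray line\n"
cards = parse_flashcards(notes)
```

Keeping the format this plain is what makes it possible to type cards during a lecture without any extra tooling.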

At the end of the first semester, I wrote a script to count the frequencies of words in all the text files created over the past few months. The distribution had a long tail, so I filtered out prepositions and other non-medical words and took the top 250 words to represent the material I had learned. By setting each word’s size and color shade as a function of how often it appeared, I generated a simple type of word cloud using HTML and CSS and saved the rendering to an image using Peter Coles’s Screen Capture extension for Chrome.
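The counting and rendering steps described above might look something like the following Python sketch. It is not the original script: the stop-word list, the pixel range, the HSL color formula, and the function names `top_words` and `render_cloud` are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from html import escape

# Hypothetical stop-word list; the article does not give the actual filter.
STOP_WORDS = {"the", "of", "in", "and", "a", "is", "to", "by", "with", "for"}


def top_words(text: str, n: int = 250) -> list[tuple[str, int]]:
    """Count words, drop stop words, and keep the n most frequent."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


def render_cloud(ranked: list[tuple[str, int]],
                 min_px: int = 12, max_px: int = 60) -> str:
    """Render words as HTML spans sized and shaded by relative frequency."""
    if not ranked:
        return '<div class="cloud"></div>'
    highest = max(count for _, count in ranked)
    spans = []
    for word, count in ranked:
        weight = count / highest  # 1.0 for the most frequent word
        size = round(min_px + weight * (max_px - min_px))
        lightness = round(75 - weight * 50)  # frequent words are darker
        spans.append(
            f'<span style="font-size:{size}px;'
            f'color:hsl(210,60%,{lightness}%)">{escape(word)}</span>'
        )
    return '<div class="cloud">' + " ".join(spans) + "</div>"


ranked = top_words("The heart and the heart of the lung", n=2)
page = render_cloud(ranked)
```

Scaling linearly from the top count keeps the most frequent word at the maximum size; a long-tailed distribution may look better with a logarithmic scale, which is a design choice this sketch leaves out.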

Here is the output from the first semester:

Medical School, Semester 1. Larger image here

I repeated the process for Semester 2:

Medical School, Semester 2. Larger image here

Generally, the first year of school focuses on the fundamentals of what is “normal” in the human body. The second year focuses on pathology, or the “abnormal.”

The following is Semester 3’s cloud with the top 500 terms. High-frequency words such as cancer, deficiency, disease, infectious, fever, and tumor reflect the focus on disease and pathology in the second year.

Medical School, Semester 3. Larger image here

Semester 4 content continues to focus on the clinical presentations of diseases:

Medical School, Semester 4. Larger image here

There are some limitations in the analysis. First, because the source of the words was my own notes, the lecture content had already been filtered into what I thought was most important. Extracting the words directly from all the PowerPoint files from the lectures would have been a more accurate approach, but parsing through those files would have taken significantly longer. Second, I consolidated the same words with different suffixes (such as plurals, or adjective forms of a noun) to simplify the word list without disrupting the overall frequency pattern of the concepts.
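The article does not say how the suffix variants were consolidated, and it may well have been done by hand. One simple way to approximate it in Python is sketched below; the `ALIASES` table and the plural rule are assumptions for illustration, not the method actually used.

```python
from collections import Counter

# Hypothetical hand-written map from adjective forms to their nouns.
ALIASES = {"arterial": "artery", "muscular": "muscle"}


def consolidate(counts: Counter) -> Counter:
    """Merge counts of suffix variants into a single base word."""
    merged = Counter()
    for word, count in counts.items():
        base = ALIASES.get(word, word)
        # Fold a plural into its singular only when the singular also
        # appears in the data, so words like "virus" are left alone.
        if base.endswith("s") and base[:-1] in counts:
            base = base[:-1]
        merged[base] += count
    return merged


raw = Counter({"tumor": 3, "tumors": 2, "arterial": 1, "artery": 4, "virus": 5})
merged = consolidate(raw)
```

A rule this naive misses irregular plurals and most derived forms, which is one reason a manual pass over a list of a few hundred words can be the more reliable option.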

The goal of this analysis was to provide simple insight into what is learned in the first two years of medical school. Once the basic science “book” years are complete, the next two years involve clinical rotations working in hospitals learning from patients, residents in training and attending physicians. Upon finishing medical school in the U.S.-based system, students graduate as doctors of medicine (M.D.) or doctors of osteopathic medicine (D.O.) and begin further training as residents, building on the fundamental concepts from school to learn the skills needed for their careers.

George Marzloff, M.D. is a resident physician in the Department of Rehabilitation Medicine at the Icahn School of Medicine at Mount Sinai. He graduated from Ross University School of Medicine in 2014.


