Word frequency analyzer

Mira: Hello. I've been tasked with pulling insights from the mission logs and there are hundreds of them. I need something that takes a text file, counts how often each word appears, and shows me the top results. I want to find out what topics keep coming up without reading everything manually.

What you're building

Enter text, or a filename to read from: sample.txt

Top 10 words:
  the       42
  and       31
  python    18
  is        16
  you       14
  ...

What you'll need

Hints

Normalise before counting. Lowercase everything and strip punctuation before building your count. Otherwise "Python", "python" and "Python," are all counted as different words.
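One way to normalise, as a minimal sketch: lowercase the text, split on whitespace, and strip punctuation from both ends of each token. The function name normalise is made up for illustration, and stripping only the ends is an assumption that keeps words like "don't" intact.

```python
import string

def normalise(text):
    """Lowercase the text and strip punctuation from the ends of each word."""
    words = []
    for raw in text.lower().split():
        # string.punctuation holds the ASCII punctuation characters
        word = raw.strip(string.punctuation)
        if word:  # skip tokens that were nothing but punctuation
            words.append(word)
    return words

print(normalise('Python and python, then "Python," again.'))
# ['python', 'and', 'python', 'then', 'python', 'again']
```

All three spellings of "Python" now end up as the same word.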

A dictionary does the counting. Loop through the words. If the word is already a key, increment its count. If it isn't, add it with a count of 1. The dictionary's .get() method with a default value makes this neat.
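The counting loop might look like the sketch below. The name count_words is illustrative; the important line is the one using .get(word, 0), which returns 0 for a word that has not been seen yet, so one statement covers both the "already a key" and "new word" cases.

```python
def count_words(words):
    """Return a dictionary mapping each word to how often it appears."""
    counts = {}
    for word in words:
        # .get() returns the current count, or 0 if the word is not a key yet
        counts[word] = counts.get(word, 0) + 1
    return counts

print(count_words(["python", "is", "python"]))
# {'python': 2, 'is': 1}
```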

Sorting a dictionary by value. sorted() accepts a key= argument. Pass a lambda that returns the value for each key to sort by frequency, and add reverse=True so the most frequent words come first.
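A possible sketch of the sorting step, assuming the counts are already in a dictionary. The function name top_words and the sample counts are made up for illustration. Iterating over a dictionary yields its keys, so the lambda looks up each key's count, and reverse=True is needed because sorted() orders from smallest to largest by default.

```python
def top_words(counts, n=10):
    """Return the n most frequent words, most frequent first."""
    ranked = sorted(counts, key=lambda word: counts[word], reverse=True)
    return ranked[:n]

counts = {"the": 42, "python": 18, "and": 31}
print("Top 10 words:")
for word in top_words(counts):
    # <10 pads the word to 10 characters so the counts line up
    print(f"  {word:<10}{counts[word]}")
```

With these sample counts the loop prints "the", then "and", then "python".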

Going further

Once the core analysis works:

  • Stop words. Ignore common words like "the", "and", "is". Define a set of stop words and skip any word that appears in it.
  • Configurable top N. Let the user specify how many results to show instead of always showing 10.
  • Visual output. Print each word with a bar made of repeated characters proportional to its count. Even a simple version makes the output much more readable.
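The three extensions could be combined roughly as follows. This is only a sketch: the stop-word set is a tiny illustrative sample, and the names remove_stop_words, bar and print_chart, the bar width of 30 and the "#" character are all arbitrary choices, not part of the brief.

```python
# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {"the", "and", "is", "a", "of", "to"}

def remove_stop_words(words, stop_words=STOP_WORDS):
    """Drop every word that appears in the stop-word set."""
    return [word for word in words if word not in stop_words]

def bar(count, largest, width=30, char="#"):
    """Build a bar whose length is proportional to count / largest."""
    if largest <= 0:
        return ""
    # max(1, ...) guarantees that rare words still get a visible bar
    return char * max(1, round(count / largest * width))

def print_chart(counts, n=10):
    """Print the top n words, each with its count and a bar."""
    ranked = sorted(counts, key=lambda word: counts[word], reverse=True)[:n]
    if not ranked:
        return
    largest = counts[ranked[0]]
    for word in ranked:
        print(f"  {word:<10}{counts[word]:>4} {bar(counts[word], largest)}")

print_chart({"python": 18, "mission": 9, "log": 3}, n=3)
```

To let the user choose N, read the number with input(), convert it with int(), and pass it as n. Scaling every bar against the largest count keeps the longest bar at a fixed width however big the counts get.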