Word frequency analyzer

Mira: Hello. I've been tasked with pulling insights from the mission logs and there are hundreds of them. I need something that takes a text file, counts how often each word appears, and shows me the top results. I want to find out what topics keep coming up without reading everything manually.

What you're building

Enter text, or a filename to read from: sample.txt

Top 10 words:
  the       42
  and       31
  python    18
  is        16
  you       14
  ...

What you'll need

Strings — splitting text into words, stripping punctuation, lowercasing
Dictionaries — counting how many times each word appears
Lists — sorting and slicing the top results
Files and exceptions — reading from a text file
Lambda, comprehensions, and zip — list comprehensions and sorted() with a key work well here
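
The sample prompt accepts either raw text or a filename. One possible way to handle that is sketched below. The helper name load_text is hypothetical, and treating any input that is not a readable file as raw text is an assumption about the intended behaviour, not something the brief specifies.

```python
from pathlib import Path


def load_text(user_input):
    """Return the file's contents if user_input names a readable file,
    otherwise return user_input itself as the text to analyse."""
    try:
        return Path(user_input).read_text(encoding="utf-8")
    except (OSError, ValueError):
        # OSError: no such file, a directory, no permission, name too long.
        # ValueError: unusable path string, or a file that is not valid UTF-8.
        # In every such case, fall back to treating the input as raw text.
        return user_input
```

Catching the exception instead of checking for the file first keeps the logic in one place: the read either succeeds or the input is used as typed.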

Hints

Normalise before counting. Lowercase everything and strip punctuation before building your counts. Otherwise "Python", "python", and "Python," are counted as three different words.
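
A minimal sketch of that normalisation step, assuming words are separated by whitespace and that only punctuation at the edges of a word should be removed. The function name normalise is illustrative.

```python
import string


def normalise(text):
    """Lowercase the text, split it on whitespace, and strip punctuation
    from both ends of every word."""
    words = [word.strip(string.punctuation) for word in text.lower().split()]
    # Tokens made only of punctuation become empty strings; drop them.
    return [word for word in words if word]


print(normalise('Python, python and "Python".'))
# prints ['python', 'python', 'and', 'python']
```

Because strip() only touches the ends of a word, apostrophes and hyphens inside a word (as in "don't") are left alone.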

A dictionary does the counting. Loop through the words. If the word is already a key, increment its count. If it isn't, add it with a count of 1. Calling .get() with a default value of 0 handles both cases in a single line.
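
The counting loop could look something like this. The function name count_words and the sample word list are made up for illustration.

```python
def count_words(words):
    """Build a dictionary mapping each word to the number of times it appears."""
    counts = {}
    for word in words:
        # .get(word, 0) returns the current count, or 0 for a word not seen yet.
        counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words(["log", "mission", "log", "fuel", "log"]))
# prints {'log': 3, 'mission': 1, 'fuel': 1}
```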

Sorting a dictionary by value. sorted() accepts a key= argument. Pass a lambda that returns the count for each word to sort by frequency, and add reverse=True so the most frequent words come first.
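
As a rough sketch, sorting and slicing might be combined as follows. The counts are invented sample data, and the column width of 10 is just one way to line the output up.

```python
counts = {"the": 42, "python": 18, "and": 31, "is": 16}  # made-up sample counts

# Sort the words by their counts, highest first, then keep the top 3.
top = sorted(counts, key=lambda word: counts[word], reverse=True)[:3]

for word in top:
    # Left-align each word in a 10-character column so the counts line up.
    print(f"  {word:<10}{counts[word]}")
```

Iterating over a dictionary yields its keys, so sorted(counts, ...) returns a list of words; the lambda looks up each word's count to decide the order.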

Going further

Once the core analysis works:

Stop words. Ignore common words like "the", "and", "is". Define a set of stop words and skip any word that appears in it.
Configurable top N. Let the user specify how many results to show instead of always showing 10.
Visual output. Print each word with a bar made of repeated characters proportional to its count. Even a simple version makes the output much more readable.
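
The three extensions above can be sketched together in one function. Everything named here (STOP_WORDS, report, top_n, bar_width) is hypothetical, the stop-word set is only a starting point, and scaling bars against the largest count is one choice among several.

```python
STOP_WORDS = {"the", "and", "is"}  # illustrative only; extend as needed


def report(counts, top_n=10, bar_width=30):
    """Return display lines for the top_n words that are not stop words,
    each followed by a bar scaled against the most frequent word."""
    kept = {word: n for word, n in counts.items() if word not in STOP_WORDS}
    ranked = sorted(kept, key=lambda word: kept[word], reverse=True)[:top_n]
    if not ranked:
        return []
    biggest = kept[ranked[0]]
    lines = []
    for word in ranked:
        # Scale to bar_width characters, but always show at least one.
        bar = "#" * max(1, round(kept[word] / biggest * bar_width))
        lines.append(f"  {word:<10}{kept[word]:>4} {bar}")
    return lines


sample = {"the": 42, "python": 18, "mission": 9, "and": 31}  # made-up counts
for line in report(sample, top_n=2, bar_width=10):
    print(line)
```

Scaling against the largest count keeps the longest bar at a fixed width, so the output fits the terminal even when the counts run into the thousands.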
