This article presents a way to count the number of words, the number of unique words, and the number of occurrences of each word in a text file using Python.
- Read the text file.
    with open("filename.txt") as f:
        text = f.read()
- Replace non-alphanumeric characters with whitespace.
    import re
    # Replace every run of characters that are not letters or digits (including "_") with a space.
    text = re.sub(r'[\W_]+', ' ', text)
- Change all characters to lowercase.
    text = text.lower()
- Split the words into a list.
    text = text.split()
- Display the number of words in the text file.
    print(len(text))
- Display the number of unique words in the text file.
    print(len(set(text)))
- Display the number of occurrences of each word.
    from collections import defaultdict
    # defaultdict(int) starts every unseen word at 0, so += 1 works on the first occurrence.
    wordsCount = defaultdict(int)
    for word in text:
        wordsCount[word] += 1
    for word, num in wordsCount.items():
        print(word, num)
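The steps above can be combined into a small reusable sketch. The helper names `normalize`, `word_stats`, and `file_word_stats` are hypothetical, and `file_word_stats` assumes the file is UTF-8 encoded. It uses the pattern `[\W_]+`, which treats every run of characters other than letters and digits (including the underscore, which `\w` would otherwise keep) as a single separator.

```python
import re
from collections import defaultdict


def normalize(text):
    """Split raw text into lowercase words (hypothetical helper name)."""
    # Replace each run of non-alphanumeric characters, including "_", with one space.
    text = re.sub(r"[\W_]+", " ", text)
    # Lowercase so "The" and "the" are counted as the same word, then split on whitespace.
    return text.lower().split()


def word_stats(text):
    """Return (total words, unique words, per-word counts) for a string."""
    words = normalize(text)
    # Tally each word; defaultdict(int) starts unseen words at 0.
    counts = defaultdict(int)
    for word in words:
        counts[word] += 1
    return len(words), len(set(words)), dict(counts)


def file_word_stats(path):
    """Read a text file (assumed UTF-8) and compute its word statistics."""
    # The with-block closes the file once it has been read.
    with open(path, encoding="utf-8") as f:
        return word_stats(f.read())
```

For example, `total, unique, counts = file_word_stats("filename.txt")` returns the three results at once. Keeping `word_stats` separate from the file reading lets the counting logic be checked on a plain string without touching the file system.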
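The `defaultdict` loop in the last step can also be replaced by `collections.Counter` from the standard library, which does the tallying in one call and adds `most_common()` for ranking words by frequency. This is a minimal sketch; the `words` list is made-up sample data standing in for the list produced by `text.split()`.

```python
from collections import Counter

# Sample data standing in for the lowercased, split word list (illustrative only).
words = ["the", "cat", "saw", "the", "dog", "the", "dog", "ran"]

# Counter maps each distinct word to its number of occurrences.
wordsCount = Counter(words)

# Equivalent to len(words) and len(set(words)).
total = sum(wordsCount.values())
unique = len(wordsCount)

# most_common(n) returns the n most frequent (word, count) pairs, highest count first.
top_two = wordsCount.most_common(2)

# With no argument, most_common() lists every word, most frequent first.
for word, num in wordsCount.most_common():
    print(word, num)
```

A `Counter` also returns 0 for words it has never seen (for example `wordsCount["bird"]`) instead of raising `KeyError`, which mirrors the behavior of `defaultdict(int)`.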
