Count the number of words using Python

Monday, June 1st, 2009

This article presents a way to count the number of words, the number of unique words, and the number of occurrences of each word in a text file.

  1. Read the text file.
    with open("filename.txt") as f:
      text = f.read()
  2. Replace non-alphanumeric characters with whitespace.
    import re
    # \W matches any character that is not a letter, digit, or underscore
    text = re.sub(r'\W', ' ', text)
  3. Change all characters to lowercase.
    text = text.lower()
  4. Split words into a list.
    text = text.split()
  5. Display the number of words in the text file.
    print(len(text))
  6. Display the number of unique words in the text file.
    print(len(set(text)))
  7. Display the number of occurrences of each word.
    from collections import defaultdict
    
    wordsCount = defaultdict(int)
    for word in text:
      wordsCount[word] += 1
    
    for word, num in wordsCount.items():
      print(word, num)
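
The steps above can also be gathered into one small function. The following is only a sketch: the function name count_words and the sample sentence are made up for illustration, and it swaps the defaultdict loop for collections.Counter from the standard library, which does the same tallying in one call.

```python
import re
from collections import Counter


def count_words(text):
    """Return (total words, unique words, per-word counts) for a string.

    count_words is a hypothetical helper name, not part of the original steps.
    """
    # Replace every non-word character with a space, then lowercase and split.
    words = re.sub(r"\W", " ", text).lower().split()
    # Counter tallies each word, like the defaultdict(int) loop does.
    counts = Counter(words)
    return len(words), len(counts), counts


# Made-up sample text; read a file's contents instead for real use.
sample = "The cat saw the other cat."
total, unique, counts = count_words(sample)
print(total, unique)  # prints: 6 4

# most_common() lists the words from most to least frequent.
for word, num in counts.most_common():
    print(word, num)
```

Returning the Counter keeps the per-word numbers available for further use, for example counts.most_common(10) for the ten most frequent words.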