Project #30786 - Python Programming

Only basics; the code may not fully function, but I need a deep explanation.

Don't use special packages such as operator/re/os.


The idea here is to use sets and dictionaries, along with some simple statistics, to characterize texts by different authors.

For example, say you wanted to characterize the plays written by Shakespeare and the stories written by Melville. You might choose a sample of each. For example,

  1. For Shakespeare you might choose 3 plays: Macbeth, Othello and All's Well that Ends Well.
  2. For Melville you might choose Moby Dick, Bartleby and Omoo.

You can find these texts on the Internet, and you may find additional sites. What you will want to do is create plain-text files from these sites in your own directory. For web pages, I use a command-line editor, but the Gutenberg set usually has pure (UTF-8) text files.

So, then you might characterize these files by some simple statistics. For example, you might characterize the Shakespeare texts by the words that appear a certain number of times (as a percentage of the total number of unique words) in the Shakespeare plays but under some percentage in the Melville texts. You will have to experiment to determine these percentages.
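One way to read the paragraph above is sketched here, using only dictionaries, sets and string methods. The function names (word_counts, percentages, signature), the two toy sentences and the threshold values 30.0 and 10.0 are all assumptions made for illustration; real thresholds have to be found by experiment on the actual texts.

```python
PUNCTUATION = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~'

def word_counts(text):
    """Return a dictionary mapping each lowercase word to its count."""
    text = text.lower()
    for ch in PUNCTUATION:
        text = text.replace(ch, ' ')
    counts = {}
    for w in text.split():
        counts[w] = counts.get(w, 0) + 1
    return counts

def percentages(counts):
    """Express each count as a percentage of the number of unique words."""
    unique = len(counts)
    result = {}
    for w in counts:
        result[w] = 100.0 * counts[w] / unique
    return result

def signature(own_counts, other_counts, at_least, under):
    """Words at or above `at_least` percent in one author's sample
    and below `under` percent in the other author's sample."""
    own = percentages(own_counts)
    other = percentages(other_counts)
    sig = set()
    for w in own:
        # a word missing from the other sample counts as 0 percent
        if own[w] >= at_least and other.get(w, 0.0) < under:
            sig.add(w)
    return sig

# Toy stand-ins for the real samples (hypothetical text, not from the plays).
sample_a = "thou art kind and thou art wise and thou art brave"
sample_b = "the whale and the ship and the sea and the whale"

sig_a = signature(word_counts(sample_a), word_counts(sample_b), 30.0, 10.0)
print(sorted(sig_a))
```

With these toy inputs, "and" is frequent in both samples, so the second threshold removes it and only "art" and "thou" remain. In practice you would build the counts from the combined text of each author's three files.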

Then use these characterizations to decide among, say 10 different files, which contain works of Shakespeare and which contain works of Melville. These 10 works can be found on the Internet and saved as files, say file1.txt, …, file10.txt. See if you can use the characterizations (or vocabulary signatures) in this way to identify authors.
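A minimal sketch of the deciding step follows, assuming each author's signature is already a set of words. The scoring rule (count how many signature words appear in the unknown text and pick the larger count) is one simple choice, not the only one, and the signature sets below are made-up placeholders rather than real results.

```python
PUNCTUATION = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~'

def unique_words(text):
    """Return the set of distinct lowercase words in the text."""
    text = text.lower()
    for ch in PUNCTUATION:
        text = text.replace(ch, ' ')
    return set(text.split())

def classify(text, signatures):
    """signatures maps an author name to a set of signature words.
    Returns (best_author, scores); on a tie the author listed first wins."""
    words = unique_words(text)
    scores = {}
    best_author = None
    best_score = -1
    for author in signatures:
        # set intersection: signature words that occur in this text
        score = len(words & signatures[author])
        scores[author] = score
        if score > best_score:
            best_author = author
            best_score = score
    return best_author, scores

# Hypothetical signatures, for illustration only.
signatures = {
    "Shakespeare": {"thou", "art", "hath"},
    "Melville": {"whale", "ship", "sea"},
}

guess, scores = classify("Thou hast seen the sea, and thou art pale.", signatures)
print(guess, scores)
```

To run this over file1.txt, …, file10.txt, read each file's text with open(...).read() and pass it to classify. If one signature set is much larger than the other, raw counts favor it, so dividing each score by the size of the signature may give a fairer comparison.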

Feel free to modify the parameters of this project so long as you at least try this simple characterization.

You may try additional tasks. For example you might work with a larger set of authors. You might try categorizing scientific articles as to their field or sub-fields.


Code that can be modified:

def byFreq(pair):
    # sort key: the count in a (word, count) pair
    return pair[1]

def main():
    print("This program analyzes word frequency in a file")
    print("and prints a report on the n most frequent words.\n")

    # get the sequence of words from the file
    fname = input("File to analyze: ")
    with open(fname, 'r') as infile:   # the file is closed automatically
        text = infile.read()
    text = text.lower()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.split()

    # construct a dictionary of word counts
    counts = {}
    for w in words:
        counts[w] = counts.get(w,0) + 1

    # output analysis of n most frequent words.
    n = int(input("Output analysis of how many words? "))
    items = list(counts.items())
    items.sort(key=byFreq, reverse=True)
    print("The number of unique words in", fname, "is", len(counts), ".")
    for i in range(min(n, len(items))):   # avoid an IndexError when n exceeds the number of unique words
        word, count = items[i]
        print("{0:<15}{1:>5}".format(word, count))

if __name__ == '__main__':
    main()

Subject Computer
Due By (Pacific Time) 05/14/2014 12:00 am