create a function to save the resulting dictionary to a file and a function to read that file. The file should conform to the format in the instructions.

Description

For this first part you will have to create two Python programs. The first one reads a text file and computes trigram, bigram and unigram probabilities from the words it contains. Your program should be called n-trigrams.py and should be run like this


For each of the sentences, it should output its probability to the screen. Therefore, if there are 10 sentences in the file, it will output 10 probabilities. The sentences are separated by a period. There can be more than one sentence in each file.


The format of the file must be a tab-separated file with two columns. The first column is the n-gram: for a unigram, just the word; for a bigram or trigram, the words separated by a space. The second column is the count of that n-gram. Add STOP as a special word. This will allow you to know how many sentences there are in the corpus.
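The two-column format described above could be written and read back roughly as follows. This is only a sketch under some assumptions: the function names save_counts and load_counts are hypothetical, and n-grams are assumed to be stored as tuples of words in a collections.Counter.

```python
from collections import Counter


def save_counts(counts, path):
    """Write one n-gram per line: words joined by spaces, a tab, then the count."""
    with open(path, "w", encoding="utf-8") as handle:
        for ngram, count in counts.items():
            handle.write(f"{' '.join(ngram)}\t{count}\n")


def load_counts(path):
    """Read the tab-separated file back into a Counter keyed by word tuples."""
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            text, count = line.split("\t")
            counts[tuple(text.split(" "))] = int(count)
    return counts
```

With this layout a trigram such as ("the", "cat", "sat") seen once is stored as the line "the cat sat", a tab, and 1. The count on the STOP unigram line gives the number of sentences in the corpus.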


Here are some steps to help you modularize your code 

• create a function that replaces punctuation and other noise (accents, tildes, newlines, etc.) in a string. End-of-sentence punctuation should be replaced with ; accents and tildes can be replaced with their corresponding English equivalent (this is optional). Newlines, commas and apostrophes should be replaced with a space. 

• create a function that reads a file. For each line read, it should replace punctuation and then find unigrams, bigrams and trigrams and add them to a dictionary (you can use collections.Counter here). The key to the dictionary can be a string or a tuple, for example (word1, word2, word3) for trigrams. 

• create a function to save the resulting dictionary to a file and a function to read that file. The file should conform to the format in the instructions. 

• Lastly, create a function to compute the probability of any bigram present in the corpus.
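The cleaning, counting and probability steps above might be sketched like this. Several things here are assumptions and not part of the assignment text: the function names are hypothetical, end-of-sentence punctuation is assumed to become the special word STOP, n-grams are allowed to run across a STOP inside a line, and the bigram probability is taken to be the plain maximum-likelihood estimate count(w1 w2) / count(w1).

```python
import re
import unicodedata
from collections import Counter

STOP = "STOP"  # assumed end-of-sentence marker


def clean_text(text):
    """Strip accents and replace punctuation as described in the steps."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # End-of-sentence punctuation becomes the STOP word (assumption).
    text = re.sub(r"[.!?]+", f" {STOP} ", text)
    # Newlines, commas and apostrophes (straight or curly) become spaces.
    return re.sub(r"[\n,'’]", " ", text)


def count_ngrams(lines, max_n=3):
    """Count unigrams, bigrams and trigrams, keyed by tuples of words."""
    counts = Counter()
    for line in lines:  # an open file object can be passed here
        words = clean_text(line).split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def bigram_probability(counts, first, second):
    """Maximum-likelihood estimate of P(second | first); 0.0 if first is unseen."""
    context = counts[(first,)]
    if context == 0:
        return 0.0
    return counts[(first, second)] / context


counts = count_ngrams(["The cat sat. The cat ran."])
print(bigram_probability(counts, "cat", "sat"))  # 0.5
```

Keeping every n-gram as a tuple in one Counter means the same dictionary serves all three orders, and the length of the key tells a unigram from a bigram or trigram.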

