This is a pattern recognition task that takes the full text of each article (see News1.csv, column E, 'content'). The features extraction programme (Python) should then recognise and keep only the important key points. My existing code (main.py) extracts only the following:
1) Criminal case (for example theft, homicide, gambling, smuggling; the terms are listed in "crime word dictionary.txt")
2) Location (listed in location.txt)
3) Date (the exact date the crime happened, according to the news report)
4) Gender, usually inferred from Malay names. 'Bin' in the name means the suspect is male; 'Bte' in the name means the suspect is female:
   Kasim bin Omar = Male
   Ahmad Razali bin Ismail = Male
   Siti Noraliza bte Haji Abdullah = Female
5) Fine or sentence imposed on the accused: for example $500 or one month jail; $20,000 (Brunei dollars) and 12 months jail.
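The gender and fine rules above can be sketched with regular expressions. This is only an illustration and not the contents of main.py: the function names, the patterns, and the assumption that names are written with capitalised words around 'bin' or 'bte' are all hypothetical.

```python
import re

# Marker words named in the text above; the mapping itself is a sketch.
GENDER_BY_MARKER = {"bin": "Male", "bte": "Female"}

# Assumed name shape: capitalised words, then "bin"/"bte", then capitalised words.
NAME_RE = re.compile(
    r"\b(?:[A-Z][a-z]+ )+([Bb]in|[Bb]te) [A-Z][a-z]+(?: [A-Z][a-z]+)*"
)

# A dollar sign followed by digits, optionally grouped in thousands
# with a space or a comma ("$ 500", "$ 20 000", "$1,500").
FINE_RE = re.compile(r"\$\s*(\d{1,3}(?:[ ,]\d{3})+(?!\d)|\d+)")

# A jail term such as "one month jail" or "12 months jail".
JAIL_RE = re.compile(r"(\w+) (months?|years?) jail")


def find_genders(text):
    """Return (full_name, gender) pairs for names containing 'bin' or 'bte'."""
    pairs = []
    for match in NAME_RE.finditer(text):
        marker = match.group(1).lower()
        pairs.append((match.group(0), GENDER_BY_MARKER[marker]))
    return pairs


def find_fines(text):
    """Return every dollar amount in the text as an integer."""
    amounts = []
    for match in FINE_RE.finditer(text):
        digits = re.sub(r"[ ,]", "", match.group(1))  # drop thousands separators
        amounts.append(int(digits))
    return amounts


def find_jail_terms(text):
    """Return (count, unit) pairs, both kept as the strings found in the text."""
    return JAIL_RE.findall(text)
```

A capitalised word directly before the name (for example at the start of a sentence) would be captured as part of the name, so the name pattern would need tightening against the real articles.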
1) As you can see, each article contains the word "Homicide", but it is used with a different
meaning in each. Article A only describes the term. Article B only mentions it. Article C reports
a real situation in which a person committed homicide. I only need to extract the homicide
case from Article C, where the crime really happened, unlike Article A and Article B.
For this task you must use the file "News1.csv", column E, 'Content'. I need you to
build a pattern recognition routine that can be applied to all the content in News1.csv, so
that when a feature such as a1 occurs, it extracts only the articles in which the crime really happened.
Basically, the features extraction programme (Python) needs to understand the situation in
each article: either the article reports a crime that was actually committed (then extract this
content), or the article only mentions the crime and no crime was committed (then we do not
want to extract it). This should be done for all crime types: theft, smuggling, etc.
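One simple way to approximate this distinction is a cue-word heuristic: keep a crime word only when the same sentence also contains a word that signals an actual incident or court case. The sketch below is an assumption, not the required solution. The cue list, the function names, and the three sample sentences are made up for illustration, and the crime words would really be loaded from "crime word dictionary.txt".

```python
import re

# Assumed sample; the real list would be read from "crime word dictionary.txt".
CRIME_WORDS = {"homicide", "theft", "gambling", "smuggling"}

# Hypothetical cue words suggesting that a crime was actually committed.
INCIDENT_CUES = {
    "charged", "arrested", "sentenced", "convicted",
    "fined", "jailed", "guilty",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def reported_crimes(text):
    """Return crime words that share a sentence with at least one incident cue."""
    found = set()
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & INCIDENT_CUES:
            found |= words & CRIME_WORDS
    return found


def filter_articles(contents):
    """Keep only the article texts in which some crime is actually reported."""
    return [text for text in contents if reported_crimes(text)]


# Invented stand-ins for Articles A, B and C.
article_a = "Homicide is the killing of one person by another."
article_b = "The minister mentioned homicide in a speech on public safety."
article_c = "A man was charged with homicide on Monday. He was remanded in custody."
```

To run this over the CSV, the 'content' column could be read with `csv.DictReader` and passed to `filter_articles`. If NLTK is used, its `sent_tokenize` could replace the naive `split_sentences`. A heuristic like this will miss reports that use none of the cue words, so it should be checked against the real articles before it is relied on.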
IMPORTANT: It is fine to use NLTK in the code for this.