The point: Data preparation and analysis, confusion matrices, correlation and feature selection are all important in real-world machine learning tasks. Data clustering and probabilistic data analysis are two core sets of methods in data mining and machine learning. This coursework gives you experience with each of these.

The data set: The data set for the coursework is a sample from Stallkamp et al.'s German Traffic Sign Recognition Benchmark (GTSRB). The original data set consists of 39,209 RGB training images and 12,630 RGB test images of different sizes, showing 43 different types of German traffic signs. The images are not centred and were taken at different times of day. The data set is considered an important benchmark for computer vision because it is closely related to the street sign recognition tasks that autonomous cars have to perform, and the safe deployment of autonomous cars is the next big challenge that researchers and engineers face.

You will be working with a sample of this data set which consists of 10 classes and 12660 images. The images have been converted to greyscale with pixel values ranging from 0 to 255, and rescaled to a common size of 48 × 48 pixels. Hence, each row in the data set holds 2305 values: a single image in row-vector format (the feature vector, 48 × 48 = 2304 pixel features) plus its associated class label. Compensating for the lighting conditions and the position of the sign within the image is not necessary for the coursework and is left for the interested student.

The class labels and their meanings are listed below:

0. speed limit 60 (original label: 3)
1. speed limit 80 (original label: 5)
2. speed limit 80 lifted (original label: 6)
3. right of way at crossing (original label: 11)
4. right of way in general (original label: 12)
5. give way (original label: 13)
6. stop (original label: 14)
7. no speed limit general (original label: 32)
8. turn right down (original label: 38)
9. turn left down (original label: 39)
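
The row layout described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the coursework's official loader: it assumes the label sits in the last of the 2305 columns and that pixels are stored in row-major order, neither of which the brief specifies. The helper names (`split_features_labels`, `row_to_image`) are hypothetical, and the demo uses random pixel values rather than the real data.

```python
import numpy as np

IMAGE_SIDE = 48
N_PIXELS = IMAGE_SIDE * IMAGE_SIDE  # 48 * 48 = 2304 pixel features per image

# Class labels and their meanings, as listed in the brief.
CLASS_NAMES = {
    0: "speed limit 60",
    1: "speed limit 80",
    2: "speed limit 80 lifted",
    3: "right of way at crossing",
    4: "right of way in general",
    5: "give way",
    6: "stop",
    7: "no speed limit general",
    8: "turn right down",
    9: "turn left down",
}

# Coursework label -> label in the original 43-class benchmark.
ORIGINAL_LABELS = {0: 3, 1: 5, 2: 6, 3: 11, 4: 12, 5: 13, 6: 14, 7: 32, 8: 38, 9: 39}


def split_features_labels(data):
    """Split an (n, 2305) array into pixel features X and labels y.

    Assumption: the label is the LAST column. Adjust if your file differs.
    """
    data = np.asarray(data)
    if data.ndim != 2 or data.shape[1] != N_PIXELS + 1:
        raise ValueError(f"expected shape (n, {N_PIXELS + 1}), got {data.shape}")
    X = data[:, :N_PIXELS].astype(np.uint8)  # pixel values lie in 0..255
    y = data[:, N_PIXELS].astype(int)
    return X, y


def row_to_image(row):
    """Reshape one 2304-element row vector into a 48 x 48 image.

    Assumption: pixels are stored in row-major order.
    """
    return np.asarray(row).reshape(IMAGE_SIDE, IMAGE_SIDE)


# Demo on synthetic data (random pixels, hand-picked labels), not the real set.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(5, N_PIXELS))
labels = np.array([0, 3, 6, 9, 5])
demo = np.column_stack([pixels, labels])

X, y = split_features_labels(demo)
image = row_to_image(X[0])
print(X.shape, image.shape)
print([CLASS_NAMES[int(c)] for c in y])
```

With the real file, the synthetic `demo` array would be replaced by whatever array your loader returns. Checking `X.shape` and viewing a few reshaped images is a quick way to confirm that the column order assumed here matches the data.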