By the end of this assignment, you should have:
As we’ve discussed in class, trees are a fundamental data structure used to model all sorts of hierarchical data. Often, a tree represents a hierarchical categorization of a base set of data, where the leaves represent the data values themselves, and internal nodes represent groupings of this data. Files on a computer can be viewed this way: regular files (e.g., PDF documents, video files, Python source code) are grouped into folders, which are then grouped into larger folders, etc. Moreover, usually the data values are not all of equal “weight” or “importance.” For example, in our file system one of the most important properties of a regular file is its size; certainly not all files have the same size!
A treemap is a visualisation technique to show a tree’s structure according to the weights (or sizes) of its data values, using nested rectangles to show subtrees, scaled to different dimensions to reflect the proportional sizes of each piece of data. Treemaps are used in many contexts. Among the more popular uses are news headlines, various kinds of financial information, and computer disk usage. Here’s an example of a treemap visualisation of files on a computer, from Wikipedia:
Some free programs that use treemaps to visualise the size of files on your computer:
For this assignment, you will write an interactive treemap visualisation tool that you can use to visualise multiple types of hierarchical data. We will make the following simplifications for this assignment:
This assignment requires the pygame library, which you installed as part of Lab 7. We recommend working through the Additional Exercise of that lab to familiarize yourself with pygame.
Please download the following starter code files and save these into your assignments/a2 folder.
Sample testing files:
Your work for this assignment consists of writing two main components:
Please read through the following sections carefully to understand what is required for this program.
In order to use a treemap to visualise data, we first need the actual data to be stored in a tree. For this program, we have provided an AbstractTree class to define the common interface that our treemap visualiser will expect for any sort of data that we want to visualise.
This class is similar to the basic Tree class that you have already worked with, with a few differences. The two most conceptually important ones are discussed here, but there are others you’ll explore when you dig into the code.
First, the constructor is considered private, and should be overridden by each subclass. This is because these trees will be initialized through custom procedures that depend on the data we want to represent. (So a tree representing files on your computer will be created in a different way than a tree representing country population data from the World Bank. You’ll do both on this assignment!)
Second, the class has a data_size instance attribute that is used by the treemap algorithm to visualise the tree. data_size is defined recursively as follows:
This is a little abstract. Let’s now discuss one concrete type of hierarchical data that you will be responsible for modelling on this assignment.
Consider a folder on your computer (e.g., the assignments folder you have in your csc148 folder). It contains a mixture of other folders (“subfolders”) and some files; these subfolders themselves contain other folders, and even more files. This is a natural hierarchical structure, and can be represented by a tree as follows:
The familiar _root attribute will always store the name of the folder or file (not its path).
The data_size attribute of a leaf is simply how much space (in bytes) the file takes up on the computer. The data_size of an internal node – remember, representing a folder – corresponds to the size (in bytes) of all files contained in the folder, including its subfolders. (Computer folders are actually a little bit bigger than the sum of their contents’ sizes, but we’ll use this simplified version for this assignment to make it easier to work with.)
In code, you will represent this particular data using the FileSystemTree class, which is a subclass of AbstractTree.
The next component of our program is the treemap algorithm itself. This is an operation that takes a tree and a 2-D area to fill, and returns a list of rectangles to render, based on the tree structure and data_size attributes.
For both the 2-D area and the output rectangles, we’ll use the pygame representation of a rectangle, which is a tuple of four integers (x, y, width, height), where (x, y) represents the upper-left corner of the rectangle. Note that in pygame, the y-axis points down, and so the lower-right corner of the rectangle has coordinates (x + width, y + height).
There are many variations of the treemap algorithm, and we’ve chosen one for this assignment. For simplicity we’ll use “size” to refer to the data_size attribute in the algorithm below.
Input: an instance of some kind of AbstractTree, and a pygame rectangle (the display area to fill).
Output: a list of pygame rectangles and each one’s colour, where each rectangle corresponds to a leaf, and has area proportional to the leaf’s data_size.
Algorithm:
If the tree has size 0, return an empty list.
If the tree is just a single leaf with positive size, return a list containing just the single rectangle that covers the whole display area, as well as the colour of that leaf.
Otherwise, if the display area’s width is greater than its height: divide the display area into vertical rectangles, one smaller rectangle per non-zero-sized subtree, in proportion to the sizes of the subtrees.
Example: suppose the input rectangle is (0, 0, 200, 100), and the input tree has three subtrees, with sizes 10, 25, and 15.
- The first subtree has 20% of the total size,
so its smaller rectangle has width 40 (20% of 200): (0, 0, 40, 100).
- The second subtree should have width 100 (50% of 200),
and starts immediately after the first one: (40, 0, 100, 100).
- The third subtree has width 60, and takes up the remaining space:
(140, 0, 60, 100).
Note that the three rectangles’ edges overlap, which is expected.
You can assume that every non-zero-sized leaf will have a large enough size that its computed rectangle has a width and height >= 1.
If the height is greater than or equal to the width, then make horizontal rectangles instead of vertical ones, and do the analogous operations as in step 3.
Recurse on each of the subtrees, passing in the corresponding smaller rectangle from step 3 or 4.
Note about rounding: because you’re calculating proportions here, the exact values will often be floats instead of integers. For every intermediate boundary, always round down to the nearest integer. In other words, if you calculate that a smaller rectangle should be (0, 0, 10.9, 100), round the width down to (0, 0, 10, 100), and make sure the following rectangle starts at (10, 0, …).
However, the final (rightmost or bottommost) edge of the last smaller rectangle should always be equal to the outer edge of the input rectangle. This means that the last subtree might be a bit bigger than its true proportion of the total size, but it won’t be a noticeable difference in our application.
You will implement this algorithm in the generate_treemap method in AbstractTree.
The code in treemap_visualiser.py runs the treemap algorithm, and then uses pygame directly to render a graphical display to the user.
The pygame window is divided into two parts:
Every rectangle corresponds to one leaf (that has a nonzero data_size) in the tree. If the whole tree has data_size zero (so no rectangles are returned by treemap), the entire screen should appear black.
In addition to displaying the treemap, the pygame graphical display responds to a few different events:
The following two events allow the user to actually mutate the tree, which shows how to turn a static visualisation into a tool for simulating changes on a dataset, and seeing what would happen. Once the user performs one of these events, the tree is no longer in sync with the original data set, and instead represents hypothetical sizes based on the user’s actions.
All changes described below should only change the tree object; so if a rectangle is deleted or its size changes, DO NOT try to change the actual file on your computer!
If the user right-clicks on a rectangle, the corresponding leaf is deleted from the tree. You must update the data_size attributes of this leaf’s ancestors, and the visualisation must update to show the new sizes.
Details:
If the user presses the Up Arrow or Down Arrow key when a rectangle is selected, the selected leaf’s data_size increases or decreases by 1% of its current value, respectively. This affects the data_size of its ancestors as well, and the visualisation must update to show the new sizes.
Details:
In the starter code, we have provided both an incomplete version of AbstractTree, and a skeleton of the FileSystemTree class that could be used to subclass it and model how much space your files take up on your computer.
You should do two things here:
To complete the second part, you will need to read the Python os module documentation to learn how to get data about your computer’s files in a Python program. We recommend using the following functions:
The file print_dirs.py is a small sample program that works with the os module. Feel free to derive your own implementation from this.
You may not use the os.walk function - it does a lot of the recursive work for you, which defeats the purpose of this part!
Warning: don’t use string concatenation (+) to try to join paths together. Because different operating systems use different separators for file paths, you run the risk of using a separator that is invalid on the Teaching Lab machines on which we’ll test your code. So instead, make sure to use os.path.join to build larger paths from smaller ones.
You should add the subtrees in your FileSystemTree in the order they are produced from os.listdir.
After this step is done, you should be able to instantiate a FileSystemTree by calling the constructor on a path, and then inspect the tree’s contents in the debugger. Write doctests and pytests to test the attributes of your FileSystemTree objects; having these tests now will save you a lot of time and energy in later tasks.
Your next task should be to implement the generate_treemap method in AbstractTree, which is the actual method responsible for computing the treemap representation of a tree. Your implementation must match the treemap algorithm described above.
Once again, your approach should be recursive; make sure you understand the role of the rectparameter before writing any code.
We strongly recommend implementing the base case before starting the recursive step, to give yourself a check that you understand what the method is supposed to do.
For the recursive step, a good stepping stone to think about is when the tree has height 2, and all leaves have the same data_size values. In this case, the original rectangle should be partitioned into equal-sized rectangles.
Warning: make sure to follow the exact rounding procedure in the algorithm description above. If you deviate from this, you might get numerical errors that will lead to incorrect rectangle bounds.
The basic treemap algorithm does not require pygame at all: you can check your work simply by instantiating a FileSystemTree object, calling generate_treemap on it, and checking the list of rectangles returned. This is one way we’ll be testing your code. Write your own doctests and pytests for generate_treemap. This is the best way to check edge cases (e.g., making sure you’re rounding properly).
Now that you are able to both create concrete instances of FileSystemTree and compute a treemap, you are ready to enjoy the fruit of your labour by showing a pretty visualisation of the treemap.
Open treemap_visualiser.py, and complete the render_display function. You may need to review Lab 7 to practice rendering rectangles using pygame.
Make sure you use the provided constants to show both the treemap and the text display at the bottom of the screen.
Note: you might notice that treemap_visualiser.py is organized into a set of functions, rather than a class. Please do not change this design in any way. We have deliberately not used a class here to illustrate the idea that not all functions must be grouped together in a class. Part of this course is getting used to thinking about the different ways of designing programs, after all!
In the main block of treemap_visualiser.py, uncomment the call to run_treemap_file_system, and put in a path. Try running the program: if all goes well, you should see a beautiful rendering of your chosen folder’s files in the pygame window!
Get Free Quote!
383 Experts Online