As we’ve discussed in class, trees are a fundamental data structure used to model all sorts of hierarchical data.

computer science

Description

By the end of this assignment, you should have:

  • modeled different kinds of real-world hierarchical data using trees
  • implemented recursive operations on trees (both non-mutating and mutating)
  • implemented a geometric tree visualization algorithm called treemap
  • used inheritance to design classes according to a common interface
  • used the os library to write a program that interacts with your computer’s file system
  • used the pygame library to create an interactive graphical user interface for your program
  • used the PyCharm debugger to inspect the contents of a complex dictionary obtained from JSON data

Introduction

As we’ve discussed in class, trees are a fundamental data structure used to model all sorts of hierarchical data. Often, a tree represents a hierarchical categorization of a base set of data, where the leaves represent the data values themselves, and internal nodes represent groupings of this data. Files on a computer can be viewed this way: regular files (e.g., PDF documents, video files, Python source code) are grouped into folders, which are then grouped into larger folders, etc. Moreover, usually the data values are not all of equal “weight” or “importance.” For example, in our file system one of the most important properties of a regular file is its size; certainly not all files have the same size!

treemap is a visualisation technique to show a tree’s structure according to the weights (or sizes) of its data values, using nested rectangles to show subtrees, scaled to different dimensions to reflect the proportional sizes of each piece of data. Treemaps are used in many contexts. Among the more popular uses are news headlines, various kinds of financial information, and computer disk usage. Here’s an example of a treemap visualisation of files on a computer, from Wikipedia:

File treemap

Some free programs that use treemaps to visualise the size of files on your computer:

  • (Windows) WinDirStat, (OSX) Disk Inventory X, (Linux) KDirStat

For this assignment, you will write an interactive treemap visualisation tool that you can use to visualise multiple types of hierarchical data. We will make the following simplifications for this assignment:

  1. The order in which subtrees are shown matters when modelling files, but not when modelling population data.
  2. Colours are randomly generated independently for each node.
  3. The rectangles only correspond to tree leaves. (The sample image above shows nested rectangles representing the tree’s internal nodes, but we will not include those.)

Setup and starter code

This assignment requires the pygame library, which you installed as part of Lab 7. We recommend working through the Additional Exercise of that lab to familiarize yourself with pygame.

Please download the following starter code files and save these into your assignments/a2 folder.

  • print_dirs.py
  • population.py
  • tree_data.py
  • treemap_visualiser.py
  • populations.json
  • regions.json

Sample testing files:

  • a2_test.py
  • example-data.zip

Problem description

Your work for this assignment consists of writing two main components:

  1. A data modeller that takes in some hierarchical data source (such as, but not limited to, a folder in your computer) and stores it in a tree data structure.
  2. An interactive graphical window showing a treemap visualisation of this data to the user.

Please read through the following sections carefully to understand what is required for this program.

Data representation: abstract version

In order to use a treemap to visualise data, we first need the actual data to be stored in a tree. For this program, we have provided an AbstractTree class to define the common interface that our treemap visualiser will expect for any sort of data that we want to visualise.

This class is similar to the basic Tree class that you have already worked with, with a few differences. The two most conceptually important ones are discussed here, but there are others you’ll explore when you dig into the code.

First, the constructor is considered private, and should be overridden by each subclass. This is because these trees will be initialized through custom procedures that depend on the data we want to represent. (So a tree representing files on your computer will be created in a different way than a tree representing country population data from the World Bank. You’ll do both on this assignment!)

Second, the class has a data_size instance attribute that is used by the treemap algorithm to visualise the tree. data_size is defined recursively as follows:

  • If the tree is empty, data_size is 0.
  • If the tree is a single leaf, its data_size depends on the data being modelled; e.g., size of a file, or population of a country.
  • If the tree has one or more subtrees, its data_size is equal to the sum of the data_sizes of its subtrees.

This is a little abstract. Let’s now discuss one concrete type of hierarchical data that you will be responsible for modelling on this assignment.

Data representation: file system example

Consider a folder on your computer (e.g., the assignments folder you have in your csc148 folder). It contains a mixture of other folders (“subfolders”) and some files; these subfolders themselves contain other folders, and even more files. This is a natural hierarchical structure, and can be represented by a tree as follows:

  • Each internal node corresponds to a folder.
  • Each leaf corresponds to a regular file (e.g., PDF document, movie file, or Python source code file).

The familiar _root attribute will always store the name of the folder or file (not its path).

The data_size attribute of a leaf is simply how much space (in bytes) the file takes up on the computer. The data_size of an internal node – remember, representing a folder – corresponds to the size (in bytes) of all files contained in the folder, including its subfolders. (Computer folders are actually a little bit bigger than the sum of their contents’ sizes, but we’ll use this simplified version for this assignment to make it easier to work with.)

In code, you will represent this particular data using the FileSystemTree class, which is a subclass of AbstractTree.

Visualiser: the treemap algorithm

The next component of our program is the treemap algorithm itself. This is an operation that takes a tree and a 2-D area to fill, and returns a list of rectangles to render, based on the tree structure and data_size attributes.

For both the 2-D area and the output rectangles, we’ll use the pygame representation of a rectangle, which is a tuple of four integers (x, y, width, height), where (x, y) represents the upper-left corner of the rectangle. Note that in pygame, the y-axis points down, and so the lower-right corner of the rectangle has coordinates (x + width, y + height).

There are many variations of the treemap algorithm, and we’ve chosen one for this assignment. For simplicity we’ll use “size” to refer to the data_size attribute in the algorithm below.

Input: an instance of some kind of AbstractTree, and a pygame rectangle (the display area to fill).

Output: a list of pygame rectangles and each one’s colour, where each rectangle corresponds to a leaf, and has area proportional to the leaf’s data_size.

Algorithm:

  1. If the tree has size 0, return an empty list.

  2. If the tree is just a single leaf with positive size, return a list containing just the single rectangle that covers the whole display area, as well as the colour of that leaf.

  3. Otherwise, if the display area’s width is greater than its height: divide the display area into vertical rectangles, one smaller rectangle per non-zero-sized subtree, in proportion to the sizes of the subtrees.

    Example: suppose the input rectangle is (0, 0, 200, 100), and the input tree has three subtrees, with sizes 10, 25, and 15.

    - The first subtree has 20% of the total size,
      so its smaller rectangle has width 40 (20% of 200): (0, 0, 40, 100).
    - The second subtree should have width 100 (50% of 200),
      and starts immediately after the first one: (40, 0, 100, 100).
    - The third subtree has width 60, and takes up the remaining space:
      (140, 0, 60, 100).

    Note that the three rectangles’ edges overlap, which is expected.

    You can assume that every non-zero-sized leaf will have a large enough size that its computed rectangle has a width and height >= 1.

  4. If the height is greater than or equal to the width, then make horizontal rectangles instead of vertical ones, and do the analogous operations as in step 3.

  5. Recurse on each of the subtrees, passing in the corresponding smaller rectangle from step 3 or 4.

Note about rounding: because you’re calculating proportions here, the exact values will often be floats instead of integers. For every intermediate boundary, always round down to the nearest integer. In other words, if you calculate that a smaller rectangle should be (0, 0, 10.9, 100), round the width down to (0, 0, 10, 100), and make sure the following rectangle starts at (10, 0, …).

However, the final (rightmost or bottommost) edge of the last smaller rectangle should always be equal to the outer edge of the input rectangle. This means that the last subtree might be a bit bigger than its true proportion of the total size, but it won’t be a noticeable difference in our application.

You will implement this algorithm in the generate_treemap method in AbstractTree.

Visualiser: displaying the treemap

The code in treemap_visualiser.py runs the treemap algorithm, and then uses pygame directly to render a graphical display to the user.

The pygame window is divided into two parts:

  • The treemap itself (a collection of colourful rectangles).
  • A text display along the bottom of the window, showing information about the currently selected rectangle, or nothing, if no rectangle is selected.

Pygame sample treemap

Every rectangle corresponds to one leaf (that has a nonzero data_size) in the tree. If the whole tree has data_size zero (so no rectangles are returned by treemap), the entire screen should appear black.

Visualiser: user events

In addition to displaying the treemap, the pygame graphical display responds to a few different events:

  1. The user can close the window by clicking the X icon (like any other window)
  2. The user can left-click on a rectangle to select it. The text display updates to show two things:
    • the names of the nodes on the path from the root to the selected leaf, separated by a subclass-specific separator string
    • the selected leaf’s data_size
    The user can change their selection simply by clicking on a different rectangle. If the user clicks again on the currently-selected rectangle, that rectangle becomes unselected (and so no rectangles are selected, and no text is shown).

The following two events allow the user to actually mutate the tree, which shows how to turn a static visualisation into a tool for simulating changes on a dataset, and seeing what would happen. Once the user performs one of these events, the tree is no longer in sync with the original data set, and instead represents hypothetical sizes based on the user’s actions.

All changes described below should only change the tree object; so if a rectangle is deleted or its size changes, DO NOT try to change the actual file on your computer!

  1. If the user right-clicks on a rectangle, the corresponding leaf is deleted from the tree. You must update the data_size attributes of this leaf’s ancestors, and the visualisation must update to show the new sizes.

    Details:

    • Even if the user deletes all the children of a node, it should still remain in the tree. However, that node’s data_size will become 0, and so it will disappear from the visualisation.
    • If the user deletes the currently-selected rectangle, after the deletion no rectangle is selected, and no text is shown.
  2. If the user presses the Up Arrow or Down Arrow key when a rectangle is selected, the selected leaf’s data_size increases or decreases by 1% of its current value, respectively. This affects the data_size of its ancestors as well, and the visualisation must update to show the new sizes.

    Details:

    • A leaf’s data_size cannot decrease below 1. There is no upper limit on the value of data_size.
    • The 1% amount is always rounded up before applying the change. For example, if a leaf’s data_size value is 150, then 1% of this is 1.5, which is rounded up to 2. So its value could increase up to 152, or decrease down to 148.

Task 0: Getting started

  1. Make sure you have installed pygame, and downloaded the starter code.
  2. Open tree_data.py, and read through the starter code carefully. There’s quite a bit of information contained in the docstrings.
  3. Open treemap_visualiser.py. There’s less code in here, but more of it will look unfamiliar because of the pygame library.
  4. Uncomment the call to run_treemap_file_system in the main block, putting in a computer path as suggested in the comments. Run the program. You should see a black window appear on your screen, which you can also close to exit the program (note that you couldn’t do this on Lab 7!).
  5. You can leave population.py alone for now; you’ll only need this for the last part of this assignment.

Task 1: Representing the file system

In the starter code, we have provided both an incomplete version of AbstractTree, and a skeleton of the FileSystemTree class that could be used to subclass it and model how much space your files take up on your computer.

You should do two things here:

  1. Complete the constructor of AbstractTree according to its docstring. This will get you familiar with the attributes in this class.
  2. Complete the constructor of FileSystemTree. This method should be implemented recursively – in other words, part of this method should involve calling the FileSystemTree constructor to build subtrees.

To complete the second part, you will need to read the Python os module documentation to learn how to get data about your computer’s files in a Python program. We recommend using the following functions:

  • os.path.isdir
  • os.listdir
  • os.path.join
  • os.path.getsize
  • os.path.basename

The file print_dirs.py is a small sample program that works with the os module. Feel free to derive your own implementation from this.

You may not use the os.walk function - it does a lot of the recursive work for you, which defeats the purpose of this part!

Warning: don’t use string concatenation (+) to try to join paths together. Because different operating systems use different separators for file paths, you run the risk of using a separator that is invalid on the Teaching Lab machines on which we’ll test your code. So instead, make sure to use os.path.join to build larger paths from smaller ones.

You should add the subtrees in your FileSystemTree in the order they are produced from os.listdir.

Progress check!

After this step is done, you should be able to instantiate a FileSystemTree by calling the constructor on a path, and then inspect the tree’s contents in the debugger. Write doctests and pytests to test the attributes of your FileSystemTree objects; having these tests now will save you a lot of time and energy in later tasks.

Task 2: Implementing the treemap algorithm

Your next task should be to implement the generate_treemap method in AbstractTree, which is the actual method responsible for computing the treemap representation of a tree. Your implementation must match the treemap algorithm described above.

Once again, your approach should be recursive; make sure you understand the role of the rectparameter before writing any code.

We strongly recommend implementing the base case before starting the recursive step, to give yourself a check that you understand what the method is supposed to do.

For the recursive step, a good stepping stone to think about is when the tree has height 2, and all leaves have the same data_size values. In this case, the original rectangle should be partitioned into equal-sized rectangles.

Warning: make sure to follow the exact rounding procedure in the algorithm description above. If you deviate from this, you might get numerical errors that will lead to incorrect rectangle bounds.

Progress check!

The basic treemap algorithm does not require pygame at all: you can check your work simply by instantiating a FileSystemTree object, calling generate_treemap on it, and checking the list of rectangles returned. This is one way we’ll be testing your code. Write your own doctests and pytests for generate_treemap. This is the best way to check edge cases (e.g., making sure you’re rounding properly).

Task 3: Displaying static (non-interactive) treemaps

Now that you are able to both create concrete instances of FileSystemTree and compute a treemap, you are ready to enjoy the fruit of your labour by showing a pretty visualisation of the treemap.

Open treemap_visualiser.py, and complete the render_display function. You may need to review Lab 7 to practice rendering rectangles using pygame.

Make sure you use the provided constants to show both the treemap and the text display at the bottom of the screen.

Note: you might notice that treemap_visualiser.py is organized into a set of functions, rather than a class. Please do not change this design in any way. We have deliberately not used a class here to illustrate the idea that not all functions must be grouped together in a class. Part of this course is getting used to thinking about the different ways of designing programs, after all!

Progress check

In the main block of treemap_visualiser.py, uncomment the call to run_treemap_file_system, and put in a path. Try running the program: if all goes well, you should see a beautiful rendering of your chosen folder’s files in the pygame window!


Related Questions in computer science category


Disclaimer
The ready solutions purchased from Library are already used solutions. Please do not submit them directly as it may lead to plagiarism. Once paid, the solution file download link will be sent to your provided email. Please either use them for learning purpose or re-write them in your own language. In case if you haven't get the email, do let us know via chat support.