In this semester you will implement a stack-based virtual machine. It is patterned after the Java Virtual Machine, but the instruction set is considerably downsized to make it suitable for student projects.

computer science

Description

In this semester you will implement a stack-based virtual machine. It is patterned after the Java Virtual Machine, but the instruction set is considerably downsized to make it suitable for student projects. The objective of Project 1 is to implement a lexical analyzer for our VM language, which has 7 token categories ⟨unsigned int⟩ through ⟨comma⟩ defined by the following EBNF:

⟨digit⟩ → 0 | 1 | ... | 9
⟨unsigned int⟩ → {⟨digit⟩}+
⟨signed int⟩ → (+|−) {⟨digit⟩}+
⟨float⟩ → [+|−] ( {⟨digit⟩}+ "." {⟨digit⟩} | "." {⟨digit⟩}+ )
⟨floatE⟩ → ⟨float⟩ (e|E) [+|−] {⟨digit⟩}+
⟨instruction name⟩ → "iconst" | "iload" | "istore" | "fconst" | "fload" | "fstore" |
                                    "iadd" | "isub" | "imul" | "idiv" | "fadd" | "fsub" | "fmul" | "fdiv" |
                                    "intToFloat" |
                                    "icmpeq" | "icmpne" | "icmplt" | "icmple" | "icmpgt" | "icmpge" |
                                    "fcmpeq" | "fcmpne" | "fcmplt" | "fcmple" | "fcmpgt" | "fcmpge" |
                                    "goto" | "invoke" | "return" | "ireturn" | "freturn" | "print"
⟨colon⟩ → ":"
⟨comma⟩ → ","

According to the above definitions, the integers and floating-point numbers may be signed with "+" or "−". Moreover, the integer or fractional part, but not both, of a string in ⟨float⟩ may be empty. The instruction names are case-sensitive.


The following is a DFA to accept the token categories, except for ⟨instruction name⟩.

 


The DFA has an auxiliary final state id defining ⟨id⟩ → {⟨letter⟩}+ to implement the extraction of the instruction names in the following way:

1.     Create 33 additional special DFA states for the instruction names.

2.     The DFA initially accepts the instruction names as identifiers.

3.     Each time the DFA accepts an identifier, check if it is one of the instruction names. If so, move the DFA to the corresponding instruction-name state; otherwise, issue a lexical error message.

The implementation should be based on the above DFA. Your lexical analyzer program should clearly separate the driver and the state-transition function so that the driver will remain invariant and only state-transition functions will change from DFA to DFA. The enumerated or integer type is suggested for representation of states.

The lexical analyzer program is to read an input text file, extract the tokens in it, and write them out one by one on separate lines. Each token should be flagged with its category. The output should be sent to an output text file. Whenever invalid tokens are found, error messages should be printed, and the reading process should continue.

You may modify one of these sample Java programs into your solution; if you do so, modify the comments suitably as well.

Here's a sample set of test input/output files:

in1 | out1
in2 | out2
in3 | out3
in4 | out4
in5 | out5
in6 | out6
in7 | out7

Note that when the unexpected char is the newline char, the error message is displayed on the next line because my program, which is based on the sample lex analyzer, appends the newline char to the string read so far and displays it. You should make your own additional input files to test the program.

Since the purpose of this project is to reinforce, firsthand, the understanding of the internal mechanism of lexical analyzers built from finite automata as opposed to viewing them as black boxes, you are not allowed to use any library functions/tools for lexical analysis (like the Java StringTokenizer).

To make grading efficient and uniform, observe the following:

  • The program must read the input/output file names as external arguments to the main function. How to set external arguments to Java main function in Eclipse.
  • If Java is used, the main function to be invoked to run the program must be included in LexVM.java class.

Our project plan for the semester is to implement a top-down parser and instruction store for this VM language in Project 2 and an interpreter of the instructions in Project 3 and 4.

Submission

Project 1, your full name

Include concise instructions for how to compile and run your program. You may email the entire materials in a .zip or .rar compressed file.

The due date is 10/04/19, Friday, 11 PM. No late projects will be accepted. If you haven't been able to complete the project, you may send an incomplete program for partial credit. In this case, include a description of what is and is not working in your program along with what you believe to be the sources of the problems.


Related Questions in computer science category