Turning a Parenthesized Expression into a Tree
The algorithm below
shows how to turn an expression with symbols, parentheses, and commas into the
corresponding tree. For instance, we want to take as input the string
S(NP(Name(Amy)),VP(Verb(gave),NP(Det(the),Adj(pretty),Noun(flowers)),
PP(Prep(to),NP(Det(the),Adj(small),Noun(boy)))))
and output the tree
S--->NP--->Name--->Amy
|
|->VP--->Verb--->gave
|
|->NP--->Det--->the
| |
| |->Adj--->pretty
| |
| |->Noun--->flowers
|
|->PP--->Prep--->to
|
|->NP--->Det--->the
|
|->Adj--->small
|
|->Noun--->boy
This is similar to the
algorithm for turning a prefix expression into a tree, except that, in a prefix
expression, each operator has a fixed number of arguments; in a parenthesized
expression, a node can have an arbitrary number of children.
We'll use two
functions extractSymbol(s,k) and extractExpression(s,k). They both take two arguments: s is a string,
and k is an index in s where the function starts
reading. They both return two values. Function extractSymbol returns
the first complete symbol starting at S[K] and the index following that symbol.
Function extractExpression returns the first complete expression
starting at S[K] and the index following that symbol.
The pseudo-code, in a
sort of pseudo-Java that allows multiple returned values, follows below. (This
does not check for all possible input errors, e.g. unbalanced expressions.)
class Tree {
String root;
List<Tree>
children;
}
[String, int] extractSymbol(String s, int k) {
while (whiteSpace(s[k])
k++; % Skip white space
if (s[k] == '(' || s[k]
== ',' || s[k] == ')')
return ((String)
s[k], k+1);
if (! alphabetical(s[k])
error("Invalid input at index " + k);
% Otherwise, read an alphabetical symbol starting at s[k]
String m = "";
while (alphabetical(s[k]))
{
m = m+s[k];
k++;
}
return (m,k);
} % end extractSymbol
[Tree, int] extractExpression(String s, int k) {
[String m, int q] = extractSymbol(s,k);
if
(!(alphabeticalString(m))) error("Invalid input at index " + k)
Tree t = new
Tree(m,emptyList());
[String n, int
peek] = extractSymbol(s,q); % Peek at next symbol
if
(!n.equals("(") return (t,q);
q = peek; % Go past the open paren
repeat { % read the arguments
[Tree c, int q] =
extractExpression(s,q);
t.addChild(c); % add c as a child of t
[n,q] =
extractSymbol(s,q);
if
(n.equals(")") % reached balancing
close paren
return (t,q);
if
(!n.equals(",")) error("Invalid input at index " + q)
% if the
next symbol is a comma, then just continue the loop
} %end repeat
} %end extractExpression.
Example: Let s be the string
"S(NP(Pron(I)),VP(Verb(fish)))". In the trace below "eE"
means "extractExpression " and "eS" means
"extractSymbol".
S(NP(Pron(I)),VP(Verb(fish)))
0123456789012345678901234567
Get Free Quote!
357 Experts Online