berto -> txt.y -- the AGElint parser (12/19/2011 11:57:46 AM)
|
txt.y -- the AGElint parser Here is a very brief, but technical, overview of the AGElint txt.y parser. You may skip this discussion entirely if you wish. What is parsing? Quoting from Wikipedia: http://en.wikipedia.org/wiki/Parser quote:
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar... Consider the sentence: The rain in Spain falls mainly on the plain. Grammatically speaking, that sentence may be viewed as a sequence of - article [The]
- noun [rain]
- prepositional phrase
- preposition [in]
- noun [Spain]
- verb [falls]
- adverb [mainly]
- prepositional phrase
- preposition [on]
- article [the]
- noun [plain]
but other organizing schemes are possible. The formal grammar for English usually specifies that articles precede nouns, that prepositional phrases begin with prepositions and end with nouns, that subject precedes predicate, and so on. The sentence rain The in falls Spain mainly on the plain. violates English grammar, as does plain The rain in Spain falls mainly on the. Obviously, there are innumerable ways to mangle English grammar. Note that English grammar would accept unusual, but still "correct", sentences such as The rain in Spain on the plain mainly falls. Although English grammar has been codified in exhaustive detail, mostly we just somehow "know" what is correct grammar or not. But in computer code, "it just doesn't sound right" is inadequate. We need to spell out the grammar as accurately and completely as possible. In AGElint, we do that in the file txt.y. The AGElint 'make' command inputs txt.y to the bison code generator to output C code -- txt.tab.h, txt.tab.c -- of hideous complexity. You shouldn't ever need to inspect txt.tab.h or txt.tab.c! In the AGElint txt.y, you will see a list of token declarations (see the earlier txt.l lexer description) (also some other technical stuff), followed by the AGE data file "formal grammar". The agelint AGE data file formal grammar begins with: start: abilities
| ais
| aliases
| diplomacies
| ethnics
| events
| facattribs
| facmods
| factions
| includes
| merchandises
| models
| regions
| religions
| researches
| rgndecisions
| rulers
| scripts
| structures
| terrains
| units
; That is, an AGE game data file must be one or the other of the listed possibilities (where '|' signifies OR). Let's look more closely at the aliases case. The aliases stanzas are coded as: aliases: aliasthings
;
aliasthings: aliasthing
| aliasthings aliasthing
;
aliasthing: val eq intvblist
| val eq val
| error
; That is, a series of alias specifications (aliasthings), where an alias specification might be a val, followed by an eq (equal sign =), followed by either an intvblist or another val; else a specification might be in error (in which case the parser can recover from the mistake and move forward with the parsing). Here are the specifications for val, eq & intvblist: val: alsval
| facval
;
eq: _EQ {
agecmdrhs = agecmd;
linenorhs = lineno;
}
intvblist: int
| intvblist vb int
; with some subsidiary specifications: alsval: _ALSVAL {
if (islist_aliases || islist_locals) {
fprintf(stdout, "%s:%d:%s\n", txtfile, lineno, $1+1);
}
free($1);
}
| gmaoptval
| abinamval
| abitxtval
| dinamval
...
| txtval
| unitxtval
| ldrval
| mdlval
;
facval: _FACVAL { free($1); }
;
int: intpos { $$ = $1; }
| intneg { $$ = $1; }
; and on and on ... Look rather complicated? It is! In the txt.y formal grammar, stanzas such as int: might be viewed as subroutines. It's programming, but programming of a sort different from what you are probably used to. As you can see, as with txt.l, in txt.y grammar components might include -- bracketed by { and } -- some C code saying what special things to do, if any. For now, the important thing to understand is: if the pattern of tokens (output by the agelint lexer, the compiled txt.l) doesn't match one of the precisely defined sequences in the formal grammar (specified in txt.y), agelint will report a syntax error (and possibly do other stuff) -- the game data broke the grammar rules! I invite you to look around the txt.y file. If you are really brave, you might try making changes here or there, then redoing the 'make' command to recompile the agelint executable. If you code a "little" change here or there, in the 'make' output, your conflicts line might say something like: txt.y: conflicts: 768 shift/reduce, 523 reduce/reduce rather than the current txt.y: conflicts: 3 shift/reduce, 2 reduce/reduce These "conflicts" represent ambiguities in the lexer/parser specification. Ideally you want 0 conflicts of each type, but that is difficult to impossible, depending on the coding pains you take, and possibly the language (in this case, the AGE scripting language) that you are attempting to model. As with lexer programming, programming parsers can be very weird indeed! But you really don't have to understand any of this. Just know that the AGElint parser checks the game data file(s) for syntactical (and other) correctness. If agelint reports a syntax error, you know who reports it (the compiled txt.y). I will have more, possibly much more, to say about the txt.y parser in future posts. The txt.l lexer is more or less settled. It's in txt.y where most of the action (and ambiguity and controversy and mistakes ...) takes place.
|
|
|
|