Infocom-type parser

From IFWiki

The current baseline standard for interactive fiction parsers originated largely with Dungeon (later Zork) and was further refined by Infocom. At the time it was a huge leap beyond the two-word parser of the original Adventure, and although parsers have slowly evolved since, there have been few truly revolutionary changes to the grammar. A few rival IF companies of the 1980s (such as Magnetic Scrolls) built parsers of equal complexity, but none really did anything new.

(Any counterexamples from Magnetic Scrolls or similar games?)

While coding a two-word parser is simplicity itself, writing an Infocom-style parser can be an exercise in baffling complexity, even though the semantics of the final parsed result are not much more involved than in Adventure.
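
To make the comparison concrete, here is a minimal sketch of a two-word parser in Python. The vocabulary tables and the function name parse_two_word are hypothetical and chosen only for illustration; they are not taken from Adventure or any real game.

```python
# Hypothetical vocabulary (an assumption for this sketch); a real game has far more words.
VERBS = {"take": "TAKE", "get": "TAKE", "drop": "DROP", "go": "GO"}
NOUNS = {"lamp": "LAMP", "keys": "KEYS", "north": "NORTH"}


def parse_two_word(line):
    """Return a (verb, noun) pair, or None if the input is not understood."""
    words = line.lower().split()
    if not 1 <= len(words) <= 2:
        return None  # anything longer than VERB NOUN is rejected outright
    verb = VERBS.get(words[0])
    if verb is None:
        return None
    if len(words) == 1:
        return (verb, None)
    noun = NOUNS.get(words[1])
    if noun is None:
        return None
    return (verb, noun)
```

Under these assumptions, GET LAMP parses to the pair (TAKE, LAMP), while TAKE THE LAMP is rejected because it has three words. Handling that third word properly is where the complexity of an Infocom-style parser begins.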

There may yet be huge leaps to be made in parser grammar and semantics, but so far the Infocom parser seems to be the sweet spot of features versus complexity and player intelligibility.

The minimal dialect of English understood by the Inform implementation of the Infocom-style parser is often referred to as Informese.


Features

The major advances of the Infocom parser over the two-word parser are:

  • Actors (JACK, PASS ME THE KETCHUP)
  • Indirect objects (UNSCREW THE THYRISTOR WITH THE FROBLITZ WRENCH)
  • Verbs as phrases rather than words (TURN THE LIGHT ON, DON'T PANIC)
  • Multi-word nouns, allowing for adjectives and complex names (THE THING YOUR AUNT GAVE YOU WHICH YOU DON'T KNOW WHAT IT IS)
  • Multiple object selection phrases (TAKE ALL FROM THE BASKET EXCEPT THE CHIP AND THE SOCKET)
  • Numeric quantities (TAKE FIVE GOLD COINS)
  • Raw text or numbers in the place of object slots (TURN DIAL TO 11, SAY "BOOJUM" TO THE SNARK)
  • Articles (THE CAT, AN APPLE, ONE OF THE GOLD COINS)
  • Pronouns (PICK UP THE TICKET. LOOK AT IT. FOLLOW MRS PODSNAP. ASK HER ABOUT THE CRIME)
  • Multiple commands on a line (GO NORTH THEN GO WEST. WAIT)
  • Interactive disambiguation (Do you mean the white chip, the black chip, or the taco chip?)
  • Automatic guessing of missing sentence elements (SHOOT ELEPHANT (with the rifle))
  • Typo correction (OOPS <text>)
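
As an illustration of how multi-word nouns, articles and interactive disambiguation fit together, the following Python sketch matches the player's words against object names and builds the disambiguation question when more than one object fits. The names ARTICLES, match_objects and disambiguation_question are hypothetical, and matching every typed word against the object's name is a simplifying assumption, not a description of how Infocom's own parser worked.

```python
ARTICLES = {"the", "a", "an"}  # hypothetical list of words to ignore


def match_objects(words, objects_in_scope):
    """Return the object names that contain every non-article word typed."""
    wanted = [w.lower() for w in words if w.lower() not in ARTICLES]
    return [
        name
        for name in objects_in_scope
        if all(w in name.split() for w in wanted)
    ]


def disambiguation_question(candidates):
    """Build a 'Do you mean ...?' question for two or more candidates."""
    if len(candidates) < 2:
        raise ValueError("nothing to disambiguate")
    names = ["the " + c for c in candidates]
    if len(names) == 2:
        return f"Do you mean {names[0]} or {names[1]}?"
    return "Do you mean " + ", ".join(names[:-1]) + ", or " + names[-1] + "?"
```

The player's reply to the question can be fed back through match_objects with only the remaining candidates in scope, so that answering TACO narrows the choice to the taco chip.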

Grammar

The Infocom parser generally accepts sentences of the form:

[ACTOR,] [VERB PHRASE] [DIRECT OBJECT PHRASE] [PREPOSITION PHRASE] [INDIRECT OBJECT PHRASE] [.|THEN]...

or the special case

[PARTIAL WORDS MATCHING AN OBJECT PHRASE]

if waiting for a disambiguation response.
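
The sentence form above can be sketched in Python as follows. This is a deliberately simplified illustration, not Infocom's algorithm: it assumes a one-word verb, treats the first preposition found as the divider between the two object phrases, and assumes a comma only ever follows an actor's name. The names PREPOSITIONS, Sentence, split_commands and parse_sentence are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

PREPOSITIONS = {"with", "to", "from", "in", "on", "at"}  # hypothetical list


@dataclass
class Sentence:
    actor: Optional[str]
    verb: str
    direct: List[str]
    preposition: Optional[str]
    indirect: List[str]


def split_commands(line):
    """Split an input line into commands on '.' and the word THEN."""
    parts = re.split(r"\.|\bthen\b", line.lower())
    return [p.strip() for p in parts if p.strip()]


def parse_sentence(command):
    """Split one command into actor, verb, and the two object phrases."""
    command = command.lower()
    actor = None
    if "," in command:
        # Simplification: a comma is assumed to follow an actor's name.
        actor, command = command.split(",", 1)
        actor = actor.strip()
    words = command.split()
    if not words:
        raise ValueError("no verb given")
    verb, rest = words[0], words[1:]
    for i, word in enumerate(rest):
        if word in PREPOSITIONS:
            return Sentence(actor, verb, rest[:i], word, rest[i + 1:])
    return Sentence(actor, verb, rest, None, [])
```

With these assumptions, split_commands breaks GO NORTH THEN GO WEST. WAIT into three commands, and parse_sentence splits UNSCREW THE THYRISTOR WITH THE FROBLITZ WRENCH at WITH. The word lists it returns would still have to be matched against objects in the game world.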


It gives an error if it fails to understand the sentence, or stops and queries the player for more information if the objects mentioned in the sentence or sentence fragment are (or remain) ambiguous. Otherwise, it continues, parsing out the four main facets of the sentence:

  • the actor who is being addressed

    This usually defaults to the 'player' object if no explicit actor was given. (The early Infocom game Suspended is one of the few games that break this convention, defaulting to the last spoken-to actor.)

  • the action being taken

    Many verb phrases may parse to the same action. It is very rare for a game to need to know the exact verb written by the player, but some games do this (usually by some kind of 'parser hack' to read the raw input buffer) as a special effect.

    If no verb is supplied, an 'implicit action' may be generated. By convention, the compass directions (NORTH, SOUTH, UP, etc.) do this.

  • the direct object(s) if any (called the 'noun' in Inform)

    This can be:

    • A single object
    • A list of objects
    • A piece of raw text (called a 'topic' in Inform 6)
    • A number

    One interesting quirk of the Infocom parser is that in order to correctly parse the noun phrases, the verb usually needs to be known at this point, as only certain verbs allow a list of multiple objects. By convention TAKE and DROP usually support this feature, but other verbs do so only at the author's discretion.

    Use of multiple objects can spoil many game puzzles - the classic case is 'EXAMINE ALL' revealing objects in a room that are not explicitly listed in the room description and whose existence the player is supposed to learn only by searching - so it's rare for a game to use this feature much.

    The result of actions performed on multiple objects is usually listed in a special notation: 'Object: <action>'. For example:

    >TAKE ALL FROM BASKET

    White chip: taken
    Black chip: taken
    Taco chip: taken

    By convention, multiple actions usually all occur within one 'move' of game time, which bends mimesis slightly; this is generally done for ease of programming. Some games, however, have made the use of multiple-take a requirement for solving certain time-limited puzzles.

  • the indirect object(s) if any (called the 'second' in Inform)

    This follows similar rules to the direct object.
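
Several of the conventions in the list above can be sketched in a few lines of Python: many verbs mapping to one action, compass directions generating an implicit action, multiple objects being allowed only for certain verbs, and the 'Object: <action>' report. The table contents and the names VERB_TO_ACTION, DIRECTIONS, MULTI_OBJECT_ACTIONS, resolve_action, check_objects and report_multiple are hypothetical, chosen for illustration rather than taken from Inform or any Infocom game.

```python
# Hypothetical vocabulary tables (assumptions for this sketch).
VERB_TO_ACTION = {
    "take": "Take", "get": "Take",
    "drop": "Drop",
    "examine": "Examine", "x": "Examine",
    "go": "Go",
}
DIRECTIONS = {"north", "south", "east", "west", "up", "down"}
MULTI_OBJECT_ACTIONS = {"Take", "Drop"}  # other verbs at the author's discretion


def resolve_action(words):
    """Return (action, remaining words) for a tokenised command."""
    if not words:
        raise ValueError("no command given")
    words = [w.lower() for w in words]
    if words[0] in DIRECTIONS:
        # No verb supplied: a bare direction implies the Go action.
        return "Go", words
    action = VERB_TO_ACTION.get(words[0])
    if action is None:
        raise ValueError(f"unknown verb: {words[0]}")
    return action, words[1:]


def check_objects(action, objects):
    """Reject a list of several objects unless the action allows one."""
    if len(objects) > 1 and action not in MULTI_OBJECT_ACTIONS:
        raise ValueError("multiple objects are not allowed with that verb")
    return objects


def report_multiple(results):
    """Format (object name, result) pairs in 'Object: result' notation."""
    return [f"{name}: {result}" for name, result in results]
```

Because check_objects needs the action, the verb has to be resolved before the noun phrases are accepted, which mirrors the quirk described under the direct object.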


With these four pieces of information, the action routine can be called. The parser's job is over and the process of interpreting the player's command starts.