More Parsing Terminology or "Teminator e"

formal grammar is an abstract structure that describes a formal language precisely: i.e., a set of rules that mathematically delineates a (usually infinite) set of finite-length strings over a (usually finite) alphabet.

formal language is a set of finite-length character strings drawn from some finite alphabet.

A context-free grammar (CFG) is a formal grammar in which every production rule is of the form

V ? w

where V is a non-terminal symbol and w is a string consisting of terminals and/or non-terminals. The term “context-free” comes from the fact that the non-terminal V can always be replaced by w, regardless of in what context it occurs.

A formal language is context-free if there is a context-free grammar that generates it.

terminals belong to the set of strings of the alphabet of the language. They do not get replaced during the derivation of a sentence.

non-terminals or variables or syntactic-categories are replaced during the derivation. They represent arbitrary strings.

As the parser accepts a sequence of tokens, it determines, based on this information, when the grammar’s respective rules are complete and verifies the syntactic correctness of the token sequence. The end result of the process is a derivation which represents the token sequence organized following the rules of the grammar.

A derivation of a string is a sequence of steps where nonterminals are expanded so that the string results.

  S ->
  "if" E "then" S "else" S ->
  "if" "2" "=" "2" "then" S "else" S ->
  "if" "2" "=" "2" "then" "print" E "else" S ->
  "if" "2" "=" "2" "then" "print" "2" "=" "2" "else" S ->
  "if" "2" "=" "2" "then" "print" "2" "=" "2" "else" "print" E ->
  "if" "2" "=" "2" "then" "print" "2" "=" "2" "else" "print" "2" "=" "2"

A leftmost derivation is one where every step expands the leftmost nonterminal. A rightmost derivation expands the rightmost nonterminal.

A derivation need not be rightmost or leftmost. Many derivations can result in the same syntax tree.

A derivation starting from the start symbol gives a member of the language (after all variables have been eliminated)

Every context gree language has a start symbol.

The start symbol represents all valid sentences in the language

The head (of a production) a single variable for context free langiuages in a similar notation for non-context-free languages, the head may contain more than one symbol. The body (of the production) a sequence of symbols, variables as well as terminals are allowed.

To infer that a sentence is in the language

Recursive Inference:

From body to head – Start from terminals and work up the tree to the start symbol

Derivation:

From head to body – Start from the start symbol and apply productions until you get the sentence

Regular Languages are a subset of Context Free Languages All regular languages are also context free

Definition of  a Context Free Grammar

G = {V,T,P,S}

V = Finite set of variables

nT = Finite set of terminals
 

nP = Set of productions
S = Start symbol

For language

L = { 0n1n | n >= 1}

S -> 0 X 1

X -> e

X -> 0 X 1

The grammar is

G = ( {S,X} , {0,1} , {(S,0X1),(X,e),(X,0X1)},S )

The ancient Indian linguist Panini described Sanskrit using a context-free grammar. Recently, it has been suggested that a class of Tamil poetry called Venpa is governed by a context-free grammar.

About James McParlane

CTO Massive Interactive. Ex Computer Whiz Kid - Now Grumpy Old Guru.
This entry was posted in Parsing Theory, XPath. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s