[Rescued from the old blog – Thanks to Damian]
I’m trawling through conventional theory at the moment to see if my approach shows up on the radar anywhere – if it does then I can probably get some hints and tips on what could make it better or if I am doing something wrong.
My algorithm skips the whole NFA -> DFA transformation stage and completes the build of what should be an optimal DFA in what I hope will be one pass through the grammar. This is only applicable to context free grammars where each lexeme is unambiguous.
The data structure for I use maps transitions into groups of states. This is evident in my description diagrams where there is a more than one dot notation for the applicable lexeme eg. http://set-top.net/tests/dot/test13.png . The aim has always been to preserve the state structure and lexeme context. More importantly it allows transitions to be reduced by effectively allowing the overloading of transitions, for each state group is effectively polymorphic as one or more potential states.
I have reasons for this approach that will become self evident when I explain the endgame of the implementation.
This is the first of three posts….
Finite State Automata In General
Formally, an automaton is represented by 5tupple Q ? d S0 F ,where:
- Q is a finite set of states.
- ? is a finite set of symbols, that we will call the alphabet of the language the automaton accepts.
- d is the transition function, that is
- S0 is the start state, that is, the state in which the automaton is when no input has been processed yet (Obviously, S0? Q).
- F is a set of states of Q (i.e. F?Q), called accept states.
With all this, we can now say that the language L accepted by an DFA automaton M is:
language.png
Deterministic Finite State Automata (DFA)
The term deterministic can mean two things:
- for recognizers, it means that for each state there cannot be more than one out transitions with the same label, there is one initial state, and there are no epsilon transitions;
- for transducers, it usually means that in addition there can be no two out transitions with the same label on the “input” level, and different labels on the “output” level.
A DFA has no e transitions
DFAs transition function is single-valued
For a DFA, the action of automaton on each input symbol is fully determined which provides an obvious table-driven implementation.
The following state diagram represents the regular language grammar ( a | b )* abb as a deterministic finite state automata. DFA.
( a | b )* abb
We can see that this is deterministic because
- All states have unambiguous transitions (from state to state, for a given character you will always end up in the same state).
- No e (empty string) transitions
Non-deterministic Finite State Automata (NFA)
A NFA has..
-
A set of states a start state and at least one accepting state.
-
Arrows connecting states labeled by input symbols, or e (which does not consume input)
-
More than one arrow leaving a state may have same label
For an NFA..
- an automaton may have choice on each step
- automaton accepts a string if there is any way to make choices to arrive at accepting state / every path from start state to an accept state is a string accepted by automaton
- not obvious how to implement efficiently!
The following state diagram represents the regular language ( a | b )* abb as a non-deterministic finite state automata (NFA).
( a | b )* abb
We can see that this is non deterministic because of the following.
-
S0 has a transition on e
-
S1 has two transitions on a
Alternating Finite Automata (AFA)
In automata theory, an alternating finite automaton (AFA) is a non-deterministic finite automaton whose transitions are divided into existential and universal transitions. Let A be an alternating automaton.
- For a transition afat1.png, A nondeterministically chooses to switch the state to either q1 or q2, reading a.
- For a transition afat2.png, A moves to q1 and q2, reading a.
Note that due to the universal quantification a run is represented by a run tree. A accepts a word w, if there exists a run tree on w such that every path ends in an accepting state.
A basic theorem tells that any AFA is equivalent to an non-deterministic finite automaton (NFA) by performing a similar kind of powerset construction as it is used for the transformation of a NFA to a deterministic finite automaton (DFA). This construction converts an AFA with k states to a NFA with up to 2k states.
Finite State Transducers (FST)
A finite state transducer (FST) is a finite state machine with two tapes.
A finite state transducer is defined in the same way as an FSM except that:
- In addition to the input alphabet, an output alphabet is specified
- Typically there is no notion of acceptance of a string, so no accepting states are specified
- Each transition has an associated output string (i.e., string over the output alphabet)
A finite state transducer is drawn in the same way as an FSM except that:
- Typically there is no notion of acceptance of a string, so no state is designated as accepting
- Each transition is marked with a symbol or set of symbols as in the case of an FSM, followed by a forward slash, followed by an output string
Termination is decided not on a reaching a particular state, but on a particular transition between states.
Contrast this with an ordinary finite state automaton, which has a single tape. An automaton can be said to recognize a string if we view the content of its tape as input. In other words, the automaton computes a function that maps strings into the set {0,1}. Alternatively, we can say that an automaton generates strings, which means viewing its tape as an output tape. On this view, the automaton generates a formal language, which is a set of strings. The two views of automata are equivalent: the function that the automaton computes is precisely the characteristic function of the set of strings it recognized. The class of languages generated by finite automata is known as the class of regular languages.
The two tapes of a transducer are typically viewed as an input tape and an output tape. On this view, a transducer is said to transduce (i.e., translate) the contents of its input tape to its output tape, by accepting a string on its input tape and generating another string on its output tape. It may do so nondeterministically and it may produce more than one output for each input string. A transducer may also produce no output for a given input string, in which case it is said to reject the input. In general, a transducer computes a relation between two formal languages. The class of relations computed by finite state transducers is known as the class of rational relations.
Finite State Transducers are typically useful in NLP (Natural Language Processing) research.
(a | b)* aab
Completes on output of 1
Normal Non-deterministic Finite State Automata (NNFA)
A normal NFA (abbr. NNFA) is an NFA in which all edges leading into the same state have the same label. Thus, it is convenient to label states instead of edges. Visually this has a strong correpondence with the language is recognises.
(a | b) * abb
Compressed Normal Non-deterministic Finite State Automata (CNNFA)
http://www.cs.nyu.edu/web/Research/Theses/chang_chia-hsiang.pdf
(a | b) * abb
Comparison between NFAs and DFAs
NFAs are more compact – they generally require fewer states to recognize a language.
It obvious that a DFA can be simulated with an NFA, because the set of all DFAs is a subset of the set of all NFAs. (Proof)
Its less obvious that every NFA can be simulated with a DFA, and NFA has an equivalent DFA. It is possible to convert an existing NFA into a DFA for the purpose of implementing a simpler machine. This can be performed using the powerset construction which may lead to an exponential raise in the number of necessary states. Such determinization however is necessary to build a complement automaton.
[UPDATE] – corrected an image that I mixed up when resizing them. The First NFA image was a repeat of the DFA.
Hello. Great blog here, but for the first state graph. how did you get to the regular expression (a|b)*abb?
If you follow the graph, that is what the grammar is.
Well Yes, I did, and following the general method/equation which is: Tx=0Tx0 + 1Tx1 + £x, I’m guessing that you are familiar with the equation. If you work it out, you will reach a point where you have recurring terminal not being equal to non-terminals. How did you go about that? And also how do you chose the bit combinations as well as magnitudes while equating recurring terminals to non-terminals.
Do you understand my question?
The equation looks like LL(1) parse tree generation?
The graph is of the regular expression for a given algorithm,
G = A(R)
That first graph is designed by a human.
Further down I demonstrate graphs arrived at by different algorithms.
I would really like to know.
Sure. Thank you so much.. Just one more question if I may. How do you go by find a regular expression for an epsilon NFA that has two initial states?
Alternatively what construction translates a NFA with multiple initial states to a NFA with single initial state?