Tuesday, June 29, 2004

Parsing And Protocols

Protocols and grammars pack information in essentially two ways: data is either length delimited or lexically delimited. A parser implements a grammar, and the grammar describes how a pattern of tokens is read into memory. How do you know when you have finished parsing a token? Either the grammar defines an explicit length for the token, or it defines beginning and end signifiers (either through a lower-order lexical grammar or through a dictionary).

For example

Length Explicit

[5][H][e][l][l][o]

From this we have

Hello

In this grammar the first byte defines the number of bytes that follow. This is my favorite method of packing because it lets you allocate the minimum amount of memory on the receiving end, which in MetaWrap is often a thin client. It runs into problems if you want to pack things in a hierarchy. E.g.

[12][5][H][e][l][l][o][5][W][o][r][l][d]

When preparing this for transmission, you can’t send the first byte until you know the length of every element in the hierarchy.
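As a rough sketch of the two examples above, assuming a one-byte length prefix (so no payload over 255 bytes), the packing and unpacking might look like this in Python. The function names pack, pack_group, unpack and unpack_group are made up for illustration and are not MetaWrap's API.

```python
from typing import List, Tuple


def pack(payload: bytes) -> bytes:
    """Prefix payload with a single length byte (assumed limit: 255 bytes)."""
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length prefix")
    return bytes([len(payload)]) + payload


def pack_group(items: List[bytes]) -> bytes:
    """Pack each item, then wrap the result in an outer length prefix."""
    # The outer length is only known once every inner item has been packed,
    # so the whole body is buffered before the first byte can be sent.
    body = b"".join(pack(item) for item in items)
    return pack(body)


def unpack(frame: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Read one length-prefixed payload; return it and the next offset."""
    length = frame[offset]  # the receiver knows the exact size up front
    start = offset + 1
    end = start + length
    if end > len(frame):
        raise ValueError("frame is truncated")
    return frame[start:end], end


def unpack_group(frame: bytes) -> List[bytes]:
    """Unpack an outer frame whose body is a run of inner frames."""
    body, _ = unpack(frame)
    items = []
    offset = 0
    while offset < len(body):
        item, offset = unpack(body, offset)
        items.append(item)
    return items
```

Here pack(b"Hello") gives the six bytes of the first example, and pack_group([b"Hello", b"World"]) gives the thirteen bytes of the hierarchical one. Note that pack_group has to build the whole body before it can emit the leading [12].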

Lexical

[“][h][e][l][l][o][”]

We have a starting character for a string and a terminating character. We start with the [“] and keep grabbing data until we get the [”]. We don’t know when the [”] will arrive, so we can’t allocate a buffer of the correct length up front.
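A minimal sketch of reading a lexically delimited string, assuming a text stream and the curly quotes used above as delimiters. The name read_quoted is hypothetical. The point to notice is that the buffer has to grow as characters arrive, because nothing announces the length in advance.

```python
import io


def read_quoted(stream, open_quote="\u201c", close_quote="\u201d"):
    """Read one delimited string from a text stream, one character at a time."""
    if stream.read(1) != open_quote:
        raise ValueError("expected opening quote")
    chunks = []  # grows as we go; final size is unknown until the close quote
    while True:
        ch = stream.read(1)
        if ch == "":
            raise ValueError("stream ended before closing quote")
        if ch == close_quote:
            return "".join(chunks)
        chunks.append(ch)


source = io.StringIO("\u201chello\u201d and the rest")
word = read_quoted(source)
```

After this runs, word holds hello and the stream is positioned just past the closing quote.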

This is why I prefer Length Explicit packing and grammars. The receiver knows up front how much memory each token needs, so they scale better. This is why MetaWrap uses this as its primary data packing and transmission format.

One Or Two Passes?

The XPath 1.0 Specification states that

Expressions are parsed by first dividing the character string to be parsed into tokens and then parsing the resulting sequence of tokens. Whitespace can be freely used between tokens. The tokenization process is described in [3.7 Lexical Structure].

This seems to suggest that it should be performed in two passes, but from reading the rest of the specification, I see no reason why it can't be performed in one pass.
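To illustrate what one pass could mean here, the sketch below uses a toy path grammar (names separated by slashes, whitespace allowed between tokens). It is not XPath and ignores the disambiguation rules of the real lexical structure; the names tokens and parse_path are made up for the example. The tokenizer is a generator, so the parser pulls each token only when it needs it and the text is scanned once, with no intermediate token list.

```python
import re
from typing import Iterator, List, Tuple

# Toy lexical grammar: optional whitespace, then a name or a slash.
TOKEN = re.compile(r"\s*(?:(?P<name>[A-Za-z_][\w.-]*)|(?P<slash>/))")


def tokens(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, value) pairs lazily, one per request from the parser."""
    pos = 0
    end = len(text.rstrip())  # trailing whitespace carries no token
    while pos < end:
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        yield kind, match.group(kind)


def parse_path(text: str) -> List[str]:
    """Parse name ('/' name)* while the tokenizer is still running."""
    steps = []
    expect_name = True
    for kind, value in tokens(text):
        if expect_name and kind == "name":
            steps.append(value)
        elif not expect_name and kind == "slash":
            pass  # separator between steps, nothing to record
        else:
            raise SyntaxError(f"unexpected token {value!r}")
        expect_name = not expect_name
    if expect_name:
        raise SyntaxError("path must end with a step name")
    return steps
```

For instance, parse_path("a / b/c") returns the three step names in order, and a malformed input fails at the first bad token without the rest of the string ever being tokenized.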

Update:

The XPath 2.0 Specification is a bit more explicit

During the static analysis phase, the XPath expression is parsed into an internal representation called the operation tree (step SQ1 in Figure 1). A parse error is raised as a static error.[err:XP0003] The static context is initialized by the implementation (step SQ2). The static context is used to resolve type names, function names, namespace prefixes and variable names. 

Tuesday, June 29, 2004 6:26:48 PM (AUS Eastern Standard Time, UTC+10:00)