Cold Calling On An Instant Messenger

If you are going to add someone out of the blue to your contact list in instant messenger, and you have the good luck that they don’t reject you out of hand, don’t then ignore a request of

"Hi, can you please tell me who you are?"

or reply to that request with

"1 sec"

And leave them waiting.

I don’t know about anyone else, but I hate IM spam and If I think you are at all susspect I will block you, with prejudice.

And then I will send my pet.

Posted in Rants | Leave a comment

Hire'm

Mr Air bribed me with a packet of potato chips, so I am now going to plug his new venture.

What is Hire’m? Hire’m is powerful service to streamline and simplify the hiring process for any kind of business that needs to hire people. Hence the title of this post, “Simplified Hiring”.

Hire’m is an idea that I’ve had in the back of my head since early last year. It’s the one idea that I keep coming back saying to myself “you’ve really got to do this, there is a real need for this service”.

We’re in the development stage at the moment so we’re still a little way off launch but you should expect to see something in the second half of 2006. As time goes on, we’ll be telling the story of Hire’m – the challenges we’ve faced, building the service and life in the start-up world.

Want to know more? Want to help test out the service later down the track? Go to the Hire’m website and sign up to the mailing list.

 

Posted in Coolhunting, Ookle, Web2.0 | Leave a comment

Creating A Scanner From A Given Input Grammar – The Current State Of The Art

I’m trawling through conventional theory at the moment to see if my approach shows up on the radar anywhere – if it does then I can probably get some hints and tips on what could make it better or if I am doing something wrong.

This is the second of three posts.

Conventional wisdom is that you perform the following steps.

  1. Write down the regular language for the input grammar.
  2. Convert the NFA into a DFA (deterministic finite state automata).
  3. Systematically shrink the DFA.
  4. Turn it into code or a lookup table.

Step 1) Write Down The Regular Language

( a | b )* abb

Step 2) Build An NFA (Using Thompson’s Construction aka Inductive Construction)

 

For a Given Regular language, Building the NFA is generally done using Thompson’s construction.

 

First parse a regular language into constituent subexpressions.

eg. for ( a | b )* abb you would produce the following expression tree…

REtree

Now traverse the tree depth first and construct the NFA using the following rules.

a) For e (no input) construct the following pattern

thompson_rule1

b) For a single character, ‘a’ construct the following pattern.

thompson_rule2

Now we get into the rules for compositing TFA’s together with the various logical operations.

c) For constructing a composite NFA from two pre-existing NFAs s and t in the form of (s | t) .  Let NFA(s) and NFA(t) be the NFAs for regular expressions s and t. Construct the following pattern.

thompson_rule3

s|t

Newly added nodes are in orange.

d) For constructing a composite NFA from two pre-existing NFAs s and t in the form of st .  Let NFA(s) and NFA(t) be the NFAs for regular expressions s and t. Construct the following pattern.

thompson_rule4

st

Merge final state of NFA(s) and initial state of NFA(t) (in orange)

e) For constructing an NFA in the patten of a Kleene Star (a sequence of zero or more elements) from an existing NFAs s in the form of s* .  Let NFA(s) be the NFA for the regular expressions s. Construct the following pattern.

thompson_rule5

s*

Newly added nodes are in orange.

f) For constructing an NFA in the patten of a Kleene Cross (a sequence of at least one element) from an existing NFAs s in the form of s+ .  Let NFA(s) be the NFA for the regular expressions s. Construct the following pattern.

thompson_rule6

s+

Newly added nodes are in orange.

 

g) For constructing an NFA in the patten of an option (zero or one element) from an existing NFAs s in the form of s? .  Let NFA(s) be the NFA for the regular expressions s. Construct the following pattern.

thompson_rule7

s?

Newly added nodes are in orange.

Our example language of ( a | b ) * abb would parse as

thompsons_example2

( a | b ) * abb

However the following is what you get if a human designs it.

thompsons_example2h

A more dramatic example of the inefficiency is the NFA generated for  a ( b | c ) * would look like

thompsons_example2-small

a ( b | c ) *

Generated by Thompson’s Method

However the following is what you get if a human designs it.

thompsons_example1h

a ( b | c ) *

Generated by Common Sense

But its good to know we can automate the generation – even if the result is not optimal.

Step 3) Convert the NFA into a DFA (Determinisation)

A Commonly used algorithm – subset construction.

From the point of view of the input, any two states that are connected by an e-transition may as well be the same, since we can move from one to the other without consuming any character. Thus states which are connected by an e-transition will be represented by the same states in the DFA.

If it is possible to have multiple transitions based on the same symbol, then we can regard a transition on a symbol as moving from a state to a set of states (ie. the union of all those states reachable by a transition on the current symbol). Thus these states will be combined into a single DFA state.

Ni is a state from the source NFA

? is a finite set of symbols, that we will call the alphabet of the language the automaton accepts

Move( Si , a )  is the set of states reachable from Si by a. This function takes a state and a character, and returns the set of states reachable by one transition on this character.

e-closure( Si ) is the set of states reachable from Si by e . This function takes a state and returns the set of states reachable from it based on (one or more) e-transitions. Note that this will always include the state tself. We should be able to get from a state to any state in its e-closure without consuming any input.

Start state derived from S0 of the NFA
Take e-closure(S0)
Take the image of S0, Move(S0, ?) for each ? ? S, and take its e-closure
Iterate until no more states are added

S0 := e-closure({N0})
S := { S0 }
W := { S0 }
while ( W != Ø )
select and remove s from W
for each ? ?S
t := e-closure(Move( s,? ))
T[ s , ? ] := t
if ( t not in S ) then
add t to S
add t to W

 

Output is a set of states S and a set of transitions T.

Worst case is an exponential increase in the number of states and transitions.

A list of conventional algorithms.

Ted Leslie, Efficient Approaches to Subset Construction, masters thesis, Computer Science, University of Waterloo, 1995.[bibtex]

J. Howard Johnson and Derick Wood, Instruction Computation in Subset Construction, in Automata Implementation, Darrell Raymond and Derick Wood and Sheng Yu (eds.), Lecture Notes in Computer Science 1260, pp. 64-71, Springer Verlag, 1997.[bibtex]

Gertjan van Noord, The Treatment of Epsilon Moves in Subset Construction, Computational Linguistics, 26(1), pp. 61-76, April 2000. Available from www.let.rug.nl/~vannoord/papers/[bibtex]

 

Mehryar Mohri, Finite-State Transducers in Language and Speech Processing, Computational Linguistics, 23(2), pp. 269-311, June 1997.[bibtex]

Mehryar Mohri, On some applications of finite-state automata theory to natural language processing, Natural Language Engineering, vol. 2, pp. 61-80, 1996. Originally appeared in 1994 as Technical Report, institut Gaspard Monge, Paris.[bibtex]

Mehryar Mohri and Fernando C.N. Pereira and Michael Riley, A Rational Design for a Weighted Finite-State Transducer Library, Automata Implementation. Second International Workshop on Implementing Automata, WIA ’97, Lecture Notes in Computer Science 1436, Springer Verlag, 1998.[bibtex]

Emmanuel Roche and Yves Schabes, Deterministic Part-of-Speech Tagging with Finite-State Transducers, Computational Linguistics, 21(2), pp. 227-253, June, 1995.[bibtex]

 

Step 4) Systematicaly Shrink The DFA (Minimisatiion)

A list of conventional algorithms.

The fastest (O(|Q|log|Q|), where |Q| is the number of states) minimization algorithm (by John E. Hopcroft) is described in:

John E. Hopcroft, An n log n algorithm for minimizing the states in a finite automaton, in The Theory of Machines and Computations, Z. Kohavi (ed.), pp. 189-196, Academic Press, 1971.[bibtex]

David Gries, Describing an Algorithm by Hopcroft, Acta Informatica, vol. 2, pp. 97-109, 1973.[bibtex]

A. V. Aho, J. E. Hopcroft and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley Publishing Company, 1974.[bibtex]

The third item on the list (i.e. the book) is probably the easiest to find, but beware that the description is somewhat hidden. Look for “partitioning”. The problem of minimizing states in the set of states of an automaton is equivalent to finding the coarsest partition of the set of states, consistent with the initial partition into final and non-final states, such that if states s and t are in one block of the partition, then states delta(s, a) and delta(t, a) are also in one block of the partition for each input symbol a.

An algorithm that is considered by some people to be equally fast for frequently encountered data is Brzozowski’s algorithm. It minimizes an automaton by performing reversal and determinization twice. The subset construction (determinization) is O(2|Q|), where Q is a set of states, but it depends on input (see master thesis by Ted Leslie in the Determinization section). The second paper here describes complexity of operations needed to implement Brzozowski’s algorithm.

J. A. Brzozowski, Canonical regular expressions and minimal state graphs for definite events, in Mathematical Theory of Automata, Volume 12 of MRI Symposia Series, pp. 529-561, Polytechnic Press, Polytechnic Institute of Brooklyn, N.Y., 1962.[bibtex]

C. Câmpeanu, K. Culik II, and Sheng Yu, State Complexity of Basic Oprations on Finite Languages, utomata Implementation. Proceedings of 4th International Workshop on Implementing Automata, WIA’99, Oliver Boldt and Helmut Jürgensen (Eds.), Postdam, Germany, LNCS 2214, Springer, 2001.[bibtex]

The following algorithm is different from all others because partial results can be used. It tests pairs of states for equivalence. The first paper describes a version with exponential time complexity. The second paper adds modifications that make it almost O(|Q|2).

Bruce W. Watson, An incremental DFA minimization algorithm, Finite State Methods in Natural Language Processing 2001, ESSLLI Workshop, August 20-24, Helsinki, Finland, 2001.[bibtex]

Bruce B. Watson, Jan Daciuk, An efficient incremental DFA minimization algorithm, Natural Language Engineering, Volume 9, Issue 01, March 2003. [bibtex]

The Hopcroft-Ullman is one of the best known. Its complexity is O(|Q|2), and it also requires O(|Q|2) memory, so it does not seem to be suitable for large automata. Pairs of states are considered, and the automaton has to be complete (i.e. it must have a sink, or dead state). A pair of states is distinguished if one of the states in the pair is final, and the other non-final. A pair of states is also distinguished if transitions labeled with the same symbol lead to another pair of states that was found distinguishable. For each pair of states, a list of pairs is found that are targets of transitions. Initially, pairs containing one final, and one non-final state are marked as distinguishable. Then all pairs that are on lists associated with those pairs are marked, and so on until no new pairs are marked. Pairs that are not marked are equivalent.

John E. Hopcroft and Jefferey D. Ullman, Introduction to Automata Theory, Languages, and Computation, Adison-Wesley Publishing Company, Reading, Massachusets, USA, 1979.[bibtex]

The Aho-Sethi-Ullman algorithm has the same complexity as the Hopcroft-Ullman algorithm, i.e. O(|Q|2), but it needs only O(|Q|) memory. As in the Hopcroft algorithm, the set of states is partitioned, and the blocks are refined in each step, until no more divisions occur. Two states are put into different blocks if they have transitions leading to states in different partitions. In difference to the Hopcroft algorithm, forward transitions are used for that purpose (the Hopcroft algorithm uses back transitions to split blocks with transitions leading to the current block).

Alfred V. Aho and Ravi Sethi and Jeffrey D. Ullman, Compilers. Principles, Techniques and Tools, Addison Wesley, 1986.[bibtex]

For acyclic automata, there are better methods than those given above. The minimization part of incremental and semi-incremental construction algorithms can be used.

Transducers

Mehryar Mohri, Compact Representations by Finite-State Transducers, ACL’94, Morgan Kaufmann, San Francisco, California, USA, 1994.[bibtex]

 

Step 5) Generate Code

Trivial Exercise

Posted in Parsing Theory | 2 Comments

MetaWrap Time Converter, Build 4

New Version Here

This build fixes some reported bugs in the timezone data.

  1. Christchurch NZ, Incorrect timezone – traced to bug in timezone memory decompression code.
  2. Juneau – Alaska – U.S.A, Incorrect timezone
  3. USA and Canada – corrected Daylight saving start times

Thanks for the feedback people! 🙂

To upgrade you can either,

Choose Uninstall on the tool-tip menu on your current version and run the new one

uninstall.png

or

Select Exit via the tool-tip menu and replace your current mwtimeconverter.exe executable, and then run it.

exit.png

Posted in TimeConverter | Leave a comment

User Centred Design and Leg Waxing

Fantastic post in which Leisa Reichelt talks about User Centred Design and a leg-waxing experience.

Perhaps it was the combination of pain and dissatisfaction that made me combine my work with my leg wax. In terms of user experience design, nothing I’m saying here is new. But its worth repeating over and over again… its worth investing in good user experience.

Do this and you can ensure that your users repeat their good experience themselves and promote you to their friends and colleagues. Don’t do it, and they’ll go out of their way to avoid you, and to make sure everyone they know does likewise.”

Posted in Web2.0 | Leave a comment

JavaScript Arguments

arguments

The arguments object provides access to the arguments that are passed into a function.

You can refer to a function’s arguments within the function by using the arguments array. This array contains an entry for each argument passed to the function.

The arguments array is available only within a function body. Attempting to access the arguments array outside a function declaration results in an error.

eg.

function Average()
{
   var sum = 0
   for(var i=0; i < arguments.length; i++)
   {
      sum = sum + arguments[i]
   }
   return (sum/arguments.length);
}

You can use the arguments array if you call a function with more arguments than it is formally declared to accept. This technique is useful for functions that can be passed a variable number of arguments.

You can use arguments.length to determine the number of arguments passed to the function, and then process each argument by using the arguments array. (To determine the number of arguments declared when a function was defined, use the Function.length property.)

Example.

function listItems() 
{
   var items = arguments.length
   document.write("<UL>n")
   for (i = 0;i < items;i++)
   {
      document.write("<LI>" + arguments[i] + "n")
   }
   document.write("</UL>n")
}

The function may be called as follows:

listItems("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Happyday");

 

The arguments object has some very handy members.

arguments.length

The number of arguments passed into the function.

arguments.callee

This property is a reference to the function that you are in. Which is very handy if you want to get a reference to an anonymous function from inside itself. One good reason for doing this would be in a recursive anonymous function.

function makeFactorialFunc()
{
   return function(x)
   {
      if (x <= 1)
       {
         return 1;
       }
       return x * arguments.callee(x - 1);
   };
}

var result = makeFactorialFunc()(5); // returns 120 (5 * 4 * 3 * 2 * 1)

arguments.caller

This property is a reference to the function that called the function you are in. This allows you to get access to the entire current callstack which is very handy for debugging.

function myFunc()
{
  if (arguments.caller == null)
  {
    return ("The function was called from the top!");
  }
  else
  {
    return ("This function's caller was " + arguments.caller);
  }
}

 

There are two other things you can do with the arguments object if you combine it with the fn.apply function.

Pass arguments from function to function.

function Called(a,b)
{
	return (a + " certainly " +  b);
}

function Caller(a,b)
{
	return(Called.apply(this,arguments));
}

Fake arguments by creating an array

function Called(a,b)
{
	return (a + " certainly " +  b);
}

function FakeCaller(a,b)
{
	var l_arguments = new Array();

	l_arguments[l_arguments.length] = a;
	l_arguments[l_arguments.length] = nb;

	return(Called.apply(this,l_arguments));
}

See this test-case for an example of both of these cases.

Posted in JavaScript, MetaWrap Server, Web2.0 | 7 Comments

Coolest User Interface Ever – The Future Is Touchy Feely

Thanks to Lela Kaunitz for sending me this link – I watched it and it just blew me away.

While touch sensing is commonplace for single points of contact, multi-touch sensing enables a user to interact with a system with more than one finger at a time, as in chording and bi-manual operations. Such sensing devices are inherently also able to accommodate multiple users simultaneously, which is especially useful for larger interaction scenarios such as interactive walls and tabletops.

http://mrl.nyu.edu/~jhan/ftirtouch/

Posted in Coolhunting | Leave a comment

Adding XPath to Metawrap – Part I

As part of the MetaWrap Continuous Integration Project (mw_monitor), I have decided to add XPath to the MetaWrap XML engine. There are currently three methods I could use to process XML in mw_monitor.

DTD Plug-in – Using this method I would define a DTD and then create handler code for each of the elements. This produces a very fast and tightly bound XML processor that can trigger events during parse, import or as a post load processing run.
XML Visitor – Classes can be created that will handle a particulart element or atribute combination. The XML Tree is then processed and those objects that match fire events.
DOM – The MwXML Document class was created before DOM became a standard so its functionality intersects with but is not 1:1 with DOM. Its a future project to add full DOM like behavior and member names.

I could use one of these – but I want to add one more – which is XPath – mainly because I have gotten myself into a very sexy R&D project that predicates the creation of a custom XSLT processor. I also want to cleave off into XQuery at some point in the future .

The MetaWrap XML Parser owes its origins to work I did with SGML and electronic publishing in the early 90’s. The current version of the MetaWrap parser has come a long way since its primitive beginnings.

Fast forward 8 years and The MetaWrap XML Engine was being used for a specific purpose, which was the automatic classification and transcoding of HTML sites into XML. Before XSLT there was the ‘MetaWrap Pattern Language’ This could define a predicate that could be thrown against a website. It would bind to the website in a similar fashion to how an antigen bonds to a chemical. The theory was that those patterns that bound well would be thrown back into a genetic algorithm and interbred. With a carefully orchestrated breeding program, a population of different species of pattern would emerge that could classify a HTML site and extract content from them.

<F:PATTERN name=”pgrid_mon”>
<F:PAT name=’day’ xml noid>
  <F:PAT tag=’TR’>
<F:PAT tag=’TD’ flex match=”contains(‘*CMT*PACIFIC*RIM*PROGRAMMING*SCHEDULE*’)”></F:PAT>

</F:PAT>
<F:PAT tag=’TR’>
<F:PAT tag=’TD’>
<F:PAT tag=’FONT’ name=’monthyear’ xml noid  ></F:PAT>
</F:PAT>
</F:PAT>
<F:PAT tag=’TR’>
<F:PAT tag=’TD’ count=1></F:PAT>
<F:PAT tag=’TD’>
<F:PAT tag=’FONT’ name=’monthday’ onparse=”mymonthday = trim(tag_contents(document.this))” xml noid></F:PAT>
</F:PAT>
<F:PAT tag=’TD’ count=N></F:PAT>
</F:PAT>
<F:PAT tag=’TR’></F:PAT>
<F:PAT tag=’TR’ name=”timeslot” count=N noid xml quiet>
<F:PAT tag=’TD’>
<F:PAT tag=’FONT’ match=’istime’ name=’time’ onparse =”append_contents(‘ <%/mymonthday%>’)” xml noid></F:PAT>
</F:PAT>
<F:PAT tag=’TD’>
<F:PAT tag=’FONT’ name=’name’ xml noid></F:PAT>
</F:PAT>
<F:PAT tag=’TD’ count=N></F:PAT>
</F:PAT>
<F:PAT tag=’TR’ count=n ></F:PAT>
</F:PAT>
</F:PATTERN>

Example Of First Generation MetaWrap pattern. This pattern was part of a larger set of patterns that could take a Cable TV Station playlist Excel file saved as HTML and convert it from a set of columns to a linear XML programming schedule schedule schema which was then processed and added to a database. This was developed late 1998,. Early 1999

<pattern name=”spracipat”>
<pat name=”day”>
<pat tag=”dt” lasttag=”dd” unary=”true” text=true name=”dayname” />
<pat name=”event” count=”N”>
<pat tag=”b” lastflex=”true” text=”true” name=”name” xml=”true” />
<pat tag=”br” lasttag=”hr” lastflex=”true” text=true name=”content” unary=”true”/>
</pat>
</pat>
</pattern>  

Example Of Second Generation MetaWrap pattern.

Here is an example of the pattern in use. Looks remarkably like XAML huh? 🙂 Not bad for something built in late 1999.

screenshot1

Here is screenshot for a tool developed by the MetaWrap project for generating those patterns.

screenshot2

Posted in XPath | Leave a comment

Old MetaWrap Demo

Found an an early draft of some concept materials for the original MetaWrap Project “Fish” (1998-2001) that could turn any existing HTML website into an XML data-source by the application of the MetaWrap pattern language.

This single XML file recursively defined a set of XML pages that were scraped from the existing website. These XML pages could be dynamically transcoded into any target platform that had a supported plugin. Target platforms were XML, HTML, WAP (over a hundred mobile phone profiles) and OpenTV were supported.

Created GUI tools for selecting sections of the website. The last version we created enabled you to mark sections off in internet explorer and automatically create the appropriate pattern. Was well on the way to creating a tool that could analyse site structure and automatically generate a robust pattern before “The Accident” blunted the worlds desire for big wacky ideas 🙂

Click here then click on choose the “Developer Demo”.  This demo is of the first generation of the GUI and its rather rough, but it gets the point across.

Screenshot1

Screenshot2

Posted in MetaWrap Server, Nostalgia for Misspent Youth, Rants | 4 Comments

More Blasts From The Past

media_prSMH_TechGen.jpg

Names wrong on the one below.

media_prnetGuide_mag1996.jpg

Posted in Nostalgia for Misspent Youth | Leave a comment