June 11th, 2008 | Code-theory

Maximal Munch Problems

1. Ever tried evaluating this expression?

    ++++p->*mp

2. Ever dealt with the “Sergeant operator”?

    template <typename T>
    class R {
        // . . .
        friend ostream &operator <<< // a sergeant operator?
            T >( ostream &, const R & );
    };

3. Have you ever wondered whether the following expression is legal?

    a+++++b
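The second puzzle is less sinister than it looks. Because <<< is not a token, maximal munch splits it into << followed by <, so the friend declaration names the specialization operator<< <T>. Below is a minimal compilable sketch; the constructor and the value_ member are hypothetical additions, and note that befriending operator<< <T> requires the operator template to be declared before the class.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// Forward declarations: a friend declaration that names a specialization
// (operator<< <T>) needs the function template to be declared already.
template <typename T> class R;
template <typename T> std::ostream &operator<<(std::ostream &, const R<T> &);

template <typename T>
class R {
public:
    explicit R(T v) : value_(v) {}      // hypothetical constructor
    // "<<<" is lexed as "<<" followed by "<", so this befriends operator<< <T>.
    friend std::ostream &operator <<< // a sergeant operator?
        T >( std::ostream &, const R & );
private:
    T value_;                           // hypothetical data member
};

// The befriended template: it may read R<T>'s private member.
template <typename T>
std::ostream &operator<<(std::ostream &os, const R<T> &r) {
    return os << "R(" << r.value_ << ")";
}
```

Writing the declaration as operator<< <T> (with a space) produces exactly the same tokens and is much kinder to the reader.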

Welcome to the world of maximal munch.

In one of the early stages of C++ translation, the part of the compiler that performs “lexical analysis” has the task of breaking up the input stream into individual “words,” or tokens. So when the lexical analyzer encounters a character sequence like ->*, it might reasonably identify three tokens (-, >, and *), two tokens (-> and *), or a single token (->*)! To avoid this ambiguity, the lexical analyzer always takes the longest sequence of characters that can form a token, consuming as many characters as it legally can. In other words: a maximal munch!
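To make the rule concrete, here is a toy lexer. It is a minimal sketch, not how any real compiler is written: the function munch and its small operator table are hypothetical, and any run of letters, digits, and underscores is treated as one token. Operators are tried longest-first, so the first match at a given position is the maximal munch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy lexer applying maximal munch to a subset of C++ punctuators.
std::vector<std::string> munch(const std::string &src) {
    // Candidate operators ordered by length (3, then 2, then 1 characters),
    // so the first one that matches is the longest possible token.
    static const std::vector<std::string> ops = {
        "->*", "<<=", ">>=",
        "->", "++", "--", "<<", ">>", "<=", ">=", "*=", "+=", "-=", "::",
        "+", "-", "*", "<", ">", "=", "&", "(", ")", ",", ";"
    };
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }   // whitespace only separates tokens
        if (std::isalnum(c) || c == '_') {         // identifier or number
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_'))
                ++j;
            tokens.push_back(src.substr(i, j - i));
            i = j;
            continue;
        }
        bool matched = false;
        for (const std::string &op : ops) {
            if (src.compare(i, op.size(), op) == 0) {  // longest match wins
                tokens.push_back(op);
                i += op.size();
                matched = true;
                break;
            }
        }
        if (!matched) {                            // unknown character: emit it alone
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}
```

For example, munch("a+++++b") yields a, ++, ++, +, b, and munch("++++p->*mp") yields ++, ++, p, ->*, mp, which are the tokenizations discussed in this post.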

The expression a+++++b is illegal, because it’s tokenized as a ++ ++ + b, and it’s illegal to post-increment an rvalue like a++.
If you wanted to post-increment a and add the result to a pre-incremented b, you would have to introduce at least one space, as in a+++ ++b. If you are writing code that others will reuse and maintain, consider adding one more space even though it isn't strictly necessary: a++ + ++b. Best of all is to add a few parentheses: (a++) + (++b).
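A minimal sketch of the recommended spelling; the helper addIncrements is hypothetical. It produces the same tokens as a+++ ++b, but no reader has to count plus signs.

```cpp
#include <cassert>

// Hypothetical helper: post-increment a, pre-increment b, add the results.
// a and b must refer to different objects; otherwise the two unsequenced
// modifications of the same int would be undefined behavior.
int addIncrements(int &a, int &b) {
    return (a++) + (++b);   // old value of a plus new value of b
}
```

With a == 1 and b == 2, the call returns 1 + 3 == 4 and leaves a == 2 and b == 3.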

Maximal munch solves many more problems than it causes, but in two common situations, it’s an annoyance. The first is in the instantiation of templates with arguments that are themselves instantiated templates. For example, using the standard library, one might want to declare a list of vectors of strings:

    list<vector<string>> lovos; // error! (before C++11)

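Unfortunately, in C++98 and C++03 the two adjacent closing angle brackets in the instantiation are interpreted as a single shift operator (>>), and we get a syntax error. (C++11 added a special rule that treats such a >> as two > tokens, so modern compilers accept the declaration.) Adding whitespace fixes the problem under any standard: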

    list< vector<string> > lovos;
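A minimal self-contained version of the fix, assuming the standard std::list, std::vector, and std::string; the function makeLovos is a hypothetical wrapper. The space between the closing brackets keeps a C++98/03 lexer from producing >>, and the body avoids C++11 features so the whole sketch also compiles as C++98.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper building a list of vectors of strings.
// Note the space in "> >": without it, a pre-C++11 compiler sees ">>".
std::list< std::vector<std::string> > makeLovos() {
    std::list< std::vector<std::string> > lovos;
    lovos.push_back(std::vector<std::string>(2, "munch"));  // one vector, two strings
    return lovos;
}
```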

The second situation involves default argument initializers for pointer formal arguments:

    void process( const char *= 0 ); // error!

This declaration is attempting to use the *= assignment operator in a formal argument declaration, which is a syntax error. This problem falls into the “wages of sin” category: it wouldn’t have happened if the author of the code had given the formal argument a name. Not only is such a name some of the best documentation one can provide, its presence would have made the maximal munch problem impossible:

    void process( const char *processId = 0 );
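A minimal sketch of the named-parameter version in use. The body is hypothetical (the post only shows the declaration), and it returns a length instead of void purely so the behavior can be checked.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical body: returns the length of the id, or 0 when none is given.
// The name processId separates "*" from "=", so no "*=" token can form.
std::size_t process( const char *processId = 0 ) {
    return processId ? std::strlen(processId) : 0;
}
```

Calling process() uses the default null pointer, while process("job-42") returns 6.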
