This is completely valid C code

Would you believe me if I told you the following code is completely valid in C?

??=define lksjdafoaidsufoisdfu int main(void)
??=include <stdio.h>

// a multiline ??/
   comment? Ha!


lksjdafoaidsufoisdfu
  ??<
  /??/
* completely valid! *??/
/

  printf("test??/n");

  // ??/
  printf("don't output??/n");

  printf("confused??!??/n");

  return 0;
  ??>

Now, granted, it doesn’t actually do much. All it does is print this to stdout:

test
confused|

It doesn’t look anything like typical C code, so how could it possibly be valid?

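This code makes heavy use of trigraphs: sequences of three characters that each stand for a single character (hence “trigraph”). Trigraph replacement is the very first thing the compiler does to the source (translation phase 1). It happens before backslash-newline line splicing (phase 2) and before any preprocessor directives are executed (phase 4). Because it runs that early, it applies everywhere in the file, including inside string literals and comments.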

Trigraphs are translated according to this table:

Trigraph Equivalent
??= #
??/ \
??' ^
??( [
??) ]
??! |
??< {
??> }
??- ~
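Phase 1 is simple enough to sketch in a few lines. The code below is a minimal, hypothetical illustration (replace_trigraphs, trigraph_from and trigraph_to are names chosen for this example, not taken from any compiler). It scans a string and replaces each of the nine trigraphs in the table with its equivalent, writing the result to a second buffer.

```c
#include <string.h>

/* Third characters of the nine trigraphs and their replacements,
   in the same order as the table: '=' maps to '#', '/' to '\\', etc. */
static const char trigraph_from[] = "=/'()!<>-";
static const char trigraph_to[]   = "#\\^[]|{}~";

/* Replace every trigraph in src, writing the result to dst.
   dst needs room for strlen(src) + 1 bytes; the output is never longer.
   Returns the number of trigraphs replaced. */
static size_t replace_trigraphs(const char *src, char *dst)
{
    size_t count = 0;

    while (*src != '\0') {
        const char *hit;

        /* A trigraph is two question marks followed by one of the nine
           characters in trigraph_from. The src[2] != '\0' check matters
           because strchr also "finds" the terminating null character. */
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0' &&
            (hit = strchr(trigraph_from, src[2])) != NULL) {
            *dst++ = trigraph_to[hit - trigraph_from];
            src += 3;   /* skip the whole three-character sequence */
            count++;
        } else {
            *dst++ = *src++;   /* ordinary character: copy it through */
        }
    }
    *dst = '\0';
    return count;
}
```

As an example, the C standard’s own illustration, printf("Eh???/n");, comes out as printf("Eh?\n");. The first ? does not start a trigraph, so the scan moves on by one character and then matches ??/. When writing such inputs as C string literals, escape the second question mark ("Eh?\?\?/n") so that the literal itself contains no trigraphs, whatever compiler flags are in use.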

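The ?? prefix was chosen because two consecutive question marks never form part of a valid C operator. Trigraphs may seem useless, but they exist for a reason: some character sets, such as the invariant subset of ISO 646, lack characters like #, [ and {, and some keyboards and text editors could not produce them or reserved them for other uses. Trigraphs remained part of the C standard through C17 and were finally removed in C23. GCC, for example, ignores them unless you pass -trigraphs or select a strict ISO mode such as -std=c99.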

Given the table, the code now looks like this after the preprocessor substitutes the trigraphs:

#define lksjdafoaidsufoisdfu int main(void)
#include <stdio.h>

// a multiline \
   comment? Ha!


lksjdafoaidsufoisdfu
  {
  /\
* completely valid! *\
/

  printf("test\n");

  // \
  printf("don't output\n");

  printf("confused|\n");

  return 0;
  }

This looks a little more like C. (Yes, I use Whitesmiths style. Get over it.) C has long allowed a single logical line to be split across several physical lines by ending each one with a backslash: in translation phase 2, every backslash that is immediately followed by a newline is deleted along with that newline. After splicing, the code is parsed as

#define lksjdafoaidsufoisdfu int main(void)
#include <stdio.h>

// a multiline comment? Ha!


lksjdafoaidsufoisdfu
  {
  /* completely valid! */

  printf("test\n");

  // printf("don't output\n");

  printf("confused|\n");

  return 0;
  }

Looking at that, it is clear why "don't output\n" was never printed: the backslash at the end of the // comment spliced the printf call onto the comment line, so the statement was commented out.
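Line splicing can be sketched the same way. This minimal version (splice_lines is a hypothetical name) assumes its input has already been through trigraph replacement, as in the first expanded listing, and removes every backslash that is immediately followed by a newline, together with that newline.

```c
#include <string.h>

/* Translation phase 2: delete every backslash that is immediately
   followed by a newline, together with that newline, so the two
   physical lines become one logical line.
   dst needs room for strlen(src) + 1 bytes. */
static void splice_lines(const char *src, char *dst)
{
    const char *bs;

    /* Copy the text up to each backslash-newline pair, then skip the pair. */
    while ((bs = strstr(src, "\\\n")) != NULL) {
        size_t n = (size_t)(bs - src);
        memcpy(dst, src, n);
        dst += n;
        src = bs + 2;
    }
    strcpy(dst, src);   /* copy the remaining text and its terminator */
}
```

Applied to the two lines // \ and   printf("don't output\n");, it produces the single line //   printf("don't output\n");, which the compiler then discards as a comment. The \n inside the string literal is left alone because there the backslash is followed by the letter n, not by a newline character.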

A well-known C trick is to use the #define preprocessor directive to make code clearer. For example, #define LOOP_FOREVER for(;;) lets you write LOOP_FOREVER instead of for(;;) for an infinite loop, making the intent somewhat clearer. At the same time, #define can be used for some crazy obfuscation. Look:

#define true 0
#define false 1

(In C, 0 means false and any nonzero value means true, so these two definitions invert every condition that uses them.)
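Both uses of #define can be seen side by side in a small sketch. LOOP_FOREVER is the macro from the text. The inverted pair is renamed TRUE and FALSE here (an assumption for this example), because defining true and false themselves would collide with <stdbool.h> and, in C23, with the keywords of the same name. count_to, looks_true and looks_false are hypothetical helper names.

```c
/* The readability macro from the text. */
#define LOOP_FOREVER for (;;)

/* The obfuscating pair from the text, renamed to upper case for this
   sketch so it cannot clash with <stdbool.h> or the C23 keywords. */
#define TRUE  0
#define FALSE 1

/* Counts up from 0 and leaves the infinite loop once n reaches limit. */
static int count_to(int limit)
{
    int n = 0;
    LOOP_FOREVER {            /* expands to: for (;;) { */
        if (n == limit)
            break;
        n++;
    }
    return n;
}

/* Reads as "if true, return 1", but TRUE expands to 0, which C
   treats as false, so this always returns 0. */
static int looks_true(void)
{
    if (TRUE)
        return 1;
    return 0;
}

/* Reads as "if false, return 0" (which should never happen), but FALSE
   expands to 1, which C treats as true, so this also returns 0. */
static int looks_false(void)
{
    if (FALSE)
        return 0;
    return 1;
}
```

Both helpers return 0 even though they appear to return opposite answers, which is exactly the kind of confusion such definitions are meant to cause.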

Here, my use of #define isn’t that bad: it only replaces lksjdafoaidsufoisdfu with int main(void). After that substitution, the code looks like this:

#include <stdio.h>

// a multiline comment? Ha!


int main(void)
  {
  /* completely valid! */

  printf("test\n");

  // printf("don't output\n");

  printf("confused|\n");

  return 0;
  }

That looks perfectly normal. Useless program, but it works.

