Friday, May 13, 2011

My first scripting language

I've always enjoyed tinkering with scripting languages, especially after years of working with and on UnrealScript for the Unreal Engine. At some point in the last few months (it looks like March 2nd was the first changelist) I decided to take a stab at creating my own language. It's been a fun challenge, already filled with many rewrites, and there's a tremendous amount of work left before I start seeing real benefits from the effort, but it's been quite rewarding and educational regardless.

So far the feature list is pretty mundane - simple math operations, declaring/calling functions (both 'script-only' and mapping to C++ functions), setting context through member properties on objects, etc. Up until yesterday there was no serialization either: I was entering expressions manually in the in-game console after every compile and sifting through my debug log to see where things went wrong (or surprisingly right). Now that I've built a reasonable foundation for parsing and execution, however, I've taken a step back to improve usability, the most important piece probably being the introduction of "useful" error messages from the parser.
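To make the "mapping to C++ functions" part a bit more concrete, here is a minimal sketch of one way a script-visible name could be bound to a native function, with a readable error when the name isn't known. Everything in it (NativeRegistry, NativeFunc, Register, Call, and the choice of doubles for arguments) is a hypothetical stand-in for illustration, not the actual code from my project.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: a native function takes a list of numeric arguments and
// returns a number. A real binding layer would need richer types.
using NativeFunc = std::function<double(const std::vector<double>&)>;

// Hypothetical registry mapping script-visible names to C++ callables.
class NativeRegistry
{
public:
    // Bind (or rebind) a script name to a C++ function.
    void Register(const std::string& Name, NativeFunc Func)
    {
        Funcs[Name] = std::move(Func);
    }

    // Call a bound function by name. Returns false and fills Error
    // when no function with that name has been registered.
    bool Call(const std::string& Name, const std::vector<double>& Args,
              double& Result, std::string& Error) const
    {
        auto It = Funcs.find(Name);
        if (It == Funcs.end())
        {
            Error = "unknown function '" + Name + "'";
            return false;
        }
        Result = It->second(Args);
        return true;
    }

private:
    std::map<std::string, NativeFunc> Funcs;
};
```

Registering a lambda under "add" and then calling Call("add", {1.0, 2.0}, Result, Error) would fill Result with the sum, while calling an unregistered name leaves Result untouched and reports the name in Error.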

To keep it interesting I've been purposely building this from scratch, with no significant prior knowledge of how to build a good parser/interpreter. My plan is to get to a working state that I'm happy with, then go read a few compiler design books to see where I landed and discover a few ways to improve it for the next attempt. This obviously doesn't lead to finished results as quickly, but it does keep things entertaining and satisfying, and I find myself learning quite a bit.

typedef bool (*ParseFunc)(TString &Token, ParseContext *Context, ExprStream *Stream);
static ParseFunc StatementParsers[] =
{
 &Declare_Function::Parse,
 &Call_Function::Parse,
 &Context_Assign::Parse,
 NULL,
};

bool Parser::ParseStatement(TString &Token, ParseContext *Context, ExprStream *Stream)
{
 Token.TrimWhiteSpace();
 if (Token.Length() == 0)
 {
  return false;
 }
 // try each statement parser in order until one accepts the token
 int ParserIdx = 0;
 while (StatementParsers[ParserIdx] != NULL)
 {
  if (!StatementParsers[ParserIdx](Token,Context,Stream))
  {
   if (Errors.Num() > 0)
   {
    // fatal error, abort
    return false;
   }
   ParserIdx++;
  }
  else
  {
   return true;
  }
 }
 return false;
}

My first 'recursive' attempt yielded simple results very quickly (e.g. '1+2') but fell apart once I needed to start re-arranging or inserting expressions elsewhere in the stream for execution. Once I started supporting automatic type conversions and simple type deduction, things got messy very quickly, so I scrapped that approach for a new one that attempted to parse the entire token linearly. At some point that became frustrating as well, so I ended up with the current approach, which is a strange hybrid of the two previous attempts.
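For anyone curious what a 'recursive' parser for something like '1+2' can look like, here is a small textbook-style recursive descent sketch. It is not my original attempt: MiniParser and its method names are made up for this example, and it evaluates the expression directly instead of emitting expressions into a stream, which is exactly the part that got messy for me.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical recursive descent evaluator for + - * / and parentheses.
class MiniParser
{
public:
    explicit MiniParser(const std::string& InText) : Text(InText), Pos(0) {}

    // Parse the whole input; throws std::runtime_error on malformed input.
    double Parse()
    {
        double Value = ParseSum();
        SkipSpaces();
        if (Pos != Text.size())
            throw std::runtime_error("unexpected character at " + std::to_string(Pos));
        return Value;
    }

private:
    // sum := product (('+' | '-') product)*
    double ParseSum()
    {
        double Value = ParseProduct();
        for (;;)
        {
            SkipSpaces();
            if (Match('+')) Value += ParseProduct();
            else if (Match('-')) Value -= ParseProduct();
            else return Value;
        }
    }

    // product := atom (('*' | '/') atom)*
    double ParseProduct()
    {
        double Value = ParseAtom();
        for (;;)
        {
            SkipSpaces();
            if (Match('*')) Value *= ParseAtom();
            else if (Match('/')) Value /= ParseAtom();
            else return Value;
        }
    }

    // atom := number | '(' sum ')'
    double ParseAtom()
    {
        SkipSpaces();
        if (Match('('))
        {
            double Value = ParseSum();
            SkipSpaces();
            if (!Match(')')) throw std::runtime_error("expected ')'");
            return Value;
        }
        std::size_t Start = Pos;
        while (Pos < Text.size() &&
               (std::isdigit(static_cast<unsigned char>(Text[Pos])) || Text[Pos] == '.'))
            ++Pos;
        if (Start == Pos)
            throw std::runtime_error("expected number at " + std::to_string(Pos));
        return std::stod(Text.substr(Start, Pos - Start));
    }

    // Consume C if it is the next character.
    bool Match(char C)
    {
        if (Pos < Text.size() && Text[Pos] == C) { ++Pos; return true; }
        return false;
    }

    void SkipSpaces()
    {
        while (Pos < Text.size() && std::isspace(static_cast<unsigned char>(Text[Pos])))
            ++Pos;
    }

    std::string Text;
    std::size_t Pos;
};
```

MiniParser("1+2").Parse() gives 3, and operator precedence falls out of the call structure. The catch is that each function only knows about its own sub-expression, so anything that needs to reorder or inject work elsewhere (conversions, assignment) has no obvious place to live.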

     Parse: Parsing token 'func test_a()'...
     Parse: Attempting to declare function 'test_a'...
     Parse:    Parsing token '{'...
     Parse:    Parsing token 'result = "hello"'...
     Parse:    Parsing assignment statement...
     Parse:        Parse expression from 'result ', expected type 'Undefined'...
     Parse:            Add expr 'set_context: prop -> Result' at [0]
     Parse:        Add expr 'noop' at [1]
     Parse:        Parse expression from ' "hello"', expected type 'String'...
     Parse:            Add expr 'const_Str: hello' at [2]
     Parse:        Insert expr 'assign' at [0]
     Parse:    Parsing token '}'...
     Parse:    Declared function 'test_a' for type 'Environment'
     Parse: Parsing token 'test_a()'...
     Parse: Add expr 'func: test_a' at [0]
     Parse: Calling function 'test_a' of type 'Undefined'...
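The 'Add expr' and 'Insert expr' lines in that log are the re-arranging I mentioned: the operands of the assignment are appended as they are parsed, and the 'assign' expression is then inserted in front of them. A rough sketch of that idea follows, assuming a stream that is just a list of labels. ExprStreamSketch and BuildAssignment are hypothetical names for illustration, and strings stand in for real expression objects.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an expression stream: a flat list of labels.
struct ExprStreamSketch
{
    std::vector<std::string> Exprs;

    // Append an expression and return the index it landed at.
    std::size_t Add(const std::string& Expr)
    {
        Exprs.push_back(Expr);
        return Exprs.size() - 1;
    }

    // Insert an expression at Index, shifting later expressions back.
    void Insert(std::size_t Index, const std::string& Expr)
    {
        Exprs.insert(Exprs.begin() + static_cast<std::ptrdiff_t>(Index), Expr);
    }
};

// Replays the add/insert steps the log shows for: result = "hello"
std::vector<std::string> BuildAssignment()
{
    ExprStreamSketch Stream;
    std::size_t Start = Stream.Add("set_context: prop -> Result"); // left side, at [0]
    Stream.Add("noop");                                            // at [1]
    Stream.Add("const_Str: hello");                                // right side, at [2]
    Stream.Insert(Start, "assign");                                // operator goes in front
    return Stream.Exprs;
}
```

After BuildAssignment() the stream reads assign, set_context, noop, const_Str, so the operator comes before its operands even though it was only recognized as a complete assignment after both sides had been parsed.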

One thing that has been a surprise with building a language is how, once you get all the puzzle pieces in the correct places, you find you've accidentally unlocked a large amount of functionality. With functions, for example, it sort of went like this -

  • Write initial simple declaration parser
  • Debug a few test parsing cases and work out the kinks
  • Write the initial simple call function parser
  • Debug a few test cases as well, test out simple execution
  • Go back and add support for declaring and calling functions with parameters
  • Add support for return types/values
  • Debug a while, track down a case where you're corrupting the stack, debug some more
  • Holy shit, can now call functions that call functions with other functions as parameters, etc :)

Like I said, I've got a large list of work items ahead of me before I can start seeing benefits in the game, but the potential is great. I'm excited about the prospect of editing behaviors on the fly, defining new item/creature/whatever types on the fly, and building the game in real-time within the game. It's definitely been a long detour from actually building the meat of the game, however...