4 kyu

Simplexer

By Mihail-K

Description:

The Challenge

You'll need to implement a simple lexer type Simplexer, which, when constructed with a given string containing an expression in a simple language, transforms that string into a stream of Tokens.

Simplexer

Your Simplexer type is created with the expression it should tokenize. It should act like an iterator, yielding Token items until none remain. At that point it should signal exhaustion in whatever way is idiomatic for your chosen language.

In Python, instances of the Simplexer class are initialized with a string and must be both iterators and iterables, i.e. they must implement both __iter__ and __next__.

Like all iterators, __next__ should raise a StopIteration exception when no more tokens remain to be yielded.
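The sketch below shows the iterator protocol on its own. The class name TokenIteratorSketch is hypothetical, and the class only wraps a ready-made list rather than lexing anything. __iter__ returns the object itself, and __next__ raises StopIteration once the items run out:

```python
class TokenIteratorSketch:
    """Hypothetical illustration of the iterator protocol only.

    It wraps a ready-made list of items. A real Simplexer would
    produce its tokens from the input string instead.
    """

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def __iter__(self):
        # An iterator is its own iterable, so it returns itself.
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            # Signal exhaustion the way Python iterators do.
            raise StopIteration
        item = self._items[self._pos]
        self._pos += 1
        return item


it = TokenIteratorSketch(["a", "b"])
print(iter(it) is it)  # True
print(list(it))        # ['a', 'b']
print(list(it))        # []
```

Because the object is its own iterator, it can only be consumed once. The second list() call finds it already exhausted and returns an empty list.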

Tokens

Tokens are represented by Token objects, which are preloaded for you and take the following shape:

class Token:
    def __init__(self, text: str, kind: str):
        self.text = text
        self.kind = kind

  • Token.text is the value of the matched portion of the expression
  • Token.kind is the type of the token (see below)
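Token is preloaded on the kata site, so you need a stand-in to run code locally. The sketch below keeps the constructor shape shown above. The __eq__ and __repr__ methods are assumptions added for convenient testing, and the preloaded class may not have them:

```python
class Token:
    """Local stand-in for the preloaded Token class (assumption).

    The constructor matches the shape in the description. __eq__ and
    __repr__ are local additions for testing. Defining __eq__ without
    __hash__ makes instances unhashable, which is fine for comparing
    lists of tokens.
    """

    def __init__(self, text: str, kind: str):
        self.text = text  # matched slice of the input (quotes kept for strings)
        self.kind = kind  # one of the token types from the grammar

    def __eq__(self, other):
        return (isinstance(other, Token)
                and (self.text, self.kind) == (other.text, other.kind))

    def __repr__(self):
        return f"Token({self.text!r}, {self.kind!r})"


print(Token('"hi"', "string"))                           # Token('"hi"', 'string')
print(Token("42", "integer") == Token("42", "integer"))  # True
```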

Language Grammar

The language for this task has a simple grammar, consisting of the following constructs and their associated token types:

Type         Construct

integer:     Any sequence of one or more decimal digits (leading zeroes allowed, no negative numbers)

boolean:     Any of the following words: [true, false]

string:      Any sequence of zero or more characters surrounded by "double quotes"

operator:    Any of the following characters: [+, -, *, /, %, (, ), =]

keyword:     Any of the following words: [if, else, for, while, return, func, break]

whitespace:  Any sequence of one or more of the following characters: [' ', '\t', '\n']
             - Consecutive whitespace should be collapsed into a single token

identifier:  Any sequence of alphanumeric characters, as well as '_' and '$'
             - Must not start with a digit
             - Make sure that keywords and booleans aren't matched as identifiers
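One way to encode this table is a single alternation of named groups using Python's re module. The sketch assumes that "alphanumeric" means ASCII letters and digits, and the names TOKEN_SPEC, MASTER and classify_word are hypothetical. Keywords and booleans have the same shape as identifiers, so the sketch first matches a whole identifier-shaped word and then classifies it. This keeps words such as true123 or whiles in one piece:

```python
import re

# One pattern per construct in the grammar table. The first characters of
# these alternatives are disjoint, so at most one can match at a position.
TOKEN_SPEC = [
    ("string", r'"[^"]*"'),                  # quotes included, any chars inside
    ("integer", r"[0-9]+"),                  # leading zeroes allowed
    ("whitespace", r"[ \t\n]+"),             # consecutive whitespace -> one token
    ("operator", r"[-+*/%()=]"),
    ("word", r"[A-Za-z_$][A-Za-z0-9_$]*"),   # identifier shape (ASCII assumed)
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

KEYWORDS = {"if", "else", "for", "while", "return", "func", "break"}
BOOLEANS = {"true", "false"}


def classify_word(word: str) -> str:
    """Decide keyword/boolean/identifier after the whole word is matched."""
    if word in KEYWORDS:
        return "keyword"
    if word in BOOLEANS:
        return "boolean"
    return "identifier"


m = MASTER.match("true123+x")
print(m.lastgroup, m.group())    # word true123
print(classify_word(m.group()))  # identifier
```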

Notes

  • Individual constructs are separated by whitespace only where necessary. Otherwise, each token extends as far as its construct allows, so

    • true123 is an identifier, not a boolean followed by an integer
    • 123true is an integer followed by a boolean
    • "123"true is a string followed by a boolean
    • x+y is an identifier, an operator and an identifier
  • Any character is permissible between double quotes, including keywords, numbers and arbitrary whitespace, so "true" and "123" are strings. The surrounding quotes are included in Token.text.

  • The input strings are guaranteed to be lexically valid according to the grammar above. Specifically:

    • Input will consist only of valid constructs that can be mapped unambiguously to one of the above tokens
    • No assumptions need to be made about how tokens are arranged in the input, i.e. its syntax.
    • Input may be the empty string

    That means the input will not contain any surprising characters, there is no need for error handling, and quotes will always appear in balanced pairs. This does not mean that the input needs to make semantic or syntactic sense. For example, if 123) return else"five")( is valid input for this task.

    After all, the job of a lexer is not to interpret the given input, merely transform it into tokens that could then be passed on to e.g. a parser, which would then check that the tokens received are syntactically valid and imbue them with semantics.
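The grammar and the notes together suggest a lazy Simplexer built on a compiled regular expression, which is matched at the current position on each __next__ call. This is one possible approach, not a reference solution. It assumes that "alphanumeric" means ASCII letters and digits, and it uses a local Token stand-in in place of the preloaded class:

```python
import re


class Token:
    # Local stand-in for the preloaded class. Drop it when running on the kata site.
    def __init__(self, text: str, kind: str):
        self.text = text
        self.kind = kind


KEYWORDS = {"if", "else", "for", "while", "return", "func", "break"}
BOOLEANS = {"true", "false"}

# The alternatives start with disjoint characters, so at most one matches at
# any position. Each quantifier is greedy, which gives the longest-match
# behaviour described in the notes (e.g. true123 stays one word).
_PATTERN = re.compile(
    r'(?P<string>"[^"]*")'
    r"|(?P<integer>[0-9]+)"
    r"|(?P<whitespace>[ \t\n]+)"
    r"|(?P<operator>[-+*/%()=])"
    r"|(?P<word>[A-Za-z_$][A-Za-z0-9_$]*)"
)


class Simplexer:
    """Sketch of a lazy lexer: one token is matched per __next__ call."""

    def __init__(self, expression: str):
        self._src = expression
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self) -> Token:
        if self._pos >= len(self._src):
            raise StopIteration
        m = _PATTERN.match(self._src, self._pos)  # anchored at self._pos
        if m is None:
            # Input is guaranteed valid, so this only fires on bad input.
            raise ValueError(f"unexpected character at index {self._pos}")
        self._pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "word":
            # Keywords and booleans share the identifier shape.
            kind = ("keyword" if text in KEYWORDS
                    else "boolean" if text in BOOLEANS
                    else "identifier")
        return Token(text, kind)
```

For example, [(t.text, t.kind) for t in Simplexer("123true")] gives [('123', 'integer'), ('true', 'boolean')]. Because matching is lazy, the lexer does no work beyond the tokens the consumer actually requests.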



Stats:

Created: Jan 15, 2015
Published: Jan 15, 2015
Warriors Trained: 5753
Total Skips: 2599
Total Code Submissions: 6712
Total Times Completed: 749
Java Completions: 208
JavaScript Completions: 169
Python Completions: 249
C# Completions: 109
Rust Completions: 42
Total Stars: 185
% of votes with a positive feedback rating: 91% of 192
Total "Very Satisfied" Votes: 167
Total "Somewhat Satisfied" Votes: 17
Total "Not Satisfied" Votes: 8
Total Rank Assessments: 19
Average Assessed Rank: 4 kyu
Highest Assessed Rank: 3 kyu
Lowest Assessed Rank: 6 kyu
Contributors
  • Mihail-K
  • ChristianECooper
  • joecastle
  • myjinxin2015
  • Blind4Basics
  • Voile
  • Awesome A.D.
  • hobovsky
  • Mednoob
  • XoRMiAS