RESS

  • impl Iterator for Scanner
  • Converts text into Tokens
  • Flat Structure

Before we start on any examples, let's dig a little into what ress does. The job of a scanner (or tokenizer) in the parsing process is to convert raw text or bytes into logically separated parts called tokens, and ress does just that. It reads your JavaScript text and tells you what a given word or symbol represents. It does this through the Scanner interface; to construct a Scanner, you pass it the text you would like it to tokenize.


# #![allow(unused_variables)]
#fn main() {
#     use ress::Scanner;
    let js = "var i = 0;";
    let scanner = Scanner::new(js);
#}

Now that we have constructed a scanner, how do we use it? The Scanner implements Iterator, so we can use it in a for loop like so.


# #![allow(unused_variables)]
#fn main() {
#     use ress::Scanner;
#     let scanner = Scanner::new("var i = 0;");
    for token in scanner {
        println!("{:#?}", token);
    }
#}

If we were to run the above program, it would print the following to the terminal.

Item {
    token: Keyword(
        Var
    ),
    span: Span {
        start: 0,
        end: 3
    }
}
Item {
    token: Ident(
        Ident(
            "i"
        )
    ),
    span: Span {
        start: 4,
        end: 5
    }
}
Item {
    token: Punct(
        Assign
    ),
    span: Span {
        start: 6,
        end: 7
    }
}
Item {
    token: Numeric(
        Number(
            "0"
        )
    ),
    span: Span {
        start: 8,
        end: 9
    }
}
Item {
    token: Punct(
        SemiColon
    ),
    span: Span {
        start: 9,
        end: 10
    }
}
Item {
    token: EoF,
    span: Span {
        start: 10,
        end: 10
    }
}

The scanner's ::next() method returns an Item, which has two properties: token and span. The span holds the start and end byte indexes of the token in the original text; the token is one variant of the Token enum, which has the following variants.

  • Token::Boolean(BooleanLiteral) - The text true or false
  • Token::Ident(Ident) - A variable, function, or class name
  • Token::Null - The text null
  • Token::Keyword(Keyword) - One of the 42 reserved words e.g. function, var, delete, etc
  • Token::Numeric(Number) - A number literal; this can be an integer, a float, scientific notation, binary notation, octal notation, or hexadecimal notation, e.g. 1.5e9, 0xfff, etc
  • Token::Punct(Punct) - One of the 52 reserved symbols or combinations of symbols e.g. *, &&, =>, etc
  • Token::String(StringLit) - Either a double or single quoted string
  • Token::RegEx(RegEx) - A Regular Expression literal e.g. /.+/g
  • Token::Template(Template) - A template string literal e.g. one ${2} three
  • Token::Comment(Comment) - A single-line, multi-line, or HTML comment

For a more in-depth look at these tokens, take a look at the Appendix.
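Since each Item carries a Span of start and end byte indexes, we can always recover the exact original text of a token by slicing the source string. Here is a minimal, self-contained sketch of that idea; it uses a stand-in Span struct with the same start/end fields as ress's, rather than importing the crate:

```rust
// Stand-in for ress's Span: the start and end byte indexes of a token.
struct Span {
    start: usize,
    end: usize,
}

// Recover the original text of a token by slicing the source with its span.
fn token_text<'a>(source: &'a str, span: &Span) -> &'a str {
    &source[span.start..span.end]
}

fn main() {
    let js = "var i = 0;";
    // Using the spans printed above: `var` is 0..3 and `i` is 4..5.
    let var_kw = token_text(js, &Span { start: 0, end: 3 });
    let ident = token_text(js, &Span { start: 4, end: 5 });
    println!("recovered: {} {}", var_kw, ident);
}
```

This is also why the spans are byte indexes rather than character counts: slicing a Rust &str is a byte-range operation.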

Overall, the output of our scanner isn't going to provide any context for these tokens, which means that when we are building our development tools it will be a little harder to figure out what is going on with any given token. One approach is to build a tool that is only concerned with token-level information. Say you work on a team of JavaScript developers that needs to adhere to a strict code style because the organization needs its website to be usable in Internet Explorer 8. With that restriction, a large number of APIs are off the table; looking over this list we can see how big it really is. It could be useful to have a linter that checks for the keywords and identifiers that are not available in IE8. Let's try to build one.
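Before diving in, here is a rough sketch of the core check such a linter would perform: compare each identifier's text against a list of names unavailable in IE8. The check_ident helper and the tiny banned list here are illustrative placeholders (a real linter would need a much longer list), not part of ress:

```rust
// Illustrative subset of identifiers that don't exist in IE8; a real
// linter would need a far more complete list.
const BANNED_IN_IE8: &[&str] = &["addEventListener", "getElementsByClassName"];

// Hypothetical helper: report an identifier if it isn't available in IE8.
fn check_ident(name: &str) -> Option<String> {
    if BANNED_IN_IE8.contains(&name) {
        Some(format!("`{}` is not available in IE8", name))
    } else {
        None
    }
}

fn main() {
    for name in ["addEventListener", "i"] {
        match check_ident(name) {
            Some(msg) => println!("lint: {}", msg),
            None => println!("`{}` is fine", name),
        }
    }
}
```

The rest of the work is wiring this check up to the Ident tokens the Scanner produces, which is what we will build next.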