CSS Lexer

November 27th, 2010. Tagged: CSS

I have so much stuff to do and I've been feeling a little overwhelmed lately. Not depressed, because it's next to impossible to be depressed at a climate including 320 sunny days a year and a beach. So I thought why not drop everything and relax. I'm currently staying at home, enjoying my unused vacation days. So no work, no meetings, no nothing. I thought I should relax by taking on a task that requires some degree of concentration as opposed to just jumping from one task to the next.

I have ideas for a bunch of CSS related tools and utilities, all of which, naturally, require understanding of CSS code. So I need a parser and thought I should write one in JavaScript.

The first step is a lexer scanner and I'm happy to share what I committed to github today. It's right here. I called it cssex (yep, cheesy but I didn't want to spend any time thinking of a proper name).

It's not doing much currently but it's a step. It takes a piece of CSS code and tokenizes it producing the following token types:

  • comment
  • string
  • white (spaces or tabs)
  • line (new lines)
  • identifier (could be anything, such as a property or value or font name)
  • number
  • match (5 combinations of two operators used in attribute matching such as ^=)
  • operator - such as . # % * and so on

You can see a test page here and I'll be happy to hear any bug reports. The test page takes CSS, tokenizes it, then recreates the source from the tokens to compare that the original is reproducible. It also highlights the different tokens in different colors and finally dumps the types and values of the tokens (the complete dump is in console.log)

As you can see, I'm continuing with the cheesiness:

  • foreplay.html is the test page (instead of "playground")
  • test-osterone.js (instead of simply "test") is the test runner that uses JavaScriptCore
  • penthouse.sh (instead of "suite") runs tests with the 213 CSS files from CSSZenGarden.com
  • sex.js is the lexer itself which defines the global CSSEX object with two methods - lex(source) and its opposite toSource(tokens)

So what's next is a proper parser validating those tokens produced by the lexer. Then the tools such as a minifier, highlighter, lint, and whatnot (for example something that will add automatically all -moz- and -o- and stuff to your border-radius). But first I need to draw me some railroad diagrams like those Douglas Crockford has for JavaScript and JSON, they should be immensely helpful when parsing. As you can probably guess, Crockford's JSlint and JSON parser and his writeup on Pratt's top down operator precedence is my source of "view source" 🙂

My main motivation behind all this (other than the itch) is a proper minifier written in JavaScript (therefore running everywhere), not just a collection of regular expressions that YUICSSmin is right now. Also a proper validator, one that understands the nature of the frontend beast and can handle everything from CSS2.1, CSS3's media queries, transitions, latest -webkit and -moz craziness all the way down to IE hacks, expressions, behaviors and filters. And everything in between. Because more often than not we don't validate CSS simply due to the w3c validator being too strict and out of touch with reality.

Tell your friends about this post on Facebook and Twitter

Sorry, comments disabled and hidden due to excessive spam.

Meanwhile, hit me up on twitter @stoyanstefanov