Today's Question:  What does your personal desk look like?        GIVE A SHOUT

Valid JavaScript variable names

  Mathias Bynens        2012-02-22 05:16:53       9,044        0    

Did you know var π = Math.PI; is syntactically valid JavaScript? I thought this was pretty cool, so I decided to look into which Unicode glyphs are allowed in JavaScript variable names, or identifiers as the ECMAScript specification calls them.

Reserved words

The ECMAScript 5.1 spec says:

An Identifier is an IdentifierName that is not a ReservedWord.

The spec describes four groups of reserved words: keywords, future reserved words, null literals and boolean literals.

Keywords are tokens that have special meaning in JavaScript: break, case, catch, continue, debugger, default, delete, do, else, finally, for, function, if, in, instanceof, new, return, switch, this, throw, try, typeof, var, void, while, and with.

Future reserved words are tokens that may become keywords in a future revision of ECMAScript: class, const, enum, export, extends, import, and super. Some future reserved words only apply in strict mode: implements, interface, let, package, private, protected, public, static, and yield.

The null literal is, simply, null.

There are two boolean literals: true and false.

None of the above are allowed as variable names.

Valid identifier names

Reserved words aren’t allowed to be used as variable names. So, what is allowed?

An identifier must start with $, _, or any character in the Unicode categories “Uppercase letter (Lu)”, “Lowercase letter (Ll)”, “Titlecase letter (Lt)”, “Modifier letter (Lm)”, “Other letter (Lo)”, or “Letter number (Nl)”.

The rest of the string can contain the same characters, plus any U+200C zero width non-joiner characters, U+200D zero width joiner characters, and characters in the Unicode categories “Non-spacing mark (Mn)”, “Spacing combining mark (Mc)”, “Decimal digit number (Nd)”, or “Connector punctuation (Pc)”.

That’s it, really. There are a few things to note, though:

Unicode escape sequences are also permitted in an IdentifierName, where they contribute a single character. […] A UnicodeEscapeSequence cannot be used to put a character into an IdentifierName that would otherwise be illegal.

This means that you can use var \u0061 and var a interchangeably. Similarly, since var 1 is invalid, so is var \u0031.

For web browsers, there is an exception to this rule, namely when reserved words are used. Browsers support identifiers that unescape to a reserved word, as long as at least one character is escaped using a Unicode escape sequence. For example, var var; wouldn’t work, but e.g. var v\u0061r; would — even though strictly speaking, the ECMAScript spec disallows it. Subsequent use of such identifiers must also have at least one character escaped (otherwise the reserved word will be used instead), but it doesn’t have to be the same character(s) that were originally used to create the identifier. For example, var v\u0061r = 42; alert(va\u0072); would alert 42.

Two IdentifierNames that are canonically equivalent according to the Unicode standard are not equal unless they are represented by the exact same sequence of code units.

So, ma\xf1ana and man\u0303ana are two different variable names, even though they’re equivalent after Unicode normalization.

ECMAScript implementations may recognize identifier characters defined in later editions of the Unicode Standard. If portability is a concern, programmers should only employ identifier characters defined in Unicode 3.0.

Examples

The following are all examples of valid JavaScript variable names.

// How convenient!
var π = Math.PI;

// Sometimes, you just have to use the Bad Parts of JavaScript:
var ಠ_ಠ = eval;

// Code, Y U NO WORK?!
var ლ_ಠ益ಠ_ლ = 42;

// How about a JavaScript library for functional programming?
var λ = function() {};

// Obfuscate boring variable names for great justice
var \u006c\u006f\u006c\u0077\u0061\u0074 = 'heh';

// …or just make up random ones
var Ꙭൽↈⴱ = 'huh';

// While perfectly valid, this doesn’t work in most browsers:
var foo\u200cbar = 42;

// This is *not* a bitwise left shift (`<<`):
var 〱〱 = 2;
// This is, though:
〱〱 << 〱〱; // 8

// Give yourself a discount:
var price_9̶9̶_89 = 'cheap';

// Fun with Roman numerals
var â…£ = 4,
   
â…¤ = 5;
â…£ + â…¤; // 9

// Cthulhu was here
var Hͫ̆̒̐ͣ̊̄ͯ͗͏̵̗̻̰̠̬͝ͅE̴̷̬͎̱̘͇͍̾ͦ͊͒͊̓̓̐_̫̠̱̩̭̤͈̑̎̋ͮͩ̒͑̾͋͘Ç̳͕̯̭̱̲̣̠̜͋̍O̴̦̗̯̹̼ͭ̐ͨ̊̈͘͠M̶̝̠̭̭̤̻͓͑̓̊ͣͤ̎͟͠E̢̞̮̹͍̞̳̣ͣͪ͐̈T̡̯̳̭̜̠͕͌̈́̽̿ͤ̿̅̑Ḧ̱̱̺̰̳̹̘̰́̏ͪ̂̽͂̀͠ = 'Zalgo';

Some of these don’t work in all browsers/environments — at least, not yet. See WebKit/JavaScriptCore bug #78908, Chrome/v8 bug #1958, Internet Explorer/Chakra bug #725622, and Opera/Carakan bug DSK-357714 (already fixed in trunk). I haven’t been able to find any bugs in Firefox/SpiderMonkey (in recent nightlies, that is).

JavaScript variable name validator

Even if you’d learn these rules by heart, it would be virtually impossible to memorize every character in the different Unicode categories that are allowed. If you were to summarize all these rules in a single ASCII-only regular expression for JavaScript, it would be 11,335 characters long.

For that reason, I created mothereff.in/js-variables, a tool that makes it easy for you to check if a given string is a valid variable name in JavaScript.

JavaScript variable name validator

If a valid variable name is entered, the tool checks if the browser you’re using handles the identifier correctly. If not, it will show a warning, encouraging you to file a browser bug.

The validator will warn you if an ECMAScript 3 reserved word (that isn’t a reserved word anymore) is entered. Try char, for example.

This tool uses the Unicode 6.1 character database. Of course, not all JavaScript engines have the same level of Unicode support yet. As the spec says:

ECMAScript implementations may recognize identifier characters defined in later editions of the Unicode Standard. If portability is a concern, programmers should only employ identifier characters defined in Unicode 3.0.

For this reason, the validator will warn you if a variable name is invalid when the Unicode 3.0 database is used. Try \u0cf1, for example.

Source:http://mathiasbynens.be/notes/javascript-identifiers

JAVASCRIPT  STANDARD  NAME CONVENTION 

Share on Facebook  Share on Twitter  Share on Weibo  Share on Reddit 

  RELATED


  0 COMMENT


No comment for this article.