Wednesday, June 30, 2010

JavaScript Random Hints

Update: some points have been made clearer, thanks to Dmitry Soshnikov for his suggestions.


Forget the global undefined


Too many developers rely on the undefined variable in ES3, and all they should do is set undefined = true in the global scope and see whether the application or its unit tests break. I am going to demonstrate how simple it is, at least in ES3, to redefine the global undefined by mistake.

// inside whatever closure/scope
function setItem(key, value) {
    this[key] = value;
}

// later in the scope, setItem may be reused
// through call or apply for whatever object
setItem.call(myObject, myKey, "whatever value");

// if for some reason the first argument, used as context,
// is undefined, "this" will point to the global object

// if for some reason the second argument, used as key,
// is undefined, the accessor will cast it to the string "undefined"

// the result of the latter call with these two
// common or simple conditions
// is the equivalent of:

window.undefined = "whatever value";

// where window["undefined"] or window[undefined]
// are the equivalent of window.undefined

undefined; // "whatever value"
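A minimal sketch of the same mechanism, using a plain object in place of the global one so that nothing real gets overwritten. The names target and missingKey are made up for this illustration.

```javascript
// same helper as in the example above
function setItem(key, value) {
    this[key] = value;
}

// hypothetical stand-in for the global object
var target = {};

// hypothetical key that was never assigned: its value is undefined
var missingKey;

setItem.call(target, missingKey, "whatever value");

// the property name has been converted to the string "undefined"
console.log(Object.keys(target)); // [ 'undefined' ]
console.log(target.undefined);    // whatever value
```

Replace target with the global object and this is exactly how the global undefined gets overwritten in ES3.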

Here is a list of side effects we face every time we deal with undefined:

  • it is not safe to compare variables against undefined, since in ES3 it is simply a global variable that nobody has assigned yet, exactly as var u; would be, but in the global scope

  • it cannot be minified, since it is a well known variable name

  • its access is slow, since it requires a scope lookup, potentially up to the global scope, every time we write that bloody variable name in our code, wherever it is


Here is a list of best practices to avoid the usage of undefined:

  • define your own undefined variable, if necessary, via var undefined;, and only if you are sure that nothing can change its value in the current scope (e.g. eval)

  • the first point will allow minifiers/compilers to shrink the undefined variable into, possibly, one char, so it is size safe

  • if a null value can be considered undefined as well, given that neither undefined nor null supports property access such as unknown.stuff, compare the potentially undefined variable against null, since by specs null == null && null == undefined && null != 0 && null != "" && null != NaN && null != false && null != whateverThatIsNotNullOrUndefined


About the latter point: null is a constant, so no lookup is performed, since we cannot re-assign the null value. Every time somebody tells you something like "doooode, JSLint is complaining about that 'v == null'", simply tell him that JSLint is suggesting a bad practice and point this person to this post :D
What about typeof v === "undefined"? Bullshit! A typeof operation with an eqeqeq against a string that cannot be minified ... are you a programmer who knows the language, or do you think JSLint, as an automation tool, is the bible? In the second case, I have already explained in the post JSLint: The Bad Part why this tool is not always ideal: have a look!
It must be said that in ES5 the global undefined is not enumerable/writable/configurable anymore, and that the example shown above fails in strict mode, where the this reference is no longer replaced by the global object when null or undefined is passed through call/apply, so the assignment throws an error.
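The two practices suggested above can be sketched as follows. The helper name isNullOrUndefined and the sample values are hypothetical, chosen only for this illustration.

```javascript
// true only for null and undefined,
// with no lookup of the global "undefined" involved
function isNullOrUndefined(value) {
    return value == null;
}

var samples = [null, void 0, 0, "", NaN, false, [], {}];
var matches = [];
for (var i = 0; i < samples.length; ++i) {
    matches.push(isNullOrUndefined(samples[i]));
}
console.log(matches.join(","));
// true,true,false,false,false,false,false,false

// a local undefined, declared and never assigned,
// which minifiers are free to rename
var result = (function () {
    var undefined;
    var options = {};
    return options.missing === undefined;
}());
console.log(result); // true
```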

Cache the bloody variable or namespace !


It does not matter how fast and cool CPUs are nowadays: it's about common sense.
If you spot something like:

my.lib.utils.Do.stuff(some);
my.lib.utils.Do.stuff(thing);

fix it ASAP!
Here is a list of side effects caused by repeated access to whatever it is:

  • a namespace requires a lookup, usually up to the global scope, and this costs time behind the scenes

  • minifiers/compilers cannot optimize anything here, since property names cannot be shrunk, so this technique leads to a bigger application size

  • getters are always invoked, and 99.9% of the time this is not what we are looking for. JavaScript exposes a beautiful and easy interface to developers, but behind the scenes there are, 90% of the time, getters, which means slower performance for everybody.


About the latter point, we can simply check, in one of the fastest browsers on the market, how much a simple node.childNodes[0] could cost.
If you are not familiar with C++, just imagine this piece of JavaScript being executed every time we access an index of some Array:

Array.prototype.item = function (index) {
    var
        undefined,
        pos = 0,
        // useless if we return lastItem but for some reason there ...
        n = this.slice(0, 1)
    ;
    // optimized for multiple access with the same index
    if (this._isItemCacheValid) {
        if (index == this._lastItemOffset)
            return this._lastItem;

        var
            diff = index - this._lastItemOffset,
            dist = Math.abs(diff)
        ;
        if (dist < index) {
            n = this._lastItem;
            pos = this._lastItemOffset;
        }
    }
    if (this._isLengthCacheValid) {
        if (index >= this._cachedLength)
            return undefined;

        var
            diff = index - pos,
            dist = Math.abs(diff)
        ;
        if (dist > (this._cachedLength || (this._cachedLength = this.length)) - 1 - index) {
            n = this[this._cachedLength - 1];
            pos = this._cachedLength - 1;
        }
    }
    if (pos <= index) {
        while (n && pos < index) {
            n = this[pos];
            ++pos;
        }
    } else {
        while (n && pos > index) {
            n = this[pos];
            --pos;
        }
    }
    if (n) {
        this._lastItem = n;
        this._lastItemOffset = pos;
        this._isItemCacheValid = true;
        return n;
    }
    return undefined;
};

Array.prototype._lastItemOffset = 0;

[1,2,3].item(0);

Now, compare the code above with what we usually do, which is simply arr[0] ... and consider that this is just a single item access for a ChildNodeList collection ... how many other operations do we want to perform through DOM searches, namespaces, Array access, and so on? Cache It Whenever It Is Possible! This should be point number one in every "performance oriented" article or book.
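To make the hidden cost visible, here is a small sketch that counts getter invocations with and without caching. The namespace, the helper object, and the counter are all hypothetical, and Object.defineProperty requires an ES5 environment.

```javascript
// hypothetical namespace where "Do" is exposed through a getter,
// so we can count how many times it is accessed
var getterCalls = 0;
var helper = {
    stuff: function (value) {
        return "done with " + value;
    }
};
var my = {lib: {utils: {}}};
Object.defineProperty(my.lib.utils, "Do", {
    get: function () {
        ++getterCalls;
        return helper;
    }
});

// uncached: the whole chain is resolved, and the getter invoked, twice
my.lib.utils.Do.stuff("some");
my.lib.utils.Do.stuff("thing");
var uncachedCalls = getterCalls;

// cached: one resolution, then a local variable
getterCalls = 0;
var Do = my.lib.utils.Do;
Do.stuff("some");
Do.stuff("thing");
var cachedCalls = getterCalls;

console.log(uncachedCalls, cachedCalls); // 2 1
```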

The only thing to be careful about when we cache is object methods: if we "de-context" a method that uses the this reference inside, it loses its object, so we should cache the object rather than the method, and that is going to be enough. But if we access a property twice, as often happens with the domNode.style property, for example, cache it!

// my.name.space.Do.stuff is a method
// of my.name.space.Do where this is used

// WRONG
var stuff = my.name.space.Do.stuff;
stuff(); // global this in ES3, undefined this (probably an error) in ES5 strict mode

// BETTER
var Do = my.name.space.Do;
Do.stuff();


Use the in operator


For the same getter/access reason, this classic check can be harmful:

if (someObject.property) {
    // do stuff with property
}

Especially when we are dealing with host objects, some property access could cause errors (e.g. (domNode || unknown).constructor in IE, or similar operations), while a classic:

if ("property" in object) {
    // do stuff with object.property
}

can "save the world", since we do not access the property but simply check whether it is accessible ... a tiny but extremely important difference, and fast in any case.
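A small sketch of the difference, assuming an ES5 environment. The node object and its layout getter are hypothetical and only stand in for a host object with a costly property.

```javascript
// hypothetical object with a costly getter, standing in for a host object
var getterCalls = 0;
var node = {hidden: false};
Object.defineProperty(node, "layout", {
    get: function () {
        ++getterCalls;
        return {width: 0, height: 0};
    }
});

// the "in" check does not invoke the getter
var hasLayout = "layout" in node;
var callsAfterIn = getterCalls;

// the classic check does
var truthyLayout = !!node.layout;
var callsAfterAccess = getterCalls;

console.log(hasLayout, callsAfterIn);        // true 0
console.log(truthyLayout, callsAfterAccess); // true 1

// the two checks are not interchangeable:
// "in" is true for falsy values and for inherited properties too
console.log("hidden" in node, !!node.hidden); // true false
console.log("toString" in node);              // true
```

As the last lines show, in answers "does this property exist?" and not "is its value truthy?", so it fits feature detection better than value checks.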

Avoid redundant Function Expressions


We are kinda lucky here, since functions, as first class objects, are truly fast to create in JavaScript. They do not require a class in order to be used, nor an object or special tricks: they are simply variables that can be invoked, executing what has been defined inside their body through an activation context process, plus named arguments, the length of these arguments, the name of the function, if any, plus the arguments variable if accessed in the body, and "almost nothing else" ... but we can already see that functions do not come for free, can't we?

Here are a couple of common function expression mistakes.

Closure inside a Loop



// WRONG
// the classic way to avoid
// unexpected behavior on lazy evaluation
// the equivalent of 20 functions
for (var i = 0; i < 10; ++i) {
    (function (i) {
        setTimeout(function () {
            alert(i);
        }, 15);
    }(i)); // trap it!
}

// BETTER
// 11 functions rather than 20
// same behavior, better performances
for (var
    getTimeout = function (i) {
        return function () {
            alert(i);
        };
    },
    i = 0; i < 10; ++i
) {
    setTimeout(getTimeout(i), 15);
}
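The same factory pattern can be checked without timers or alerts. In this sketch the names createGetter, getters, and shared are made up, and the callbacks are stored and invoked by hand instead of being passed to setTimeout.

```javascript
// created once, outside the loop
function createGetter(i) {
    return function () {
        return i;
    };
}

var getters = [];
for (var i = 0; i < 3; ++i) {
    getters.push(createGetter(i));
}

var values = [];
for (var j = 0; j < getters.length; ++j) {
    values.push(getters[j]());
}
console.log(values.join(",")); // 0,1,2

// for comparison, callbacks that share the loop variable
// all see its final value
var shared = [];
for (var k = 0; k < 3; ++k) {
    shared.push(function () {
        return k;
    });
}
console.log(shared[0](), shared[1](), shared[2]()); // 3 3 3
```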


Array.extras Misunderstood



// WRONG
// a new expression for each Array.extra operation
// a lookup to access another this reference
var self = this;
what.forEach(function (value, index, what) {
    // do something
    self[index] = value;
});
ever.forEach(function (value, index, what) {
    // do something
    self[index] = value;
});

// BETTER
// 1 function against N
// this reference through the native interface
// easier to debug/maintain/improve/change
function forThisEachCase(value, index, what) {
    this[index] = value;
}
what.forEach(forThisEachCase, this);
ever.forEach(forThisEachCase, this);
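A runnable sketch of the second form, with made up data: one shared callback, and the this reference provided through the second argument of the native forEach. The names collect and log are hypothetical.

```javascript
// one shared callback: "this" is whatever is passed as second argument,
// and the native arguments order is value, index, list
function collect(value, index, list) {
    this.push(index + ":" + value + "/" + list.length);
}

var log = [];
["a", "b"].forEach(collect, log);
["c"].forEach(collect, log);

console.log(log.join(" ")); // 0:a/2 1:b/2 0:c/1
```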


Use Natives !!!


Newcomers are lazy. It does not matter whether they are noobs or have 10 years of Java, PHP, Python, C#, or Ruby behind them: they will always look for a framework able to do truly simple stuff, for the simple reason that they do not know, or do not get yet, JavaScript, which is different from every other common programming language. This is the best starting point to slow down every little operation.
Many frameworks offer classes, mixins, and native wrappers whose aim is often to invert arguments, for whatever reason, to simplify operations (e.g. the classic $.each in jQuery, which makes junior developers think that the native forEach will pass the index as first argument and the current item as this reference).
Even if three lines based on native prototypes/functionality are longer than one magic method call, go for the natives!
Especially when they are standard, natives will never change, while libraries are constantly improving, and their APIs change as well, for whatever valid reason.
If you need a for in loop, do the for in knowing what you are doing, ignoring JSLint if necessary, because you are dealing with objects that inherit from other objects and you may be interested in inherited properties/methods as well.
If the problem is the list of properties, we can always create safer ways to interact with what we would like to enumerate, for example:

var SafeLoop = (function (id) {
    function SafeLoop() {
        this[id] = [];
    }
    SafeLoop.prototype.keys = function() {
        return this[id];
    };
    SafeLoop.prototype.enum = function (key) {
        var enumerable = this.keys();
        enumerable.push.apply(
            enumerable,
            typeof key !== "object" ? arguments : key
        );
        return this;
    };
    return SafeLoop;
}(Math.random()));

var o = new SafeLoop().enum("a", "b");
o.a = o.b = o.c = o.d = 123;
o.e = 456;

// enum accepts N arguments or an array
o.enum(["c", "d"]);

// fast and safe, without a hasOwnProperty call for each item
for (var key = o.keys(), i = key.length; i--;) {
    alert([key[i], o[key[i]]]);
    // d, c, b, and a with 123
}

// extend the prototype if necessary
SafeLoop.prototype.forEach = function (callback, context) {
    for (var
        enumerable = this.keys(),
        i = 0, length = enumerable.length,
        key;
        i < length; ++i
    ) {
        key = enumerable[i];
        callback.call(context, this[key], key, this);
    }
};

o.forEach(function (value, key, o) {
    alert([value, key, o]);
    // 123,a|b|c|d,[object Object]
});

The code above is just an example, "AS IS", and there are many parts to improve. The concept is that JavaScript allows us to define what we need in such a simple way that, most of the time, we do not want to include and "move" a whole framework to do something as simple as a loop, do we? If we do, well, we are creating redundant function expressions, including extra bytes for just some extra functionality, and potentially making the application slower ... remember: we are in the mobile era, CPUs are not those you have in your MacBook Pro, and frameworks should be used only when we get real benefits, e.g. selector engines or much more complicated methods. Do you agree?

Thursday, June 3, 2010

WebSocket Handshake 76 Simplified

Update
There was a superfluous CR+LF with char 0x00 that was causing buffer troubles; it is now fixed.



I am working during my free time (... recently extremely hard to find ...) on a little project that I would like to show at the Front Trends event this October, and WebSocket is the key to this project.

While 2 days ago I eventually found a way to communicate with a WebSocket in a few lines of PHP, yesterday the Chromium blog announced they "simply changed it", basically causing problems for all those projects based on the good old handshake 75.

After I found in Axod's Hack that somebody else basically had my same thoughts, I still could not find any valid example able to do the new handshake ... so here I am with the first draft-ietf-hybi-thewebsocketprotocol-00 PHP implementation I know of, somehow inspired by the Go version.


<?php

class WebSocketHandshake {

    /*! Easy way to handshake a WebSocket via draft-ietf-hybi-thewebsocketprotocol-00
     * @link http://www.ietf.org/id/draft-ietf-hybi-thewebsocketprotocol-00.txt
     * @author Andrea Giammarchi
     * @blog webreflection.blogspot.com
     * @date 4th June 2010
     * @example
     * // via function call ...
     * $handshake = WebSocketHandshake($buffer);
     * // ... or via class
     * $handshake = (string)new WebSocketHandshake($buffer);
     *
     * socket_write($socket, $handshake, strlen($handshake));
     */

    private $__value__;

    public function __construct($buffer) {
        $resource = $host = $origin = $key1 = $key2 = $protocol = $code = $handshake = null;
        preg_match('#GET (.*?) HTTP#', $buffer, $match) && $resource = $match[1];
        preg_match("#Host: (.*?)\r\n#", $buffer, $match) && $host = $match[1];
        preg_match("#Sec-WebSocket-Key1: (.*?)\r\n#", $buffer, $match) && $key1 = $match[1];
        preg_match("#Sec-WebSocket-Key2: (.*?)\r\n#", $buffer, $match) && $key2 = $match[1];
        preg_match("#Sec-WebSocket-Protocol: (.*?)\r\n#", $buffer, $match) && $protocol = $match[1];
        preg_match("#Origin: (.*?)\r\n#", $buffer, $match) && $origin = $match[1];
        preg_match("#\r\n(.*?)\$#", $buffer, $match) && $code = $match[1];
        $this->__value__ =
            "HTTP/1.1 101 WebSocket Protocol Handshake\r\n".
            "Upgrade: WebSocket\r\n".
            "Connection: Upgrade\r\n".
            "Sec-WebSocket-Origin: {$origin}\r\n".
            "Sec-WebSocket-Location: ws://{$host}{$resource}\r\n".
            ($protocol ? "Sec-WebSocket-Protocol: {$protocol}\r\n" : "").
            "\r\n".
            $this->_createHandshakeThingy($key1, $key2, $code)
        ;
    }

    public function __toString() {
        return $this->__value__;
    }

    private function _doStuffToObtainAnInt32($key) {
        return preg_match_all('#[0-9]#', $key, $number) && preg_match_all('# #', $key, $space) ?
            implode('', $number[0]) / count($space[0]) :
            ''
        ;
    }

    private function _createHandshakeThingy($key1, $key2, $code) {
        return md5(
            pack('N', $this->_doStuffToObtainAnInt32($key1)).
            pack('N', $this->_doStuffToObtainAnInt32($key2)).
            $code,
            true
        );
    }
}

// handshake headers strings factory
function WebSocketHandshake($buffer) {
    return (string)new WebSocketHandshake($buffer);
}

?>


I am pretty sure the code above does not need any further comment, and its methods are "as semantic as possible", since I completely agree with Axod's point that the handshake is both over engineered and absolutely badly documented via those "specs" ... weird from a company famous for its concept of simplicity, which maybe this time forgot the KISS approach ...
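For those who prefer to read the key computation in JavaScript, here is a rough Node.js sketch of the same two steps performed by _doStuffToObtainAnInt32 and _createHandshakeThingy. The function names keyToNumber and createChallengeResponse are made up, the sample keys are written as I recall them from the example in the draft, and the code assumes well formed keys (at least one space each).

```javascript
var crypto = require("crypto");

// digits found in the key, concatenated, divided by the number of spaces
function keyToNumber(key) {
    var digits = key.replace(/[^0-9]/g, "");
    var spaces = key.replace(/[^ ]/g, "").length;
    return parseInt(digits, 10) / spaces;
}

// md5 of both numbers, as 32 bit big-endian integers,
// followed by the 8 bytes sent after the request headers
function createChallengeResponse(key1, key2, body) {
    var buffer = Buffer.alloc(16);
    buffer.writeUInt32BE(keyToNumber(key1), 0);
    buffer.writeUInt32BE(keyToNumber(key2), 4);
    Buffer.from(body, "binary").copy(buffer, 8);
    return crypto.createHash("md5").update(buffer).digest();
}

// sample values, assumed to match the example in the draft
var key1 = "4 @1  46546xW%0l 1 5";
var key2 = "12998 5 Y3 1  .P00";
var response = createChallengeResponse(key1, key2, "^n:ds[4U");

console.log(keyToNumber(key1)); // 829309203
console.log(keyToNumber(key2)); // 259970620
console.log(response.length);   // 16
```

The 16 raw bytes of the digest are what goes after the empty line that closes the response headers.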