Friday, December 11, 2009

document.write(shenanigans)

As I tweeted yesterday, I have discovered one of the weirdest JavaScript behaviors ever, thanks to the document.write method.

The Problem We All Know ...

document.write has been used for ages since it is basically the only way we have to create a blocking script during the page download.
The advantage is that a script included via document.write can write something else at that point of the page, again via document.write.
Classic examples are Google AdSense and other similar services, where we would like to put in place an iframe with banners or some external service.
The best known side effect of the document.write method is that if the document has already been loaded (DOMContentLoaded or the onload event) it can "destroy" our web page, erasing everything and showing nothing except what we wrote via this method. This means that every time we include an external resource in our page, we should be sure this resource will be included synchronously, otherwise we could break the whole web site, showing blank pages and nothing else.
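
As a quick reminder, this is a minimal sketch of that "erase everything" behavior (the string written is just a placeholder):

<script>
window.onload = function () {
    // the document has already been closed at this point, so write()
    // implicitly re-opens it and replaces the whole page with this string
    document.write("whatever we write here is all that remains");
};
</script>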

... And The WTF We Should Be Aware Of!

Bear in mind: a blocking script does NOT necessarily mean a synchronous one.
There are truly few de facto standards about document.write behavior, and the only one that seems to be consistent across different browsers is the script without an external source:

<!DOCTYPE html>
<html>
<head>
<script>
var sequence = [1];
document.write('<script>sequence.push(2);<'+'/script>');
alert(document.getElementsByTagName("script")[1].text); // sequence.push(2);
sequence.push(3);
</script>
<script>
alert(sequence); // 1,2,3
</script>
</head>
<body>
</body>
</html>

Everything seems obvious and normal, right? It's NOT!
We need to take a step back and ask what the document.write method actually does.
As an example, this is not valid markup:

<script>
var a = 1;
<script>
var b = 2;
</script>
var c = 3;
</script>

If we assume that document.write writes the code inside the currently executed script tag, we are wrong.
This method basically splices the "download flow", inserting everything we write right after the current script node.
In other words, with document.write we can perform the very task that is able to crash Internet Explorer: appending a node after the last one rather than inserting it before the current one. This is the reason you should never appendChild a script node but always insertBefore it, especially these days when everybody is talking about non-blocking scripts.
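
For reference, this is the kind of non-blocking injection pattern I mean; just a minimal sketch, where "non-blocking.js" is a made up file name:

var script = document.createElement("script");
script.src = "non-blocking.js";
// grab a script node that surely exists and is surely closed:
// the first one in the page
var anchor = document.getElementsByTagName("script")[0];
// insertBefore rather than appendChild, so we never append
// after a node the parser may still be working on
anchor.parentNode.insertBefore(script, anchor);
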
Let's go back to the first example, OK?
If we analyze that simple HTML page we can easily spot that, in the source, there is only ONE script node before the last hard-coded one. The hard-coded alert is there to show that the browser has already inserted, and evaluated, a script node right after the current one. Virtually speaking, this is how the page looks when we use document.write:

<!DOCTYPE html>
<html>
<head>
<script>
var sequence = [1];
document.write('<script>sequence.push(2);<'+'/script>');
// the injected node goes there (below), is evaluated, then execution comes back here ...
sequence.push(3);
</script>
<script>
sequence.push(2);
</script>
<script>
alert(sequence); // 1,2,3
</script>
</head>
<body>
</body>
</html>

If we stripped out the write operation and just read that expanded page, nobody in this world would expect a 1,2,3 result when sequence is alerted, since the node that pushes 2 comes after the one that pushes 3 ... and yet 1,2,3 is exactly what we get.
This is already an inconsistent behavior, since we are breaking the execution flow: nodes are added sequentially after the current one, yet evaluated in line.
What we could have expected is a sort of "break this script node here, write the new one, then reopen and go on with the original script".
Well, that would make sense in a global context flow, but if we were inside a nested closure the browser could not split the current node, or the code would break and the context would spread around ... (metaphorically speaking)
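
Just to visualize it with a minimal sketch, and assuming the behavior is the same as in the global case, I would still expect 1,2,3 here, since the write is relative to the current script node and not to the closure:

<script>
var sequence = [1];
(function () {
    // the injected node still lands after the current script NODE,
    // the browser cannot split this closure in two
    document.write('<script>sequence.push(2);<'+'/script>');
}());
sequence.push(3);
// sequence should be 1,2,3 here as well
</script>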

... and guess what? What I have said so far is the least interesting part ...

External Source Double WTF!

Against all logic we can still assume that a script written via document.write is both blocking and synchronous, but things change "a tiny bit" if we write a script node with an src attribute.

<!DOCTYPE html>
<html>
<head>
<script>
var sequence = [1];
document.write('<script src="test2.js"><'+'/script>');
// how cute, the blocking script is already there
// but it is not blocking the execution flow!
alert(document.getElementsByTagName("script")[1].src); // test2.js
sequence.push(3);
</script>
<script>
// indeed!!!
alert(sequence); // 1,3,2
</script>
</head>
<body>
</body>
</html>

What just happened? First of all, the test2.js file contains nothing different from:
sequence.push(2);

but the result could sound a bit weird, couldn't it?
As I wrote in the code comments, the script is still a blocking one, in the sense that its content will be available right after the current script node, but never before!

<script>
var sequence = [];
// global id
var id = 1;
// test3.js contains the code
//sequence.push(id);
document.write('<script src="test3.js"><'+'/script>');
// global id re-defined
var id = 2;
document.write('<script src="test3.js"><'+'/script>');
</script>
<script>
alert(sequence); // 2,2
</script>

An external script involves network traffic, so every browser vendor apparently decided that a scope cannot be "paused" that long, and every browser I have tested confirms the behavior: the current scope has already finished by the time the injected script is parsed and evaluated.
The script execution is something performed in the future, not in the current script scope. To put order in this chaos we need to split the script execution flow:

<!DOCTYPE html>
<html>
<head>
<script>
var sequence = [];
var id = 1;
document.write('<script src="test3.js"><'+'/script>');
</script>
<!-- the preceding write will put the blocking node here -->
<script>
var id = 2;
document.write('<script src="test3.js"><'+'/script>');
</script>
<!-- the preceding write will put the blocking node here -->
<script>
alert(sequence); // 1,2 in every browser
</script>
</head>
<body>
</body>
</html>


So Why Bother?

Imagine we are doing everything possible to follow best practices and speed up our page: avoid network calls, pre-optimize, pre-compile, pre-gzip/deflate a single JavaScript file with everything we want and need for that page ... well, if for some reason there is a document.write in the middle, we will never know what the hell will happen!

var myServiceKey = "I_FEEL_SAFE";
document.write('<'+'script src="myservice.stuff.com"><'+'/script>');
// where myservice.stuff.com will consider the global myServiceKey


// ... and two thousand lines later, inside a third party library,
// gosh knows which kind of nested closure is doing it ...
myServiceKey = "HE_FELT_SAFE";
document.write('<'+'script src="myservice.stuff.com"><'+'/script>');


Good stuff: since both injected scripts run after both assignments, myservice.stuff.com will never see my original key ... cool, isn't it?

The Long Ride To The Solution

I don't want to create illusions, so I will start this paragraph with a fact:
I have not been able to find a reasonable solution!

If you are still interested, follow me into my "evil monkey patch procedure" ...
First of all, is it just a matter of nodes? If so, creating an extra node should be able to mess up the execution flow, right?

<!DOCTYPE html>
<html>
<head>
<script>
function elsewhere(text){
    // quick way to inject script nodes
    // evaluated in the global context
    with(document)
        with(documentElement)
            insertBefore(
                createElement("script"),
                lastChild
            )
            .text = "(" + text + ")()"
    ;
};
var sequence = [];
var id;
elsewhere(function(){
    id = 1;
});
document.write('<script src="test3.js"><'+'/script>');
elsewhere(function(){
    id = 2;
});
document.write('<script src="test3.js"><'+'/script>');
</script>
<script>
alert(sequence); // 1,2 in Firefox
// 2,2 in IE, Chrome, Safari
// undefined,undefined in Opera
</script>
</head>
<body>
</body>
</html>

I have basically opened "Hell's Doors" with this hack but, trust me, I felt like the coolest JavaScript hacker ever when I first tried it in Firefox, the only browser that gave me the messed-up result I was hoping for. NO WAY: IE, Chrome, and Safari still respect the order, and since the injected, globally evaluated node sits before the one appended in the middle of the browser parsing flow, nothing changes - which is even kind of expected (I was simply trying to find out how buggy our browsers could be).
Apparently, Firefox demonstrates an absolute non-sense: the blocking but non-synchronous script is still appended after, yet somehow the JavaScript snapshot created via the other node is considered privileged ... I am sure this is a hole in the parser security model, but I don't want to investigate further (especially for this topic).
What about Opera 10? Apparently it simply executes the blocking script before the injected one, so somehow it has a proper queue. Maybe Opera's behavior is the most logical ever: a FIFO-style queue of scripts, excellent!

The Evil Plan + A Partial Solution

Still working on the assumption that everything happens because of the opened/closed script tag:

<!DOCTYPE html>
<html>
<head>
<script>
var sequence = [];
var id;
document.write('<script>id = 1;<'+'/script><script src="test3.js"><'+'/script>');
document.write('<script>id = 2;<'+'/script><script src="test3.js"><'+'/script>');
</script>
<script>
alert(sequence); // 2,2 in IE - 1,2 Others
</script>
</head>
<body>
</body>
</html>

This time the queue is almost respected by every browser, but do you remember the difference between a script without an external source and one with it?
Well, in Internet Explorer the script without an external source is extracted from the write context and parsed/evaluated in line, while scripts with an external source are still considered "something that will be there, but that cannot be there right now".
There is a perverse logic in this behavior, and somehow it makes sense from a programming point of view. Not because I agree with what happens in IE, simply because if we have "attached" a behavior, we expect it to keep working like that ... not everybody is able to think smart, and somehow I can see your faces: nobody is surprised that IE failed here as well!

The Partial Solution

So IE is, as usual, the last one, but at least it is consistent with its own crap. This means that we need to put order in the execution flow ourselves.
How? By creating two scripts, both with an external source:

<!DOCTYPE html>
<html>
<head>
<script src="params.js"></script>
<script>
var sequence = [];
document.write('<script src="test6.js?id=1"><'+'/script><script src="test3.js"><'+'/script>');
document.write('<script src="test6.js?id=2"><'+'/script><script src="test3.js"><'+'/script>');
</script>
<script>
alert(sequence); // 1,2
</script>
</head>
<body>
</body>
</html>

In a few words, we need a sort of clever JavaScript file able to "understand itself" and, at the same time, able to mutate global scope variables at runtime.
This is the commented content of the test6.js file:

(function () {
    // a loop is necessary since Opera and IE will
    // already have every node in the DOM
    // (document.write writes everything in one go, so both nodes are
    // already there and the result would be 2,2 if we did not check them)
    // Firefox and Chrome will consider one script at a time
    for (var
        filter = {id:function(i){return parseInt(i,10)}},
        script = document.getElementsByTagName("script"),
        i = 0, length = script.length,
        tmp; i < length; ++i
    ) {
        tmp = params(script[i].src, filter);
        if (tmp.id) {
            // let's mark the script somehow, so that the next run
            // won't consider this one again
            if (!script[i].touched) {
                script[i].touched = true;
                window.id = tmp.id;
                break;
            };
        };
    };
})();

The params function is something I wrote a few minutes ago, able to transform a URL with a query string into an object with key/value pairs:

function params(url, filter) {
    // (C) WebReflection params snippet - Mit Style
    // create a key/value(s)? object
    // with urls query string and optional filters
    var query = url.indexOf("?"),
        result = {},
        key, match, re, value
    ;
    if (query < 0)
        return result
    ;
    if (!filter)
        filter = {}
    ;
    re = /([^=]*?)=(.*?)(?:&|#|$)/g;
    url = url.slice(query + 1);
    while(match = re.exec(url)) {
        key = match[1];
        value = typeof filter[key] === "function" ? filter[key](match[2]) : match[2];
        if (result[key]) {
            if (typeof result[key] === "string")
                result[key] = [result[key], value]
            ; else
                result[key].push(value);
        } else
            result[key] = value
        ;
    };
    return result;
};
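
Just to give an idea of how it behaves, here is a quick usage sketch (the URL and the keys are made up):

var info = params("test6.js?id=2&tag=a&tag=b", {
    // optional filter: force the id value to be an integer
    id: function (value) {
        return parseInt(value, 10);
    }
});
// info.id is 2 (a number), info.tag is ["a", "b"]
alert(info.id + " - " + info.tag);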

The params function is the "reusable for other purposes" part of this approach.
The main side effect is that, rather than speeding up the page download, we are asking the client browser to download an extra file.
Sure, it can be static, it can be handled via 304, but it is still an extra call to our server.
The good part is that this file could be reused in many situations; the bad part is that it is potentially a file we cannot trust, since other scripts could use it for other purposes ... this is why it is a partial solution, and why I have tried another experiment:

<!DOCTYPE html>
<html>
<head>
<script>
function $write(src, exec) {
    var eval = "Function('('+unescape('" + escape("" + exec) + "')+')()')()";
    document.write([
        '<script src="javascript:void(0)"',
        ' onerror="',
        eval,
        '"><',
        '/script><script src="',
        src,
        '"><',
        '/script>'
    ].join(""));
};
var sequence = [];
var id;
$write("test3.js?1", function(){
    id = 1;
});
$write("test3.js?2", function(){
    id = 2;
});
</script>
<script>
alert(sequence); // 1,2 Firefox, Chrome
</script>
</head>
<body>
</body>
</html>

The problem is that it won't work in IE, since there is no onerror handler for script nodes there, plus the behavior is not consistent elsewhere. If we reload the page a couple of times, especially once the files have been cached, we can randomly get 1,2 or 2,2 - at least in Opera. So ... experiment failed.

Conclusion

document.write is dangerous! I could not have expected all this crap from a single "innocent" function. What I am sure about is that I have never used it unless strictly necessary (never, imho), and I will try not to use this method anymore, since it causes too many problems. I cannot even imagine why on earth, and how, Google and many others have been able to handle all their services by faking script injection via blocking writes, potentially obtaining wrong info and nothing else. Be careful, oops, sorry, I meant ... forget document.write!
