Boxrefuge Code Blog Get it right the first time

20Oct/1211

Debugging the Node

Posted by Justin

I ran into an interesting problem a couple of weeks ago. At work I've been developing a web application the uses a Node.js server and Socket.IO to keep up-to-date with realtime data. I was stuck trying to track down a rather sneaky little bug for a while. I went looking around for debugging options and discovered the Chrome Developer Tools Plugin for Eclipse. This allowed me to set breakpoints and step through my code to inspect what was going on. Great!

Problem is, I ran into a bit of a snag as I realized that setting breakpoints halts execution of the code, thus causing heartbeats to fail and the websocket connection to be lost! Really all I wanted to do was be able to dump out a few objects and look at them. Unfortunately the objects were very large and deep, and trying to inspect the mess that gets dumped out to the terminal with a normal `console.log()` in a Node.js application wasn't realistic. Not only does it only display two levels deep, but it also went way past my scrollback buffer.

Suffice it to say, I eventually tracked down the problem and solved it. But I wasn't content realizing that I didn't have a good way to inspect objects since I couldn't use breakpoints. So, I set off on my usual afternoon journey through the land of Google, in search of a solution. My hopes were raised when I learned of a project called node-inspector; but then quickly crushed as I learned that it doesn't work with the latest versions of Node.js and they also removed the ability to log to the browser console after version 0.1. Next up, node-codein. Unfortunately it was very buggy. Also, both of them worked exclusively with Chrome.

After failing to find any good solutions already out there, I did what any sensible code monkey would do. I started my own project called Node Monkey. Node Monkey is extremely easy to use and it works with any browser that supports `console.log()` and works with Socket.IO. Of course, you won't really have much luck with IE9 right now because logging an object to the console there just produces the string '[object Object]'. Go figure!

Please check it out and let me know what you think. If you have any ideas of more features that would be useful in debugging Node.js applications, let me know on Github. It's in its infancy, but I'd like to see something good come of it.

13Dec/109

PHP Mersenne Twister Random Number Generator

Posted by Justin

Ok, so It's been a while and I was planning on writing about all of this a lot sooner, but I was very busy and that didn't work out. But, I've got some time now, so here goes.

I was recently trying to write a class in PHP that I could use to encrypt and decrypt data for communication between our server and remote clients. As I got started I realized I was going to need to generate a lot of pseudo-random numbers. However, I needed to be able to reproduce the same numbers given the same seeds on a different system. The Mersenne Twister algorithm is a well-known, solid random number generator - and apparently faster than PHP's rand() function. I realized PHP has an implementation of the Mersenne Twister algorithm call mt_rand(), so I thought that would make life easy. But, as it turns out, as of PHP 5.2.1 the same seed no longer produces the same values as stated in the documentation for mt_srand():

The Mersenne Twister implementation in PHP now uses a new seeding algorithm by Richard Wagner. Identical seeds no longer produce the same sequence of values they did in previous versions. This behavior is not expected to change again, but it is considered unsafe to rely upon it nonetheless.

So I decided I would go ahead and implement the algorithm myself. At first, I was just looking to see if I could find anyone else's PHP implementation of the algorithm, but I had no luck with Google. After that I went to Wikipedia and worked from the pseudo-code. That much is pretty straight forward. However, my purpose in writing this article is two-fold. 1) I want to make a PHP implementation of the Mersenne Twister publicly available for other seekers, and 2) I want to discuss some of my modifications to enhance the speed.

Here's my implementation:

/**
 * Mersenne Twister Random Number Generator
 * Returns a random number. Depending on the application, you likely don't have to reseed every time as you can simply select a different index
 * from the last seed and get a different number - it is already auto-incremented each time mt() is called unless the index is specified manually.
 *
 * Note: This has been tweaked for performance. It still generates the same numbers as the original algorithm but it's slightly more complicated
 *       because it maintains both speed and elegance. Though not a crucial difference under normal use, on 1 million iterations,
 *       re-seeding each time, it will save 5 minutes of time from the orginal algorithm - at least on my system.
 *
 * $seed	: Any number, used to seed the generator. Default is time().
 * $index	: An index indicating the index of the internal array to select the number to generate the random number from
 * $min		: The minimum number to return
 * $max		: The maximum number to return
 *
 * Returns the random number as an INT
 **/
function Mt($seed = null, $index = null, $min = 0, $max = 1000)
{
	static $op = array(0x0, 0x9908b0df); // Used for efficiency below to eliminate if statement
	static $mt = array(); // 624 element array used to get random numbers
	static $ps = null; // Previous Seed
	static $idx = 0; // The index to use when selecting a number to randomize

	// Seed if none was given
	if($seed === null)
		$seed = time();

	// Assign index
	if($index !== null)
		$idx = $index;

	// Regenerate when reseeding or seeding initially
	if($seed !== $ps)
	{
		$s = $seed & 0xffffffff;
		$mt = array(&$s, 624 => &$s);
		$ps = $seed;

		for($i = 1; $i < 624; ++$i)
			$mt[$i] = (0x6c078965 * ($mt[$i - 1] ^ ($mt[$i - 1] >> 30)) + $i) & 0xffffffff;

		// This has been tweaked for maximum speed and elegance
		// Explanation of possibly confusing variables:
		//   $p = previous index
		//   $sp = split parts of array - the numbers at which to stop and continue on
		//   $n = number to iterate to - we loop up to 227 adding 397 after which we finish looping up to 624 subtracting 227 to continue getting out 397 indices ahead reference
		//   $m = 397 or -227 to add to $i to keep our 397 index difference
		//   $i = the previous element in $sp, our starting index in this iteration
		for($j = 1, $sp = array(0, 227, 397); $j < count($sp); ++$j)
		{
			for($p = $j - 1, $i = $sp[$p], $m = ((624 - $sp[$j]) * ($p ? -1 : 1)), $n = ($sp[$j] + $sp[$p]); $i < $n; ++$i)
			{
				$y = ($mt[$i] & 0x80000000) | ($mt[$i + 1] & 0x7fffffff);
				$mt[$i] = $mt[$i + $m] ^ ($y >> 1) ^ $op[$y & 0x1];
			}
		}
	}

	// Select a number from the array and randomize it
	$y = $mt[$idx = $idx % 624];
	$y ^= $y >> 11;
	$y ^= ($y << 7) & 0x9d2c5680;
	$y ^= ($y << 15) & 0xefc60000;
	$y ^= $y >> 18;

	// Set the next index to randomize the results even if the same seed is passed
	// We don't need to % 624 on this because that's already done when selecting $y
	++$idx;

	return $y % ($max - $min + 1) + $min;
}

I'm going to focus on the highlighted lines above. First you'll notice the array called $mt. Before filling this array, two values are assigned. In the original algorithm, only the first value is assigned to the last 32 bits of the seed. But this creates a slow down because then in the second for loop where you see "$mt[$i + 1]" this reads "$mt[($i + 1) % 624]" in the original. Division is the slowest math operation on a computer. Since modulus returns the remainder of a division operation, it obviously still has to do the division operation. Thus, this becomes a big slowdown when doing a lot of iterations. This line just looks painful to me as well because it does the modulus 624 times and it only has an effect on the very last iteration. So, I set out to find a better solution. As I was laying in bed pondering, the thought struck me that all we really need to do is create a 625 element array, but set the last element to the actual reference of the first (since the value of the first element changes on the first iteration). Then, there's no longer a need for a modulus. The last element of the array is the first element, and we still stop at 624, so there are no boundary problems.

Well, that eliminates one modulus, but there's still a couple more in the pseudo-code. The next line of code, before modification, looks like this:

$mt[$i] = $mt[($i + 397) % 624] ^ ($y >> 1) ^ $op[$y & 0x1];

Now, there are several possible ways to eliminate the second, but after hours of playing around with it, I couldn't seem to find anything that sill looked clean and elegant. It seemed that it would require adding more lines of code and using if statements and too many extra variables. The idea that finally worked came to me, once again, while I was laying in bed pondering about it. Somehow my best ideas always come when I'm trying to sleep.

To solve this, we first have to notice that the algorithm is always trying to access the element that is 397 ahead of the current one. So, if we want to eliminate the modulus, then as soon as we hit element 227 we have to somehow jump to element 0, and count up from there. The only reasonably elegant way of doing this without the modulus that I could find, was to stick the exact indices at which we need to switch in an array, and loop through those. So there is a 3 element array with the values 0, 227, and 397; however, we only loop through the last 2. The first one is used simply so the inner for loop knows what index to begin counting up from. Now, since I can't really be bothered explaining much more today, I'll simply point out that because of the "* ($p ? -1 : 1)" trick, as soon as it reaches 227, it loops through the outer for loop again, jumping to 397, however this time, it's subtracting 227. By subtracting 227 it creates a 397 index difference. In other words, (620 + 397) % 624 is the same as 620 - 227. Modulus #2 eliminated.

Now, probably the most painful part of the algorithm if implemented straight from the pseudo-code, is the last modulus. It goes inside the for loop and looks something like this:

// If $y is odd
if($y % 2 == 1)
	$y ^= 0x9908b0df;

Any good coder knows there's a better way to tell if a number is odd. In binary all odd numbers end with a 1 which means all you have to do to test if the number is odd is:

if($y & 1)
	...

This takes only 1 processor cycle, versus the countless cycles a division operation might take. But, we can still do one better than this, in this particular case. We can eliminate the if statement. I saw this trick in a C++ implementation I found while researching the Mersenne Twister algorithm. You can find this C++ implementation here. Anyway, the trick goes like this:

First create an array like this:

static $op = array(0x0, 0x9908b0df);

Then instead of doing the 'if', we'll just always do an XOR (^):

$mt[$i] = $mt[$i + $m] ^ ($y >> 1) ^ $op[$y & 0x1];

Because the value of $y & 1 will always be either 1 or 0, we can use that as the array index to determine what value we want to XOR $y with. Since odd numbers will always result in 1, we simply place the number we want to XOR odd numbers with in index 1 in the array, and 0 in the other.

And that's it! We've eliminated all 3 modulus operators and the if statement. When doing the number of iterations, re-seeding each time, required for my encryption algorithm, it makes a big difference in speed. Hopefully somebody finds this useful. If you find any problems with the code or want to suggest improvements, please let me know.

In my next article I will explain a problem I ran into with this random number generator and the differences in values returned on a 32 bit system VS. a 64 bit system and how I handle this.

29Oct/102

Boxrefuge is online!

Posted by Justin

After banging my head against countless hard and sometimes sharp objects in search of solutions to crazy code problems, I've thought to myself many times, "man, I should post this somewhere so that others don't have to go through the same thing." But then, I never do. So I finally decided I would. I wanted a simple, but good system to post with, however the thought of taking the time out of my already busy life to write something seemed painful. So I ended up using WordPress. I'm not proud, but it's something, for now. Maybe one day I'll be motivated enough to write something else.

So anyway, day 1! Boxrefuge is online! W00+!

Filed under: Personal 2 Comments