animesh kumar

Running water never grows stale. Keep flowing!

Posts Tagged ‘JavaScript’

In Studio: Javascript Primitives


Javascript, like many other languages, has two kinds of values: primitives and objects. Primitives are immutable and are stored directly in the variable. That is, when you assign one primitive variable to another, the receiving variable gets its own copy of the value.

y = x; // y receives a copy of x.

Objects are not stored in the variable. They live somewhere else in memory, and the variable only holds a reference to that location. So, in an object assignment, only the reference gets copied.

obj2 = obj1; // obj2 receives a copy of reference to the object referred by obj1.

That is, both obj1 and obj2 now point to the same object, and if you change this object, the change will be observed through both obj1 and obj2.
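A quick way to see the difference for yourself is to change something after the assignment. This is a minimal sketch; the variable names and values are only for illustration.

```javascript
// Primitives: assignment copies the value itself.
let x = 10;
let y = x;      // y gets its own copy of 10
x = 20;         // changing x afterwards does not touch y
console.log(y); // 10

// Objects: assignment copies only the reference.
const obj1 = { city: "Indore" };
const obj2 = obj1;          // both variables now refer to one object
obj2.city = "Bhopal";       // mutate through obj2...
console.log(obj1.city);     // ...and obj1 sees it: "Bhopal"
console.log(obj1 === obj2); // true: it is the very same object
```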

In Javascript, there are 5 kinds of primitives: undefined, null, boolean, string and number (later editions of the language added symbol and bigint). So, the string “abc”, the numbers 1 and 1.34, and the booleans true and false are all primitives. Everything else is an Object.

I know what you are thinking: primitives are not Objects, so they have no methods to execute on themselves. Then how come this works?

console.log("abc".length); // prints 3

Well, the reason this works is that Javascript temporarily wraps the primitive in its Object counterpart:

"abc" ==> new String("abc")
true ==> new Boolean(true)
1 ==> new Number(1)

Recall Java’s wrapper classes? This is much the same idea, with even juicier auto-boxing in play: Javascript automatically coerces between primitives and Objects. In this case,

  1. the string value is coerced into a String Object,
  2. property ‘length’ is accessed, and then
  3. the coerced Object is discarded for garbage collection!
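The three steps above can be spelled out by hand. This is a rough model of what happens, not engine internals; `Object(value)` is the standard way to box a primitive into its wrapper.

```javascript
const primitive = "abc";
const wrapper = Object(primitive); // step 1: box the primitive into a String object
const len = wrapper.length;        // step 2: read the property from the wrapper
// step 3: nothing else refers to `wrapper` afterwards, so it becomes garbage

console.log(typeof primitive);          // "string"
console.log(typeof wrapper);            // "object"
console.log(wrapper instanceof String); // true
console.log(len === primitive.length);  // true (both are 3)
```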

Now, the hacker in you must have started thinking: if Javascript automatically coerces primitives to Objects, then surely you should be able to assign properties to a primitive too, like this:

var my_bank_account = "State Bank of India";
my_bank_account.city = "Indore";
console.log(my_bank_account.city); // prints undefined.

Oops. It failed. But why?

Well, what you guessed was right. The moment you tried to assign a property to the primitive, Javascript did indeed coerce the primitive string into its wrapper Object, in this case a String. This new Object got a brand new property, ‘city’, too. But since nothing kept a reference to this new Object, it was immediately discarded and left for garbage collection. When you went back to read the property ‘city’, Javascript again coerced your primitive into a wrapper Object. That is a new Object which knows nothing about your ‘city’ property, hence the ‘undefined’.

Let me show you, roughly, what happens behind the scenes:

var my_bank_account = "State Bank of India";
my_bank_account.city = "Indore";  ==> (new String("State Bank of India")).city = "Indore";
console.log(my_bank_account.city);  ==> console.log(  (new String("State Bank of India")).city );
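To see the difference for yourself, keep your own reference to the wrapper. The sketch below (variable names are illustrative) also shows that in strict mode the assignment to a primitive is not silently dropped: it throws a TypeError.

```javascript
// Keep a reference to the wrapper yourself and the property survives.
const account = new String("State Bank of India");
account.city = "Indore";   // stored on this one String object
console.log(account.city); // "Indore"

// Each coercion yields a fresh wrapper: same value inside, different objects.
const a = Object("State Bank of India");
const b = Object("State Bank of India");
console.log(a === b);                     // false
console.log(a.valueOf() === b.valueOf()); // true

// In strict mode, setting a property on a primitive throws instead.
function strictAssign() {
  "use strict";
  const s = "State Bank of India";
  try {
    s.city = "Indore";
    return "no error";
  } catch (e) {
    return e.constructor.name;
  }
}
console.log(strictAssign()); // "TypeError"
```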

Does this whole game seem messy? How do you know whether you are dealing with a primitive or an Object?

Well, you use the ‘typeof’ operator.

typeof false; // 'boolean'
typeof new Boolean(false); // 'object'
typeof "Animesh"; // 'string'
typeof new String("Animesh"); // 'object'
typeof 1.45; // 'number'
typeof new Number(1.45); // 'object'

Enlightened? Good. :)
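One caveat with relying on ‘typeof’ alone: `typeof null` is "object" (a long-standing quirk), and functions report "function" even though they are Objects. A small helper can account for both. `isPrimitive` is a hypothetical name, not a built-in.

```javascript
// Hypothetical helper: true for primitive values, false for any Object.
function isPrimitive(value) {
  if (value === null) return true; // typeof null === "object", but null is a primitive
  const t = typeof value;
  return t !== "object" && t !== "function";
}

console.log(isPrimitive("Animesh"));             // true
console.log(isPrimitive(new String("Animesh"))); // false
console.log(isPrimitive(null));                  // true
console.log(typeof null);                        // "object"
```

An equivalent one-liner is `Object(value) !== value`: boxing a primitive always produces a new object, while boxing an Object returns it unchanged.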

One more thing: this entire coercion business is a two-way street. Just as Javascript turns your primitives into Objects when needed, it can turn Objects back into primitives when required.

var s = new String('hello');
typeof s; // 'object'
var p = s.valueOf(); // converts to primitive
typeof p; // 'string'

Javascript will automatically use the ‘valueOf’ (and sometimes the ‘toString’) method whenever such a need arises. Watch this:

var x = new String("abc");
typeof x; // 'object'
var y = x + 1; // automatically calls valueOf() on Object x, so y is "abc1"
typeof y; // 'string'
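Which of the two methods wins depends on the kind of primitive the operation wants. A sketch with a hypothetical object that defines both methods makes the order visible: `+` and arithmetic try ‘valueOf’ first, while string conversion tries ‘toString’ first.

```javascript
// A hypothetical object whose two conversion methods disagree.
const price = {
  valueOf() { return 42; },
  toString() { return "forty-two"; }
};

console.log(price + 1);     // 43: "+" has no hint, so valueOf is tried first
console.log(price * 2);     // 84: numeric conversion uses valueOf
console.log(`${price}`);    // "forty-two": string conversion tries toString first
console.log(String(price)); // "forty-two"

// The example from the text: String's valueOf returns the wrapped string.
console.log(new String("abc") + 1); // "abc1"
```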

Normally, you don’t need to worry about this thing. But knowing what goes on certainly helps at times. Don’t you think? Follow this:

var b = new Boolean(false);
typeof b; // ‘object’
if (b){
    console.log("b is true."); // Ah! When did b become true?
}

What’s wrong with the above code? You probably assumed that Javascript would coerce the Boolean object into a primitive while evaluating ‘if’. Well, Javascript didn’t think so. It evaluated the Object itself for the condition… and every Object, including a Boolean object wrapping false, is truthy.

If you had done this:

if (b.valueOf()){
    console.log("b is true");
} else {
    console.log("b is false"); // thank God!
}

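It would have come out differently. Remember this: Javascript converts an Object to a primitive (via ‘valueOf’ or ‘toString’) only when an operation actually needs a primitive, as with + or <. A truthiness test is not such an operation.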
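A short sketch of the trap and two ways around it: unwrap with `valueOf()`, or never create the wrapper in the first place by calling `Boolean` without `new`, which converts instead of wrapping.

```javascript
const b = new Boolean(false);
console.log(Boolean(b));           // true: every Object is truthy
console.log(Boolean(b.valueOf())); // false: unwrap first, then test

// Without `new`, Boolean(...) returns a primitive, so there is no trap.
const p = Boolean(0);
console.log(typeof p, p);          // boolean false
```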

I hope this article was helpful. I’ll be writing more on Javascript internals.

Written by Animesh

June 22, 2011 at 11:49 pm

Posted in Technology


Fun with Singleton (Python, Javascript, Java)


They say that Singletons, like global variables, are evil. They hide dependencies, are harder to test, and are even harder to extend. Singletons are lies, and it’s best to keep away from them. But there are scenarios where you need them. For example, when you want a shared resource like a printer spooler, a file manager or a log manager, you want a single object to handle requests from all the various parts of your application.

In this post, I am going to explore various ways to make Singletons in Python, Java and Javascript while keeping them simple, elegant and usable. Let’s talk about Python first. I love it; it’s a really, really wonderful language, and in it there are n different ways to solve a problem. Singletons are no exception. The most natural way is to create a decorator.

class Singleton(object):
	def __init__(self, klass):
		self.klass = klass   # class which is being decorated
		self.instance = None  # instance of that class
	def __call__(self, *args, **kwargs):
		if self.instance is None:
			# new instance is created and stored for future use
			self.instance = self.klass(*args, **kwargs)
		return self.instance

Now, let’s say you have a Resource class. To make it a singleton, you just need to decorate it with ‘@Singleton’, and you are done.

@Singleton
class Resource(object):
	def __init__(self):
		self.name = None

Cool…eh? There are other, nerdier ways too. Python keeps an object’s attributes in an internal dictionary, ‘__dict__’. So, if you share ‘__dict__’ across multiple instances of a class, those instances share their state. And isn’t that a Singleton? Effectively, yes (this trick is often called the Borg pattern). You might have many, many instances, but all of them behave exactly the same.

class Singleton(object):
	_shared = {}
	def __init__(self):
		self.__dict__ = Singleton._shared
class Resource(Singleton):
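	def __init__(self):
		Singleton.__init__(self)  # share state: point self.__dict__ at _shared
		self.name = None  # note: resets the shared 'name' on every instantiation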

Since ‘self.__dict__’ now refers to the ‘_shared’ dictionary, all instances of Resource use the same dictionary and hence share the same state. Geeky? Let me show you an even geekier way to do it.

In Python, when you instantiate a class, the interpreter first calls ‘__new__’, a static method that receives the class and returns a new instance of it, and then calls ‘__init__’, the initializer, on that instance. So, you can hook into ‘__new__’ and return the single instance whenever it is called. (Note that ‘__init__’ still runs on every call.)

class Singleton(object):
	_instance = None
	def __new__(cls, *args, **kwargs):
		# cls is the class; the rest are constructor arguments
		if cls._instance is None:
			# create the single instance and store it; object.__new__
			# accepts no extra arguments, so don't forward args/kwargs
			cls._instance = super(Singleton, cls).__new__(cls)
		return cls._instance
class Resource(Singleton):
	def __init__(self):
		self.name = None

Awesome, isn’t it? There are other ways involving ‘__metaclass__’ etc., but let’s save them for another day. Let’s use it now:

# get resource r1
r1 = Resource()
# get resource r2  (since Resource is a singleton, r2 shares r1's state)
r2 = Resource()
# to verify, let's set 'name' onto r1
r1.name = "Animesh Kumar"
print(r1.name)
# and the same 'name' appears in r2 as well!
print(r2.name)

Now let’s see how to do this in Javascript. In the simplest form, just define an object literal, and you are done.

var resource = {
	getName : function() {
		return this.name;
	},
	setName: function(name){
		this.name = name;
	}
}

Easy. You now have an object you can share across your application’s modules, and it just works. For more complex scenarios, like needing private variables, you might have to resort to something like this:

// self-executing wrapper function
var Resource = (function(){
	// Resource class which is to be made 'singleton'
	function _Resource() {
		var name; // private variable, visible only to the closures below
		this.getName = function() { // getter
			return name;
		};
		this.setName = function(newName) { // setter
			name = newName;
		};
		// do more stuff
	}
	// instance holder
	var instance = new _Resource();
	// return an object with a 'getInstance' method
	return {
		getInstance: function(){
			return instance;
		}
	};
})();

_Resource is your function of interest, and you want to make it a singleton. So, you wrap _Resource in a self-executing function whose result, ‘Resource’, is an object with a ‘getInstance’ method that returns the same instance of _Resource every time it is called.

Let’s try to use it now:

// get resource r1
var r1 = Resource.getInstance();
// get resource r2  (since Resource is singleton, r1 == r2)
var r2 = Resource.getInstance();
// to verify, let's set 'name' onto r1
r1.setName("Animesh Kumar");
console.log(r1.getName());
// and the same 'name' appears in r2 as well!
console.log(r2.getName());

Easy, wasn’t it? Great.
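The wrapper above creates its instance as soon as the script loads. If construction is expensive, a lazy variant is a small change. This is a sketch, not the author's code; `LazyResource` and `createResource` are hypothetical names. Because a page or a Node process runs this code on a single thread, the lazy check needs no locking.

```javascript
const LazyResource = (function () {
  let instance = null; // created on first request

  // Hypothetical factory; `name` lives only in this closure.
  function createResource() {
    let name;
    return {
      getName: function () { return name; },
      setName: function (n) { name = n; }
    };
  }

  return {
    getInstance: function () {
      if (instance === null) {
        instance = createResource(); // built only once, on demand
      }
      return instance;
    }
  };
})();

const r1 = LazyResource.getInstance();
const r2 = LazyResource.getInstance();
r1.setName("Animesh Kumar");
console.log(r1 === r2);    // true
console.log(r2.getName()); // "Animesh Kumar"
```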

Now, Java. I know most of you already know this one, but I will write about it anyway, just for the sake of completeness. In Java, you create a private static instance of the class and hand out that instance wherever it is needed.

public class Resource {
	// static instance (Note: Resource instantiation is done here, not in getInstance)
	private static Resource instance = new Resource();
	// property
	private String name;
	// private constructor to disable 'new'
	private Resource() {
	}
	// public static method to get the instance of this class
	public static Resource getInstance() {
		return instance;
	}
	// getter
	public String getName() {
		return name;
	}
	// setter
	public void setName(String name) {
		this.name = name;
	}
}

Now, let’s use it.

	public static void main(String[] args) {
		// get resource r1
		Resource r1 = Resource.getInstance();
		// get resource r2  (since Resource is singleton, r1 == r2)
		Resource r2 = Resource.getInstance();
		// to verify, let's set 'name' onto r1
		r1.setName("Animesh Kumar");
		System.out.println(r1.getName());
		// and the same 'name' appears in r2 as well!
		System.out.println(r2.getName());
	}

Loud and clear. And this also stops you from instantiating ‘Resource’ with the ‘new’ operator. Try this:

	Resource r1 = new Resource();  // from any other class: compile error, Resource() has private access in Resource

Your code will not compile. I know you know why, but I will write it anyway: the constructor is private! So, outside the class there is no way to get an instance of Resource except through the ‘getInstance’ method, which ensures a single instance of the class. Also, if you noticed, I instantiated ‘instance’ in its declaration, not in the ‘getInstance’ method. This way, the object is created when the class is initialized, which the JVM guarantees to happen only once and in a thread-safe way, and you save yourself from the race conditions that creep in with lazy, just-in-time instantiation inside ‘getInstance’.

Written by Animesh

May 27, 2011 at 3:59 pm

IBM has no idea what Node.js is


People, don’t read the article “Just what is Node.js?” on IBM’s developerWorks. And if you have read it already, don’t believe it. IBM has no idea what Node is, really.

Read my rebuttal http://wkp.me/wkk6g

For more gory details, read Marak Squires’ blog here.

IBM must pull the article or ask the author to rewrite it. The article is doing a disservice to any new developers who might stumble upon it as their first introduction to node.js.

Written by Animesh

May 13, 2011 at 12:34 pm

Node and Its Many Incarnations (Node Version Management)


Node.js is under active development, and every other day a new build is released. It’s awesome to see how fast Node is growing and how vibrant the community is… but on the downside, it’s becoming increasingly difficult to keep track of its many versions and API changes.

Very often, while developing an app, you find yourself married to a particular Node version, because a newer one might have API changes (mind you, Node is undergoing heavy transformations, especially at the API level) that might break your app… and then you would be forced to revert to the older version. That means uninstalling the current Node and re-installing the older one. Ouch! So much work for a mere upgrade.

Well, there is a nicer way to do it. Check out this project by Tim Caswell: Node Version Manager. It does exactly what it says: it manages various Node versions on your machine, be it development, staging, production or whatever. How?

It creates a separate Node environment for each version you want to keep. Let’s say you want to stay with the last stable release, v0.2.6 (from the time you started your app), but also want to experiment with v0.4.7 to keep an eye on new additions. NVM will install two separate Node installations for you, and each will run in its own sandbox-like environment; that is, you will have to install all your third-party modules/libraries separately for each Node installation. That might seem like a lot of work, but trust me, it’s the safest way to avoid conflicts. Okay. Let’s get to work.

Installation

Note: I am assuming that you have a basic know-how of Git (the most awesome source control management system).

  1. Clone NVM repository to your local machine:
    $ git clone git://github.com/creationix/nvm.git ~/.nvm

    The above command clones the NVM repository into a folder ‘.nvm’ in your home directory. (I am using Ubuntu 10.04.)

  2. Switch to folder ‘.nvm’ and make file ‘nvm.sh’ executable:
    $ chmod 755 ~/.nvm/nvm.sh
  3. ‘nvm.sh’ is just a shell script, so in order to use it, you must source it in every terminal you open. To do this automatically, add this line at the very end of either your ‘.bashrc’ or ‘.profile’ file:
    . ~/.nvm/nvm.sh
  4. That’s it. Open a new terminal and run,
    $ nvm
  5. You will see a set of useful commands you can use. :) Easy huh?

Getting dirty

Before you go any further, make sure that you have ‘wget’ installed on your machine. I know, I know… you might already have it. I just want you to make sure.

Check which versions of Node are available.

$ nvm sync # update the local machine with available versions from server
$ nvm ls   # displays all available and installed versions

Now install Node v0.4.7.

$ nvm install v0.4.7 # installs Node v0.4.7

Note: You might get this error, “Could not autodetect OpenSSL support. Make sure OpenSSL development packages are installed. Use configure --without-ssl to disable this message”, which means that you need to install the OpenSSL development library:

$ sudo apt-get install libssl-dev

NVM creates a folder ‘src’ either in your home directory or in the ‘.nvm’ directory, where it downloads, extracts and installs the bundled release. NVM also installs NPM (node package manager) for each installation of Node.

Select a particular version

$ nvm use v0.4.7 # start using Node-v0.4.7

That’s it. You have set up a system which will enable you to quickly and cleanly switch between various Node versions. You can test your app’s compatibility with any of them, and if need be, easily switch to the one your app was most comfortable with.
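Once several versions are installed, it can help if the app itself refuses to start on a Node that doesn't meet its minimum. This is a sketch under assumptions: `meetsMinimum` is a hypothetical helper that compares the numeric major.minor.patch parts and ignores pre-release tags.

```javascript
// Hypothetical helper: does version `actual` satisfy `minimum`?
function meetsMinimum(actual, minimum) {
  const parse = (v) => v.replace(/^v/, "").split(".").map((n) => parseInt(n, 10));
  const a = parse(actual);
  const m = parse(minimum);
  for (let i = 0; i < 3; i++) {
    const ai = a[i] || 0;
    const mi = m[i] || 0;
    if (ai !== mi) return ai > mi; // first differing part decides
  }
  return true; // equal versions satisfy the minimum
}

console.log(meetsMinimum("v0.4.7", "v0.2.6"));  // true
console.log(meetsMinimum("v0.2.6", "v0.4.7"));  // false
console.log(meetsMinimum("v0.10.0", "v0.4.7")); // true
```

Comparing the parts as numbers matters: as plain strings, "v0.10.0" sorts before "v0.4.7". At startup you might check `meetsMinimum(process.version, "v0.4.7")` and exit with a clear message if it returns false.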

Now, since you have set up a congenial Node development machine, in the next blog, I will talk about how to go live with your Node app.

Note: for CentOS 5.x, please make sure that you have the following packages installed:

$ sudo yum install gcc-c++ screen git-core openssl openssl-devel

Written by Animesh

May 3, 2011 at 11:17 am

WebSocket support in Android’s Phonegap apps


We are developing a small game that multiple users can play from a variety of clients: web browsers, Android, iPhone, iPad etc. There is a server, and all clients connect to it to send and receive messages. We decided to use WebSocket for the underlying connection between clients and server, and Phonegap to build the clients. Our idea is to write the client once and then run it on a variety of platforms. Since Phonegap enables app development using HTML, CSS and JavaScript, it fits our requirements well.

But Phonegap doesn’t support WebSocket yet; it’s in their plan of action for the 1.x release, though. So we had to address it ourselves. I found Mathias Desloge’s PhoneGap-Android-HTML5-WebSocket project. It was good, but it used the old blocking java.io.* packages. I would have preferred java.nio.* for more efficient, non-blocking I/O. So, I decided to write my own small library.

Library can be found here: websocket-android-phonegap.

How to use?

  1. Copy Java source into your source folder.
  2. Copy websocket.js into your assets/www/js folder
  3. Attach com.strumsoft.websocket.phonegap.WebSocketFactory to WebView, like
    	@Override
    	public void onCreate(Bundle savedInstanceState) {
    		super.onCreate(savedInstanceState);
    		super.loadUrl("file:///android_asset/www/index.html");
    
    		// attach websocket factory
    		appView.addJavascriptInterface(new WebSocketFactory(appView), "WebSocketFactory");
    	}
    
  4. In your page, create a new WebSocket and assign its event handlers ‘onmessage’, ‘onopen’ and ‘onclose’, like
    	// new socket
    	var socket = new WebSocket('ws://192.168.1.153:8081');
    
    	// notify once the connection is established
    	socket.onopen = function() {
    	 alert('connected');
    	};
    
    	// alerts message pushed from server
    	socket.onmessage = function(msg) {
    	 alert(JSON.stringify(msg));
    	};
    
    	// alert close event
    	socket.onclose = function() {
    	 alert('closed');
    	};
    

How it works?

When you create a new WebSocket object in your page, behind the scene, websocket.js delegates the responsibility to com.strumsoft.websocket.phonegap.WebSocketFactory to instantiate new com.strumsoft.websocket.phonegap.WebSocket object.

		// websocket.js
		// get a new websocket object from factory (check com.strumsoft.websocket.phonegap.WebSocketFactory.java)
		this.socket = WebSocketFactory.getWebSocket(url);

WebSocketFactory simply instantiates a new WebSocket object, connects it to the designated server and returns the instance.

// com.strumsoft.websocket.phonegap.WebSocketFactory

public WebSocket getWebSocket(String url) throws URISyntaxException {
	WebSocket socket =  new WebSocket(appView, new URI(url));
	socket.connect();   // connects to server
	return socket;
}

Now, whenever an event occurs, say, ‘onmessage’, WebSocket class delegates that event to Javascript.

// com.strumsoft.websocket.phonegap.WebSocket

public void onMessage(String message) {
	appView.loadUrl(buildLoadData("message", message));
}
private String buildLoadData(String _event, String _data) {
	String _d =  "javascript:WebSocket.on" + _event + "(" + 
				"{"
				+ "\"_target\":\"" + webSocketId + "\"," + 
				"\"_data\":'" + _data + "'" +  // note: _data is inserted unescaped; a quote in it breaks the script
				"}" + 
			")";
	Logger.log(_d);
	return _d;
}

Finally, ‘WebSocket.onmessage’ from websocket.js is called. It parses the payload, finds out the target WebSocket object, and calls the corresponding event on the target object with event data.

	// websocket.js
	// static event methods to call event methods on target websocket objects
	WebSocket.onmessage = function (evt) {
		WebSocket.registry[evt._target]['onmessage'].call(global, evt._data);
	}

That’s it!
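The registry-and-dispatcher idea on the Javascript side can be tried without a phone or a WebView. This is a browser-free sketch of the pattern, not the library's code: `BridgeSocket` and its fields are hypothetical, and passing `{ data: ... }` with the socket as `this` is a design choice that mirrors how browser WebSocket handlers are usually written.

```javascript
// Each socket registers itself under an id so events can find it.
function BridgeSocket(id) {
  this.id = id;
  this.onmessage = null;
  BridgeSocket.registry[id] = this;
}
BridgeSocket.registry = {};

// Static dispatcher: what the Java side would invoke via loadUrl("javascript:...").
BridgeSocket.onmessage = function (evt) {
  const target = BridgeSocket.registry[evt._target];
  if (target && typeof target.onmessage === "function") {
    target.onmessage.call(target, { data: evt._data });
  }
};

const s1 = new BridgeSocket("ws-1");
const s2 = new BridgeSocket("ws-2");
const received = [];
s1.onmessage = function (msg) { received.push("s1:" + msg.data); };
s2.onmessage = function (msg) { received.push("s2:" + msg.data); };

BridgeSocket.onmessage({ _target: "ws-2", _data: "hello" });
console.log(received); // [ 's2:hello' ]
```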

Amendment

(Date: Thu Aug 25 12:40:52 IST 2011)

There was a serious bug! The WebSocket connection runs in a separate thread to maintain a persistent connection with the server, while the front-end Javascript (websocket.js) stays within the UI/main thread. Android doesn’t allow other threads to touch the UI, including the WebView, directly; they must hand their work over to the UI thread, for example by posting a Runnable with View.post(). Conversely, network I/O such as sending a message shouldn’t run on the UI thread, so send() now does its work in a new thread. So, here is the fix!

	// a message is sent to server! 
	public void send(final String text) {
		// new thread
		new Thread(new Runnable() {
			@Override
			public void run() {
				if (instance.readyState == WEBSOCKET_STATE_OPEN) {
					try {
						instance._send(text);
					} catch (IOException e) {
						instance.onError(e);
					}
				} else {
					instance.onError(new NotYetConnectedException());
				}
			}
		}).start();
	}

	// when a message is received
	public void onMessage(final String msg) {
		// post a new thread to View
		appView.post(new Runnable() {
			@Override
			public void run() {
				appView.loadUrl(buildJavaScriptData(EVENT_ON_MESSAGE, msg));
			}
		});
	}

Commit links:
https://github.com/anismiles/websocket-android-phonegap/commit/a7ccb815cce3a446c3ec92058187cdb20e5a41e8
https://github.com/anismiles/websocket-android-phonegap/commit/087f7a93d46f92cb037d2b451a4d253a65f5f015

Written by Animesh

February 3, 2011 at 11:52 am

Posted in Technology


WebSocket and node.js: why shud’ya care?


Traditional HTTP messages are heavy: every message carries a full set of HTTP headers. Now, let’s say you have an application with a real-time component, like chat, a Twitter client or maybe some traffic-analysis tool, and around 100,000 users connected to it. To make your app real-time, you need a mechanism that lets the server push data almost as soon as it becomes available. There are two common ways to do it. You could write a script that connects to the server every few seconds to check whether there is any data; with each attempt, a full set of HTTP headers travels back and forth between client and server, which is not very efficient. To save some of that bandwidth, you could use a popular trick known as long-polling, where the browser connects to the server and the server holds the connection open until there is some data to push.

Now, let’s assume that 100,000 users are connected to your app and that every 10 seconds some data is sent from the server to the clients. Following the HTTP spec, every time some data is sent, a full set of headers is exchanged between client and server. This is how they look:

Request header

GET / HTTP/1.1
User-Agent: ...some long user agent string...
Host: animesh.org
Accept: */*

Response header

HTTP/1.1 200 OK
Date: Tue, 25 Jan 2011 17:32:19 GMT
Server: Apache
X-Powered-By: PHP/5.2.3
X-Pingback: http://animesh.org/endpoint/
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

That’s approximately 350 bytes of headers per user every 10 seconds. For 100,000 users, that is 3,500,000 bytes, or 28,000,000 bits, per second of network throughput: roughly 26.7 Mbps (counting 1 Mb as 2^20 bits) for HTTP headers alone. Gosh!
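The arithmetic above, as a tiny function you can rerun with your own numbers. The 350-byte header size, 100,000 users and 10-second interval are the text's figures; the function name is ours.

```javascript
// Bits per second spent on headers alone.
function headerOverheadBitsPerSecond(headerBytes, users, intervalSeconds) {
  const bytesPerSecond = (headerBytes * users) / intervalSeconds;
  return bytesPerSecond * 8;
}

const bps = headerOverheadBitsPerSecond(350, 100000, 10);
console.log(bps);                        // 28000000
console.log((bps / 1e6).toFixed(1));     // "28.0" (decimal megabits)
console.log((bps / 2 ** 20).toFixed(1)); // "26.7" (1 Mb = 2^20 bits)
```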

WebSocket

WebSocket comes to the rescue. With WebSocket, once a handshake is done between client and server, messages can be sent back and forth with minimal overhead. You do a handshake while establishing the connection, and of course the handshake needs all those HTTP headers, but after that you only send the data, wrapped in a few bytes of framing and no headers. This greatly reduces bandwidth usage and thus improves performance. Let’s see how. This is what the handshake headers look like:

Handshake Request header

GET /demo HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: animesh.org
Origin: http://animesh.org
WebSocket-Protocol: sample

Handshake Response header

HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
WebSocket-Origin: http://animesh.org
WebSocket-Location: ws://animesh.org/
WebSocket-Protocol: sample

And now, the connection has been established and data can freely flow between server and client without having to exchange any HTTP headers until this connection is closed or broken and you do another handshake. Imagine how much bandwidth you are saving! Whoa!
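How small is that per-message overhead? The handshake shown here follows an early protocol draft; the final standard, RFC 6455 (December 2011), frames each message with a 2 to 14 byte header. A sketch of that calculation for a single unfragmented frame (the function name is ours):

```javascript
// Framing bytes RFC 6455 adds to one unfragmented frame.
// payloadLength: bytes of application data; masked: true for client-to-server frames.
function frameOverhead(payloadLength, masked) {
  let header = 2;                              // FIN/opcode byte + mask-bit/length byte
  if (payloadLength > 65535) header += 8;      // 64-bit extended payload length
  else if (payloadLength > 125) header += 2;   // 16-bit extended payload length
  if (masked) header += 4;                     // masking key, required from clients
  return header;
}

console.log(frameOverhead(20, false));   // 2
console.log(frameOverhead(20, true));    // 6
console.log(frameOverhead(1000, false)); // 4
console.log(frameOverhead(70000, true)); // 14
```

Compared with roughly 350 bytes of HTTP headers per message, even the worst case is tiny.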

Example

Let’s write a simple application to see how this actually works. The application will have a server that all clients connect to, and whenever one client writes something to the server, all clients will be notified.

Here is our server, written in Node.js. Let’s name it server.js

Note: Though you could very well write a WebSocket server using Node’s native APIs, I chose to use Micheil Smith’s node-websocket-server library. This library is simple, elegant and very easy to work with. It works by wrapping and extending Node’s server object.

var sys = require("sys");
// Library https://github.com/miksago/node-websocket-server
var	websocket = require('./lib/node-websocket-server/lib/ws/server');

// create web socket server
var server = websocket.createServer();
// listen on port 8078
server.listen(8078);

// when the server is ready
server.addListener("listening", function() {
  sys.log("Listening for connections on localhost:8078");
});

// when a traditional HTTP request comes
server.addListener("request", function(req, res) {
	res.writeHead(200, {
		"Content-Type" : "text/plain"
	});
	res.write("This is an example WebSocket server.");
	res.end();
});

// when a client websocket connects
server.addListener("connection", function(conn) {

	// when client writes something
	conn.addListener("message", function(message) {

		// iterate through all connected clients, and push this message
		server.manager.forEach(function(connected_client) {
			connected_client.write(JSON.stringify(conn.id + ": " + message));
		});
	});
});
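If you do take the native-API route that the note above mentions, the first piece the final standard (RFC 6455, rather than the older draft shown earlier) requires is answering the client's `Sec-WebSocket-Key` header with a `Sec-WebSocket-Accept` value. A sketch of just that step, using Node's built-in `crypto` module; the function name `acceptKey` is ours.

```javascript
const crypto = require("crypto");

// GUID fixed by RFC 6455 for computing Sec-WebSocket-Accept.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Sec-WebSocket-Accept = base64(SHA-1(key + GUID))
function acceptKey(secWebSocketKey) {
  return crypto
    .createHash("sha1")
    .update(secWebSocketKey + WS_GUID)
    .digest("base64");
}

// Sample key from RFC 6455, section 1.3
console.log(acceptKey("dGhlIHNhbXBsZSBub25jZQ==")); // "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

The server would send this value back in a `101 Switching Protocols` response; after that, frames flow over the raw socket.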

Now, let’s write a simple client. We will create an HTML file and open it in Google Chrome. Let’s name it client.html

<!DOCTYPE html>
<html>
<head>
    <title>WebSocket - Simple Client</title>
    <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.2.6/jquery.js"></script>
    <script type="text/javascript">

	$(function() {
		// open websocket
		var socket = new WebSocket('ws://localhost:8078');

		// bind form
		$('#payload-form').submit(function() {
			var payload = $("input#payload").val();
			socket.send(payload);  // write to server
			return false;
		});

		socket.onopen = function() {
			// Web Socket is connected. send an initial random message.
			socket.send(Math.floor(Math.random()*11) + ' >> Hi, I am Mr. so-and-so!');
		};
		// append to '#log' whatever server pushes.
		socket.onmessage = function(ev){
			var msg = JSON.parse(ev.data);
			$('#log').append(JSON.stringify(msg) + '<br/>');
		};
	});
    </script>
</head>
<body>
	<div id='payload-container'>
		<form id='payload-form'>
			<input type='text' name='payload' id='payload' value='Hello World' style="width:500px;"/>
			<input type='submit' value='push'/>
		</form>
	</div>

	<div id='log' style='display:block; border:1px solid lightgray;'></div>
</body>
</html>

Now, run your server, and open your client in multiple Chrome windows/tabs.

# run server
$ node server.js

That’s it! Was it fun? In the next blog, I will write about how to establish WebSocket connections from a Java client.

Written by Animesh

January 25, 2011 at 3:25 pm

Using node.js and jquery to scrape websites


I have been playing with Node.js for the last few days and am totally head over heels. Madly in love! It’s awesome how much you can build with how little. I have ranted about Node.js earlier and did some comparisons too. It’s fast, really fast. And it’s plain old Javascript, which we have been using for many, many years now. I thought I would build a real-world application with it to see how well it holds water. At first I thought of making something on top of Riak, but that felt like running too fast. Instead I picked something simpler that deals only with Node.js. But first, it makes sense to brush up on some Javascript fundamentals.

Javascript objects

Yes. Javascript is an object-oriented language. But it’s different from traditional class-based OO languages like Java and Ruby.

  1. One obvious difference is in syntax.
  2. The other major one is that those languages have methods, while Javascript has first-class functions.

First-class functions. What does that mean? It means that functions are values: they can be assigned to a variable and easily passed around. Does that sound like a closure in Ruby? It does indeed. Well, it’s a little more than that, and I will come back to it some other time. For now, let’s find out how to create objects and use them. I will show you two ways to do it.
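A minimal sketch of what "first-class" buys you: functions stored in variables, passed as arguments, and returned from other functions, where the returned function keeps access to its enclosing variables (the closure part). All names here are illustrative.

```javascript
// Store a function in a variable...
const square = function (n) { return n * n; };

// ...pass it to another function...
function applyTwice(fn, value) {
  return fn(fn(value));
}
console.log(applyTwice(square, 3)); // 81

// ...and return one. The returned function closes over `count`.
function makeCounter() {
  let count = 0;
  return function () {
    count += 1;
    return count;
  };
}
const next = makeCounter();
next();
console.log(next()); // 2
```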

The Classical way

Here is a constructor function for object Shape. It accepts two parameters and saves them into respective instance variables.

function Shape(width, height) {
	this.width = width;        // instance variable width
	this.height = height;      // instance variable height
	this.getArea = function() {     // function to calculate Area, notice the assignment.
		return this.width * this.height;
	};
}

var rectangle = new Shape (2, 5);    // instantiate a new Shape object
console.log (rectangle.getArea());   // calculate the area: 10

Every Javascript object is linked to a prototype, and property lookups follow that prototype chain, so you can add new functions or variables to all objects of a kind on the fly. You should read more about it here: http://www.packtpub.com/article/using-prototype-property-in-javascript

I will add a new function to calculate the perimeter of my Shape object.

Shape.prototype.getPerimeter = function() {
	return 2 * (this.width + this.height);
};

console.log (rectangle.getPerimeter());

What happened here? Did you notice that even though ‘rectangle’ was created before the function was added, it could still call the new perimeter function? Wasn’t that awesome? Javascript is intelligent, dude. If you ask for something, it looks into the current object, and if it is not found there, it goes up the object’s prototype chain to look for what you asked for. And since we added the new function to the prototype, it is found without any trouble. There is a lot of interesting stuff going on here, and you should read about it. I would suggest buying Manning’s Secrets of the JavaScript Ninja if you are really serious about it.
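You can watch that lookup happen. In this sketch (a copy of the Shape example, with the perimeter method spelled `getPerimeter`), `getArea` is an own property set by the constructor, while `getPerimeter` is not on the object at all and is found one step up the chain.

```javascript
function Shape(width, height) {
  this.width = width;
  this.height = height;
  this.getArea = function () { return this.width * this.height; };
}
const rectangle = new Shape(2, 5);

// Added after `rectangle` already exists.
Shape.prototype.getPerimeter = function () {
  return 2 * (this.width + this.height);
};

console.log(rectangle.hasOwnProperty("getArea"));                  // true: set by the constructor
console.log(rectangle.hasOwnProperty("getPerimeter"));             // false: not on the object itself
console.log(Object.getPrototypeOf(rectangle) === Shape.prototype); // true
console.log(rectangle.getPerimeter());                             // 14: found on the prototype
```

This also hints at a design choice: a function assigned in the constructor is copied onto every instance, while one on the prototype is shared by all of them.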

Now, let’s try to extend Shape. I will create a new constructor function for Square.

function Square(side){
	this.width = side;
	this.height = side;
}

Square.prototype = new Shape();

var sq = new Square(4);
console.log(sq.getArea());

I created a new Square constructor and set its prototype to a Shape instance, so Square objects inherit all the functionality and behavior of Shape. Easy… huh?
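A common alternative links the prototypes with `Object.create` instead of calling `new Shape()` with missing arguments, and reuses Shape's constructor via `call`. This sketch moves `getArea` onto `Shape.prototype` so it is shared rather than copied per instance, and restores the `constructor` back-reference that replacing the prototype loses.

```javascript
function Shape(width, height) {
  this.width = width;
  this.height = height;
}
Shape.prototype.getArea = function () {
  return this.width * this.height;
};

function Square(side) {
  Shape.call(this, side, side); // run Shape's constructor on the new Square
}
Square.prototype = Object.create(Shape.prototype); // link, without calling Shape()
Square.prototype.constructor = Square;             // restore the back-reference

const sq = new Square(4);
console.log(sq.getArea());              // 16
console.log(sq instanceof Shape);       // true
console.log(sq.constructor === Square); // true
```

ES2015's `class Square extends Shape` sets up the same prototype chain with less ceremony.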

The Prototypal way

Let’s do the same thing without using constructors now. Just plain prototypes!

var Shape = {
	getArea: function () {
		return this.width * this.height;
	},
	getPerimeter: function() {
		return 2 * (this.width + this.height);
	}
};

var rec = Object.create(Shape);
rec.width = 2;
rec.height = 5;

console.log(rec.getArea());

Now that you have the Shape object, you can easily add new functions to it, or even use it as the prototype of yet another object. However, I find this approach a little clumsy and would rather stick to the classical way. Take your pick. To each his own!
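Using Shape as the prototype of yet another object looks like this. A sketch only: `Square` and its `init` method are hypothetical names, and `init` returns `this` so creation and setup can be chained.

```javascript
const Shape = {
  getArea: function () { return this.width * this.height; },
  getPerimeter: function () { return 2 * (this.width + this.height); }
};

// Square inherits from Shape and adds a hypothetical init helper.
const Square = Object.create(Shape);
Square.init = function (side) {
  this.width = side;
  this.height = side;
  return this;
};

const sq = Object.create(Square).init(4);
console.log(sq.getArea());                         // 16
console.log(sq.getPerimeter());                    // 16
console.log(Object.getPrototypeOf(sq) === Square); // true
console.log(Shape.isPrototypeOf(sq));              // true: Shape is further up the chain
```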

Node.js Modules

Node uses the CommonJS module system, in which files and modules are in one-to-one correspondence. Here is the API: http://nodejs.org/api.html. The example above can be ported to a Node.js module like this:

First, create Shape.js

function Shape(width, height) {
	this.width = width;        // instance variable width
	this.height = height;      // instance variable height
	this.getArea = function() {     // function to calculate Area, notice the assignment.
		return this.width * this.height;
	};
}

// Export the constructor itself, so require('./Shape') returns Shape
module.exports = Shape;

And now, use it from another file in the same directory:

var Shape = require('./Shape');

var rectangle = new Shape (2, 5);
console.log (rectangle.getArea());

Node.js wraps each module in its own function scope, so top-level variables in one module never collide with names in another. That’s the benefit you get in addition to a properly structured code base.
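The scoping behaviour can be checked with a small sketch that writes three throwaway modules into a temporary directory and requires them. The file names a.js, b.js and c.js are hypothetical. c.js shows what assigning to `exports.module` does: it only adds a property named module to the exported object, so requiring the file does not give you the constructor itself.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Throwaway directory holding three tiny modules
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'modules-'));

// Both files declare a top-level "counter"; each module has its own scope,
// so the two declarations never collide (or leak into this file).
fs.writeFileSync(path.join(dir, 'a.js'),
	'var counter = 1; module.exports = function () { return counter; };');
fs.writeFileSync(path.join(dir, 'b.js'),
	'var counter = 2; exports.get = function () { return counter; };');
// Assigning to exports.module only adds a property named "module"
fs.writeFileSync(path.join(dir, 'c.js'),
	'exports.module = function Shape() {};');

var a = require(path.join(dir, 'a.js'));
var b = require(path.join(dir, 'b.js'));
var c = require(path.join(dir, 'c.js'));

console.log(a());              // 1
console.log(b.get());          // 2
console.log(typeof c);         // object (not the function)
console.log(typeof c.module);  // function
```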

Writing a screen scraping application

I will write a simple application to capture details from various websites. The beautiful thing is that Javascript has been handling DOM objects for years; in fact, manipulating web pages in the browser is what Javascript was created for. No wonder its DOM tooling is more mature than most standalone HTML parsing libraries. Also, given that there are many elegant frameworks like Prototype, MooTools, jQuery etc. available to use, scraping websites with Node.js should be easy and fun. Let’s do it. Let’s write an application to collect data from various book-selling websites.

Create a basic searcher.js module. It provides the basic skeleton for writing website-specific scrapers.

// External Modules
var request = require('ahr'),  // Abstract-HTTP-request https://github.com/coolaj86/abstract-http-request
events = require('events'),    // EventEmitter
jsdom = require('jsdom');      // JsDom https://github.com/tmpvar/jsdom

var jQueryPath = 'http://code.jquery.com/jquery-1.4.2.min.js';
var headers = {'content-type':'application/json', 'accept': 'application/json'};

// Export searcher
module.exports = Searcher;

function Searcher(param) {
	// Give every searcher its own EventEmitter state (its own listener list)
	events.EventEmitter.call(this);

	this.headers = param.headers || headers;
	this.merchantName = param.merchantName;
	this.merchantUrl = param.merchantUrl;
	this.id = param.merchantUrl;
}

// Inherit from EventEmitter without sharing one emitter object between all searchers
Searcher.prototype = Object.create(events.EventEmitter.prototype);
Searcher.prototype.constructor = Searcher;

Searcher.prototype.search = function(query, collector) {
	var self = this;
	var url = self.getSearchUrl(query);

	console.log('Connecting to... ' + url);

	request({uri: url, method: 'GET', headers: self.headers, timeout: 10000}, function(err, response, html) {
		if (err) {
			self.onError({error: err, searcher: self});
			self.onComplete({searcher: self});
		} else {
			console.log('Fetched content from... ' + url);
			// create DOM window from HTML data
			var window = jsdom.jsdom(html).createWindow();
			// load jQuery into the DOM window, then call the parser
			jsdom.jQueryify(window, jQueryPath, function() {
				self.parseHTML(window);
				self.onComplete({searcher: self});
			});
		}
	});
};

// Overridden by each merchant-specific searcher
Searcher.prototype.getSearchUrl = function(query) {
	throw new Error("getSearchUrl() is unimplemented!");
};
// Overridden by each merchant-specific searcher
Searcher.prototype.parseHTML = function(window) {
	throw new Error("parseHTML() is unimplemented!");
};
// Emits 'item' events when an item is found.
Searcher.prototype.onItem = function(item) {
	this.emit('item', item);
};
// Emits 'complete' event when searcher is done
Searcher.prototype.onComplete = function(searcher) {
	this.emit('complete', searcher);
};
// Emits 'error' events (a listener must be attached, or Node throws)
Searcher.prototype.onError = function(error) {
	this.emit('error', error);
};

Searcher.prototype.toString = function() {
	return this.merchantName + "(" + this.merchantUrl + ")";
};
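Because every merchant searcher is an instance of the same Searcher constructor, how Searcher inherits from EventEmitter matters. The standalone sketch below compares two ways of wiring it up; SharedSearcher and OwnSearcher are hypothetical names. When an EventEmitter instance is used as the shared prototype and the constructor never initialises its own emitter state, current Node.js versions store listeners on that shared object, so a listener added to one searcher fires for another.

```javascript
var EventEmitter = require('events').EventEmitter;

// Variant A: all instances share ONE EventEmitter object as their prototype
function SharedSearcher() {}
SharedSearcher.prototype = new EventEmitter();

// Variant B: each instance initialises its own listener storage
function OwnSearcher() {
	EventEmitter.call(this);
}
OwnSearcher.prototype = Object.create(EventEmitter.prototype);
OwnSearcher.prototype.constructor = OwnSearcher;

// Counts how often a listener on "rediff" fires when only "flipkart" emits
function countHits(Ctor) {
	var rediff = new Ctor();
	var flipkart = new Ctor();
	var hits = 0;
	rediff.on('item', function () { hits++; });
	flipkart.emit('item', {title: 'x'});
	return hits;
}

console.log(countHits(SharedSearcher));  // 1, the listener leaked across instances
console.log(countHits(OwnSearcher));     // 0
```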

Now, here is the code to scrape Rediff Books. I will name it searcher-rediff.js.

var Searcher = require('./searcher');

var searcher = new Searcher({
	merchantName: 'Rediff Books',
	merchantUrl: 'http://books.rediff.com'
});

module.exports = searcher;

searcher.getSearchUrl = function(query) {
	return this.merchantUrl + "/book/" + encodeURIComponent(query);
}

searcher.parseHTML = function(window) {
	var self = this;

	window.$('div[id="prod_detail"]').each(function(){
		var item  = window.$(this);

		var title = item.find('#prod_detail2').find('font[id="book-titl"]').text();
		var link = item.find('#prod_detail2').find('a').attr('href');
		var author = item.find('#prod_detail2').find('font[id="book-auth"]').text();
		var price = item.find('#prod_detail2').find('font[id="book-pric"]').text();

		self.onItem({
			title: title,
			link: link,
			author: author,
			price: price
		});
	});
}

Run it now.

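var searcher = require('./searcher-rediff');

searcher.on('item', function(item){
	// item is a plain object; stringify it so its fields get printed
	console.log('Item found >> ' + JSON.stringify(item));
});

// Without an 'error' listener, an emitted 'error' event would crash the process
searcher.on('error', function(e){
	console.log('Search failed: ' + e.error);
});

searcher.on('complete', function(searcher){
	console.log('searcher done!');
});

searcher.search("Salman");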

What did I do?

  1. First, I wrote a skeleton Searcher class. This class
    1. makes a request to the merchant’s search URL (built by the getSearchUrl function),
    2. fetches the HTML returned from there,
    3. uses the ‘jsdom’ module to create a DOM window object from that HTML,
    4. loads ‘jquery’ into that window, and
    5. calls the parseHTML function.
  2. Second, I wrote a Rediff-specific module that creates a Searcher instance and overrides two of its functions:
    1. getSearchUrl, to return the appropriate search URL to connect to, and
    2. parseHTML, to scrape data from the DOM’s window object. This is the really interesting part. You can use all your jQuery knowledge to pick elements and parse data from inside them, just like you did in the old days when you added styles or data to random elements.

Now, if I want to search, say, Flipkart along with Rediff, I just need to write a Flipkart-specific implementation, say searcher-flipkart.js:

var Searcher = require('./searcher');

var searcher = new Searcher({
	merchantName: 'Flipkart',
	merchantUrl: 'http://www.flipkart.com'
});

module.exports = searcher;

searcher.getSearchUrl = function(query) {
	return this.merchantUrl + "/search-book" + '?query=' + encodeURIComponent(query);
}

searcher.parseHTML = function(window) {
	var self = this;

	window.$('.search_result_item').each(function(){
		var item  = window.$(this);

		var title = item.find('.search_result_title').text().trim().replace(/\n/g, "");
		var link = self.merchantUrl + item.find('.search_result_title').find("a").attr('href');
		var price = item.find('.search_results_list_price').text().trim().replace(/\n/g, "");

		self.onItem({
			title: title,
			link: link,
			price: price
		});
	});
}

I have also written a Runner class to execute multiple searchers in parallel and collect their results into an array. You can find the entire source code here: https://github.com/anismiles/jsdom-based-screen-scraper Chill!
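The Runner itself lives in the repository and isn't shown here. As a rough idea of the approach, here is a hypothetical runAll function that starts every searcher at once and calls back when all of them have emitted 'complete'. It relies only on the events a Searcher emits ('item', 'error', 'complete'); fakeSearcher is an offline stand-in so the sketch runs without network access or jsdom.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical runner: fire all searches in parallel, gather every item,
// and call done(items) once each searcher has reported 'complete'.
function runAll(searchers, query, done) {
	var items = [];
	var pending = searchers.length;

	if (pending === 0) return done(items);

	searchers.forEach(function (searcher) {
		searcher.on('item', function (item) {
			items.push(item);
		});
		searcher.on('error', function (e) {
			console.error('Search failed: ' + e.error);
		});
		searcher.on('complete', function () {
			pending -= 1;
			if (pending === 0) done(items);
		});
		searcher.search(query);
	});
}

// Offline stand-in for a merchant searcher, used only to exercise runAll
function fakeSearcher(name, titles) {
	var s = new EventEmitter();
	s.search = function (query) {
		setTimeout(function () {
			titles.forEach(function (t) {
				s.emit('item', {merchant: name, title: t + ' (' + query + ')'});
			});
			s.emit('complete', {searcher: s});
		}, 10);
	};
	return s;
}

var result = new Promise(function (resolve) {
	runAll([fakeSearcher('A', ['one', 'two']), fakeSearcher('B', ['three'])], 'Salman', resolve);
});

result.then(function (items) {
	console.log(items.length);  // 3
});
```

Counting 'complete' events works because the Searcher emits 'complete' on both the success and the error path.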

What’s next? I am going to write about Node.js pretty feverishly, so stay tuned. How about a blog engine on Riak?

Written by Animesh

November 29, 2010 at 3:47 pm

WTF is node.js and what’s the fuss all about?


You must have been hearing about Node.js for quite some time. Me too! Everybody is talking about it, writing about it. I am tired of it, so I think I should try it myself. By definition, Node.js is a library written for Google’s V8 that does evented I/O. V8 is a JavaScript engine written in C++ and used in Google Chrome, and it’s very, very, very fast.

The point to note here is evented I/O. Traditionally, you would wait for input/output to finish before moving further with your execution, but in an evented environment you don’t wait; rather, you get informed when the I/O completes, and meanwhile you can do whatever you want. Cool, eh? Let’s cement it with an example. Say you want to find out when a file was last modified. Traditionally, you would do it this way:

// read file stats (execution blocks here until the read finishes)
Stat stat = readFileStat( "file-path" );
// operation
useStatInfo( stat );

In an evented environment, you would do it this way:

// returns immediately; the callback runs once the stats have been read
readFileStat( "file-path", function ( result ) {
	// operation
	useStatInfo( result );
} );

In this case, once the file’s stats are read, the result is passed to the callback function. You don’t have to wait. Do you see that? This enables evented systems to handle a larger number of requests simultaneously, because there is no thread to spawn, and no thread stack to allocate, for each request.
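In real Node.js, the readFileStat pseudocode corresponds roughly to `fs.stat`. The sketch below creates a scratch file in the OS temp directory (an assumption, just so it has something to inspect) and records the order in which things happen: the line after the `fs.stat` call runs before the callback does.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// A scratch file to inspect, created here so the example is self-contained
var file = path.join(os.tmpdir(), 'evented-io-demo.txt');
fs.writeFileSync(file, 'hello');

var order = [];

// Ask for the file's metadata; Node calls the callback once the I/O completes
var statDone = new Promise(function (resolve, reject) {
	fs.stat(file, function (err, stats) {
		if (err) return reject(err);
		order.push('stat callback');
		console.log('Last modified: ' + stats.mtime);
		resolve(stats);
	});
});

// Runs right away, before the callback above, because fs.stat does not block
order.push('after fs.stat call');
```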

You have been doing this kind of thing with closures and Java’s anonymous inner classes forever. But JavaScript makes it all more natural and simpler, and that’s where Node.js shines. Let me list the main things:

  1. It’s JavaScript. JavaScript’s anonymous functions and closures are perfect for callback definitions.
  2. Everything everywhere is asynchronous. There are no threads for you to manage. Everything has been built from scratch, and everything is event driven.
  3. No old baggage. That is, nothing has been carried over from the old synchronous, threaded world. That’s a good thing, though a little limiting right now since there aren’t many packages yet. But that will soon be taken care of; there is a huge community toiling here.
  4. Focus on dealing with data. You don’t have to focus on networks or protocols. Just focus on your data and your flow. Simple?
  5. It’s small.
  6. It’s fast.
  7. It’s easy.

Now, don’t start thinking of Node.js as another framework like Rails, Django or Sinatra. Don’t. Node.js doesn’t just help you build a web application; it goes further and helps you build the application server itself. Node.js is a framework for building scalable network programs. Your program can speak HTTP, raw TCP, or whatever else you need; you don’t have to worry about it.
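To make the "not just HTTP" point concrete, here is a minimal sketch of a raw TCP echo server using Node's built-in `net` module. The echo behaviour, port 0 (the OS picks a free port), and the in-process client are illustrative choices, not something from the original post.

```javascript
var net = require('net');

// Echo every byte a client sends straight back to it
var server = net.createServer(function (socket) {
	socket.pipe(socket);
});

var echoed = new Promise(function (resolve, reject) {
	// Port 0 lets the OS pick a free port
	server.listen(0, '127.0.0.1', function () {
		var port = server.address().port;
		var client = net.connect(port, '127.0.0.1', function () {
			// Send one message and close our writing side
			client.end('hello over TCP');
		});
		var reply = '';
		client.setEncoding('utf8');
		client.on('data', function (chunk) { reply += chunk; });
		client.on('end', function () {
			server.close();
			resolve(reply);
		});
		client.on('error', reject);
	});
});

echoed.then(function (reply) {
	console.log(reply);  // hello over TCP
});
```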

Installation

UPDATE: You should use Node Version Manager (nvm) instead of a bare installation.

I am running ubuntu-9.10-desktop-i386 in Oracle’s VirtualBox on Windows 7. Here is a nice tutorial to set that up yourself: http://www.psychocats.net/ubuntu/virtualbox. I think you could also use Cygwin to run Node.js, but I personally don’t prefer that. Linux feels much easier.

  1. Make sure you have the necessary build essentials:
    sudo apt-get update
    sudo apt-get install git-core
    sudo apt-get install build-essential
    
  2. Clone the Node.js repository:
    git clone git://github.com/ry/node.git
    
  3. Now configure and install:
    cd node
    ./configure && make && sudo make install
    

That’s it. You are done. Now, let’s make ourselves a small and pretty HTTP server.

var http = require("http");

http.createServer(function(request, response) {
	// status line and headers
	response.writeHead(200, {"Content-Type": "text/html"});
	response.write("Hello World!");
	// finish the response
	response.end();
}).listen(8080);

console.log("Server running at http://localhost:8080/");

This script uses the http module to create an HTTP server. The anonymous function passed to http.createServer is called for each request. Save this script to a helloworld.js file.

Now run this server,

node helloworld.js

Go to http://localhost:8080/ in your browser, you will see “Hello World!”
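If you'd rather check the server from code than from a browser, here is a small self-test sketch. It uses the same `writeHead`/`end` calls, binds to port 0 so the OS picks a free port instead of 8080, and fetches the page once with `http.get`; the port choice and the self-fetch are additions for illustration.

```javascript
var http = require('http');

var server = http.createServer(function (request, response) {
	response.writeHead(200, {'Content-Type': 'text/html'});
	response.end('Hello World!');
});

var body = new Promise(function (resolve, reject) {
	// Port 0 lets the OS choose a free port
	server.listen(0, '127.0.0.1', function () {
		var url = 'http://127.0.0.1:' + server.address().port + '/';
		http.get(url, function (res) {
			var data = '';
			res.setEncoding('utf8');
			res.on('data', function (chunk) { data += chunk; });
			res.on('end', function () {
				server.close();
				resolve({status: res.statusCode, body: data});
			});
		}).on('error', reject);
	});
});

body.then(function (r) {
	console.log(r.status, r.body);  // 200 Hello World!
});
```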

Benchmark

It all might seem so simple, eh? I know. It stunned me too. So I decided to benchmark it. I created equivalent apps in PHP (using PHP5 with Apache2 mod_php) and Node.js. Both apps rendered a single HTML page with similar content. I used the Apache Benchmark tool (ab) to run the comparison.

ab -n 10000 -c 10
ab -n 10000 -c 10

PHP			2988.3 requests/sec
Node.js		5391.2 requests/sec

Node.js wins by a huge margin. Wondering why? Remember, Node.js is event driven, so unlike servers such as Apache it doesn’t spawn a process or thread per connection, or even hand connections to a pool of threads. Instead, it runs your code on a single thread with an event loop that executes the callbacks, so each connection needs only a small heap allocation and the server leaves a much smaller footprint.

So Node.js indeed handled a lot of concurrent connections like a breeze. I decided to experiment a bit further and introduced a 2-second sleep into each request. That way, many connections would pile up waiting to be responded to.
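The post doesn't show how the 2-second sleep was implemented. In Node.js a non-blocking delay is usually done with `setTimeout`, so here is a sketch under that assumption, with the delay shortened to 300 ms and five concurrent requests. Because the waits overlap on the event loop, the whole batch takes roughly one delay rather than five. This illustrates the mechanism; it is not a reproduction of the benchmark.

```javascript
var http = require('http');

var DELAY_MS = 300;   // stand-in for the 2-second sleep, shortened for a quick run
var CONCURRENT = 5;

// Each request waits DELAY_MS without blocking the event loop
var server = http.createServer(function (request, response) {
	setTimeout(function () {
		response.writeHead(200, {'Content-Type': 'text/plain'});
		response.end('done');
	}, DELAY_MS);
});

function fetchOnce(port) {
	return new Promise(function (resolve, reject) {
		// agent: false gives every request its own connection
		http.get({host: '127.0.0.1', port: port, path: '/', agent: false}, function (res) {
			res.resume();             // drain the body
			res.on('end', resolve);
		}).on('error', reject);
	});
}

var elapsed = new Promise(function (resolve, reject) {
	server.listen(0, '127.0.0.1', function () {
		var port = server.address().port;
		var start = Date.now();
		var requests = [];
		for (var i = 0; i < CONCURRENT; i++) requests.push(fetchOnce(port));
		Promise.all(requests).then(function () {
			server.close();
			resolve(Date.now() - start);
		}, reject);
	});
});

elapsed.then(function (ms) {
	// Roughly DELAY_MS, not CONCURRENT * DELAY_MS, because the waits overlap
	console.log(CONCURRENT + ' requests took ' + ms + ' ms');
});
```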

ab -n 2500 -c 350
ab -n 2500 -c 350

PHP			27.3 requests/sec
Node.js		148.7 requests/sec

Amazing, isn’t it? Now, I am officially swept over by it.

Next

In the next blog post, I will create a simple web application with Node.js and Riak. Meanwhile, if Node.js has indeed aroused your curiosity, you can read more:

  1. Ryan’s presentation
  2. Node.js API
  3. How to node

Written by Animesh

November 11, 2010 at 2:09 pm

Posted in Technology
