animesh kumar

WebSocket support in Android’s Phonegap apps

[tweetmeme source=”anismiles” only_single=false http://www.URL.com%5D

We are developing a small game which can be played from multiple users using variety of clients, web-browser, Android, iPhone, iPad etc. It’s like, there is a server and all clients connect to this server, and send and receive messages. We decided to use WebSocket for underlying connection between clients and server, and Phonegap to build clients. Our idea is to write the client once and then run it on variety of platforms. Since, Phonegap enables app development using HTML, CSS and JavaScripts, it generously fits into our requirement.

But Phonegap doesn’t support WebSocket yet, it’s in their Plan-of-Action for 1.x release though. So, it was needed to be addressed. I found Mathias Desloge’s PhoneGap-Android-HTML5-WebSocket project. It was good but it used old java.io.* packages. I would have preferred to use java.nio.* for better and efficient non-blocking behavior. So, I decided to write my own small library.

Library can be found here: websocket-android-phonegap.

How to use?

Copy Java source into your source folder.
Copy websocket.js in your assets/www/js folder

Attach com.strumsoft.websocket.phonegap.WebSocketFactory to WebView, like

	@Override
	public void onCreate(Bundle savedInstanceState) {
		super.onCreate(savedInstanceState);
		super.loadUrl("file:///android_asset/www/index.html");

		// attach websocket factory
		appView.addJavascriptInterface(new WebSocketFactory(appView), "WebSocketFactory");
	}

In your page, create a new WebSocket, and overload its method ‘onmessage’, ‘onopen’, ‘onclose’, like

	// new socket
	var socket = new WebSocket('ws://192.168.1.153:8081');

	// push a message after the connection is established.
	socket.onopen = function() {
	 alert('connected');
	};

	// alerts message pushed from server
	socket.onmessage = function(msg) {
	 alert(JSON.stringify(msg));
	};

	// alert close event
	socket.onclose = function() {
	 alert('closed');
	};

How it works?

When you create a new WebSocket object in your page, behind the scene, websocket.js delegates the responsibility to com.strumsoft.websocket.phonegap.WebSocketFactory to instantiate new com.strumsoft.websocket.phonegap.WebSocket object.

		// websocket.js
		// get a new websocket object from factory (check com.strumsoft.websocket.WebSocketFactory.java)
		this.socket = WebSocketFactory.getWebSocket(url);

WebSocketFactory simply instantiates a new WebSocket object, connects it to the designated server and returns the instance.

// com.strumsoft.websocket.phonegap.WebSocketFactory

public WebSocket getWebSocket(String url) throws URISyntaxException {
	WebSocket socket =  new WebSocket(appView, new URI(url));
	socket.connect();   // connects to server
	return socket;
}

Now, whenever an event occurs, say, ‘onmessage’, WebSocket class delegates that event to Javascript.

// com.strumsoft.websocket.phonegap.WebSocket

public void onMessage(String message) {
	appView.loadUrl(buildLoadData("message", message));
}
private String buildLoadData(String _event, String _data) {
	String _d =  "javascript:WebSocket.on" + _event + "(" + 
				"{"
				+ "\"_target\":\"" + webSocketId + "\"," + 
				"\"_data\":'" + data + "'" + 
				"}" + 
			")";
	Logger.log(_d);
	return _d;
}

Finally, ‘WebSocket.onmessage’ from websocket.js is called. It parses the payload, finds out the target WebSocket object, and calls the corresponding event on the target object with event data.

	// websocket.js
	// static event methods to call event methods on target websocket objects
	WebSocket.onmessage = function (evt) {
		WebSocket.registry[evt._target]['onmessage'].call(global, evt._data);
	}

That’s it!

Amendment

(Date: Thu Aug 25 12:40:52 IST 2011)

There was a serious bug! The Websocket connection runs in a separate thread to manage persistent state with the server. And the front end Javascript (websocket.js) stays within UI/Main thread. And Android doesn’t want other threads to communicate with UI thread directly. These threads must employ an additional thread to bridge the communication. So, here is the fix!

	// a message is sent to server! 
	public void send(final String text) {
		// new thread
		new Thread(new Runnable() {
			@Override
			public void run() {
				if (instance.readyState == WEBSOCKET_STATE_OPEN) {
					try {
						instance._send(text);
					} catch (IOException e) {
						instance.onError(e);
					}
				} else {
					instance.onError(new NotYetConnectedException());
				}
			}
		}).start();
	}

	// when a message is received
	public void onMessage(final String msg) {
		// post a new thread to View
		appView.post(new Runnable() {
			@Override
			public void run() {
				appView.loadUrl(buildJavaScriptData(EVENT_ON_MESSAGE, msg));
			}
		});
	}

Commit link:
https://github.com/anismiles/websocket-android-phonegap/commit/a7ccb815cce3a446c3ec92058187cdb20e5a41e8
https://github.com/anismiles/websocket-android-phonegap/commit/087f7a93d46f92cb037d2b451a4d253a65f5f015

Written by Animesh

February 3, 2011 at 11:52 am

Posted in Technology

Tagged with Android, JavaScript, Phonegap, WebSocket

WebSocket and node.js: why shud’ya care?

with 26 comments

[tweetmeme source=”anismiles” only_single=false http://www.URL.com%5D

Traditional HTTP messages are heavy. Every message is sent with HTTP headers. Now, let’s say you have an application that has a real-time component, like chat or some twitter client or may be some traffic analysis stuff. And let’s say you have around 100,000 users connected to your app. To make your app real-time, you need to have a mechanism which will enable server to push data almost as soon as this data becomes available. You could do it in two ways: Write a script which will connect to server every few seconds to check if there is any data. With each attempt, full set of HTTP headers moves back and forth between client and server. That’s not very efficient. To save yourself with all these bandwidth hassles, you could use a popular trick known as long-polling, where your browser connects to server and server holds the connection open until there is some data available to be pushed.

Now, let’s assume that there are 100,000 users connected to your app and every 10 seconds some data is sent from server to clients. Following HTTP specs, every time some data is sent, full set of headers are shared between client and server. This is how they look,

Request header

GET / HTTP/1.1
User-Agent: ...some long user agent string...
Host: animesh.org
Accept: */*

Response header

HTTP/1.1 200 OK
Date: Tue, 25 Jan 2011 17:32:19 GMT
Server: Apache
X-Powered-By: PHP/5.2.3
X-Pingback: http://animesh.org/endpoint/
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

That’s approximately 350 bytes of data, per user every 10 seconds. That’s roughly 28,400,000 bits per second of network throughput for 100,000 users. Roughly 26.7 Mbps for only HTTP headers. Gosh!

WebSocket

WebSocket comes to resue. With web sockets, once a handshake is done between client and server, messages can be sent back and forth with a minimal overhead. That’s awesome. You do a handshake while establishing the connection, and of course handshaking needs all those HTTP headers, but after that, you only need to send the data… no headers. This greatly reduces the bandwidth usage and thus improves the performance. Let’s see how. This is how handshake headers look like,

Handshake Request header

GET /demo HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: animesh.org
Origin: http://animesh.org
WebSocket-Protocol: sample

Handshake Response header

HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
WebSocket-Origin: http://animesh.org
WebSocket-Location: ws://animesh.org/
WebSocket-Protocol: sample

And now, the connection has been established and data can freely flow between server and client without having to exchange any HTTP headers until this connection is closed or broken and you do another handshake. Imagine how much bandwidth you are saving! Whoa!

Example

Let’s write a simple application to see and learn how this thing actually works. This application will have a server all the clients will connect to, and whenever one client writes something to the server, all clients will be notified.

Here is our server, written in Node.js. Let’s name it server.js

Note: Though you can very well write a web socket server using Node’s native APIs, however I chose to use Micheil Smith‘s node-websocket-server library. This library is simple, elegant and very easy to work with. It works by wrapping and extending Node’s server object.

var sys = require("sys");
// Library https://github.com/miksago/node-websocket-server
var	websocket = require('./lib/node-websocket-server/lib/ws/server');

// create web socket server
var server = websocket.createServer();
// listen on port 8078
server.listen(8078);

// when the server is ready
server.addListener("listening", function() {
  sys.log("Listening for connections on localhost:8078");
});

// when a traditional HTTP request comes
server.addListener("request", function(req, res) {
	res.writeHead(200, {
		"Content-Type" : "text/plain"
	});
	res.write("This is an example WebSocket server.");
	res.end();
});

// when a client websocket connects
server.addListener("connection", function(conn) {

	// when client writes something
	conn.addListener("message", function(message) {

		// iterate thorough all connected clients, and push this message
		server.manager.forEach(function(connected_client) {
			connected_client.write(JSON.stringify(conn.id + ": " + message));
        });
	});
});

Now, let’s write a simple client. We will create one HTML file and run it in Google Chrome. Let’s name is client.html

<!DOCTYPE html>
<html>
<head>
    <title>WebSocket - Simple Client</title>
    <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.2.6/jquery.js"></script>
    <script type="text/javascript">

	$(function() {
		// bind form
		$('#payload-form').submit(function() {
			var payload = ($("input#payload").val());
			socket.send(payload);  // write to server
			return false;
		});

		// open websocket
		var socket = new WebSocket('ws://localhost:8078');

		socket.onopen = function() {
	    	// Web Socket is connected. send an initial random message.
	    	socket.send(Math.floor(Math.random()*11) + ' >> Hi, I am Mr. so-and-so!');
	    };
		// append to '#log' whatever server pushes.
		socket.onmessage = function(ev){
			msg = JSON.parse(ev.data);
			$('#log').append(JSON.stringify(msg) + '</br>');
		}
	})
    </script>
</head>
<body>
	<div id='payload-container'>
		<form id='payload-form'>
			<input type='text' name='payload' id='payload' value='Hello World' style="width:500px;"/>
			<input type='submit' value='push'/>
		</form>
	</div>

	<div id='log' style='display:block; border:1px solid lightgray;'></div>
</body>
</html>

Now, run your server, and open your client in multiple Chrome windows/tabs.

// run server
$ node server.js

That’s it! Was is fun? I will write more on how to establish WebSocket connections from a Java client in the next blog.

Written by Animesh

January 25, 2011 at 3:25 pm

Posted in Technology, Uncategorized

Tagged with HTTP, JavaScript, Long Polling, Node.js, WebSocket

i’m a joke

with one comment

[tweetmeme source=”anismiles” only_single=false http://www.URL.com%5D

I’m a joke
may be, a laugh.

Don’t trust me
not worth it
I come cheap
and easy
like wild scrubs
at every nook
and cranny.

Never taken home
but, used
and cast away
and laughed at
worst, ridiculed sometimes
I’m not worth it.

Written by Animesh

January 19, 2011 at 3:29 pm

Posted in Poetry

Tagged with joke, laugh, Poem, Poetry, ridicule, what am i

Maven, SQL and ordered execution

Steps of execution

Drop everything
Create users
Create schemas
Load data
Load upgrade data

They used sql-maven-plugin underwhich they had multiple execution tasks. The whole thing was very untidy and I didn’t like it.

Summary

They used <fileSets> to import sql files. FileSets are an elegant choice if you want to import files with ANT like wildcard patterns. But the files are alphabetically sorted. So, unless you don’t care about the order in which the files will be executed, you should avoid using FileSets. In their case, they wanted to run ‘drop_users_databases.sql’ before ‘create_users_databases.sql’. With FileSets, it was impossible to do, they would be re-ordered just the opposite. Solution? They should have used <srcFiles>, but instead they created 2 separate <execution> tasks to run in order. Too much verbosity. Eh?
Another issue was with loading upgrade data. Now, they follow agile methodologies and keep updating their DB schemas, and so after few iterations, before they merge the changes in the main script, they end up with a number of files like upgrade-schema-2.0.4.sql, upgrade-schema-2.0.5.sql, upgrade-schema-2.0.5.1.sql and so on. Pain is, they don’t have any automated mechanism to execute these files in order, so the developers would be forced to do it manually. If the number of these files were low, that wouldn’t have been a problem… but how about thirty or forty such files? Pissed off.

Strategy

Kick <fileSets> out of the window
Use <srcFiles> instead (because I cared more about order)
Logically partition the operations
Create a new profile to help run these DB executions at will

Code

<profiles>  
<profile>  
	<!-- profile id -->
	<id>init-db</id> 
	<build>
		<plugins>
			<plugin>
				<groupId>org.codehaus.mojo</groupId>
				<artifactId>sql-maven-plugin</artifactId>
				
				<!-- postgresql dependencies -->	
				<dependencies>
				  <dependency>
					<groupId>postgresql</groupId>
					<artifactId>postgresql</artifactId>
					<version>8.3-603.jdbc4</version>
					<scope>clean</scope>
				  </dependency>
				</dependencies>
				
				<!-- DB Configuration -->
				<configuration>
					<url>jdbc:postgresql:my_db</url>
					<driver>org.postgresql.Driver</driver>
					<username>user</username>
					<password>pass</password>
					<autocommit>true</autocommit>							
				</configuration>
				
				<executions>
					<!-- 1. drop everything, create users and load data -->
					<execution>
						<id>drop-and-create-database</id>
						<phase>clean</phase>
						<goals><goal>execute</goal></goals>
						<configuration>
							<srcFiles>
								<srcFile>src/main/sql/drop_users_databases.sql</srcFile>						
								<srcFile>src/main/sql/create_users_databases.sql</srcFile>						
								<srcFile>src/main/sql/create_schema.sql</srcFile>						
								<srcFile>src/main/sql/load_data.sql</srcFile>						
								<srcFile>src/main/sql/test_data.sql</srcFile>						
							</srcFiles>    															
						</configuration>
					</execution>			
					<!-- 2. run all upgrade scripts -->
					<execution>
						<id>upgrade-schema</id>
						<phase>clean</phase>
						<goals><goal>execute</goal></goals>
						<configuration>
							<srcFiles>
								<srcFile>src/main/sql/upgrade-schema-2.0.6.sql</srcFile>						
								<srcFile>src/main/sql/upgrade-schema-2.1.5.sql</srcFile>						
								<srcFile>src/main/sql/upgrade-schema-2.1.7.sql</srcFile>						
								<srcFile>src/main/sql/upgrade-schema-2.1.8.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-2.2.1.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-3.0.0.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-3.0.1.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-3.0.2.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-3.0.3.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-3.0.4.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-3.0.5.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-3.0.6.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-3.0.6.1.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-3.0.8.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-3.0.9.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-3.0.10.sql</srcFile>
								<srcFile>src/main/sql/upgrade-schema-3.0.11.sql</srcFile>
							</srcFiles>    						
						</configuration>
					</execution>
				</executions> 
			</plugin>
		</plugins>  
	</build>  
</profile>  
</profiles>

How to run?

Note that I have bound the executions with phase ‘clean’ within a profile with id ‘init-db’. So, I just need to let maven know about the phase and profile, like this:

mvn -Pinit-db clean

I hope this will help someone in need. I will post more adventurous code cleaning stuffs as I will encounter them. Cheers!

Written by Animesh

December 28, 2010 at 12:31 pm

Posted in Technology, Uncategorized

Tagged with fileSets, Maven, maven ordered sql execution, Postgres, SQL, sql-maven-plugin, srcFiles

Sisyphus: slow – silent – violent – slow

with 3 comments

[tweetmeme source=”anismiles” only_single=false http://www.URL.com%5D

Slow
is his climb
up the hill
tethered to Gods
stone Gods
full of gaze and words.

Silent
is his manner
on the top
he sits alone
drenched in sweat and tear
until he is pushed down.
Gods laugh and merry
fingers pointed at
their falling dolly.

Violent
is his fall
down into creeks
great Sisyphus
turned into stone
by the light of morning
doomed to disappear
without meaning.

He knows not
why he does what he does
his destiny chained before
he was even born.

Written by Animesh

December 17, 2010 at 2:50 pm

Posted in Poetry

Tagged with Existentialism, Fall, Poem, Poetry, Rise, Silent, Sisyphus, Violent

Using node.js and jquery to scrape websites

with 31 comments

[tweetmeme source=”anismiles” only_single=false http://www.URL.com%5D

I have been playing with Node.js for last few days and am totally head over heels. Madly in love! It’s awesome to know how much you can build with how little. I have ranted about Node.js earlier and did some comparisons too. It’s fast, really fast. And it’s plain old Javascript we have been using for last many-many years now. I thought I would build a real world application with it to see how much it stands the water. Earlier I thought to make a something on top of Riak, but that felt like running too fast. Instead I picked up something simpler to deal only with Node.js. Now, I think it would make sense to brush up on some Javascript fundaments.

Javascript objects

Yes. Javascript is an object oriented language. But it’s different from your traditional classical OO languages like Java and Ruby.

One obvious difference is in syntax, and the other major one is that
Other languages have methods while Javascript has first-class functions.

First class functions. What does it mean? It means that they are expressions and can be assigned to a variable and can be easily passed around. Does it sound like a closure in Ruby? It does indeed. Well thought, it’s a little more than that. I will come to this again some other time. For now, let’s find out how we can create objects and use them? I will focus tell you two ways to do it.

The Classical way

Here is a constructor function for object Shape. It accepts two parameters and saves them into respective instance variables.

function Shape(width, height) {
	this.width = width;        // instance variable width
	this.height = height;      // instance variable height
	this.getArea = function() {     // function to calculate Area, notice the assignment.
		return this.width * this.height;
	};
}

var rectangle = new Shape (2, 5);    // instantiate a new Shape object
console.log (rectangle.getArea());   // calculate the area: 10

Javascript uses prototype chains to add new functions or variables to an object on the fly. You should read more about this thing here: http://www.packtpub.com/article/using-prototype-property-in-javascript

I will add a new function to calculate the perimeter of my Shape object.

Shape.prototype.getPerimiter = function() {
	return 2 * (this.width + this.height);
}

console.log (rectangle.getPerimiter());

What happened here? Did you notice that even if ‘rectangle’ was already defined it could access the newly added function to calculate perimeter. Wasn’t that awesome? Javascript is intelligent, dude. If you ask for something, it looks into the current object, and if not found, it would go up the object’s prototype chain to look for what you asked for. And since, we added the new function to the prototype, it’s found unscrupulously. There is a lot of interesting stuffs going on here, you must read about it. I would suggest buying Manning’s Javascript Ninja, if you are really serious about it.

Now, let’s try to extend Shape. I will create a new constructor function for Square.

function Square(side){
	this.width = side;
	this.height = side;
}

Square.prototype = new Shape();

var sq = new Square(4);
console.log(sq.getArea());

I created a new Square class and overrode its prototype chain with that of Shape’s. I got all the functionalities and behavior of Shape. Easy… huh?

The Prototypal way

Let’s do the same thing without using constructors now. Just plain prototypes!

var Shape = {
	getArea: function () {
		return this.width * this.height;
	},
	getPerimiter: function() {
		return 2 * (this.width + this.height);
	}
};

var rec = Object.create(Shape);
rec.width = 2;
rec.height = 5;

console.log(rec.getArea());

Now that you have the Shape object, you can easily add new functions to its prototype chain, or even inherit it to another object. However I find this approach a little clumsy. I would rather stick to the classic way. You choose your pick. To each his own!

Node.js Modules

Node uses the CommonJS module system. Node has a simple module loading system where files and modules are in one-to-one correspondence. Here is the API: http://nodejs.org/api.html. Above example can be ported to Node.js module ecosystem like explained below:

First, create Shape.js

function Shape(width, height) {
	this.width = width;        // instance variable width
	this.height = height;      // instance variable height
	this.getArea = function() {     // function to calculate Area, notice the assignment.
		return this.width * this.height;
	};
}

// Export this module
exports.module = Shape;

And now, use this

var Shape = require('./Shape');

var rectangle = new Shape (2, 5);
console.log (rectangle.getArea());

Node.js loads and runs each module in a sandbox which staves off any possible name collision. That’s the benefit you get apart from having a properly structured code base.

Writing a screen scraping application

I will write a simple application to capture details from various websites. The beautiful thing is Javascript has been handling DOM objects for years. In fact Javascript was created to handle DOM objects. No wonder that it’s more mature than any other html parsing library. Also, given that there are many elegant frameworks like Prototype, Mootools, JQuery etc. available to use, scraping websites with Node.js should be easy and fun. Let’s do it. Let’s write an application to collect data from various book selling websites.

Create a basic searcher.js module. It would provide the fundamental skeleton for writing website specific tool.

// External Modules
var request = require('ahr'), // Abstract-HTTP-request https://github.com/coolaj86/abstract-http-request
sys = require('sys'),		// System
events = require('events'),	// EventEmitter
jsdom = require('jsdom');	// JsDom https://github.com/tmpvar/jsdom

var jQueryPath = 'http://code.jquery.com/jquery-1.4.2.min.js';
var headers = {'content-type':'application/json', 'accept': 'application/json'};

// Export searcher
module.exports = Searcher;

function Searcher(param) {
	if (param.headers) {
		this.headers = param.headers;
	} else {
		this.headers = headers;
	}

	this.merchantName = param.merchantName;
	this.merchantUrl = param.merchantUrl;
	this.id = param.merchantUrl;
}

// Inherit from EventEmitter
Searcher.prototype = new process.EventEmitter;

Searcher.prototype.search = function(query, collector) {
	var self = this;
	var url = self.getSearchUrl(query);

	console.log('Connecting to... ' + url);

	request({uri: url, method: 'GET', headers: self.headers, timeout: 10000}, function(err, response, html) {
		if (err) {
			self.onError({error: err, searcher: self});
			self.onComplete({searcher: self});
		} else {
			console.log('Fetched content from... ' + url);
			// create DOM window from HTML data
			var window = jsdom.jsdom(html).createWindow();
			// load jquery with DOM window and call the parser!
			jsdom.jQueryify(window, 'http://code.jquery.com/jquery-1.4.2.min.js', function() {
				self.parseHTML(window);
				self.onComplete({searcher: self});
			});
		}
	});
}

// Implemented in inhetired class
Searcher.prototype.getSearchUrl = function(query) {
	throw "getSearchUrl() is unimplemented!";
}
// Implemented in inhetired class
Searcher.prototype.parseHTML = function(window) {
	throw "parseForBook() is unimplemented!";
}
// Emits 'item' events when an item is found.
Searcher.prototype.onItem = function(item) {
	this.emit('item', item);
}
// Emits 'complete' event when searcher is done
Searcher.prototype.onComplete = function(searcher) {
	this.emit('complete', searcher);
}
// Emit 'error' events
Searcher.prototype.onError = function(error) {
	this.emit('error', error);
}

Searcher.prototype.toString = function() {
	return this.merchantName + "(" + this.merchantUrl + ")";
}

Now, code to scrape rediff books. I will name it searcher-rediff.js

var Searcher = require('./searcher');

var searcher = new Searcher({
	merchantName: 'Rediff Books',
	merchantUrl: 'http://books.rediff.com'
});

module.exports = searcher;

searcher.getSearchUrl = function(query) {
	return this.merchantUrl + "/book/" + query;
}

searcher.parseHTML = function(window) {
	var self = this;

	window.$('div[id="prod_detail"]').each(function(){
		var item  = window.$(this);

		var title = item.find('#prod_detail2').find('font[id="book-titl"]').text();
		var link = item.find('#prod_detail2').find('a').attr('href');
		var author = item.find('#prod_detail2').find('font[id="book-auth"]').text();
		var price = item.find('#prod_detail2').find('font[id="book-pric"]').text();

		self.onItem({
			title: title,
			link: link,
			author: author,
			price: price
		});
	});
}

Run it now.

var searcher = require('./searcher-rediff');

searcher.on('item', function(item){
	console.log('Item found >> ' + item)
});

searcher.on('complete', function(searcher){
	console.log('searcher done!');
});

searcher.search("Salman");

What I did?

First, I wrote a skeleton searcher class. This class makes the
1. request to the merchant’s search URL (this URL is built in getSearchUrl function), then
2. fetches the html data from here, then
3. by using ‘jsdom’ module creates DOM’s window object which further
4. gets parsed by ‘jquery’, and
5. function parseHTML is executed.
Second, I wrote another class that extends from searcher and intends to interact with Rediff. This class implements,
1. getSearchUrl function to return appropriate search URL to connect to, and
2. parseHTML function to scrape data from DOM’s window object. This is very interesting. You can use all your jquery knowledge to pick elements and parse data from inside the elements. Just like you did in old days when you added styles or data to random elements.

Now, if I want to search say Flipkart along with Rediff, I just need to write a Flipkart specific implementation, say searcher-flipkart.js

var Searcher = require('./searcher');

var searcher = new Searcher({
	merchantName: 'Flipkart',
	merchantUrl: 'http://www.flipkart.com'
});

module.exports = searcher;

searcher.getSearchUrl = function(query) {
	return this.merchantUrl + "/search-book" + '?query=' + query;
}

searcher.parseHTML = function(window) {
	var self = this;

	window.$('.search_result_item').each(function(){
		var item  = window.$(this);

		var title = item.find('.search_result_title').text().trim().replace(/\n/g, "");
		var link = self.merchantUrl + item.find('.search_result_title').find("a").attr('href');
		var price = item.find('.search_results_list_price').text().trim().replace(/\n/g, "");

		self.onItem({
			title: title,
			link: link,
			price: price
		});
	});
}

I have also written a Runner class to execute the multiple searchers in parallel and collect results into an array. You can find the entire source code here: https://github.com/anismiles/jsdom-based-screen-scraper Chill!

What’s next? I am going to write on Node.js pretty feverishly. You better keep posted. How about a blog engine on Riak?

Written by Animesh

November 29, 2010 at 3:47 pm

Posted in Technology

Tagged with data scraping, JavaScript, JQuery, Node.js, web scraping

WTF is node.js and what’s the fuss all about?

with 6 comments

[tweetmeme source=”anismiles” only_single=false http://www.URL.com%5D

You must have been hearing about Node.js for quite some time. Me too! Everybody is talking about it, writing about it. I am tired. So I think I should try it myself. By definition, node.js is a library written for Google’s V8 that does evented I/O. V8 is a JavaScript engine written in C++ being used in Google Chrome, and it’s veryveryvery fast.

Point to note here is evented I/O. Traditionally you would wait for input/output to finish before moving further with your execution, but in evented environment you don’t wait, rather you get informed about I/O completion and meanwhile you could do whatever you want. Cool eh? Let’s cement it with an example. Say, you want to find out the last edited time of a file. Traditionally, you would do it this way:

// read file
Stat stat = readFileStat( ‘file-path’ );
// operation
useStatInfo( stat );

In evented environment, you would do it this way:

readFileStat( ‘file-path’, function ( result ) {
	// operation
	useStatInfo( result );
} );

In this case, once the file is read the result is passed to another function. You don’t have to wait. Do you see that? This enables evented systems to handle larger number of requests simultaneously, because there is no thread to spawn, no heap to allocate.

You have been doing this kind of things with Closures and Java anonymous functions since eternity. But JavaScript makes it all more natural and simpler. And that’s where Node.js shines. Let me list down the main things:

It’s JavaScript. JavaScript’s anonymous functions and closures is perfect for callback definitions.
Everything everywhere is asynchronous. There are no threads. Everything has been built up from scratch and everything is event driven.
No old baggage. That is, nothing has been carried over from the old synchronous, threaded world. That’s a good thing though a little limiting right now since there aren’t many packages. But that would soon be taken care of. There is a huge community toiling here.
Focus on dealing with data. You don’t have to focus on networks or protocols. Just focus on your data and your flow. Simple?
It’s small.
It’s fast.
It’s easy.

Now don’t start thinking of Node.js as another framework like Rails, Django, Sinatra etc. Don’t. Node.js doesn’t only help you build a web application; it goes further and helps you to build an application server instead. Node.js is framework to build scalable network programs. It could run on HTTP protocol or on TCP or whatever. You don’t have to worry about it.

Installation

UPDATE: You should use Node Version Manager instead of bare installation.

I am using ubuntu-9.10-desktop-i386 and Oracle’s VirtualBox on Windows 7. Here is a nice tutorial to do it yourself: http://www.psychocats.net/ubuntu/virtualbox. I think you could also use Cygwin to run Node.js but I don’t prefer that personally. Linux feels way much easier.

Ensure you have all the essentials necessary.

sudo apt-get update
sudo apt-get install git-core
sudo apt-get install build-essential

Clone the Node.js repository:
```
git clone git://github.com/ry/node.git
```

Now configure and install:

cd node
./configure && make && sudo make install

That’s it. You are done. Now, let’s make ourself a small and pretty HTTP server.

var sys  = require("sys"),
http = require("http");

http.createServer(function(request, response) {
	response.sendHeader(200, {"Content-Type": "text/html"});
	response.write("Hello World!");
	response.close();
}).listen(8080);

sys.puts("Server running at http://localhost:8080/");

This script uses two modules, sys and http to create an HTTP server. The anonymous function being passed to http.createServer is called at each request. Save this script to helloworld.js file.

Now run this server,

node helloworld.js

Go to http://localhost:8080/ in your browser, you will see “Hello World!”

Benchmark

It all might have seemed so simple, eh? I know. It stunned me too. So I decided to benchmark it. I created identical apps in PHP (using PHP5 with Apache2 mod_php) and Node.js. Both apps rendered a single html page with similar content. I used Apache Benchmark tool to run the comparison.

ab –n 10000 –c 10
ab –n 10000 –c 10

PHP			2988.3 requests/sec
Node.js		5391.2 requests/sec

Node.js wins with a huge margin. Wondering why? Remember Node.js is an event driven framework, so unlike other servers like Apache it doesn’t open a socket or spawn a thread or even use a pool of threads, rather it has only a single thread running an event loop that executes the callbacks, so it needs only a small heap allocation and it leaves a much smaller footprint.

So Node.js indeed handled a lot of concurrent connections like a breeze. I thought to experiment a bit further. I introduced a 2 seconds sleep. That way, there will be many connections piling up and waiting to be responded.

ab –n 2500 –c 350
ab –n 2500 –c 350

PHP			27.3 requests/sec
Node.js		148.7 requests/sec

Amazing, isn’t it? Now, I am officially swept over by it.

In the next blog, I will create a simple web application with Node.js and Riak. Meanwhile, if Node.js indeed aroused your curiosity, you can read more,

Ryan’s presentation
Node.js API
How to node

Written by Animesh

November 11, 2010 at 2:09 pm

Posted in Technology

Tagged with evented I/O, JavaScript, Node.js, Riak, Ryan

Fiddling with Cassandra 0.7-beta2

with 11 comments

[tweetmeme source=”anismiles” only_single=false http://www.URL.com%5D

I have been dilly-dallying with Cassandra 0.7 for quite some time. My intensions were to build Cassandra 0.7 support into Kundera (a JPA 1.0 compliant ORM library to work with Cassandra). I must admit that often times I was very upset about the lack of documentation on Cassandra and libraries that I had planned to use, Pelops and Hector. So I decided that I should post my findings for your help.

Now since Cassandra 0.7 beta-2 has been released, I will concentrate my talk around this release.

Installing Cassandra 0.7

Download 0.7.0-beta2 (released on 2010-10-01) from here: http://cassandra.apache.org/download/
Extract the jar to some location say, D:\apache-cassandra-0.7.0-beta2
Set CASSANDRA_HOME environment variable to D:\apache-cassandra-0.7.0-beta2
You can also update you PATH variable to include $CASSANDRA_HOME/bin
Now, to start the server you would need to run this command:
> cassandra -start

That’s it.

Okay, since you’ve gotten the basics right. I would like to tell you few important things about this new Cassandra release.

Unlike .6.x versions, 0.7.x employs YAML instead of XML, that is, you are going to find cassandra.yaml instead of storage-conf.xml.
0.7 allows you to manage entire cluster, Keyspaces, Column Families everything from Thrift API.
There is also support for Apache Avro. (I haven’t explored this though, so no more comment)
0.7 comes with secondary index features. What does it mean? It means, you can look for your data not just by Row Identifier, but also by Column Values. Interesting huh?

If you look into cassandra.yaml, you will find a default Keyspace1 and few Column Families too, but Cassandra doesn’t load them. I am not sure why. Theoretically, everything defined in the yaml file should have been created at the start. I am going to dig around this. Anyways for now, let’s create some Keyspaces and few Column Families ourselves. We can use Thrift API (and Cassandra client which uses Thrift itself) or JMX interface.

Dealing with Cassandra Client

Cassandra comes with a command line interface tool cassandra-cli. This tool is really really impressive. You should certainly spend some time with it.

Start the client,
> cassandra-cli
Connect to server,
> [default@unknown] connect localhost/9160
Create a new keyspace, (I picked this up from cassandra.yaml)
> [default@unknown] create keyspace Keyspace1 with replication_factor=1
Create Column Families,
> [default@unknown] use Keyspace1
> [default@Keyspace1] create column family Standard1 with column_type = ‘Standard’ and comparator = ‘BytesType’
Describe keyspace,
> [default@Keyspace1] describe keyspace Keyspace1

And so on. Use ‘help’ to learn more about cassandra-cli.

JConsole

As I mentioned above, you can also use JMX to check what Keyspaces and Column Families exist in your server. But there is a little problem. Cassandra does not come with the mx4j-tools.jar, so you need to download and copy this jar to Cassandra’s lib folder. Download it from here: http://www.java2s.com/Code/Jar/MNOPQR/Downloadmx4jtoolsjar.htm

Now, just run ‘jconsole’ and pick ‘org.apache.cassandra.thrift.CassandraDaemon’ process.

Java clientèle

Well, there are two serious contenders, Pelops and Hector. Both have released experimental support for Version 0.7. I had worked with Pelops earlier, so I thought this is time to give Hector a chance.

Download Hector (Sync release with Cassandra 0.7.0-beta2) from here: http://github.com/rantav/hector/downloads
You can also use ‘git clone‘ to download the latest source.
Hector is a maven project. To compile the source into ‘jar’, just extract the release and run,
> mvn package

My first program

To start with Hector, I thought to write a very small code to insert a Column and then later fetch it back. If you remember, in the previous section, we already created a keyspace ‘Keyspace1‘ and a Column Family ‘Standard1‘, and not we are going to make use of them.

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.exceptions.HectorException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

public class HectorFirstExample {

	public static void main(String[] args) throws Exception {

		String keyspaceName = "Keyspace1";
		String columnFamilyName = "Standard1";
		String serverAddress = "localhost:9160";

		// Create Cassandra cluster
		Cluster cluster = HFactory.getOrCreateCluster("Cluster-Name", serverAddress);
		// Create Keyspace
		Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster);

		try {
			// Mutation
			Mutator mutator = HFactory.createMutator(keyspace, StringSerializer.get());
			// Insert a new column with row-id 'id-1'
			mutator.insert("id-1", columnFamilyName, HFactory.createStringColumn("Animesh", "Kumar"));

			// Look up the same column
			ColumnQuery columnQuery = HFactory.createStringColumnQuery(keyspace);
			columnQuery.setColumnFamily(columnFamilyName).setKey("id-1").setName("Animesh");
			QueryResult> result = columnQuery.execute();

			System.out.println("Read HColumn from cassandra: " + result.get());
		} catch (HectorException e) {
			e.printStackTrace();
		}
	}
}

That was simple. By the way, ‘Nate McCall‘ has written a set of example classes to help us understand Hector with Cassandra 0.7. Check it out here: http://github.com/zznate/hector-examples

I am working towards introducing Cassandra 0.7 support in Kundera, and will be publishing my findings intermittently.

Written by Animesh

October 14, 2010 at 9:26 pm

Posted in Technology

Tagged with Cassandra, Cassandra-0.7, Hector, HPC, NoSql, Pelops

Again

with 2 comments

[tweetmeme source=”anismiles” only_single=false http://www.URL.com%5D

Again
the heart is broken
in thousands pieces, she says
and this time
there is no turning back
this is no hole, she sighs, to fill up
this is a big crack.

Again
across the window
I see things that glow
angels flying down to take me, she says
I’m going away, farther than you can reach
time and again, I tried to repair
nothing touches me now, no love, no beseech.

Again
I sweep the floor
dirt and tears come along
I know what to do
get a box and fix it up with some glue
you can’t fix it, don’t you see,
she growls, you can’t un-rape me.

Written by Animesh

September 28, 2010 at 3:09 pm

Posted in Poetry

Tagged with Glue, heart, love, Poem, Poetry

Lost and never found

with one comment

[tweetmeme source=”anismiles” only_single=false http://www.URL.com%5D

All that I am or I pretend
sullen dreams of bleak end

Lost and never found
fallen leaves on the ground

Truth is a lie in disguise
solemn words of a skewed voice

Flowers sprawled by the riverbed
roots entangled into the mud

I seek love and love seeks a clown
run away before you’re drowned

Written by Animesh

September 27, 2010 at 5:26 pm

Posted in Poetry

Tagged with Found, lost, love, Poem, Poetry

How to use?

How it works?

Amendment

WebSocket

Example

Steps of execution

Summary

Strategy

Code

How to run?

Javascript objects

The Classical way

Node.js Modules

Writing a screen scraping application

What I did?

Installation

Benchmark

Next

about me

Pages

categories

recent posts

archives

calendar

tags

search my blog

follow me

email subscription

disclaimer