Saturday, October 30, 2010

Scale up your skills, not the hardware

It is not a big secret that good programmers keep their saws sharp all the time. If you want to improve your mental skills, a good place to start is Project Euler. I found it while browsing other programmers' blogs and immediately got hooked. Project Euler is a place where you can challenge yourself (and others) by solving various mathematical and computer programming problems. The collection of problems is constantly growing, and they range from very easy to really difficult. When you provide the correct solution to a problem, a forum dedicated to it is revealed and you can see how other people managed to solve it. Sometimes you also get a bonus in the form of a paper explaining the problem and the best known solutions in detail. You can learn a lot from these forums, since people share both their ideas and source code written in different programming languages; some of them don't even use a computer - all they need to find a solution is paper and pencil.

What I particularly like about Project Euler is that most of the problems cannot be solved by brute force. They may make you check hundreds or thousands of possible combinations, but they still require a more clever approach. A classic example is the pair of problems 18 and 67, where you need to find a path from the top of a number pyramid to the bottom such that the total sum of the numbers on the path is the largest. While the pyramid in problem 18 is rather small and you can find the solution by trying each of the 16384 possible paths, it is impossible to solve problem 67 the same way. To solve it you need to turn the problem upside down - literally, since you can easily get the correct solution if you start searching for the best path from the bottom of the pyramid to the top, reducing the number of combinations with every step.
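To illustrate, here is a minimal sketch of that bottom-up reduction in Gambit Scheme (the list-of-rows representation of the triangle and the function names are my own choice for this example, not the code I actually submitted): each step collapses the bottom two rows by adding to every cell of the upper row the larger of its two children, until only the top cell is left.
(define (merge-row upper lower)
  ; add to each cell of the upper row the larger of its two children below
  (if (null? upper)
    '()
    (cons (+ (car upper) (max (car lower) (cadr lower)))
          (merge-row (cdr upper) (cdr lower)))))

(define (max-path-sum triangle)
  ; triangle is a list of rows, top row first
  (let loop ((rows (reverse triangle)))
    (if (null? (cdr rows))
      (caar rows)
      (loop (cons (merge-row (cadr rows) (car rows)) (cddr rows))))))

; the small example triangle from the description of problem 18 gives 23
(display (max-path-sum '((3) (7 4) (2 4 6) (8 5 9 3)))) (newline)
For the 100-row pyramid of problem 67 this performs only a few thousand additions instead of examining 2^99 possible paths.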

You may wonder how mathematics can help you face everyday challenges as a programmer. Imagine you work for an on-line store with millions of CD records in stock, and you provide a browser with a dynamic list of products based on different criteria, such as genre, year and price. Now you want to show your customers the most popular product within the range they choose. Getting all items satisfying the selected criteria from the database and finding the most popular one every time someone browses the list will quickly kill your back-end. On the other hand, it is impossible to generate such data in advance for every possible combination of search parameters. You can either expand your server farm (and possibly move to the cloud to reduce costs) or you can use some statistical tools to do the job. The trick is to find a representative sample size for a given population of products. Assuming that N is the total population, z is the statistical factor for the desired confidence level, p is the estimated proportion of a given variable within the population, and d is the admissible error level, we can get the sample size n with the following formula:
x = z^2 * (p * (1 - p))
n = N * x / ((N - 1) * d^2 + x)
The z factor is constant and equals 1.96 for the 95% confidence level and 2.58 for the 99% confidence level. Assuming that a selected range of CD records contains one million uniformly distributed products and you can tolerate a 5% error, it turns out that in 95% of cases you get the most popular one correctly by analysing only 385 randomly chosen items from the list, since:
x = 1.96^2 * (0.5 * (1 - 0.5)) = 0.9604
n = 1000000 * 0.9604 / ((1000000 - 1) * 0.05^2 + 0.9604) =~ 384.013
To be 99% sure that no more than 1% of the results are incorrect, you'd need to analyse 16369 records.
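If you want to check these numbers yourself, here is the above formula wrapped in a small Gambit Scheme function (the name sample-size and the argument order are just my choice for this sketch):
(define (sample-size N z p d)
  ; n = N*x / ((N - 1)*d^2 + x), where x = z^2 * p * (1 - p)
  (let ((x (* z z p (- 1 p))))
    (/ (* N x) (+ (* (- N 1) d d) x))))

(display (sample-size 1000000 1.96 0.5 0.05)) (newline) ; ~384.01, rounded up to 385
(display (sample-size 1000000 2.58 0.5 0.01)) (newline) ; ~16368.6, rounded up to 16369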

You can learn more about calculating sample sizes from this article at Quirks - I took the above formula from it. And remember: while it's true that machine cycles are cheap and people cycles are not, never forget that people cycles are worth billions of machine ones.

Saturday, July 31, 2010

MapReduce with Gambit Scheme & Termite

Termite is a library that runs on top of Gambit Scheme and implements a concurrency model inspired by Erlang (there is also a Chicken Scheme port, but so far it's incomplete). Since I have already posted about Termite some time ago, I will not go into the details of writing distributed applications with it. Instead, I want to show you how to use Termite to implement a simple MapReduce algorithm.

A trivial implementation uses a simple worker process that executes some code, and a pmap function which spawns multiple workers and collects the results:
(define (worker)
  (let* ((msg (?)) (pid (car msg)) (fun (cadr msg)) (arg (caddr msg)))
    (! pid (fun arg))))

(define (pmap fun lst)
  ; spawn workers
  (for-each 
    (lambda (x) (let ((p (spawn worker))) (! p (list (self) fun x))))
    lst)
  ; collect results
  (let loop ((i (length lst)) (result '()))
    (if (= 0 i) (reverse result) (loop (- i 1) (cons (?) result)))))
So far it works similarly to its Erlang equivalent. However, this version does not guarantee (just like the Erlang one) that you get the results in the same order as the arguments passed to pmap. Let's consider a function f, which suspends execution of the current thread for a given number of seconds:
(define (f x) (thread-sleep! x) x)
If you run the following code:
(pmap f (list 1 2 3 2 1))
The result will be:
(1 1 2 2 3)
This is because the first results in the list come from the processes that finished first.

A solution to this problem is to use a single worker process with a sequencer that tags the spawned threads. The numbers generated by the sequencer are then used to sort the results into the correct order:
(define (worker)
  ; init sequencer
  (let loop ((n 0))
    (let ((msg (?)))
      ; terminate worker on request
      (if (not (eqv? 'exit (car msg)))
        (let ((pid (car msg)) (fun (cadr msg)) (arg (caddr msg)))
          ; start a new thread
          (spawn (lambda () (! pid (cons n (fun arg)))))
          ; increase sequencer
          (loop (+ n 1)))))))

(define (pmap fun args)
  ; start worker
  (let ((p (spawn worker)) (result '()))
    ; send data to the worker
    (for-each (lambda (x) (! p (list (self) fun x))) args)
    ; collect results
    (let loop ((i (length args)) (lst '()))
      (if (= 0 i)
        (set! result lst)
        (loop (- i 1) (cons (?) lst))))
    ; terminate worker
    (! p (list 'exit))
    ; sort results
    (let loop ((in (qsort result <)) (out '()))
      (if (null? in)
        out
        (loop (cdr in) (cons (cdar in) out))))))
You can use any function you wish to sort the results. I used a modified version of quicksort found at Rosetta Code:
(define (qsort l gt?)
  (let q ((l l))
    (if (null? l)
      l
      (let ((s (split-by (cdr l) (lambda (x) (gt? (car x) (caar l))))))
        (append (q (car s)) (list (car l)) (q (cdr s)))))))

(define (split-by l p)
  (let loop ((low (list)) (high (list)) (l l))
    (if (null? l)
     (cons low high)
     (if (p (car l))
       (loop low (cons (car l) high) (cdr l))
       (loop (cons (car l) low) high (cdr l))))))
Now when you run the test again, you get the result list in the same order as the argument list, no matter what the execution times of the individual processes were:
(pmap f (list 1 2 3 2 1))

(1 2 3 2 1)
You can download the pmap library here.

Monday, July 19, 2010

Using trampolines in C

In my previous post I presented my implementation of trampoline code in newLISP. Since this technique is also used in non-functional programming languages, I prepared an example in C.

Let's assume we have two mutually recursive functions f1 and f2:
#include <stdio.h>

void f1(int n);
void f2(int n);

void f1(int n) {
  printf("%d\n", n);
  if (n == 0)
    printf("Blastoff!\n");
  else
    f2(n);
}

void f2(int n) {
  f1(n-1);
}

int main() {
  f1(1000000);
}
When you compile this code without any compiler optimizations and try to run it, you will most probably get a segmentation fault rather quickly (or a similar error, depending on your platform). What happens here is that the application runs out of stack space for storing the function call frames.

In Lisp you can solve this problem by rewriting the functions to return continuations instead of values. In C you can simulate continuations by:
1. Making the functions operate on pointers to variables instead of the actual variables, which allows the application to preserve their values between function calls.
2. Returning a pointer to the next function to be called instead of calling it directly, which makes using the application stack unnecessary.

So rewritten f1 and f2 can be presented in the following form:
#include <stdio.h>

typedef void *(*Fun)(int *n);

void *f1(int *n);
void *f2(int *n);

void *f1(int *n) {
  printf("%d\n", *n);
  if (*n == 0) {
    printf("Blastoff!\n");
    return NULL;
  } else {
    return f2;
  }
}

void *f2(int *n) {
  *n = *n - 1;
  return f1;
}

int main() {
  int n = 1000000;
  Fun f = f1;
  while (f != NULL) {
    f = (Fun)f(&n);
  }
  return n;
}
Now the main function iteratively goes through the subsequent function calls until the halting condition (NULL) is met. At the end of the iteration, the variable n holds 0.

As a side note, gcc is able to optimize the code of mutually recursive functions quite effectively. If you compile the first example with gcc -O2 the resulting code will not crash, because the compiler replaces all call instructions with jmp.

P.S. I would like to thank Johnny, my friend and a true programming expert, who inspired me to write this post. I was also inspired by Carlos Oliveira's post on trampolines.

Thursday, July 15, 2010

Advanced recursion in newLISP

In my recent post about newLISP I mentioned that it does not support tail call optimization. In fact, many Lisps don't. As Bill Six pointed out, even the ANSI Common Lisp standard does not mandate (unlike Scheme) tail-call elimination in language implementations, although it seems that all well-known ANSI Common Lisp compilers do it anyway.

I was wondering if there is a way to circumvent this problem, and the first solution I found was to use the memoize macro described in the excellent online documentation for newLISP, Code Patterns in newLISP:
(define-macro (memoize mem-func func)
  (set (sym mem-func mem-func)
    (letex (f func  c mem-func)
      (lambda ()
        (or (context c (string (args)))
        (context c (string (args)) (apply f (args))))))))
You can apply this macro to any function with any number of arguments. The trick is that each time the function is called, its result is cached in memory for subsequent calls. This can speed up your application tremendously, which can be observed by comparing the execution times of these example Fibonacci functions:
(define (fibo n)
  (if (< n 2) 1
    (+ (fibo (- n 1)) (fibo (- n 2)))))

(memoize fibo-m
  (lambda (n)
    (if (< n 2) 1
      (+ (fibo-m (- n 1)) (fibo-m (- n 2))))))

(time (fibo 35))
(time (fibo-m 35))
On my laptop (fibo 35) took 12.98 seconds to execute, while (fibo-m 35) executed in 0.016 milliseconds.

Unfortunately, the memoize macro cannot help you handle mutual recursion. A classic example of such recursion looks as follows:
(define (f1 n)
  (println n)
    (if (= n 0)
      (println "Blastoff!")
      (f2 n)))

(define (f2 n)
  (f1 (- n 1)))
You can easily make newLISP run out of stack by running (f1 1000), not to mention bigger numbers. What happens if we define a "memoized" version of f1 and f2? Let's see:
(memoize f1
  (lambda (n)
    (println n)
    (if (= n 0)
      (println "Blastoff!")
      (f2 n))))

(memoize f2
  (lambda (n)
    (f1 (- n 1))))
Again, running (f1 1000) immediately exhausts newLISP's stack.

A solution to this problem is a technique called trampolining. Bill Clementson not only explained the concept of trampolines in an excellent way on his blog, but also provided a Common Lisp implementation, which inspired me to write a newLISP version:
(define (trampoline fun arg)
  (catch
    (while true
      (let ((run (apply fun arg)))
        (setf fun (first run) arg (rest run)))) 'result)
  result)
A trampoline iteratively executes the code thunks returned by a function, and this way avoids blowing up the application stack. However, in order to use a trampoline, the function must return a continuation (a description of the next step) instead of a value. Below is a version of the aforementioned functions modified to use the trampoline:
(define (f1 n)
  (println n)
  (if (= n 0)
    (throw "Blastoff!")
    (list f2 n)))

(define (f2 n)
  (list f1 (- n 1)))
Now you can test it with:
(trampoline f1 '(1000))
(trampoline f1 '(10000))
(trampoline f1 '(100000))
...
Have fun!

Thursday, June 3, 2010

newLISP - Lisp for the masses

There is a popular phrase among Lisp hackers: plant a tree, write a book, create your own dialect of Lisp. Although there aren't many popular Lisps out there and even the mainstream Common Lisp has never been massively used, it seems that, just like in the case of the various Linux distributions, more often simply means better. A good example of such a success story is Clojure, and here comes another candidate to take the lead.
newLISP is a modern dialect of Lisp, designed by Lutz Mueller to be (as he says himself) "quick to learn and to get the job done". I must say that this sentence couldn't be more true - solving Euler Problem 10 (finding the sum of all primes below 2 million) after just two days of fiddling with newLISP took me less than 3 minutes, including designing, writing, testing and executing the following code:
(println (apply + (filter (fn (n) (= 1 (length (factor n)))) (sequence 2 2000000))))
Despite being an interpreted language, newLISP runs programs amazingly fast. The code above is a purely brute-force solution, yet it executes in less than 10 seconds on a 1.66 GHz Core 2 Duo.
However, simplicity comes at a price. If you try a more sophisticated approach, like the classic sieve of Eratosthenes, you might be a bit surprised:
(define (sieve seq out)
  (let ((n (first seq)))
    (setf seq (filter (fn (x) (!= 0 (mod x n))) seq))
    (push n out)
    (if (not seq) out (sieve seq out))))
(print (apply + (sieve (sequence 2 2000000))))
In this code the function sieve, although properly tail recursive, will quickly make newLISP run out of stack or (if you provide enough space for the stack) out of all available memory. This is because newLISP does no tail recursion optimization. If for some reason you cannot live with that, you can still use Common Lisp for implementing such recursions:
(defun range (min max) (loop for i from min to max collect i))
(defun sieve (seq &optional out)
  (let ((n (car seq)))
    (setf seq (delete-if #'(lambda (x) (= 0 (mod x n))) seq))
    (push n out)
    (if (not seq) out (sieve seq out))))
(print (apply #'+ (sieve (range 2 2000000))))
As you can see, the code of the sieve function is very similar, so it's fairly easy to switch to newLISP if you already have some Common Lisp background. The differences from other Lisp dialects are well documented, as is the language itself. Documentation is another strength of newLISP: you can learn how to solve various real-life problems using newLISP code patterns or browse through many interesting code snippets.
What I personally like about newLISP in comparison to other Lisps is its really tiny footprint. You can create a standalone executable containing an embedded newLISP engine that is only about 200 kB, using two simple steps:
(load "/usr/share/newlisp/util/link.lsp")
(link "/usr/bin/newlisp" "mycode.bin" "mycode.lsp")
Despite being so small, newLISP provides a surprising amount of functionality out of the box: regular expressions, TCP/IP networking (including the FTP and HTTP protocols), database access (through external libraries), OpenGL, XML and XML-RPC handling, matrices, statistics (including Bayesian formulas), Unicode support, and a set of C/C++ modules that extend its abilities even further.
newLISP also supports parallel processing through a Cilk-like API, and distributed computing through the built-in function net-eval.
newLISP is definitely not a New Common Lisp, and in some respects (such as the aforementioned tail recursion) it is still inferior. But newLISP is a perfect example that in the IT industry, sometimes worse is better.

Tuesday, January 5, 2010

Web Sockets: a new era for the Web

The Web Sockets API is part of the HTML5 specification and is to the Web what TCP is to the IP protocol. It allows full-duplex, bidirectional communication between the server and the client browser - no more polling, no more busy waiting, and no more problems with keep-alive HTTP connections. Web Sockets allow Web applications to be taken to an absolutely new level, where they can finally operate like any other network software, without crippled overlays like AJAX or Comet. With Web Sockets you can push data to the client just as you would with XMPP.
One of the first browsers to support Web Sockets is Google Chrome. And, of course, one of the first languages natively supporting Web Sockets is Go. Here is a simple server application which sends time information to the browser at one-second intervals:
package main

import (
  "http"
  "io"
  "strconv"
  "time"
  "websocket"
)

func ClockServer(ws *websocket.Conn) {
  ch := time.Tick(100000000) // tick every 100 ms (the argument is in nanoseconds)
  t1 := <-ch                 // absolute start time in nanoseconds
  t := t1 / 1000000000       // last value sent; starts as absolute seconds, so the first tick always triggers a send
  for {
    t2 := <-ch
    td := (t2 - t1) / 1000000000 // seconds elapsed since the server started
    if td != t {
      io.WriteString(ws, strconv.Itoa64(td)) // push the new second count to the browser
      t = td
    }
  }
}

func main() {
  http.Handle("/clock", websocket.Handler(ClockServer))
  err := http.ListenAndServe(":12345", nil)
  if err != nil {
    panic("Error: ", err.String())
  }
}
When you have compiled and started the server, create a web page with the following content and save it to your disk:
<html>
<head>
<title>WebSocket</title>
<script type="text/javascript">
function init() {
  var tick = document.getElementById("tick");
  if ("WebSocket" in window) {
    var ws = new WebSocket("ws://localhost:12345/clock");
    ws.onmessage = function (evt) {
      tick.innerHTML = evt.data;
    };
  } else {
    tick.innerHTML = "The browser doesn't support WebSocket.";
  }
}
</script>
</head>
<body onload="init()">
<span>Seconds elapsed: </span><span id="tick">0</span>
</body>
</html>
If you open the page in Google Chrome, you will see the seconds ticking. There is no AJAX, no long polling or any other fancy stuff - the server pushes data directly to the client. If you open more Chrome instances, you will see that each of them has its own connection to the server with its own clock (however, beware of opening many connections in the same window but in different tabs - this can occasionally crash your browser).
You can also use Erlang to make use of the Web Sockets technology. Joe Armstrong has recently posted an article on his blog with the full source code of both a Web Sockets client and server.
Have fun!