Scotland on Rails 2009 Slides

Posted by Jonathan

I know, it is a bit late, but here are my slides from Scotland on Rails:


The slides are also available as a PDF download: Advanced Deployment

Scotland on Rails was again a great conference. A very interesting crowd in a very nice city. I'm looking forward to next year!

Web 2.0 Expo Berlin

Posted by Jonathan

I'm just back from today's Web 2.0 Expo sessions and I'm not sure I will attend tomorrow. Many have written about this before, but the creative, social atmosphere is missing because of the labyrinthine conference halls. Boy, I'm happy I haven't spent > 1,000 euros on this. No real food, a lot of product presentations, not enough room for socializing and too many suits for my taste.

Still, I had some nice conversations and met some interesting people.

I again gave a session on scaling with Amazon EC2 and S3; the slides can be found here.

This time I also talked a bit about how we use S3 and EC2 to drive PeritorMail, our webmail portal product at Peritor.


Also nice: the AWS announcement that S3 is now available in EU data centers. Now I'm only waiting for EC2 in the EU...

EC2 gets new instance types

Posted by Jonathan

Wow

Amazon EC2 gets two new instance types: large and extra large. Basically, a large instance has 4 times the capacity (CPU, RAM, HDD) of the old small instance type, which is now the default, while an extra large instance has 8 times the capacity.

Small Instance (default)

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage (150 GB plus 10 GB root partition)
32-bit platform
I/O Performance: Moderate 
Price: $0.10 per instance hour

Large Instance

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage (2 x 420 GB plus 10 GB root partition)
64-bit platform
I/O Performance: High 
Price: $0.40 per instance hour

Extra Large Instance

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage (4 x 420 GB plus 10 GB root partition)
64-bit platform
I/O Performance: High 
Price: $0.80 per instance hour

The idea is that you specify the instance type in the RunInstances API call. Old tools that do not specify this parameter start an instance of the default (small) type.
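As a rough sketch of what that means for tooling, the helper below builds the query parameters for a RunInstances request. The helper name, the symbol keys and the AMI id are made up for illustration; `InstanceType` and the values `m1.small`, `m1.large` and `m1.xlarge` are the API names of the three types listed above. Signing and sending the request are left out.

```ruby
# API identifiers for the three instance types described above.
INSTANCE_TYPES = {
  small:       "m1.small",
  large:       "m1.large",
  extra_large: "m1.xlarge"
}.freeze

# Hypothetical helper: returns the query parameters for RunInstances.
# When no type is given, InstanceType is omitted entirely, which is what
# older tools do and which results in the default small instance.
def run_instances_params(image_id, count: 1, type: nil)
  params = {
    "Action"   => "RunInstances",
    "ImageId"  => image_id,
    "MinCount" => count.to_s,
    "MaxCount" => count.to_s
  }
  params["InstanceType"] = INSTANCE_TYPES.fetch(type) if type
  params
end

# "ami-12345678" is a placeholder image id.
legacy = run_instances_params("ami-12345678")
large  = run_instances_params("ami-12345678", count: 2, type: :large)
```

Here `legacy` carries no `InstanceType` key at all, while `large` asks for two `m1.large` instances.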

Very nice to see this so fast after the recent S3 SLAs.

If they now allowed EC2 instances to run in Europe, there would be no excuses left not to run nearly all applications on EC2.

My Rails Konferenz 2007 slides

Posted by Jonathan

Rails Konferenz 2007 is over and like last year it was a lot of fun, meeting other developers and learning some new stuff.

My talk on scaling Rails applications with Amazon S3 and EC2 went well and I had some interesting discussions afterwards. The talk was based on my Linuxtag talk but had a lot more info on Swiftiply and load balancing.


(Photo by phil76)

The slides are available here as a PDF; I’m told a video will be available soon.

Skalieren von Rails Anwendungen mit Amazon S3 und EC2 (PDF)

Linuxtag 2007 slides on Amazon S3 and EC2

Posted by Jonathan

I gave a presentation (in German) on how to scale web applications using Amazon S3 and EC2 at the Linuxtag 2007 in Berlin, Germany.

The talk was a broad introduction to S3 and EC2 and had some examples and scenarios using a Ruby on Rails application.

The slides are available as a PDF here and there is also an MP3 recording thanks to http://www.digitalwarenmanufaktur.de/blog/.

I will present a more Rails-centric variant of this talk at the upcoming Rails Konferenz in Frankfurt, Germany. I will also be talking about S3/EC2 and Rails at RailsConf Europe 2007 here in Berlin, Germany.

Skalieren einer Web Anwendung mit Amazon S3 und EC2 (PDF)

Skalieren einer Web Anwendung mit Amazon S3 und EC2 (MP3)

Scaling Rails with Apache 2.2, mod_proxy_balancer and Mongrel

Posted by Jonathan

Until this week we used Lighttpd and FastCGI for MeinProf.de. The setup was nearly the same as the one described in the must-read series Scaling Rails (1, 2, 3, 4) from poocs.net.

We used this setup from day 1 but always had some small issues with Lighttpd: it crashed every couple of days. Nothing dramatic, as we had a script that monitored Lighttpd and restarted it if necessary. Over the last few weeks, however, Lighttpd started to crash once a day and lately even once an hour. This was unacceptable, and as we knew that we were going to get some serious press coverage in Germany, we looked for alternatives.

43people and Basecamp use Apache 1.3 and FastCGI, so this seemed like a good alternative: just switch the web server and we would be done. Unfortunately, Apache 1.3 cannot load-balance FastCGI requests and there is very little documentation on Apache 1.3 and remote FastCGI processes. Apache 2.0 is no better and has problems with mod_fastcgi. We needed remote FastCGI listeners because our hardware is quite old and we have many slow machines, as opposed to a few fast ones that could handle the load with local FastCGI.

Enter Mongrel.

Mongrel is a fast HTTP library and server for Ruby that is intended for hosting Ruby web applications of any kind using plain HTTP rather than FastCGI or SCGI. It is framework agnostic and already supports Ruby On Rails, Og+Nitro, and Camping frameworks.

With Mongrel your application server becomes a web server that speaks HTTP, so you “only” need to load-balance and proxy normal HTTP requests to it. Mongrel was stable during our tests, so we looked for an HTTP proxy solution. Apache has always had mod_proxy and could therefore proxy HTTP requests, but we needed to load-balance them. There are extra packages for this kind of thing, like Balance, but we wanted something more integrated and didn’t want to introduce more components.

Enter Apache 2.2 and mod_proxy_balancer.

Apache 2.2 introduced a new proxy module, mod_proxy_balancer. This module does exactly that: it balances proxied requests. You can define a cluster of proxies and use this cluster in your mod_proxy statement instead of just one proxy server.

With this setup we use Apache 2.2 to handle all incoming requests. Apache 2.2 uses mod_proxy to pass the incoming HTTP requests to the mod_proxy_balancer cluster. The cluster consists of several Mongrel processes on each application server (each of which is now also an internal web server), and the balancer distributes the requests among them.

mod_proxy_balancer is more configurable than Lighttpd’s mod_fastcgi. For example, you can specify load factors or routes for each cluster member. See the documentation for details.

Our httpd.conf looks like this:

First you define the cluster and list its members.

<Proxy balancer://meinprofcluster>
  # cluster member 1
  BalancerMember http://192.168.0.1:3000 
  BalancerMember http://192.168.0.1:3001

  # cluster member 2, the fastest machine so double the load
  BalancerMember http://192.168.0.11:3000 loadfactor=2
  BalancerMember http://192.168.0.11:3001 loadfactor=2

  # cluster member 3
  BalancerMember http://192.168.0.12:3000
  BalancerMember http://192.168.0.12:3001

  # cluster member 4
  BalancerMember http://192.168.0.13:3000
  BalancerMember http://192.168.0.13:3001
</Proxy>

Then you proxy the location or virtual host to the cluster:

<VirtualHost *:80>
  ServerAdmin info@meinprof.de
  ServerName www.meinprof.de
  ServerAlias meinprof.de
  ProxyPass / balancer://meinprofcluster/
  ProxyPassReverse / balancer://meinprofcluster/
  ErrorLog /var/log/www/www.meinprof.de/apache_error_log
  CustomLog /var/log/www/www.meinprof.de/apache_access_log combined
</VirtualHost>

The slash at the end of the ProxyPass directive is very important.
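To get a feeling for what loadfactor=2 does, here is a small simulation of weighted request distribution. It is a simplified illustration, not Apache's actual implementation: on every request each member gains credit equal to its load factor, the member with the most credit gets the request and then pays back the sum of all factors. The class name is made up; the addresses are taken from the cluster definition above, with one Mongrel per machine to keep it short.

```ruby
# Simplified weighted balancer for illustration only.
class WeightedBalancer
  def initialize(members)
    @members = members              # { url => load factor }
    @credit  = Hash.new(0)          # running credit per member
    @total   = members.values.sum   # sum of all load factors
  end

  # Returns the URL of the member that should handle the next request.
  def next_member
    @members.each { |url, factor| @credit[url] += factor }
    chosen = @members.keys.max_by { |url| @credit[url] }
    @credit[chosen] -= @total
    chosen
  end
end

members = {
  "http://192.168.0.1:3000"  => 1,
  "http://192.168.0.11:3000" => 2,  # the fastest machine
  "http://192.168.0.12:3000" => 1
}

balancer = WeightedBalancer.new(members)
counts = Hash.new(0)
400.times { counts[balancer.next_member] += 1 }
counts.each { |url, n| puts "#{url}: #{n} requests" }
```

In this model the member with load factor 2 ends up with twice as many requests as each of the others.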

Mongrel itself is started on the cluster nodes like this:

# mongrel_rails start -d -e production -p 3000
# mongrel_rails start -d -e production -p 3001

Another nice feature of mod_proxy_balancer is the balancer-manager. It is a web interface to the configuration of the mod_proxy_balancer cluster through which you can query or edit your cluster nodes without the need to restart Apache.

In order to use balancer-manager include this in your configuration:

<Location /balancer-manager>
  SetHandler balancer-manager
</Location>

Of course you should protect this location through Apache’s require valid-user or Allow from directives.

So far this solution has proven much more stable (at least on FreeBSD) and was able to handle our peak traffic of 350,000 page requests per day. In practice we use up to 8 Mongrel processes on each cluster node, and it seems that Apache is now the bottleneck rather than our application servers, as was the case before. The next step for us is to introduce another web server that handles incoming HTTP requests and has its own Mongrel cluster.

Time-based Fragment Caching with MemCache

Posted by Jonathan

For the sidebar content in MeinProf.de we use fragment caching. One problem with caching is that expiring entries can get really messy. Time-based caching can solve this problem but the current caching implementation in Rails does not support this.

While poking around in the MemCache-client implementation from the Robotcoop I saw that MemCache itself does support time-based expiry of cached entries. Thanks to Ruby I just re-implemented the write method in ActionController::Caching::Fragments::MemCacheStore so that we could expire our entries after some given time:

class ActionController::Caching::Fragments::MemCacheStore
  def write(name, value, options=nil)
    @data.set(name, value, 30.minutes)
  end
end

Now all fragments expire after 30 minutes. If you want different lifetimes for your caches, you have to distinguish them by the name of the fragment. Normally, fragments created with the <% cache do %> call in views are named after the controller and action, e.g. controller/actionname. You can also specify a name, like <% cache("UniPage_#{uni.id}") do %>.

class ActionController::Caching::Fragments::MemCacheStore
  def write(name, value, options=nil)
    if name =~ %r{^UniPage}
      @data.set(name, value, 30.minutes)
    elsif name == "mycontroller/myaction"
      @data.set(name, value, 45.minutes)
    else
      @data.set(name, value, 60.minutes)
    end
  end
end

Not the cleanest solution but it works very well for us.
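One way to tidy this up, sketched here under the assumption that the rules stay this simple, is to move the name-to-lifetime mapping into a small lookup table. The names TTL_RULES, DEFAULT_TTL and ttl_for are invented for this example, and lifetimes are given in plain seconds so the snippet runs without Rails.

```ruby
# Each rule is [pattern, lifetime in seconds]; the first match wins.
# A pattern can be a Regexp (matched against the name) or a String
# (compared for equality), because both respond to ===.
TTL_RULES = [
  [/^UniPage/,              30 * 60],
  ["mycontroller/myaction", 45 * 60]
].freeze

# Lifetime for fragments that match no rule.
DEFAULT_TTL = 60 * 60

# Returns the lifetime in seconds for the given fragment name.
def ttl_for(name)
  rule = TTL_RULES.find { |pattern, _ttl| pattern === name }
  rule ? rule.last : DEFAULT_TTL
end

puts ttl_for("UniPage_42")            # matches the UniPage rule
puts ttl_for("mycontroller/myaction") # matches the exact name
puts ttl_for("something/else")        # falls back to the default
```

The write method would then shrink to a single @data.set(name, value, ttl_for(name)) call, and adding a new lifetime means adding one line to the table.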

If you want to also save your sessions in MemCache with the memcache-client library from the Robotcoop, add this code to ActionController::Caching::Fragments::MemCacheStore:

class ActionController::Caching::Fragments::MemCacheStore
  def data=(cache)
    @data = cache
  end
end

In summary our code in config/environments/production.rb looks like this:

### MemCached Server ###
CACHE = MemCache.new :c_threshold => 10_000, :compression => true,
                     :debug => false, :namespace => 'meinprof_de',
                     :readonly => false, :urlencode => false

CACHE.servers = '127.0.0.1:11211'

### Sessions in MemCached ###
session_options = {
    :database_manager => CGI::Session::MemCacheStore,
    :cache => CACHE
}

ActionController::CgiRequest::DEFAULT_SESSION_OPTIONS.update session_options

### Fragment caching in MemCached ###
# Allow us to set the CACHE as the Fragment Cache store
class ActionController::Caching::Fragments::MemCacheStore
  def data=(cache)
    @data = cache
  end

  # Expire fragments after a lifetime that depends on the fragment name
  def write(name, value, options=nil)
    if name =~ %r{^Random_TopFlop}
      @data.set(name, value, 30.minutes)
    elsif name =~ %r{^RegionPage}
      @data.set(name, value, 60.minutes)
    elsif name =~ %r{^UniPage}
      @data.set(name, value, 60.minutes)
    else
      @data.set(name, value, 120.minutes)
    end
  end
end

ActionController::Base.fragment_cache_store = :mem_cache_store, {}
ActionController::Base.fragment_cache_store.data = CACHE