Scaling my file server with my audience

Colin Dermott colindermott at
Thu Jan 18 16:32:34 CST 2007

On 1/19/07, Michael E. Conlen <meconlen at> wrote:
> Colin,

Hi Michael.  Thanks for your very in-depth reply!

> A reverse proxy may not be efficient for your setup. If the set of
> files commonly accessed is large then the proxy will run out of
> memory easily. The disk cache for the proxy can only be so large
> before memory becomes a concern.

Interesting.  A pessimistic estimate of my hot set is around 250GiB
currently and the entire set is close to 4TiB.

> I use the following for one client
> 1 NFS server with large disk (2 TB raw)
> 13 web servers that mount the NFS server
> 1 proxy server which handles a subset of data that is commonly accessed
> 1 layer 7 switch (foundry load balancer).
> With this I handle 250 Mbit/sec of peak traffic (180 Mbit/sec
> average) accessing over 500 GB of data, the median size of the files
> is 28k or about 70 million hits a day.

I'm interested in this setup.  I don't have much experience with NFS,
however.  Do the NFS clients cache to local disk or will the webservers
hit the NFS server for each file request?  As it stands now, my disk
array is the bottleneck.  I have also exhausted the upgrade path for
RAM in my machine at 12GB.

The median file size in my hot set would be around 50-100MiB,
possibly more as the most popular files seem to be game (WoW)
recordings (can be several hundred MiB) and Linux ISO downloads.

The median of my entire set would be much smaller - say 3MiB.
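
A rough sketch of the arithmetic behind the figures above, in Python.
It treats each median file size as if it were the mean and reads 12GB
as 12GiB; both are simplifying assumptions, so the results are orders
of magnitude only.

```python
# Back-of-envelope figures. Treating the median file size as the mean is
# an assumption, so these are rough orders of magnitude only.

KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def average_mbit_per_sec(hits_per_day, bytes_per_hit):
    """Average throughput in Mbit/sec for a given daily hit count."""
    bits_per_day = hits_per_day * bytes_per_hit * 8
    return bits_per_day / 86_400 / 1_000_000


def files_in_set(set_bytes, typical_file_bytes):
    """Rough number of files in a set of the given total size."""
    return set_bytes // typical_file_bytes


# Quoted workload: 70 million hits a day at about 28k per file.
quoted_avg = average_mbit_per_sec(70_000_000, 28 * KIB)

# Hot set described above: 250GiB of files at roughly 50-100MiB each,
# against 12GB of RAM (taken here as 12GiB).
hot_files_low = files_in_set(250 * GIB, 100 * MIB)
hot_files_high = files_in_set(250 * GIB, 50 * MIB)
ram_fraction = (12 * GIB) / (250 * GIB)

print(f"Quoted workload average: about {quoted_avg:.0f} Mbit/sec")
print(f"Hot set: roughly {hot_files_low} to {hot_files_high} files")
print(f"RAM covers about {ram_fraction:.1%} of the hot set")
```

The first figure lands close to the 180 Mbit/sec average quoted above,
which suggests the quoted numbers are self-consistent.  The last two
show the contrast: a few thousand large files, of which only a small
fraction can sit in RAM at once.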

> Because of the small file size and the fact that the majority of the
> traffic is handled by a small subset of the data the proxy server
> works well. If the commonly accessed data was larger I could use
> multiple proxies and the layer 7 switch to direct specific traffic to
> each proxy, however that box is about $30,000 which I would presume
> is outside your range.

Unfortunately, that sounds exactly like my problem (the commonly accessed
data is larger), and you're right, $30,000 is out of my range.

> Without a load balancer I don't recommend using round robin DNS
> pointing directly to servers. It works fine when pointed to a cluster
> of load balancers because the load balancer can transparently handle
> situations of server outages, but with round robin DNS while you get
> clustering you have a poor mechanism for removing a server from the
> cluster in the event of failure because of DNS caching. In this case
> I recommend the previous model my client used.
> FreeBSD and OpenBSD have firewall software called PF. PF has the
> ability to redirect traffic going to a single IP address to a cluster
> of IP addresses. In this scenario when you wish to remove a server
> from the cluster you can remove it from the PF configuration and
> there's no need to wait for DNS propagation or people's computers to
> lookup the addresses after the DNS change. A simple server with a
> single fast CPU and 1GB or possibly even 512 MB of memory should be
> able to handle loads up to somewhere between 150 and 200 Mbit/sec of
> traffic (my client now has a cluster of firewalls between two load
> balancers).

How about the Apache module mod_backhand?  Together with
the open-source software Wackamole, it seems like one can achieve
high-availability and intelligent load-balancing without any extra
hardware.  However I believe I am forced into Apache 1.3 and
possibly FreeBSD for Wackamole.  I suppose instead of Wackamole
I could use PF or LVS.
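
To illustrate why a redirector in front of the servers reacts faster
than round robin DNS, here is a toy model in Python.  It is not PF
configuration or any real load balancer's API; the class name and the
backend addresses are made up for the example.

```python
class RoundRobinPool:
    """Toy model of a redirector that spreads connections over backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0

    def remove(self, backend):
        """Take a backend out of rotation; applies from the next pick."""
        self.backends.remove(backend)

    def pick(self):
        """Return the backend that should receive the next connection."""
        if not self.backends:
            raise RuntimeError("no backends left in the pool")
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend


# Hypothetical backend addresses, for illustration only.
pool = RoundRobinPool(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
before = [pool.pick() for _ in range(3)]

pool.remove("10.0.0.2")          # server failed or is down for maintenance
after = [pool.pick() for _ in range(4)]

print("before removal:", before)
print("after removal: ", after)
```

Because the pool lives in one place, the removal applies to the very
next connection.  With round robin DNS the equivalent change only takes
effect as cached records expire on the clients' side.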

> With this you could consider one NFS server which contains the disks
> for the data. This server should be robust. Performance is affected
> by operations more than throughput (it can serve one large file much
> much more easily than accessing several small files). Large
> directories and the use of .htaccess files rather than placing the
> directives in the httpd.conf file can cause performance issues. A
> high-end RAID card or, better yet, a RAID box that performs the RAID
> functions in the disk array rather than on the SCSI card is optimal.
> Get plenty of memory for a disk cache.

By this, do you mean the cache in the RAID box/card, or in the OS, or

> Three to five web servers. Memory is less of a concern and CPU is
> more of a concern because with NFS (unless you go to v4, which may
> not be ready for prime time) each operation goes to the NFS server,
> commonly accessed files are no longer cached in the web server's
> memory. You want to have at least some headroom so that you can take
> a server down for maintenance and still have enough servers to handle
> the traffic.
> One PF firewall. Fast machine, modest amounts of memory. With this
> you can round robin to the servers and manage which servers get
> traffic quickly.

I am interested in trying this architecture, however with the web servers
caching nothing and every hit going to the NFS server, where do I go
once the NFS box starts to struggle?

Given my resources, my current web server would most likely become
the NFS server, and he is struggling at the moment with disk I/O during
peak periods.

> You should also consider a management server on which you can run
> applications like cricket and nagios or big brother. You can also use
> this machine to store all configuration data and perform day to day
> operations. Once you go beyond having one or two servers, having
> software that can help you monitor your environment becomes critical
> because keeping five to 10 ssh logins running with top on them all
> the time becomes problematic (and this can happen if you're
> attempting to find a bottleneck or spot a problem somewhere on the
> cluster).

This is a good idea and something I will definitely look into.  Currently
I am using munin locally on the same machine.

> You should start planning now for when you will need hardware load
> balancers. They are expensive, and doubly so if you want to load
> balance a firewall (if you use the PF firewall just to load balance
> traffic you can eliminate it if you go to a hardware load balancer).
> If you decide to go with a Squid proxy you want a single processor
> machine with as much memory as possible. I haven't tried squid on
> ia64 platforms to see how much memory it can handle but I believe
> it's been able to take advantage of 64 bit platforms since the days
> of the DEC Alpha. I've gone without using a disk cache at all as any
> sort of large disk cache will consume a lot of memory just in the
> indexing. With a disk cache I was actually left with little memory
> for a memory cache. On the other hand by using a memory cache only I
> was able to use about half the memory for storing objects. Object
> delivery times are reduced greatly and the proxy can handle large
> amounts of traffic.

What you say about the disk cache is interesting.  The main reason
I guessed 2x36GB 15KRPM in RAID-0 for the/each proxy server was
because of the size of my hot set.
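
A rough way to reason about the index cost, sketched in Python: if the
in-memory index holds one entry per cached object, its size follows the
object count rather than the cache size.  The 100 bytes per entry used
below is a placeholder assumption, not a measured Squid figure, and the
72GB cache size simply takes 2x36GB in RAID-0 at face value.

```python
KIB = 1024
MIB = 1024 ** 2


def index_bytes(cache_bytes, mean_object_bytes, bytes_per_entry=100):
    """Estimate index memory for a disk cache.

    bytes_per_entry is an assumed placeholder, not a measured value.
    """
    objects = cache_bytes // mean_object_bytes
    return objects * bytes_per_entry


cache = 72 * 10 ** 9  # 2x36GB in RAID-0, taken at face value

small_objects = index_bytes(cache, 28 * KIB)   # many small files
large_objects = index_bytes(cache, 50 * MIB)   # few large files

print(f"28k objects:   about {small_objects / 10 ** 6:.0f} MB of index")
print(f"50MiB objects: about {large_objects / 10 ** 3:.0f} kB of index")
```

Under these assumptions the same disk cache costs far less index memory
when it holds large files than when it holds small ones, so the memory
pressure described for a 28k median may not carry over directly to a
hot set of 50-100MiB files.  That would need testing on real hardware.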

> The biggest CPU hog for the proxy is objects that expire frequently.
> Every connection to the web server for an IMS request is expensive so
> that data is sent directly to the origin server so you still need
> enough web servers to handle CGI or PHP scripts and refresh the
> caches or process IMS requests. By using a layer 7 switch I send
> requests directly to origin servers for PHP or CGI requests so the
> proxy doesn't have the overhead of processing those.

Ah.  Fortunately I don't have any PHP or CGI to worry about.  This is
purely pushing files :-)

> Hope this helps.

Yes, you've opened my eyes to a few things I knew nothing about :-)

Thanks again

> --
> Michael Conlen


More information about the Linux-PowerEdge mailing list