Scaling my file server with my audience

Eric Rostetter rostetter at mail.utexas.edu
Thu Jan 18 17:17:11 CST 2007


Quoting Colin Dermott <colindermott at gmail.com>:

>> A reverse proxy may not be efficient for your setup. If the set of
>> files commonly accessed is large then the proxy will run out of
>> memory easily. The disk cache for the proxy can only be so large
>> before memory becomes a concern.

This depends on the size of the files, as well as the number of files...
The proxy may or may not help here.

But the proxy could keep local copies of the files and serve them from
local disk rather than reverse-proxying them...  This might prove a good
solution, as long as the local disk is large enough to hold them all.
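
For example, the front-end box could serve the hot directory straight from
its own disk and only proxy everything else back to the origin server.  A
rough sketch for apache 2.x (hostname and paths are placeholders, not from
your setup):

    # Hypothetical httpd.conf fragment: serve /files/ from a local copy,
    # proxy every other request to the backend server.
    Alias /files/ "/var/local/files/"
    ProxyPass        /files/ !
    ProxyPass        / http://backend.example.com/
    ProxyPassReverse / http://backend.example.com/

(The "ProxyPass /files/ !" exclusion has to come before the catch-all
ProxyPass so those requests never leave the box.)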

>> I use the following for one client
>>
>> 1 NFS server with large disk (2 TB raw)
>> 13 web servers that mount the NFS server
>> 1 proxy server which handles a subset of data that is commonly accessed
>> 1 layer 7 switch (foundry load balancer).

That sounds good, but the Foundry load balancer is probably very expensive.
You might look at an open source load balancer instead...  I've used
several with great success (also I've used commercial ones like F5 Big-IP).

> I'm interested in this set up.  I don't have much experience with NFS,
> however.  Do the NFS clients cache to local disk or will the webservers
> hit the NFS server for each file request?  As it stands now, my disk
> array is the bottleneck.  I have also exhausted the upgrade path for
> RAM in my machine at 12GB.

NFS clients cache in memory, but not to disk.  So they will cache, but
they won't cache much...
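
You can stretch what they do cache by bumping the attribute-cache timeout
on the mount, which at least cuts down on revalidation round trips to the
NFS server.  A rough sketch (server name, export and mount point are made
up):

    # Hypothetical read-only NFS mount on a web server: bigger read size,
    # attributes cached for a full 60 seconds.
    mount -t nfs -o ro,rsize=32768,actimeo=60 \
        nfsserver:/export/www /var/www/data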

>> Because of the small file size and the fact that the majority of the
>> traffic is handled by a small subset of the data the proxy server
>> works well. If the commonly accessed data was larger I could use
>> multiple proxies and the layer 7 switch to direct specific traffic to
>> each proxy, however that box is about $30,000 which I would presume
>> is outside your range.
>
> Unfortunately, that sounds exactly like my problem (the commonly accessed
> data is larger), and you're right $30,000 is out of my range.

You might be able to use an open source/free load balancer instead...
But that assumes you have the skill to set up and install one, or can
find/hire someone to do it for you.

>> Without a load balancer I don't recommend using round robin DNS
>> pointing directly to servers. It works fine when pointed to a cluster
>> of load balancers because the load balancer can transparently handle
>> situations of server outages, but with round robin DNS while you get
>> clustering you have a poor mechanism for removing a server from the
>> cluster in the event of failure because of DNS caching. In this case
>> I recommend the previous model my client used.

This is true.  I've never used RR DNS myself, as it seems to have
undesirable side-effects if a host/network problem occurs...

>> FreeBSD and OpenBSD have firewall software called PF. PF has the
>> ability to redirect traffic going to a single IP address to a cluster
>> of IP addresses. In this scenario when you wish to remove a server
>> from the cluster you can remove it from the PF configuration and
>> there's no need to wait for DNS propagation or people's computers to
>> lookup the addresses after the DNS change. A simple server with a
>> single fast CPU and 1GB or possibly even 512 MB of memory should be
>> able to handle loads up to somewhere between 150 and 200 Mbit/sec of
>> traffic (my client now has a cluster of firewalls between two load
>> balancers).

You might look at UltraMonkey for a load balancer, or Pound (for a web
proxy without any caching), etc. and see if any of them (or other
available solutions) meet your needs.
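
If you do go the PF route described above, the redirect itself is only a
couple of lines.  A minimal pf.conf sketch (interface name and addresses
are placeholders):

    ext_if = "fxp0"   # external interface (placeholder)
    # Redirect incoming web traffic to a pool of servers, round-robin,
    # with each client stuck to the same backend:
    rdr on $ext_if proto tcp from any to any port 80 \
        -> { 10.0.0.11, 10.0.0.12, 10.0.0.13 } round-robin sticky-address

Pulling a box out of rotation is then just deleting its address from that
list and reloading the ruleset with "pfctl -f /etc/pf.conf".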

> How about the Apache module mod_backhand?  Together with
> the open-source software Wackamole, it seems like one can achieve
> high-availability and intelligent load-balancing without any extra
> hardware.  However I believe I am forced into Apache 1.3 and
> possibly FreeBSD for Wackamole.  I suppose instead of Wackamole
> I could use PF or LVS.

Actually, there is better proxying in the newer apache 2.x line...
Or you could go with squid or something else.  The problem with
apache or squid is that they are huge and take a lot of machine resources,
so they may become part of your bottleneck.  Hence the recommendations for
more light-weight proxy solutions.
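
If you do try the apache 2.x route, the reverse-proxy-with-disk-cache setup
is only a few directives.  A rough sketch for the 2.2 series (mod_proxy plus
mod_cache/mod_disk_cache; backend name and paths are placeholders):

    # Hypothetical httpd.conf fragment: reverse-proxy the file area and
    # cache responses on the proxy's own disk.
    ProxyRequests Off
    ProxyPass        /files/ http://backend.example.com/files/
    ProxyPassReverse /files/ http://backend.example.com/files/
    CacheEnable disk /files/
    CacheRoot /var/cache/apache2/proxy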

UltraMonkey that I mention above is based on LVS, by the way...

> I am interested in trying this architecture, however with the web servers
> caching nothing and every hit going to the NFS server, where do I go
> once the NFS box starts to struggle?

There isn't much you can do at that point, other than throwing money at the
NFS server.  GFS is one option that buys you a little more room (you can
export a GFS filesystem much like NFS).  With GFS, you can add more backend
GFS servers if needed, to split the load...
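
The re-export side is just ordinary NFS exporting on each GFS node,
something like this (network, options and path are made up):

    # Hypothetical /etc/exports line on a GFS node, re-exporting the
    # shared volume read-only to the web server subnet:
    /mnt/gfs  10.0.0.0/24(ro,async,no_subtree_check)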

> Given my resources, my current web server would most likely become
> the NFS server, and he is struggling at the moment with disk I/O during
> peak periods.

Then a non-caching proxy server won't help, unless it copies/stores the hot
files locally and serves them from local disk.

Keeping the copies of the hot files in sync across multiple servers can
become a hassle too, though, so that may not be optimal from a management
point of view, especially as the set of hot files tends to change over time...
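
If you do go that way, a cron'd rsync on each front-end box is about the
simplest way to keep the copies fresh (host and paths are placeholders):

    # Hypothetical crontab entry: refresh the local copy of the hot files
    # from the NFS box every 15 minutes; --delete drops files that are no
    # longer in the hot set on the server.
    */15 * * * *  rsync -a --delete nfsserver:/export/hotfiles/ /var/local/files/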


>> You should start planning now for when you will need hardware load
>> balancers. They are expensive, and doubly so if you want to load
>> balance a firewall (if you use the PF firewall just to load balance
>> traffic you can eliminate it if you go to a hardware load balancer).

I've not found anything a hardware load balancer can do that a free
open source one can't do at much lower cost, except of course offer
you on-site support/installation.

>> Hope this helps.
>
> Yes you've opened my eyes to a few things I knew nothing about :-)

Bathe in the light... ;)

> Thanks again
>
>> --
>> Michael Conlen

-- 
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!


