Scaling my file server with my audience

William Warren hescominsoon at emmanuelcomputerconsulting.com
Thu Jan 18 16:56:38 CST 2007


You can easily limit Squid's memory usage so that large files
aren't a concern if they aren't the primary thing you are trying to cache.
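
For example, a few lines in squid.conf along these lines keep the memory
cache bounded and keep huge objects out of the cache entirely (the sizes
below are only placeholders to tune against your own hot set):

    # cap the in-memory object cache
    cache_mem 512 MB
    # don't keep large objects in RAM at all
    maximum_object_size_in_memory 64 KB
    # refuse to cache anything larger than this, on disk or in memory
    maximum_object_size 16 MB
    # modest on-disk cache: roughly 20 GB under /var/spool/squid
    cache_dir ufs /var/spool/squid 20000 16 256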

Colin Dermott wrote:
> On 1/19/07, Michael E. Conlen <meconlen at obfuscated.net> wrote:
>> Colin,
> 
> Hi Michael.  Thanks for your very in-depth reply!
> 
>> A reverse proxy may not be efficient for your setup. If the set of
>> files commonly accessed is large then the proxy will run out of
>> memory easily. The disk cache for the proxy can only be so large
>> before memory becomes a concern.
> 
> Interesting.  A pessimistic estimate of my hot set is around 250GiB
> currently and the entire set is close to 4TiB.
> 
>> I use the following for one client
>>
>> 1 NFS server with large disk (2 TB raw)
>> 13 web servers that mount the NFS server
>> 1 proxy server which handles a subset of data that is commonly accessed
>> 1 layer 7 switch (foundry load balancer).
>>
>> With this I handle 250 Mbit/sec of peak traffic (180 Mbit/sec
>> average) across over 500 GB of data; the median file size is 28k,
>> which works out to about 70 million hits a day.
> 
> I'm interested in this set up.  I don't have much experience with NFS,
> however.  Do the NFS clients cache to local disk, or will the web
> servers hit the NFS server for each file request?  As it stands now,
> my disk array is the bottleneck.  I have also exhausted the upgrade
> path for RAM in my machine at 12GB.
> 
> The median file size in my hot set would be around 50-100MiB,
> possibly more as the most popular files seem to be game (WoW)
> recordings (can be several hundred MiB) and Linux ISO downloads.
> 
> The median of my entire set would be much smaller - say 3MiB.
> 
>> Because of the small file size and the fact that the majority of the
>> traffic is handled by a small subset of the data the proxy server
>> works well. If the commonly accessed data was larger I could use
>> multiple proxies and the layer 7 switch to direct specific traffic to
>> each proxy, however that box is about $30,000 which I would presume
>> is outside your range.
> 
> Unfortunately, that sounds exactly like my problem (the commonly accessed
> data is larger), and you're right $30,000 is out of my range.
> 
>> Without a load balancer I don't recommend using round robin DNS
>> pointing directly to servers. It works fine when pointed to a cluster
>> of load balancers because the load balancer can transparently handle
>> situations of server outages, but with round robin DNS, while you get
>> clustering, you have a poor mechanism for removing a server from the
>> cluster in the event of a failure because of DNS caching. In this case
>> I recommend the previous model my client used.
>>
>> FreeBSD and OpenBSD have firewall software called PF. PF has the
>> ability to redirect traffic going to a single IP address to a cluster
>> of IP addresses. In this scenario when you wish to remove a server
>> from the cluster you can remove it from the PF configuration and
>> there's no need to wait for DNS propagation or people's computers to
>> lookup the addresses after the DNS change. A simple server with a
>> single fast CPU and 1GB or possibly even 512 MB of memory should be
>> able to handle loads up to somewhere between 150 and 200 Mbit/sec of
>> traffic (my client now has a cluster of firewalls between two load
>> balancers).
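
A minimal pf.conf sketch of that redirect-to-cluster idea -- the interface
name and addresses here are invented, not taken from Michael's setup:

    web_servers = "{ 10.0.0.11, 10.0.0.12, 10.0.0.13 }"
    ext_if = "fxp0"

    # Spread incoming port 80 traffic across the pool. To pull a box out
    # of service, delete it from the list and reload: pfctl -f /etc/pf.conf
    rdr on $ext_if proto tcp from any to ($ext_if) port 80 \
        -> $web_servers round-robin
    pass in on $ext_if proto tcp from any to $web_servers port 80 keep state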
> 
> How about the Apache module mod_backhand?  Together with
> the open-source software Wackamole, it seems like one can achieve
> high-availability and intelligent load-balancing without any extra
> hardware.  However I believe I am forced into Apache 1.3 and
> possibly FreeBSD for Wackamole.  I suppose instead of Wackamole
> I could use PF or LVS.
> 
>> With this you could consider one NFS server which contains the disks
>> for the data. This server should be robust. Performance is affected
>> by operations more than throughput (it can serve one large file far
>> more easily than many small files). Large directories, and the use of
>> .htaccess files rather than placing the directives in httpd.conf, can
>> cause performance issues. A high-end RAID card, or better yet a RAID
>> box that performs the RAID functions in the disk array rather than on
>> the SCSI card, is optimal.
>> Get plenty of memory for a disk cache.
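
Two concrete bits of that advice, with invented paths and hostnames: export
the content read-only where you can, and disable .htaccess lookups so Apache
doesn't stat its way up the directory tree over NFS on every request:

    # /etc/exports on the NFS server (Linux nfs-utils syntax)
    /export/www  web1(ro,no_subtree_check) web2(ro,no_subtree_check)

    # httpd.conf on each web server
    <Directory "/var/www/htdocs">
        AllowOverride None
    </Directory>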
> 
> By this, do you mean the cache in the RAID box/card, or in the OS, or
> both?
> 
>> Three to five web servers. Memory is less of a concern and CPU is
>> more of a concern because with NFS (unless you go to v4, which may
>> not be ready for prime time) each operation goes to the NFS server,
>> so commonly accessed files are no longer cached in the web server's
>> memory. You want to have at least some headroom so that you can take
>> a server down for maintenance and still have enough servers to handle
>> the traffic.
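
On the client side, a read-only TCP NFSv3 mount with a long attribute-cache
timeout cuts down on GETATTR chatter to the NFS server; the server name,
path, and values here are only illustrative:

    # /etc/fstab on each web server
    nfs01:/export/www  /var/www/htdocs  nfs  ro,tcp,nfsvers=3,hard,intr,noatime,rsize=32768,wsize=32768,actimeo=60  0 0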
>>
>> One PF firewall. Fast machine, modest amounts of memory. With this
>> you can round robin to the servers and manage which servers get
>> traffic quickly.
> 
> I am interested in trying this architecture, however with the web servers
> caching nothing and every hit going to the NFS server, where do I go
> once the NFS box starts to struggle?
> 
> Given my resources, my current web server would most likely become
> the NFS server, and it is already struggling with disk I/O during
> peak periods.
> 
>> You should also consider a management server on which you can run
>> applications like cricket and nagios or big brother. You can also use
>> this machine to store all configuration data and perform day to day
>> operations. Once you go beyond one or two servers, having software
>> that can help you monitor your environment becomes critical, because
>> keeping five to ten ssh logins open with top running on them all
>> the time becomes problematic (and this can happen if you're
>> attempting to find a bottleneck or spot a problem somewhere on the
>> cluster).
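
As a taste of what that looks like, a bare-bones Nagios host/service pair;
the names and address are hypothetical, and it assumes the stock
generic-host/generic-service templates and check_http command are defined:

    define host {
        use        generic-host
        host_name  web1
        address    10.0.0.11
    }

    define service {
        use                  generic-service
        host_name            web1
        service_description  HTTP
        check_command        check_http
    }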
> 
> This is a good idea and something I will definitely look into.  Currently
> I am using munin locally on the same machine.
> 
>> You should start planning now for when you will need hardware load
>> balancers. They are expensive, and doubly so if you want to load
>> balance a firewall (if you use the PF firewall just to load balance
>> traffic you can eliminate it if you go to a hardware load balancer).
>>
>> If you decide to go with a Squid proxy you want a single processor
>> machine with as much memory as possible. I haven't tried squid on
>> ia64 platforms to see how much memory it can handle, but I believe
>> it's been able to take advantage of 64 bit platforms since the days
>> of the DEC Alpha. I've gone without using a disk cache at all as any
>> sort of large disk cache will consume a lot of memory just in the
>> indexing. With a disk cache I was actually left with little memory
>> for a memory cache. On the other hand by using a memory cache only I
>> was able to use about half the memory for storing objects. Object
>> delivery times are reduced greatly and the proxy can handle large
>> amounts of traffic.
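
(For scale: the usual Squid rule of thumb is roughly 10 MB of index RAM per
GB of disk cache on 32-bit builds, and more with 64-bit pointers, so a disk
cache sized for a 250 GB hot set would spend several GB of RAM on the index
alone before caching a single object in memory.)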
> 
> What you say about the disk cache is interesting.  The main reason
> I guessed at 2x36GB 15KRPM disks in RAID-0 for the/each proxy server was
> because of the size of my hot set.
> 
>> The biggest CPU hog for the proxy is objects that expire frequently.
>> Every connection to the web server for an If-Modified-Since (IMS)
>> request is expensive, so that traffic is sent directly to the origin
>> servers; you still need enough web servers to handle CGI or PHP
>> scripts, refresh the caches, and process IMS requests. By using a
>> layer 7 switch I send PHP or CGI requests directly to the origin
>> servers, so the proxy doesn't have the overhead of processing those.
> 
> Ah.  Fortunately I don't have any PHP or CGI to worry about.  This is
> purely pushing files :-)
> 
>> Hope this helps.
> 
> Yes you've opened my eyes to a few things I knew nothing about :-)
> 
> Thanks again
> 
>> --
>> Michael Conlen
> 
> [snip]
> 

-- 
My "Foundation" verse:
Isa 54:17  No weapon that is formed against thee shall prosper; and 
every tongue that shall rise against thee in judgment thou shalt 
condemn. This is the heritage of the servants of the LORD, and their 
righteousness is of me, saith the LORD.

-- carpe ductum -- "Grab the tape"
CDTT (Certified Duct Tape Technician)

Linux user #322099
Machines:
206822
256638
276825
http://counter.li.org/


