Feasibility of VERY large ext3 file system?

Basil Hussain basil.hussain at kodakweddings.com
Wed Dec 4 10:14:01 CST 2002


Thanks everyone for the replies.

I shall explain in a bit more detail what we plan to do with all this
storage before I address comments people have given.

This storage array will be used as on-line storage for a large image
archive. It is not going to be used as permanent storage, just as a 'holding
area'. The data flow will be as follows:

1) Images scanned into storage from negatives.
2) Images proofed and corrected.
3) Images written to CD-R for permanent storage (alongside negatives).
4) Images kept in on-line storage for 3 months, then deleted.
5) If images are required after 3 months, the CD archive copy is used.
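
Step 4 amounts to a periodic clean-up job. A minimal sketch of the selection
logic, assuming that "3 months" is approximated as 90 days and that a scan
date is recorded for each batch (the function name and the dictionary of
scan dates are hypothetical, purely for illustration):

```python
from datetime import date, timedelta

# Assumption: "3 months" is approximated as 90 days.
RETENTION_DAYS = 90


def expired_batches(scan_dates, today):
    """Return the batch references whose scan date is past the retention window.

    scan_dates maps a batch reference number to the date it was scanned.
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return sorted(ref for ref, scanned in scan_dates.items() if scanned < cutoff)


if __name__ == "__main__":
    example = {34680: date(2002, 8, 1), 54715: date(2002, 11, 20)}
    print(expired_batches(example, date(2002, 12, 4)))
```

Only batches already written to CD-R (step 3) should ever reach this stage,
so deleting them from on-line storage loses nothing permanent.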

Each batch of images will consist of up to ~200 files, each around 2-4MB.
Stages 1-3 will happen during the same day.

On the issue of integrity/security, neither being able to back up this
storage array nor maintaining 100% data integrity is an absolute requirement,
as we have copies off-line and the original negatives. However, if it all
goes down, having to retrieve 4,000 batches of images from their CDs doesn't
have much appeal! So, we would like to minimise the chances of data
corruption, but don't need to eliminate the possibility.

Thinking about this, as well as the 2TB block device limit, leads me to
consider using the option to specify up to 8 logical storage units (LUNs) on
the RAID array we will most likely be purchasing. If I'm interpreting its
documentation right, it will present each LUN as a separate SCSI 'device'.

If I use, say, 5 LUNs with an ext3 file system on each, then this will not
only get around block device and inode limits (only about 320GB in 160,000
files on each LUN, with 1.6TB total) but will also increase data integrity.
If one file system gets corrupted, then the other 4 will still be okay, and
we will only have to resort to our off-line CD copies for some of the data.
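
The sizing can be checked with a little arithmetic. This is a rough sketch
only, assuming 4,000 batches on-line at once, 200 files per batch, 2-4MB per
file, an even spread over 5 LUNs, and decimal units (1GB = 1000MB); the
function and parameter names are just for illustration:

```python
def estimate(batches=4000, files_per_batch=200, file_mb=(2, 4), luns=5):
    """Estimate file counts and capacity (in GB) for the whole array and per LUN."""
    total_files = batches * files_per_batch
    # Low and high totals in GB, taking 1GB = 1000MB (an assumption).
    total_gb = tuple(total_files * mb // 1000 for mb in file_mb)
    return {
        "total_files": total_files,
        "files_per_lun": total_files // luns,
        "total_gb": total_gb,
        "gb_per_lun": tuple(gb // luns for gb in total_gb),
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(name, value)
```

The 160,000 files per LUN and 1.6TB total correspond to the 2MB lower bound.
If files average closer to 4MB, the total doubles to about 3.2TB.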

Does this sound sensible?

On the issue of huge directories, it has been suggested to me by a colleague
that we could employ some kind of indexed directory structure. Each batch of
images will have a unique reference number, which will make it easy, I
think. For example, say we have five numbers: 34680, 54715, 27301, 12789,
98302. The directory structure might look something like this:

	0/8/6/4/3/	<files for 34680>
	1/0/3/7/2/	<files for 27301>
	2/0/3/8/9/	<files for 98302>
	5/1/7/4/5/	<files for 54715>
	9/8/7/2/1/	<files for 12789>

Why represent the numbers backwards? Because they might not always be five
digits long, whereas every number has a last digit to start the tree from. I
think this should negate any directory index size/speed issues. What does
everyone else think?
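
As a rough sketch of how a reference number would map to a directory,
assuming one directory level per digit, taken in reverse order (the function
name is hypothetical):

```python
def batch_dir(ref):
    """Return the relative directory for a batch reference number.

    The digits are reversed and each one becomes a directory level,
    e.g. 34680 -> "0/8/6/4/3".
    """
    if not isinstance(ref, int) or ref < 0:
        raise ValueError("batch reference must be a non-negative integer")
    return "/".join(reversed(str(ref)))


if __name__ == "__main__":
    for ref in (34680, 54715, 27301, 12789, 98302):
        print(ref, batch_dir(ref))
```

No directory ever holds more than ten subdirectories plus one batch of files,
and numbers of any length work without zero-padding.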

This also ties in quite nicely with my idea of using 5 LUNs on the RAID
array to present 5 different SCSI 'devices' to the Linux server. I could
place an ext3 file system on each device and distribute the top level
numbered directories among the file systems - e.g. 0/1 on LUN0, 2/3 on LUN1,
4/5 on LUN2, etc.
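
A minimal sketch of that distribution, assuming two top-level digits per LUN
and the reversed-digit layout described above. The mount points (/mnt/lun0
to /mnt/lun4) and the function names are hypothetical:

```python
def lun_for_batch(ref, digits_per_lun=2):
    """Return the LUN index for a batch: top-level digits 0/1 -> 0, 2/3 -> 1, etc."""
    top_level_digit = ref % 10  # the last digit is the top-level directory
    return top_level_digit // digits_per_lun


def batch_path(ref, mount_prefix="/mnt/lun"):
    """Return the full path for a batch, e.g. 54715 -> /mnt/lun2/5/1/7/4/5.

    mount_prefix is a made-up mount point naming scheme, not a real convention.
    """
    subdirs = "/".join(reversed(str(ref)))
    return f"{mount_prefix}{lun_for_batch(ref)}/{subdirs}"


if __name__ == "__main__":
    for ref in (34680, 54715, 27301, 12789, 98302):
        print(ref, batch_path(ref))
```

If the last digits of the reference numbers are evenly spread, each LUN
should receive roughly a fifth of the batches.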

On the subject of alternative file systems, they do have some appeal. The
large file handling capabilities of ReiserFS or XFS would be great if we
ever need to store large files. But, there are drawbacks. I'd rather not
have to fiddle around using special kernels or patching this and that. I'd
rather try to accomplish all this using a stock Red Hat kernel.


Basil Hussain
Internet Developer, Kodak Weddings
E-Mail: basil.hussain at kodakweddings.com

More information about the Linux-PowerEdge mailing list