Feasibility of VERY large ext3 file system?

Seth Mos knuffie at xs4all.nl
Wed Dec 4 07:22:01 CST 2002

At 22:27 4-12-2002 +1000, jason andrade wrote:
>On Wed, 4 Dec 2002, Basil Hussain wrote:
>You are a brave admin.  We salute you (and unfortunately, will probably laugh
>behind your back if it all falls over with much muttering of "we told you 
>so" :-)

hahahaha... ehm... sorry.

>i have seen people demonstrating 1Tbyte ext3 filesystems - i'm sure there are
>larger ones out there.  we are also looking at arrays in the 2Tb range but
>plan to break it up into 500G chunks for various reasons.  we have a 550G
>ext3 filesystem now and so far it has performed well.

2TB is a limit of the current 2.4 block device layer, even if the 
filesystem itself can be larger.
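For illustration, the 2TB figure falls straight out of the arithmetic: the 
2.4 block layer addresses devices with a 32-bit count of 512-byte sectors:

```shell
# 32-bit sector index * 512-byte sectors = the 2.4 block device ceiling
sectors=$((2**32))                 # maximum addressable sectors
bytes=$((sectors * 512))
echo "$((bytes / 1024**4)) TiB"    # prints: 2 TiB
```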

>a good alternative to ext3 might be XFS instead (though i don't personally
>recommend it unless you know what you're doing..) or perhaps reiserfs.

Depends on the file size. If you have multi-megabyte or even gigabyte 
files, XFS will do that very nicely. XFS internally supports sizes up to 
2^63, but because of current 2.4 constraints this is capped at 2^31 for 
the moment (to prevent bad things from happening :-)

The largest production XFS filesystem I know of is 1.6TB, I believe. They 
even have a number of those.
Quantum also uses XFS in one of its NAS boxes.

I tried reiserfs once (a long time ago, 3.5) and won't use it anymore. 
Crappy repair tools were one of the problems, and they had long-lasting 
NFS problems as well. (I believe those are fixed these days.)

> > * The data stored would be organised in a directory hierarchy only one
> > level deep. How would ext3 cope with, say, 4000-5000 directories off the
> > file system's root?
>no issues with that.

It works, but an indexed tree is faster. I am not sure whether ext3 does 
this, but I do remember patches for ext2 that make lookups in large 
directories faster. You might want this.
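The patches in question became the ext2/ext3 directory indexing (htree) 
feature. On a tree where it is available, enabling it would look roughly 
like this -- the device name is a placeholder, and stock 2.4 distributions 
may not ship a kernel or e2fsprogs that support it yet:

```shell
# /dev/sdb1 is a placeholder; dir_index needs htree-capable
# kernel and e2fsprogs builds.
tune2fs -O dir_index /dev/sdb1
# Rebuild existing directories as hashed trees:
e2fsck -fD /dev/sdb1
```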

> > * How much space is lost due to journaling? Basically, given a gigabyte of
> > disk, what is the available formatted space?
>very little in comparison to the available space.  journal sizes don't
>grow (from what i understand) in direct proportion to increased disk.

Depending on the type of filesystem you are using, the journal is either a 
statically allocated amount of space, dynamically allocated, or even a 
wandering log.
XFS allocates it at mkfs time, and the largest size most people use is 
32MB. Remember that most journaling filesystems store only metadata in the 
log (except ext3 in data journaling mode), so 32MB of metadata updates is 
quite a lot.
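As a sketch of fixing the log size at mkfs time (the device name is 
illustrative):

```shell
# -l size pins the XFS log at 32MB when the filesystem is created.
mkfs.xfs -l size=32m /dev/sdc1
```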

ReiserFS 3.x uses a statically allocated log, and they are planning 
wandering logs for v4. That would also make it more of a snapshot-based 
fs, which might prove interesting (and fast).

> > If anyone could offer some advice, anecdotes, etc. on running *large* file
> > systems using ext3, I would be most grateful.
>do you _have_ to have a single filesystem ?  how are you backing it up ?
>can everything be down when you need to fsck it and it takes 1-2 hours
>to do so ?

Although journaling will give you filesystem integrity, it will not give 
you data integrity. It avoids the otherwise-necessary fsck at 
startup/mount time, but if things go belly-up for one reason or another, 
checking a really large filesystem _will_ take a long time, no matter what 
filesystem you are running. A bad SCSI cable can wreak havoc on a 
filesystem.

Then again, if you really must work with 200GB+ databases, that won't be 
an option, will it?
If the individual files are not that large, you can always mount a number 
of smaller filesystems elsewhere and start symlinking directories like 
mad, as you see on many ftp servers.
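A sketch of that symlink trick, with made-up devices and mount points:

```shell
# Two smaller filesystems mounted off to the side...
mount /dev/sdb1 /vol0
mount /dev/sdc1 /vol1
# ...stitched into one apparent ftp tree with symlinks:
ln -s /vol0/debian /ftp/pub/debian
ln -s /vol1/redhat /ftp/pub/redhat
```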


It might just be your lucky day, if you only knew.

More information about the Linux-PowerEdge mailing list