High load average on Ubuntu 7.10/8.04

Andrew Carter ascarter at gmail.com
Mon May 12 17:52:32 CDT 2008


Bruno Friedmann wrote:
> Could you check your hard drive with smartctl -a /dev/sda?
> 

Here is the smartctl -a output:

smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar SE (Serial ATA) family
Device Model:     WDC WD2500JS-75NCB1
Serial Number:    WD-WCANK2472851
Firmware Version: 10.02E01
User Capacity:    250,000,000,000 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon May 12 14:03:16 2008 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		 (8280) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  96) minutes.
Conveyance self-test routine
recommended polling time: 	 (   6) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   211   186   021    Pre-fail  Always       -       4441
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       513
  5 Reallocated_Sector_Ct   0x0033   193   193   140    Pre-fail  Always       -       50
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10232
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       336
190 Temperature_Celsius     0x0022   046   009   045    Old_age   Always   In_the_past 54
194 Temperature_Celsius     0x0022   096   059   000    Old_age   Always       -       54
196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
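
For anyone skimming the attribute table, here is a minimal Python sketch that picks out the rows that look worth a second glance. It is not part of smartmontools; the function name `flag_attributes` and the `WATCHED` set are my own assumptions, and it expects each attribute row on a single unwrapped line, as smartctl prints it on a terminal. The sample rows are copied from the report above.

```python
# Hypothetical helper, not part of smartmontools: flag suspicious rows in
# the attribute table printed by `smartctl -a`.
# Assumed watch list: sector-health counters whose raw value should be 0.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

def flag_attributes(text):
    """Return (name, reason) pairs for attribute rows worth a closer look."""
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        # A complete row has 10 columns and starts with the numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, when_failed, raw = fields[1], fields[8], fields[9]
        if when_failed != "-":
            # The normalized value is, or once was, at or below THRESH.
            flagged.append((name, "WHEN_FAILED=" + when_failed))
        elif name in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged.append((name, "raw value " + raw))
    return flagged

SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   193   193   140    Pre-fail  Always       -       50
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10232
190 Temperature_Celsius     0x0022   046   009   045    Old_age   Always   In_the_past 54
196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
"""

for name, reason in flag_attributes(SAMPLE):
    print(name, reason)
```

On the sample rows this reports Reallocated_Sector_Ct (raw value 50), the first Temperature_Celsius row (WHEN_FAILED is In_the_past), and Reallocated_Event_Count (raw value 1). Whether any of these explains the slow writes is a separate question; the sketch only narrows down which rows to read.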



> And perhaps launch a long test
> smartctl -t long /dev/sda
> 

Test result for short:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10234         -
# 2  Short offline       Completed without error       00%         0         -

Running a long test now.


> Your hard drive perhaps has some trouble inside.
> 
> An lspci -vv could help to identify the bus in use,
> and the smartctl report to identify the hard drive completely.
> 

Here is the SATA portion of lspci:

00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller (rev 09) (prog-if 01 [AHCI 1.0])
	Subsystem: Dell Unknown device 01c1
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0
	Interrupt: pin C routed to IRQ 20
	Region 0: I/O ports at fe00 [size=8]
	Region 1: I/O ports at fe10 [size=4]
	Region 2: I/O ports at fe20 [size=8]
	Region 3: I/O ports at fe30 [size=4]
	Region 4: I/O ports at fec0 [size=32]
	Region 5: Memory at ff970000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [70] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a8] #12 [0010]



One other thing to note - reads are fine. It only crawls on writes.
Also, I think at least one other person in the office is hitting the
same problem (running the same hardware except with less RAM).
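
To put a number on "crawls on writes", something like the following Python sketch could be run on each affected machine and the results compared. It is only a rough probe under my own assumptions: the function name `timed_write`, the chunk size, and the choice to fsync once at the end are all arbitrary, and it is no substitute for a proper benchmark tool.

```python
import os
import tempfile
import time

def _load1():
    """1-minute load average, or None where the platform cannot report it."""
    try:
        return os.getloadavg()[0]
    except (OSError, AttributeError):
        return None

def timed_write(directory, total_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes to a scratch file in directory, fsync, and time it."""
    chunk = b"\0" * chunk_bytes
    load_before = _load1()
    written = 0
    start = time.monotonic()
    # The scratch file is deleted automatically when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reaches the disk
        elapsed = max(time.monotonic() - start, 1e-9)
    return {
        "bytes": written,
        "seconds": elapsed,
        "mb_per_s": written / 1e6 / elapsed,
        "load_before": load_before,
        "load_after": _load1(),
    }
```

For example, `timed_write("/tmp", 1024 ** 3)` writes 1 GiB under /tmp and returns the elapsed time, the throughput in MB/s, and the 1-minute load average before and after. The load average is a smoothed figure, so it lags behind the actual disk activity; a long write shows the effect more clearly than a short one.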

Thanks,
Andrew


> 
> Andrew Carter wrote:
>> I've been dealing with a problem with my Dell Precision 490 for some
>> time now. I've been using this machine since late last year at work. The
>> problem is that the load average goes through the roof (around 10) when
>> the disk is used heavily. For example, tar xvzf on a 4GB tarball
>> brings the machine to its knees.
>>
>> I originally was running Ubuntu 7.10 64-bit. I recently did a clean
>> upgrade to Ubuntu 8.04 64-bit. I've tried turning off all the compiz
>> effects, killing as many other processes as I can but it always kills
>> the machine when the disk is heavily accessed.
>>
>> I'm not exactly sure how to diagnose this problem. I'm thinking it could
>> be a kernel problem, a driver problem for the SATA disk, or a firmware
>> issue. I believe I have the newest Dell BIOS. I'm not sure about the
>> whole kernel thing since I don't see widespread talk about it. And my
>> newer OptiPlex at home hasn't shown any issues (and I mess with lots of
>> large video and audio on it).
>>
>> Specs:
>> Dell Precision 490
>> BIOS A07
>> 4GB RAM
>> Intel Xeon 1.6GHz (2 dual cores)
>> Ubuntu 8.04
>> Kernel Linux 2.6.24-17-generic
>> 250GB WD SATA hard drive


