High load average on Ubuntu 7.10/8.04

Bruno Friedmann bruno at ioda-net.ch
Tue May 13 01:05:25 CDT 2008


Everything looks good so far, but we need to wait for the result of the long test.
Two things are worth a comment: the temperature is quite high (54°C); my own disk runs at around 36-42°C.
And the Reallocated_Event_Count of 1 indicates that sectors have been reallocated (this shouldn't be a problem).
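
If you want to keep an eye on just those values, something like the following might help. It is only a sketch: the helper name smart_raw is made up, the sample lines are copied from your output (rejoined onto one line each), and it assumes the usual ten-column attribute layout printed by smartctl -A.

```shell
#!/bin/sh
# smart_raw ID: print the RAW_VALUE column (10th field) of one SMART attribute.
# Reads "smartctl -A" style attribute lines on stdin, one attribute per line.
# (Hypothetical helper, not part of smartmontools.)
smart_raw() {
    awk -v id="$1" '$1 == id && NF >= 10 { print $10; exit }'
}

# Sample attribute lines taken from the smartctl output quoted in this thread.
sample='  5 Reallocated_Sector_Ct   0x0033   193   193   140    Pre-fail  Always       -       50
194 Temperature_Celsius     0x0022   096   059   000    Old_age   Always       -       54
196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1'

printf '%s\n' "$sample" | smart_raw 5     # reallocated sectors
printf '%s\n' "$sample" | smart_raw 194   # temperature in Celsius
printf '%s\n' "$sample" | smart_raw 196   # reallocation events

# On the live machine (as root) the input would come from the drive instead:
#   smartctl -A /dev/sda | smart_raw 194
```

Running it now and again after some heavy writes would show whether the reallocation counters are still moving.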

Perhaps you could run the diagnostic tools (from Dell, and the Western Digital drive fitness test).
And make a backup.
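
Since you say only writes crawl, a rough way to reproduce the problem on demand might be a timed dd write followed by a look at the load average. This is a sketch under assumptions: the helper name write_test is made up, it needs GNU dd (for conv=fsync) and a Linux /proc/loadavg, and the scratch file should live on the disk you suspect.

```shell
#!/bin/sh
# write_test FILE MIB: write MIB mebibytes of zeros to FILE and force them to
# disk before dd exits, so the page cache does not hide a slow drive.
# Prints dd's final summary line (bytes copied, time, throughput).
# (Hypothetical helper; assumes GNU dd for conv=fsync.)
write_test() {
    dd if=/dev/zero of="$1" bs=1048576 count="$2" conv=fsync 2>&1 | tail -n 1
}

# Example: write 16 MiB to a scratch file, then show the load averages.
scratch=$(mktemp)
write_test "$scratch" 16
cat /proc/loadavg
rm -f "$scratch"
```

A much larger count (a few GB, like your tarball) is probably needed before the load climbs; comparing the throughput figure with the other machine in your office could tell whether it is this drive or the platform.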


Andrew Carter wrote:
> Bruno Friedmann wrote:
>> Could you check your hard drive with smartctl -a /dev/sda?
>>
> 
> Here is the smartctl -a output:
> 
> smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Caviar SE (Serial ATA) family
> Device Model:     WDC WD2500JS-75NCB1
> Serial Number:    WD-WCANK2472851
> Firmware Version: 10.02E01
> User Capacity:    250,000,000,000 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   7
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Mon May 12 14:03:16 2008 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> See vendor-specific Attribute list for marginal Attributes.
> 
> General SMART Values:
> Offline data collection status:  (0x84)	Offline data collection activity
> 					was suspended by an interrupting command from host.
> 					Auto Offline Data Collection: Enabled.
> Self-test execution status:      (   0)	The previous self-test routine completed
> 					without error or no self-test has ever
> 					been run.
> Total time to complete Offline
> data collection: 		 (8280) seconds.
> Offline data collection
> capabilities: 			 (0x7b) SMART execute Offline immediate.
> 					Auto Offline data collection on/off support.
> 					Suspend Offline collection upon new
> 					command.
> 					Offline surface scan supported.
> 					Self-test supported.
> 					Conveyance Self-test supported.
> 					Selective Self-test supported.
> SMART capabilities:            (0x0003)	Saves SMART data before entering
> 					power-saving mode.
> 					Supports SMART auto save timer.
> Error logging capability:        (0x01)	Error logging supported.
> 					General Purpose Logging supported.
> Short self-test routine
> recommended polling time: 	 (   2) minutes.
> Extended self-test routine
> recommended polling time: 	 (  96) minutes.
> Conveyance self-test routine
> recommended polling time: 	 (   6) minutes.
> 
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0003   211   186   021    Pre-fail  Always       -       4441
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       513
>   5 Reallocated_Sector_Ct   0x0033   193   193   140    Pre-fail  Always       -       50
>   7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
>   9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10232
>  10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       336
> 190 Temperature_Celsius     0x0022   046   009   045    Old_age   Always   In_the_past 54
> 194 Temperature_Celsius     0x0022   096   059   000    Old_age   Always       -       54
> 196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
> 197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0
> 
> SMART Error Log Version: 1
> No Errors Logged
> 
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed without error       00%         0         -
> 
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
> 
> 
> 
>> And perhaps launch a long test:
>> smartctl -t long /dev/sda
>>
> 
> Test result for short:
> 
> === START OF READ SMART DATA SECTION ===
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed without error       00%     10234         -
> # 2  Short offline       Completed without error       00%         0         -
> 
> Running a long test now.
> 
> 
>> Your hard drive perhaps has some trouble inside.
>>
>> Running lspci -vv could help to identify the bus in use,
>> and the smartctl report would identify the hard drive completely.
>>
> 
> Here is the SATA portion of lspci:
> 
> 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller (rev 09) (prog-if 01 [AHCI 1.0])
> 	Subsystem: Dell Unknown device 01c1
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 0
> 	Interrupt: pin C routed to IRQ 20
> 	Region 0: I/O ports at fe00 [size=8]
> 	Region 1: I/O ports at fe10 [size=4]
> 	Region 2: I/O ports at fe20 [size=8]
> 	Region 3: I/O ports at fe30 [size=4]
> 	Region 4: I/O ports at fec0 [size=32]
> 	Region 5: Memory at ff970000 (32-bit, non-prefetchable) [size=1K]
> 	Capabilities: [70] Power Management version 2
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
> 		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [a8] #12 [0010]
> 
> 
> 
> One other thing to note - reads are fine. It only crawls on writes.
> Also, I think at least one other person in the office is hitting the
> same problem (running the same hardware except with less RAM).
> 
> Thanks,
> Andrew
> 
> 
>> Andrew Carter wrote:
>>> I've been dealing with a problem with my Dell Precision 490 for some
>>> time now. I've been using this machine since late last year at work. The
>>> problem is that the load average goes through the roof (around 10) when
>>> the disk is used heavily. For example, tar xvzf on a 4GB tarball
>>> brings the machine to its knees.
>>>
>>> I originally was running Ubuntu 7.10 64-bit. I recently did a clean
>>> upgrade to Ubuntu 8.04 64-bit. I've tried turning off all the compiz
>>> effects, killing as many other processes as I can but it always kills
>>> the machine when the disk is heavily accessed.
>>>
>>> I'm not exactly sure how to diagnose this problem. I'm thinking it could
>>> be a kernel problem, a driver problem for the SATA disk, or a firmware
>>> issue. I believe I have the newest Dell BIOS. I'm not sure about the
>>> whole kernel thing since I don't see widespread talk about it. And my
>>> newer OptiPlex at home hasn't shown any issues (and I mess with lots of
>>> large video and audio on it).
>>>
>>> Specs:
>>> Dell Precision 490
>>> BIOS A07
>>> 4GB RAM
>>> Intel Xeon 1.6GHz (2 dual cores)
>>> Ubuntu 8.04
>>> Kernel Linux 2.6.24-17-generic
>>> 250GB WD SATA hard drive
> 


-- 

     Bruno Friedmann



