High load average on Ubuntu 7.10/8.04
Andrew Carter
ascarter at gmail.com
Mon May 12 17:52:32 CDT 2008
Bruno Friedmann wrote:
> Could you check your hard drive with smartctl -a /dev/sda?
>
Here is the smartctl -a output:
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar SE (Serial ATA) family
Device Model: WDC WD2500JS-75NCB1
Serial Number: WD-WCANK2472851
Firmware Version: 10.02E01
User Capacity: 250,000,000,000 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Mon May 12 14:03:16 2008 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (8280) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  96) minutes.
Conveyance self-test routine
recommended polling time:        (   6) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   211   186   021    Pre-fail  Always       -       4441
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       513
  5 Reallocated_Sector_Ct   0x0033   193   193   140    Pre-fail  Always       -       50
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10232
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       336
190 Temperature_Celsius     0x0022   046   009   045    Old_age   Always   In_the_past 54
194 Temperature_Celsius     0x0022   096   059   000    Old_age   Always       -       54
196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%                0  -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
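The attribute table above is easier to scan for trouble with a small script. The following is a minimal Python sketch, assuming the 10-column attribute layout that smartctl prints here (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The function names and the SECTOR_COUNTERS watch list are hypothetical helpers, not part of smartmontools. It flags any attribute with a WHEN_FAILED entry, any normalized VALUE at or below a non-zero THRESH, and a non-zero raw count on the sector-remapping counters.

```python
from dataclasses import dataclass

# Hypothetical watch list: attributes whose raw value counts remapped or
# unreadable sectors, so any non-zero raw value is worth a closer look.
SECTOR_COUNTERS = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
                   "Offline_Uncorrectable"}


@dataclass
class SmartAttr:
    id: int
    name: str
    value: int
    worst: int
    thresh: int
    when_failed: str
    raw: str


def parse_attributes(text: str) -> list[SmartAttr]:
    """Parse the 10-column attribute table printed by `smartctl -a`."""
    attrs = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header, log entries and section titles are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append(SmartAttr(
            id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            when_failed=fields[8],
            raw=" ".join(fields[9:]),
        ))
    return attrs


def flag_attributes(attrs: list[SmartAttr]) -> list[tuple[str, str]]:
    """Return (name, reason) pairs for attributes that deserve attention."""
    flagged = []
    for a in attrs:
        raw_first = a.raw.split()[0]
        if a.when_failed != "-":
            flagged.append((a.name, f"WHEN_FAILED={a.when_failed}"))
        elif a.thresh > 0 and a.value <= a.thresh:
            flagged.append((a.name, f"VALUE {a.value} <= THRESH {a.thresh}"))
        elif a.name in SECTOR_COUNTERS and raw_first.isdigit() and int(raw_first) > 0:
            flagged.append((a.name, f"raw count {raw_first}"))
    return flagged


# A few rows copied from the output above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   193   193   140    Pre-fail  Always       -       50
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10232
190 Temperature_Celsius     0x0022   046   009   045    Old_age   Always   In_the_past 54
194 Temperature_Celsius     0x0022   096   059   000    Old_age   Always       -       54
"""

if __name__ == "__main__":
    for name, reason in flag_attributes(parse_attributes(SAMPLE)):
        print(f"{name}: {reason}")
```

On these rows it reports Reallocated_Sector_Ct (raw count 50) and attribute 190 (WHEN_FAILED In_the_past). That points at the drive's history. It does not by itself explain the slow writes.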
> And perhaps launch a long test:
> smartctl -t long /dev/sda
>
Result of the short self-test:
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%            10234  -
# 2  Short offline       Completed without error       00%                0  -
Running a long test now.
> Perhaps your hard drive has some trouble inside.
>
> An lspci -vv could help identify the bus in use,
> and the smartctl report would fully identify the hard drive.
>
Here is the SATA portion of lspci:
00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller (rev 09) (prog-if 01 [AHCI 1.0])
    Subsystem: Dell Unknown device 01c1
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 0
    Interrupt: pin C routed to IRQ 20
    Region 0: I/O ports at fe00 [size=8]
    Region 1: I/O ports at fe10 [size=4]
    Region 2: I/O ports at fe20 [size=8]
    Region 3: I/O ports at fe30 [size=4]
    Region 4: I/O ports at fec0 [size=32]
    Region 5: Memory at ff970000 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [70] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [a8] #12 [0010]
One other thing to note - reads are fine. It only crawls on writes.
Also, I think at least one other person in the office is hitting the
same problem (running the same hardware except with less RAM).
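To put a number on "reads are fine, writes crawl", a timed write that ends with fsync() is a reasonable first measurement, because the fsync() forces the data out of the page cache and onto the disk. A minimal Python sketch, assuming a scratch directory on the affected disk; the function name and default sizes are arbitrary choices for illustration, not something from this thread:

```python
import os
import tempfile
import time
from typing import Optional


def write_throughput_mb_s(size_mb: int = 256, block_kb: int = 1024,
                          directory: Optional[str] = None) -> float:
    """Write size_mb of zeros to a temporary file, fsync it, and return MB/s.

    The fsync() is inside the timed region, so the figure reflects data
    handed to the disk rather than data parked in the page cache.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = max(1, size_mb * 1024 // block_kb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            start = time.perf_counter()
            for _ in range(blocks):
                f.write(block)
            f.flush()               # push Python's buffer to the kernel
            os.fsync(f.fileno())    # wait until the kernel has written it out
            elapsed = time.perf_counter() - start
    finally:
        os.unlink(path)             # always remove the scratch file
    written_mb = blocks * block_kb / 1024
    return written_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    print(f"{write_throughput_mb_s():.1f} MB/s")
```

Passing a directory on /dev/sda and comparing the result with the same run on another machine (such as the OptiPlex mentioned in the original message) gives a rough baseline.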
Thanks,
Andrew
>
> Andrew Carter wrote:
>> I've been dealing with a problem with my Dell Precision 490 for some
>> time now. I've been using this machine since late last year at work. The
>> problem is that the load average goes through the roof (around 10) when
>> the disk is used heavily. For example, tar xvzf on a 4GB tarball
>> brings the machine to its knees.
>>
>> I originally was running Ubuntu 7.10 64-bit. I recently did a clean
>> upgrade to Ubuntu 8.04 64-bit. I've tried turning off all the compiz
>> effects and killing as many other processes as I can, but it always kills
>> the machine when the disk is heavily accessed.
>>
>> I'm not exactly sure how to diagnose this problem. I'm thinking it could
>> be a kernel problem, a driver problem for the SATA disk, or a firmware
>> issue. I believe I have the newest Dell BIOS. I'm not sure about the
>> whole kernel thing, since I don't see widespread talk about it. And my
>> newer OptiPlex at home hasn't shown any issues (and I mess with lots of
>> large video and audio on it).
>>
>> Specs:
>> Dell Precision 490
>> BIOS A07
>> 4GB RAM
>> Intel Xeon 1.6GHz (2 dual cores)
>> Ubuntu 8.04
>> Kernel Linux 2.6.24-17-generic
>> 250GB WD SATA hard drive
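On Linux the load average counts not only runnable tasks but also tasks in uninterruptible sleep (state D), which is typically where processes sit while waiting on disk I/O. A load of around 10 during a large tar extraction therefore does not necessarily mean the CPUs are busy. A minimal Python sketch, assuming a Linux /proc filesystem, that reads /proc/loadavg and tallies process states from /proc/<pid>/stat; the function names are hypothetical, and it counts processes rather than individual threads:

```python
import os
from collections import Counter


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def stat_state(stat_line: str) -> str:
    """Extract the one-letter state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain spaces
    or ')', so the state is read from just after the last ')'.
    """
    return stat_line[stat_line.rindex(")") + 2]


def task_states(proc: str = "/proc") -> Counter:
    """Count processes by state letter (R = running, D = uninterruptible, ...)."""
    counts = Counter()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                counts[stat_state(f.read())] += 1
        except OSError:
            continue  # the process exited between listdir() and open()
    return counts


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("load average:", parse_loadavg(f.read()))
    states = task_states()
    print("running (R):", states["R"], " uninterruptible (D):", states["D"])
```

If most of the load during the tar run shows up as D-state processes rather than R, the bottleneck is I/O rather than CPU.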