Bug 160315 - System Hangs when trying to write data to full filesystem
Summary: System Hangs when trying to write data to full filesystem
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: athlon
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Peter Staubach
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-06-14 11:53 UTC by Kieran Foley
Modified: 2007-11-30 22:07 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-06-20 19:19:08 UTC


Attachments:

Description Kieran Foley 2005-06-14 11:53:46 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

Description of problem:
We have a cronjob that runs nightly which copies files/data from one filesystem to another. Both of these filesystems are configured using Hitachi storage, connected through a SAN. They are mounted on logical volumes using LVM.
On two occasions, the file system we are copying data to has filled up to 100%. The cron job keeps trying to write data to it, but because it is full the system hangs, and the only way we can get on it and kill the jobs is to power-cycle the server. We have since grown the file systems and hopefully this will not happen again, but I'd like to know whether it is normal for the server to hang in this situation and, if not, what we can do to prevent a recurrence.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Fill a file system to 100%.
2. Continuously try to write data to the full file system.
3. The system hangs.
  

Actual Results:  System hangs.

Additional info:

Comment 1 Bill Nottingham 2005-06-14 16:10:27 UTC
Assigning to kernel.

Kieran: please reset the arch to something that accurately reflects which OS
version you're running (it's currently set to sparc.)


Comment 2 Kieran Foley 2005-06-14 17:21:34 UTC
Hi Bill,

The OS is Linux 

[root@recordme root]# cat /etc/redhat-release 
Red Hat Enterprise Linux ES release 3 (Taroon)

[root@recordme root]# uname -a
Linux recordme 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:31:21 EDT 2003 i686 athlon i386 GNU/Linux

The hardware is a Sun Fire V20.

Comment 4 Peter Staubach 2005-06-20 14:27:59 UTC
It doesn't seem like something as simple as filling up a file system should
cause a system hang.

I tried to reproduce this by creating a 14G file system, comfortably larger
than the 1G memory on the system, and then running a program which continually
tries to write a 64K buffer to a file.  Eventually, the file system fills up.

It was quite slow at times while the file and file system were filling up, but
I haven't been able to reproduce a hang yet.  The test case just gets an
ENOSPC error each time that it attempts to write the buffer.

So, I need some more information.  Does this happen every time that the file
system fills up?  Are there any messages in the dmesg(8) output?  How is the
cron job copying the files?
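Comment 4 describes the test program but does not include its source. A minimal C sketch of the same idea, assuming a plain write(2) loop; the function names open_for_fill and fill_until_enospc, and the max_enospc bound that makes the loop terminate, are hypothetical and not taken from the original test case.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE (64 * 1024) /* 64K per write, matching the test described in comment 4 */

/* Hypothetical helper: open (creating or truncating) the target file. */
int open_for_fill(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}

/*
 * Write a 64K buffer to fd over and over, the way the test program in
 * comment 4 is described. On a full file system, write(2) is expected to
 * fail with ENOSPC rather than hang. The max_enospc bound is a
 * hypothetical addition so the sketch terminates; the original test kept
 * retrying.
 *
 * Returns the total number of bytes written, or -1 if an error other
 * than ENOSPC or EINTR occurred. *enospc_count receives the number of
 * ENOSPC failures observed.
 */
long long fill_until_enospc(int fd, int max_enospc, int *enospc_count)
{
    static char buf[BUF_SIZE];
    long long total = 0;

    memset(buf, 'x', sizeof buf);
    *enospc_count = 0;

    while (*enospc_count < max_enospc) {
        ssize_t n = write(fd, buf, sizeof buf);
        if (n >= 0) {
            total += n; /* short writes are possible as space runs out */
        } else if (errno == ENOSPC) {
            (*enospc_count)++; /* expected once the file system is full */
        } else if (errno != EINTR) {
            return -1; /* some other failure: report it to the caller */
        }
        /* EINTR: interrupted by a signal, simply retry */
    }
    return total;
}
```

On Linux, every write to /dev/full fails with ENOSPC, which gives a quick way to exercise the error path without filling a real disk: fill_until_enospc on a descriptor from open_for_fill("/dev/full") should report 0 bytes written and an ENOSPC count equal to max_enospc. Against a real file on a file system that fills up, comment 4 reports the same outcome: repeated ENOSPC errors, not a hang.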

Comment 5 Kieran Foley 2005-06-20 15:04:08 UTC
The system hung on both occasions that the file system filled up. Both 
times it happened over a weekend. I am guessing that the cron job was 
continuously trying to write for over a day. We were not notified of the 
matter until Monday morning. The cron job uses the mv command to move the files.

The file systems in question are configured from a Hitachi 9970 storage unit. 
The server is connected through a SAN using QLogic HBAs.
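The cron job in comment 5 moves files with mv from one file system to another. Across file systems, mv cannot simply rename, so it copies the data and then removes the source, which means the destination needs room for the whole file. A minimal C sketch of a pre-flight check a job could run before each move, assuming statvfs(3) is available; the function name has_room_for is hypothetical and not part of the original cron job.

```c
#include <sys/stat.h>
#include <sys/statvfs.h>

/*
 * Hypothetical pre-flight check: returns 1 if the file system holding
 * dest_dir has at least as many bytes available to unprivileged users
 * as src_path's size, 0 if it does not, and -1 if stat(2) or
 * statvfs(3) fails.
 */
int has_room_for(const char *src_path, const char *dest_dir)
{
    struct stat st;
    struct statvfs vfs;

    if (stat(src_path, &st) != 0 || statvfs(dest_dir, &vfs) != 0)
        return -1;

    /* f_bavail is counted in units of f_frsize bytes */
    unsigned long long avail =
        (unsigned long long)vfs.f_bavail * vfs.f_frsize;

    return (unsigned long long)st.st_size <= avail ? 1 : 0;
}
```

The comparison is approximate: it ignores block rounding and metadata overhead, and another writer can consume the space between the check and the copy, so it reduces the chance of filling the destination rather than guaranteeing it.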

Comment 6 Peter Staubach 2005-06-20 18:15:56 UTC
I am not having any luck in reproducing this situation.  I don't have access to
a Hitachi 9970 accessed via QLogic HBAs, but I don't know why that would matter,
unless there were messages in /var/log/messages to indicate that there was
something going on there.  There weren't any messages, were there?

I will need some information in order to proceed.  Is the system pingable?  Is
this a hard hang, where nothing appears to be doing anything?  Would it be
possible to capture some information the next time this happens?  I am thinking
of Alt-SysRq-T, Alt-SysRq-M, Alt-SysRq-P, and Alt-SysRq-W.  (SysRq handling will
have to be enabled first via something like "echo 1 > /proc/sys/kernel/sysrq",
before the hang occurs.)
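The SysRq key combinations only work while /proc/sys/kernel/sysrq holds a nonzero value, which is why it has to be set before the hang. A minimal C sketch for checking the current setting from a script or monitoring agent; the helper name sysrq_enabled is hypothetical, and the path is passed in so the function can also be pointed at a test file. A value written with echo does not survive a reboot; on Red Hat systems the boot-time value typically comes from a kernel.sysrq line in /etc/sysctl.conf, so that is the place to make the setting persistent.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read the integer in a sysrq control file such as
 * /proc/sys/kernel/sysrq. Returns 1 if the value is nonzero (some or all
 * SysRq functions enabled), 0 if it is zero, and -1 if the file cannot
 * be opened or does not start with an integer.
 */
int sysrq_enabled(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return value != 0;
}
```

Calling sysrq_enabled("/proc/sys/kernel/sysrq") after boot and alerting on anything other than 1 would catch the case where the setting was lost before the next hang.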



Comment 7 Kieran Foley 2005-06-20 19:02:11 UTC
There were no messages in /var/log/messages indicating issues with the 
QLogic HBAs or anything else.

The system was pingable but that's about all. It was a hard hang. 

I have enabled SysRq, but I hope this does not recur, since we have 
added a considerable amount of storage to prevent a recurrence and have 
configured monitoring to send alerts when the file systems start to fill.

Thanks for your help.
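Comment 7 mentions monitoring that alerts as the file systems fill. One way such a check could compute its usage figure, sketched in C with statvfs(3); the function names and the threshold parameter are hypothetical and not details of the monitoring that was actually configured.

```c
#include <sys/statvfs.h>

/*
 * Hypothetical usage check: percentage of the file system containing
 * path that is in use, computed similarly to df(1)'s Use% column
 * (before rounding): used / (used + available to unprivileged users).
 * Returns -1.0 if statvfs(3) fails.
 */
double percent_used(const char *path)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return -1.0;

    /* block counts share one unit (f_frsize), so the ratio needs no scaling */
    double used = (double)(vfs.f_blocks - vfs.f_bfree);
    double usable = used + (double)vfs.f_bavail;
    if (usable <= 0.0)
        return 0.0; /* empty or pseudo file system: nothing to report */
    return 100.0 * used / usable;
}

/*
 * Returns 1 when usage at path has reached threshold percent, and also
 * when usage cannot be determined, so a failed check is not silent.
 */
int should_alert(const char *path, double threshold)
{
    double pct = percent_used(path);
    return pct < 0.0 || pct >= threshold;
}
```

Alerting on a failed check as well as on a high reading is a design choice: a monitor that quietly stops reporting would leave the weekend scenario in comment 5 unnoticed again.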

Comment 8 Peter Staubach 2005-06-20 19:19:08 UTC
Well, without some more information, I don't see much that I can do.

I am going to close this BZ as "WORKSFORME", but if the problem recurs
and more information can be gained, please reopen this BZ and I will look
at it some more.

Good luck...

