Bug 163681 - asyncore.poll3 runs longer than timeout under heavy io load
Summary: asyncore.poll3 runs longer than timeout under heavy io load
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: python
Version: 3.0
Hardware: i686
OS: Linux
Target Milestone: ---
Assignee: Jeremy Katz
QA Contact: Brock Organ
Depends On:
Reported: 2005-07-20 08:53 UTC by Andre Schubert
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2007-01-06 21:13:41 UTC
Target Upstream Version:


Description Andre Schubert 2005-07-20 08:53:05 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050323 Galeon/1.3.20

Description of problem:
We have a production server running our network management system.
One subsystem is a script that runs every 5 minutes
and collects information from around 1000 SNMP agents.
Because nearly half of these agents are offline, we need to
collect this information asynchronously.
The script that collects the data is written in Python and uses the asyncore.poll3
function with a timeout of 1.0 second.
The server itself runs on a software RAID-1 with 2 IDE hard disks.
The average I/O wait of the system is around 15%.
Sometimes, under heavy I/O load, some poll3 cycles take much longer than they should; I saw cycles running up to 20 seconds. This happens very often while the daily vacuum of a large PostgreSQL database is running, or while other processes are writing large amounts of data to the disks.
This is really bad, since it is sometimes not possible to collect data from all agents.
It seems that the whole system freezes for several seconds. That's why I think it is not only a Python problem.

I hope I can get some help with this problem.
I can provide additional information if anyone needs it.

python: 2.2.3-6.1
kernel: 2.4.21-32.0.1.EL

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
To reproduce the problem, I took a development machine and set up a software RAID-1, then wrote a small Python script with debugging output that tries to connect to 200 machines that are not responding. When running this script under heavy I/O writes, I see poll3 cycles running longer than the timeout.
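
A rough sketch of the kind of debugging output described here; it is not the reporter's script. It assumes a Unix system and calls select.poll directly, because asyncore is no longer shipped with recent Python releases. The names timed_cycle and handle_event are hypothetical. Timing the poll call and the event handlers separately shows which of the two makes a cycle overrun its timeout.

```python
import select
import socket
import time


def timed_cycle(poller, timeout_s, handle_event):
    """Run one poll cycle and return (poll_seconds, handler_seconds).

    handle_event is a hypothetical callback taking (fd, event).
    """
    t0 = time.monotonic()
    events = poller.poll(int(timeout_s * 1000))  # select.poll takes milliseconds
    t1 = time.monotonic()
    for fd, event in events:
        handle_event(fd, event)  # any disk write done here adds to the cycle time
    t2 = time.monotonic()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    # A local socket pair stands in for a connection to an SNMP agent.
    local, peer = socket.socketpair()
    poller = select.poll()
    poller.register(local.fileno(), select.POLLIN)
    poll_s, handler_s = timed_cycle(poller, 1.0, lambda fd, ev: None)
    print("poll: %.3fs  handlers: %.3fs" % (poll_s, handler_s))
```

If the handler time dominates in the slow cycles, the delay comes from the work done after data arrives and not from the poll call itself.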

Additional info:

Comment 1 Mihai Ibanescu 2005-09-30 16:40:02 UTC
Sorry it took so long to get to this bug report.

Can you please attach an strace of the process while it's under heavy I/O and
poll3 fails? I'd be curious what lower-level system call it uses.

Comment 2 Andre Schubert 2005-11-04 08:42:56 UTC
Sorry for the late answer as well.

I think we have solved this problem.
After several weeks of testing, we have rewritten our script
that collects the information asynchronously.

The hangs were caused by the underlying write to disk,
which was called directly after some data had arrived.

The new script first collects all the data, and only after that
is all the data written out to the disk.
Since we changed to this new implementation,
we have never seen a hang in a poll3 cycle.
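
A minimal sketch of the collect-first, write-later structure described above. It is not the rewritten script from this report. It assumes a Unix system, uses select.poll directly, and the names collect_all and write_all are hypothetical. The polling loop only buffers data in memory, so a slow disk cannot stall a poll cycle; the single write pass happens after polling has finished.

```python
import os
import select
import socket
import tempfile
import time


def collect_all(socks, cycle_timeout_s=1.0, deadline_s=5.0):
    """Poll until every peer has closed or the deadline passes.

    Data is only buffered in memory here; nothing touches the disk.
    """
    poller = select.poll()
    by_fd = {}
    for s in socks:
        s.setblocking(False)
        poller.register(s.fileno(), select.POLLIN)
        by_fd[s.fileno()] = s
    buffers = {fd: b"" for fd in by_fd}
    pending = set(by_fd)
    stop_at = time.monotonic() + deadline_s
    while pending and time.monotonic() < stop_at:
        for fd, _event in poller.poll(int(cycle_timeout_s * 1000)):
            try:
                chunk = by_fd[fd].recv(4096)
            except BlockingIOError:
                continue  # nothing to read yet, try again next cycle
            except OSError:
                chunk = b""  # treat a socket error as end of stream
            if chunk:
                buffers[fd] += chunk
            else:
                poller.unregister(fd)
                pending.discard(fd)
    return buffers


def write_all(buffers, path):
    """Write every buffered result in one pass, after polling is done."""
    with open(path, "wb") as out:
        for fd in sorted(buffers):
            out.write(buffers[fd] + b"\n")
```

The trade-off is memory: all results are held until the end of the run, which is reasonable for small per-agent payloads but would need a bound for large ones.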

Sorry for the false alarm.
