Bug 154242 - CUPS daemon stops accepting job when network printer unreachable
Summary: CUPS daemon stops accepting job when network printer unreachable
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: cups
Version: 3.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Tim Waugh
QA Contact:
Depends On:
Reported: 2005-04-08 18:02 UTC by Tom Sightler
Modified: 2007-11-30 22:07 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2007-10-19 19:04:56 UTC
Target Upstream Version:


Description Tom Sightler 2005-04-08 18:02:35 UTC
Description of problem:

The problem is nearly identical to the one I reported in May 2004 (Bugzilla
124345).  After that bug was fixed I didn't see this issue for months.
However, approximately one month ago we upgraded from 13.3.16 to 13.3.27, and
since then we have been seeing the same basic issue again: if a backend lpd
process is attempting to spool to an unreachable network printer, the cupsd
process hangs and completely stops accepting jobs from other clients.

CUPS clients can make a connection to the server, but the connection then
simply hangs forever.  Running commands on the server like 'lpq -a' and 'lpstat'
will also hang forever.
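
A minimal sketch of how the hang described above could be detected from a
script rather than by waiting on a stuck command.  It assumes the coreutils
'timeout' utility is available on the server; the helper name 'probe' is
hypothetical and not part of CUPS.

```shell
#!/bin/bash
# Sketch only: report whether a command finishes within a time limit.
# Assumes the coreutils `timeout` utility is available on the server.

# probe SECONDS COMMAND [ARGS...]   (hypothetical helper name)
# Prints "ok", "hung" or "failed".
probe() {
    local limit=$1 status=0
    shift
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -eq 124 ]; then
        # coreutils timeout exits with status 124 when the limit is reached
        echo "hung"
    else
        echo "failed"
    fi
}
```

For example, 'probe 10 lpstat -r' would print "hung" if the scheduler does not
answer within ten seconds.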

One thing is different in this case: if I kill the backend process, cupsd
continues without issues.  With the previous bug, the backend process would end
up in an unkillable "zombie" state and you could only recover by restarting the
cupsd daemon.

It seems hard to reproduce, but it's hit me 3 times in four weeks and it kills
printing company wide.

Is it possible that recent security patches have somehow reintroduced a subtle
problem?  Please let me know what other information I can provide.


Version-Release number of selected component (if applicable):


How reproducible:
Sometime, but not always

Steps to Reproduce:
1.  Print jobs to an unreachable LPD printer
2.  Let the backend hang there, trying to print

Actual results:
cupsd eventually hangs, blocking all clients connecting to any printer.

Expected results:
Printing to all online printers should continue without issues.

Additional info:

Comment 1 Tim Waugh 2005-04-18 14:49:21 UTC
Are you able to reproduce this problem on demand with any reliability at all? 
You say "sometime" as how reproducible it is -- do you mean the three times in
four weeks, or is it more frequent on a test machine?

When this occurs, in what way is the remote printer unreachable?  Is there no
DNS entry, or are connections refused, or is there no response at all from the
remote IP address (or something else)?

The only patch between 1.1.17-13.3.16 and 1.1.17-13.3.27 that touches the
scheduler is cups-attrs.patch, to fix bug #107789.  That bug certainly is fixed,
and the fix is certainly correct.  The symptom had been a scheduler crash.

Could you please start by setting the LogLevel in /etc/cups/cupsd.conf to
"debug2"?  That way we stand more of a chance of diagnosing the problem in
future.  Thanks.
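
A minimal sketch of making that change from a script, assuming GNU sed and a
configuration file that uses the standard 'LogLevel' directive.  The helper
name 'set_loglevel' is hypothetical.

```shell
#!/bin/bash
# Sketch only: set the LogLevel directive in a cupsd.conf-style file.

# set_loglevel FILE LEVEL   (hypothetical helper name)
set_loglevel() {
    local conf=$1 level=$2
    if grep -q '^LogLevel[[:space:]]' "$conf"; then
        # Replace the existing directive, keeping a backup in FILE.bak
        sed -i.bak "s/^LogLevel[[:space:]].*/LogLevel $level/" "$conf"
    else
        # No directive present yet, so append one
        echo "LogLevel $level" >> "$conf"
    fi
}
```

It would be called as 'set_loglevel /etc/cups/cupsd.conf debug2' by root; cupsd
has to reread its configuration before the new level takes effect.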

Comment 2 Tom Sightler 2005-04-19 03:03:21 UTC
I have not been able to reproduce this issue reliably.  I succeeded in hanging
the print server one time on a test box.  We have recently upgraded our
backup/standby print server to RHEL4 and are testing it, but have not been able
to reproduce the issue on the much newer version of CUPS included in that
release.  If testing continues to go well we may just upgrade our primary box.

That being said, I'm pretty sure this is a real problem, possibly not related to
the previous bug at all.  After the fix for 124345 I've actually received a few
mails from others reporting similar hangs even with versions that contain that
fix.  Most had environments similar to ours and saw random hangs.  Here's a
typical example:

> We have 120 printers and 250 users on the system. The application is
> character based (written in Business BASIC) and all it does is
> printing using the lp -d command to JetDirect printers.
> Your problem seemed to be about the same, because cups hangs when the
> system gets busy at the end of the day when all users start printing
> at about the same time, and I found a job that could not be printed
> because the networkprinter was powered off.

At the time this was reported to me we were not seeing an issue, but it has
returned with a vengeance over the last few weeks.

When we see the failure the printer is unreachable: usually powered off, or
perhaps behind a network/WAN outage.  DNS still resolves; there is just no
response from that host.

It hangs the entire print system: anything attempting to talk to cups just hangs
indefinitely, including local commands such as lpstat.  This is not as bad as the
hang with bug 124345, because if I manually kill the lpd backend process the
cupsd process will recover.  With the previous bug, recovery required killing the
cupsd process, so the two may be unrelated.

Also, about a third of our printers use the lpd backend, while the rest use the
socket backend.  We've only seen this issue with the lpd backend.

I will increase the log level and hope I can capture it when it happens again.
Is there anything else I can do while it's hung to capture information?  Because
it is our enterprise printing environment I usually have only a short amount of
time, but anything I can do I will try.


Comment 3 Tim Waugh 2005-04-19 11:05:42 UTC
> Is there anything else I can do while it's hung to capture information?

Yes, there is.  You haven't set the 'hardware' field of this report, but
presuming it is i386 please fetch and install this debuginfo package:

This package does not interfere with the running CUPS program, but instead
provides extra files that allow the debugger to make more sense of the running
cupsd process.

Then, if/when the cupsd process hangs, it would be extremely useful to see the
debugger output.  Become root using 'su -', then use the 'script' command to
start recording output, then do the following:

ps axf | grep '[c]upsd'

This will show you the process ID of the cupsd process.  Let's say it's 637. 
Then, attach the debugger:

gdb /usr/sbin/cupsd 637

(obviously with the correct PID instead)

At the (gdb) prompt, you can then find out where the program is stopped:

(gdb) backtrace

Then, it would be handy to see what the local variables are at each step on the
stack, like this:

(gdb) info locals
(gdb) up
(gdb) info locals
(gdb) up

..until it says you can't go up the stack any more.
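
The manual steps above can also be sketched as a non-interactive run, which may
help when there is little time before cupsd has to be restarted.  This assumes
a gdb that supports 'backtrace full' (which prints the local variables of every
frame, replacing the repeated 'info locals' / 'up' steps); the helper names are
hypothetical.

```shell
#!/bin/bash
# Sketch only: capture a cupsd backtrace with gdb in batch mode.

# write_gdb_commands FILE   (hypothetical helper name)
# Writes a gdb command file: a plain backtrace, then one with locals.
write_gdb_commands() {
    cat > "$1" <<'EOF'
backtrace
backtrace full
EOF
}

# capture_cupsd_trace PID OUTFILE   (hypothetical helper; run as root)
capture_cupsd_trace() {
    local pid=$1 out=$2 cmds status=0
    cmds=$(mktemp) || return 1
    write_gdb_commands "$cmds"
    # -batch makes gdb exit once the command file has run,
    # which also detaches it from the cupsd process
    gdb -batch -x "$cmds" /usr/sbin/cupsd "$pid" > "$out" 2>&1 || status=$?
    rm -f "$cmds"
    return "$status"
}
```

With the example PID from above, 'capture_cupsd_trace 637 /tmp/cupsd-trace.txt'
would leave the debugger output in a file that can be attached to this report.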


Comment 4 Tom Sightler 2005-04-30 05:43:02 UTC
Unfortunately, now that I'm ready to do something when it fails, we haven't seen
the failure in weeks.  I'm going to try harder to reproduce this in the lab next
week: I looked back at our error cases and found a couple of things in common.

The hang always occurred when a job was already partially spooled to the
printer, the printer then failed in some way (for example, a printer jam), and
it was powered off and left that way until a technician could arrive to repair
it.  I'm hoping I can reproduce that environment and perhaps get lucky.


Comment 5 Tim Waugh 2005-05-04 11:57:36 UTC
Okay.  Fingers crossed we can catch this with debuginfo..

Comment 6 Tom Sightler 2005-08-23 13:39:08 UTC
Well, after a couple of months of running clean, we have suddenly hit this issue
with a vengeance over the last two weeks or so.  We've had about 5 total hangs in
the last 10 days.  Can I get a debuginfo package for the latest cups release
(1.1.17-13.3.31)?  Hopefully with that we can catch it.


Comment 7 Tim Waugh 2005-09-07 11:52:38 UTC
Here it is:

Comment 9 RHEL Product and Program Management 2007-10-19 19:04:56 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
