Bug 475876 - CMake 2.6.2 dies on PPC64 builders (std::out_of_range)
Summary: CMake 2.6.2 dies on PPC64 builders (std::out_of_range)
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: cmake
Version: rawhide
Hardware: ppc64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Orion Poplawski
QA Contact: Fedora Extras Quality Assurance
URL: http://public.kitware.com/Bug/view.ph...
Whiteboard:
Depends On:
Blocks: FE-ExcludeArch-ppc64, F-ExcludeArch-ppc64
Reported: 2008-12-10 21:25 UTC by Lorenzo Villani
Modified: 2009-03-25 16:14 UTC (History)

Fixed In Version: 2.6.3-2.fc11
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-03-10 14:40:44 UTC


Attachments
build.log on ppc64 (deleted), 2009-01-13 15:43 UTC, Lorenzo Villani
Valgrind log provided by André Wöbbeking (deleted), 2009-03-08 18:54 UTC, Kevin Kofler

Description Lorenzo Villani 2008-12-10 21:25:50 UTC
CMake 2.6.2 fails to build packages on the ppc64 builders. This is a recurring and apparently easily reproducible bug.
We should still test whether the bug is reproducible on ppc32.

** Build tasks in koji (from earliest to latest):
http://koji.fedoraproject.org/koji/taskinfo?taskID=989531
http://koji.fedoraproject.org/koji/taskinfo?taskID=989611
http://koji.fedoraproject.org/koji/taskinfo?taskID=991850
http://koji.fedoraproject.org/koji/taskinfo?taskID=991912

The last build uses the --debug-output trick. Kevin reported that it worked once, but unfortunately it didn't work this time. We can't of course try again and again until it builds. :-)

Comment 1 Orion Poplawski 2008-12-10 21:47:31 UTC
I can't reproduce this on the only ppc/ppc64 machine I have access to, so I'm pretty much stuck.

I'm building 2.6.3 RC-5 for rawhide, so you might try that once it completes and has been added to the repo.  No idea if it will help, but worth a shot.

Comment 2 Lorenzo Villani 2008-12-10 22:21:02 UTC
I'll give it a try tomorrow.

Comment 3 Orion Poplawski 2008-12-10 22:23:35 UTC
There's a snag in my cmake build (Bug 475887).  We'll have to see what Enrico has to say about that.

Comment 4 Kevin Kofler 2008-12-11 01:57:34 UTC
> We can't of course try again and again until it builds. :-)

Oh we can. ;-)
But of course it's not a good solution.

Comment 5 Orion Poplawski 2008-12-11 17:24:36 UTC
Okay, 2.6.3 RC-5 has been built and it looks like the newRepo task has finished.

Comment 6 Lorenzo Villani 2008-12-12 14:45:19 UTC
(In reply to comment #5)
> Okay, 2.6.3 RC-5 has been built and it looks like the newRepo task has
> finished.

New task submitted by Kevin:
http://koji.fedoraproject.org/koji/taskinfo?taskID=994835

It seems that the mockroot uses the 2.6.3 RC-5 version of cmake but the problem is still there.

Comment 7 Orion Poplawski 2008-12-12 16:44:10 UTC
I've been unable to reproduce it myself with various debug methods enabled.  I think it would be helpful to add:

LD_PRELOAD=libSegFault.so SEGFAULT_SIGNALS=abrt

to the start of the %{cmake.. line to try to catch future failures and get a stack trace.

Comment 8 Kevin Kofler 2008-12-12 16:54:06 UTC
FYI, since the latest cmake/kdepimlibs combination reproducibly crashes on Koji in the same file (according to the debugging output), I tried debugging it with messages:
http://cvs.fedoraproject.org/viewvc/rpms/kdepimlibs/devel/kdepimlibs-4.1.85-debug-cmake-crash.patch?revision=1.1&view=markup
and poof, there goes the bug. This is a highly elusive Heisenbug. I suspect an uninitialized variable somewhere. Have you tried running it through valgrind on ppc64?

Comment 9 Lorenzo Villani 2009-01-13 15:32:18 UTC
The Heisenbug disappeared somehow, so I'm closing this bug report. It will be re-opened if necessary.

Comment 10 Kevin Kofler 2009-01-13 15:39:22 UTC
It disappeared because we were trying to debug it (that's why it's a Heisenbug ;-) ). We just kept the debug patch in place so that it keeps building.

Comment 11 Lorenzo Villani 2009-01-13 15:43:32 UTC
Created attachment 328872
build.log on ppc64

The build log on ppc64

Comment 12 Lorenzo Villani 2009-01-13 15:44:19 UTC
Link to job in koji: http://koji.fedoraproject.org/koji/taskinfo?taskID=1049740

Comment 13 Kevin Kofler 2009-03-07 23:06:43 UTC
André Wöbbeking e-mailed me this backtrace from:

LD_PRELOAD=libSegFault.so SEGFAULT_SIGNALS=abrt

(on x86_64):

cmake(_ZN25cmIncludeDirectoryCommand12AddDirectoryEPKcbb+0x2ca)[0x4ee7ea]
cmake(_ZN25cmIncludeDirectoryCommand11InitialPassERKSt6vectorISsSaISsEER17cmExecutionStatus+0xaa)[0x4fadfa]
cmake(_ZN9cmCommand17InvokeInitialPassERKSt6vectorI18cmListFileArgumentSaIS1_EER17cmExecutionStatus+0x4e)[0x527fbe]
cmake(_ZN10cmMakefile14ExecuteCommandERK18cmListFileFunctionR17cmExecutionStatus+0x2ec)[0x48c27c]
cmake(_ZN10cmMakefile12ReadListFileEPKcS1_PSs+0x49d)[0x492e4d]
cmake(_ZN16cmLocalGenerator9ConfigureEv+0xac)[0x59ef9c]
cmake(_ZN29cmLocalUnixMakefileGenerator39ConfigureEv+0x87)[0x5a2747]
cmake(_ZN10cmMakefile21ConfigureSubDirectoryEP16cmLocalGenerator+0xc6)[0x4956c6]
cmake(_ZN10cmMakefile15AddSubDirectoryEPKcS1_bbb+0x1ca)[0x49596a]
cmake(_ZN24cmAddSubDirectoryCommand11InitialPassERKSt6vectorISsSaISsEER17cmExecutionStatus+0x2d8)[0x506798]
cmake(_ZN9cmCommand17InvokeInitialPassERKSt6vectorI18cmListFileArgumentSaIS1_EER17cmExecutionStatus+0x4e)[0x527fbe]
cmake(_ZN10cmMakefile14ExecuteCommandERK18cmListFileFunctionR17cmExecutionStatus+0x2ec)[0x48c27c]
cmake(_ZN10cmMakefile12ReadListFileEPKcS1_PSs+0x49d)[0x492e4d]
cmake(_ZN16cmLocalGenerator9ConfigureEv+0xac)[0x59ef9c]
cmake(_ZN29cmLocalUnixMakefileGenerator39ConfigureEv+0x87)[0x5a2747]
cmake(_ZN17cmGlobalGenerator9ConfigureEv+0x2e5)[0x5758e5]
cmake(_ZN5cmake15ActualConfigureEv+0xc3)[0x4d82b3]
cmake(_ZN5cmake9ConfigureEv+0x44)[0x4d87c4]
cmake(_ZN5cmake3RunERKSt6vectorISsSaISsEEb+0x17e)[0x4dfede]
cmake(_Z8do_cmakeiPPc+0xcb8)[0x46b688]
cmake(main+0x2c)[0x46c2cc]

(He mailed me this on February 25, sorry for not posting it sooner, I was busy with other stuff.)

Comment 14 Kevin Kofler 2009-03-07 23:21:43 UTC
The function at the top of the backtrace has this code:
  // remove any leading or trailing spaces and \r
  pos = ret.size()-1;
  while(ret[pos] == ' ' || ret[pos] == '\r')
    {
    ret.erase(pos);
    pos--;
    }
  pos = 0;
  while(ret.size() && ret[pos] == ' ' || ret[pos] == '\r')
    {
    ret.erase(pos,1);
    }

I think this should be:
  // remove any leading or trailing spaces and \r
  pos = ret.size()-1;
  while(ret.size() && (ret[pos] == ' ' || ret[pos] == '\r'))
    {
    ret.erase(pos);
    pos--;
    }
  pos = 0;
  while(ret.size() && (ret[pos] == ' ' || ret[pos] == '\r'))
    {
    ret.erase(pos,1);
    }

But I haven't tested at all whether this helps. Unfortunately, the backtrace lacks details, so I've asked André Wöbbeking if he can produce a Valgrind log with debugging information.
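
For illustration, the guarded trimming can be sketched as a small standalone function. This is only a sketch under stated assumptions: trim_spaces_and_cr and check_trim are hypothetical names that do not exist in CMake, and the loops index the last character directly instead of keeping a separate pos variable as the patch does. It also makes the precedence issue visible: without the extra parentheses, `ret.size() && ret[pos] == ' ' || ret[pos] == '\r'` parses as `(ret.size() && ret[pos] == ' ') || ret[pos] == '\r'`, so the size check does not protect the second comparison.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of CMake): strips leading and trailing
// ' ' and '\r', checking for an empty string before every index access.
std::string trim_spaces_and_cr(std::string ret)
{
  // Trailing characters: the emptiness check comes first, so ret is never
  // indexed at size()-1 when size() is 0.
  while (!ret.empty() &&
         (ret[ret.size() - 1] == ' ' || ret[ret.size() - 1] == '\r'))
    {
    ret.erase(ret.size() - 1);
    }
  // Leading characters: the parentheses make the emptiness check apply to
  // both comparisons.
  while (!ret.empty() && (ret[0] == ' ' || ret[0] == '\r'))
    {
    ret.erase(0, 1);
    }
  return ret;
}

// Usage example: inputs that are empty or consist only of trimmed
// characters must come back empty instead of crashing.
void check_trim()
{
  assert(trim_spaces_and_cr("  /usr/include \r") == "/usr/include");
  assert(trim_spaces_and_cr("").empty());
  assert(trim_spaces_and_cr(" \r ").empty());
}
```

The empty and whitespace-only inputs are the interesting cases, since those are the ones where the unguarded loops index past the end of the string.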

Comment 15 Kevin Kofler 2009-03-07 23:23:43 UTC
Note: This is cmIncludeDirectoryCommand::AddDirectory in Source/cmIncludeDirectoryCommand.cxx.

Comment 16 Kevin Kofler 2009-03-08 18:54:33 UTC
Created attachment 334448
Valgrind log provided by André Wöbbeking

Here's a Valgrind log, unfortunately also without debugging information.

Comment 17 Kevin Kofler 2009-03-08 19:00:04 UTC
The Valgrind log says the bug is caused by a std::string created within cmIncludeDirectoryCommand::AddDirectory. There's only one such string: the string ret. It then crashes on a call to erase from within that same function. This can only be one of the ret.erase calls mentioned in comment #14. So I think my suggested fix should fix this problem.
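
A plausible reading of why an erase call would end in std::out_of_range, sketched in standalone C++: for an empty string, size()-1 wraps around to std::string::npos, and erase throws when its position argument is greater than size(). Whether the unguarded loop ever reaches erase depends on what the out-of-bounds read ret[pos] happens to return, which would fit the intermittent behaviour seen here. The helper names erase_throws, last_index and check_erase are hypothetical and only serve this illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: reports whether erasing from position pos to the
// end of s throws std::out_of_range.
bool erase_throws(std::string s, std::string::size_type pos)
{
  try
    {
    s.erase(pos);
    }
  catch (const std::out_of_range&)
    {
    return true;
    }
  return false;
}

// The starting position used by the unguarded trailing loop. size_type is
// unsigned, so for an empty string this wraps around to npos.
std::string::size_type last_index(const std::string& s)
{
  return s.size() - 1;
}

// Usage example: only the wrapped-around position makes erase throw.
void check_erase()
{
  assert(last_index("") == std::string::npos);
  assert(erase_throws("", last_index("")));
  assert(!erase_throws("abc", last_index("abc")));
  assert(!erase_throws("", 0)); // pos == size() is allowed
}
```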

Comment 18 Kevin Kofler 2009-03-09 02:39:41 UTC
Should be fixed in Rawhide now. (Still waiting for kdepimlibs to build against the new cmake though, the chainbuild is currently stuck in the waitrepo phase.)

This should also be fixed in F9 and F10. Can we just sync 2.6.3-2 or should it be backported to 2.6.2? I'm for pushing 2.6.3.

Comment 19 Kevin Kofler 2009-03-09 07:59:32 UTC
Looks like my patch really fixed it. Successful kdepimlibs build here:
http://koji.fedoraproject.org/koji/buildinfo?buildID=93380

Now to get it upstreamed...

Comment 20 Kevin Kofler 2009-03-09 08:14:55 UTC
Upstream bug report (with patch): http://public.kitware.com/Bug/view.php?id=8704

Comment 21 Orion Poplawski 2009-03-09 21:16:14 UTC
Kevin - thanks for driving this.  I see no reason not to sync 2.6.3-2 to F10.  Do you want to keep driving this?

Comment 22 Kevin Kofler 2009-03-09 21:19:21 UTC
Yes, I'll handle it. I'd like to sync it to F9 as well (also currently on 2.6.2), is that OK with you?

Comment 23 Orion Poplawski 2009-03-10 14:40:44 UTC
(In reply to comment #22)
> Yes, I'll handle it. I'd like to sync it to F9 as well (also currently on
> 2.6.2), is that OK with you?  

Yes.  Sounds good.  Thanks again.

Comment 24 Fedora Update System 2009-03-12 03:15:42 UTC
cmake-2.6.3-2.fc10 has been submitted as an update for Fedora 10.
http://admin.fedoraproject.org/updates/cmake-2.6.3-2.fc10

Comment 25 Fedora Update System 2009-03-12 03:16:50 UTC
cmake-2.6.3-2.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/cmake-2.6.3-2.fc9

Comment 26 Fedora Update System 2009-03-25 16:10:43 UTC
cmake-2.6.3-3.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 27 Fedora Update System 2009-03-25 16:14:29 UTC
cmake-2.6.3-3.fc10 has been pushed to the Fedora 10 stable repository.  If problems still persist, please make note of it in this bug report.

