|Summary:||Memory test bails prematurely when testing greater than 256 GB in RH4.6 x64|
|Product:||Red Hat Hardware Certification Program||Reporter:||Gregg Shick <gregg.shick>|
|Component:||Test Suite (tests)||Assignee:||Greg Nichols <gnichols>|
|Status:||CLOSED WONTFIX||QA Contact:||Lawrence Lim <llim>|
|Version:||5.2||CC:||dwa, gnichols, hcp-admin, micah.parrish, rick.hester, rlandry, sandy.garza, tools-bugs|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2008-12-16 20:44:44 UTC||Type:||---|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
Description Gregg Shick 2008-07-01 19:56:51 UTC
Description of problem:
Memory test bails prematurely when testing greater than 256 GB in RH4.6 x64

Version-Release number of selected component (if applicable):
RH4.6 x64 / HTS 5.2-16
ProLiant 785 / 8 processors / 512GB memory

How reproducible:
Every time

Steps to Reproduce:
1. Install RH4.6 x64 on a system with 512GB
2. Install HTS 5.2-16
3. Execute memory test

Actual results:
Test fails after only a few seconds of runtime.

Expected results:
Test at minimum runs to completion.

Additional info:
It will run successfully and pass with 256GB. The breaking point is somewhere between 320 and 384GB.
Comment 1 Gregg Shick 2008-07-01 19:56:52 UTC
Created attachment 310713 [details] test results passing with 256GB and failing with greater than 256GB.
Comment 2 David Aquilina 2008-07-09 15:06:52 UTC
Gregg,

Does the test fail if you run it manually outside of the test harness? To do so, switch to the directory containing the memory test and run runtest.sh:

cd /usr/share/hts/tests/memory
./runtest.sh

Please capture any output produced, as well as /var/log/messages. Thanks!
Comment 3 Gregg Shick 2008-07-16 18:34:14 UTC
Created attachment 311977 [details] /var/log/messages output
Comment 4 Gregg Shick 2008-07-16 18:34:47 UTC
Created attachment 311978 [details] runtest.sh results run outside of hts test harness
Comment 5 Gregg Shick 2008-07-16 18:36:22 UTC
David,

The test also fails outside of hts. I tried increasing swap to 1TB (David Hester's suggestion). That also had no effect.
Comment 6 Greg Nichols 2008-07-16 23:45:35 UTC
To run the test outside of the harness, use "make run", which will compile threaded_memtest.c as part of the test run. Please give this a try.
Comment 7 Sandy Garza 2008-07-17 14:56:34 UTC
Is there a memory limitation on RHEL 4.6? What is the maximum memory supported by 4.6?
Comment 8 Rob Landry 2008-07-17 22:03:05 UTC
rhr2 has been deprecated, so we are closing these remaining bugs as WONTFIX. Future bugs against the "hts" test suite should be opened against the "Red Hat Hardware Certification Program" product, selecting either the "Test Suite (harness)" or "Test Suite (tests)" component.
Comment 9 Gregg Shick 2008-07-18 13:20:45 UTC
This was tested using hts-5.2-16.el4.noarch, not rhr2.
Comment 10 Gregg Shick 2008-07-21 18:22:08 UTC
(In reply to comment #6)
> To run the test outside of the harness, use "make run", which will compile
> threaded_memtest.c as part of the test run.
>
> Please give this a try.

Greg,

Do you have more specific instructions for doing this? Thanks.
Comment 11 David Aquilina 2008-08-20 15:56:16 UTC
(In reply to comment #10)
> Do you have more specific instructions for doing this? Thanks.

cd /usr/share/hts/tests/memory
make run

...should do the trick.
Comment 13 Micah Parrish 2008-08-28 07:05:14 UTC
chmod a+x ./runtest.sh ./memory.py
./runtest.sh
/usr/share/hts/tests/memory/memory.py
Running ./memory.py:
System Memory: 515479 MB
Free Memory: 515010 MB
Swap Memory: 1983 MB
Starting Threaded Memory Test
running for more than free memory at 516034 MB for 60 sec.
mmap: Cannot allocate memory
Warning: memsize > free_mem. You will probably hit swap.
Detected 32 processors.
RAM: 99.6% free (501G/503G)
Testing 503G RAM for 60 seconds using 64 threads:
thread 0: mapping 8063M RAM
thread 1: mapping 8063M RAM
thread 2: mapping 8063M RAM
thread 3: mapping 8063M RAM
thread 4: mapping 8063M RAM
thread 5: mapping 8063M RAM
thread 6: mapping 8063M RAM
thread 7: mapping 8063M RAM
thread 8: mapping 8063M RAM
thread 9: mapping 8063M RAM
thread 10: mapping 8063M RAM
thread 11: mapping 8063M RAM
thread 12: mapping 8063M RAM
thread 13: mapping 8063M RAM
thread 14: mapping 8063M RAM
thread 15: mapping 8063M RAM
thread 16: mapping 8063M RAM
thread 17: mapping 8063M RAM
thread 18: mapping 8063M RAM
thread 19: mapping 8063M RAM
thread 20: mapping 8063M RAM
thread 21: mapping 8063M RAM
thread 22: mapping 8063M RAM
thread 23: mapping 8063M RAM
thread 24: mapping 8063M RAM
thread 25: mapping 8063M RAM
thread 26: mapping 8063M RAM
thread 27: mapping 8063M RAM
thread 28: mapping 8063M RAM
thread 29: mapping 8063M RAM
thread 30: mapping 8063M RAM
thread 31: mapping 8063M RAM
thread 32: mapping 8063M RAM
thread 33: mapping 8063M RAM
thread 34: mapping 8063M RAM
thread 35: mapping 8063M RAM
thread 36: mapping 8063M RAM
thread 37: mapping 8063M RAM
thread 38: mapping 8063M RAM
thread 39: mapping 8063M RAM
thread 40: mapping 8063M RAM
thread 41: mapping 8063M RAM
thread 42: mapping 8063M RAM
thread 43: mapping 8063M RAM
done.
...finished running ./memory.py, exit code=1
recovered exit code=1
hts-report-result /HTS/hts/memory FAIL /tmp/tmp.P28871
Comment 14 Micah Parrish 2008-08-28 07:43:00 UTC
I also tried to run memhog. It runs with memhog 255g and fails with memhog 256g. The failure message is:

numactl: mmap: Cannot allocate memory
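For anyone without access to memhog, a minimal probe along the lines below (purely a hypothetical sketch, not the memhog tool itself) reproduces the symptom: a single anonymous mmap of the requested size either succeeds or is refused by the kernel with "Cannot allocate memory".

/* probe_mmap.c - hypothetical sketch, not the real memhog.
 * Attempts one anonymous mapping of the size given in GiB and
 * reports whether the kernel refused it.
 * Build: gcc -o probe_mmap probe_mmap.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/mman.h>

int main(int argc, char **argv)
{
    size_t gib = (argc > 1) ? strtoull(argv[1], NULL, 10) : 256;
    size_t len = gib << 30;   /* GiB -> bytes */

    /* MAP_NORESERVE: we only care whether the address space can be
     * set up, not whether the pages can be backed by RAM or swap. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) {
        printf("mmap of %zu GiB failed: %s\n", gib, strerror(errno));
        return 1;
    }
    printf("mmap of %zu GiB succeeded\n", gib);
    munmap(p, len);
    return 0;
}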
Comment 15 Rob Landry 2008-08-28 18:44:27 UTC
We'll need to look, but I think we're just running into the per-process size ceiling on x86_64, and we'll need to do what we do on x86 (split the test across processes), only at a larger threshold than ~4GB.
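To make the process-split idea concrete, here is a rough, purely illustrative sketch (this is not the HTS implementation, and the 256GB cap is only an assumption): the total test size is carved into chunks no larger than the assumed per-process ceiling, and each chunk is exercised in its own forked child, so no single process has to map the whole amount.

/* split_test.c - hypothetical sketch of splitting a memory test across
 * several processes so each one stays under an assumed per-process cap.
 * Not the HTS implementation. Build: gcc -o split_test split_test.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CAP_GIB 256ULL              /* assumed per-process ceiling */

static int test_chunk(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;
    memset(p, 0xA5, len);           /* touch the pages; a real test would verify patterns */
    munmap(p, len);
    return 0;
}

int main(int argc, char **argv)
{
    size_t total_gib = (argc > 1) ? strtoull(argv[1], NULL, 10) : 512;
    size_t done = 0;
    int status, failures = 0;

    while (done < total_gib) {
        size_t chunk = total_gib - done;
        if (chunk > CAP_GIB)
            chunk = CAP_GIB;        /* keep each child below the cap */

        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0)
            _exit(test_chunk(chunk << 30));
        waitpid(pid, &status, 0);   /* run chunks sequentially for simplicity */
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
        done += chunk;
    }
    printf("%d chunk(s) failed\n", failures);
    return failures ? 1 : 0;
}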
Comment 17 David Aquilina 2008-09-17 19:31:00 UTC
Greg,

Can you please open up a certification request with the INFO test run and the (failed) memory test logs? 512G is larger than the current maximum, so we'll need to have Engineering take a look at the system as well. Please post the certification # once you've done so.

Thanks!
-David
Comment 24 David Aquilina 2008-10-15 19:34:18 UTC
(In reply to comment #14)
> I also tried to run memhog. It runs with memhog 255g and fails with memhog
> 256g. The failure message is:
>
> numactl: mmap: Cannot allocate memory

Can you provide us with a pointer to, or a copy of, memhog? Thanks!
Comment 25 Micah Parrish 2008-11-03 17:48:16 UTC
It's proprietary, part of a test suite called Xorsyst, formerly known as busy. I assume you can have it with the proper license. Contact firstname.lastname@example.org if you still need it.
Comment 26 Sandy Garza 2008-12-01 16:38:48 UTC
David,

Our engineer is asking: "Why does the test pass when system memory is 256GB but fail when it is 512GB? What does the 'process limit' have to do with system RAM?"

Thanks.
Comment 27 David Aquilina 2008-12-01 18:46:24 UTC
Sandy,

Currently a single process is used to run the memory test, so when that process hits the process size limit it's unable to allocate any additional memory.

-David
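As a small aside, and only an assumption about where such a ceiling might be visible (on RHEL 4 x86_64 the limit may well be a kernel-internal constant rather than a resource limit), a process can at least check its own address-space rlimit, which is one place a per-process size cap can show up:

/* show_as_limit.c - minimal sketch: print this process's address-space
 * rlimit. A kernel-level ceiling would not appear here.
 * Build: gcc -o show_as_limit show_as_limit.c
 */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("RLIMIT_AS: unlimited (any ceiling is enforced elsewhere)\n");
    else
        printf("RLIMIT_AS: %llu bytes\n", (unsigned long long)rl.rlim_cur);
    return 0;
}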
Comment 28 Sandy Garza 2008-12-03 15:17:48 UTC
David,

Is RH proposing a code change to the Cert Test? If so, what is the change? If RH is proposing a code change to the kernel, what is the change? A code snippet would be helpful.

Thanks, Sandy
Comment 29 David Aquilina 2008-12-03 20:33:30 UTC
Sandy,

We've been waiting to hear what HP's preference is here. If you do not care about increasing the process size limit, then we can look into changing the certification test suite to use multiple processes above 256G. This would, however, limit any one process to 256G, which could cause problems for customers as they bump against this limit. If you want to raise the process limit, you'll need to open an RFE with Ron to do so.

-David
Comment 30 Sandy Garza 2008-12-09 15:53:24 UTC
David,

We would like to close the BZ for the following reasons:

1. According to RH, the failure reported in this BZ seems to be expected behavior.
2. We have no customer requests to increase the process size limit in the RH4.x kernel.
3. RH4.x itself officially supports only 256G of system RAM for AMD64.
Comment 31 Rob Landry 2008-12-16 20:44:44 UTC
Closing WONTFIX per the above reasons provided by HP; in addition, RHEL 5.x should not encounter a similar issue, so this is not a generic problem.