Bug 451808 - RHEL 5 Performance issues on stat() over ext3 as compared to Ubuntu
Summary: RHEL 5 Performance issues on stat() over ext3 as compared to Ubuntu
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Josef Bacik
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-06-17 14:50 UTC by Issue Tracker
Modified: 2018-10-19 22:10 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-24 17:15:54 UTC


Attachments
ubuntu config file (deleted)
2008-06-18 18:50 UTC, Kent Baxley

Description Issue Tracker 2008-06-17 14:50:38 UTC
Escalated to Bugzilla from IssueTracker

Comment 1 Issue Tracker 2008-06-17 14:50:39 UTC
Kent,

Please see attached. I discussed this a bit with the two developers involved, and they were surprised at the performance degradation in this particular test using the RHEL 5 kernel. They are looking for some hints that we are hoping your kernel folks might be able to offer.

From: 	Brown, David M JR  
Sent:	Thursday, March 06, 2008 3:28 PM
To:	O'Leary, Shaun
Cc:	Felix, Evan J
Subject:	rhel kernel issue

We have a problem with the Red Hat kernel that ships with RHEL 5: we seem to be running into a performance issue compared to Ubuntu kernels. The Ubuntu kernel config appears to be a better configuration for the I/O tests we are running than the RHEL config.

Hardware:
Dell 2950 with external RAID (megaraid) expanders filled with 14x75GB 15K SATA drives in a RAID 10 configuration.

Software:
RHEL 5 with an updated kernel; ext3 file systems everywhere.
We ran an application called fileop from the IOzone package in the directory where the RAID 10 volume is mounted; the command line we used was:
fileop -l 16 -u 32 -e -s 4192

Tests:
1) Initial discovery
First we installed rhel5, ran fileop, and got about 460,000 stats per second.
Then we booted an ubuntu gutsy live CD, performed the same fileop tests, and got a little under 800,000 stats per second.
The difference in the results confused us greatly.

2) Further investigation
Then we decided to eliminate the user-space components of the file operations.
We then pulled the ubuntu kernel source and config, then built it on the rhel5 system.
Repeating step one gave the same results: 800,000 stats per second with the ubuntu kernel built on the rhel5 system and 460,000 stats per second with the rhel5 kernel.

3) Common Source
We then decided to use a common source tarball: since both distros ship patched kernels, we figured we'd try a vanilla kernel with the two configuration files.
So we pulled down a 2.6.22.18 kernel from kernel.org and copied the rhel5 config into one tree and the ubuntu config into the other. The build was simply done by:
yes "" | make oldconfig && make -j8 && make modules_install && make install

The results were slightly different, but there was still quite a gap:
1.) 680,000 stats per second
	a. kernel.org 2.6.22.18 kernel
	b. ubuntu kernel config 
2.) 460,000 stats per second 
	a. kernel.org 2.6.22.18 kernel 
	b. rhel5 kernel config

We have looked through the kernel configs and have been unable to pinpoint why there is such a large difference between the two kernels running this particular test. Any help would be appreciated and might keep us from having to move to ubuntu.

Thanks,
- David Brown
This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 2 Issue Tracker 2008-06-17 14:50:40 UTC
Shaun,

We could use some additional information here, for starters:

1) Can I get a sysreport from the 2950 that this was run on?
2) Were both tests performed using 32-bit or 64-bit versions of the
operating systems?  Which specific RHEL kernel version was used in the
initial testing?  Which Ubuntu kernel version?
3) The full results from the fileop tests.
4) If possible, can you capture iostat output while these tests are
running on RHEL and Ubuntu (use the -x option)?  Also some vmstat numbers
would be good.  I'd like to see the results from RHEL5 with the stock
kernel config, and the Ubuntu results with its stock config.

The more data we can get about this, the better.

kbaxley assigned to issue for PNL.
Category set to: Kernel
Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client
Ticket type set to: 'Problem'

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 3 Issue Tracker 2008-06-17 14:50:41 UTC
Also, does your user happen to know what default I/O scheduler was being
used in the Ubuntu test?  According to what I'm reading, Ubuntu uses CFQ
as the default for desktop kernels, and Deadline for server kernels.  By
default all the RHEL kernels are set up as CFQ.  Depending on the type of
workload, one scheduler may have advantages over the other.  
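
For reference, a sketch of how to check and switch the elevator at runtime
(the device name sda is an example; the scheduler in brackets is the active
one):

    # show the active I/O scheduler for a device
    cat /sys/block/sda/queue/scheduler
    noop anticipatory deadline [cfq]
    # switch it on the fly; no reboot or rebuild needed
    echo deadline > /sys/block/sda/queue/scheduler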


This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 4 Issue Tracker 2008-06-17 14:50:42 UTC
A few other items that highlight the differences between Ubuntu's client
and server kernels.  It looks as if Ubuntu optimizes their client and
server versions differently.

I'll have to check the RHEL5 ones to see how they're set up.  There is
no separate optimization for the RHEL5 client and server versions.

Info on Ubuntu kernels:

Preemption

The server kernel has kernel preemption turned off
(CONFIG_PREEMPT_NONE=y), while the desktop kernel has it enabled
(CONFIG_PREEMPT_BKL=y, CONFIG_PREEMPT_VOLUNTARY=y). Preemption works along
with scheduling to fine-tune performance, efficiency and responsiveness. In
non-preemptive kernels, kernel code runs until completion; the scheduler
can't touch it until it's finished. But with preemption enabled, the Linux
kernel allows tasks to be interrupted at nearly any point (though not when
it is unsafe, which is a whole huge fascinating topic all by itself), so
that higher-priority tasks can jump to the head of the line.

This is appropriate for desktop systems because users typically have
several things going at once: writing documents, playing music, Web
surfing, downloading and so on. Users don't care how responsive
background applications are; they care only about the ones they're
actively using. So if loading a Web page takes a little longer while the
user is writing an e-mail, it's an acceptable trade-off. Overall
efficiency and performance are actually reduced but not in a way that
annoys the user.

On servers you want to minimize any and all performance hits, so turning
off preemption is usually the best practice.

Memory

The 32-bit server kernel supports up to 64 GB of memory; the desktop
kernel, a mere 4 GB (CONFIG_HIGHMEM64G=y, CONFIG_HIGHMEM4G=y). You'll
only see these options in 32-bit kernels because the 32-bit address space
is big enough to address only 4 GB without trickery, the trickery being
the Intel Physical Address Extension (PAE) mode, if you want to get
technical. Linux supports PAE, but you also need PAE support in your CPU.
Anything newer than a Pentium Pro or AMD K6-3 should be fine. On a 64-bit
system you
won't see any memory options because it doesn't need hacks to overcome a
lack of memory addressing space; you should be fine until your needs exceed
16 exabytes of RAM.

Ticks and HZ

Both kernels support on-demand interrupt timers (CONFIG_NO_HZ=y), or the
so-called "tickless" option. This means that during periods of no
activity, the system goes into a truly idle state, which is supposed to
save on power and cooling.

The server kernel is set to a timer interrupt rate of 100 Hz
(CONFIG_HZ=100, CONFIG_HZ_100=y), which means it accepts 100 interrupts
per second. Another way to think of this is that the kernel wakes up and
looks around 100 times per second for something to do. The desktop kernel
is set to 250 Hz. Lower numbers mean lower overhead and higher latency;
higher numbers mean higher overhead and lower latency. Higher numbers
generally make the system feel more responsive, at the price of higher CPU
usage. Some workloads want more frequent interrupts; for example, video
processing and VoIP servers are often run at 1000 Hz. Changing the HZ
value requires a kernel recompile.
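
As a quick sanity check, a kernel's tick rate can be read straight out of
its build config (a sketch; assumes the distro installs a config file under
/boot, and the sample output reflects the RHEL 5 setting discussed below):

    grep '^CONFIG_HZ' /boot/config-$(uname -r)
    # CONFIG_HZ_1000=y
    # CONFIG_HZ=1000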


This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 5 Issue Tracker 2008-06-17 14:50:43 UTC
One thing that stuck out for me while looking at the kernel configs for
Ubuntu Client and RHEL5 is that we set the default HZ to 1000, versus 250
for the Ubuntu Client.  250, I believe, is the default setting for
upstream as well.  We've got a way to tune this setting in RHEL5 without
having to recompile the kernel.  I'm going to try these tests out with a
lower Hz setting and see how fileop does on RHEL5.




This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 6 Issue Tracker 2008-06-17 14:50:44 UTC
File uploaded: fileop_results

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982
it_file 123424

Comment 7 Issue Tracker 2008-06-17 14:50:45 UTC
I tried a few things by tweaking the default HZ setting.

I've got a system dual-booted with the RHEL5.2 Beta and Ubuntu 7.10
client edition.

I installed iozone on each instance and ran the fileop tests with the
parameters supplied in the ticket summary.  On a stock installation of
each, I got approximately 400,000 stats per second on Ubuntu vs. 200,000
on RHEL5.2.  Since the one thing that stood out the most in the kernel
configs (at least for me) was the difference in the default Hz setting
(1000 on RHEL5.2 vs 250 on Ubuntu), I decided to adjust the setting down
on my RHEL5 system.

I did this using the divider=4 parameter on the kernel command line.  You
can add this to grub.conf.  This cranks the HZ setting down to 250
(1000/4).  I re-ran the tests on RHEL5 and was able to get a better reading
on the number of file stats per second (300,000).  Cranking the HZ down to
100 (divider=10) didn't seem to make much of a dent in the stats numbers,
though.
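
For anyone following along, a sketch of the grub.conf stanza (the kernel
version and device paths are examples, not taken from this system):

    title Red Hat Enterprise Linux Server (2.6.18-92.el5)
            root (hd0,0)
            kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/VolGroup00/LogVol00 divider=4
            initrd /initrd-2.6.18-92.el5.img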

So, for starters, would it be possible for your users to try the
"divider=" parameter to see if it helps improve things?  Start with
divider=4, then divider=10.  I've attached my results for your viewing.

Also, what are the plans for this system?  Are the day-to-day activities
going to mimic what the fileop test is doing?

Thanks.


This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 8 Issue Tracker 2008-06-17 14:50:46 UTC
Kent,

Thanks- I'll forward this on to them.

They are working on a configuration for a very high performance Metadata
Server (MDS) for our 750TB Lustre archive. As I understand it, this
particular metric has a direct impact on the performance of the MDS (and
the overall performance of throughput to the archive). This is a new MDS
that will replace an existing one, and they put some extra money into a
high performance Raid 10 configuration using SAS drives to get very high
disk performance. 

Shaun

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 9 Issue Tracker 2008-06-17 14:50:47 UTC
Thank you, Shaun.  I'm also going to try and bounce this off of one of our
performance engineers in hopes that he may be able to offer some guidance. 


Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 10 Issue Tracker 2008-06-17 14:50:48 UTC
More from David below; I'll upload their fsstats info.

--------------------
You might also mention that even with the mod he suggested, the remaining
25% difference between Ubuntu and Red Hat is the difference between 28
hours and 21 hours to do a single operation on all the files in the
archive, which means we can do that operation daily vs. every other day.
Increasing the performance of the default Red Hat kernel by 50%, to 75% of
Ubuntu's performance, is good, but at the scale we're talking about it's
still a substantial difference.

Did he also get the file system statistics on the size and number of files
and directories in the archive?

Attached is a copy of the data and it's gone through ERICA I believe.

The Lustre metadata server just has the basic file tree with all files and
directories; the actual data is stored on the OSSes. So when we want to do
a scan of the file system we are really hammering the MDS, and that's
taking longer and longer as the file system gets bigger.

Also, since there's no data on the MDS, these fileop operations are all
the MDS does, nothing else...

Thanks,
- David Brown 



This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 11 Issue Tracker 2008-06-17 14:50:49 UTC
File uploaded: fsstats.nwfs.01_10_2008

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982
it_file 123613

Comment 12 Issue Tracker 2008-06-17 14:50:51 UTC
Internal Status set to: 'Waiting on Support'
Just an update for you... One of our performance engineers got back to me
and told me that it appears that this is a valid (and interesting) issue.
He's going to collaborate with his team, hopefully this week, and try to
get back to me on it.

I'll keep you posted.


Status set to: Waiting on Tech

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 13 Issue Tracker 2008-06-17 14:50:52 UTC
Had tried to enlist Shak's help, but it appears he may be too swamped to
contribute much right now.

From: shak <dshaks@redhat.com>
To: Kent Baxley <kbaxley@redhat.com>, shak <dshaks@redhat.com>
Subject: Re: Interesting performance issue at Pacific Northwest National
Lab
Date: Fri, 14 Mar 2008 03:27:27 -0400
User-Agent: Thunderbird 2.0.0.12 (Windows/20080213)


Hi Kent,

Need to research this a bit ...been traveling w/ customers and 
partners.  But it appears you have a valid interesting test case.
Will share with my team this week (Barry Marson, Sanjay Rao) and/or the 
filesystem guys (Peter Staubach and Eric Sandeen)

Can not commit but always happy to help .... when we can :)
A bit late tonight so going to get some rest now.

Shak



This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 14 Issue Tracker 2008-06-17 14:50:54 UTC
Date Problem Reported:  3-7-08

Problem:  Customer reports that they're getting much better stat
performance from Ubuntu 7.04 Desktop version than they are from the RHEL5
Server edition. The "stats" numbers were viewed using iozone's fileop
test.  This was all done on the i386 versions of Ubuntu and RHEL5.

Why is the stats number important to this customer?

The "fileop" test from iozone is a meta-data stress test.  The customer
is working on a configuration for a very high performance Metadata
Server (MDS) for their 750TB Lustre archive. The stats metric has a direct
impact on the performance of the MDS (and
the overall performance of throughput to the archive). This is a new MDS
that will replace an existing one, and they put some extra money into a
high performance Raid 10 configuration using SAS drives to get very high
disk performance.   

How reproducible: Always.  For complete details on what the customer did,
see the initial problem report summary.  

Steps to reproduce / What's been done so far:

I was able to reproduce this on a smaller scale.  Here's what I did:

1) Dual-booted a system with i386 versions of Ubuntu 7.04 and the RHEL5.2
Beta.  The Ubuntu kernel was 2.6.22-14, while I used 2.6.18-84 from
RHEL5.2.  The ext3 filesystem was used in all tests (just like the
customer's).

2) The system I used was a two-socket Intel Xeon 2.80GHz system with 4GB
of RAM and a 120GB U160 SCSI disk for the main OS.

3) After installing the operating systems, I installed a copy of iozone on
each.  

4) On each system, I invoked the fileop command using the same options in
the customer test:

	fileop -l 16 -u 32 -e -s 4192

5) I also observed similar behavior.  The 'stats' output from the fileop
run on Ubuntu was nearly twice that of the RHEL5.2 Beta (see the first
entry in the fileop_results file attached to this ticket).

6) The customer broke things down to the kernel .config files to see if
there were any differences that stuck out, and they couldn't find
anything.  I took this information and did a comparison myself.  After
doing a little research, it appears that the kernel config on Ubuntu's
desktop version is very close to the configs that we ship on all our RHEL5
offerings (Server & Client).  The one thing that stuck out to me as likely
to have some impact on performance was the CONFIG_HZ setting.  Ubuntu's is
set to 250 (which, I think, is the default for upstream kernels anyway),
while ours is set to 1000.

7) Based on this information, I attempted to adjust the default HZ setting
down to see if I could get any better numbers.  I was able to accomplish
this using the "divider=4" command-line parameter, which set HZ to 250.
The numbers I got were much better (see fileop_results attached), but
300,000 stats per second was about the best I could squeeze out of this.
Cranking things down further did not seem to make much of a dent.

8) Customer was happy to hear that we could get the performance to within
75% of what they got on Ubuntu, but when this is placed in the context of
what the customer's doing, it is still the difference between 28 hours and
21 hours to do a single operation on all the files in their archive.

What is expected from Engineering / SEG:

I'm sort of at a loss as to what to do to try and squeeze out better
performance on this.  I experimented with the different I/O schedulers
(deadline, AS, etc.), but, as expected, I didn't get much improvement (if
any).  

I tried to engage John Shakshober, who commented that this is an
interesting and valid test case. When I tried to touch base with him last
week, he said he was too swamped right now to give this much attention and
said to go ahead and send it up via normal channels.

Summary edited.

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 15 Issue Tracker 2008-06-17 14:50:55 UTC
File uploaded: iostat_ubuntu

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982
it_file 125051

Comment 16 Issue Tracker 2008-06-17 14:50:56 UTC
File uploaded: vmstat_ubuntu

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982
it_file 125052

Comment 17 Issue Tracker 2008-06-17 14:50:57 UTC
File uploaded: rhel_vmstat

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982
it_file 125058

Comment 18 Issue Tracker 2008-06-17 14:50:58 UTC
Attached iostat and vmstat output run during the fileop tests.  I'll have
the rhel iostat output attached soon.


This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 19 Issue Tracker 2008-06-17 14:50:59 UTC
File uploaded: iostat_rhel

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982
it_file 125071

Comment 20 Issue Tracker 2008-06-17 14:51:00 UTC
Sending up.  Escalation template has been filled out here:

https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=167982&gid=785&view_type=lifoall#eid_1940672


Issue escalated to Support Engineering Group by: kbaxley.
Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 21 Issue Tracker 2008-06-17 14:51:01 UTC
Ubuntu is probably using relatime, which we don't have in RHEL 5, but do
have in F8 (enabled by default).  atime updates would definitely have an
impact here.  I bet RHEL does a lot better when the filesystem is mounted
with noatime.
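
A sketch of that check (device and mount point are placeholder examples):

    # remount the filesystem under test without atime updates
    mount -o remount,noatime /dev/sdb1 /test
    # confirm the option took effect
    grep /test /proc/mounts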

csnook assigned to issue for SEG - Kernel.

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 22 Issue Tracker 2008-06-17 14:51:02 UTC
To elaborate, gutsy, being a relatime-aware distro, is probably mounting
with relatime enabled by default (as Fedora 8 does), while RHEL is not
using the feature, even when running under a kernel that supports it.

csnook assigned to issue for SEG - Kernel.

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 23 Issue Tracker 2008-06-17 14:51:04 UTC
I had thought of that myself whilst researching this last week.  I forgot
to mention it, so thanks for the reminder :)

However, looking at the default mounts for my Gutsy setup, it appears that
nothing is being mounted relatime by default on Ubuntu.  I had even tried
experimenting with noatime on RHEL5.2 and still couldn't crack the
300,000 stats per second barrier.


This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 24 Issue Tracker 2008-06-17 14:51:06 UTC
File uploaded: comparing-writes.png

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982
it_file 132138

Comment 25 Issue Tracker 2008-06-17 14:51:07 UTC
Finally looking at this issue.

Looking at the vmstat outputs, we can see that although RHEL had done a
lot more writes by the time the stats begin to be collected, Ubuntu has a
slightly steeper slope, so it is writing at an even higher rate than RHEL.
The crude graphic attached shows this.

Something that caught my attention is that in the iostat output Ubuntu
constantly shows a little %nice time. I wonder if perhaps Ubuntu, being a
desktop-oriented distro, runs some system daemons niced. Can you check
that out on your test box?

Can you check if SELinux has an effect on this or not? I have XUbuntu 8.04
installed on my kid's laptop, and it says SELinux is disabled on boot.

We'll have to use oprofile to find out where the kernel is wasting time
when running fileop. Can you install the kernel debuginfo package and
oprofile on your test box, and collect oprofile data on the kernel?
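
A sketch of the collection steps (assumes the oprofile and matching
kernel-debuginfo packages are installed; the vmlinux path follows the
usual debuginfo layout):

    # point oprofile at the debuginfo vmlinux and profile the workload
    opcontrol --setup --vmlinux=/usr/lib/debug/lib/modules/$(uname -r)/vmlinux
    opcontrol --start
    fileop -l 16 -u 32 -e -s 4192
    opcontrol --stop
    opcontrol --dump
    # summarize kernel samples above a 0.1% threshold
    opreport -l /usr/lib/debug/lib/modules/$(uname -r)/vmlinux -t 0.1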

Thanks!
Fabio Olive

Internal Status set to 'Waiting on Support'

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 26 Issue Tracker 2008-06-17 14:51:08 UTC
File uploaded: fileop

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982
it_file 132584

Comment 27 Issue Tracker 2008-06-17 14:51:09 UTC
Hi, Fabio.

I set up and ran the fileop tests using the latest RHEL5.2 kernel
(2.6.18-92) and profiled them with oprofile.  I tried to use oparchive to
tar up the files, but oparchive kept crapping out with "error:
basic_string::erase", so I tarred up the /var/lib/oprofile directory and
attached it here.

If you need more info from my test machine, it's tuxracer.rdu.redhat.com
(root/redhat)

SELinux didn't seem to make a dent in the performance gap.  I got the
same stat numbers with or without SELinux in the mix.

I'll see if I can find anything out for you on the nice values.

Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 28 Issue Tracker 2008-06-17 14:51:09 UTC
Ok. Doing a quick look-through of the system daemons on the Ubuntu system,
none of them appear to be niced.


This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 29 Issue Tracker 2008-06-17 14:51:10 UTC
Finally revisiting this issue. I'll try to coerce opreport into reading
the archived data. :)


This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 30 Issue Tracker 2008-06-17 14:51:11 UTC
Good! I removed some crud from /var/lib/oprofile on tuxracer (directories
named *anon*), and now we can get some results with:

# opreport -l /usr/lib/debug/lib/modules/2.6.18-92.el5PAE/vmlinux -t 0.1

Unfortunately the functions listed are non-obvious, so I'll do some
head-scratching now.

Cheers,
Fabio Olive


This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 31 Issue Tracker 2008-06-17 14:51:12 UTC
Even with divider=N set, we still seem to be doing a lot of timer-related
activity:

[root@tuxracer test]# opreport -l
/usr/lib/debug/lib/modules/2.6.18-92.el5PAE/vmlinux -t 0.1 | grep -i time
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
684099    6.3493  apic_timer_interrupt
601602    5.5837  timer_interrupt
420687    3.9045  hrtimer_run_queues
207220    1.9233  run_timer_softirq
95447     0.8859  smp_apic_timer_interrupt
73492     0.6821  account_system_time
70723     0.6564  do_timer
61411     0.5700  run_posix_cpu_timers
39373     0.3654  do_gettimeofday
38002     0.3527  update_process_times
32985     0.3061  current_kernel_time
12709     0.1180  getnstimeofday
12514     0.1161  __mod_timer
11819     0.1097  sys_gettimeofday

Looking at the code, I have found that in many places some activities
still happen at HZ frequencies, with forced loops such as:

/* one interrupt now replays the per-tick bookkeeping tick_divider times */
for (i = 0; i < tick_divider; i++) {
  do_timer(regs);
  update_process_times(user_mode(regs), regs);
}

That happens at a few places. So while we may be interrupting less often,
we will work harder at each interrupt. Yeah, it also sounds stupid to me,
but it must have a reason to be like that. :)

Can you compile a kernel with HZ actually set to 250, just like Ubuntu,
and test that on tuxracer? I imagine you would only need to tweak the
config files bundled in the kernel source RPM.
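
A sketch of that tweak against the vanilla tree used earlier in this
ticket (the sed expressions assume the usual CONFIG_HZ layout in .config):

    # flip the tick rate from 1000 to 250 in an existing .config
    sed -i -e 's/^CONFIG_HZ_1000=y/# CONFIG_HZ_1000 is not set/' \
           -e 's/^CONFIG_HZ=1000/CONFIG_HZ=250/' .config
    echo 'CONFIG_HZ_250=y' >> .config
    # rebuild the same way as the earlier tests
    yes "" | make oldconfig && make -j8 && make modules_install && make install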

Thanks!
Fabio Olive

Internal Status set to 'Waiting on Support'

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 32 Issue Tracker 2008-06-17 14:51:13 UTC
I'll try and knock that out this week...thanks for your help.


This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 33 Issue Tracker 2008-06-17 14:51:14 UTC
Minor note: I noticed that on tuxracer only CPU 0 gets the timer
interrupts. The smp_affinity was set to 00000001, so I set it to 0000000f
but still only CPU 0 would get it. I think spreading the timer interrupt
more evenly would help with load distribution.
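
The check and change described above, as a sketch:

    # IRQ 0 is the timer interrupt; show which CPUs may receive it
    cat /proc/irq/0/smp_affinity      # was 00000001, i.e. CPU 0 only
    # widen the mask to CPUs 0-3; the kernel may still keep it on CPU 0
    echo f > /proc/irq/0/smp_affinity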


This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 34 Issue Tracker 2008-06-17 14:51:15 UTC
Compiling with 250hz as the default still only got me to within 75% of what
Ubuntu's doing...I've got one more thing I wanna try, while I've got
some extra time today...


This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 35 Issue Tracker 2008-06-17 14:51:16 UTC
Ok. Still can't seem to squeeze more stat performance out of this. If
there's anything else you want me to try, I'll give it a shot.

Thanks.

Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 36 Issue Tracker 2008-06-17 14:51:17 UTC
Hi there,

Every time I see this issue still on my queue I wonder if there is
anything else we can try, but I can't come up with anything.

Do you want me to escalate this to Engineering? I really don't know where
else to go.

Cheers,
Fabio Olive

Internal Status set to 'Waiting on Support'

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 37 Issue Tracker 2008-06-17 14:51:18 UTC
Hi, Fabio.

Escalation to engineering is fine. If nothing else, I'd really be curious
as to what is causing the difference in performance here (if there is
anything).  I really appreciate the detailed work you did on this.

Thanks, again.

Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 38 Issue Tracker 2008-06-17 14:51:19 UTC
Sending up to see if someone in Engineering has a better hunch on where to
look for this apparent performance issue.


Issue escalated to RHEL 5 Kernel by: fleite.
Internal Status set to 'Waiting on Engineering'

This event sent from IssueTracker by fleite  [SEG - Kernel]
 issue 167982

Comment 40 Eric Sandeen 2008-06-18 15:26:47 UTC
Would it be possible to attach the configs used for the 2.6.22.18 kernel
tests in comment #1, item 3?

Thanks,

-Eric

Comment 41 Kent Baxley 2008-06-18 18:50:32 UTC
Created attachment 309778 [details]
ubuntu config file 

Here's the config from ubuntu.  Let me know if you need anything else.

Comment 43 Eric Sandeen 2008-06-19 21:08:08 UTC
Looking at the ubuntu config:
--- ubuntu      2008-06-19 15:02:17.000000000 -0500
+++ rhel        2008-06-19 15:02:22.000000000 -0500
-CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=0
+CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=1

caught my eye.  I also tested xfs as well as ext3 to see if any of this is
fs-specific.  This was on 2.6.22.18 with the "rhel" and "ubuntu" configs.

Here's the executive summary; the numbers below are just the first stat
number that pops out, but they're representative:

        RHEL            Ubuntu          RHEL            Ubuntu
        selinux         selinux         noselinux       noselinux
        (default)                                       (default)
ext3    228632          301079          245539          353321
xfs     187838          262036          226405          306046

So at least on my rig, selinux *does* seem to account for some of the
difference.  But there's more...  Also, ext3 and xfs both show the
differences between configs, so I don't think it's ext3-specific.
-Eric

Comment 45 Josef Bacik 2008-06-20 14:43:02 UTC
Yeah, it's looking more and more like it's just a configuration
difference, and not an ext3-related problem, so we don't need the
systemtap info.

Comment 46 Eric Sandeen 2008-06-20 15:08:20 UTC
I got all motivated last night and got creative: I diffed the two configs,
split the patch into 100 small patches (one per hunk), committed them in
order into a git tree, and then used git bisect to try to find which hunk
causes the biggest regression.  The one I got to is unfortunately fairly
large (the big, major config pieces at the start of the config file), so
I'll whittle it down more.  The hunk does change the HZ setting, for one.
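
A hypothetical sketch of that bisect workflow (file names are examples;
assumes filterdiff from patchutils for the per-hunk split, and that each
hunk applies cleanly on its own):

    # diff from the fast (ubuntu) config toward the slow (rhel) one
    diff -u config.ubuntu config.rhel > configs.diff
    git init config-bisect && cd config-bisect
    cp ../config.ubuntu .config
    git add .config && git commit -m 'hunk 0: ubuntu baseline'
    # apply one hunk per commit so git bisect can walk the history
    hunks=$(grep -c '^@@' ../configs.diff)
    for i in $(seq 1 "$hunks"); do
        filterdiff --hunks=$i ../configs.diff | patch .config
        git commit -am "hunk $i"
    done
    # HEAD (rhel config) is the slow end; the root commit is the fast end.
    # Build, boot, and run fileop at each step before marking good/bad.
    git bisect start HEAD $(git rev-list --max-parents=0 HEAD)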

I'll keep you posted.

-Eric

Comment 47 Eric Sandeen 2008-06-20 21:00:08 UTC
So far the biggest single hitter I see is CONFIG_AUDITSYSCALL.  On my
setup, with selinux disabled, on the ubuntu config, my numbers drop from
around 360,000 to 310,000 when CONFIG_AUDITSYSCALL is turned on.

Conversely, with the RHEL config and selinux turned off, the numbers jump
from around 245,000 to 305,000 when CONFIG_AUDITSYSCALL is turned off.

selinux, CONFIG_NR_CPUS, and HZ seem to have some effect as well, but
CONFIG_AUDITSYSCALL looks like one of the biggest ones at this point, and
I think it's worth looking at the overhead of this.

There's still another big hitter, though; I'm looking for it.

-Eric

Comment 48 Eric Sandeen 2008-06-21 01:53:31 UTC
On a stock rhel5 box you can try:

chkconfig --del auditd

and boot with "selinux=0 audit=0" on the kernel command line.

This made a fairly significant improvement on runs on my box (210,000 ->
290,000), though still not quite up to the other config's performance.
You can make these changes without needing a rebuild, though.  (Ubuntu has
selinux off by default and CONFIG_AUDITSYSCALL off, so this puts RHEL more
on par if you don't need the security features.)
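
A quick way to confirm the flags took effect after reboot (a sketch):

    cat /proc/cmdline      # should show selinux=0 audit=0
    getenforce             # should report Disabled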

I'm talking with the selinux/audit guys about the performance impact of these
options.

Comment 50 Eric Sandeen 2008-06-24 01:59:28 UTC
If they're willing, it would be interesting to see how much difference the
suggestions in comment #48 make on their rig with the rhel5 kernel.

-Eric

Comment 51 Rik van Riel 2008-06-28 03:59:22 UTC
Adding Steve Grubb to the CC list since syscall auditing seems to be the main
performance issue.

Comment 52 Eric Sandeen 2008-06-28 04:03:00 UTC
It's at least one.  It seems there's some other slowdown elsewhere, but haven't
isolated it yet.

Comment 53 Eric Sandeen 2008-06-28 04:51:23 UTC
One other thing they might check is whether different cpuspeed governors
are in effect on Ubuntu vs. RHEL.  On RHEL, turning cpuspeed off jumps the
stat number by about 30,000 - 40,000 in my tests.
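
A sketch of the check on the RHEL side:

    # see which frequency governor is active on CPU 0
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    # stop the cpuspeed service for the duration of a test run
    service cpuspeed stop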

Comment 54 Eric Sandeen 2008-07-08 14:23:18 UTC
Has the customer been able to try any of the suggestions made so far?

Comment 55 Issue Tracker 2008-07-08 14:28:12 UTC
I haven't heard anything back from PNNL, yet.


This event sent from IssueTracker by kbaxley 
 issue 167982

Comment 57 RHEL Product and Program Management 2009-02-16 15:37:50 UTC
Updating PM score.

Comment 58 Josef Bacik 2009-02-24 17:15:54 UTC
Closing since the IssueTracker issue was closed.

