* ==========================================================================
* ==========================================================================
*
* $RCSfile: solaris_etc.system_tuning.examples,v $
* $Revision: 1.1 $
* $Date: 2001-10-24 10:53:20-07 $
*
* Purpose: The following /etc/system examples were put together based
*          on multiple sources of information including application
*          requirements, performance tuning suggestions, as well as
*          the experiences of myself and others.
*
* Author: Rodney P. Rutherford,
*
* Born on Date: September 30, 1999
*
* Modification History:
*
*   1.1 - Wed Oct 24 10:50:42 PDT 2001
*       - corrected typo
*
*   1.0 - Thu Sep 30 15:51:47 PDT 1999
*       - Original file created and tested
*
* WARNING!: These examples are presented in the hope that they will be useful,
*           but WITHOUT ANY WARRANTY; without even the implied warranty of
*           MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*           It is the user's responsibility to verify these settings
*           and tune according to their individual system requirements.
*           All modifications should be tested and confirmed on a
*           non-production system prior to installing on a production system.
*
*           All of these examples are related to large system installations
*           (E5500, E6500, E10K, etc.), and are based on systems with 2GB or
*           more of memory.
*
* NOTE: All of these parameters are based on Solaris 2.6 values.
*
* References used include but are not limited to:
*  - Sun Education Class SA-400 "Solaris System Performance Management"
*  - "Delivering Performance on Sun: System Tuning" A Sun Technical
*    White Paper (Document ID 1442) by Greg Schmitz and Allan Esclamado
*  - "Sun Performance and Tuning: SPARC and Solaris" by Adrian Cockcroft
*    published by SunSoft Press/PTR Prentice Hall, ISBN 0-13-149642-3
*  - "Oracle 8i Installation Guide Release 8.1.5 for Sun SPARC Solaris"
*
* ==========================================================================
* START OF EXAMPLE KERNEL TUNING PARAMETERS
* ==========================================================================
*
* Existing Solaris 2.6 OS Install defaults for enterprise class systems
*
* tune_t_gpgslo   = page stealing low water mark
* tune_t_minarmem = The minimum available resident (not swappable) memory
*                   needed to avoid deadlock, in pages
* tune_t_minasmem = The minimum available swappable memory needed to avoid
*                   deadlock, in pages
*
set tune_t_gpgslo=250
set tune_t_minarmem=100
set tune_t_minasmem=250
*
* ==========================================================================
*
* Increase the default number of file descriptors (soft limit=64) to 256
* (hex 0x100 = 256). To view from csh the command is limit.
*
set rlim_fd_cur=0x100
*
* ==========================================================================
*
* The following entries enable correctable memory error reporting.
* Initially Solaris 2.6 had these variables enabled by default for
* enterprise class systems. However, at some point Sun changed this
* back to the 2.5.1 default of disabled (bug?). Also, at some point
* in one of the kernel revisions (-08 thru -11 for sure), the
* report_ce_log variable was removed. See Sun bugids 4045242 and
* 4188211 for additional info.
*
* set report_ce_log=1
* set report_ce_console=1
*
* ==========================================================================
*
* As per the Veritas release notes, VxFS 3.2.5 panics on Solaris 2.6
* under high kernel stack consumption. Other applications are also known
* to have problems with this, so it is recommended to automatically
* increase the stack size on enterprise class systems.
*
* The value is expressed in bytes and must be a multiple of PAGESIZE
* bytes (Sun4c, Sun4u, Sun4u1 = 8K), (Sun4m, Sun4d = 4K).
*
* These settings increase the stack size to 16K (the default is 8K).
* (16384 bytes is hexadecimal 4000)
*
set rpcmod:svc_run_stksize=0x4000
set lwp_default_stksize=0x4000
*
* Here are the details on these parameters per Sun Infodoc 20151
*
* lwp_default_stksize
*    This is the stack size used for kernel threads that support
*    user processes. Typically we find that most kernel threads
*    in a system are of this nature (because every process has at least
*    one support kernel thread). Tuning this can assist with:
*
*    - stack overflows occurring during system call processing (eg,
*      a pagefault that needs to be handled during a system call)
*
*    - stack overflows that could occur while handling a pagefault
*      from a user process not running in kernel mode (these are rare)
*
*    - stack overflows resulting from an interrupt borrowing the
*      stack of an LWP it pins
*
*    Default value is 8K on 32-bit kernels and 16K on 64-bit kernels.
*
* rpcmod:svc_run_stksize
*    This is the stack size used for kernel threads that service
*    RPC calls to the kernel (NFS being the prime example).
*    These threads are often subject to multiple product code
*    layers and pagefaults are often handled during processing.
*    If a system is using VxFS, VxVM, DiskSuite etc (and especially
*    if there are also dedicated disk product drivers such as rm6,
*    fca, etc) then it may be worth increasing this value.
*    NFS servers using layered products would be well advised
*    to increase this value.
*
*    Default value is 0 - which means we will use the value for
*    _defaultstksz unless we have tuned svc_run_stksize in /etc/system.
*
*    In releases 2.5.1 and later (for 2.5.1 you'll need kernel update
*    103640-28 or later) it is possible to tune svc_run_stksize.
*
* ==========================================================================
*
* The following parameter turns on priority paging. When priority_paging
* is set, the kernel tunes the cachefree parameter to double that of
* lotsfree. The page daemon then starts scanning for pages when free
* memory falls below cachefree. However, it frees only file system pages
* at first, rather than executables and shared libraries, unless free
* memory continues to drop below lotsfree.
*
set priority_paging=1
*
* On a well tuned system, free memory should be around the value of
* lotsfree (once the system and apps have been up long enough to
* allocate available memory). With priority_paging enabled, it will
* hover around the value of cachefree.
*
* If free memory drops below these levels, the page daemon starts
* aggressively scanning for more pages and the scan rate will rise. If the
* scan rate stays very high (100-200+) for any length of time, then the
* system may need more physical memory, or you may try tuning the memory
* parameters even further. Free memory and the scan rate can be viewed with
* vmstat (memory free and sr) or sar -r (freemem) and sar -g (pgscan/s).
* sar reports free memory in pages, while vmstat reports it as Kbytes.
*
* The following memory parameters are all automatically scaled based
* on maxusers (which is based on physical memory). The values specified
* are in pages (pages * pagesize (8192) / 1024 = Kbytes / 1024 = MB).
* The values listed here are the defaults for a system with 4GB of memory.
*
* physmem:   511929  Reports the physical amount of system memory
*
* cachefree: 7932    when priority paging is off, this is the same
*                    as lotsfree and has no effect.
* lotsfree:  7932    when free memory drops below this, the page
*                    daemon starts scanning for memory to add to it
*                    at the slowscan rate.
* minfree:   1983    as free memory drops below lotsfree and nears
*                    minfree, it starts scanning faster. If it reaches
*                    minfree, it is then scanning at the fastscan rate.
* desfree:   3966    this is the level at which swapping begins.
*
* It is not recommended to tune these parameters directly, but rather to
* first enable priority paging. However, if after enabling priority
* paging, you still wish to tune these parameters, you should always adjust
* lotsfree, minfree, and desfree together. For example, if you are
* doubling lotsfree, you should also double minfree and desfree.
* In the example below we are doubling the values for lotsfree, minfree,
* and desfree. With priority_paging enabled, cachefree will automatically
* tune itself to double that of lotsfree.
*
* set tune:lotsfree=15864
* set tune:minfree=3966
* set tune:desfree=7932
*
* ==========================================================================
*
* fastscan = The rate of pages scanned per second when free memory = minfree
*            Value is specified in pages. Default value is 1/2 of physical
*            memory up to a max of 8192 pages (64MB).
* slowscan = The rate of pages scanned per second when free memory = lotsfree
*            Default value is 100 pages.
* handspreadpages = The number of pages between the front hand clearing the
*            reference bit and the backhand checking the reference bit.
*            Value is specified in pages. Default is equal to fastscan.
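*
* As a quick sanity check of the paging thresholds discussed above
* (lotsfree, desfree, minfree), the following Python sketch converts the
* page counts to MB and scales all three values together. It is only an
* illustration: the function names are hypothetical, the 8192 byte page
* size and the default values are the ones quoted in these comments for a
* 4GB system, and your own defaults should be substituted.
*

```python
PAGESIZE = 8192  # bytes per page, as quoted in the comments above

# Defaults quoted above for a system with 4GB of memory (values in pages).
DEFAULTS = {"lotsfree": 7932, "desfree": 3966, "minfree": 1983}


def pages_to_mb(pages, pagesize=PAGESIZE):
    """pages * pagesize / 1024 = Kbytes; Kbytes / 1024 = MB."""
    return pages * pagesize / 1024 / 1024


def scale_thresholds(defaults, factor):
    """Scale lotsfree, desfree, and minfree together by the same factor."""
    return {name: pages * factor for name, pages in defaults.items()}


if __name__ == "__main__":
    doubled = scale_thresholds(DEFAULTS, 2)
    for name in ("lotsfree", "desfree", "minfree"):
        # Print each doubled threshold in pages and in MB.
        print("%-8s %6d pages (%.1f MB)"
              % (name, doubled[name], pages_to_mb(doubled[name])))
```

*
* Doubling with a factor of 2 reproduces the commented-out example values
* above: lotsfree 15864, desfree 7932, and minfree 3966 pages.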
*
* On large multiprocessor systems with Virtual Adrian installed, Virtual
* Adrian will automatically adjust them with:
*    Increasing slowscan to 500 pages/s for fewer pager wakeups
*    Increasing fastscan rate on MP machine to speed up filesystem
*       (lifting the 64MB limit)
*    Increasing handspreadpages, but clamping to about 250MB
*
* The new values are:
*    slowscan:        500
*    fastscan:        physmem / 4
*    handspreadpages: 30960 (250MB) or = fastscan, whichever is smaller
*
set slowscan=500
set fastscan=127982
set handspreadpages=30960
*
* ==========================================================================
*
* maxpgio = The maximum number of modified pages the page daemon can
*           take every second and write back to disk to free up memory.
*           It is divided by four by the system to obtain the limit for
*           each invocation of the page daemon.
*
* The default is either 40 or 60 depending on system type. On small
* systems this is fine, but on large systems it is way too small. If
* set too small the scan rate will be high, if set too high it can cause
* I/O spikes.
*
* Properly tuned, the scan rate on a system should be near 0 without
* excessively increasing I/O (pagein + pageout). To properly view
* paging use the memstat utility, not vmstat. vmstat reports both
* system and application I/O, while memstat properly reports just
* system paging.
* memstat is freeware available from:
*    http://www.sun.com/sun-on-net/performance/memstat.Z
*
* Some minimum guidelines for initially setting maxpgio are:
*     60 * the number of swap drives on the system
*    100 * the number of standard SCSI disk controllers
*    200 * the number of FCAL controllers
*
* For example, the following would set it to 500 pages per second
* set maxpgio=500
*
* NOTE: on large systems with Virtual Adrian installed, Virtual Adrian
*       will automatically adjust it with:
*       Removing limit on paging rates by setting maxpgio = 25468
*
set maxpgio=25468
*
* ==========================================================================
*
* The following entries were recommended by Virtual Adrian to reduce the
* amount of cpu time that fsflush was using:
*
*   fsflush used 5.6% of a CPU (max 5.0%)
*   fsflush may be using too much CPU time, right now it runs every
*   5 (tune_t_fsflushr) seconds and checks each page every 30 (autoup) seconds
*
* The following changes the default from 5/30 to 10/60.
*
set tune_t_fsflushr=10
set autoup=60
*
* Note: some large system recommendations suggest setting autoup to a
*       very large number, i.e. set autoup=900; however, I would not
*       recommend this as it means that 15 minutes of data has the
*       potential to be lost.
*
* ==========================================================================
*
* bufhwm = Sets the high water mark (max size) for the Buffer Cache
*
* Limits the buffer cache size in memory (the kmem_alloc_8192 bucket).
* bufhwm is the UFS metadata cache which holds inodes, cylinder groups,
* and indirect blocks (no file data blocks).
*
* This is recommended for systems with large amounts of memory as
* the default of 0 allows up to 2% of main memory to be used.
* Value is in Kbytes, recommended values are from 4-8MB.
* (sysdef -i displays the value of bufhwm in bytes)
*
* The following sets the buffer cache size to 8MB
*
set bufhwm=8192
*
* ==========================================================================
*
* Solaris by default has 48 5.x pseudo-ttys (pts) and 48 4.x pseudo-ttys
* configured. It is recommended to increase these to 128 - 256 on large
* systems.
*
* npty   -> Number of 4.x pseudo_ttys
* pt_cnt -> Number of 5.x pseudo_ttys
*
set npty=256
set pt_cnt=256
*
* ==========================================================================
*
* These IPC, Shared memory, and Semaphore parameters are set for a very
* large system and Oracle Database. They were designed around the FQC
* testing on the E10000, but equally apply to any large multiprocessor
* system with large amounts of physical memory.
*
* NOTE: Most of these values are guidelines. If you are running
*       multiple applications that require these facilities, be
*       sure to allocate the values accordingly.
*
* Semaphore Facility
*    seminfo_semmap = Number of entries in the semaphore map
*    seminfo_semmni = Number of semaphore identifiers
*    seminfo_semmns = Number of semaphores in the system
*    seminfo_semmnu = Number of undo structures in the system
*    seminfo_semmsl = Maximum number of semaphores, per id
*    seminfo_semopm = Maximum number of operations, per semaphore call
*    seminfo_semume = Maximum number of undo entries, per process
*    seminfo_semvmx = Semaphore maximum value
*    seminfo_semaem = Maximum value for adjustment on exit
*
* Oracle Tuning NOTES:
*    (Minimum values based on Oracle requirements alone)
*
*    seminfo_semmsl => the PROCESSES value from the largest init.ora + 10
*    seminfo_semmns => sum of the processes from all instances, except the
*                      largest one, plus 2 times the largest PROCESSES value
*                      plus 10 times the number of Oracle databases.
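*
* The two Oracle minimums above can be worked out with a short script.
* The Python sketch below is only an illustration of that arithmetic: the
* function name is hypothetical, and the PROCESSES values in the usage
* example are made up rather than taken from any real init.ora.
*

```python
def oracle_semaphore_minimums(processes_per_instance, num_databases):
    """Return (semmsl, semmns) minimums per the Oracle tuning notes.

    processes_per_instance: PROCESSES value from each instance's init.ora
    num_databases:          number of Oracle databases on the server
    """
    if not processes_per_instance:
        raise ValueError("at least one instance is required")

    largest = max(processes_per_instance)

    # semmsl: the largest PROCESSES value plus 10
    semmsl = largest + 10

    # semmns: sum of PROCESSES for all instances except the largest,
    # plus 2 times the largest, plus 10 times the number of databases
    others = sum(processes_per_instance) - largest
    semmns = others + 2 * largest + 10 * num_databases

    return semmsl, semmns


if __name__ == "__main__":
    # Hypothetical server: three instances, three databases.
    semmsl, semmns = oracle_semaphore_minimums([200, 100, 50], 3)
    print("set semsys:seminfo_semmsl=%d" % semmsl)
    print("set semsys:seminfo_semmns=%d" % semmns)
```

*
* For the hypothetical values shown, this gives semmsl=210 and semmns=580.
* These are minimums for Oracle alone; other applications using semaphores
* need to be added on top.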
*
set semsys:seminfo_semmap=128
set semsys:seminfo_semmni=1000
set semsys:seminfo_semmns=8192
set semsys:seminfo_semmnu=1024
set semsys:seminfo_semmsl=512
set semsys:seminfo_semopm=512
set semsys:seminfo_semume=1024
set semsys:seminfo_semvmx=32767
*
* Shared Memory
*    shminfo_shmmax = Maximum SINGLE shared memory segment size
*                     Default = 1048576 bytes (1MB)
*    shminfo_shmmin = Minimum SINGLE shared memory segment size
*                     Default = 1 and should not be modified
*    shminfo_shmmni = Number of shared memory identifiers, in other words
*                     it is the maximum shared memory segments PER SYSTEM.
*                     Default = 100
*    shminfo_shmseg = Maximum shared memory segments PER PROCESS
*                     Default = 6
*
* shminfo_shmmax should be set to no more than 75% of physical memory, or
* 3.5GB whichever is smaller. Max is 4GB - 512MB for the kernel memory.
* Even better is to keep it no more than 50% of physical memory.
* If not, then an application using a very large shared memory could
* cause excessive paging for other applications and kernel processes.
*
* Even better yet is to make sure that (shminfo_shmmax * shminfo_shmseg)
* is less than (physmem * .75).
*
* NOTES for Oracle specific tuning (assuming no other applications
* are running on the server). If other apps are on the server, the
* values should be higher based on the requirements of all apps.
*
* Oracle startup parameters are specified per INSTANCE and are found
* in the init.ora file. is the INSTANCE ID name.
* The init.ora file is normally found in $ORACLE_BASE (where
* oracle was installed) under oracle/admin//pfile/init.ora
*
*    shminfo_shmmax = 1.5 * Largest Oracle System Global Area (SGA) size
*       If shminfo_shmmax * shminfo_shmseg is not at least as
*       large as the total SGA size the database instance will not
*       be able to start. If shminfo_shmmax is less than the
*       SGA size, then Oracle will do a shmget to grab as many
*       segments as are needed (up to the max number specified
*       by shminfo_shmseg and/or shminfo_shmmni) to create the SGA.
*
*       However, it is recommended that the SGA be contained in
*       a single segment to prevent possible swapping of a portion
*       of the SGA, so shminfo_shmmax should be set to at least as
*       large as the largest instance's SGA size. By making
*       shmmax 50% larger than the SGA size, you allow the
*       Oracle DBAs to increase their SGA size during their
*       database tuning, without requiring constant updates to
*       this system file.
*
*       Oracle computes the SGA size by adding the shared_pool_size
*       to the Database and Redo Buffers. Oracle computes all values
*       in bytes based on the db_block_size bytes, plus there is
*       some overhead added. However, you can easily estimate the
*       SGA size based on the settings in the init.ora file.
*       For Example:
*
*       shared_pool_size = Shared Pool in bytes
*       db_block_buffers * db_block_size = Database Buffers in bytes
*       log_buffer = Redo Buffers in bytes
*
*       Therefore, the required shminfo_shmmax value would be:
*
*       1.5 * (shared_pool_size + (db_block_buffers * db_block_size) + log_buffer)
*
*       If you wish to know exactly how many bytes the SGA size
*       will be, set shmmax to physical memory and start up the
*       largest Oracle instance. The startup messages will report
*       the SGA parameters. For example:
*
*       SVRMGR> ORACLE instance started.
*       Total System Global Area     95116816 bytes
*       Fixed Size                      48656 bytes
*       Variable Size                12967936 bytes
*       Database Buffers             81920000 bytes
*       Redo Buffers                   180224 bytes
*
*       Then just adjust shmmax to be 1.5 * Total System Global Area
*
*    shminfo_shmseg = recommended max value is physmem * .75 / shminfo_shmmax
*       This will prevent a single instance from using more than
*       75% of physical memory.
*       For example, on a system with 4GB
*       of memory with a 300MB shminfo_shmmax, the value would be:
*       4096 * .75 / 300 = 10 segments
*
*    shminfo_shmmni = shminfo_shmseg
*       The normal setting for this is typically
*       shminfo_shmseg * number of instances
*       However, if you have multiple instances running on the
*       server and you have set shminfo_shmmax large enough to
*       allow each instance to allocate the SGA in a single
*       segment, then shminfo_shmmni should be set to equal
*       shminfo_shmseg so that you prevent all the instances
*       from using more than 75% of physical memory.
*
*       If you are going to have a fluctuating number of instances
*       (such as in a development environment), then you can make
*       this number large enough to support the maximum number of
*       anticipated instances. For example:
*       20 instances * shmseg (10) = 200
*
*       However, you need to ensure that the SGA sizes of these
*       instances (and the corresponding shminfo_shmmax value)
*       do not allocate too much memory and rob the system
*       processes of available memory. In other words, you should
*       ensure that the total of ALL the instances' SGA sizes
*       does not exceed 75% of physical memory.
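*
* The shared memory sizing rules above reduce to a few lines of arithmetic.
* The Python sketch below is only an illustration: the function names are
* hypothetical, the init.ora values in the usage example are made up (only
* db_block_buffers * db_block_size was chosen to match the Database
* Buffers figure in the startup message above), and the estimate ignores
* the overhead that Oracle adds.
*

```python
def estimate_sga_bytes(shared_pool_size, db_block_buffers,
                       db_block_size, log_buffer):
    """Estimate the SGA size in bytes from init.ora settings (no overhead)."""
    return shared_pool_size + db_block_buffers * db_block_size + log_buffer


def recommended_shmmax(sga_bytes):
    """shminfo_shmmax = 1.5 * largest SGA size, in bytes."""
    return sga_bytes * 3 // 2


def recommended_shmseg(physmem_mb, shmmax_mb):
    """shminfo_shmseg = physmem * .75 / shminfo_shmmax, rounded down."""
    return int(physmem_mb * 0.75 / shmmax_mb)


if __name__ == "__main__":
    # Hypothetical init.ora values for the largest instance.
    sga = estimate_sga_bytes(shared_pool_size=12000000,
                             db_block_buffers=10000,
                             db_block_size=8192,
                             log_buffer=180224)
    print("estimated SGA bytes: %d" % sga)
    print("shmmax from estimate: %d" % recommended_shmmax(sga))

    # Using the exact total reported by the startup message above instead.
    print("shmmax from reported SGA: %d" % recommended_shmmax(95116816))

    # 4GB of memory with a 300MB shminfo_shmmax.
    print("shmseg: %d" % recommended_shmseg(4096, 300))
```

*
* The estimate comes out somewhat below the total that Oracle reports at
* startup, which is why the reported total is the better input when it is
* available.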
*
* Set maximum shared memory segment to 2.5GB
* set shmsys:shminfo_shmmax=2621440000
* Set maximum shared memory segment to 1.5GB
* set shmsys:shminfo_shmmax=1572864000
* Set maximum shared memory segment to 1.0GB
* set shmsys:shminfo_shmmax=1024000000
* Set maximum shared memory segment to 500MB
* set shmsys:shminfo_shmmax=524288000
* Set maximum shared memory segment to 300MB
* set shmsys:shminfo_shmmax=314572800
*
set shmsys:shminfo_shmmax=314572800
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=200
set shmsys:shminfo_shmseg=10
*
* Message Queue
*    msginfo_msgmap = Number of entries in the message map
*    msginfo_msgmax = Maximum message size
*    msginfo_msgmnb = Maximum bytes on queue
*    msginfo_msgmni = Number of message queue identifiers
*    msginfo_msgssz = Segment size of a message
*                     (should be a multiple of the word size)
*    msginfo_msgtql = Number of system message headers
*    msginfo_msgseg = Number of message segments (must be < 32768)
*
set msgsys:msginfo_msgmap=2048
set msgsys:msginfo_msgmax=32768
set msgsys:msginfo_msgmnb=262144
set msgsys:msginfo_msgmni=1024
set msgsys:msginfo_msgssz=512
set msgsys:msginfo_msgtql=256
set msgsys:msginfo_msgseg=4096
*
* ==========================================================================
*
* The following forces the IPC modules to be loaded. This is not
* necessary, as they will automatically load when an application
* needs them; however, sysdef and other tools will report values
* as 0 until they are loaded. This just causes them to load at boot time
* so that values can be viewed prior to any application calling them.
*
forceload: sys/semsys
forceload: sys/shmsys
forceload: sys/msgsys
*
* ==========================================================================
*
* ncsize = Directory Name Lookup Cache (DNLC) size
*
* It is a cache of the most recently referenced directory entry
* names and their associated vnodes. It caches directory entries
* for both UFS and NFS.
* Maximum directory name size that can be
* cached in Solaris 2.6 is 31 characters. (unlimited in Solaris 7)
* You should therefore keep heavily used directory paths to names
* that are less than 31 chars (the name between each set of //).
*
* Value of ncsize is specified as number of entries.
* The default is scaled based on maxusers as (4*(maxusers+max_nprocs))+320
* For systems with greater than 2GB of memory the default is 69992 entries.
*
* Average cache hit rates since bootup are viewed with "vmstat -s"
* and are reported as
*    x number of total name lookups (cache hits %).
* For example: 24093483 total name lookups (cache hits 89%)
*
* If the cache hits percentage is below 90%, the ncsize should be increased.
* Typically you should just double the existing size (unless it is quite
* large). You can get the current setting from adb.
*
* vmstat -s also shows an entry for "toolong". This is the number of
* name lookups that were for directory names larger than 31 characters.
*
* The hit rate can also be determined with sar -a to see current values
* and patterns over time. For example
*    00:00:00  iget/s  namei/s  dirbk/s
*    03:40:00      20       59       77
*
* The formula to get the % hit rate is 100 * (namei/s - iget/s) / namei/s
* For example, 100 * ( 59 - 20 ) / 59 = 66.10% hit rate
* An optional formula is (1 - (iget/s / namei/s)) * 100
*
* ufs_ninode sets the inode cache size. For every DNLC UFS entry (DNLC
* also caches NFS entries), there will be a corresponding inode cache
* entry. The value specified and the defaults are the same as for ncsize.
* The value for ufs_ninode is actually a soft limit and the number of
* inodes actually in memory can exceed it; however, the system prefers
* to keep the size below ufs_ninode.
*
* When tuning the ncsize parameter, the ufs_ninode parameter should also
* be adjusted to match.
*
* sar -g will report %ufs_ipf. This shows the percentage of inodes flushed
* from the cache which had reusable file pages still in memory. It should
* always be 0.
* If it is not, then the inode cache is too small and
* the ufs_ninode value should be increased. If you do so, be sure to
* also increase ncsize to match.
*
* sar -v will show the following:
*    inode_sz - Current number of inode cache entries (current/max)
*    The max value is equivalent to the setting for ufs_ninode.
*    If the current and max are equal or close, or current is larger
*    than the max (soft limit remember), then your inode cache may be
*    too small and the ufs_ninode value should be increased.
*
* The following entries are for systems with large amounts of memory
* (greater than 2GB), and they double the default setting for these.
*
set ncsize=139984
set ufs_ninode=139984
*
* ==========================================================================
* END OF EXAMPLE KERNEL TUNING PARAMETERS
* ==========================================================================
* ==========================================================================