Source: http://blog.chinaunix.net/article.php?articleId=32038&blogId=5487

Performance analysis on Linux

This article explains how to do performance analysis on Linux.

    Performance analysis and bottleneck determination on Linux is not rocket science. It requires some basic knowledge of the hardware and kernel architecture and the use of some standard tools. Using a hands-on approach, we’ll walk through the different subsystems and their key indicators to work out which component is the current bottleneck of a system.

    Sometimes sysadmins know that a system needs to be upgraded, but they cannot say precisely why, or which part needs upgrading. In many cases, some of the bottlenecks can be resolved just by tuning the system, without any hardware upgrade at all.

    First of all let’s review the basics:

    For performance analysis, there are three general types of critical resources: processing (execution), memory and I/O (transmission).

    Memory is comprised of real memory and virtual memory (real memory plus swapping space).

    I/O can be split into two broad categories: disk I/O and network I/O.

    In a Unix system (as in almost any timesharing system), processes are in one of two states: running or sleeping. Sleeping processes are in that state either because they exhausted their CPU time for the current round (i.e. the kernel preemptively moved them to sleeping mode until all other runnable processes complete their CPU timeslice as well) or because they are blocking (i.e. they are waiting for a resource that’s currently unavailable, such as disk I/O, network I/O or terminal I/O). If a sleeping process has been moved out of main memory to swap space (paged out), it can be in a third mode: ready to run but waiting to be paged in.

    The kernel normally keeps a list of all running processes and marks them according to their state. If a process is tagged as blocked because it is waiting for some resource, it won’t be runnable again until that resource becomes available. If the process is just sleeping because it used all its CPU quota, it remains in that state until all other runnable processes have had the opportunity to take their CPU share as well; after that, the process becomes runnable again and is scheduled for execution based on its priority.

    Priorities define how much relative CPU time is assigned to a given process. Ordinary processes have a static nice value, a number between -20 and 19 (lower means less nice, which equals more priority), that the user can set with the ‘nice’ command at launch and the ‘renice’ command for a running process; from it the scheduler derives a dynamic priority that it adjusts depending on many factors. Processes running under a real-time policy instead carry a real-time priority between 1 and 99 (higher is more priority), which is set explicitly rather than computed by the scheduler.
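
    As a small illustration of the nice value, here is a minimal Python sketch that reads the current process’s niceness and raises it (that is, lowers its priority). The function name and the default increment are assumptions made for the example; only the standard library call os.nice() is used.

```python
import os


def lower_priority(increment=5):
    """Raise this process's nice value and return (before, after)."""
    before = os.nice(0)  # an increment of 0 only reports the current nice value
    # Nice values top out at 19, so clamp the increment to stay in range.
    step = max(0, min(increment, 19 - before))
    after = os.nice(step)  # unprivileged users may only increase niceness
    return before, after


if __name__ == "__main__":
    before, after = lower_priority()
    print(f"nice value: {before} -> {after}")
```

    Going the other way (a negative increment, i.e. more priority) normally requires root privileges.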

    Virtual memory is defined as the total amount of memory in the system, and it includes real memory (RAM) as well as swap space. Real memory is split among four pools: free memory, cache (free memory that has been dynamically assigned to the filesystem cache and can be reclaimed for reuse as needed), I/O buffers (which include in-kernel network and I/O buffers) and used memory (memory allocated to processes).
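
    The four pools can be read from /proc/meminfo. The following is a rough sketch, not a precise accounting: the function names are made up for the example, and “used” is simply whatever is left after subtracting the other three pools, so it also lumps in kernel allocations.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def memory_pools(info):
    """Split MemTotal into free, cache, buffers and used (all in kB)."""
    free = info["MemFree"]
    cache = info["Cached"]
    buffers = info["Buffers"]
    # Simplification: everything that is not free, cache or buffers counts as used.
    used = info["MemTotal"] - free - cache - buffers
    return {"free": free, "cache": cache, "buffers": buffers, "used": used}


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        pools = memory_pools(parse_meminfo(f.read()))
    for name, kb in pools.items():
        print(f"{name:8s} {kb:>12d} kB")
```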

    The kernel’s virtual memory manager can move memory pages out of real memory and into swap space if there is pressure for free memory in the system (and according to the current settings in /proc/sys/vm/). If a page residing in swap space is needed again, it is brought back into real memory. In the rare case of running out of usable virtual memory, the Out of Memory Killer (OOM Killer) is invoked; it uses heuristics to identify processes to kill in order to reclaim some free real memory for the system.

    Now let’s go to the tools that we will use to identify which component is the culprit for the low performance of our system.

    The best tool to analyze the state of the system as a whole is ‘vmstat’. When executed with a numerical parameter as an interval in seconds (as in ‘vmstat 5’), it first shows the averages since boot and then refreshes with the figures for each interval.

    The top headers define which data you’re looking at:

    * Procs: number of processes in runnable state (r) and in uninterruptible, i.e. blocking, sleep (b)
    * Memory: amount of memory swapped out (swpd), free, used for buffers (buff) and used for filesystem cache (cache)
    * Swap: amount of memory swapped in (si) and swapped out (so) per second
    * IO: blocks received from a block device (bi) and blocks sent to a block device (bo) per second
    * System: number of interrupts per second (in) and number of context switches per second (cs)
    * CPU: percentages of total CPU time spent in user space (us), kernel space (sy), idle (id) and waiting for I/O (wa)
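
    To work with these columns programmatically, the output can be turned into one dictionary per sample, keyed by the column names listed above. This is a minimal sketch; the function name is an assumption, and the parser simply skips any line that is not a row of integers matching the column-name line.

```python
import subprocess


def parse_vmstat(text):
    """Turn vmstat output into a list of {column: int} dicts, one per sample."""
    columns = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # the column-name line: r b swpd free ...
            columns = fields
        elif (columns and len(fields) == len(columns)
              and all(f.lstrip("-").isdigit() for f in fields)):
            samples.append(dict(zip(columns, map(int, fields))))
    return samples


if __name__ == "__main__":
    # "vmstat 1 3" prints the since-boot line, then two 1-second samples.
    out = subprocess.run(["vmstat", "1", "3"], capture_output=True,
                         text=True, check=True).stdout
    for sample in parse_vmstat(out)[1:]:  # skip the since-boot line
        print(sample)
```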

    Interpretation of these values is not too difficult. Let’s go through a few real world examples.

    If you have a system that consistently has a high number of processes in runnable state (r), it usually means that you need to either add CPUs or replace the current CPUs with faster ones, as processes are starving for CPU. It’s important to note that the length of the queue of runnable processes scales with the number of CPUs in the system (i.e. 5 or 6 processes in runnable state on a quad-CPU system may be considered similar to having 1 or 2 processes in runnable state on a single-CPU system).
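
    The scaling with the number of CPUs is simple arithmetic, sketched here in Python. The function name is hypothetical, and the result is only a rough way to compare machines with different CPU counts.

```python
import os


def runnable_per_cpu(runnable, cpus=None):
    """Scale the vmstat 'r' value by the number of CPUs."""
    cpus = cpus or os.cpu_count() or 1
    return runnable / cpus


if __name__ == "__main__":
    # The comparison from the text: 6 runnable on 4 CPUs vs. 2 on a single CPU.
    print(runnable_per_cpu(6, cpus=4))  # 1.5
    print(runnable_per_cpu(2, cpus=1))  # 2.0
```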

    If you’re looking for places to optimize, look at the CPU utilization numbers as well, as they tell you whether the high utilization is in kernel or user space (the sy and us numbers). Running “top” on such a system will also give you an idea of which processes the CPU spends the most time on, thus pinpointing the primary candidates for optimization.

    In the special case where there is more than one CPU in the system, the CPU section of vmstat shows the averages, so it may not be very accurate (e.g. when there is only one heavy process running and it is single-threaded). ‘mpstat -P ALL 5’ can be used to show the CPU statistics both aggregated and on a per-CPU basis.
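
    To see why the average can mislead, the per-CPU counters in /proc/stat can be compared between two snapshots. This is a sketch under assumptions: the function names are invented for the example, and “busy” is taken to mean everything except idle and iowait time.

```python
import time


def cpu_times(stat_text):
    """Return {cpu_name: (busy, total)} in clock ticks from /proc/stat-style text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            values = [int(v) for v in fields[1:]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
            total = sum(values[:8])  # user..steal; guest time is already part of user
            times[fields[0]] = (total - idle, total)
    return times


def busy_percent(before, after):
    """Percentage of non-idle time per CPU between two cpu_times() snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return result


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = cpu_times(f.read())
    time.sleep(1)
    with open("/proc/stat") as f:
        second = cpu_times(f.read())
    for cpu, pct in busy_percent(first, second).items():
        print(f"{cpu:6s} {pct:5.1f}% busy")
```

    With one saturated CPU out of two, the aggregate “cpu” line reports about 50% while “cpu0” reports about 100%, which is exactly the situation where the vmstat average hides the problem.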

    In a second case, if you have a system where the amount of swapped memory is high, the free and cached memory numbers are low and there is consistent swap-in and swap-out activity, the system is under memory pressure. Adding more memory to that system will give you the most bang for your buck.
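
    A rough Python sketch of this check, operating on vmstat samples represented as dictionaries keyed by column name. The function name and the 80% threshold are assumptions chosen for illustration, and the sketch ignores the free and cache columns because vmstat does not report total memory to compare them against.

```python
def under_memory_pressure(samples, min_fraction=0.8):
    """Heuristic: swap is in use and both si and so are non-zero in most samples."""
    if not samples:
        return False
    swapping = [s for s in samples
                if s["swpd"] > 0 and s["si"] > 0 and s["so"] > 0]
    return len(swapping) / len(samples) >= min_fraction


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    quiet = [{"swpd": 0, "si": 0, "so": 0}] * 5
    busy = [{"swpd": 4096, "si": 12, "so": 30}] * 5
    print(under_memory_pressure(quiet), under_memory_pressure(busy))
```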

    A third typical case would be a system where there are consistently many processes in blocking state (b), the CPU wait (wa) times are high and the I/O numbers (bi/bo) are consistently high. Such a system is showing that I/O is your bottleneck, so increasing the CPU or the memory probably wouldn’t help. Another utility comes to the rescue to help you determine which block device is the culprit for the high wait times (and a candidate to be moved to a faster device or even to a RAM-based filesystem): iostat. When invoked with the -x command line parameter, iostat returns an extended format that includes several useful values (remember to include an interval in seconds to show deltas):

    iostat -x 5

    The key values are %util, the percentage of time during which requests were being issued to the device (a value close to 100% means the device is saturated); await, the average time that requests take to be fulfilled (including the time they spend in the queue); and svctm (service time), the time that the device takes to service the requests (note that recent sysstat releases have deprecated this field). High awaits, service times and percentage of utilization indicate that the device is too slow for the current load.
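
    Since await includes both the queueing time and the service time, the time a request spends waiting in the queue can be estimated as the difference between the two. A minimal sketch, assuming both values are in milliseconds and that svctm is available in your iostat version; the function name is hypothetical.

```python
def queue_wait_ms(await_ms, svctm_ms):
    """Estimate how long a request waits in the queue before being serviced."""
    return max(0.0, await_ms - svctm_ms)


if __name__ == "__main__":
    # Hypothetical figures: 12.5 ms await with a 2.5 ms service time.
    print(queue_wait_ms(12.5, 2.5))  # 10.0
```

    A large queue wait relative to the service time suggests that requests are piling up faster than the device can drain them.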

    Network congestion is another potential bottleneck. After verifying that the speed and duplex settings match the configuration of the network switch (a mismatch is the most common cause of erratic network behaviour), ‘netstat -ic’ can be used to monitor the traffic per second on each interface. As with vmstat and iostat, the first set of results shows the aggregated total since the last system boot, and the subsequent ones are the deltas (per second):

    netstat -ic
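
    The same kind of per-interval figure can be computed by hand from the kernel’s interface counters. The sketch below reads /proc/net/dev twice and reports byte rates (netstat -i shows packet counts instead); the function names are assumptions made for the example.

```python
import time


def read_counters(text):
    """Parse /proc/net/dev-style text into {iface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and len(fields) >= 16:
            # Field 0 is received bytes, field 8 is transmitted bytes.
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def byte_rates(before, after, seconds):
    """Per-interface (rx, tx) bytes per second between two snapshots."""
    return {
        iface: ((rx - before[iface][0]) / seconds,
                (tx - before[iface][1]) / seconds)
        for iface, (rx, tx) in after.items() if iface in before
    }


if __name__ == "__main__":
    interval = 5
    with open("/proc/net/dev") as f:
        first = read_counters(f.read())
    time.sleep(interval)
    with open("/proc/net/dev") as f:
        second = read_counters(f.read())
    for iface, (rx, tx) in byte_rates(first, second, interval).items():
        print(f"{iface:10s} rx {rx:12.0f} B/s   tx {tx:12.0f} B/s")
```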

    Network tuning goes far beyond the goals of this analysis. However, adequate buffers, the use of TCP selective acknowledgement (SACK), TCP window sizes big enough to account for the network latency, etc. can make a very important difference when it comes to fast links.
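
    What “big enough” means for a TCP window can be estimated with the bandwidth-delay product: the amount of data that must be in flight to keep the link full. A short worked example in Python; the function name and the link figures are illustrative assumptions.

```python
def bandwidth_delay_product(bits_per_second, rtt_ms):
    """Bytes that must be in flight to keep a link of this speed and latency full."""
    return bits_per_second * rtt_ms / 8000  # /1000 for ms -> s, /8 for bits -> bytes


if __name__ == "__main__":
    # Illustrative link: 1 Gbit/s with a 50 ms round-trip time.
    print(bandwidth_delay_product(1_000_000_000, 50))  # 6250000.0
```

    A window much smaller than this (about 6.25 MB in the example) caps throughput below the link speed, no matter how fast the hosts are.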

    This guide doesn’t intend to be all-inclusive. The resolution of these tools is not fine-grained enough when you start considering other areas of the hardware architecture, or when more specific questions are asked: Where do I see if the front side bus is my bottleneck? How do I know if I need to upgrade my video card or my CPU first? Is the memory too slow for the CPU?

    However, I’m fairly sure that the information you can gather with these few simple tools will help you troubleshoot more than 90% of the performance problems that a sysadmin sees on a daily basis.

    From: geminis.dyndns.org/wordpress/index.php/2005/06/05/performance-analysis-on-linux