Checking the Performance of FindArray

FindArray Example

Let’s create a program that shows how a sample C++ compiler generates code for a function

named FindArray. Later, we will write an assembly language version of the function, attempting

to write more efficient code than the C++ compiler. The following FindArray function (in C++)

searches for a single value in an array of long integers:

bool FindArray( long searchVal, long array[], long count )
{
    for(int i = 0; i < count; i++)
    {
        if( array[i] == searchVal )
            return true;
    }
    return false;
}

Linking MASM to Visual C++

Let’s create a hand-optimized assembly language version of FindArray, named AsmFindArray.

A few basic principles are applied to the code optimization:

Move as much processing out of the loop as possible.

Move stack parameters and local variables to registers.

Take advantage of specialized string/array processing instructions (in this case, SCASD).

We will use Microsoft Visual C++ (Visual Studio) to compile the calling C++ program and

Microsoft MASM to assemble the called procedure. Visual C++ generates 32-bit applications that

run only in protected mode. We choose Win32 Console as the target application type for the examples

shown here, although there is no reason why the same procedures would not work in ordinary

MS-Windows applications. In Visual C++, functions return 8-bit values in AL, 16-bit values in AX,

32-bit values in EAX, and 64-bit values in EDX:EAX. Larger structure values are written to storage

that the caller supplies through a hidden pointer argument, and that pointer is returned in EAX.
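
To make the EDX:EAX convention concrete, the sketch below shows how a 64-bit value divides into the two 32-bit halves that would travel in EDX (high half) and EAX (low half). RegisterPair, SplitIntoEdxEax, and JoinEdxEax are hypothetical names used only for illustration; ordinary C++ code never handles the registers this way, because the compiler does it automatically.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the EDX:EAX register pair.
struct RegisterPair {
    std::uint32_t edx;  // high 32 bits of the 64-bit result
    std::uint32_t eax;  // low 32 bits of the 64-bit result
};

// Recombine the two halves into the original 64-bit value.
std::uint64_t JoinEdxEax(RegisterPair r)
{
    return (static_cast<std::uint64_t>(r.edx) << 32) | r.eax;
}

// Split a 64-bit value the way it would be placed in EDX:EAX.
RegisterPair SplitIntoEdxEax(std::uint64_t value)
{
    RegisterPair r;
    r.edx = static_cast<std::uint32_t>(value >> 32);
    r.eax = static_cast<std::uint32_t>(value & 0xFFFFFFFFu);
    assert(JoinEdxEax(r) == value);  // sanity check: the halves round-trip
    return r;
}
```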

Our assembly language code is slightly more readable than the code generated by the C++

compiler because we can use meaningful label names and define constants that simplify the use

of stack parameters. Here is the complete module listing:

TITLE AsmFindArray Procedure (AsmFindArray.asm)

.586
.model flat,C

AsmFindArray PROTO,
    srchVal:DWORD, arrayPtr:PTR DWORD, count:DWORD

.code
;-----------------------------------------------
AsmFindArray PROC USES edi,
    srchVal:DWORD, arrayPtr:PTR DWORD, count:DWORD
;
; Performs a linear search for a 32-bit integer
; in an array of integers. Returns a boolean
; value in AL indicating if the integer was found.
;-----------------------------------------------
true = 1
false = 0
    mov   eax,srchVal        ; search value
    mov   ecx,count          ; number of items
    mov   edi,arrayPtr       ; pointer to array
    jecxz returnFalse        ; empty array: REPNE would leave ZF unchanged
    repne scasd              ; do the search
    jz    returnTrue         ; ZF = 1 if found
returnFalse:
    mov   al,false
    jmp   short exit
returnTrue:
    mov   al,true
exit:
    ret
AsmFindArray ENDP
END
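
As a rough aid to reading the listing above, the following C++ sketch models what REPNE SCASD does with EAX, ECX, and EDI. The function name ModelRepneScasd and its parameter names are illustrative assumptions, not part of the original module. The sketch uses std::int32_t so that each element is a doubleword on any compiler (long is 32 bits in Visual C++), and it assumes the direction flag is clear so that EDI moves forward through the array.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ model of the REPNE SCASD search in AsmFindArray.
// Parameter names mirror the registers used by the assembly code.
bool ModelRepneScasd(std::int32_t eax, const std::int32_t* edi, std::uint32_t ecx)
{
    assert(edi != nullptr || ecx == 0);
    while (ecx != 0) {              // REPNE repeats only while ECX is nonzero
        --ecx;                      // ECX is decremented once per element
        bool zf = (*edi == eax);    // SCASD compares EAX with the dword at [EDI]
        ++edi;                      // and advances EDI by 4 bytes (DF clear)
        if (zf)
            return true;            // REPNE also stops as soon as ZF = 1
    }
    return false;                   // ECX ran out without a match
}
```

With ECX equal to zero, REPNE performs no comparisons and leaves the flags as they were, so a zero-length search cannot rely on ZF; the model simply reports that nothing was found.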

Checking the Performance of FindArray

Test Program. It is interesting to check the performance of any assembly language code

you write against similar code written in C++. To that end, the following C++ test program

inputs a search value and gets the system time before and after executing a loop that calls

FindArray one million times. The same test is performed on AsmFindArray. Here is a listing

of the findarr.h header file, with function prototypes for the assembly language procedure and

the C++ function:

// findarr.h

extern "C" {
    bool AsmFindArray( long n, long array[], long count );
    // Assembly language version

    bool FindArray( long n, long array[], long count );
    // C++ version
}

Main C++ Module. Here is a listing of main.cpp, the startup program that calls FindArray and

AsmFindArray:

// main.cpp - Testing FindArray and AsmFindArray.

#include <iostream>
#include <cstdlib>      // rand()
#include <time.h>
#include "findarr.h"
using namespace std;

int main()
{
    // Fill an array with pseudorandom integers.
    const unsigned ARRAY_SIZE = 10000;
    const unsigned LOOP_SIZE = 1000000;
    long array[ARRAY_SIZE];
    for(unsigned i = 0; i < ARRAY_SIZE; i++)
        array[i] = rand();

    long searchVal;
    time_t startTime, endTime;
    cout << "Enter value to find: ";
    cin >> searchVal;
    cout << "Please wait. This will take between 10 and 30 seconds...\n";

    // Test the C++ function:
    time( &startTime );
    bool found = false;
    for( unsigned n = 0; n < LOOP_SIZE; n++)
        found = FindArray( searchVal, array, ARRAY_SIZE );
    time( &endTime );
    cout << "Elapsed CPP time: " << long(endTime - startTime)
         << " seconds. Found = " << found << endl;

    // Test the Assembly language procedure:
    time( &startTime );
    found = false;
    for( unsigned n = 0; n < LOOP_SIZE; n++)
        found = AsmFindArray( searchVal, array, ARRAY_SIZE );
    time( &endTime );
    cout << "Elapsed ASM time: " << long(endTime - startTime)
         << " seconds. Found = " << found << endl;

    return 0;
}
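
The time function used in main.cpp reports whole seconds, which is coarse when a test finishes quickly. As a possible alternative, the sketch below times a search loop with std::chrono::steady_clock from the C++11 standard library. TimingResult, SearchFn, and TimeSearch are hypothetical names introduced here for illustration, and the copy of FindArray is repeated only so that the block compiles by itself.

```cpp
#include <cassert>
#include <chrono>

// Copy of the C++ search function, repeated so this sketch is self-contained.
bool FindArray(long searchVal, long array[], long count)
{
    for (long i = 0; i < count; i++)
        if (array[i] == searchVal)
            return true;
    return false;
}

// Hypothetical helper types: a pointer to a search routine and a result record.
using SearchFn = bool (*)(long, long[], long);

struct TimingResult {
    bool found;           // result of the last call
    double milliseconds;  // elapsed wall-clock time for all calls
};

// Calls 'search' 'loops' times and measures the total elapsed time.
TimingResult TimeSearch(SearchFn search, long searchVal,
                        long array[], long count, unsigned loops)
{
    assert(loops > 0);
    using Clock = std::chrono::steady_clock;

    bool found = false;
    const Clock::time_point start = Clock::now();
    for (unsigned n = 0; n < loops; n++)
        found = search(searchVal, array, count);
    const Clock::time_point stop = Clock::now();

    // Convert the clock's native tick count to fractional milliseconds.
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return { found, elapsed.count() };
}
```

A call such as TimeSearch(FindArray, searchVal, array, ARRAY_SIZE, LOOP_SIZE) could stand in for the first timed loop in main.cpp. Keep in mind that an optimizing compiler may simplify a loop whose results are unused, so very small timings deserve some suspicion.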

Assembly Code versus Nonoptimized C++ Code. We compiled the C++ program to a

Release (non-debug) target with code optimization turned off. Here is the output, showing the

worst case (value not found):

Assembly Code versus Compiler Optimization. Next, we set the compiler to optimize the

executable program for speed and ran the test program again. Here are the results, showing the

assembly code is noticeably faster than the compiler-optimized C++ code:

Pointers versus Subscripts

Programmers using older C compilers observed that processing arrays with pointers was more efficient

than using subscripts. For example, the following version of FindArray uses this approach:

bool FindArray( long searchVal, long array[], long count )
{
    long * p = array;
    for(int i = 0; i < count; i++, p++)
        if( searchVal == *p )
            return true;
    return false;
}

Running this version of FindArray through the Visual C++ compiler produced virtually the

same assembly language code as the earlier version using subscripts. Because modern compilers

are good at code optimization, using a pointer variable is no more efficient than using a subscript.

Here is the loop from the FindArray target code that was produced by the C++ compiler:

$L176:
    cmp   esi, DWORD PTR [ecx]
    je    SHORT $L184
    inc   eax
    add   ecx, 4
    cmp   eax, edx
    jl    SHORT $L176
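
One way to repeat the subscript-versus-pointer comparison is to place both forms of the loop in one source file and request an assembly listing from the compiler (for example, the /FA option in Visual C++ or -S in GCC). The sketch below is a minimal file for that purpose. The names FindArraySubscript, FindArrayPointer, and BothAgree are assumptions made for illustration, and the listing a particular compiler produces may differ from the loop shown above.

```cpp
#include <cassert>

// Subscript form of the linear search.
bool FindArraySubscript(long searchVal, const long array[], long count)
{
    for (long i = 0; i < count; i++)
        if (array[i] == searchVal)
            return true;
    return false;
}

// Pointer form of the same search.
bool FindArrayPointer(long searchVal, const long array[], long count)
{
    const long* p = array;
    for (long i = 0; i < count; i++, p++)
        if (searchVal == *p)
            return true;
    return false;
}

// Hypothetical check that the two forms give the same answer for one input.
bool BothAgree(long searchVal, const long array[], long count)
{
    assert(count >= 0);
    return FindArraySubscript(searchVal, array, count)
        == FindArrayPointer(searchVal, array, count);
}
```

Keeping the two functions side by side makes it easy to compare their loops in the generated listing at each optimization level.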

Your time would be well spent studying the output produced by a C++ compiler to learn about

optimization techniques, parameter passing, and object code implementation. In fact, many computer

science students take a compiler-writing course that includes such topics. It is also important to

realize that compilers optimize for the general case because they usually have no specific knowledge about

individual applications or installed hardware. Some compilers provide specialized optimization for a

particular processor such as the Pentium, which can significantly improve the speed of compiled

programs. Hand-coded assembly language can take advantage of string primitive instructions, as

well as specialized hardware features of video cards, sound cards, and data acquisition boards.

Original article: https://www.cnblogs.com/dreamafar/p/5995125.html