Optimal Coding 2 – Borland vs. MS vs. GNU

Delphi (Borland Developer Studio 2006):

function AddInt64_1(A, B : Int64) : Int64;
begin
Result := A + B;
end;

The BDS2006 compiler generated this:
OptimizeMathFunctions.dpr.74: writeln(IntToHex(AddInt64_1($1111000022223333,$1234567890123456),16));
0040915F 6800001111 push $11110000
00409164 6833332222 push $22223333
00409169 6878563412 push $12345678
0040916E 6856341290 push $90123456
00409173 E844F8FFFF call AddInt64_1
...
OptimizeMathFunctions.dpr.9: begin
004089BC 55 push ebp
004089BD 8BEC mov ebp,esp
004089BF 83C4F8 add esp,-$08
OptimizeMathFunctions.dpr.10: Result := A + B;
004089C2 8B4510 mov eax,[ebp+$10]
004089C5 8B5514 mov edx,[ebp+$14]
004089C8 034508 add eax,[ebp+$08]
004089CB 13550C adc edx,[ebp+$0c]
004089CE 8945F8 mov [ebp-$08],eax
004089D1 8955FC mov [ebp-$04],edx
OptimizeMathFunctions.dpr.11: end;
004089D4 8B45F8 mov eax,[ebp-$08]
004089D7 8B55FC mov edx,[ebp-$04]
004089DA 59 pop ecx
004089DB 59 pop ecx
004089DC 5D pop ebp
004089DD C21000 ret $0010

What We wanted is this:

function AddInt64_A(A, B : Int64) : Int64;
asm
mov eax,[ebp+$10]
mov edx,[ebp+$14]
add eax,[ebp+$08]
adc edx,[ebp+$0c]
end;

Which looks like this in action:

OptimizeMathFunctions.dpr.82: writeln(IntToHex(AddInt64_A($1111000022223333,$1234567890123456),16));
004091A1 6800001111 push $11110000
004091A6 6833332222 push $22223333
004091AB 6878563412 push $12345678
004091B0 6856341290 push $90123456
004091B5 E826F8FFFF call AddInt64_A
...
OptimizeMathFunctions.dpr.14: asm
004089E0 55 push ebp
004089E1 8BEC mov ebp,esp
004089E3 8B4510 mov eax,[ebp+$10]
004089E6 8B5514 mov edx,[ebp+$14]
004089E9 034508 add eax,[ebp+$08]
004089EC 13550C adc edx,[ebp+$0c]
OptimizeMathFunctions.dpr.19: end;
004089EF 5D pop ebp
004089F0 C21000 ret $0010

Much better, (the delphi optimizer looks lazy).

Lets see the Visual Studio 2008 Pro C++:

typedef long long int64;
int64 AddInt64_1(int64 A, int64 B)
{
return (A+B);
}

printf("%I64xn",AddInt64_1(0x1111000022223333UL,0x1234567890123456UL));
004135AC push 12345678h
004135B1 push 90123456h
004135B6 push 11110000h
004135BB push 22223333h
004135C0 call AddInt64_1 (4111D1h)
...
004111D1 jmp AddInt64_1 (4113A0h)
...
004113A0 push ebp
004113A1 mov ebp,esp
004113A3 sub esp,0C0h
004113A9 push ebx
004113AA push esi
004113AB push edi
004113AC lea edi,[ebp-0C0h]
004113B2 mov ecx,30h
004113B7 mov eax,0CCCCCCCCh
004113BC rep stos dword ptr es:[edi]
return (A+B);
004113BE mov eax,dword ptr [A]
004113C1 add eax,dword ptr [B]
004113C4 mov edx,dword ptr [ebp+0Ch]
004113C7 adc edx,dword ptr [ebp+14h]
}
004113CA pop edi
004113CB pop esi
004113CC pop ebx
004113CD mov esp,ebp
004113CF pop ebp
004113D0 ret

I would have expected something more efficient. I thought MS VS2008 would produce better code than BDS2006 for sure. I was wrong.

What I wanted looks something like this:

int64 __declspec(naked) AddInt64_AN2(int64 A, int64 B)
{
__asm {
mov eax,dword ptr [esp+04h]//[A]
add eax,dword ptr [esp+0ch]
mov edx,dword ptr [esp+08h]
adc edx,dword ptr [esp+10h]
ret
}
}

Assembly is still king. And so is pascal/delphi with assembler rutines.

Next is gcc…

MinGW32 – gcc 3.4.5 (mingw-vista special r3)

int64 AddInt64_1(int64 A, int64 B)
{
return (A+B);
}
...
printf("%I64xn",AddInt64_1(0x1111000022223333LL,0x1234567890123456LL));

the result of gcc -S :

movl $-1877855146, 8(%esp)
movl $305419896, 12(%esp)
movl $572666675, (%esp)
movl $286326784, 4(%esp)
call __Z10AddInt64_1xx
...
__Z10AddInt64_1xx:
pushl %ebp
movl %esp, %ebp
movl 16(%ebp), %eax
movl 20(%ebp), %edx
addl 8(%ebp), %eax
adcl 12(%ebp), %edx
popl %ebp
ret

Thats what I call nice code (apart from the GAS syntax)

Optimalis kod iras

Regen volt mar amikor a HelpPC es hasonlo dos-os programokbol neztem ki az utasitas vegrehajtasi idoket az OP kod mereteket.
I love xor ax,ax !

Azota jo par ev eltelt, egyre tobb fajta processzor es egyre bonyolultabb pipeline es cache szervezes van a mai PC architekturakban.

Sajnos a Delphi fordito nem eppen produkal optimalis kodot alapbol. Ez nem jelenti azt, hogy delphiben akrar csak pascalban ne lehetne optimalizalt kodot irni. Igenis lehet. Mi sem jobb bizonyitek erre mint a fast code project.
FastMM, BASM
http://dennishomepage.gugs-cats.dk/CodingForSpeedInDelphi.doc
http://fastcode.sourceforge.net/
http://mikedimmick.blogspot.com/2008/03/what-heck-does-ret-mean.html

Nehany altalanos hasznos anyag:
http://www.agner.org/optimize/

AMD
AMD Athlonâ„¢ Processor x86 Code Optimization Guide
Software Optimization Guide for AMD64 Processors
AMD64 Architecture Programmer’s Manual Volume 1: Application Programming
AMD64 Architecture Programmer’s Manual Volume 2: System Programming
AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions
AMD64 Architecture Programmer’s Manual Volume 4: 128-Bit Media Instructions
AMD64 Architecture Programmer’s Manual Volume 5: 64-Bit Media and x87 Floating-Point Instructions
Performance Guidelines for AMD Athlonâ„¢ and AMD Opteronâ„¢ ccNUMA Multiprocessor Systems
Software Optimization Guide for AMD Family 10h Rev C Processors
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739,00.html
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_3748,00.html
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_7044,00.html
http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_875_7044,00.html

Intel
IA-32 Intel® Architecture Optimization Reference Manual
IA-32 Intel® Architecture Software Developer’s Manual Volume 1: Basic Architecture
IA-32 Intel Architecture Software Developer’s Manual Volume 2: Instruction Set Reference

Assembly
Sok jo konyv es leiras van ebben a temaban az egyik ilyen az
The Art of Assembly Language Programming

MS Surface Computing

Mar majdnem egy eve lattam a youtube-on a Microsoft Surface Computing technologiajat, viszont ugy tunik meg sokan nem hallottak rola.

Azt hiszem ez volt az elso video amit lattam:
Microsoft Surface – Exclusive Review
de rengeteg van
http://www.youtube.com/results?search_query=microsoft+surface&search_type=
Microsoft Surface Computer
Microsoft Surface on Public at NY Sheraton
Microsoft Surface: Hands-on First Look