MASM

From OSDev Wiki

The Microsoft Macro Assembler (MASM) is an x86 assembler for MS-DOS and Microsoft Windows. The name MASM was used earlier for the Unisys OS 1100 Meta-Assembler, but in recent years it is commonly understood to mean the Microsoft Macro Assembler. It is the archetypal macro assembler for the x86 PC market, owned and maintained by a major operating system vendor. Since the introduction of MASM version 6.0 in 1991 it has had a powerful preprocessor that supports pseudo high level emulation of a variety of high level constructs, including loops and conditional tests, and an optional semi-automated system for creating and managing procedures. Version 6.11d could produce 32 bit object modules for use with a specialised linker available in the Windows NT 3.5 SDK. After binary patches upgraded version 6.11d, all later versions were 32 bit Portable Executable console mode applications that produce both OMF and COFF object modules for 32 bit code.

NOTE: Contrary to what is sometimes claimed, the MASM license agreement does not prohibit using MASM for operating system development. The confusion arises because people mix up the MASM and MASM32 licenses, which belong to two unrelated projects.

While recent versions of MASM only come with Visual Studio, its syntax is in widespread use in existing code and is also used as a guideline in the development of other assemblers, such as JWASM and Pelle's PoAsm assembler.

History

The Microsoft Assembler has been in production since 1981 and is updated by Microsoft as needed to reflect changes in both operating systems and processor hardware. The copyright string of the 1991 version of ML.EXE reads:

   Microsoft (R) Macro Assembler Version 6.00
   Copyright (C) Microsoft Corp 1981-1991. All rights reserved.

The last freestanding commercial version of the Microsoft assembler was version 6.11d, released in 1993 and sold through the middle to late 1990s. With the release of the 32 bit Windows versions (the OEM release of Windows 95 and Windows NT 4.0), Microsoft developed ML.EXE mainly for its own internal use as an operating system vendor, and it was available mostly through MSDN subscriptions for device driver development. Microsoft did, however, release patches for the last commercial version that upgraded ML.EXE from a 16 bit MZ executable to a proper 32 bit Portable Executable that runs natively on 32 bit Windows. With the 6.14 patch, ML.EXE became a very reliable tool supporting Intel opcodes up to the early SSE instruction set.

In mid 2000 Microsoft re-integrated ML.EXE into its commercial VC98 development package with the Processor Pack, distributed as the downloadable file VCPP5.EXE. Its licence (the VCPP5.EXE EULA) allowed licensed end users of VC98 to redistribute the Processor Pack to other licensed end users of VC98, and every version of Microsoft Visual C++ and Visual Studio since then has included ML.EXE as a component. The ML.EXE supplied in the VCPP5 pack was version 6.15, which added support for the Intel SSE2 instruction set. Later versions of ML.EXE have been developed as needed to support newer Intel opcodes. With Visual C++ 2005, a separate 64-bit version of MASM appeared under the file name ml64.exe.

Although MASM is no longer a freestanding commercial product, since 2000 it has been a component of Microsoft's commercial development environment, Visual Studio. Microsoft have also made it available in many other packages for device driver development and, more recently, in the free downloadable versions of Visual Studio.

Version 7.0 was included with Visual C++ .NET 2002. Version 7.1 was included with Visual C++ .NET 2003. Version 8.0 was included with Visual C++ 2005 which also includes a version that can assemble x64 code. Version 9.0 is included with Visual C++ 2008. Some of the newer versions of MASM are also included in various Microsoft SDKs and DDKs.

ML.EXE is essentially an industrial tool that a major operating system vendor maintains for its own purposes, with little need to make it particularly "user friendly", since most of its users are experienced programmers who have used it for many years. Microsoft have tended to use assembler code at the very lowest levels of their operating systems, where even the best C compilers do not produce sufficiently optimised code for the intended purpose. Experienced programmers who occasionally disassemble OS components such as NTOSKRNL.EXE and HAL.DLL for research or compatibility purposes can see this: a tell tale sign of code written by hand in ML.EXE is the trailing LEAVE mnemonic at the end of a procedure with a stack frame.

The two current ML versions as of January 2010 are as follows:

2009 ML version copyright string

   Microsoft (R) Macro Assembler Version 9.00.30729.01
   Copyright (C) Microsoft Corporation.  All rights reserved.

2009 ML64 version copyright string

   Microsoft (R) Macro Assembler (x64) Version 9.00.30729.207
   Copyright (C) Microsoft Corporation.  All rights reserved.

Usage

The Microsoft assembler has been the main vehicle for preserving the earlier Intel assembler notation, and code for it can still be written as a fully specified language, the format that many disassemblers produce. The most common notation of this type is the data size specifier:

   BYTE PTR  The data size specifier for the target being 8 bit.
   WORD PTR  The data size specifier for the target being 16 bit.
   DWORD PTR The data size specifier for the target being 32 bit.

Addressing Notation

ML.EXE maintains the historical distinction between transient stack addressing and fixed data addressing: the OFFSET operator denotes the address of data in either the initialised or uninitialised data sections. Transient stack addressing is handled in several ways. In a procedure that uses a stack frame, named LOCAL variables are used for readability, and where the address of such a variable is required it can be obtained either with the LEA mnemonic or, in an INVOKE call, with the ADDR operator. Within a procedure, LOCAL variables are [EBP] based stack addresses. Procedures written without a stack frame are generally written purely in mnemonics using direct [ESP] based argument addressing. To keep compatibility with the historical MASM pseudo high level notation, stack frame generation can be turned off for individual procedures as needed (with OPTION PROLOGUE:NONE and OPTION EPILOGUE:NONE).
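A minimal sketch of these addressing forms, assuming a 32 bit Windows console program assembled with ML.EXE and linked against kernel32.lib. `FillDword`, `Demo`, `static_var` and `local_var` are hypothetical names used only for illustration.

```asm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

FillDword PROTO :DWORD, :DWORD   ; hypothetical helper: store a value through a pointer

.data
static_var dd 0                  ; fixed data in the initialised data section

.code
FillDword PROC pDest:DWORD, value:DWORD
    mov ecx, pDest               ; ECX = address supplied by the caller
    mov edx, value
    mov [ecx], edx               ; write through the pointer
    ret
FillDword ENDP

Demo PROC
    LOCAL local_var:DWORD        ; [EBP] based stack variable

    mov eax, OFFSET static_var   ; address of fixed data, known at link time
    invoke FillDword, eax, 1

    lea eax, local_var           ; address of stack data, computed at run time
    invoke FillDword, eax, 2

    invoke FillDword, ADDR local_var, 3   ; ADDR makes INVOKE emit the LEA itself
    mov eax, local_var           ; EAX = 3
    ret
Demo ENDP

start:
    invoke Demo
    invoke ExitProcess, eax      ; exit code is the value returned by Demo
end start
```

If it assembles and links as intended, the process exit code should be 3, the last value written to `local_var`.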

Square Brackets

ML.EXE does not require square brackets around a named variable, but it tolerates them by ignoring them. This difference has at times confused programmers familiar with other assemblers that use square brackets to denote a memory access.

   mov eax, local_var     ; Standard ML.EXE notation
   mov eax, [local_var]   ; Square brackets are ignored by ML.EXE in this context.

No Intel mnemonic produces the extra level of indirection that unnecessary square brackets around a named variable might seem to imply, so the practice mainly confuses programmers who are used to x86 assemblers where square brackets mean something different. ML.EXE uses square brackets around a register to perform the dereferencing operation, as in the following example.

   lea eax, variable_name   ; load the address of a stack variable into the EAX register
   mov eax, [eax]           ; dereference the CONTENT of the variable and copy it into the EAX register
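The same rule applies when a named variable holds a pointer: brackets around the name add no indirection, so the pointer has to be loaded into a register first. A small sketch, assuming a 32 bit Windows console target linked against kernel32.lib; `value` and `pValue` are hypothetical names.

```asm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

.data
value  dd 42
pValue dd OFFSET value          ; a variable that holds the address of value

.code
start:
    mov eax, [pValue]           ; identical to mov eax, pValue: EAX = address of value
    mov eax, [eax]              ; brackets around a register dereference: EAX = 42
    invoke ExitProcess, eax
end start
```

The process exit code should be 42, showing that two instructions were needed to follow the pointer.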

ML.EXE will allow the following.

   mov eax, [eax+ebx]
   mov eax, [eax][ebx]

In this context the second pair of square brackets performs addition within the complex addressing notation, so [eax][ebx] is the same as [eax+ebx]. This is useful in procedures that have no stack frame, where the second pair of brackets can hold the stack displacement, which changes with every PUSH and POP.
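A sketch of a frameless procedure that uses this notation, assuming a 32 bit Windows console target linked against kernel32.lib. OPTION PROLOGUE:NONE and OPTION EPILOGUE:NONE switch off MASM's automatic stack frame; `AddTwo` is a hypothetical procedure.

```asm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
; The parameter names are declared only so INVOKE can check calls.
; There is no EBP frame, so they must not be used inside the body.
AddTwo PROC val1:DWORD, val2:DWORD
    mov eax, [esp][4]           ; first argument ([esp] holds the return address)
    push ebx                    ; ESP moves down by 4 bytes
    mov ebx, [esp][12]          ; second argument: was [esp][8], now 4 further away
    add eax, ebx
    pop ebx
    ret 8                       ; STDCALL: the callee removes two DWORD arguments
AddTwo ENDP
OPTION PROLOGUE:PrologueDef     ; restore the default frame generation
OPTION EPILOGUE:EpilogueDef

start:
    invoke AddTwo, 2, 3
    invoke ExitProcess, eax     ; exit code should be 5
end start
```

With the epilogue turned off, MASM does not adjust RET for the argument count, which is why `ret 8` is written explicitly.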

Notation Abbreviation

Over time, as the parsers in successive versions improved, a form of shorthand notation developed: if the assembler can determine the size of the data, the data size specifier is unnecessary, although it can still be used. This shorthand has confused some users of other assemblers that do not track data sizes by default, and it can lead to problems when a user who is not familiar with the default data sizes relies on it.

   movzx eax, [esi]            ; generates an error - data SIZE cannot be determined by the assembler
   movzx eax, BYTE PTR [esi]   ; zero extend a BYTE into the 32 bit EAX register

Limited Type Checking

From at least version 6.0, ML.EXE has supported a pseudo high level notation for creating procedures with argument size and count checking. It is built from the PROC, ENDP, PROTO and INVOKE directives. PROTO defines a function prototype that matches a PROC, which is terminated with ENDP. The prototyped procedure can then be called with INVOKE, which is protected by the limited size and argument count checking. At a more advanced level, additional notation turns off the automatically generated stack frame for very small procedures, where the call overhead of the frame may matter. Code for ML.EXE can also be written entirely without the pseudo high level notation, using only bare Intel mnemonics.

Using an example prototype from the 32 bit Windows API function set,

   SendMessage PROTO STDCALL :DWORD,:DWORD,:DWORD,:DWORD
   SendMessage equ <SendMessageA>

The code to call this function using the INVOKE notation is as follows.

   invoke SendMessage,hWin,WM_COMMAND,wParam,lParam

Which is translated exactly to,

   push lParam
   push wParam
   push WM_COMMAND
   push hWin
   call SendMessage

The advantage of the INVOKE method is that it tests the size of the data types and the argument count and generates an assembly time error if the arguments do not match the prototype.
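A sketch of the full PROTO, PROC, ENDP and INVOKE cycle for a user written procedure, assuming a 32 bit Windows console target linked against kernel32.lib; `AddTwo` is a hypothetical procedure.

```asm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

AddTwo PROTO :DWORD, :DWORD      ; prototype: two DWORD arguments

.code
AddTwo PROC val1:DWORD, val2:DWORD
    mov eax, val1                ; parameters are addressed through the EBP frame
    add eax, val2
    ret                          ; MASM generates the frame teardown and RET 8
AddTwo ENDP

start:
    invoke AddTwo, 40, 2         ; push 2 / push 40 / call AddTwo
    ; invoke AddTwo, 40          ; would be rejected at assembly time: too few arguments
    invoke ExitProcess, eax      ; exit code should be 42
end start
```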

Note that, at the time of writing, ML64.EXE does not support the INVOKE notation and may never do so. Given Microsoft's history of updating ML.EXE as needed for its own internal use, this feature may not be developed unless Microsoft needs it itself.

Calling Conventions

ML.EXE supports a number of calling conventions across 16 bit real mode MS-DOS, 16 bit Windows and the later 32 bit Windows versions: C, SYSCALL, STDCALL, BASIC, FORTRAN and PASCAL.
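A sketch contrasting the two conventions most often met in 32 bit Windows code, assuming a console target linked against kernel32.lib; `SumC` and `SumStd` are hypothetical procedures. The language type in PROTO and PROC controls both name decoration and who removes the arguments.

```asm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

SumC   PROTO C :DWORD, :DWORD          ; caller removes the arguments
SumStd PROTO STDCALL :DWORD, :DWORD    ; callee removes the arguments

.code
SumC PROC C val1:DWORD, val2:DWORD     ; public name _SumC
    mov eax, val1
    add eax, val2
    ret                                ; plain RET
SumC ENDP

SumStd PROC STDCALL val1:DWORD, val2:DWORD  ; public name _SumStd@8
    mov eax, val1
    add eax, val2
    ret                                ; assembled as RET 8
SumStd ENDP

start:
    invoke SumC, 1, 2          ; INVOKE adds "add esp, 8" after the call
    invoke SumStd, eax, 3      ; no cleanup after the call, RET 8 did it
    invoke ExitProcess, eax    ; exit code should be 6
end start
```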

Pseudo High Level Emulation

ML.EXE provides a notation to emulate a variety of high level control and loop structures.
It supports the .IF block structure,

 .if eax == 1
   mov ecx, 10
 .elseif eax == 2
   mov ecx, 20
 .else
   xor ecx, ecx
 .endif

It also supports the .WHILE loop structure,

 .while eax > 0
   sub eax, 1
 .endw

And the .REPEAT loop structure.

 .repeat
   sub eax, 1
 .until eax < 1

The high level emulation also supports C style comparison operators (==, !=, <, <=, >, >=) that follow the same rules as Intel mnemonic comparisons. In .IF blocks the distinction between SIGNED and UNSIGNED data is handled by a minor variation in data type: the storage size DWORD is UNSIGNED by default, and SDWORD can be specified instead for a SIGNED comparison. This distinction matters only for the pseudo high level notation. It is unused at the mnemonic level, where signedness is determined by which Intel conditional instruction is chosen (for example JA and JB for unsigned, JG and JL for signed).
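A sketch of how the data type selects signed or unsigned comparison in .IF, assuming a 32 bit Windows console target linked against kernel32.lib; `svar` is a hypothetical variable.

```asm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

.data
svar SDWORD -5                 ; declared signed

.code
start:
    mov eax, -1                ; 0FFFFFFFFh
    xor ecx, ecx               ; ECX collects one bit per condition that holds
    .if eax < 0                ; a register is treated as unsigned: never true
        or ecx, 1
    .endif
    .if SDWORD PTR eax < 0     ; forced signed comparison: true for -1
        or ecx, 2
    .endif
    .if svar < 0               ; svar is SDWORD, so the comparison is signed: true
        or ecx, 4
    .endif
    invoke ExitProcess, ecx    ; exit code should be 6
end start
```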

Taken together, the pseudo high level emulation makes it easier for MASM code to interface with current operating systems that expose a C style application programming interface. It is generally used for code that is not speed critical, where clarity and readability matter most; speed critical code is usually written directly in mnemonics.

Note that ML64.EXE does not support all of the earlier pseudo high level notation and may never do so. Given Microsoft's history of updating ML.EXE as needed for its own internal use, this feature set may not be developed unless Microsoft needs it itself.

Pre-Processor

The Microsoft assembler has a very powerful pre-processor with considerably more functionality than the preprocessors of modern C compilers, which is consistent with its designation as a macro assembler. Since ML.EXE version 6.0 it has also provided C style pseudo high level functionality for programmers who prefer that style of notation for code that is not speed critical. On the down side, the pre-processor is an old design that is known to be quirky and is reasonably difficult to use for more complex macros without a lot of experience.

At their simplest macros written for the ML.EXE pre-processor are useful for automating many different simple tasks.

   ; ----------------------------
   ; memory to memory assignment
   ; ----------------------------
     m2m MACRO M1, M2
       push M2
       pop  M1
     ENDM
   ; --------------------------------------------------
   ; memory to memory assignment using the EAX register
   ; --------------------------------------------------
     mrm MACRO m1, m2
       mov eax, m2
       mov m1, eax
     ENDM
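The two assignment macros above do the same job with different side effects: m2m goes through the stack and leaves every register intact, while mrm is shorter to execute but overwrites EAX. A usage sketch, assuming a 32 bit Windows console target linked against kernel32.lib; `src`, `dst1` and `dst2` are hypothetical variables.

```asm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
includelib kernel32.lib

m2m MACRO M1, M2                ; memory to memory through the stack
    push M2
    pop  M1
ENDM

mrm MACRO m1, m2                ; memory to memory through EAX
    mov eax, m2
    mov m1, eax
ENDM

.data
src  dd 1234
dst1 dd 0
dst2 dd 0

.code
start:
    mov eax, 7
    m2m dst1, src               ; dst1 = 1234, EAX is still 7
    mrm dst2, src               ; dst2 = 1234, EAX is now 1234
    invoke ExitProcess, eax     ; exit code should be 1234
end start
```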

Using the EXITM <return item> notation, a macro can return a value or register so that it can be used much like a high level function call. A very simple example:

   addregs32 MACRO reg1, reg2
     add reg1, reg2
     EXITM <reg1>
   ENDM

In the .CODE section.

   mov ecx, 16
   mov edx, 8
   
   mov eax, addregs32(ecx, edx)

Which disassembles exactly to the following mnemonics.

   0040102B B910000000             mov     ecx,10h
   00401030 BA08000000             mov     edx,8
   00401035 03CA                   add     ecx,edx
   00401037 8BC1                   mov     eax,ecx

At a slightly more complex level the pre-processor can be used to emulate higher level languages which allows non-critical code to be simplified for higher programming throughput.

   fn MessageBox,0,str$(eax),"Title",MB_OK

In this working example, fn is a macro that wraps the INVOKE notation and adds functionality so that quoted text can be placed directly in the API function call, much as in a high level language. The str$() macro emulates the traditional BASIC function for converting numeric data, held in either a memory operand or a register, into a string for display. The example is taken from the main macro file of the MASM32 Project.
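The mechanism that lets quoted text appear inside a call can be sketched with a macro function that switches to .data, emits the string, switches back to .code and returns the string's address through EXITM. This is a simplified illustration, not the MASM32 implementation of fn; `cstr$` is a hypothetical name, and the program assumes a 32 bit Windows target linked against kernel32.lib and user32.lib.

```asm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO :DWORD
MessageBoxA PROTO :DWORD, :DWORD, :DWORD, :DWORD
includelib kernel32.lib
includelib user32.lib

MB_OK equ 0

; hypothetical helper: place zero terminated text in .data and return its address
cstr$ MACRO text:VARARG
    LOCAL lbl                   ; unique label for every expansion
    .data
      lbl db text, 0
    .code
    EXITM <OFFSET lbl>
ENDM

.code
start:
    ; each cstr$ expands before the INVOKE line is assembled
    invoke MessageBoxA, 0, cstr$("Hello from MASM"), cstr$("Title"), MB_OK
    invoke ExitProcess, 0
end start
```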

Object Module Compatibility

The 32 bit versions of ML.EXE, from the patches for the last commercial version onwards, produce object modules in both the older OMF format and COFF, the Microsoft object format described in the Portable Executable specification. The COFF modules are compatible with modern Microsoft C compilers, so object modules produced by ML.EXE and CL.EXE can routinely be intermixed and linked into applications written with either tool.
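A sketch of a MASM module intended to be linked with C code, assuming the 32 bit Visual C++ command line tools. `AddTwo`, addtwo.asm and main.c are hypothetical names; the build commands and the C side are shown as comments.

```asm
; addtwo.asm
; Build (32 bit Visual C++ tools assumed):
;   ml /c /coff addtwo.asm
;   cl main.c addtwo.obj
; where main.c contains, for example:
;   #include <stdio.h>
;   int __stdcall AddTwo(int a, int b);
;   int main(void) { printf("%d\n", AddTwo(40, 2)); return 0; }

.386
.model flat, stdcall
option casemap:none

.code
; procedures are PUBLIC by default; STDCALL decoration gives _AddTwo@8,
; the same name CL.EXE generates for the __stdcall declaration above
AddTwo PROC val1:DWORD, val2:DWORD
    mov eax, val1               ; return value goes back in EAX
    add eax, val2
    ret                         ; MASM emits RET 8 for STDCALL
AddTwo ENDP

end                             ; no start label: this is a library module
```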

COFF Object Modules Reference

Compatible Linkers

Building 16 bit MS-DOS applications requires a Microsoft 16 bit OMF linker, because the 32 bit linkers build 32 bit PE code. The original linkers supplied with the commercial versions of MASM work correctly, and for later versions Microsoft have at various times offered their last 16 bit OMF linker for download from their website.
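A minimal 16 bit MS-DOS program for this tool chain, as a sketch. It assumes ML.EXE 6.x, which emits OMF when /coff is not given, and a 16 bit OMF LINK.EXE such as the one supplied with the commercial MASM versions; the file name hello.asm is hypothetical.

```asm
; hello.asm
; Build (16 bit OMF linker assumed):
;   ml /c hello.asm
;   link hello.obj;
.model small
.stack 100h

.data
msg db 'Hello from MASM', 13, 10, '$'   ; DOS function 09h prints up to '$'

.code
start:
    mov ax, @data               ; DS must point at the data segment in an .EXE
    mov ds, ax
    mov dx, OFFSET msg
    mov ah, 09h                 ; DOS: write string
    int 21h
    mov ax, 4C00h               ; DOS: terminate with return code 0
    int 21h
end start
```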

Microsoft Reference For ML.EXE

Publication

  • Microsoft Macro Assembler Programmer's Guide (c) 1991 The Microsoft Corporation. (Supplied with MASM version 6.0)
  • Programmer's Guide (c) 1992 The Microsoft Corporation. (Supplied with MASM version 6.1)


Microsoft Licence

MASM has been available over a very long period and has been subject to a number of licensing methods over that time. The licences fall into two types. The commercial versions of MASM, either as a separate product or as a later component of the Visual C++ development environment, may be used for any purpose, including developing non-Microsoft operating system components. The versions Microsoft has made available at no cost, however, are almost all restricted to use on Microsoft operating systems and specifically exclude the production of Open Source code. Given the wide range of licences over the product's lifetime, the individual licence should be read in detail to determine what it permits and what restrictions, if any, apply.

MASM Compatible Assemblers

For assembler programmers who are unable to use the Microsoft assembler for licencing reasons there are two directly MASM compatible assemblers that can build almost all MASM code apart from the more complex macros developed under MASM.

  • Pelle's Macro Assembler is a component of the Pelles C development environment and comes complete with detailed documentation supplied with Pelles C. For the assembler language programmer, Pelles C comes with a compatible linker and resource compiler as well as a mature C compiler and an IDE that also contains its own resource editor. It is both mnemonic and operator compatible with MASM but diverges in its macro capacity. It is both 32 bit and 64 bit capable.
  • http://www.smorgasbordet.com/pellesc/download.htm
  • JWASM Macro Assembler is a MASM clone that has a very high level of compatibility with the original Microsoft Macro Assembler. It is currently supplied as both source code and a working binary and it comes close to assembling almost all MASM code including the vast majority of MASM macros. It will build MASM code for 16, 32 and 64 bit platforms. It is burdened by having no supporting documentation and relies wholly on the availability of documentation from Microsoft for MASM. It is available under the Sybase Open Watcom EULA.
  • http://www.japheth.de/JWasm.html