Building XMLgawk (old)

May 2013:
XMLgawk has not been updated since October 2012; it has been succeeded by the gawk extension libraries.
Consequently, I stopped developing XMLgawk for Windows and continued with the gawk extension libraries for Windows.

Intro

Building XMLgawk for Windows needs to be done in a MinGW/MSYS environment. For those who do not have such an environment, we will show how to set one up. Besides the standard MinGW environment, building XMLgawk requires the Expat and iconv libraries. In addition, gawk can use the functionality of the sigsegv library. We show how to install these libraries below.

Gawk has built-in functionality to load extensions dynamically. The XML extension is clearly meant to use this functionality, so our aim is to create a dynamic link library for the extension. It appears that the supplied source files for “pc” are not yet prepared for that. Most modifications proposed on this page concern the connection between the gawk executable and the XML extension, and loading the extension dynamically. We will show a number of modifications to the gawk source, to the source for the XML extension and to the build script of the extension. We will go through all required modifications in the sections Building gawk.exe and Build the XML extension.

All modifications to the original XMLgawk source files can be found in the Downloads section.

Preparing the environment

Standard MinGW install

We used the TDM-GCC build of MinGW to set up our MinGW environment, see the TDM site. We created the directory c:\Programs\MinGW and installed there. By installing the package we obtain:

TDM-GCC Current 4.5.0-tdm-1 (core and g++)
binutils-2.20.1-2-mingw32-bin
mingwrt-3.18-mingw32-dev
mingwrt-3.18-mingw32-dll
w32api-3.14-mingw32-dev
mingw32-make 3.81-20090914
gdb-7.1.2-mingw-bin

We also installed the following libraries:

libmpc-0.8.1-1-mingw32-dll-2.tar.lzma
libmpfr-2.4.1-1-mingw32-dll-1.tar.lzma
libgmp-5.0.1-1-mingw32-dll-10.tar.lzma
libpthread-2.8.0-3-mingw32-dll-2.tar.lzma (for OpenMP)

together with the associated dev packages. The archives should be unpacked to the MinGW directory. These libraries are needed according to gcc-4.5.0-1-mingw32.RELEASE_NOTES-1.txt. You can obtain the files from the MinGW archive at Sourceforge.

Expat and iconv libraries

XMLgawk requires the Expat and iconv libraries. We, therefore, need to install:

gettext-0.17-1-mingw32-bin.tar.lzma
gettext-0.17-1-mingw32-dev.tar.lzma
libgettextpo-0.17-1-mingw32-dll-0.tar.lzma
libintl-0.17-1-mingw32-dll-8.tar.lzma
libexpat-2.0.1-1-mingw32-dev.tar.gz
libexpat-2.0.1-1-mingw32-dll-1.tar.gz
libiconv-1.13-mingw32-dev.tar.gz
libiconv-1.13-mingw32-dll-2.tar.gz

These libraries are also available at Sourceforge.

Sigsegv library

Gawk can use the functionality of the sigsegv library. The library signals the application (gawk in our case) when it makes an invalid memory reference. If the library is available, the functions of sigsegv are statically built into the gawk executable.

If the sigsegv functionality is wanted, building gawk needs the library files of sigsegv. Source files for the sigsegv library are available in the xgawk package, but I was not able to build the library files from them. I therefore searched for a source archive that included more recent files for sigsegv. Gawk-3.1.7 turned out to be useful.

We download the archive gawk-3.1.7.tar.gz, which is available at GNU gawk. Unpack the archive to a suitable directory, for example c:\Programs\gawk-3.1.7. We start MSYS and cd to gawk-3.1.7\libsigsegv. We then give the commands:

./configure --host=i386-pc-mingw32
make
make check

The last command should give:

==================
All 5 tests passed
==================

A successful build gives us the header file sigsegv.h and the library files:

libsigsegv.a, libsigsegv.la and libsigsegv.lai

The header file is created in gawk-3.1.7\libsigsegv\src; we copy it to MinGW\include. The library files are in gawk-3.1.7\libsigsegv\src\.libs; we copy them to MinGW\lib.
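
The copy step above can be sketched as a small MSYS shell helper. This is only an illustration: the function name install_sigsegv is hypothetical, and the two default paths are assumptions based on the example directories used on this page.

```shell
# Sketch only: both default paths below are assumptions taken from the
# example directories on this page; override them for your own layout.
SIGSEGV_SRC="${SIGSEGV_SRC:-/c/Programs/gawk-3.1.7/libsigsegv/src}"
MINGW_ROOT="${MINGW_ROOT:-/c/Programs/MinGW}"

# install_sigsegv is a hypothetical helper name, not part of gawk or MinGW.
install_sigsegv() {
    src=$1    # libsigsegv source directory containing sigsegv.h
    root=$2   # top of the MinGW tree
    # Header goes to MinGW\include.
    cp "$src/sigsegv.h" "$root/include/" || return 1
    # Library files go to MinGW\lib.
    cp "$src/.libs/libsigsegv.a" \
       "$src/.libs/libsigsegv.la" \
       "$src/.libs/libsigsegv.lai" "$root/lib/"
}

# Usage, from the MSYS shell:
#   install_sigsegv "$SIGSEGV_SRC" "$MINGW_ROOT"
```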

Building gawk.exe

The source files for XMLgawk can be obtained from the XMLgawk home page, current link “second release candidate”. We obtain the archive xgawk-3.1.6a-20090408.tar.gz. We unpack the xgawk archive to C:\Programs\xgawk-3.1.6.

Edit pc\Makefile

We change prefix = c:/gnu to prefix = c:/Programs/MinGW.
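
For those who prefer to script this edit instead of using an editor, a minimal sketch follows. The helper name set_prefix is hypothetical; it assumes that the Makefile has a single line starting with "prefix =" and that the sed in your MSYS installation supports the -i option with a backup suffix.

```shell
# set_prefix is a hypothetical helper; it assumes the Makefile contains a
# line starting with "prefix =" and that sed supports -i with a suffix.
# The new prefix must not contain the characters | or &.
set_prefix() {
    makefile=$1     # path to the Makefile to edit
    new_prefix=$2   # e.g. c:/Programs/MinGW
    # Rewrite the prefix line in place; the original is kept as <makefile>.bak.
    sed -i.bak "s|^prefix *=.*|prefix = $new_prefix|" "$makefile"
}

# Usage, from the xgawk directory:
#   set_prefix pc/Makefile c:/Programs/MinGW
```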

In accordance with the instructions in the file README-d\README.pcdynamic, particularly in the part after “—” we make the following changes to obtain dynamic linking:

DYN_FLAGS=-DDYNAMIC
DYN_EXP=gawk.exp
DYN_OBJ=dlfcn$O $(DYN_EXP)
#DYN_MAKEXP=$(DMEvcWin32)
DYN_MAKEXP=$(DMEmingw32)

To link with the sigsegv library and to reduce the size of the executables, we change the line with link commands and flags from
LNK=LMINGW32 PLNK=PLMINGW32 LF="-gdwarf-2 -g3" LF2=-lmsvcp60 RSP= to

LNK=LMINGW32 PLNK=PLMINGW32 LF="-gdwarf-2 -g3 -s" LF2="-lmsvcp60 -lsigsegv" RSP=

Finally, we remove “xml_puller$O” from the macros AWKOBJS1 and PAWKOBJS1, because we do not want the xml_puller extension to be statically linked into gawk.exe. The modified macro definitions become:

AWKOBJS1 = array$O builtin$O eval$O field$O floatcomp$O gawkmisc$O io$O main$O
PAWKOBJS1 = array$O builtin$O eval_p$O field$O floatcomp$O gawkmisc$O io$O main$O

The definitions are now equal to the ones in the original gawk Makefile (for pc).

Edit pc\gawkmisc.pc

We add the definition of the variable deflibpath to this file:

# ifdef DEFLIBPATH
char *deflibpath = DEFLIBPATH;
# else
char *deflibpath = ".;c:\\Windows;c:\\Windows\\System32;\
c:\\Programs\\MinGW\\bin;\
c:\\Programs\\MinGW\\lib;c:\\Programs\\MinGW\\lib\\awk" ;
# endif

We insert this code after the definition of DEFPATH. This is an omission in the code for pc: deflibpath is not defined without this addition. We add the definition to gawkmisc.pc, in agreement with the code in the posix variants: there the variable is defined in posix\gawkmisc.c.

Edit pc\config.h

Add the definition of SHLIBEXT at the end of config.h:

#define SHLIBEXT "dll"

Copy “pc files” and build

Copy the files (except Changelog) from the pc subdirectory to the xgawk directory. Run MSYS, go to the xgawk directory and run

make mingw32

This gives gawk.exe and pgawk.exe. The file Build log gawk shows the output generated by make.
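
The copy step described above can be sketched as follows. The helper name copy_pc_files is hypothetical, and the sketch assumes that only regular files need to be copied and that the change log is named either ChangeLog or Changelog.

```shell
# copy_pc_files is a hypothetical helper: it copies every regular file
# from <top>/pc into <top>, skipping the change log.
copy_pc_files() {
    top=$1   # top-level xgawk directory that contains the pc subdirectory
    for f in "$top"/pc/*; do
        name=$(basename "$f")
        case $name in
            ChangeLog|Changelog) continue ;;   # leave the top-level log alone
        esac
        if [ -f "$f" ]; then
            cp "$f" "$top/$name" || return 1
        fi
    done
}

# Usage, followed by the build itself:
#   copy_pc_files /c/Programs/xgawk-3.1.6
#   cd /c/Programs/xgawk-3.1.6 && make mingw32
```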

Build the XML extension

As said in the intro, our aim is to build a dynamic link library. This is not straightforward, but fortunately the file README-d\README.pcdynamic comes to the rescue, particularly the part after “—”. The source files for the extension are in the subdirectory extension. We cd to this subdirectory.

Edit extension\Makefile

We need to create a specific makefile to build the XML extension. As a starting point I used the supplied makefile Makefile.pc and copied it to Makefile.
Since the XML extension uses the Expat and iconv libraries, we need to link to these. We therefore change the line
MWLDFLAGS=-s -Wl,--enable-stdcall-fixup -L.. -lgawk to

MWLDFLAGS=-s -Wl,--enable-stdcall-fixup -L.. -lgawk -lexpat -liconv

A number of other changes are needed to the makefile. The modified file is given here.

Run extension\xml-conv-enc

We start MSYS, cd to extension and run xml-conv-enc. This creates the files xml_enc_registry.inc and xml_enc_tables.inc. These are required by xml_enc_handler.c.
(xml_enc_registry.inc also provides the required array encs[].)

Edit extension\xml_interface.c

The header langinfo.h is used by xml_interface.c. As far as I know, MinGW does not have an implementation for langinfo.h. As a solution we comment out the line #include <langinfo.h> and change the line char *charset = nl_langinfo(CODESET); to

char *charset = "";

This seems quite harmless, as the possibility is also mentioned by the author of xml_interface.c.

First build of the xml extension

From the README-d\README.pcdynamic file quoted earlier, we can expect that building the extension will give some errors the first time. This is because the extension needs a number of functions and variables from gawk.exe that gawk does not export yet. Indeed, running

make mingw32

gives a number of errors. The log is given in Build log initial build xml 1Nov2009.

Edit extension\gawkw32.def and awk.h

The variables and functions that give errors need to be exported by gawk. This can be accomplished via the gawkw32.def file, as explained in README-d\README.pcdynamic.
We, therefore, edit gawkw32.def and add the variables and functions to be exported. The ones needed are obtained from the errors during a first compilation of xml.dll. We add

register_open_hook @121
lookup             @122
Nnull_string       @123
node               @124
install            @125
make_str_node      @126
do_traditional     @127
ERRNO_node         @128

at the end of the def file.
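
If you prefer to generate these lines instead of typing them, the following sketch appends export entries with consecutive ordinals. The helper name append_exports is hypothetical, and it assumes that 121 is the first free ordinal in your copy of gawkw32.def, as it was in ours.

```shell
# append_exports is a hypothetical helper: it appends one "name @ordinal"
# line per symbol to a .def file, numbering them consecutively.
append_exports() {
    def_file=$1   # the .def file to extend, e.g. gawkw32.def
    ordinal=$2    # first free ordinal (121 in our copy; check your own file)
    shift 2
    for sym in "$@"; do
        printf '%-18s @%d\n' "$sym" "$ordinal" >> "$def_file"
        ordinal=$((ordinal + 1))
    done
}

# Usage:
#   append_exports gawkw32.def 121 register_open_hook lookup Nnull_string \
#       node install make_str_node do_traditional ERRNO_node
```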

We edit awk.h accordingly: add ATTRIBUTE_EXPORTED at the beginning of the lines containing the declarations of the variables or functions that were added in gawkw32.def.

Rebuild gawk.exe and the xml extension

We cd to the xgawk directory and rebuild gawk.exe.
Then we cd to the extension subdirectory and rebuild the XML extension. The build now succeeds and gives xml.dll!

First test

For a first test, we put all binaries (gawk.exe, xml.dll, libexpat-1.dll and libiconv-2.dll) in the current directory. We use the simple script file example.awk:

@load xml
BEGIN { nodes = 0
}
XMLSTARTELEM { nodes ++ }
END { print "nodes = " nodes
}

that counts the number of nodes in an xml file. We run the script on the file books.xml from the sub directory data of the xgawk archive. Running

gawk -f example.awk books.xml

gives the correct result: nodes = 19.

Installation

Generally, we have the following options for the file locations:

  • gawk.exe, libexpat-1.dll and libiconv-2.dll need to be in the current directory or in a directory which is contained in your path (e.g. in c:\WINDOWS\system32 or in c:\Programs\MinGW\bin) and
  • xml.dll needs to be in the current directory or in a directory which is in AWKLIBPATH or in a directory defined in deflibpath. On my system it is in c:\Programs\MinGW\lib\awk.
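
To check in advance where xml.dll would be picked up, the idea of a semicolon-separated search list can be imitated in the MSYS shell. This is only a sketch of the principle: the actual search is done inside gawk.exe, and the helper name find_in_libpath is hypothetical.

```shell
# find_in_libpath is a hypothetical helper: it prints the first match of a
# file name in a semicolon-separated list of directories.
# The body runs in a subshell, so the IFS change does not leak out.
find_in_libpath() (
    file=$1   # file to look for, e.g. xml.dll
    list=$2   # directories separated by ';'
    IFS=';'
    for dir in $list; do
        if [ -f "$dir/$file" ]; then
            printf '%s\n' "$dir/$file"
            exit 0
        fi
    done
    exit 1
)

# Usage (the directories are examples from this page):
#   find_in_libpath xml.dll ".;c:/Programs/MinGW/lib;c:/Programs/MinGW/lib/awk"
```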

You can set AWKLIBPATH yourself by issuing:

set AWKLIBPATH=[your_path]

deflibpath is defined in gawk.exe, with current definition:

c:\Windows;c:\Windows\System32;c:\Programs\MinGW\bin;\
c:\Programs\MinGW\lib;c:\Programs\MinGW\lib\awk

This is probably not very useful on other systems than mine, I am afraid.

Downloads

In the archive below you will find the modified source files for XMLgawk, containing all modifications described on this page. The files are meant as replacements for the corresponding source files from xgawk-3.1.6a-20090408.tar.gz as found on the XMLgawk home page. Hence, the archive needs to be unpacked to the same directory where you unpacked the original XMLgawk archive, overwriting the corresponding original files.

Download archive: xgawk-3.1.6a-mod-20100714.zip