Ab Initio Vector Assignment

Introduction

Scientific software for performing large computations is typically managed using textual control files that specify the parameters of the computation. Historically, these control files have consisted of long, inflexible collections of numbers whose meaning and format are hard-coded into the program. With libctl, we make it easy for programmers to support a greatly superior control file structure, and with less effort than was required for traditional input formats.

The "ctl" in "libctl" stands for Control Language (by convention, libctl control files end with ".ctl" and are referred to as ctl files). Thus, libctl is the Control Language Library (where the "lib" prefix follows the Unix idiom).

Design Principles

The libctl design has the following goals:

  • Input readability: The control file should be self-annotating and human-readable (as opposed to an inscrutable sequence of numbers). Of course, it should allow comments.
  • Input flexibility: The control file should not be sensitive to the ordering or spacing of the inputs.
  • Input intelligence: The user should never have to enter any information that the program could reasonably infer. For example, reasonable defaults should be available wherever possible for unspecified parameters.
  • Program flexibility: It should be easy to add new parameters and features to the control file without breaking older control files or increasing complexity.
  • Scriptability: Simple things should be simple, but complex things should be possible. The control file should be more than just a file format. It must be a programming language, able to script the computation and add new functionality without modifying the simulation source code.
  • Programmer convenience: All of this power should not come at the expense of the programmer. Rather, it should be easier to program than ever before—the programmer need only specify the interaction with the control file in an abstract form, and everything else should be taken care of automatically.

All of these goals are achieved by libctl with the help of Guile, the GNU scripting and extensibility language. Guile does all of the hard work for us, and allows us to embed a complete interpreter in a program with minimal effort.

Despite its power, libctl is designed to be easy to use. A basic user only sees a convenient file format...with a programming language to back it up if her needs become more complex. For the programmer, all headaches associated with reading input files are lifted—once an abstract specification is supplied, all interaction with the user is handled automatically.

In the subsequent sections of this manual, we will discuss in more detail the interaction of the user and the programmer with libctl.

Basic User Experience

At their most basic level, ctl files are simply a collection of values for parameters required by the simulation.

The ctl syntax for all programs using libctl is similar, although the specific parameters needed will vary. The following examples are given for a fictitious libctl-using program, in order to illustrate its general style.

A fictitious example

For example, suppose that the simulation solves a one-dimensional differential equation and requires an input called "grid-size" specifying the number of grid points used in the discretization of the problem. We might specify this in a ctl file by the statement:

(set! grid-size 128)

All input variable settings can follow the format (set! variable value). The parentheses are important, but white space is ignored. Alternatively, we can use:

(set-param! grid-size 128)

which works exactly like set! except that now grid-size can be overridden from the command line. For this reason, set-param! (a libctl extension to Scheme) is usually preferred. (See also Command-line parameters.)

Settings of input variables can appear in any order at all in the file. They can even be omitted completely in many cases, and a reasonable default will be used. Variables can be of many different types, including integers, real numbers, boolean values (true and false), strings, 3-vectors, and lists. Here is how we might set some parameters of various types:

(set-param! time-step-dt 0.01)  ; a real number
(set-param! output-file-name "data.hdf")  ; a string
(set-param! propagation-direction (vector3 0 0.2 7))  ; a 3-vector
(set! output-on-time-steps  ; a list of integers...
      (list 25 1000 257 128 4096))

Everything appearing on a line after a semicolon (";") is a comment and is ignored. Note also that we are free to split inputs over several lines--as we mentioned earlier, white space is ignored.

3-vectors are constructed using (vector3 x [y z]). If the y or z components are omitted, they are set to zero. Lists may contain any number of items (including zero items), and are constructed with (list [item1 item2 ...]).

A typical control file is terminated with a single statement, something like:

(run)  ; run the computation

This tells the program to run its computation with whatever parameter values have been specified up to the point of the (run). This command can actually appear multiple times in the ctl file, causing multiple runs, or not at all, which drops the user into an interactive mode that we will discuss later.

Running a simulation

The user runs the simulation program simply by:

program ctl-files

Here, program is the name of the simulation program executable and ctl-files are any ctl files that you want to use for the run. The result is as if all the ctl-files were concatenated, in sequence, into a single file.

Structured data types

For many programs, it is useful to structure the input into more complicated data types than simple numbers, vectors, and lists. For example, an electromagnetic simulation might take as input a list of geometric objects specifying the dielectric structure. Each object might have several parameters--for example, a sphere might have a radius, a center, and a dielectric constant.

libctl allows programs to specify structured datatypes, called classes, that have various properties which may be set. Here is what a list of geometric objects for a dielectric structure might look like:

(set! geometry
      (list
       (make sphere
         (epsilon 2.8) (center 0 0 1) (radius 0.3))
       (make block
         (epsilon 1.7) (center 0 0 1) (size 1 3.5 2))))

In this case, the list consists of two objects of classes called sphere and block. The general format for constructing an object (instance of a class) is (make class properties). Here, properties is a sequence of (property value) items setting the properties of the object.

Properties may have default values that they assume if nothing is specified. For example, the block class might have properties e1, e2, and e3 that specify the directions of the block edges, but which default to the coordinate axes if they are not specified. Typically, each class will have some properties that have defaults, and some that you are required to specify.

Property values can be any of the primitive types mentioned earlier, but they can also be other objects. For example, instead of specifying a dielectric constant, you might instead supply an object describing the material type:

(define Si (make material-type (epsilon 11.56)))
(define SiO2 (make material-type (epsilon 2.1)))
(set! geometry
      (list
       (make sphere
         (material Si) (center 0 0 1) (radius 0.3))
       (make block
         (material SiO2) (center 0 0 1) (size 1 3.5 2))))

We have snuck in another feature here: (define ...) is a way of defining new variables for our own use in the control file. (This and other features of the Scheme language are discussed in the next section.)

What do I enter?

Every program will have a different set of variables that it expects you to set, and a different set of classes with different properties. Whatever program you are using should come with documentation saying what it expects.

You can also get the program to print out help by inserting the (help) command in your ctl file, or by entering it in interactive mode (described later). You can also simply enter the following command in your shell:

echo "(help)" | program

For example, the output of (help) in the electromagnetic simulation we have been using in our examples might look like:

Class block:
    Class geometric-object:
        material-type material
        vector3 center
    vector3 e1 = #(1 0 0)
    vector3 e2 = #(0 1 0)
    vector3 e3 = #(0 0 1)
    vector3 size
Class sphere:
    Class geometric-object:
        material-type material
        vector3 center
    number radius
Class geometric-object:
    material-type material
    vector3 center
Class material-type:
    number epsilon
    number conductivity = 0.0

Input variables:
vector3 list k-points = ()
geometric-object list geometry = ()
integer dimensions = 3

Output variables:
number list gaps = ()
number mean-dielectric = 0.0

As can be seen from above, the help output lists all of the classes and their properties, along with the input and output variables (the latter will be described later). Any default values for properties are also given. Along with each variable or property is given its type.

You should also notice that the class geometric-object is listed as a part of the classes block and sphere. These two classes are subclasses of geometric-object. A subclass inherits the property list of its superclass and can be used any place its superclass is allowed. So, for example, both spheres and blocks can be used in the geometry list, which is formally a list of geometric-objects. (The astute reader will notice the object-oriented-programming origins of our class concept; our classes, however, differ from OOP in that they have no methods.)

Advanced User Experience

Many more things can be accomplished in a control file besides simply specifying the parameters of a computation, and even that can be done in a more sophisticated way than we have already described. The key to this functionality is the fact that the ctl file is actually written in a full programming language, called Scheme. This language is interpreted and executed at run-time using an interpreter named Guile. The fact that it is a full programming language means that you can do practically anything--the only limitations are in the degree of interaction supported by the simulation program.

In a later section, we provide links to more information on Scheme and Guile.

Interactive mode

The easiest way to learn Scheme is to experiment. Guile supports an interactive mode where you can type in commands and have them executed immediately. To get into this mode, you can just type guile at the command line.

If you run your libctl program without passing any arguments, or pass a ctl file that never invokes (run), this will also drop you into a Guile interactive mode. What's more, all the special features supported by libctl and your program are available from this interactive mode. So, you can set parameters of your program, invoke it with (run), get help with (help), and do anything else you might otherwise do in a ctl file. It is possible that your program supports other calls than just (run), in which case you could control it on an even more detailed level.

There is a boolean variable called interactive? that controls whether interactive mode will be entered. This variable is true initially, but is typically set to false by (run). You can force interactive mode to be entered or not by set!-ing this variable to true or false, respectively.

Command-line parameters

It is often useful to be able to set parameters of your ctl file from the command-line when you run the program. For example, you might want to vary the radius of some object with each run. To do this, you would define a parameter in your ctl file:

(define-param R 0.2)

You would then use R instead of a numeric value whenever you wanted this radius. If nothing is specified on the command line, R will take on a default value of 0.2. However, you can change the value of R on a particular run by specifying R=value on the command line. For instance, to set R to 0.3, you would use:

program R=0.3 ctl-file

You can have as many command-line parameters as you want. In fact, all of the predefined input variables for a program are defined via define-param already, so you can set them via the command line too.

To change the parameter once it is defined, but to still allow it to be overridden from the command line, you can use:

(set-param! R 0.5)

where the above command line would change the value of R to 0.3. If you want to change the parameter to a new value regardless of what appears on the command line, you can just use set!:

(set! R 1.3)

Note that the predefined input variables for a typical libctl-using program are all created via define-param, so they can be overridden using set-param!.

Programmatic parameter control

A simple use of the programmatic features of Scheme is to give you more power in assigning the variables in the control file. You can use arithmetic expressions, loops and functions, or define your own variables and functions.

For example, consider the following case where we set the k-points of a band-structure computation (such as MPB). We define the corners of the Brillouin zone, and then call a libctl-provided function, interpolate, to linearly interpolate between them.

(define Gamma-point (vector3 0 0))
(define X-point (vector3 0.5 0))
(define M-point (vector3 0.5 0.5))
(set! k-points (list Gamma-point X-point M-point Gamma-point))
(set! k-points (interpolate 4 k-points))

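The resulting list has 4 points interpolated between each pair of corners:

(0,0,0) (0.1,0,0) (0.2,0,0) (0.3,0,0) (0.4,0,0) (0.5,0,0) (0.5,0.1,0) (0.5,0.2,0) (0.5,0.3,0) (0.5,0.4,0) (0.5,0.5,0) (0.4,0.4,0) (0.3,0.3,0) (0.2,0.2,0) (0.1,0.1,0) (0,0,0)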

The interpolate function is provided as a convenience by libctl, but you could have written it yourself if it weren't. With past programs, it has often been necessary to write a program to generate control files--now, the program can be in the control file itself.
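To make concrete what such an interpolation computes, here is a minimal C sketch that inserts n evenly spaced points between each consecutive pair of points in a list. The Vec3 type and the interpolate_points function are hypothetical names chosen for this illustration; this is not libctl's implementation, and it assumes n is non-negative and that the caller supplies a large enough output array.

```c
#include <stddef.h>

/* Hypothetical stand-in for a 3-vector; not libctl's own type. */
typedef struct { double x, y, z; } Vec3;

/* Insert n evenly spaced points between each consecutive pair in
   pts[0..num_pts-1], writing the result to out.  The caller must provide
   room for (num_pts - 1) * (n + 1) + 1 points.  Returns the number of
   points written (0 for an empty input list). */
size_t interpolate_points(int n, const Vec3 *pts, size_t num_pts, Vec3 *out)
{
    size_t count = 0;
    size_t i;
    int j;

    if (num_pts == 0)
        return 0;

    for (i = 0; i + 1 < num_pts; ++i) {
        /* Emit the left endpoint (j = 0) followed by the n interior points. */
        for (j = 0; j <= n; ++j) {
            double t = (double) j / (n + 1);
            out[count].x = pts[i].x + t * (pts[i + 1].x - pts[i].x);
            out[count].y = pts[i].y + t * (pts[i + 1].y - pts[i].y);
            out[count].z = pts[i].z + t * (pts[i + 1].z - pts[i].z);
            ++count;
        }
    }
    out[count++] = pts[num_pts - 1];  /* the final endpoint */
    return count;
}
```

With the four Brillouin-zone corners used above and n = 4, this sketch produces 16 points: each of the three segments contributes its starting corner plus 4 interior points, and the last corner is appended at the end.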

Interacting with the simulation

So far, the communication with the simulation program has been one-way, with us passing information to the simulation. It is possible, however, to get information back. The (help) command lists not only input variables, but also output variables--these variables are set by the simulation and are available for the ctl program to examine after (run) returns.

For example, a band-structure computation might return a list of the band-gaps. Using this, the ctl file could vary, say, the radius of a sphere and loop until a band-gap is maximized.

Developer Experience

If you are thinking of using libctl in a program that you are writing, you might be rolling your eyes at this point, thinking of all the work that it will be. A full programming language? Complicated data structures? Information passing back and forth? Surely, it will be a headache to support all of these things.

In fact, however, using libctl is much easier than writing your program for a traditional, fixed-format input file. You simply describe in an abstract specifications file the variables and data types that your program expects to exchange with the ctl file, and the functions by which it is called. From these specifications, code is automatically generated to export and import the information to and from Guile.

The specifications file is written in Scheme, and consists of definitions for the classes and input/output variables the program expects. It may also contain any predefined functions or variables that might be useful in ctl files for the program, and says which functions in your program are callable from the ctl script.

Defining input variables

To define an input variable (a variable specified by the ctl file and input into the program), use the following construction:

(define-input-var name value type [ constraints ... ])

Here, name is the name of the variable, and value is its initial value--so far, this is just like a normal define statement. However, input variables have constraints on them, the simplest of which is that they have a specific type. The type parameter can be one of:

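  • 'number - a real number
  • 'cnumber - a complex number
  • 'integer - an integer
  • 'vector3 - a real 3-vector
  • 'matrix3x3 - a real 3x3 matrix
  • 'cvector3 - a complex 3-vector
  • 'cmatrix3x3 - a complex 3x3 matrix
  • 'boolean - a boolean value, true or false
  • 'string - a string
  • 'function - a function (in C, a Guile SCM function pointer)
  • 'class - a member of class
  • (make-list-type el-type) - a list of elements of type el-type
  • 'SCM - a generic Scheme object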

Note that the quote before a type name is Scheme's way of constructing a symbol, which is somewhat similar to a C enumerated constant.

The final argument is an optional sequence of constraints. Each constraint is a function that, given a value, returns true or false depending on whether that value is valid. For example, if an input variable is required to be positive, one of the constraints would be the positive? function (predefined by Guile). More complicated functions can, of course, be constructed.

Here are a few examples:

(define-input-var dimensions 3 'integer positive?)
(define-input-var default-epsilon 1.0 'number positive?)
(define-input-var geometry '() (make-list-type 'geometric-object))
(define-input-var k-points '() (make-list-type 'vector3))

Notice that all input variables have initial values, meaning that a user need not specify a value in the ctl file if the default value is acceptable. If you want to force the user to explicitly give a value to a variable, set the initial value to 'no-value. (This way, if the variable is not set by the user, it will fail the type-constraint and an error will be flagged.) Such behavior is deprecated, however.

Defining output variables

Output variables, which are passed from the simulation to the ctl script, are defined in a manner similar to input variables:

(define-output-var name type)

Notice that output variables have no initial value and no constraints. Your C program is responsible for assigning the output variables when it is called (as is discussed below).

A variable can be both an input variable and an output variable at the same time. Such input-output variables are defined with the same parameters as an input variable:

(define-input-output-var name value type [constraints])

Defining classes

To define a class, one has to supply the parent class and the properties:

(define-class name parent [ properties... ])

Here, name is the name of the new class and parent is the name of the parent class, or no-parent if there is none.

The properties of the class are zero or more of the following definitions, which give the name, type, default value, and (optional) constraints for a property:

(define-property name default-value type [ constraints... ])

Here, name is the name of the property. It is okay for different classes to have properties with the same name (for example, both a sphere and a cylinder class might have radius properties)--however, it is important that properties with the same name have the same type. The type and optional constraints are the same as for define-input-var, described earlier.

If default-value is no-default, then the property has no default value and users are required to specify it. To give a property a default value, default-value should simply be that default value.

For example, this is how we might define classes for materials and dielectric objects in an electromagnetic simulation:

(define-class material-type no-parent
  (define-property epsilon no-default 'number positive?)
  (define-property conductivity 0.0 'number))

(define-class geometric-object no-parent
  (define-property material no-default 'material-type)
  (define-property center no-default 'vector3))

(define-class cylinder geometric-object
  (define-property axis (vector3 0 0 1) 'vector3)
  (define-property radius no-default 'number positive?)
  (define-property height no-default 'number positive?))

(define-class sphere geometric-object
  (define-property radius no-default 'number positive?))

Derived properties

Sometimes, it is convenient to store other properties with an object that are not input by the user, but which instead are computed based on the other user inputs. A mechanism is provided for this called "derived" properties, which are created by:

(define-derived-property name type derive-func)

Here, derive-func is a function that takes an object of the class the property is in, and returns the value of the property. (See below for an example.) derive-func is called after all of the non-derived properties of the object have been assigned their values.

Post-processed properties

It is often useful to store a function of the user input into a property, instead of just storing the input itself. (For example, you might want to scale an input vector so that it is stored as a unit vector.) The syntax for defining such a property is the same as define-property except that it has one extra argument:

(define-post-processed-property name default-value type process-func [ constraints... ])

Here, process-func is a function that takes one argument and returns a value, both of the same type as the property. Any user-specified value for the property is passed to process-func, and the result is assigned to the property.

Here is an example that defines a new type of geometric object, a block. Blocks have a size property that specifies their dimensions along three unit vectors, which are post-processed properties (with default values of the coordinate axes). When computing whether a point falls within a block, it is necessary to know the projection matrix, which is the inverse of the matrix whose columns are the basis vectors. We make this projection matrix a derived property, computed via the libctl-provided matrix routines, freeing us from the necessity of constantly recomputing it.

(define-class block geometric-object
  (define-property size no-default 'vector3)
  ; the basis vectors, which are forced to be unit-vectors
  ; by the unit-vector3 post-processing function:
  (define-post-processed-property e1 (vector3 1 0 0) 'vector3 unit-vector3)
  (define-post-processed-property e2 (vector3 0 1 0) 'vector3 unit-vector3)
  (define-post-processed-property e3 (vector3 0 0 1) 'vector3 unit-vector3)
  ; the projection matrix, which is computed from the basis vectors
  (define-derived-property projection-matrix 'matrix3x3
    (lambda (object)
      (matrix3x3-inverse
       (matrix3x3
        (object-property-value object 'e1)
        (object-property-value object 'e2)
        (object-property-value object 'e3))))))
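To illustrate why a precomputed projection matrix is convenient, here is a minimal C sketch of a point-in-block test. It assumes that the block is centered at center, that size gives the full extent along each basis vector, and that proj is the inverse of the matrix whose columns are the basis vectors. The types Vec3 and Mat3 and the function point_in_block are hypothetical stand-ins for this example, not the structures that libctl generates.

```c
/* Hypothetical stand-ins for a 3-vector and a 3x3 matrix. */
typedef struct { double x, y, z; } Vec3;
typedef struct { Vec3 c0, c1, c2; } Mat3;  /* stored as three columns */

/* Multiply a 3x3 matrix (stored by columns) by a vector. */
static Vec3 mat3_apply(Mat3 m, Vec3 v)
{
    Vec3 r;
    r.x = m.c0.x * v.x + m.c1.x * v.y + m.c2.x * v.z;
    r.y = m.c0.y * v.x + m.c1.y * v.y + m.c2.y * v.z;
    r.z = m.c0.z * v.x + m.c1.z * v.y + m.c2.z * v.z;
    return r;
}

/* Return 1 if point p lies inside (or on the surface of) the block,
   and 0 otherwise. */
int point_in_block(Vec3 p, Vec3 center, Vec3 size, Mat3 proj)
{
    Vec3 d, r;

    /* Displacement of the point from the block center. */
    d.x = p.x - center.x;
    d.y = p.y - center.y;
    d.z = p.z - center.z;

    /* Coordinates of the displacement along the block's basis vectors. */
    r = mat3_apply(proj, d);

    return r.x >= -0.5 * size.x && r.x <= 0.5 * size.x &&
           r.y >= -0.5 * size.y && r.y <= 0.5 * size.y &&
           r.z >= -0.5 * size.z && r.z <= 0.5 * size.z;
}
```

Because the matrix inverse is computed once when the object is created, each point test costs only one matrix-vector product and six comparisons.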

Exporting your subroutines

In order for the ctl script to do anything, one of your C routines will eventually have to be called.

To export a C routine, you write the C routine as you would normally, using the data types defined in ctl.h and ctl-io.h (see below) for parameters and return value. All parameters must be passed by value (with the exception of strings, which are of type char *).

Then, in your specifications file, you must add a declaration of the following form:

(define-external-function name read-inputs? write-outputs? return-type [ arg0-type arg1-type ... ])

Here, name is the name of the function, and is the name by which it will be called in a ctl script. This should be identical to the name of the C subroutine, with the exception that underscores are turned into hyphens (this is not required, but is the convention we adopt everywhere else).

If read-inputs? is true, then the input variables will be automatically imported into C global variables before the subroutine is called each time. If you don't want this to happen, this argument should be false. Similarly, write-outputs? says whether or not the output variables will be automatically exported from the C globals after the subroutine is called. All of this code, including the declarations of the C input/output globals, is generated automatically (see below). So, when your function is called, the input variables will already contain all of their values, and you need only assign/allocate data to the output variables to send data back to Guile. If write-outputs? is true, the output variables must have valid contents when your routine exits.

The return-type argument is the return type of the subroutine, or no-return-value if there is no return value (i.e. the function is of type void). The remaining arguments are the types of the parameters of the C subroutine.

Usually, your program will export a subroutine that performs the simulation given the input variables, and returns data to the ctl script through the output variables. Such a subroutine would be declared in C as:

void run(void);

and in the specifications file by:

(define-external-function run true true no-return-value)

As another example, imagine a subroutine that takes a geometric object and returns the fraction of electromagnetic energy in the object. It does not use the input/output global variables, and would be declared in C and in the specifications file by:

/* C declaration: */
number energy_in_object(geometric_object obj);

; Specifications file:
(define-external-function energy-in-object false false 'number 'geometric-object)

Data structures and types

The data structures for holding classes and other variable types are defined automatically in the generated file ctl-io.h (see below). They are fairly self-explanatory, but it should be noted that they use some data types defined in ctl.h, mostly mirrors of the corresponding Scheme types. (For example, number is a synonym for double, and vector3 is a structure with x, y, and z fields.) ctl.h also declares several functions for manipulating vectors and matrices, e.g. vector3_plus.
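For orientation, the following self-contained C sketch mimics the kind of declarations such a header provides, assuming that number is a synonym for double and that vector3 is a plain structure with x, y, and z fields. It is a simplified re-creation for illustration only, not a copy of ctl.h; the helper functions whose names end in _sketch are hypothetical.

```c
/* Simplified re-creation of the basic types; the real ones come from ctl.h. */
typedef double number;
typedef struct { number x, y, z; } vector3;

/* Hypothetical helper: component-wise sum of two vectors. */
vector3 vector3_plus_sketch(vector3 a, vector3 b)
{
    vector3 r;
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    r.z = a.z + b.z;
    return r;
}

/* Hypothetical helper: dot product of two vectors. */
number vector3_dot_sketch(vector3 a, vector3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Structures of this size are cheap to copy, which fits the rule that parameters of exported routines are passed by value.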

Allocating and deallocating data

The input variables are allocated and deallocated automatically, as necessary, but you are responsible for allocating and deallocating the output data. As a convenience, the function destroy_output_vars() is defined, which deallocates all of the output data pointed to by the output variables. You are responsible for calling this when you want to deallocate the output.

Often, after each run, you will simply want to (re)allocate and assign the output variables. To avoid memory leaks, however, you should first deallocate the old output variables on runs after the first. To do this, use the following code:

if (num_write_output_vars > 0)
    destroy_output_vars();
/* ... allocate & assign the output variables ... */

The global variable num_write_output_vars is automatically set to the number of times the output variables have been written.

Remember, you are required to assign all of the output variables to legal values, or the resulting behavior will be undefined.
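Putting these rules together, a run routine might be organized as in the following sketch. To keep the example self-contained, the declarations at the top are mock stand-ins for what the generated ctl-io files would normally provide: an output variable named gaps (modeled on the "number list gaps" entry in the help output shown earlier), the counter num_write_output_vars, and destroy_output_vars(). The layout of the list structure and the placeholder values are assumptions made purely for illustration.

```c
#include <stdlib.h>

/* --- Mock stand-ins for declarations normally generated for you --- */
typedef double number;
typedef struct { int num_items; number *items; } number_list;

number_list gaps = { 0, NULL };  /* an output variable */
int num_write_output_vars = 0;   /* times the outputs have been written */

/* Free all data pointed to by the output variables. */
void destroy_output_vars(void)
{
    free(gaps.items);
    gaps.items = NULL;
    gaps.num_items = 0;
}

/* --- The exported subroutine --- */
void run(void)
{
    int i;
    int n = 3;  /* placeholder: number of gaps "found" by the simulation */

    /* On every run after the first, release the previous outputs. */
    if (num_write_output_vars > 0)
        destroy_output_vars();

    /* Allocate and assign every output variable before returning. */
    gaps.items = malloc(n * sizeof(number));
    if (gaps.items == NULL) {
        gaps.num_items = 0;  /* leave the list in a valid, empty state */
        return;
    }
    gaps.num_items = n;
    for (i = 0; i < n; ++i)
        gaps.items[i] = 0.1 * (i + 1);  /* placeholder values */
}
```

In a real program the counter is maintained by the generated code, so the routine itself only needs the check shown at the top of run().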

Other useful things to put in a specifications file

The specifications file is loaded before any user ctl file, making it a good place to put definitions of variables and functions that will be useful for your users. For example, the electromagnetic simulation might define a default material, air:

(define air (make material-type (epsilon 1.0)))

You can also define functions (or do anything else that Scheme allows), e.g. a function to duplicate geometric objects on a grid. (See the examples/ directory of libctl for an example of this.)

To change the Guile prompt in interactive mode to your own prompt, do:

(ctl-set-prompt! "my prompt string")

(We defined our own function so that we have something that works in both Guile 1.x and 2.x.)

Writing your program

Once the specifications have been written, you have to do very little to support them in your program.

First, you need to generate C code to import/export the input/output variables from/to Guile. This is done automatically by the gen-ctl-io script in the utils/ directory (installed into a bin directory by make install):

gen-ctl-io --code specifications-file
gen-ctl-io --header specifications-file

The commands above generate two files, ctl-io.h and ctl-io.c. The former defines global variables and data structures for the input/output variables and classes, and the latter contains code to exchange this data with Guile.

Second, you should use the main.c file from the base/ directory; if you use the example Makefile (see below), this is done automatically for you. This file defines a main program that starts up Guile, declares the routines that you are exporting, and loads control files from the command line. You should not need to modify this file, but you should define preprocessor symbols telling it where libctl and your specification file are (again, this is done for you automatically by the example Makefile).

For maximum convenience, if you are wisely using GNU autoconf, you should also copy the Makefile.in from examples/; you can use the Makefile otherwise. At the top of this file, there are places to specify your object files, specification file, and other information. The Makefile will then generate the ctl-io files and do everything else needed to compile your program.

You then merely need to write the functions that you are exporting (see above for how to export functions). This will usually include, at least, a run function (see above).

The default main.c handles a couple of additional command-line options, including --verbose (or -v), which sets a global variable verbose to 1 (it is otherwise 0). You can access this variable (it is intended to enable verbose output in programs) by declaring the global "extern int verbose;" in your program.

Have fun!

Guile and Scheme Information

There are many places you can go to on the Web to find out more regarding Guile and the Scheme programming language. We list a few of them here:

Scheme:

Scheme is a simplified derivative of Lisp, and is a small and beautiful dynamically typed, lexically scoped, functional language.

Guile:

Guile is a free implementation of Scheme, designed to be plugged in to other programs as a scripting language.

  • The home site for the GNU Guile project.
  • See parts IV and V of the Guile Reference Manual for additional Scheme functions and types defined within the Guile environment.

How to write a loop in Scheme

The most frequently asked question seems to be: how do I write a loop in Scheme? We give a few answers to that here, supposing that we want to vary a parameter x from a to b in steps of dx, and do something for each value of x.

The classic way, in Scheme, is to write a tail-recursive function:

(define (doit x x-max dx)
  (if (<= x x-max)
      (begin
        ...perform loop body with x...
        (doit (+ x dx) x-max dx))))

(doit a b dx) ; execute loop from a to b in steps of dx

There is also a do-loop construct in Scheme that you can use:

(do ((x a (+ x dx))) ((> x b)) ...perform loop body with x...)

If you have a list of values of x that you want to loop over, then you can use map:

(map (lambda (x) ...do stuff with x...) list-of-x-values)

How to read in values from a text file in Scheme

A simple command to read a text file and store its values within a variable in Scheme is read. As an example, suppose a file foo.dat contains the following text, including parentheses:

(1 3 12.2 14.5 16 18)

In Scheme, we would then use

(define port (open-input-file "foo.dat"))
(define foo (read port))
(close-input-port port)

The variable foo would then be a list of numbers '(1 3 12.2 14.5 16 18).

Tricks specific to libctl-using programs such as MPB or Meep

libctl has a couple of built-in functions, arith-sequence and interpolate (see the user reference), to construct lists of a regular sequence of values, which you can use in conjunction with map as above:

(map (lambda (x) ...do stuff with x...) (arith-sequence x-min dx num-x))

or

(map (lambda (x) ...do stuff with x...) (interpolate num-x (list a b)))

Finally, if you have an entire libctl input file myfile.ctl that you want to loop, varying over some parameter x, you can do so by writing a loop on the Unix command line. Using the bash shell, you could do:

for x in `seq a dx b`; do program x=$x myfile.ctl; done

License and Copyright

libctl is copyright © 1998, 1999, 2000, 2001, 2002, 2006, Steven G. Johnson.

libctl is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. You can also find it on the GNU web page:

http://www.gnu.org/copyleft/gpl.html

Many of the files in libctl are individually licensed under the terms of the GNU Lesser General Public License; either version 2 of the License, or (at your option) any later version. This is indicated by the licensing comments at the top of each file. There are a few files in libctl that we place in the public domain, which are not restricted by the terms of the GPL or LGPL; these files explicitly indicate this fact at the top of the file. All files fall under the GPL unless they expressly say otherwise.

The files and contain multi-dimensional numeric integration code that was adapted in part from HIntLib by Rudolf Schuerer and from the GNU Scientific Library by Brian Gough. Both of these libraries are licensed under the GNU GPL, version 2 or later.
