Reading and writing the image files in SPARX/EMAN2

EMAN2 supports many image formats, including all major EM file formats (SPIDER, IMAGIC, MRC, ...). However, we recommend using the hdf file format whenever possible, as it is the only format that supports attributes set in the file header with sufficient flexibility.

We have recently added the bdb file format to our programs. Reading and writing bdb files is much faster than reading and writing hdf files. The bdb files are stored in a directory called EMAN2DB; an image stack usually occupies two files, one for the image information and the other for the header information. However, we do not deal with these two files directly. Instead, we use "bdb:filename" outside the EMAN2DB directory to refer to the image stack.

Notice: It is dangerous to move the EMAN2DB directory, especially between different machines. Hence, if you need to move a bdb file, we recommend that you first convert it to an hdf file using the sxcpy.py command:

. sxcpy.py bdb:filename filename.hdf

After moving the hdf file to the desired location, you can use the sxcpy.py command again to convert it back to the bdb file format:

. sxcpy.py filename.hdf bdb:filename
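
The round trip above can be scripted. The following is a minimal sketch that only builds the two command lines and does not run them. The helper name transfer_commands is hypothetical (it is not part of SPARX), and the sketch assumes that sxcpy.py will be on the PATH when the commands are eventually executed.

```python
def transfer_commands(name):
    """Return the two sxcpy.py command lines for moving the stack bdb:<name>.

    Hypothetical helper for illustration only: it builds argument lists
    and does not execute anything.
    """
    hdf = name + ".hdf"
    bdb = "bdb:" + name
    to_hdf = ["sxcpy.py", bdb, hdf]   # run before moving, on the source machine
    to_bdb = ["sxcpy.py", hdf, bdb]   # run after the hdf file has been moved
    return to_hdf, to_bdb


to_hdf, to_bdb = transfer_commands("filename")
print(" ".join(to_hdf))
print(" ".join(to_bdb))
```

Each list can be handed to subprocess.check_call on a machine where SPARX is installed.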

Note for SPIDER users: the SPIDER file format is the only one that distinguishes between a single-image file and a stack of images. In other formats a single image is simply a stack with one image, so for consistency it is advisable to avoid using single-image SPIDER files in SPARX.

READ IMAGE

To read a single image from a one-image file:
- a = EMData()
- a.read_image("name.hdf")
- a.read_image("bdb:name")
or
- a = get_image("name.hdf")
- a = get_image("bdb:name")
To read a single image from a stack file:
- a = EMData()
- i = 12
- a.read_image("name.hdf", i)
- a.read_image("bdb:name", i)
Note: image numbers are from 0 to n-1, where n is the total number of images in the stack file.
To read all images from a stack file:
- a = EMData.read_images("name.hdf")
- a = EMData.read_images("bdb:name")
To obtain the number of images in a stack file:
- stack = "data.hdf"
- stack = "bdb:data"
- n = EMUtil.get_image_count(stack)
Note: other characteristics (image size, image format, ...) cannot be obtained without reading at least the header of one image.
To read only the header information of an image:
- ima = EMData()
- ima.read_image(stack, 0, True)

WRITE IMAGE

To write a single image to a one-image file:
- a.write_image("name.hdf")
- a.write_image("bdb:name")
or
- imgtype = EMUtil.ImageType.IMAGE_HDF
- a.write_image("name.hdf", 0, imgtype)
or
- drop_image(a, "name.hdf")
- drop_image(a, "name.spi", "s")
- drop_image(a, "bdb:name")
To write a single image to a stack file:
- i = 12
- a.write_image("name.hdf", i)
- a.write_image("bdb:name", i)
Notice: image numbers are from zero to n-1, where n is the total number of images in the stack file.
To place an image in an indexed SPIDER stack:
- imgtype = EMUtil.ImageType.IMAGE_SPIDER
- a.write_image("name.spi", i, imgtype)
To write the header to the image file (this does not write the image itself):
- ima.write_image(stack, i, EMUtil.ImageType.IMAGE_HDF, True) (for an hdf file)
or
- DB = db_open_dict(stack)
- DB.set_header(i, ima) (for a bdb file)
or
- write_header(stack, ima, i) (this works for both hdf and bdb files)

ATTRIBUTES IN FILE HEADERS

Any object can be attached to an in-core hdf file as an attribute, and when the file is written to disk the attached information is stored with it. When the file is read, the attached information is available through the attribute name.
Any user can attach a set of attributes to images and interpret them. However, there is a set of standard names that some programs expect or modify, and many programs will not work unless these attributes are preset to default values. For the official list, see http://blake.bcm.edu/emanwiki/Eman2Metadata. Specifically, in SPARX we use image attributes that are categorized into the following classes:

.i. 2D orientation/alignment attributes:

The 2D orientation/alignment attributes are now stored in a single attribute xform.align2d as a Transform object. However, we usually don't need to access this parameter itself; instead, we can use one of the two following commands.

To get 2D orientation/alignment attributes, use the following command:

. alpha, tx, ty, mirror, scale = get_params2D(ima)

To set 2D orientation/alignment attributes, use the following command:

. set_params2D(ima, [alpha, tx, ty, mirror, scale])

alpha: 0 - rotation angle
tx: 0 - shift in x direction in image plane (in 2-D)
ty: 0 - shift in y direction in image plane (in 2-D)
mirror: 0 - do not mirror, 1 - after application of in-plane transformation, image has to be x-mirrored, i.e., `f_m(x',y') = f(-x, y)`.
scale: scale of the image, generally set to 1.0
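
To make the mirror convention concrete, here is a small pure-Python sketch of an x-mirror, f_m(x, y) = f(-x, y), applied to an image stored as a list of rows. It is not SPARX code: the function name x_mirror is made up for illustration, x is assumed to be measured from the central column of an image with an odd number of columns, and the in-plane rotation and shift are ignored.

```python
def x_mirror(image):
    """Return f_m with f_m(x, y) = f(-x, y) for an image given as a list of rows.

    With x measured from the central column, reversing every row sends
    column x to column -x and leaves the central column in place.
    """
    return [row[::-1] for row in image]


f = [[1, 2, 3],
     [4, 5, 6]]
f_m = x_mirror(f)
print(f_m)
```

Applying the mirror twice returns the original image, which is a quick sanity check for any mirror convention.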

.ii. Projection orientation attributes:

The projection orientation attributes are now stored in a single attribute xform.projection as a Transform object. However, we usually don't need to access this parameter itself; instead, we can use one of the two following commands.

To get projection orientation attributes, use the following command:

. phi, theta, psi, s2x, s2y = get_params_proj(ima)

To set projection orientation attributes, use the following command:

. set_params_proj(ima, [phi, theta, psi, s2x, s2y])

phi: 0 - Eulerian angle for 3D reconstruction (azimuthal)
theta: 0 - Eulerian angle for 3D reconstruction (tilt)
psi: 0 - Eulerian angle for 3D reconstruction (in-plane rotation of projection)
s2x: 0 - shift in x direction
s2y: 0 - shift in y direction

.iii. 3D orientation/alignment attributes:

The 3D orientation/alignment attributes are now stored in a single attribute xform.align3d as a Transform object. However, we usually don't need to access this parameter itself; instead, we can use one of the two following commands.

To get 3D orientation/alignment attributes, use the following command:

. phi, theta, psi, s3x, s3y, s3z, mirror, scale = get_params3D(ima)

To set 3D orientation/alignment attributes, use the following command:

. set_params3D(ima, [phi, theta, psi, s3x, s3y, s3z, mirror, scale])

phi: 0 - Eulerian angle for 3D reconstruction (azimuthal)
theta: 0 - Eulerian angle for 3D reconstruction (tilt)
psi: 0 - Eulerian angle for 3D reconstruction (rotation around new z axis)
s3x: 0 - shift in x direction
s3y: 0 - shift in y direction
s3z: 0 - shift in z direction
mirror: 0 - do not mirror, 1 - image has to be x-mirrored, i.e., `f_m(x',y') = f(-x, y)`.
scale: scale of the image, generally set to 1.0

.iv. CTF related attributes:

Outdated: the CTF is currently stored as an object (PAP 01/19/09).

ctf_applied: 0 - image was not multiplied by the CTF, 1 - image was multiplied by the CTF with parameters given by the following attributes:
defocus: 15234 - defocus [Å] associated with the image, positive value corresponds to underfocus
amp_contrast: 0.1 - amplitude contrast, see definition of the CTF for details
voltage: 200 - accelerating voltage of the microscope [kV]
Cs: 2.0 - spherical aberration constant [mm].
Pixel_size: 2.2 - pixel size in Å, currently used only in the context of the CTF

.v. Image formation attributes:

Outdated

Noise_a: the first parameter of the baseline noise in the 1D rotationally averaged power spectrum of particles.
Noise_b: the second parameter of the baseline noise in the 1D rotationally averaged power spectrum of particles.
CTF_noise: the parameter describing the noise affected by the CTF.
B_factor: the parameter of the Gaussian-like envelope function, which roughly accounts for the damping of the image in Fourier space.

.vi. Image ID attributes:

defg: defocus group ID.
mic: the micrograph from which the particle was picked. This attribute lets you trace the particle back to the original micrograph.
xp: x coordinate of the particle in the micrograph
yp: y coordinate of the particle in the micrograph

SET ATTRIBUTES

To set an attribute:
- proj.set_attr_dict({'phi': 12.0})
or
- b = 12.0
- proj.set_attr_dict({'phi': b})
To set multiple attributes:
- proj.set_attr_dict({'defocus': 0.0, 'amp_contrast': 0.1, 'voltage': 200, 'Cs': 2.0, 'Pixel_size': 2.2})
To store a list in the header:
- img.set_attr('myArray', [1.1, 2.2, 3.3])
To write the header information of the im-th image to a stack (this does not write the image itself):
- data[im].write_image(stack, im, EMUtil.ImageType.IMAGE_HDF, True)

GET ATTRIBUTES

To get attribute value from a single image:
- phi = proj.get_attr('phi')
To get attribute value from a single image when we do not know whether the attribute was set:
- ccc = proj.get_attr_default('ccc', -1.0)
This sets ccc to the value of the attribute 'ccc', or to -1.0 if this attribute does not exist.
To get the dictionary of all attributes in an image:
- l = image.get_attr_dict()
- print(l)
To get attribute values from all images in a stack file:
- phi = EMUtil.get_all_attributes('stack_name', 'phi')
Here phi is a list of the values of the attribute phi for all images in the stack.
To retrieve a list from the header:
- arr = img.get_attr('myArray')
- print(arr)

[1.1, 2.2, 3.3]

To read the header from the disk file without reading the image itself:
- dummy = EMData()
- dummy.read_image(stack, im, True)
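
The header-only read and get_attr_default combine naturally when one attribute is needed from every image of a large stack, with a default for images where it is not set. The sketch below uses only calls shown on this page. The function name header_values is hypothetical, and EMData and EMUtil are passed in as arguments (an assumption of this sketch, not a SPARX convention) so that it can be tried with the small in-memory stand-ins defined next to it. In a real session the EMAN2 classes would be passed instead.

```python
def header_values(stack, name, default, EMData, EMUtil):
    """Collect one header attribute from every image of a stack."""
    n = EMUtil.get_image_count(stack)
    values = []
    for im in range(n):                      # image numbers run from 0 to n-1
        dummy = EMData()
        dummy.read_image(stack, im, True)    # True: read the header only
        values.append(dummy.get_attr_default(name, default))
    return values


# In-memory stand-ins for demonstration only; these are NOT the EMAN2 classes.
FAKE_HEADERS = {
    "bdb:data": [{"phi": 10.0}, {"phi": 20.0, "ccc": 0.8}, {"phi": 30.0}],
}


class FakeEMUtil(object):
    @staticmethod
    def get_image_count(stack):
        return len(FAKE_HEADERS[stack])


class FakeEMData(object):
    def __init__(self):
        self.header = {}

    def read_image(self, stack, im=0, header_only=False):
        # Only the header is modelled; there is no pixel data in this stand-in.
        self.header = dict(FAKE_HEADERS[stack][im])

    def get_attr_default(self, name, default):
        return self.header.get(name, default)


ccc = header_values("bdb:data", "ccc", -1.0, FakeEMData, FakeEMUtil)
print(ccc)  # [-1.0, 0.8, -1.0]
```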

DELETE ATTRIBUTES

To delete an attribute:
- image.del_attr("attr_name")
or
- image.del_attr(['a_name1', 'a_name2', 'a_name3'])

Non-image scratch files

If you want to store blocks of fixed size binary data and retrieve them in random order, this is quite simple:

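a = open("cache", "wb")       # create the scratch file in binary mode
a.seek(recnum*blocksize)      # jump to the start of record number recnum
a.write(data)
a.close()
a = open("cache", "r+b")      # reopen for both reading and writing
a.seek(recnum*blocksize)
data = a.read(blocksize)      # read one record back
a.seek(recnum*blocksize)
a.write(data)                 # overwrite the same record in place
a.close()
where data is a string object (a bytes object in Python 3) of length blocksize holding arbitrary binary data.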
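
The same idea can be wrapped in two small functions. This is a minimal sketch rather than SPARX code: the names write_block, read_block and BLOCKSIZE are hypothetical, the record size of 16 bytes is arbitrary, and a temporary directory is used so that the example leaves nothing behind in the working directory.

```python
import os
import tempfile

BLOCKSIZE = 16  # hypothetical fixed record size, in bytes


def write_block(path, recnum, data, blocksize=BLOCKSIZE):
    """Store one fixed-size record at position recnum of the scratch file."""
    if len(data) != blocksize:
        raise ValueError("a record must be exactly blocksize bytes long")
    # "r+b" preserves records already in the file; "wb" creates a new file.
    mode = "r+b" if os.path.exists(path) else "wb"
    with open(path, mode) as f:
        f.seek(recnum * blocksize)
        f.write(data)


def read_block(path, recnum, blocksize=BLOCKSIZE):
    """Return the record stored at position recnum of the scratch file."""
    with open(path, "rb") as f:
        f.seek(recnum * blocksize)
        return f.read(blocksize)


cache = os.path.join(tempfile.mkdtemp(), "cache")
write_block(cache, 2, b"C" * BLOCKSIZE)   # records may be written in any order
write_block(cache, 0, b"A" * BLOCKSIZE)
print(read_block(cache, 2))
print(read_block(cache, 0))
```

Using with blocks guarantees that the file is closed after every access, which avoids the easy mistake of writing a.close without the parentheses.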