Bio CIF

BioCif is a small C# library for parsing the Crystallographic Information File (CIF) format, the standard for information interchange in crystallography. It is designed to be fast and easy to use.

It provides both tokenization and parsing of CIF versions 1.1 and 2.0, as well as convenience wrappers that provide an API for Protein Data Bank (PDB) data. The PDB hosts protein structure data in a CIF-based format, PDBx/mmCIF (macromolecular CIF).

Usage

To access the raw stream of tokens:

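using System;
using System.IO;
using BioCif.Core.Tokenization;
using BioCif.Core.Tokens;

// File.OpenRead opens the file read-only; File.Open would also need a FileMode argument.
using (var fileStream = File.OpenRead(@"C:\path\to\data.cif"))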
using (var streamReader = new StreamReader(fileStream))
{
    foreach (Token token in CifTokenizer.Tokenize(streamReader))
    {
        Console.WriteLine(token.TokenType);
    }
}

To access the parsed CIF structure:

using (var fileStream = File.OpenRead(@"C:\path\to\data.cif"))
{
    Cif cif = CifParser.Parse(fileStream);

    DataBlock block = cif.DataBlocks[0];
    Console.WriteLine($"Block name: {block.Name}");

    foreach (IDataBlockMember member in block.Members)
    {
        // ...
    }
}

To access a parsed PDBx/mmCIF:

Pdbx pdbx = PdbxParser.ParseFile(@"C:\path\to\mypdbx.cif");
PdbxDataBlock block = pdbx.First;
List<AuditAuthor> auditAuthors = block.AuditAuthors;

Notes

Defined terms from the CIF specification:

  • data file: contains information relating to an experiment
  • dictionary file: contains information about data names
  • data name (also known as a tag): identifies the content of a data value
  • data value: a string representing a value of any type
  • data item: a data name together with its data value
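
As a rough illustration of these terms, the sketch below splits a simple one-line data item into its data name and data value. It does not use BioCif: `SplitDataItem` is a hypothetical helper written for this note, and it deliberately ignores quoted strings, multi-line text fields, and loops.

```csharp
using System;

// Hypothetical helper, not part of BioCif: splits a simple one-line data item
// such as "_cell_length_a 5.959" into its data name and data value.
// Quoted values, text fields and loop_ constructs are not handled.
static (string Name, string Value) SplitDataItem(string line)
{
    string trimmed = line.Trim();
    if (!trimmed.StartsWith("_", StringComparison.Ordinal))
    {
        throw new FormatException("A data name must start with an underscore.");
    }

    // The data name runs up to the first whitespace character.
    int split = 0;
    while (split < trimmed.Length && !char.IsWhiteSpace(trimmed[split]))
    {
        split++;
    }

    string name = trimmed.Substring(0, split);
    string value = trimmed.Substring(split).Trim(); // empty if no value follows
    return (name, value);
}

var (name, value) = SplitDataItem("_cell_length_a   5.959");
Console.WriteLine($"{name} = {value}"); // prints "_cell_length_a = 5.959"
```

For real files, prefer the tokenizer or parser shown under Usage, which are meant to handle the full syntax.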

Notes on structures within a CIF file:

data block: highest level of a CIF file
  data_<block name>
  [data items or save frames]

save frame: partitioned collection of data items
  save_<frame code>
  [data items]
  save_   # Terminates the save frame
  Note: save frames are only used in dictionary files
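
To make the layout concrete, here is a minimal sketch that labels each line of a tiny, made-up dictionary fragment by its structural role. It is independent of BioCif: `Classify` is a hypothetical helper, the sample data names are invented for illustration, and the keyword matching is a simplification (no loops, quoted strings, or text fields).

```csharp
using System;

// Hypothetical helper, not part of BioCif: labels a line by the structural
// role described in the notes above, based on its first token only.
static string Classify(string line)
{
    string t = line.Trim();
    if (t.Length == 0) return "blank";
    if (t[0] == '#') return "comment";

    // Look only at the first whitespace-delimited token so that a trailing
    // comment (e.g. "save_   # end of frame") does not affect the result.
    int end = 0;
    while (end < t.Length && !char.IsWhiteSpace(t[end])) end++;
    string first = t.Substring(0, end);

    // Keywords are matched case-insensitively in this sketch.
    if (first.StartsWith("data_", StringComparison.OrdinalIgnoreCase)) return "data block header";
    if (first.Equals("save_", StringComparison.OrdinalIgnoreCase)) return "save frame end";
    if (first.StartsWith("save_", StringComparison.OrdinalIgnoreCase)) return "save frame start";
    if (first.StartsWith("_", StringComparison.Ordinal)) return "data item";
    return "other";
}

// Made-up sample: one data block containing a data item and one save frame.
string[] sample =
{
    "data_example_dictionary",
    "_dictionary_name  example.dic",
    "save_cell_length",
    "_name  '_cell_length_a'",
    "save_   # Terminates the save frame",
};

foreach (string line in sample)
{
    Console.WriteLine($"{Classify(line),-18} | {line}");
}
```

A bare `save_` closes the frame, while `save_` followed by a frame code opens one, which is why the exact-match check comes before the prefix check.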

Useful Links

Status

Early stage/incomplete/unmaintained.