Bruno Zanuttini
GREYC

AASeq and AARand

What are AARand and AASeq?

AARand and AASeq are two software tools that assist de novo sequencing of peptides. Used together with a mass spectrometer, they help work around the main difficulty of de novo sequencing: unlike in traditional sequencing, there is no database of known sequences against which new spectra can be compared.

AARand and AASeq do not analyze MS spectra, nor do they give the sequence of a peptide. What they do is create databases of sequences that can be used to "feed" software such as Sequest®. AARand creates random databases, and AASeq creates complete databases of all sequences satisfying given constraints. If the peptide to be sequenced does satisfy these constraints, then its sequence will be in the database generated by AASeq; consequently, if the MS/MS spectrum is good, the sequence will be found. All databases generated by both tools are in FASTA format.

The use of AARand and AASeq can be summarized as follows:

  1. Collect information about your peptide: an MS/MS spectrum, an m/z value, and information about some of its subsequences. The latter can be guessed by generating libraries of random sequences with AARand and matching them against the MS/MS spectrum, using your favourite software for that purpose. For instance, you could find out that some sequence tags for N- and C-terminal subsequences occur in all random sequences highly rated by your software.
  2. Use the collected information to generate all candidate sequences with AASeq, and test them all against the MS/MS spectrum, still using your favourite software for that purpose. For instance, you could ask AASeq to generate all sequences weighing 950 Da with a precision of 1 Da, having "FRF" as their C-terminal subsequence and containing at least one "A". This would generate a database of sequences in which your software should find the right one.
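To make the example constraints of step 2 concrete, here is a minimal Python sketch that checks whether a single sequence satisfies them. It is not part of AARand or AASeq: the function names are hypothetical, the residue masses are standard average values (your own database of acids may use different ones), and counting one water molecule in the peptide mass is an assumption about your mass convention.

```python
# Standard average residue masses in Da. These are textbook values, assumed
# here for illustration; the database shipped with AASeq may differ slightly.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # average mass of H2O in Da


def peptide_mass(sequence, water_mass=WATER_MASS):
    """Sum of the residue masses plus `water_mass` (pass 0 to ignore water)."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + water_mass


def satisfies_constraints(sequence, target=950.0, tolerance=1.0,
                          c_terminal="FRF", must_contain="A"):
    """Hypothetical check mirroring the constraints of the example above."""
    return (abs(peptide_mass(sequence) - target) <= tolerance
            and sequence.endswith(c_terminal)
            and must_contain in sequence)


# Check a few candidate sequences against the three constraints.
for candidate in ("AYFVFRF", "GYFVFRF", "AYFVFRA"):
    print(candidate, satisfies_constraints(candidate))
```

AASeq does the converse of this check: instead of testing one sequence, it generates every sequence for which such a test would succeed.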

The whole procedure is summarized in the following figure:

[Figure: Usage of AARand/AASeq]

Authors and conditions of use

AARand and AASeq are developed at the University of Caen, France, by Joël Henry (BioMEA Laboratory) and Bruno Zanuttini (Research Group on Computer Science, Image, Automatic Control and Instrumentation of Caen, GREYC). You are free to download and use them without any fee. You may also redistribute them for free or modify them under the terms of the GNU General Public Licence. For any information about these topics please contact Bruno Zanuttini by email (bruno.zanuttini@unicaen.fr).

Download and installation

The current version of AARand is version 2.0, and that of AASeq is version 5.0. Both are available for Linux and Windows 95 or later. No particular hardware configuration is required.

Although you are free to download the files below, we would appreciate it if you could give us some information about yourself and your motivation for using AARand and AASeq. This would also help us support you if you encounter problems with their use, or when we publish new versions. Nothing in the following form is mandatory.

Windows

Simply download one of the following three files: aarand.zip (AARand, 72 kB) - aaseq.zip (AASeq, 129 kB) - aarand-aaseq.zip (both, 199 kB).

Save the file into any directory of your disk, preferably a new one, and unzip it (this should be achieved by double-clicking the file name and following the instructions for extracting). You are now ready to use AARand and/or AASeq.

Linux

Simply download one of the following three files: aarand.tgz (AARand, 19 kB) - aaseq.tgz (AASeq, 62 kB) - aarand-aaseq.tgz (both, 80 kB).

Save the file into any directory of your disk, preferably a new one, and unarchive it. This should be achieved by typing the following command in the directory where you saved the file: tar xvzf file.tgz. You are now ready to use AARand and/or AASeq.

Source files

According to the GNU General Public Licence, you are allowed to modify and redistribute AARand and AASeq. The following archives contain all the required files:

Usage and user's manuals

You will find below essential information to get started with AARand and AASeq, but their complete user's manuals are available in PDF format: aarand-manual.pdf (AARand, 77 kB) - aaseq-manual.pdf (AASeq, 119 kB).

If you need support with any technical problem, feel free to contact Bruno Zanuttini by email (bruno.zanuttini@unicaen.fr).

Getting a command prompt

AARand and AASeq have no graphical interface (yet), so you have to run them "on the command line". Windows users should launch Command Prompt from the Start menu and then type cd "c:\Documents and Settings\AASeq" (if this is the path to the place where they saved and extracted the zip file); Linux users should open a terminal and similarly type, e.g., cd ~/AASeq.

Databases, command and output files

Both AARand and AASeq use text files. AASeq needs the constraints on sequences to be typed into such a file, and both tools need the list of amino acids, together with their masses, to be typed into a text file as well.

You may create your own databases and command files by using any text editor, e.g., Notepad, Emacs, Vi, etc. If you use a word processor such as Word, however, remember to save the file in plain text (.txt) format.

Since databases of amino acids are common to both tools, here are a few words about them. Your installation contains two files named "database.txt" and "aasacids.txt". Reading the first one with any text editor will give you all the information necessary for creating your own database of acids; the second one is a ready-to-use database of the twenty common amino acids together with their average masses.

Finally, both AARand and AASeq write their sequences to a text file as well. This file is in FASTA format, which is standard for amino acid libraries. You may read such files with, for example, a text editor, and your other software for peptide sequencing will certainly understand them.
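If you want to post-process the generated libraries yourself, FASTA files are easy to read: each record starts with a header line beginning with ">", followed by one or more lines of sequence. The following Python sketch shows one way to do it; the function name is hypothetical, and the exact header text written by AARand and AASeq is not assumed here.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# In-memory demonstration with made-up records.
example = [">seq1", "ACDEF", "GHIK", "", ">seq2", "FRF"]
for name, sequence in parse_fasta(example):
    print(name, len(sequence), sequence)
```

To read a real file, pass the open file object instead of the list, e.g. parse_fasta(open("mysequences.fasta")).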

Basic use of AARand

The basic use of AARand consists of asking it to generate a certain number of random sequences of a certain length. To generate 100 random sequences, each containing 100000 amino acids from database "myacids.txt", and to store them in file "mysequences.fasta", get a command prompt as explained above and type:

aarand -n 100 -s 100000 -d myacids.txt -o mysequences.fasta

Press "enter"; the desired random sequences are now stored in file "mysequences.fasta" in the directory where you installed AARand.

Typing aarand --help will give you some more information directly on the command line.
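For readers curious about what such a random library looks like, here is a rough Python sketch of the idea. It is not AARand's implementation: it assumes that every amino acid is drawn uniformly and independently, and the function names, the header text and the line width are all made up for illustration.

```python
import random


def random_sequences(count, length, alphabet="ACDEFGHIKLMNPQRSTVWY", seed=None):
    """Return `count` random sequences of `length` letters (uniform draw)."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    return ["".join(rng.choices(alphabet, k=length)) for _ in range(count)]


def to_fasta(sequences, line_width=60):
    """Format sequences as FASTA text with hypothetical headers."""
    lines = []
    for index, sequence in enumerate(sequences, start=1):
        lines.append(f">random_{index}")
        # Wrap long sequences over several lines.
        for start in range(0, len(sequence), line_width):
            lines.append(sequence[start:start + line_width])
    return "\n".join(lines) + "\n"


# Three short sequences, printed instead of being written to a file.
print(to_fasta(random_sequences(count=3, length=25, seed=0)), end="")
```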

Basic use of AASeq

The most basic use of AASeq consists of asking it to generate all sequences with a given mass. To generate all sequences with a total mass of 500 Da plus or minus 1 Da, type the following into a file named, e.g., "seqs500.txt" in the directory where you installed AASeq:

[database] aasacids.txt
[molecular-mass] 500 ~ 1
[water-mass] 0
    

Save the file, then get a command prompt as explained above and type:

aaseq seqs500.txt

Press "enter"; the desired sequences are now stored in file "seqs500.fasta" in the directory where you installed AASeq.

Typing aaseq --help will give you some more information directly on the command line. The file "command.txt" gives all details, including examples, on how to impose constraints on the generation process. Finally, you can use the file "template.txt" as a basis for your own command files: copy it first, then complete the copy.
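To illustrate what "all sequences with a given mass" means, the following Python sketch enumerates them by a simple depth-first search that abandons a branch as soon as its mass exceeds the upper bound. This is only an illustration under stated assumptions, not the algorithm used by AASeq; the function name and the toy masses in the demonstration are hypothetical.

```python
def enumerate_sequences(masses, target, tolerance, water_mass=0.0):
    """Yield every sequence whose total mass is within target +/- tolerance.

    `masses` maps one-letter codes to residue masses (all must be positive).
    """
    if min(masses.values()) <= 0:
        raise ValueError("all residue masses must be positive")
    lower, upper = target - tolerance, target + tolerance
    residues = sorted(masses.items())

    def extend(prefix, mass):
        # Report the current sequence if it falls inside the mass window.
        if prefix and lower <= mass <= upper:
            yield "".join(prefix)
        # Try to append each residue that still fits under the upper bound.
        for aa, aa_mass in residues:
            if mass + aa_mass <= upper:
                prefix.append(aa)
                yield from extend(prefix, mass + aa_mass)
                prefix.pop()

    yield from extend([], water_mass)


# Toy alphabet with made-up masses, so that the result is easy to verify.
print(list(enumerate_sequences({"A": 1.0, "B": 2.0}, target=3.0, tolerance=0.0)))
# prints ['AAA', 'AB', 'BA']
```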

Important note: Do not use AASeq for, e.g., generating all sequences weighing between 500 and 1000 Da. The only thing you will get is a full disk! AASeq can generate sequences very quickly, but it is designed to generate every sequence it is asked for. It is your own responsibility to know in advance whether there will be a reasonable number of them, so that they can all be stored on your disk and so that your other software can handle all of them.
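One way to know in advance how many sequences to expect is to count them without generating them. The Python sketch below does this by dynamic programming over masses rounded to a fixed resolution. It is a rough estimator, not a feature of AASeq: the function name is hypothetical, the rounding makes the result approximate near the edges of the mass window, and constraints other than the mass are ignored.

```python
import math


def count_sequences(masses, target, tolerance, resolution=0.01, water_mass=0.0):
    """Estimate the number of sequences with mass within target +/- tolerance."""
    # Express every residue mass as a whole number of `resolution` steps.
    units = [round(mass / resolution) for mass in masses.values()]
    if min(units) <= 0:
        raise ValueError("masses must be positive and larger than the resolution")
    low = math.ceil((target - tolerance - water_mass) / resolution - 1e-9)
    high = math.floor((target + tolerance - water_mass) / resolution + 1e-9)
    if high < 1:
        return 0
    # ways[t] = number of sequences whose residues sum to exactly t steps.
    ways = [0] * (high + 1)
    ways[0] = 1  # the empty sequence, used only as a starting point
    for total in range(1, high + 1):
        ways[total] = sum(ways[total - u] for u in units if u <= total)
    return sum(ways[max(low, 1):high + 1])


# Toy alphabet with made-up masses: totals 3, 4 and 5 give 3 + 5 + 8 sequences.
print(count_sequences({"A": 1.0, "B": 2.0}, target=4.0, tolerance=1.0, resolution=1.0))
# prints 16
```

Multiplying the count by the approximate size of one FASTA record (header plus sequence) gives a rough idea of the disk space needed.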

Bruno Zanuttini (bruno.zanuttini@unicaen.fr)