Aliscan is a tool for visually analyzing nucleotide sequence alignments to identify signature patterns. It provides a web interface for uploading, configuring, and visualizing sequence alignments, with color-coded scoring based on the significance of each nucleotide position. It supports group-based analysis: users define custom sequence groups and compare them against each other. The scoring formula can be customized to suit specific research requirements. Aliscan can be used to visualize conserved and variable regions, to compare sequence patterns interactively across multiple groups of sequences, and in particular to design taxon-specific PCR primers and qPCR/RT-qPCR probes.
- Web Interface: Upload and analyze alignment files through a user-friendly web interface
- Group-Based Analysis: Create custom sequence groups for comparative analysis
- Configurable Parameters: Adjust scoring parameters for consensus (ka) and aspecificity (kb)
- Gaps and Unknown Bases Support: Proper support for alignments containing gaps and unknown bases
- Custom Scoring Formula: Modify the scoring formula to customize the analysis
- Visual Results: Color-masked alignment visualization with score highlighting
- Downloadable Results: Export analysis results in HTML format
- Persistent Storage: SQLite database for reliable session state management
- Multi-user Support: Separate sessions for concurrent users
- Python API: Programmatic access to Aliscan functionality for automation and integration
- Python 3.7 or higher: Core programming language used for the application (the pinned Flask 2.2.3 and Werkzeug 2.2.3 releases do not support Python 3.6)
- Flask 2.2.3: Web framework for building the application's interface
- Werkzeug 2.2.3: WSGI utility library for handling HTTP requests and serving the web application
- BioPython 1.81: Library for biological computation, used for processing and analyzing nucleotide sequences
- SQLite3: Included in Python's standard library, used for state persistence
git clone https://github.com/tripitakit/aliscan.git
cd aliscan
It is recommended to use a virtual environment to manage dependencies. You can create and activate a virtual environment using the following commands:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
- Start the web server:
flask run
- Open a browser and navigate to http://127.0.0.1:5000/
- Follow the steps in the web interface:
  - Upload a FASTA alignment file
  - Define sequence groups for analysis
  - Configure analysis parameters
  - Run the scan and view results
Aliscan uses two main parameters to control the analysis:
- ka (Consensus coefficient):
  - 20: Strict consensus requirement
  - 10: High consensus (majority rule high)
  - 3: Low consensus (majority rule low)
  - Custom values: Any value between 0-100 can be set using the numeric input
- kb (Aspecificity tolerance):
  - 20: Forbid appearance in outgroup
  - 10: Penalize appearance in outgroup
  - 0: Allow appearance in outgroup (no penalty)
  - Custom values: Any value between 0-100 can be set using the numeric input
The scoring formula used is: 1 - (ka*0.5)*(1-a) - (kb*0.1)*b
Where:
- a: frequency of the nucleotide in the ingroup (consensus)
- b: frequency of the nucleotide in the outgroup (aspecificity)
- ka: consensus coefficient
- kb: aspecificity tolerance coefficient
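To make the arithmetic concrete, here is a minimal sketch of the default formula as a plain Python function. The function name `score` and its default arguments are chosen for illustration only; this is not Aliscan's internal implementation.

```python
def score(a, b, ka=10, kb=10):
    """Default scoring formula as stated above.

    a: frequency of the nucleotide in the ingroup (0-1)
    b: frequency of the nucleotide in the outgroup (0-1)
    ka: consensus coefficient, kb: aspecificity tolerance coefficient
    """
    return 1 - (ka * 0.5) * (1 - a) - (kb * 0.1) * b


# A nucleotide fixed in the ingroup and absent from the outgroup
# keeps the maximum score.
perfect = score(a=1.0, b=0.0)

# With kb = 0 the outgroup term vanishes, so outgroup frequency
# no longer lowers the score.
tolerant = score(a=1.0, b=1.0, kb=0)

# With ka = 20 even a modest loss of consensus drives the score
# far below the lowest color threshold.
strict = score(a=0.5, b=0.0, ka=20)

print(perfect, tolerant, strict)
```

Because the first term is scaled by `ka*0.5`, raising ka makes the score fall faster as ingroup consensus drops, while kb controls how strongly outgroup occurrences are penalized.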
Aliscan includes a customizable scoring formula feature that allows you to fine-tune how sequence alignments are evaluated:
- a: Frequency of a residue in the ingroup (0-1)
- b: Frequency of a residue in the outgroup (0-1)
- ka: Consensus coefficient (preset values of 20, 10, or 3, or any custom value between 0-100)
- kb: Aspecificity tolerance (preset values of 20, 10, or 0, or any custom value between 0-100)
- Basic operations: +, -, *, /, and () for grouping
The default formula is: 1 - (ka*0.5)*(1-a) - (kb*0.1)*b
This formula evaluates each position by penalizing positions with low ingroup consensus (first term) and high outgroup frequency (second term).
You can modify this formula on the results page to tailor the alignment analysis to your specific research needs.
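For readers who want to try candidate formulas outside the web interface, the sketch below evaluates a formula string restricted to the variables and operators listed above. It is one possible approach built on Python's standard `ast` module (Python 3.8 or newer is assumed for `ast.Constant`). The name `evaluate_formula` is hypothetical, and nothing here claims to reproduce how Aliscan parses formulas internally.

```python
import ast
import operator

# Only the operators documented above are accepted.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate_formula(formula, a, b, ka, kb):
    """Evaluate an arithmetic formula over a, b, ka and kb.

    Raises ValueError for anything other than numbers, the four
    variables, + - * / and parentheses.
    """
    variables = {"a": a, "b": b, "ka": ka, "kb": kb}

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported element in formula: " + type(node).__name__)

    return _eval(ast.parse(formula, mode="eval"))


default_score = evaluate_formula("1 - (ka*0.5)*(1-a) - (kb*0.1)*b",
                                 a=1.0, b=0.0, ka=10, kb=10)
print(default_score)
```

Walking the syntax tree instead of calling `eval` means function calls, attribute access, and unknown names are rejected rather than executed.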
Aliscan provides a Python API for programmatic access to its functionality:
create_state()
Creates a new empty state object.
Returns:
- A new state dictionary.
load_alignment(state, filepath)
Loads an alignment file into the state.
Parameters:
- state: The state dictionary.
- filepath: Path to the alignment file in FASTA format.
Returns:
- Updated state with alignment data.
set_groups(state, groups)
Sets the sequence groups for analysis.
Parameters:
- state: The state dictionary.
- groups: List of lists, where each inner list contains sequence indices for a group.
Returns:
- Updated state with group information.
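As an illustration of the group format, the sketch below takes a toy alignment and a list of index lists and computes, for one column, the ingroup frequency (a) and outgroup frequency (b) of each residue. It assumes that one group is treated as the ingroup and all remaining groups are pooled as the outgroup; that pooling rule and the helper name `residue_frequencies` are assumptions of this example, not a description of Aliscan's internals.

```python
from collections import Counter


def residue_frequencies(alignment, groups, ingroup_index, column):
    """Return {residue: (a, b)} for one alignment column.

    a is the residue's frequency within groups[ingroup_index];
    b is its frequency across all other groups pooled together
    (assumption of this sketch).
    """
    ingroup = groups[ingroup_index]
    outgroup = [i for g_idx, group in enumerate(groups)
                if g_idx != ingroup_index for i in group]
    in_counts = Counter(alignment[i][column] for i in ingroup)
    out_counts = Counter(alignment[i][column] for i in outgroup)
    result = {}
    for residue, count in in_counts.items():
        a = count / len(ingroup)
        b = out_counts[residue] / len(outgroup) if outgroup else 0.0
        result[residue] = (a, b)
    return result


# Toy alignment: six aligned sequences, two groups of three.
alignment = ["ACGT", "ACGA", "ACGT", "ATGT", "ATGA", "ATGT"]
groups = [[0, 1, 2], [3, 4, 5]]

# Column 1 separates the groups: C in the ingroup, T in the outgroup.
print(residue_frequencies(alignment, groups, 0, 1))
```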
set_ka(state, ka)
Sets the consensus coefficient parameter.
Parameters:
- state: The state dictionary.
- ka: Integer value for the consensus coefficient (typically 3, 10, or 20, but can be any integer between 0-100).
Returns:
- Updated state with new ka value.
set_kb(state, kb)
Sets the aspecificity tolerance parameter.
Parameters:
- state: The state dictionary.
- kb: Integer value for the aspecificity tolerance (typically 10 or 20, but can be any integer between 0-100).
Returns:
- Updated state with new kb value.
set_scoring_formula(state, formula)
Sets a custom scoring formula.
Parameters:
- state: The state dictionary.
- formula: String containing the formula using variables a, b, ka, kb.
Returns:
- Updated state with the new formula.
scan(state)
Performs the analysis using the current state configuration.
Parameters:
- state: The configured state dictionary.
Returns:
- Updated state with analysis results.
scores2html(state, output_filepath)
Generates HTML output from analysis results.
Parameters:
- state: The state dictionary with analysis results.
- output_filepath: Path where the HTML output will be saved.
Returns:
- None (writes file to disk).
import aliscan
# Initialize state
state = aliscan.create_state()
# Load alignment file
state = aliscan.load_alignment(state, "alignment.fasta")
# Set sequence groups
state = aliscan.set_groups(state, [[0, 1, 2], [3, 4, 5]])
# Set analysis parameters
state = aliscan.set_ka(state, 10)
state = aliscan.set_kb(state, 10)
# Set custom scoring formula (default expression provided as example)
state = aliscan.set_scoring_formula(state, "1 - (ka*0.5)*(1-a) - (kb*0.1)*b")
# Run the scan
state = aliscan.scan(state)
# Generate HTML output
aliscan.scores2html(state, "results.html")
Aliscan consists of three main components:
- aliscan.py: Core library that handles sequence alignment processing and scoring
- app.py: Flask web application that provides the user interface and handles HTTP requests
- db.py: Database module that manages state persistence using SQLite
The application uses a functional approach where the state is never modified in-place but instead each function returns a new state object. This state is stored in an SQLite database between requests for persistence.
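The pattern described above can be sketched in a few lines: an update function returns a new dictionary instead of mutating its input, and the state is serialized to SQLite between steps. The table layout, the JSON serialization, and the helper names (`with_ka`, `save_state`, `load_state`) are assumptions made for this illustration and do not reflect the actual schema in db.py.

```python
import json
import sqlite3


def with_ka(state, ka):
    """Return a new state with ka replaced; the input dict is not modified."""
    return {**state, "ka": ka}


def save_state(conn, session_id, state):
    """Store the state as JSON under the given session ID."""
    conn.execute(
        "INSERT OR REPLACE INTO sessions (id, state) VALUES (?, ?)",
        (session_id, json.dumps(state)),
    )
    conn.commit()


def load_state(conn, session_id):
    """Load a previously stored state, or None if the session is unknown."""
    row = conn.execute(
        "SELECT state FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, state TEXT)")

initial = {"ka": 10, "kb": 10}
updated = with_ka(initial, 20)      # initial still holds ka == 10
save_state(conn, "demo-session", updated)
restored = load_state(conn, "demo-session")
print(initial, updated, restored)
```

Note that `{**state, ...}` makes a shallow copy, so nested values such as lists would still be shared between the old and new state unless they are copied as well.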
- Session data: Stored in an SQLite database (aliscan.db)
- Uploaded files: Stored in the uploads directory
- Results: Generated as HTML files in the uploads directory with unique session IDs
The output alignment is color-coded based on the calculated scores:
- No Background: Score < 0.5
- Blue: Score between 0.5 and 0.7
- Green: Score between 0.7 and 0.8
- Yellow: Score between 0.8 and 0.9
- Red: Score > 0.9
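The thresholds above can be expressed as a small lookup function. The list does not say which band the shared boundaries 0.7 and 0.8 belong to, so this sketch assigns them to the higher band; that choice, like the function name `score_color`, is an assumption.

```python
def score_color(score):
    """Map a score to the background color described above.

    Returns None when no background is applied (score < 0.5).
    Boundary values 0.7 and 0.8 go to the higher band (assumption).
    """
    if score > 0.9:
        return "red"
    if score >= 0.8:
        return "yellow"
    if score >= 0.7:
        return "green"
    if score >= 0.5:
        return "blue"
    return None


for value in (0.95, 0.85, 0.75, 0.6, 0.3):
    print(value, score_color(value))
```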
Aliscan accepts FASTA format multiple sequence alignment files (.fasta, .fa, .aln).
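Before uploading, it can help to confirm that a FASTA file really is an alignment, meaning every sequence has the same length once gaps are included. The sketch below does this with the standard library only so that it stays self-contained; the helper names `read_fasta` and `is_aligned` are hypothetical, and the project itself lists BioPython for sequence processing.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def is_aligned(records):
    """True when at least one record exists and all sequences share one length."""
    return len({len(seq) for _, seq in records}) == 1


example = io.StringIO(">seq1\nACG-T\n>seq2\nACGAT\n>seq3\nAC\nGTT\n")
records = read_fasta(example)
print(records, is_aligned(records))
```

To check a real file, open it and pass the handle instead of the `io.StringIO` object, for example `with open("alignment.fasta") as handle: records = read_fasta(handle)`.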
Contributions to improve Aliscan are welcome. Please feel free to submit a pull request.
This project is licensed under the GNU General Public License v3.0 - see the COPYING file for details.
Author: Patrick De Marta
Email: patrick.demarta@gmail.com
This project is a complete rewrite of the original Aliscan tool developed by P. De Marta and G. Firrao and presented at the Cost Action 853 meeting in Wadensvill, 2002. If you use Aliscan in your research, please cite the following reference:
Aliscan. An interactive tool to assist the design of sequence alignment-based probes. P. De Marta, G. Firrao. Cost Action 853 - Agricultural Biomarkers for Array Technology. Wadensvill 2002.