Multiple Sequence Alignment (MSA)

Rowan's multiple sequence alignment (MSA) workflow provides a private, reproducible system for generating high-quality alignments suitable for protein structure prediction and co-folding models such as Boltz-1, Boltz-2, and Chai-1. All sequence processing takes place within Rowan's managed compute environment; no data is transmitted to third-party servers.

How It Works

The workflow runs entirely within a controlled, contractually governed environment. It uses a Rowan-hosted ColabFold MMseqs2 server for both single-chain and paired-chain searches, producing MSAs directly compatible with Rowan's model ecosystem. The resulting alignments can be used immediately by Boltz-2, Chai-1, and other AlphaFold-derived architectures without reformatting.

All alignments are generated against the curated datasets recommended by the ColabFold team (available at https://opendata.mmseqs.org/colabfold).

These include:

| Database | Description | Approx. Size |
|---|---|---|
| UniRef30 (2023_02, 2022_02, 2021_03) | 30% identity-clustered sequences derived from UniRef100 | 75–103 GB |
| BFD / MGnify (bfd_mgy_colabfold) | Combined Big Fantastic Database and MGnify environmental sequences, clustered at 30% identity | 91 GB |
| ColabFold DB (colabfold_envdb_202108) | Composite of BFD/MGnify with MetaEuk, SMAG, TOPAZ, MGV, GPD, and MetaClust2 | 118 GB |
| PDB70 / PDB100 | Sequence clusters from the Protein Data Bank for structural templates | 21–28 MB |
| FoldSeek PDB100 | PDB100 database in FoldSeek format | 19 GB |

The original database files can be downloaded from https://opendata.mmseqs.org/colabfold and https://colabfold.mmseqs.com/.

Output Formats

The workflow emits alignments in formats directly usable by different structure prediction models and by external AlphaFold-derived pipelines.

| Format | Intended Use | Output Structure |
|---|---|---|
| Boltz | Boltz-1 / Boltz-2 co-folding models | seq_0.csv, seq_1.csv, ... |
| Chai | Chai-1 co-folding model | aligned.pqt file |
| ColabFold | Direct MMseqs2 output | unpaired/ and paired/ .a3m files |
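
For readers who want to inspect the ColabFold-format output directly, the following is a minimal sketch of an A3M reader in Python. It relies only on the general A3M convention (FASTA-style headers, with lowercase letters marking insertions relative to the query) and assumes that comment lines starting with `#` can be skipped. The function name `parse_a3m` and the example sequences are hypothetical and not part of Rowan's API.

```python
def parse_a3m(text):
    """Return (header, aligned_sequence) pairs from A3M-formatted text.

    Lowercase letters mark insertions relative to the query and are dropped,
    so every returned sequence has the same length as the query row.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no sequence data.
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Keep match columns and gaps; remove lowercase insertion states.
            chunks.append("".join(c for c in line if not c.islower()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Illustrative input only; these sequences are made up.
example = """#5\t1
>query
MKTAY
>hit_1
MKtaTAY
>hit_2
M-TAY
"""

for header, seq in parse_a3m(example):
    print(header, seq)
# query MKTAY
# hit_1 MKTAY
# hit_2 M-TAY
```

Stripping the insertion columns is a convenient way to get a rectangular alignment for quick checks such as depth or per-column coverage; the original files should still be passed to downstream models unmodified.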

All outputs are packaged into a compressed archive to speed up data transfer.
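
Once downloaded, the archive can be unpacked and its contents grouped by target format. The sketch below uses Python's standard `shutil.unpack_archive`, which infers the archive type from the file extension. The archive type and internal layout are not specified here, so the code assumes the file names from the table above (`seq_*.csv`, `*.pqt`, and `.a3m` files under `unpaired/` and `paired/`); the function names `unpack_and_index` and `index_outputs` are hypothetical.

```python
import shutil
from pathlib import Path


def index_outputs(root):
    """Group extracted files by the model format they appear to target."""
    index = {
        "boltz": [],
        "chai": [],
        "colabfold_unpaired": [],
        "colabfold_paired": [],
    }
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix == ".csv" and path.stem.startswith("seq_"):
            index["boltz"].append(path)
        elif path.suffix == ".pqt":
            index["chai"].append(path)
        elif path.suffix == ".a3m":
            # Assumption: the parent directory name distinguishes the two sets.
            if path.parent.name == "paired":
                index["colabfold_paired"].append(path)
            elif path.parent.name == "unpaired":
                index["colabfold_unpaired"].append(path)
    return index


def unpack_and_index(archive_path, dest_dir):
    """Extract the archive into dest_dir, then index what it contains."""
    # The format (zip, tar.gz, ...) is detected from the file extension.
    shutil.unpack_archive(str(archive_path), str(dest_dir))
    return index_outputs(dest_dir)
```

A call such as `unpack_and_index("msa_results.zip", "msa_results")` (with a hypothetical file name) would return a dictionary of paths that can then be handed to the matching model.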