Rowan's multiple sequence alignment (MSA) workflow provides a private, reproducible system for generating high-quality alignments suitable for protein structure prediction and co-folding models such as Boltz-1, Boltz-2, and Chai-1. All sequence processing takes place within Rowan's managed compute environment; no data is transmitted to third-party servers.
The workflow runs within a controlled, contractually governed environment. It uses a Rowan-hosted ColabFold MMseqs2 server for both single-chain and paired-chain searches, producing MSAs directly compatible with Rowan's model ecosystem. The resulting alignments can be used immediately by Boltz-2, Chai-1, and other AlphaFold-derived architectures without reformatting.
All alignments are generated against the curated datasets recommended by the ColabFold team (available at https://opendata.mmseqs.org/colabfold).
These include:
| Database | Description | Approx. Size |
|---|---|---|
| UniRef30 (2023_02, 2022_02, 2021_03) | 30% identity-clustered sequences derived from UniRef100 | 75–103 GB |
| BFD / MGnify (bfd_mgy_colabfold) | Combined Big Fantastic Database and MGnify environmental sequences, clustered at 30% identity | 91 GB |
| ColabFold DB (colabfold_envdb_202108) | Composite of BFD/MGnify with MetaEuk, SMAG, TOPAZ, MGV, GPD, and MetaClust2 | 118 GB |
| PDB70 / PDB100 | Sequence clusters from the Protein Data Bank for structural templates | 21–28 MB |
| Foldseek PDB100 | PDB100 database in Foldseek format | 19 GB |
The original database files can be downloaded from https://opendata.mmseqs.org/colabfold and https://colabfold.mmseqs.com/.
The workflow emits alignments in formats directly usable by different structure prediction models and by external AlphaFold-derived pipelines.
| Format | Intended Use | Output Structure |
|---|---|---|
| Boltz | Boltz-1 / Boltz-2 co-folding models | seq_0.csv, seq_1.csv, ... |
| Chai | Chai-1 co-folding model | aligned.pqt file |
| ColabFold | Direct MMseqs2 output | .a3m files in unpaired/ and paired/ directories |
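
For readers who want to inspect the ColabFold-format output directly, the following is a minimal sketch of reading an A3M alignment in Python. It relies only on the general A3M convention that lowercase letters mark insertions relative to the query and that lines starting with `#` are comments. The function names `parse_a3m` and `to_aligned_rows` are illustrative and are not part of any Rowan or ColabFold API.

```python
from typing import List, Tuple


def parse_a3m(text: str) -> List[Tuple[str, str]]:
    """Parse A3M text into (header, sequence) pairs, skipping '#' comment lines."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment/metadata line
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_aligned_rows(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop lowercase insertion states so every row matches the query length."""
    return [(h, "".join(c for c in s if not c.islower())) for h, s in records]


# Hypothetical two-sequence alignment, used only for illustration.
example = "#5\t1\n>query\nMKTAY\n>hit1\nMK-aaAY\n"
rows = to_aligned_rows(parse_a3m(example))
```

After insertions are removed, every row has the same number of columns as the query, which is the form most downstream tools expect when they build a fixed-width alignment matrix.
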
All outputs are packaged into a compressed archive to speed up data transfer.
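
Once the archive has been unpacked, its files can be grouped by target format using the naming patterns from the table above. The sketch below is one way to do that. The archive's internal directory layout is not specified here, so the example paths, the top-level folder names, and the helper `classify_outputs` are assumptions made purely for illustration.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Boltz outputs are named seq_0.csv, seq_1.csv, ... per the format table.
BOLTZ_CSV = re.compile(r"seq_\d+\.csv")


def classify_outputs(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group archive member paths by the model format they belong to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        if BOLTZ_CSV.fullmatch(path.name):
            groups["boltz"].append(p)
        elif path.suffix == ".pqt":
            groups["chai"].append(p)
        elif path.suffix == ".a3m" and {"paired", "unpaired"} & set(path.parts):
            groups["colabfold"].append(p)
        else:
            groups["other"].append(p)  # anything not described in the table
    return dict(groups)


# Hypothetical member list; real archives may be organized differently.
members = [
    "boltz/seq_0.csv",
    "boltz/seq_1.csv",
    "chai/aligned.pqt",
    "colabfold/unpaired/0.a3m",
    "colabfold/paired/0.a3m",
]
grouped = classify_outputs(members)
```

Matching on file names and suffixes rather than on fixed directory names keeps the sketch usable even if the archive's folder structure differs from the one assumed here.
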