| --- |
| license: llama2 |
| datasets: |
| - mims-harvard/ProCyon-Instruct |
| language: |
| - en |
| base_model: |
| - meta-llama/Llama-2-7b-hf |
| tags: |
| - biology |
| - protein |
| --- |
| # ProCyon-Split |
|
|
| ProCyon-Split is a multimodal foundation model for protein phenotypes, which combines a large language model with protein encoders to support inputs of interleaved free text and proteins. |
| Unlike ProCyon-Full, this model is instruction-tuned only on the training split of the [ProCyon-Instruct](https://huggingface.co/datasets/mims-harvard/ProCyon-Instruct) dataset, |
| which enables rigorous model evaluation on held-out protein-phenotype pairs. |
|
|
| For more information on the model design, training, and validation, please see the [overview page](https://zitniklab.hms.harvard.edu/ProCyon/) or the [paper](https://www.biorxiv.org/content/10.1101/2024.12.10.627665v1). |
|
|
| Additional versions of the model are available as [ProCyon-Full](https://huggingface.co/mims-harvard/ProCyon-Full) and [ProCyon-Bind](https://huggingface.co/mims-harvard/ProCyon-Bind). |
|
|