Installing the genome annotation software BRAKER

1. Installing BRAKER2 with conda

1.1. Installing BRAKER2

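A manual installation means configuring a large number of dependencies. Installing with conda resolves many of them automatically and makes the job much easier.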

conda create -n braker2
conda activate braker2
conda install bioconda::braker2

After the installation finished, conda printed several notes:

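These notes cover caveats and follow-up configuration that BRAKER2 needs after installation. Each one is explained below, along with the steps to resolve it.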

1.2. The AUGUSTUS config directory

Message:

The config/ directory from AUGUSTUS can be accessed with the variable AUGUSTUS_CONFIG_PATH.
BRAKER2 requires this directory to be in a writable location...

Explanation:

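BRAKER2 relies on the AUGUSTUS configuration directory (config/) and requires the user to have write access to it. If the default config directory is not writable after installation, which is often the case inside a conda environment, copy it to a location where you do have write access.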

Steps:

First check whether AUGUSTUS is installed. If it is not, install it with conda:

conda install bioconda::augustus

Check that the installation succeeded:

augustus --version

This failed with the error: augustus: error while loading shared libraries: libcolamd.so.2: cannot open shared object file: No such file or directory

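When AUGUSTUS starts, it cannot find the shared library libcolamd.so.2. This file belongs to the COLAMD library (part of SuiteSparse), which is used for sparse matrix computations.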

It can be installed with conda:

conda install conda-forge::libcolamd

However, conda reports that it is already installed.

So the likely cause is that AUGUSTUS cannot locate the installed library, or that the library path is not configured correctly.

Run conda list libcolamd to confirm that libcolamd is installed correctly and to see where its files are.

Run find $CONDA_PREFIX -name "libcolamd.so.2" to find the actual path of libcolamd.so.2.

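Shared libraries are often provided through symlinks that point to a specific version. If there is no libcolamd.so.2 link, first locate the library with find $CONDA_PREFIX -name "libcolamd.so", then create a libcolamd.so.2 symlink that points to it: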

ln -s $CONDA_PREFIX/lib/libcolamd.so $CONDA_PREFIX/lib/libcolamd.so.2

After that, augustus --version runs successfully.
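If you need to do this for other libraries as well, the symlink step can be wrapped in a small bash function. This sketch assumes that the unversioned libcolamd.so in the environment is ABI-compatible with the libcolamd.so.2 that AUGUSTUS requests; a symlink only fixes a missing name, not a real version mismatch. The function name ensure_soname_link is hypothetical. Unlike the ln command above, it uses a relative link target and does nothing if the link already exists.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a missing versioned symlink (e.g. libcolamd.so.2)
# next to an existing unversioned library (e.g. libcolamd.so) in <prefix>/lib.
# Assumption: the two are ABI-compatible; this does not fix a version mismatch.
ensure_soname_link() {
    local prefix="$1"   # environment prefix, e.g. "$CONDA_PREFIX"
    local base="$2"     # existing library, e.g. libcolamd.so
    local soname="$3"   # name the program asks for, e.g. libcolamd.so.2
    local libdir="$prefix/lib"

    # Nothing to do if the versioned name already exists (file or valid link).
    if [ -e "$libdir/$soname" ]; then
        echo "exists: $libdir/$soname"
        return 0
    fi
    # The unversioned library must be present to link against.
    if [ ! -e "$libdir/$base" ]; then
        echo "missing: $libdir/$base" >&2
        return 1
    fi
    # A relative target keeps the link valid if the whole lib/ directory moves.
    ln -s "$base" "$libdir/$soname"
    echo "created: $libdir/$soname -> $base"
}
```

For example, run ensure_soname_link "$CONDA_PREFIX" libcolamd.so libcolamd.so.2 and then augustus --version again.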

Configuring AUGUSTUS_CONFIG_PATH:

The AUGUSTUS_CONFIG_PATH environment variable must point to a writable config directory. Copy the config directory to a place where you have write access, then set the variable:

Copy the config directory to a path you can write to: cp -r $CONDA_PREFIX/config /path/to/writable/directory/

Point AUGUSTUS_CONFIG_PATH at the new directory:

echo 'export AUGUSTUS_CONFIG_PATH=/path/to/writable/directory/config' >> ~/.bashrc
source ~/.bashrc

Run echo $AUGUSTUS_CONFIG_PATH to confirm that the variable is set.
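echo only confirms that the variable is set. The hypothetical bash function below goes a step further: it checks that the directory contains species/, where AUGUSTUS keeps its species parameter sets and where BRAKER writes newly trained ones, and that species/ is writable. It is a sketch, not an official AUGUSTUS check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a directory is usable as AUGUSTUS_CONFIG_PATH.
# Defaults to the current value of AUGUSTUS_CONFIG_PATH when no argument is given.
check_augustus_config() {
    local dir="${1:-$AUGUSTUS_CONFIG_PATH}"
    if [ -z "$dir" ]; then
        echo "AUGUSTUS_CONFIG_PATH is not set" >&2
        return 1
    fi
    # Species parameter sets live in species/; BRAKER adds new species here.
    if [ ! -d "$dir/species" ]; then
        echo "no species/ directory in $dir" >&2
        return 1
    fi
    if [ ! -w "$dir/species" ]; then
        echo "$dir/species is not writable" >&2
        return 1
    fi
    echo "ok: $dir"
}
```

Run check_augustus_config with no argument to test the current AUGUSTUS_CONFIG_PATH.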

1.3. Installing GeneMark-ES/ET

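Download GeneMark from http://exon.gatech.edu/GeneMark/license_download.cgi: choose the software you need and fill in the form to get the download. Install it following its documentation, then add it to your environment variables.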

1.3.1. Installation

# gmes_linux_64_4.tar   installation archive
# gm_key   license key
tar -xvf gmes_linux_64_4.tar   # extract the archive
cp gm_key ~/.gm_key   # save the key to your home directory

Check the installation with the script shipped in the extracted directory:

cd gmes_linux_64_4
./check_install.bash

If you see the following output, the installation succeeded:

Checking GeneMark-ES installation
Checking Perl setup
All required Perl modules were found
Checking GeneMark.hmm setup
GeneMark.hmm was found
GeneMark.hmm is set
GeneMark.hmm is executable
Performing GeneMark.hmm test run
All required components for GeneMark-ES were found

Add it to PATH:

echo 'export PATH=/your/path/to/gmes_linux_64_4:$PATH' >> ~/.bashrc
source ~/.bashrc

PS:

Some Perl modules may turn out to be missing along the way. You can install them with cpan.

Run the following to check that GeneMark works:

gmes_petap.pl 

1.4. Checking that all software BRAKER depends on is present

braker.pl --checkSoftware

Install whatever the output reports as missing.

In my case cdbtools was still missing. It can be installed directly into the current environment with conda (it is distributed through bioconda):

conda install -c bioconda cdbtools

Run the check again. Once nothing is reported as missing, the installation is complete.

2. Installing BRAKER3

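BRAKER3 is the latest pipeline in the BRAKER suite. It uses RNA-seq and protein data in a fully automated pipeline to train GeneMark-ETP and AUGUSTUS and predict highly reliable genes. The output combines the gene sets from both prediction tools and keeps only genes with strong support from extrinsic evidence.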

2.1. Create a new environment

conda create -n braker3 python=3.8
conda activate braker3

2.2. Install the basic dependencies

conda install -c anaconda perl
conda install -c anaconda biopython
conda install -c bioconda perl-app-cpanminus
conda install -c bioconda perl-hash-merge
conda install -c bioconda perl-parallel-forkmanager
conda install -c bioconda perl-yaml
#conda install -c bioconda perl-yaml-xs   # not available in this channel
conda install eumetsat::perl-yaml-xs
conda install -c bioconda perl-scalar-util-numeric
conda install -c bioconda bamtools
conda install -c bioconda bedtools
conda install -c bioconda hisat2
conda install -c bioconda gffread
conda install -c bioconda stringtie
conda install -c bioconda cdbtools
conda install -c bioconda perl-file-spec

conda install -c bioconda perl-list-util
conda install -c bioconda perl-module-load-conditional
conda install -c bioconda perl-posix
conda install -c bioconda perl-file-homedir

conda install -c bioconda perl-class-data-inheritable
conda install -c bioconda perl-exception-class
conda install -c bioconda perl-test-pod
conda install -c bioconda perl-file-which # skip if you are not comparing to reference annotation
conda install -c bioconda perl-mce
conda install -c bioconda perl-threaded
conda install -c bioconda perl-math-utils

conda install -c bioconda perl-data-dumper

2.3. Install additional Perl modules

cpanm File::Spec::Functions List::Util MCE::Mutex Module::Load::Conditional POSIX Math::Utils File::HomeDir

List::Util failed to install at this step.

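The log shows that the build failed because the header crypt.h is missing, which usually means a development package or library is not installed. On Linux, crypt.h belongs to libcrypt (shipped with glibc or provided by libxcrypt). I tried to fix this by installing related development packages with conda: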

#conda install libcrypto   # could not be installed
conda install openssl   # also provides libcrypto

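Add its include and lib directories to the environment variables. For a conda install, /path/to/openssl is the environment prefix, i.e. $CONDA_PREFIX: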

export CPATH=/path/to/openssl/include:$CPATH
export LIBRARY_PATH=/path/to/openssl/lib:$LIBRARY_PATH
export LD_LIBRARY_PATH=/path/to/openssl/lib:$LD_LIBRARY_PATH
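Note that export CPATH=/path:$CPATH leaves an empty trailing entry when CPATH was previously unset, and GCC treats an empty entry as the current working directory. The sketch below prepends the directories only when that is safe. The function names are hypothetical, and it assumes the headers and libraries sit in include/ and lib/ under a single prefix, as they do in a conda environment.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: prepend <prefix>/include and <prefix>/lib to the
# compiler/loader search paths without creating an empty entry.
prepend_path_var() {
    # $1 = variable name, $2 = directory to put in front
    local name="$1" dir="$2"
    local current="${!name}"          # bash indirect expansion: value of $name
    if [ -n "$current" ]; then
        export "$name=$dir:$current"
    else
        export "$name=$dir"           # avoid a trailing ':' (empty entry)
    fi
}

use_prefix_for_build() {
    local prefix="$1"                 # e.g. "$CONDA_PREFIX"
    prepend_path_var CPATH "$prefix/include"            # headers for gcc
    prepend_path_var LIBRARY_PATH "$prefix/lib"         # link-time search
    prepend_path_var LD_LIBRARY_PATH "$prefix/lib"      # run-time search
}
```

For example, run use_prefix_for_build "$CONDA_PREFIX" and then cpanm List::Util.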

Then try the installation again:

cpanm List::Util

It installed successfully.

2.4. Installing BRAKER

git clone https://github.com/Gaius-Augustus/BRAKER.git
cd BRAKER/scripts
chmod a+x *.pl *.py

Add it to PATH:

echo 'export PATH="$PATH:/your/path/to/BRAKER/scripts"' >> ~/.bashrc
source ~/.bashrc

A better option is to put it in the activation scripts of the current conda environment:

mkdir -p /data/wangxingbin/anaconda3/envs/braker3/etc/conda/activate.d
touch /data/wangxingbin/anaconda3/envs/braker3/etc/conda/activate.d/env_vars.sh

Add the BRAKER path to env_vars.sh:

#!/bin/bash
export PATH="/data/wangxingbin/Software/braker-3.0.8/scripts:$PATH"
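You can generate the same hook from $CONDA_PREFIX instead of hard-coding the path. The sketch below writes the hook and also avoids adding the directory to PATH twice when the environment is activated repeatedly. The function name and the file name braker_path.sh are hypothetical; conda sources every *.sh file in activate.d, so a separate file leaves an existing env_vars.sh untouched. The two arguments stand for your environment prefix and your BRAKER scripts directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a conda activate.d hook that puts the BRAKER
# scripts directory on PATH when the environment is activated.
write_braker_activate_hook() {
    local prefix="$1"    # environment prefix, e.g. "$CONDA_PREFIX"
    local scripts="$2"   # BRAKER scripts directory
    local hook_dir="$prefix/etc/conda/activate.d"
    mkdir -p "$hook_dir"
    # Unquoted heredoc: $scripts is filled in now, \$PATH is kept for activation time.
    cat > "$hook_dir/braker_path.sh" <<EOF
#!/bin/bash
# Prepend the BRAKER scripts directory unless it is already on PATH.
case ":\$PATH:" in
    *":$scripts:"*) ;;
    *) export PATH="$scripts:\$PATH" ;;
esac
EOF
    echo "$hook_dir/braker_path.sh"
}
```

With the braker3 environment active, run for example write_braker_activate_hook "$CONDA_PREFIX" /path/to/BRAKER/scripts, then deactivate and reactivate the environment.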

2.5. Installing the required software

First see what is missing, then install each item:

braker.pl --checkSoftware

2.5.1. Installing AUGUSTUS

AUGUSTUS is missing, and conda cannot install version 3.5.0 at the moment, so it has to be built manually:

git clone http://github.com/Gaius-Augustus/Augustus.git
cd Augustus
make augustus
make auxprogs

2.5.1.1. Installing MySQL++

A small hiccup here: the build could not find <mysql++.h>, so MySQL++ has to be installed manually:

# download and extract the MySQL++ source
wget -O mysql++-3.2.5.tar.gz https://tangentsoft.com/mysqlpp/releases/mysql%2B%2B-3.2.5.tar.gz
tar -xzvf mysql++-3.2.5.tar.gz
mv mysql++-3.2.5 mysql++
cd mysql++

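MySQL++ itself depends on mysql.h. MySQL is not installed on my server and I have no sudo privileges, so I installed it into the conda environment and then located the header: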

conda install anaconda::mysql
find /data/wangxingbin/anaconda3/envs/braker3 -name "mysql.h"

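Configure, build and install. Without sudo, make install cannot write to the default prefix /usr/local, so also pass --prefix=<a directory you can write to> to ./configure: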

# run ./configure to generate the Makefile
./configure --with-mysql=/data/wangxingbin/anaconda3/envs/braker3 --with-mysql-lib=/data/wangxingbin/anaconda3/envs/braker3/lib --with-mysql-include=/data/wangxingbin/anaconda3/envs/braker3/include

make
make install

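Then edit the AUGUSTUS build configuration: in common.mk, add the locations of the installed MySQL++ and of mysql.h to INCLUDE_PATH_MYSQL, and the MySQL++ library directory to LIBRARY_PATH_MYSQL: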

INCLUDE_PATH_MYSQL := -I/data/wangxingbin/SoftWare/tools/mysql++/include -I/data/wangxingbin/SoftWare/anaconda3/envs/braker3/include
LIBRARY_PATH_MYSQL := -L/data/wangxingbin/SoftWare/tools/mysql++/lib -Wl,-rpath,/data/wangxingbin/SoftWare/tools/mysql++/lib

2.5.1.2. Installing lp_solve

The build then failed on a missing #include "lp_lib.h", so install lp_solve:

conda install -c conda-forge lpsolve55 --override-channels

Again, add it to common.mk:

INCLUDE_PATH_LPSOLVE := -I/data/wangxingbin/SoftWare/anaconda3/envs/braker3/include/lpsolve

# library path for LPSOLVE
LIBRARY_PATH_LPSOLVE := -L/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib -llpsolve55

2.5.1.3. Installing SuiteSparse

conda install -c conda-forge suitesparse
conda list suitesparse

Again, set it in common.mk:

INCLUDE_PATH_SUITESPARSE := -I$(CONDA_PREFIX)/include
LIBRARY_PATH_SUITESPARSE := -L$(CONDA_PREFIX)/lib -Wl,-rpath,$(CONDA_PREFIX)/lib

Then rebuild:

make clean
make augustus

When the build finishes, run the following command. If it prints the full usage message, the installation succeeded:

./bin/augustus 

Next, build the auxiliary programs:

make auxprogs   # build the auxiliary programs

2.5.1.4. Installing BamTools

The build then failed on a missing #include <api/BamReader.h>, so install bamtools:

conda install -c bioconda bamtools
conda list bamtools

Set it in common.mk:

INCLUDE_PATH_BAMTOOLS := -I/data/wangxingbin/SoftWare/anaconda3/envs/braker3/include/bamtools
LIBRARY_PATH_BAMTOOLS := -L/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib

2.5.1.5. Installing htslib:

conda install -c bioconda htslib
conda list htslib

Set it in common.mk:

INCLUDE_PATH_HTSLIB      := -I/data/wangxingbin/SoftWare/anaconda3/envs/braker3/include/htslib
LIBRARY_PATH_HTSLIB      := -L/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib

2.5.1.6. Installing SeqLib:

conda install -c bioconda seqlib
conda list seqlib

Set it in common.mk:

INCLUDE_PATH_SEQLIB := -I/data/wangxingbin/SoftWare/anaconda3/envs/braker3/include/seqlib
LIBRARY_PATH_SEQLIB := -L/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib

2.5.1.7. Installing GSL:

conda install -c conda-forge gsl
conda list gsl

Set it in common.mk:

INCLUDE_PATH_GSL := -I/data/wangxingbin/SoftWare/anaconda3/envs/braker3/include/gsl
LIBRARY_PATH_GSL:= -L/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib

2.5.1.8. Installing Boost:

conda install -c conda-forge boost
conda list boost

Set it in common.mk:

INCLUDE_PATH_BOOST := -I/data/wangxingbin/SoftWare/anaconda3/envs/braker3/include/boost
LIBRARY_PATH_BOOST:= -L/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib

2.5.1.9. Installing zlib:

conda install -c conda-forge zlib
conda list zlib

Set it in common.mk:

INCLUDE_PATH_ZLIB := -I/data/wangxingbin/SoftWare/anaconda3/envs/braker3/include/zlib
LIBRARY_PATH_ZLIB:= -L/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/SoftWare/anaconda3/envs/braker3/lib

Once everything is installed and configured, run make again:

make auxprogs   # build the auxiliary programs

2.5.1.10. Adding environment variables:

The final common.mk:

INCLUDE_PATH_ZLIB := -I/data/wangxingbin/anaconda3/envs/braker3/include/zlib
LIBRARY_PATH_ZLIB := -L/data/wangxingbin/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/anaconda3/envs/braker3/lib

INCLUDE_PATH_BOOST := -I/data/wangxingbin/anaconda3/envs/braker3/include/boost
LIBRARY_PATH_BOOST := -L/data/wangxingbin/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/anaconda3/envs/braker3/lib

INCLUDE_PATH_LPSOLVE := -I/data/wangxingbin/anaconda3/envs/braker3/include/lpsolve
LIBRARY_PATH_LPSOLVE := -L/data/wangxingbin/anaconda3/envs/braker3/lib -llpsolve55 -Wl,-rpath,/data/wangxingbin/anaconda3/envs/braker3/lib

INCLUDE_PATH_SUITESPARSE := -I/data/wangxingbin/anaconda3/envs/braker3/include
LIBRARY_PATH_SUITESPARSE := -L/data/wangxingbin/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/anaconda3/envs/braker3/lib

INCLUDE_PATH_GSL := -I/data/wangxingbin/anaconda3/envs/braker3/include/gsl
LIBRARY_PATH_GSL := -L/data/wangxingbin/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/anaconda3/envs/braker3/lib

# MySQL paths
INCLUDE_PATH_MYSQL := -I/data/wangxingbin/Software/mysql/usr/include -I/data/wangxingbin/Software/mysql/usr/include/mysql
LIBRARY_PATH_MYSQL := -L/data/wangxingbin/Software/mysql/usr/lib/x86_64-linux-gnu -Wl,-rpath,/data/wangxingbin/Software/mysql/usr/lib/x86_64-linux-gnu

INCLUDE_PATH_SQLITE      := -I/data/wangxingbin/anaconda3/envs/braker3/include/sqlite
LIBRARY_PATH_SQLITE      := -L/data/wangxingbin/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/anaconda3/envs/braker3/lib

# BamTools and HTSlib library paths
INCLUDE_PATH_BAMTOOLS := -I/data/wangxingbin/anaconda3/envs/braker3/include/bamtools
LIBRARY_PATH_BAMTOOLS := -L/data/wangxingbin/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/anaconda3/envs/braker3/lib

INCLUDE_PATH_HTSLIB := -I/data/wangxingbin/anaconda3/envs/braker3/include/htslib
LIBRARY_PATH_HTSLIB := -L/data/wangxingbin/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/anaconda3/envs/braker3/lib

# SeqLib library paths
INCLUDE_PATH_SEQLIB := -I/data/wangxingbin/anaconda3/envs/braker3/include/seqlib
LIBRARY_PATH_SEQLIB := -L/data/wangxingbin/anaconda3/envs/braker3/lib -Wl,-rpath,/data/wangxingbin/anaconda3/envs/braker3/lib
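Every conda-provided entry above follows the same pattern, so you can generate these lines from the environment prefix instead of typing them by hand. This sketch assumes all of these libraries were installed into that one prefix, with the include subdirectories used above. MySQL is left out because it was installed outside the environment here. print_augustus_mk is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the conda-based INCLUDE_PATH_*/LIBRARY_PATH_*
# lines of AUGUSTUS's common.mk for a given environment prefix.
print_augustus_mk() {
    local p="$1"
    local entry name subdir
    # NAME:include-subdirectory pairs; an empty subdirectory means $p/include itself
    for entry in ZLIB:zlib BOOST:boost LPSOLVE:lpsolve SUITESPARSE: GSL:gsl \
                 SQLITE:sqlite BAMTOOLS:bamtools HTSLIB:htslib SEQLIB:seqlib; do
        name="${entry%%:*}"
        subdir="${entry#*:}"
        echo "INCLUDE_PATH_$name := -I$p/include${subdir:+/$subdir}"
        if [ "$name" = LPSOLVE ]; then
            # lp_solve additionally needs the library itself on the link line
            echo "LIBRARY_PATH_$name := -L$p/lib -llpsolve55 -Wl,-rpath,$p/lib"
        else
            echo "LIBRARY_PATH_$name := -L$p/lib -Wl,-rpath,$p/lib"
        fi
    done
}
```

For example, print_augustus_mk "$CONDA_PREFIX" prints 18 lines that can be pasted into common.mk in place of the hand-written ones.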

Then add AUGUSTUS to PATH and set AUGUSTUS_CONFIG_PATH in ~/.bashrc:

echo 'export PATH=~/SoftWare/tools/Augustus/bin:/data/wangxingbin/SoftWare/tools/Augustus/scripts:$PATH' >> ~/.bashrc
echo 'export AUGUSTUS_CONFIG_PATH=/data/wangxingbin/SoftWare/tools/Augustus/config' >> ~/.bashrc
echo 'export PATH=/data/wangxingbin/SoftWare/tools/Augustus/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

2.5.2. Installing DIAMOND

conda install -c bioconda diamond

2.5.3. Installing TSEBRA

git clone https://github.com/Gaius-Augustus/TSEBRA.git
cd TSEBRA
pwd

echo 'export TSEBRA_PATH=/data/wangxingbin/SoftWare/tools/TSEBRA/bin' >> ~/.bashrc

Install compleasm:

wget https://github.com/huangnengCSU/compleasm/releases/download/v0.2.4/compleasm-0.2.4_x64-linux.tar.bz2
tar -xvjf compleasm-0.2.4_x64-linux.tar.bz2

echo 'export PATH=$PATH:/data/wangxingbin/SoftWare/tools/compleasm_kit' >> ~/.bashrc
source ~/.bashrc

compleasm depends on pandas:

pip install pandas

Check again whether any required software is still missing:

conda activate braker3
braker.pl --checkSoftware

2.5.4. Installing GeneMark-ETP:

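BRAKER3 needs GeneMark-ETP to predict genes from RNA-seq and protein data. GeneMark-ETP will not run until you request a license key file on the GeneMark website, download it, and place it in your home directory on the cluster.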

The installation steps are the same as in section 1.3.

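Once the request is approved, you receive a key file named gm_key_64.gz. Decompress it, rename it to .gm_key (note the leading dot), and upload it to your home directory on the cluster.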
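gunzip -c decompresses and renames the key in a single step. The hypothetical bash function below does this, assuming the key arrives gzip-compressed as gm_key_64.gz. The chmod 600 is an optional precaution, not a GeneMark requirement.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decompress the GeneMark key to ~/.gm_key.
# $1 = downloaded key (default gm_key_64.gz), $2 = destination (default ~/.gm_key)
install_gm_key() {
    local src="${1:-gm_key_64.gz}"
    local dest="${2:-$HOME/.gm_key}"
    if [ ! -f "$src" ]; then
        echo "key file not found: $src" >&2
        return 1
    fi
    # -c writes to stdout, so the original .gz file is kept
    gunzip -c "$src" > "$dest" || return 1
    chmod 600 "$dest"   # optional: keep the personal key private
    echo "installed: $dest"
}
```

Run install_gm_key /path/to/gm_key_64.gz on the cluster itself. With no second argument, the key is written to ~/.gm_key.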

2.6. Testing:

cd /data/wangxingbin/Software/braker-3.0.8/example
wget http://topaz.gatech.edu/GeneMark/Braker/RNAseq.bam
cd /data/wangxingbin/Software/braker-3.0.8/example/tests
./test1.sh
./test2.sh
./test3.sh

After running the three scripts above, check test1.log, test2.log and test3.log for missing dependencies and errors.
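grep gives you a quick first pass over the logs. This is only a heuristic sketch, and you should still read the logs themselves. It assumes that problems show up as lines containing error, not found or missing (case-insensitive) and that the logs are in the tests directory. scan_test_logs is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print suspicious lines from test1.log..test3.log.
# Returns 0 only if all three logs exist and none of them matched.
scan_test_logs() {
    local dir="${1:-.}" log found=0
    for log in "$dir"/test1.log "$dir"/test2.log "$dir"/test3.log; do
        if [ ! -f "$log" ]; then
            echo "$log: log file not found"
            found=1
            continue
        fi
        # -i ignore case, -n line numbers, -H file name prefix, -E extended regex
        if grep -inHE 'error|not found|missing' "$log"; then
            found=1
        fi
    done
    return "$found"
}
```

Run scan_test_logs . from the tests directory. It prints matching lines with the file name and line number, and returns non-zero if anything matched or a log is absent.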

3. Summary

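The installation is now complete. Software that could be installed with conda above lives in the braker3 environment; everything else had to be configured manually.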

To run BRAKER, activate the braker3 environment and run it with the options you need:

conda activate braker3

See the official documentation for usage. For example, with RNA-seq and protein data:

braker.pl --genome=genome.fa --prot_seq=orthodb.fa \
    --rnaseq_sets_ids=SRA_ID1,SRA_ID2 \
    --rnaseq_sets_dirs=/path/to/local/RNA-Seq/files/