@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ SoS author is Sebastien MORETTI                               @
@ moretti.sebastien [AT] gmail.com                              @
@ Lab. Information Genomique et Structurale - IGS               @
@ CNRS - Life Sciences                                          @
@ Marseille, France                                             @
@ http://www.igs.cnrs-mrs.fr/                                   @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


SoS INSTALLATION:


REQUIREMENTS

  * You must have Perl 5.6.1 or later to use 'SoS.pl'.

  * Before running SoS, you must install these Perl modules:
        Getopt::Long
        DBI
        File::Which     (or use the full path of exonerate)
    and all their dependencies, if needed.
        strict
        warnings
        diagnostics
        lib
        File::Copy
    These ones should already be part of your Perl distribution.

    Modules can be found at http://search.cpan.org/, or from a shell try
    'perl -MCPAN -e shell' then 'install Module::Name'.
    You need root privileges to install these modules system-wide.
    ==> The command 'perl -MModule::Name -e 1' is an easy way to check
        whether a module is installed: it prints nothing if the module
        loads, and an error otherwise.

  * The Exonerate binary must be available in your PATH, or the
        my $exonerate_bin=which('exonerate');
    line must be changed (L8 and L22) to reflect your installation.
    Exonerate is licensed under the LGPL.
    -> We recommend Exonerate version 1.0.0 (May, 2005).
    You can download source code or binaries at:
        http://www.ebi.ac.uk/~guy/exonerate/

  * Chromosome files must be downloaded from the EnsEMBL FTP server.
    The wget_chr_seq.sh script allows you to quickly get these files
    and to give them the right names.
    -> We recommend the current EnsEMBL release.

  * PyMOL is required for molecular visualization of result files.
    -> We recommend PyMOL version 0.99.
    PyMOL is open-source and available for most OS at:
        http://pymol.sourceforge.net/


OPTIONAL

  To improve execution time, you can mirror the PDB structure files
  and/or the EnsEMBL databases (4 databases per species) locally:

  * Local PDB database mirror
    All structure files must be in, or linked from, a single directory,
    and must be 'Unix compressed' with lowercase names:
        e.g.: pdb1hcl.ent.Z     (cf. RCSB FTP server).
    You need about 6.5 GB to mirror PDB.
    To see how to set up and maintain a PDB FTP mirror, please refer to:
        http://www.rcsb.org/pdb/ftpproc.final.html

  * Local EnsEMBL server
    SoS sends requests to 4 EnsEMBL databases per species.
    You need them to set up a local EnsEMBL server for SoS.
    Example for human species db and EnsEMBL release 40:
      - snp_mart_40.sql                                    (db core)
            hsapiens_snp__snp__main.txt.table              (db contents)
      - ensembl_mart_40.sql
            hsapiens_gene_ensembl__gene__main.txt.table
            hsapiens_gene_ensembl__snp__dm.txt.table
            hsapiens_gene_ensembl__xref_pdb__dm.txt.table

    We recommend a MySQL 4.1 server, or better, to be able to set up
    local EnsEMBL databases.
    See http://www.MySQL.com/ and http://dev.MySQL.com/downloads/mysql/.
    You need about 5 GB for human databases.

    You should use EnsEMBL dumps to create the databases and tables,
    then import the table contents. They are available, as gzip
    archives, at:
        ftp://ftp.ensembl.org/pub/current_mart/data/mysql/
    See http://dev.MySQL.com/doc/mysql/en/ and the 'mysqldump' chapter
    to learn how to load the database SQL files back into your server,
    and the 'LOAD DATA INFILE' or 'mysqlimport' chapters to import the
    database contents.


MODIFICATIONS PRIOR TO USING SoS.pl

  0. Exonerate and PyMOL must be installed.
     The Perl modules required by SoS must be installed too.

  1. Uncompress and untar the SoS archive:
        gunzip -c SoS_[release].tar.gz | tar xvf -
     or
        zcat SoS_[release].tar.gz | tar xvf -

  2. Change the current directory to the SoS directory:
        cd SoS_[release]/
        ls
            history.txt
            INSTALL
            loci_from_Exonerate.pm
            locus_at_ensembl.pm
            pdb2fas.pm
            SoS.pl
            species.pm
            wget_chr_seq.sh

  3. Add execution privileges to SoS.pl and wget_chr_seq.sh:
        chmod u+x SoS.pl wget_chr_seq.sh

  4. Download the chromosomal sequences for the species you want to query:
        ./wget_chr_seq.sh Species_name
     It will download the chromosomal sequences into the current directory.

  5. Edit SoS.pl to match your local settings:
     - L1  '#!/usr/bin/env perl' should work on most Linux systems.
           If not, change it to the path of your perl program.
     - If you don't want to use the File::Which Perl module
       (not recommended), comment out L8 and put the full path of
       exonerate in L22.
     - L11 MUST reflect the SoS directory location, so that the SoS
           modules can be found.
               use lib '/your/path/to/SoS_x.x.x';
     - L21 can be changed if you want results and temporary files to go
           in a directory other than the current one.
               my $cache="./";
     - L24 MUST reflect the directory where the chromosomal sequences are.
               my $db='/my/banks/';
     - L28 will be updated by us to follow changes in the EnsEMBL
           databases.
               my $ensembl_release=40;
           (Only the release number needs to change, as long as the
           database structure is kept between releases.)

OPTIONAL

  6. - If you have a local PDB mirror, change L26 to reflect your local
       installation.
               my $local_pdb='/banks/_Structures/PDB/all/pdb/';
     - If you want to use local EnsEMBL databases, change L29 and L30
       to reflect your local settings.
               my $host='ensembldb.ensembl.org';
               my $user='anonymous';


RUNNING SoS

  ./SoS.pl                            will show basic help
  perldoc SoS.pl                      will show full documentation
  ./SoS.pl --pdb=1xxx 2>/dev/null     will discard the verbose screen
                                      output (sent to standard error)
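

CHECKING THE REQUIREMENTS (EXAMPLE)

  The requirements listed in this file can be checked in one pass before
  running SoS. The script below is only a sketch and is not part of the
  SoS distribution: the function names check_perl_module and
  check_program are hypothetical helpers introduced here. It relies on
  the 'perl -MModule::Name -e 1' test mentioned under REQUIREMENTS and
  on looking up 'exonerate' in your PATH. It only reports what it finds
  and changes nothing.

```shell
#!/bin/sh
# Pre-flight check for the SoS requirements (illustrative sketch only,
# not shipped with SoS; helper names are hypothetical).

# Return 0 if the given Perl module can be loaded, non-zero otherwise.
# Uses the 'perl -MModule::Name -e 1' test from the REQUIREMENTS section.
check_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Return 0 if the given program is found in PATH, non-zero otherwise.
check_program() {
    command -v "$1" >/dev/null 2>&1
}

# Perl modules named in the REQUIREMENTS section.
for module in Getopt::Long DBI File::Which File::Copy; do
    if check_perl_module "$module"; then
        printf '%-8s Perl module %s\n' "ok" "$module"
    else
        printf '%-8s Perl module %s\n' "MISSING" "$module"
    fi
done

# SoS locates Exonerate through PATH unless SoS.pl is edited (L8 and L22).
if check_program exonerate; then
    printf '%-8s exonerate found in PATH\n' "ok"
else
    printf '%-8s exonerate not in PATH (set its full path in SoS.pl)\n' "MISSING"
fi
```

  Save it under any name you like, run it with 'sh', and install
  whatever is reported as MISSING before going on to the
  MODIFICATIONS section.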