.. _developer-guidelines:

Interpretation for developers
=============================

.. contents:: Contents

Here we include some guidelines and an interpretation of intermediate results for developing the ``BacterialTyper`` project.

.. _ARIBA-explained:

ARIBA results description
-------------------------

ARIBA generates many output files that are common to all databases. They are listed here as an example:

* assembled_seqs.fa.gz: reference sequences identified
* assemblies.fa.gz: query sequences retrieved from sample
* assembled_genes.fa.gz: encoding genes from assemblies.fa
* debug.report.tsv: initial report.tsv before filtering
* log.clusters.gz: log details for each cluster
* report.tsv: report for each sample
* version_info.txt: version and additional information

File :file:`devel/info/ARIBA_explained.csv` describes the columns in the result file generated by ARIBA.

.. csv-table::
   :widths: 4,25
   :header: "Column", "Description"
   :file: info/ARIBA_explained.csv

.. _Prokka-output-files:

Prokka output files description
-------------------------------

File :file:`devel/info/prokka_output_files.csv` describes the different output files generated by Prokka_.

See additional details in: https://github.com/tseemann/prokka#output-files

.. csv-table::
   :widths: 4,25
   :header: "Extension", "Description"
   :file: info/prokka_output_files.csv

.. _snippy-output-files:

Snippy output files description
-------------------------------

File :file:`devel/info/snippy_output_files.csv` describes the different output files generated by Snippy_.

See additional details in: https://github.com/tseemann/snippy#output-files

.. csv-table::
   :widths: 4,25
   :header: "Extension", "Description"
   :file: info/snippy_output_files.csv

.. _PhiSpy-documentation:

PhiSpy
------

PhiSpy_ identifies prophages in bacterial (and probably archaeal) genomes. Given an annotated genome, it uses several approaches to identify the most likely prophage regions.


.. _PhiSpy-training-sets:

PhiSpy training Sets available
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File :file:`devel/info/PhiSpy_training-sets.txt` describes the different training sets available for bacteriophage analysis using PhiSpy_.

.. csv-table::
   :widths: 10,15,15,15,15,20
   :header-rows: 1
   :file: info/PhiSpy_training-sets.txt

.. _PhiSpy-results:

PhiSpy results
^^^^^^^^^^^^^^

Results generated by PhiSpy are text files containing the annotation and information regarding the identified inserted bacteriophages. The original output has some limitations, so we implemented some improvements to make the results clearer and easier to interpret.

Original results
""""""""""""""""

See original details in: https://github.com/linsalrob/PhiSpy#output-files

There are several files generated:

* prophage.tbl: This file has two columns separated by tabs [id, location]. The id is in the format pp_number, where number is a sequential number of the prophage (starting at 1). Location is in the format contig_start_stop and encompasses the prophage.

* prophage_tbl.tsv: This is a tab separated file. The file contains all the genes of the genome. The tenth column represents the status of a gene. If this column is 1 then the gene is a phage like gene; otherwise it is a bacterial gene. This file has 16 columns:

  * (i) fig_no: the id of each gene;
  * (ii) function: function of the gene;
  * (iii) contig;
  * (iv) start: start location of the gene;
  * (v) stop: end location of the gene;
  * (vi) position: a sequential number of the gene (starting at 1);
  * (vii) rank: rank of each gene provided by random forest;
  * (viii) my_status: status of each gene based on random forest;
  * (ix) pp: classification of each gene based on their function;
  * (x) Final_status: the status of each gene. For prophages, this column has the number of the prophage as listed in prophage.tbl above; if the column contains a 0 we believe that it is a bacterial gene.
  * (xi) start of attL;
  * (xii) end of attL;
  * (xiii) start of attR;
  * (xiv) end of attR;
  * (xv) sequence of attL;
  * (xvi) sequence of attR.

* prophage_coordinates.tsv: This file has the prophage ID, contig, start, stop, and potential att sites identified for the phages.

* prophage.gff3: General Feature Format file (v3) containing the annotation of the phages identified. This is a contribution that the ``BacterialTyper`` developer (Jose F. Sanchez-Herrero) submitted to the original PhiSpy code:

  * https://github.com/linsalrob/PhiSpy/PhiSpyModules/writers.py
  * https://github.com/linsalrob/PhiSpy/pull/10

* testSet.txt: Results of the Shannon score generated during the makeTest module of PhiSpy and required by the subsequent random forest classifier.

* classify.tsv: Results of the random forest classifier call within the classification module of PhiSpy.

Modified results
""""""""""""""""

All original files are named `prophage` or `classify`, independently of the sample name. Also, some files are not necessary for a regular user to interpret the results and obtain the number of prophage regions and their details.

Within ``BacterialTyper``, we rename the original PhiSpy result files according to the sample names provided. As some tab files do not contain headers, we generate both tab files with headers and a summary Excel file for a better interpretation and integration of results.

File conversion:

- prophage_tbl.tsv:

  Rename it to 'SampleName'_PhiSpy-classification_genes.tsv

  Include it in a summary Excel file.

- prophage.gff3:

  Rename it to 'SampleName'_PhiSpy-prophage.gff3

- prophage_coordinates.tsv:

  Rename it to 'SampleName'_PhiSpy-prophage-coordinates.tsv

  Add a header containing the following columns:

  * prophage_ID
  * Contig
  * Start
  * End
  * attL_Start
  * attL_End
  * attR_Start
  * attR_End
  * attL_Seq
  * attR_Seq
  * Longest_Repeat_flanking_phage

  Include it in a summary Excel file.

- prophage.tbl:

  Move it to a temporary folder generated.
  It contains redundant information.

- classify.tsv:

  Move it to a temporary folder generated.

- testSet.txt:

  Move it to a temporary folder generated.

- Additional Excel file: 'SampleName'_bacteriophage_summary.xlsx

.. include:: ../links.inc
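
The file conversion described under "Modified results" can be sketched as follows. This is a minimal illustration using only the Python standard library, not the actual ``BacterialTyper`` implementation: the function name ``convert_phispy_outputs`` and its parameters are hypothetical, and the summary Excel file is left out because it would need an additional library.

```python
import shutil
from pathlib import Path

# Header columns listed in this guide for prophage_coordinates.tsv.
COORDINATE_COLUMNS = [
    "prophage_ID", "Contig", "Start", "End",
    "attL_Start", "attL_End", "attR_Start", "attR_End",
    "attL_Seq", "attR_Seq", "Longest_Repeat_flanking_phage",
]

# Files renamed to '<SampleName>_<suffix>' (suffixes taken from this guide).
RENAMED = {
    "prophage_tbl.tsv": "PhiSpy-classification_genes.tsv",
    "prophage.gff3": "PhiSpy-prophage.gff3",
}

# Files set aside because a regular user does not need them.
MOVED_TO_TMP = ["prophage.tbl", "classify.tsv", "testSet.txt"]


def convert_phispy_outputs(result_dir, sample_name, tmp_dir):
    """Hypothetical helper: rename, add a header to, or set aside PhiSpy files."""
    result_dir, tmp_dir = Path(result_dir), Path(tmp_dir)
    tmp_dir.mkdir(parents=True, exist_ok=True)

    # Plain renames: prefix the sample name, keep the content untouched.
    for original, suffix in RENAMED.items():
        source = result_dir / original
        if source.exists():
            source.rename(result_dir / f"{sample_name}_{suffix}")

    # The coordinates file has no header: write a new file with one, drop the old.
    coordinates = result_dir / "prophage_coordinates.tsv"
    if coordinates.exists():
        target = result_dir / f"{sample_name}_PhiSpy-prophage-coordinates.tsv"
        header = "\t".join(COORDINATE_COLUMNS) + "\n"
        target.write_text(header + coordinates.read_text())
        coordinates.unlink()

    # Intermediate or redundant files go to the temporary folder.
    for name in MOVED_TO_TMP:
        source = result_dir / name
        if source.exists():
            shutil.move(str(source), str(tmp_dir / name))
```

Calling ``convert_phispy_outputs("phispy_out", "SampleName", "phispy_out/tmp")`` on a folder of original PhiSpy results would leave only the renamed files in place. Missing files are skipped silently in this sketch; a real implementation may prefer to report them.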