Readme for: *Data files associated with "Plasmid fitness costs are caused by specific genetic conflicts" by Hall, Wright, Harrison, Muddiman, Wood, Paterson, and Brockhurst*
================

April 2021, updated July 2021, updated September 2021

Full analysis scripts associated with this raw data can be found at [github.com/jpjh/COMPMUT](https://github.com/jpjh/COMPMUT).

[Now published in PLoS Biology](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001225):

Hall, J. P. J., Wright, R. C. T., Harrison, E., Muddiman, K. J., Wood, A. J., Paterson, S., & Brockhurst, M. A. (2021). Plasmid fitness costs are caused by specific genetic conflicts enabling resolution by compensatory mutation. *PLoS Biology*, *19*(10), e3001225. https://doi.org/10.1371/journal.pbio.3001225

### Experimental measurements of relative fitness, plasmid cost, etc.

- [`COMPMUT_exp_data_1.csv`](COMPMUT_exp_data_1.csv). Competition data from knockout strains. Columns indicate experimental replicate (`experiment`), transconjugant replicate (`replicate`), host genotype (`host`), plasmid variant (`plasmid`), whether measurement was from the start or the end of the competition (`timepoint`), dilution factor (`dilution`), volume spread in µl (`spread`) and the counts of white and blue colonies on that plate (`count_white`, `count_blue`).
- [`COMPMUT_exp_data_2.csv`](COMPMUT_exp_data_2.csv). Competition data from plasmid mutants. Columns are the same as for `COMPMUT_exp_data_1.csv`, except `marker` indicates the antibiotic marker used in the plasmid-bearing strain. Gm = gentamicin-resistant, Sm = streptomycin-resistant.
- [`COMPMUT_exp_data_3.csv`](COMPMUT_exp_data_3.csv). Competition data from knockouts and plasmid mutants. Columns are the same as `COMPMUT_exp_data_1.csv`.
- [`COMPMUT_exp_data_3-1.csv`](COMPMUT_exp_data_3-1.csv). Conjugation data from knockouts and plasmid mutants. Columns are the same as for `COMPMUT_exp_data_1.csv`, except `recipient` refers to the species of the recipient (Pf = *P. fluorescens*, Pp = *P. putida*), and `timepoint` indicates not only whether the plate contained start or end counts, but also what it was selecting for (DR = donors and recipients, T = transconjugants).
- [`COMPMUT_exp_data_4.csv`](COMPMUT_exp_data_4.csv). Competition data from expressing PQBR57_0059 variants *in trans*. Columns are the same as `COMPMUT_exp_data_1.csv`, except `exp_rep` refers to experimental replicate and `tc_rep` refers to transconjugant replicate. `plasmid_pucp` refers to the PQBR57_0059 allele on the pUCP18 variant, whereas `plasmid_pq` refers to the pQBR57 variant.
- [`COMPMUT_exp_data_5.csv`](COMPMUT_exp_data_5.csv). Competition data from expressing pQBR57 *par* genes *in trans* alongside pQBR57. Columns are the same as `COMPMUT_exp_data_4.csv`. [`COMPMUT_exp_data_5-1.csv`](COMPMUT_exp_data_5-1.csv) is similar but from expressing alongside pQBR103.
- [`COMPMUT_exp_data_6.csv`](COMPMUT_exp_data_6.csv). Count data from medium-term cultures of different plasmid variants in soil. Data was collected by selective plating and replica plating. `replicate` indicates different transconjugants; `variant` is an abbreviated name referring to the population from which the clone came; `marker` indicates the antibiotic marker of the plasmid-bearing strain (Gm = plasmid initially in the gentamicin-resistant background; Sm = plasmid initially in the streptomycin-resistant *lacZ* background); `media` indicates the selective media (KB = no antibiotics, Gm3 = gentamicin at 3 µg/ml, Sm50 = streptomycin at 50 µg/ml); `dilution` indicates the dilution factor; `spread` indicates the volume spread in µl; `count_white` indicates the number of white (gentamicin-resistant) colonies; `count_blue` indicates the number of blue (streptomycin-resistant) colonies; `hg_kept` indicates the number of colonies from the originally-plasmid-free strain that had become mercury resistant; `hg_lost` indicates the number of colonies from the originally-plasmid-bearing strain that had become mercury sensitive.
- [`COMPMUT_exp_data_7.csv`](COMPMUT_exp_data_7.csv). Growth curve data. `exp` refers to the plate in which the experiment was performed; `well` refers to plate well; `cycle` is the measurement cycle; `time` is time in seconds; `t_rep` is transformant replicate; `exp_rep` is experimental replicate; `pME6032` is the insert (if any); `IPTG` is IPTG concentration (0 µM or 100 µM); `OD600_corr` is the corrected OD~600~ (OD~600~ minus the mean blank well measurement).
- [`COMPMUT_exp_data_8.csv`](COMPMUT_exp_data_8.csv). pQBR57 *par* segregation data. `code` is a shorthand for each experimental condition, `timepoint` is transfer number, `replicate` is independent transconjugant/transformant replicate, `par` is whether pUCP18 was empty or whether it carried pQBR57 *par*, `plasmid` is whether it carried pQBR57 or pQBR103, `spread` is volume of dilution spread on plate, `dilution` is the (log~10~) dilution factor, `count` is total colonies counted, `count_hgs` is mercury-sensitive colonies as assessed by replica plating.
- [`COMPMUT_exp_data_9.csv`](COMPMUT_exp_data_9.csv). Growth curve data from ectopic PFLU4242 expression in ∆gacS strains. `well` is the well in the plate, `cycle` is measurement cycle (measurements taken every 15 min), `replicate` is independent transformant/transconjugant replicate, `pME6032` is whether pME6032 was empty or whether it carried PFLU4242, `plasmid` is the pQBR plasmid carried, `IPTG` is whether or not cells were induced with IPTG, `OD600_corr` is the OD~600~ reading for that well, with blank values subtracted.
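
As a rough illustration of how the plate-count columns in `COMPMUT_exp_data_8.csv` relate to one another, the sketch below converts one row into a culture density and a fraction of mercury-sensitive colonies. It is not taken from the analysis scripts. It assumes that `spread` is in µl (as stated for the other count files), that `dilution` is the log~10~ of the fold-dilution, and that `count_hgs` is a subset of `count`. The function names and the example numbers are hypothetical.

```python
def cfu_per_ml(count, log10_dilution, spread_ul):
    """Estimate colony-forming units per ml of undiluted culture.

    Assumes `log10_dilution` is the log10 of the fold-dilution and
    `spread_ul` is the plated volume in microlitres (both assumptions).
    """
    if spread_ul <= 0:
        raise ValueError("spread volume must be positive")
    # Scale the count back up by the dilution, then from the plated
    # volume to 1 ml (1000 microlitres).
    return count * (10 ** log10_dilution) * (1000.0 / spread_ul)


def fraction_sensitive(count, count_hgs):
    """Fraction of counted colonies scored as mercury sensitive.

    Returns None when no colonies were counted, so that empty plates
    are not mistaken for plates with zero sensitive colonies.
    """
    if count == 0:
        return None
    return count_hgs / count


# Made-up example row (not from the data file).
row = {"count": 152, "count_hgs": 19, "dilution": 5, "spread": 50}

density = cfu_per_ml(row["count"], row["dilution"], row["spread"])
lost = fraction_sensitive(row["count"], row["count_hgs"])
print(density, lost)
```

If the `dilution` column in a given file holds the fold-dilution itself rather than its logarithm, the `10 ** log10_dilution` term would need to be replaced by the raw value.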

### RNAseq analysis

- [`COMPMUT_RNAseq_1_de_table_chr_vanc.csv`](COMPMUT_RNAseq_1_de_table_chr_vanc.csv). Differential expression of chromosomal genes relative to the plasmid-free, unameliorated ancestral strain, as determined by edgeR. Data is in 'wide' format, with rows referring to PFLU locus tags, and columns referring to each treatment. For each treatment, there is a column for log~2~ fold-change (`logFC`), log~2~ counts-per-million-per-gene (`logCPM`), test statistic for the quasi-likelihood F test (`F`), associated p-value (`PValue`), and Benjamini-Hochberg adjusted p-value (`FDR`). 
- [`COMPMUT_RNAseq_2_de_table_chr_comp.csv`](COMPMUT_RNAseq_2_de_table_chr_comp.csv). Differential expression of chromosomal genes relative to the plasmid-bearing, unameliorated ancestral strain, as determined by edgeR. Columns are similar to `COMPMUT_RNAseq_1_de_table_chr_vanc.csv`.
- [`COMPMUT_RNAseq_3_de_table_pQ57.csv`](COMPMUT_RNAseq_3_de_table_pQ57.csv). Differential expression of pQBR57 genes, relative to the unameliorated condition. Columns are similar to `COMPMUT_RNAseq_1_de_table_chr_vanc.csv`.
- [`COMPMUT_RNAseq_4_de_table_pQ103.csv`](COMPMUT_RNAseq_4_de_table_pQ103.csv). Differential expression of pQBR103 genes, relative to the unameliorated condition. Columns are similar to `COMPMUT_RNAseq_1_de_table_chr_vanc.csv`.
- [`COMPMUT_RNAseq_5_tpm.csv`](COMPMUT_RNAseq_5_tpm.csv). Table with values of mean transcripts-per-million (TPM) for each gene, averaged over replicates. Column `pc` refers to whether the gene is on the plasmid (`pla`) or chromosome (`chr`); `amelioration` refers to the amelioration; `plasmid` refers to the plasmid; `trt` is a code referring to the combination of plasmid and amelioration; `locus_tag` refers to the locus tag for the corresponding GenBank file; `group` is a combination of the `amelioration` and `plasmid` columns; `tpm` is mean transcripts-per-million across replicates; `n` gives number of replicates; `se` is standard error; `ci` is the 95% confidence interval; and `log2tpm` is a log~2~ transformation of `tpm`.
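
For readers unfamiliar with the `FDR` columns in the differential expression tables, the sketch below shows the Benjamini-Hochberg adjustment in plain Python. It illustrates the general procedure only; it is not the edgeR code that produced the tables, and the function name and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values (not from the data files).
example = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(example))
```

Genes would then typically be called differentially expressed when the adjusted value falls below a chosen threshold; the threshold used in the published analysis is not stated in this readme.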

### Compensatory mutations from previous studies

- [`COMPMUT_mutations.csv`](COMPMUT_mutations.csv). Compensatory mutations aggregated from published studies. Note: data was not available for all columns for all mutations. 

### Predicted RsmA binding sites

- [`COMPMUT_RsmA_prediction.txt`](COMPMUT_RsmA_prediction.txt). Predicted RsmA binding sites in the *P. fluorescens* SBW25 chromosome. Output from running the script `CSRA_TARGET.pl` from [Kulkarni et al. 2014](http://dx.doi.org/10.1093/nar/gku309).