For the BGC website, see: Bayesian Genomic Clines
BGC takes the genotype data for the parental and admixed populations as counts of alleles per locus, with a separate file for each population. First convert your data to STRUCTURE format and read it into R as an adegenet genind object. Then convert the genind object to a genpop object to obtain allele counts per population.
mygenind <- import2genind('input.str', onerowperind=FALSE, n.ind=10, n.loc=20, row.marknames=1, col.lab=1, col.pop=2, ask=FALSE)
mygenpop <- genind2genpop(mygenind)
mygenpop@tab contains a table of allele counts, with populations in rows and alleles in columns. Now transpose this table so that alleles are in rows, save it, then read it back (the round trip is probably not necessary, but I hate dealing with atomic vector errors).
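# t() transposes the table; col.names=NA writes an empty header cell above the allele names
write.table(t(mygenpop@tab), 'AllCountTable.txt', quote=FALSE, sep='\t', col.names=NA)
ct <- read.table('AllCountTable.txt', header=TRUE, sep='\t')
names(ct)[1] <- 'Allele'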
Allele pop1 pop2
1 L0001.1 7 3
2 L0001.2 3 1
3 L0002.1 10 4
4 L0002.2 0 0
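To see the shape this table should have without touching any files, here is a minimal sketch on a made-up count matrix. The matrix `tab` is only a stand-in for `mygenpop@tab` (populations in rows, alleles in columns), and `ct_toy` is a hypothetical name. The values are invented to match the printout above.

```r
# Toy stand-in for mygenpop@tab: populations in rows, alleles in columns.
# The counts are made up for illustration.
tab <- matrix(c(7, 3, 10, 0,
                3, 1,  4, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("pop1", "pop2"),
                              c("L0001.1", "L0001.2", "L0002.1", "L0002.2")))

# Transpose so that alleles are rows and populations are columns.
ct_toy <- as.data.frame(t(tab))

# Move the allele names from the row names into a regular column.
ct_toy <- cbind(Allele = rownames(ct_toy), ct_toy)
rownames(ct_toy) <- NULL

print(ct_toy)
```

This skips the write/read round trip entirely, which may be enough if the intermediate text file is not needed for anything else.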
We would like to retain the locus names from the first column (ct$Allele), which R may have read as a factor rather than a character vector. Let’s get that out of the way first.
ctall <- as.character(ct$Allele)
Split the locus names from the composite loc+allele names.
ctall_split <- sapply(strsplit(ctall, '.', fixed=TRUE), '[[', 1)
Each locus name now appears twice, once per allele, so we need to get rid of the duplicate entries. We are interested in only every other entry.
locnames <- ctall_split[c(TRUE, FALSE)]
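Here is a minimal sketch of the name-splitting step on toy allele names. It uses `vapply`, which always returns a plain character vector, and cross-checks the every-other-entry selection against `unique()`. The name `loc_per_row` is hypothetical, and the sketch assumes locus names contain no '.' of their own.

```r
# Composite locus+allele names as they appear in the count table (toy values).
ctall <- c("L0001.1", "L0001.2", "L0002.1", "L0002.2")

# Keep the part before the first '.'; the result is a character vector.
loc_per_row <- vapply(strsplit(ctall, ".", fixed = TRUE), `[[`, character(1), 1)

# Every other entry gives one name per locus (assumes two alleles per locus).
locnames <- loc_per_row[c(TRUE, FALSE)]

# Cross-check: dropping duplicates should give exactly the same names.
stopifnot(identical(locnames, unique(loc_per_row)))

print(locnames)
```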
Now select only the odd rows (containing the allele 1 count at each locus in each pop), then the even rows. This assumes that every locus has exactly two alleles, i.e. that you do not have any monomorphic sites in your data. If you do, this and the previous step will give errors or mismatched counts. Make sure you have exactly twice as many rows as you have loci. If you find an odd number of rows, that is a clear indication that at least one locus has only one allele in your data. You will need to recreate the STRUCTURE file without such loci before proceeding again.
ct_odd <- ct[seq(1, length(ct$Allele), 2), ]
ct_evn <- ct[seq(2, length(ct$Allele), 2), ]
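Counting rows only catches some problems, so it may help to count alleles per locus directly. The sketch below does this on toy allele names in which one locus is deliberately monomorphic. The names `n_alleles` and `bad` are hypothetical.

```r
# Toy allele names: L0002 has only one allele and should be flagged.
ctall <- c("L0001.1", "L0001.2", "L0002.1", "L0003.1", "L0003.2")

# Strip the allele suffix (everything from the last '.' onwards).
loc_per_row <- sub("\\.[^.]*$", "", ctall)

# Count how many allele rows each locus contributes.
n_alleles <- table(loc_per_row)

# Loci that do not have exactly two alleles break the odd/even row logic.
bad <- names(n_alleles)[n_alleles != 2]

if (length(bad) > 0) {
  message("Loci without exactly two alleles: ", paste(bad, collapse = ", "))
}
```

If `bad` is empty, the odd and even rows line up with allele 1 and allele 2 of every locus.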
At this point, we have all the data we need, i.e. locus names and allele counts in each population. I will demonstrate putting all this information together for one population.
1. Create a vector of length equal to the number of loci and fill it with any symbol.
filler <- rep('#', length(locnames))
2. Create a new data frame by cbinding all components together.
df <- data.frame(locnames, filler, ct_odd$pop1, ct_evn$pop1)
3. Check the resulting data frame.
  locnames filler pop1 pop1
1    loc01      #    7    3
2    loc02      #    8    7
3    loc03      #    3    5
4    loc04      #    4    2
5    loc05      #    2    6
4. Save this table
write.table(df, 'pop1_allct_bgc.txt', row.names=F, col.names=F, quote=F, sep='\t')
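5. Finally, open the saved table in vi and replace each filler symbol with a line break, so that the locus name and its allele counts end up on separate lines. You can enter the line break as ^M by first pressing Ctrl-V, then quickly hitting Enter.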
That’s it. Check the file to make sure everything looks ok.
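If you would rather skip the editor step, the same layout can probably be written straight from R. This is a minimal sketch, assuming the target layout is one line with the locus name followed by one line with the two allele counts, which is what replacing the filler with a line break produces. The space between the two counts, the toy values and the temporary output path are my assumptions, so check the result against the BGC manual.

```r
# Toy inputs standing in for locnames, ct_odd$pop1 and ct_evn$pop1.
locnames <- c("loc01", "loc02", "loc03")
allele1  <- c(7, 8, 3)
allele2  <- c(3, 7, 5)

# rbind() stacks names over counts; as.vector() reads the matrix column by
# column, which interleaves them: name, counts, name, counts, ...
bgc_lines <- as.vector(rbind(locnames, paste(allele1, allele2)))

# Hypothetical output location; replace with your own file name.
outfile <- file.path(tempdir(), "pop1_allct_bgc.txt")
writeLines(bgc_lines, outfile)

# Show what was written.
cat(readLines(outfile), sep = "\n")
```

No filler column is needed in this version, because the line breaks come from `writeLines` itself.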
At this point, you should be done coding the parental population files. Remember, only two parental populations are allowed. Thus, if you have data from two species with multiple pops each, you will need to collapse individual pops into a composite population for each species.
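One way to collapse populations is to sum their allele counts within each species. Here is a minimal sketch on a toy count table, where the population names, the `species` mapping and the resulting `ct_sp` table are all made up for illustration.

```r
# Toy count table: two populations per species (made-up counts).
ct <- data.frame(Allele = c("L0001.1", "L0001.2", "L0002.1", "L0002.2"),
                 popA1 = c(7, 3, 10, 0), popA2 = c(5, 5, 8, 2),
                 popB1 = c(3, 1, 4, 0),  popB2 = c(2, 6, 1, 9))

# Hypothetical mapping from population column to species.
species <- c(popA1 = "spA", popA2 = "spA", popB1 = "spB", popB2 = "spB")

# rowsum() adds up rows that share a group label, so transpose first
# to put populations in rows, then transpose back.
counts    <- as.matrix(ct[, names(species)])
composite <- t(rowsum(t(counts), group = unname(species)))

# One column of pooled allele counts per species.
ct_sp <- data.frame(Allele = ct$Allele, composite)
print(ct_sp)
```

The composite columns can then be used in place of `pop1` and `pop2` in the steps above.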
In the next blog post, I will summarize creating allele count data files for admixed populations, the format for which is somewhat different. Stay tuned.