
Manipulating the barcode file

By Kim Vincent

Updated: Feb 14, 2020

The main goal here is to match the barcodes in the index.fastq file with the barcodes for your project. Depending on the run, the barcodes in the index file are either 12 base pairs (bp) or 13 bp long, and the barcodes in the barcode file must be the same length as the barcodes in the index file.
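To find out which case you are in, you can count the lengths of the reads in the index file. Here is a minimal Python sketch using only the standard library. It assumes a standard four-line FASTQ layout, and the function name is illustrative only.

```python
import gzip
from collections import Counter


def index_read_lengths(path, max_reads=10000):
    """Count read lengths among the first `max_reads` reads of a FASTQ file.

    Works with plain or gzip-compressed (.gz) FASTQ files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    lengths = Counter()
    seen = 0
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Each FASTQ record is 4 lines; the sequence is the 2nd line.
            if line_number % 4 == 1:
                lengths[len(line.strip())] += 1
                seen += 1
                if seen >= max_reads:
                    break
    return lengths
```

For example, `print(index_read_lengths("index.fastq.gz"))` (the file name is a placeholder) prints a tally such as `Counter({13: ...})`, which tells you whether the index reads are 12 or 13 bp long.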


There are many ways to manipulate the barcode file; here is one way to process it in Excel so it can be used in the DADA2 pipeline. Remember that the barcode files differ for 16S, 18S, and ITS, so make sure you have the correct file open for each primer set.


  1. Open the full barcode file for the run in Excel.

  2. Move the barcode column to the first column and the "#SampleID" column to the second.

  3. Remove the # from the front of the Sample ID header so it now reads "SampleID", not "#SampleID".

  4. Sort the rows by sample name.

  5. Delete all rows (samples) from projects other than the one of interest.

  6. If the barcodes need to be reverse complemented (see "Reverse complementing" below), do it now, before adding the N.

  7. Add an "N" to the end of each barcode that is only 12 bp long. The barcodes need to be 13 bp long; some come this way from the sequencing center and others don't, so count the length and add an "N" only to the 12-bp barcodes.

  • Insert a column of all N's.

  • Concatenate the barcode and the N into a new column (=CONCATENATE(A2, B2) in Excel, with the barcodes in column A and the N's in column B).

  • Copy the new column and use Paste Special > Values so the cells contain the barcodes rather than the formula.
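If you would rather script these steps than click through Excel, here is a minimal Python sketch of steps 2 to 5 and 7. It assumes a tab-separated, QIIME-style mapping file whose first line is the header, with columns named "#SampleID" and "BarcodeSequence". It also assumes, purely for illustration, that all samples in your project share a SampleID prefix such as "ProjA"; change the filter to match however your project's samples are actually named. The function names are hypothetical and not part of DADA2 or any other tool.

```python
import csv


def pad_barcode(barcode, target_len=13):
    """Append an N to barcodes that are one base short of `target_len`."""
    barcode = barcode.strip().upper()
    if len(barcode) == target_len - 1:
        return barcode + "N"
    return barcode


def prepare_barcodes(in_path, out_path, project_prefix, barcode_col="BarcodeSequence"):
    """Write a two-column (barcode, SampleID) file for one project.

    Assumes a tab-separated mapping file with a "#SampleID" column and a
    `barcode_col` column; `project_prefix` is a hypothetical naming convention.
    """
    with open(in_path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))

    # Steps 4-5: keep only this project's samples, sorted by sample name.
    kept = sorted(
        (r for r in rows if r["#SampleID"].startswith(project_prefix)),
        key=lambda r: r["#SampleID"],
    )

    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        # Steps 2-3: barcode column first, then "SampleID" without the "#".
        writer.writerow([barcode_col, "SampleID"])
        for r in kept:
            # Step 7: pad 12-bp barcodes to 13 bp with a trailing N.
            writer.writerow([pad_barcode(r[barcode_col]), r["#SampleID"]])
```

A call such as `prepare_barcodes("run_mapping.txt", "projA_barcodes.txt", "ProjA")` (placeholder file names) writes the trimmed, sorted, padded file in one go. Padding only barcodes that are exactly one base short mirrors the "count the length" rule above and leaves 13-bp barcodes untouched.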


Reverse complementing (18S and ITS)


Generally, 16S barcodes do not need to be reverse complemented, but ITS and 18S barcodes almost always do. For ITS (and some 18S) runs, the barcodes in the barcode file must be reverse complemented before they are used for demultiplexing.
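Reverse complementing means replacing each base with its pair (A↔T, C↔G) and then reading the result backwards. As an offline alternative to a website, a minimal Python sketch looks like this. It only handles A, C, G, T and N, so any other IUPAC ambiguity codes would have to be added to the table.

```python
# Base-pairing table; N (any base) pairs with N. Lowercase is kept lowercase.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A<->T, C<->G, N<->N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_column(barcodes):
    """Reverse complement every barcode in a list (e.g. one spreadsheet column)."""
    return [reverse_complement(b.strip()) for b in barcodes]
```

For example, `reverse_complement("AACG")` complements to `TTGC` and reverses to `CGTT`. Applying the function twice returns the original sequence, which is a quick sanity check on a converted column.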

  1. Open the full ITS (or 18S) barcode file for the run in Excel.

  2. Move the barcode column to the first column and the "#SampleID" column to the second.

  3. Remove the # from the front of the Sample ID header so it now reads "SampleID", not "#SampleID".

  4. Delete all rows (samples) from projects other than the one of interest.

  5. Delete the last two columns: "LinkerPrimerSequence" and "Description".

  6. REVERSE COMPLEMENT before adding the N to the barcodes (otherwise the N ends up at the start of the barcode instead of the end). This website reverse complements a whole column at a time: http://arep.med.harvard.edu/labgc/adnan/projects/Utilities/revcomp.html Copy the entire column of barcodes, paste it into the window, and click the reverse complement button. The output can be copied and pasted directly into an Excel column.

  7. Now add an "N" to the end of each barcode that is only 12 bp long.
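The order of steps 6 and 7 matters, and a short script makes that easy to see. The Python sketch below applies steps 3 and 5 to 7 to a single row. It assumes QIIME-style column names ("#SampleID", "BarcodeSequence", "LinkerPrimerSequence", "Description"), and the function names are hypothetical, for illustration only.

```python
# Base-pairing table; N (any base) pairs with N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def pad_barcode(barcode, target_len=13):
    """Append an N to barcodes that are one base short of `target_len`."""
    barcode = barcode.strip().upper()
    return barcode + "N" if len(barcode) == target_len - 1 else barcode


def its_barcode(raw_barcode):
    """Step 6 then step 7: reverse complement first, then pad with N."""
    return pad_barcode(reverse_complement(raw_barcode.strip()))


def its_row(row, drop=("LinkerPrimerSequence", "Description")):
    """Convert one mapping-file row (a dict) for an ITS/18S run."""
    # Step 5: drop the primer and description columns.
    kept = {k: v for k, v in row.items() if k not in drop}
    # Step 3: rename "#SampleID" to "SampleID".
    kept["SampleID"] = kept.pop("#SampleID")
    # Steps 6-7: reverse complement, then pad to 13 bp.
    kept["BarcodeSequence"] = its_barcode(kept["BarcodeSequence"])
    return kept
```

For the 12-bp barcode `AACCGGTTAACG`, `its_barcode` returns `CGTTAACCGGTTN`. Padding first and then reverse complementing would instead give `NCGTTAACCGGTT`, with the N at the wrong end.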


© 2017 by Kim Vincent 
