Introduction to FASTQ Format/Files
A FASTQ file is a text file that stores the sequence data from clusters that pass the filter on a flow cell. If specimens were multiplexed, demultiplexing is the first step in generating FASTQ files: it assigns each cluster to a specimen based on the cluster's index sequence(s). After demultiplexing, the assembled sequences are written to FASTQ files per specimen. If specimens were not multiplexed, the demultiplexing step is skipped, and all clusters in each flow cell lane are assigned to a single specimen.
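The assignment step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `demultiplex`, the example index sequences, and the specimen names are all hypothetical, and only exact index matches are considered (production demultiplexers usually tolerate a small number of mismatches).

```python
from collections import defaultdict

def demultiplex(reads, index_to_specimen, undetermined="Undetermined"):
    """Group reads by specimen using an exact match on the index sequence.

    reads: iterable of (index_sequence, read_sequence) pairs.
    index_to_specimen: dict mapping an index sequence to a specimen name.
    Reads with an unknown index go into the `undetermined` bin.
    """
    by_specimen = defaultdict(list)
    for index_seq, read_seq in reads:
        specimen = index_to_specimen.get(index_seq, undetermined)
        by_specimen[specimen].append(read_seq)
    return dict(by_specimen)

# Hypothetical toy data: four clusters, two known indexes.
reads = [("ACGT", "TTGACC"), ("GGCA", "CCATGA"), ("ACGT", "GATTAC"), ("NNNN", "AAAAAA")]
specimens = {"ACGT": "specimen_A", "GGCA": "specimen_B"}
result = demultiplex(reads, specimens)
```

In a real run, each bin would then be written to its own FASTQ file rather than kept in memory.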
FASTQ Processing Tools
Many analysis pipelines require data manipulation before downstream processing. Simple tasks, such as viewing the first few reads in a file or checking the distribution of read lengths, often require scripting or loading the data into tools that are slow on large datasets. These file manipulations become even more common when data is re-used in new analyses, and individual researchers frequently write their own scripts to carry them out. A variety of tools exist for FASTQ processing, including fastx-toolkit, bioawk, fastq-tools, fast, seqmagick, and seqtk. None of them offers a comprehensive set of the common manipulations needed for most analyses.
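The two simple tasks mentioned above, looking at the first few reads and tallying read lengths, might be scripted roughly as follows. This is a minimal sketch that assumes the common layout of exactly four lines per record; the helper names `read_fastq`, `head`, and `length_distribution` are made up for this example and do not come from any of the tools listed.

```python
import io
from collections import Counter
from itertools import islice

def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:          # end of file (or a blank line) stops iteration
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()       # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def head(handle, n=2):
    """Return the first n records without reading the rest of the file."""
    return list(islice(read_fastq(handle), n))

def length_distribution(handle):
    """Count how many reads have each sequence length."""
    return Counter(len(seq) for _, seq, _ in read_fastq(handle))

# Toy in-memory input; a real script would pass an opened file instead.
EXAMPLE = "@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n@r3\nTTGA\n+\nIIII\n"
first_two = head(io.StringIO(EXAMPLE), 2)
lengths = length_distribution(io.StringIO(EXAMPLE))
```

Because `read_fastq` is a generator, `head` stops after `n` records, which matters for multi-gigabyte files.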
Most FASTQ processing tools cannot parse reads whose sequence data spans multiple lines. Because extremely long lines greatly reduce human readability, this is likely to become a problem as read lengths from advanced sequencing technologies continue to increase.
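One way to handle wrapped records is to collect sequence lines until the `+` separator, then keep reading quality lines until the quality string is as long as the sequence. The length check is needed because `@` is itself a valid quality character, so a quality line can look like a header. The sketch below illustrates that idea; `read_multiline_fastq` is a hypothetical name, and this is not the parser used by any particular tool.

```python
import io

def read_multiline_fastq(handle):
    """Yield (name, sequence, quality) from FASTQ whose sequence/quality may wrap."""
    lines = (line.rstrip("\r\n") for line in handle)
    for line in lines:
        if not line:
            continue  # tolerate blank lines between records
        if not line.startswith("@"):
            raise ValueError(f"expected '@' header, got: {line!r}")
        name = line[1:]
        # Sequence: every line up to the '+' separator.
        seq_parts = []
        for line in lines:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        else:
            raise ValueError(f"record {name!r} has no '+' separator")
        seq = "".join(seq_parts)
        # Quality: read lines until we have as many characters as the sequence.
        qual_parts, qual_len = [], 0
        while qual_len < len(seq):
            line = next(lines, None)
            if line is None:
                raise ValueError(f"record {name!r} has truncated quality")
            qual_parts.append(line)
            qual_len += len(line)
        qual = "".join(qual_parts)
        if len(qual) != len(seq):
            raise ValueError(f"record {name!r}: quality and sequence lengths differ")
        yield name, seq, qual

# The first record wraps over two lines, and its quality starts with '@'.
WRAPPED = "@r1\nACGT\nACGT\n+\n@III\nIIII\n@r2\nGG\n+r2\nII\n"
parsed = list(read_multiline_fastq(io.StringIO(WRAPPED)))
```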
Because bioinformatics pipelines are frequently automated, identifying invalid input is critical. If input errors are not caught early, significant computation and analysis time can be wasted. A reliable FASTQ manipulation tool should therefore flag invalid files. Equally, tools should be able to process the entire range of valid inputs correctly.
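As an illustration of early input checking, the following sketch reports structural problems with their line numbers instead of failing part-way through a pipeline. It assumes four-line records, restricts sequence characters to A, C, G, T, and N (an assumption made for this example; some data legitimately uses other IUPAC codes), and checks that quality characters fall in the printable ASCII range 33 to 126. The function name `validate_fastq` is hypothetical.

```python
import io

# Assumption for this sketch: only these characters are accepted in sequences.
VALID_BASES = set("ACGTNacgtn")

def validate_fastq(handle):
    """Return a list of (line_number, message) problems; an empty list means none found."""
    problems = []
    lines = [line.rstrip("\r\n") for line in handle]
    leftover = len(lines) % 4
    if leftover:
        problems.append((len(lines), "line count is not a multiple of 4 (truncated record?)"))
    for start in range(0, len(lines) - leftover, 4):
        header, seq, sep, qual = lines[start:start + 4]
        if not header.startswith("@"):
            problems.append((start + 1, "header does not start with '@'"))
        if not set(seq) <= VALID_BASES:
            problems.append((start + 2, "unexpected character in sequence"))
        if not sep.startswith("+"):
            problems.append((start + 3, "separator does not start with '+'"))
        if len(qual) != len(seq):
            problems.append((start + 4, "quality and sequence lengths differ"))
        elif any(not 33 <= ord(c) <= 126 for c in qual):
            problems.append((start + 4, "quality character outside ASCII 33-126"))
    return problems

GOOD = "@r1\nACGT\n+\nIIII\n"
BAD = "@r1\nACGT\n+\nIII\nr2\nACXT\n-\nIIII\n"
good_report = validate_fastq(io.StringIO(GOOD))
bad_report = validate_fastq(io.StringIO(BAD))
```

This version reads the whole file into memory for simplicity; a streaming variant would be preferable for large inputs.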
The fqtools suite was created to meet the need for fast and reliable viewing, manipulation, and summarization of FASTQ data before preprocessing. It can process compressed and uncompressed FASTQ as well as SAM- and BAM-formatted data. Paired-end sequence data is handled either as file pairs or in interleaved format.
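To make the difference between the two paired-end layouts concrete, the sketch below converts a file pair into interleaved form, where each forward read is immediately followed by its mate. It assumes four-line records and reads listed in the same order in both files; `records` and `interleave` are hypothetical helpers written for this example and do not reflect how fqtools itself is implemented.

```python
import io
from itertools import zip_longest

def records(handle):
    """Split a four-line-per-record FASTQ stream into a list of 4-line records."""
    lines = [line.rstrip("\n") for line in handle]
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]

def interleave(r1_handle, r2_handle):
    """Return interleaved FASTQ text: read 1, its mate, read 2, its mate, ..."""
    out = []
    for rec1, rec2 in zip_longest(records(r1_handle), records(r2_handle)):
        if rec1 is None or rec2 is None:
            raise ValueError("paired files contain different numbers of reads")
        out.extend(rec1 + rec2)
    return "\n".join(out) + "\n"

# Toy file pair held in memory.
R1 = "@a/1\nAC\n+\nII\n@b/1\nGG\n+\nII\n"
R2 = "@a/2\nGT\n+\nII\n@b/2\nCC\n+\nII\n"
interleaved = interleave(io.StringIO(R1), io.StringIO(R2))
```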
FASTQ SV Caller
- Detects structural variation (deletions, insertions, and duplications) in a human whole-genome dataset relative to the reference genome, in a single click.
- Output: Report with a searchable list of variants and their genomic locations, as well as a VCF file containing the results.
FASTQ Reference Upload
- Upload a custom reference FASTA file to EPI2ME for read alignment later with the Custom Alignment workflow.
- Output: A report on alignment success.
FASTQ Custom Alignment
- By uploading and aligning to a custom FASTA reference, you can tailor sequencing analysis to your particular criteria without the need for complicated bioinformatics pipelines. Reads are aligned to the uploaded reference with the minimap2 aligner.
- Output: A report detailing the alignment's success, such as the depth of coverage across the reference, alignment accuracies, and the number of reads analyzed per barcode.
FASTQ DNA Control Experiment
- DNA quality control. Before sequencing your own specimens, run a Lambda control experiment to test your nanopore sequencing system.
- Output: A report detailing the length of the sequence, its accuracy, the quality score, and the amount of data produced.
FASTQ RNA Control Experiment
- RNA quality control. Before sequencing your own specimens, run an RNA control experiment to test your nanopore sequencing system.
- Output: A report detailing the length of the sequence, its accuracy, the quality score, and the amount of data produced.
About CD Genomics Bioinformatics Analysis
The bioinformatics analysis department of CD Genomics provides novel solutions for data-driven innovation aimed at discovering the hidden potential in biological data, tapping new insights related to life science research, and predicting new prospects.
References
- Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018, 34(17).
- Droop AP. fqtools: an efficient software suite for modern FASTQ file manipulation. Bioinformatics. 2016, 32(12).
- Cock PJ, Fields CJ, Goto N, et al. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic acids research. 2010, 38(6).