#maximal read length
#global setting: any read longer than this value is truncated to this length
max_rd_len=150
#each library section starts with [LIB]
[LIB]
#average insert size
#mean insert size of the library; in practice, take the peak of the library's insert-size distribution plot (default 200)
avg_ins=200
#if sequence needs to be reversed
#whether reads must be reverse-complemented: 0 for paired-end data (no reversal), 1 for mate-pair data
reverse_seq=0
#in which part(s) the reads are used
#1 = contig assembly only, 2 = scaffold assembly only, 3 = both contig and scaffold assembly, 4 = gap closure only
asm_flags=3
#use only the first 150 bps of each read
#read length cutoff; works like max_rd_len: reads longer than this value are trimmed to this length
rd_len_cutoff=150
#in which order the reads are used while scaffolding
#priority of this library, given as an integer; libraries that share the same rank are used together when building scaffolds
rank=1
#cutoff of pair number for a reliable connection (at least 3 for short insert size)
#minimum number of read pairs needed to support a connection between contigs or scaffolds; default 3 for paired-end data, 5 for mate-pair data
pair_num_cutoff=3
#minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)
#minimum alignment length; default 32 for paired-end data, 35 for mate-pair data
map_len=32
#a pair of fastq files, read 1 file should always be followed by read 2 file
#paths to the filtered paired-end read files; q = fastq format, f = fasta format, b = bam format
q1=/public/home/wlxie/luobuma/luobuma/sample_1_rawdata/Second-generation_sequencing/20211106-BaiYiHuiNeng01/01.rawFq/00.mergeRawFq/1/clean_data/1_raw_1_val_1.fq
q2=/public/home/wlxie/luobuma/luobuma/sample_1_rawdata/Second-generation_sequencing/20211106-BaiYiHuiNeng01/01.rawFq/00.mergeRawFq/1/clean_data/1_raw_2_val_2.fq
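As a quick sanity check before launching an assembly, the configuration above can be read into a dictionary of global settings plus one dictionary per [LIB] section. The following is only a minimal sketch: the function names `parse_soap_config` and `check_library` are hypothetical helpers, not part of SOAPdenovo, and the checks cover only the value ranges stated in the comments above (asm_flags 1 to 4, reverse_seq 0 or 1, q1 paired with q2).

```python
def parse_soap_config(text):
    """Split a SOAPdenovo-style config into (global settings, list of libraries).

    Simplification: a key repeated inside one section (for example several
    q1= lines) overwrites the earlier value.
    """
    global_opts = {}
    libraries = []
    current = global_opts  # keys before the first [LIB] are global
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if line == "[LIB]":
            current = {}
            libraries.append(current)
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw!r}")
        current[key.strip()] = value.strip()
    return global_opts, libraries


def check_library(lib):
    """Return a list of human-readable problems found in one [LIB] section."""
    problems = []
    if lib.get("asm_flags", "3") not in {"1", "2", "3", "4"}:
        problems.append("asm_flags must be 1, 2, 3 or 4")
    if lib.get("reverse_seq", "0") not in {"0", "1"}:
        problems.append("reverse_seq must be 0 or 1")
    if ("q1" in lib) != ("q2" in lib):
        problems.append("q1 and q2 must be given together")
    return problems


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    example = """
    #maximal read length
    max_rd_len=150
    [LIB]
    avg_ins=200
    reverse_seq=0
    asm_flags=3
    q1=reads_1.fq
    q2=reads_2.fq
    """
    global_opts, libraries = parse_soap_config(example)
    print(global_opts)
    for lib in libraries:
        print(lib, check_library(lib))
```

The parser treats everything before the first [LIB] line as global, which matches the layout of the file above, where max_rd_len precedes the library section.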

1. Output files from the command "pregraph"
   a. *.kmerFreq
      Each row gives the number of kmers whose frequency equals the row number. Note that peaks at frequencies that are integral multiples of 63 are artifacts of the data structure.
   b. *.edge
      Each record describes one edge in the pre-graph: length, kmers at both ends, average kmer coverage, whether it is identical to its reverse complement, and the sequence.
   c. *.markOnEdge & *.path
      These two files are used to resolve small repeats with reads.
   d. *.preArc
      Connections between edges that are established by the read paths.
   e. *.vertex
      Kmers at the ends of edges.
   f. *.preGraphBasic
      Basic information about the pre-graph: number of vertices, K value, number of edges, maximum read length, etc.

2. Output files from the command "contig"
   a. *.contig
      Contig information: corresponding edge index, length, kmer coverage, whether it is a tip, and the sequence. Only one of a contig and its reverse-complementary counterpart is included. The index of each reverse-complementary contig is given in the *.ContigIndex file.
   b. *.Arc
      Arcs coming out of each edge and their corresponding read coverage.
   c. *.updated.edge
      Information for each edge in the graph: length, kmers at both ends, and the index difference between the reverse-complementary edge and this one.
   d. *.ContigIndex
      Each record describes one contig in *.contig: its edge index, length, and the index difference between its reverse-complementary counterpart and itself.

3. Output files from the command "map"
   a. *.peGrads
      Information for each clone library: insert size, read index upper bound, rank, and pair number cutoff for a reliable link. This file can be revised manually to tune scaffolding.
   b. *.readOnContig
      Read locations on contigs. Contigs are referred to by their edge index. However, about half of them are not listed in the *.contig file because their reverse-complementary counterparts are already included.
   c. *.readInGap
      Reads that could be located in gaps between contigs. This information is used to close gaps in scaffolds if "-F" is set.

4. Output files from the command "scaff"
   a. *.newContigIndex
      Contigs are sorted by length before scaffolding, and their new indices are listed in this file. This is useful for matching contigs in *.contig with those in *.links.
   b. *.links
      Links between contigs that are established by read pairs. New indices are used.
   c. *.scaf_gap
      Contigs in gaps, found through the contig graph output by the contiging procedure. New indices are used here.
   d. *.scaf
      Contigs for each scaffold: contig index (concordant with the index in *.contig), approximate start position on the scaffold, orientation, contig length, and its links to other contigs.
   e. *.gapSeq
      Gap sequences between contigs.
   f. *.scafSeq
      Sequence of each scaffold.
   g. *.contigPosInscaff
      Contig positions in each scaffold.
   h. *.bubbleInScaff
      Contigs that form bubble structures in scaffolds. Every two contigs form a bubble, and the contig with the higher coverage is kept in the scaffold.
   i. *.scafStatistics
      Statistics of the final scaffolds and contigs.
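
The *.kmerFreq layout described under "pregraph" (row number = kmer frequency, row value = number of kmers) lends itself to a quick look at the main coverage peak. The sketch below assumes each row holds a single integer count as its first whitespace-separated field and that the first row corresponds to frequency 1; the helper names and the `min_freq` cutoff (used to skip low-frequency, error-dominated kmers) are illustrative assumptions, not part of SOAPdenovo. Frequencies that are multiples of 63 are skipped because the notes above describe peaks there as artifacts of the data structure.

```python
def parse_kmer_freq(lines):
    """Return counts where counts[i] is the number of kmers seen i + 1 times.

    Assumes one count per row, taken from the first whitespace-separated field.
    """
    return [int(line.split()[0]) for line in lines if line.strip()]


def main_peak(counts, min_freq=3, skip_multiple_of=63):
    """Return the frequency with the highest kmer count.

    min_freq is a hypothetical cutoff that ignores very rare kmers, which are
    mostly sequencing errors. Frequencies divisible by skip_multiple_of are
    ignored as data-structure artifacts. Returns None if nothing qualifies.
    """
    best_freq, best_count = None, -1
    for freq, count in enumerate(counts, start=1):
        if freq < min_freq:
            continue
        if skip_multiple_of and freq % skip_multiple_of == 0:
            continue
        if count > best_count:
            best_freq, best_count = freq, count
    return best_freq


if __name__ == "__main__":
    import sys

    # Usage: python kmer_peak.py <prefix>.kmerFreq
    with open(sys.argv[1]) as handle:
        counts = parse_kmer_freq(handle)
    print("main kmer frequency peak:", main_peak(counts))
```

The peak frequency is only a rough indicator of kmer depth; a suitable `min_freq` depends on the data and is best chosen by inspecting the distribution.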