# this file is intended to hold all valid attrib_type # table entries for all ensembl databases that we release # # If you use the provided upload script, commentlines and # emptry lines should be automatically removed, all # other lines should contain tab delimited database entries # for the attrib_type table # each attribute type should be preceeded with a comment that # describes its uses, unless its description field is deemed to be # expressive enough # need to document and find out about each attrib_type 1 embl_acc EMBL accession 2 status Status 3 synonym Synonym 4 name Name Alternative/long name 5 type Type of feature # A seq_region that is not represented in a more global coordinate system # should get the toplevel attribute and value 1 # If you have more than one assembly in you database, this feature will # not work as expected. You should then explicitly request features in a specific # cordinate system 6 toplevel Top Level Top Level Non-Redundant Sequence Region # The number of genes on each seq_region is counted and stored under this # seq_region_attribute to be displayed on mapview. Mainly web code uses this. 7 GeneCount Gene Count Total Number of Genes # Same as above for known genes 8 KnownGeneCount Known Gene Count Total Number of Known Genes # same as above for pseudogenes. The criteria for a pseudogene is, # that the gene.type fieled matches /pseudogene/ 9 PseudoGeneCount PseudoGene Count Total Number of PseudoGenes # Snps on a seq_region. See above. 10 SNPCount SNP Count Total Number of SNPs # another seq_region attribute. When a seq_region should be used with a # different codon table this attrbutes value should contain its number. # This is a bioperl codon table, find out from there which number to use # for your seq_region # Useful for Mitochondrium and Bacteria with non standard codon tables 11 codon_table Codon Table Alternate codon table # This is an attribute for a translation. Values describe start and end # position of a seelnocystein in a Translation (Amino Acid coordinates) # Example: "123 123 U". This is the general sequence edit format. # Other attributess with sequence edits for different reasons will come # up in the future 12 _selenocysteine Selenocysteine 13 bacend bacend # Contains the htg phase for clones. 14 htg htg High Throughput phase attribute 15 miRNA Micro RNA Coordinates of the mature miRNA # A sequence region that you consider not part of the reference genome should # be tagged as non_ref in seq_region_attrib. Chromosome 6 haplotypes in human # are exmaples of that. 16 non_ref Non Reference Non Reference Sequence Region 17 sanger_project Sanger Project name 18 clone_name Clone name 19 fish FISH location 21 org Sequencing centre 22 method Method 23 superctg Super contig id 24 inner_start Max start value 25 inner_end Min end value 26 state Current state of clone 27 organisation Organisation sequencing clone 28 seq_len Accession length 29 fp_size FP size 30 BACend_flag BAC end flags # used by Vega web code to link WebFPC 31 fpc_clone_id fpc clone # additional gene counts for Vega (see GeneCount for general description) 32 KnwnPCCount protein_coding_KNOWN Number of Known Protein Coding 33 NovPCCount protein_coding_NOVEL Number of Novel Protein Coding 34 NovPTCount processed_transcript_NOVEL Number of Novel Processed Transcripts 35 PutPTCount processed_transcript_PUTATIVE Number of Putative Processed Transcripts 36 PredPCCount protein_coding_PREDICTED Number of Predicted Protein Coding 37 IGGeneCount IG_gene Number of IG Genes 38 IGPsGenCount IG_pseudogene Number of IG Pseudogenes 39 TotPsCount total_pseudogene Total Number of Pseudogenes #40 KnwnProcPsCount processed_pseudogene Number of Known Processed Pseudogenes #41 KnwnUnPsCount unprocessed_pseudogene Number of Known Unprocessed Pseudogenes 42 KnwnPCProgCount protein_coding_in_progress_KNOWN Number of Known Protein Coding in progress 43 NovPCProgCount protein_coding_in_progress_NOVEL Number of Novel Protein Coding in progress # Vega annotation stats 44 AnnotSeqLength Annotated sequence length Annotated Sequence 45 TotCloneNum Total number of clones Total Number of Clones 46 NumAnnotClone Fully annotated clones Number of Fully Annotated Clones # Acknowledgements for manual annotation of this seq_region 47 ack Acknowledgement Acknowledgement for manual annotation # old clone attribute 48 htg_phase High throughput phase High throughput genomic sequencing phase 49 description Description A general descriptive text attribute 50 chromosome Chromosome Chromosomal location for supercontigs that are not assembled 51 nonsense Nonsense Mutation Strain specific nonesense mutation # misc Vega attribs 52 author Author Group resonsible for Vega annotation 53 author_email Author email address Author email address 54 remark Remark Annotation remark 55 transcr_class Transcript class Transcript class 56 KnwnPTCount processed_transcript_KNOWN Number of Known Processed Transcripts 57 ccds CCDS CCDS identifier # label frameshifts modelled as short (1,2,4,5 bp) introns 59 Frameshift Frameshift Frameshift modelled as intron #more gene counts for Vega 60 PTCount processed_transcript Number of Processed Transcripts 61 PredPTCount processed_transcript_PREDICTED Number of Predicted Processed Transcripts 62 ncRNA Structure RNA secondary structure line 63 skip_clone skip clone Skip clone in align_by_clone_identity.pl # Gene counts for seq_region_stats.pl script 64 GeneNo_knwCod known protein_coding Gene Count Number of known protein_coding Genes 65 GeneNo_novCod novel protein_coding Gene Count Number of novel protein_coding Genes 66 GeneNo_rRNA rRNA Gene Count Number of rRNA Genes 67 GeneNo_pseudo pseudogene Gene Count Number of pseudogene Genes 68 GeneNo_snRNA snRNA Gene Count Number of snRNA Genes 69 GeneNo_snoRNA snoRNA Gene Count Number of snoRNA Genes 70 GeneNo_miRNA miRNA Gene Count Number of miRNA Genes 71 GeneNo_mscRNA misc_RNA Gene Count Number of misc_RNA Genes 72 GeneNo_scRNA scRNA Gene Count Number of scRNA Genes 73 GeneNo_MTrRNA Mt_rRNA Gene Count Number of Mt_rRNA Genes 74 GeneNo_MTtRNA Mt_tRNA Gene Count Number of Mt_tRNA Genes 75 GeneNo_RNA_pseu scRNA_pseudogene Gene Count Number of scRNA_pseudogene Genes 76 GeneNo_tRNA tRNA Gene Count Number of tRNA Genes 77 GeneNo_rettran retrotransposed Gene Count Number of retrotransposed Genes 78 GeneNo_snlRNA snlRNA Gene Count Number of snlRNA Genes 79 GeneNo_proc_tr processed_transcript Gene Count Number of processed transcript Genes 80 supercontig SuperContig name NULL 81 well_name Well plate name NULL # Added by fc1 26/11/06 82 bacterial Bacterial 83 NovelCDSCount Novel CDS Count 84 NovelTransCount Novel Transcript Count 85 PutTransCount Putative Transcript Count 86 PredTransCount Predicted Transcript Count 87 UnclassPsCount Unclass Ps count 88 KnwnprogCount Known prog Count 89 NovCDSprogCount Novel CDS prog count 90 bacend_well_nam BACend well name 91 alt_well_name Alt well name 92 TranscriptEdge Transcript Edge 93 alt_embl_acc Alt EMBL acc 94 alt_org Alt org # anacode attribs added by ml6 29/11/06 - seen in yeast but not others 95 intl_clone_name International Clone Name 96 embl_version EMBL Version 97 chr Chromosome Name Chromosome Name Contained in the Assembly 98 equiv_asm Equivalent EnsEMBL assembly For full chromosomes made from NCBI AGPs 99 GeneNo_ncRNA ncRNA Gene Count Number of ncRNA Genes # Ig segment gene counts for seq regions stats script ds5 2/2/07 100 GeneNo_Ig Ig Gene Count Number of Ig Genes # cat missing atts 109 HitSimilarity hit similarity percentage id to parent transcripts 110 HitCoverage hit coverage coverage of parent transcripts 111 PropNonGap proportion non gap proportion non gap 112 NumStops number of stops 113 GapExons gap exons number of gap exons 114 SourceTran source transcript source transcript 115 EndNotFound end not found end not found 116 StartNotFound start not found start not found 117 Frameshift Fra Frameshift modelled as intron # Other Vega attribs 118 ensembl_name Ensembl name Name of equivalent Ensembl chromosome 119 NoAnnotation NoAnnotation Clones without manual annotation 120 hap_contig Haplotype contig Contig present on a haplotype # loutre attribs added by ml6 121 annotated Clone Annotation Status 122 keyword Clone Keyword 123 hidden_remark Hidden Remark 124 mRNA_start_NF mRNA start not found 125 mRNA_end_NF mRNA end not found 126 cds_start_NF CDS start not found 127 cds_end_NF CDS end not found 128 write_access Write access for Sequence Set 1 for writable , 0 for read-only 129 hidden Hidden Sequence Set # loutre attribs for vega production (st3) 130 vega_name Vega name Vega seq_region.name 131 vega_export_mod Export mode E (External), I (Internal) etc 132 vega_release Vega release Vega release number # loutre attribs for assembly_tags (ck1) 133 atag_CLE Clone_left_end Clone_lef_end feature marked in GAP database 134 atag_CRE Clone_right_end Clone_right_end feature marked in GAP database 135 atag_Misc Misc miscellaneous feature marked in GAP database 136 atag_Unsure Unsure region of uncertain DNA sequence marked in GAP database 137 MultAssem Multiple Assembled seq region Part of Seq Region is part of more than one assembly 140 wgs WGS contig WGS contig integrated into the map 141 bac AGP clones tiling path of clones # Attribute for per-gene GC percentage 142 GeneGC Gene GC Percentage GC content for this gene # vega 143 TotAssemblyLeng Finished sequence length Length of the assembly not counting sequence gaps # Drosophila, only where the translation provided by flybase differs from that in our database by ONE amino acid 144 amino_acid_sub Amino acid substitution Some translations have been manually curated for amino acid substitiutions. For example a stop codon may be changed to an amino acid in order to prevent premature truncation, or one amino acid can be substituted for another. # Drosophila. Sometimes sequences have been manually altered to remove one base, and this alters the whole translation 145 _rna_edit rna_edit RNA edit #genebuild - databases of removed transcripts 146 kill_reason Kill Reason Reason why a transcript has been killed 147 strip_UTR Strip UTR Transcript needs bad UTR removing # vega 148 TotAssLength Finished sequence length Finished Sequence 149 PsCount pseudogene Number of Pseudogenes #150 KnwnPsCount known_pseudogene Number of Known Pseudogenes #151 KnwnTPsCount known_transcribed_pseudogene Number of Known Transcribed Pseudogenes 152 TotPTCount total_processed_transcript Total Number of Processed Transcripts 153 TotPCCount total_protein_coding Total Number of Protein Coding 154 NovNcCount novel_non_coding Number of Novel Non Coding 155 KnwnPolyPsCount known_polymorphic Number of Known Polymorphic Pseudogenes 156 PolyPsCount polymorphic_pseudogene Number of Polymorphic Pseudogenes 157 TotIGGeneCount total_IG_gene Total Number of IG Genes 158 ProcPsCount proc_pseudogene Number of Processed Pseudogenes 159 UnPsCount unproc_pseudogene Number of Unprocessed Pseudogenes 160 TPsCount transcribed_pseudogene Number of Transcribed Pseudogenes 161 TECCount TEC Number of TEC Genes 162 KnwnIGGeneCount IG_gene_KNOWN Number of Known IG Genes 163 KnwnIGPsGeCount IG_pseudogene_KNOWN Number of Known IG Pseudogenes #pepstats attributes: will be calculated by release coordinator for the protview page 164 IsoPoint Isoelectric point Pepstats attributes 165 Charge Charge Pepstats attributes 166 MolecularWeight Molecular weight Pepstats attributes 167 NumResidues Number of residues Pepstats attributes 168 AvgResWeight Ave. residue weight Pepstats attributes #old attribute used by the API to translate the sequence 170 initial_met Initial methionine Set first amino acid to methionine #added for procavia capensis in release 51 171 NonGapHCov NonGapHCov # attribute that shows if a supporting evidence was also used to build a Vega gene model 172 otter_support otter support Evidence ID that was used as supporting feature for building a gene in Vega # Temporary attrib using during the merge of the havana and ensembl gene sets to stablish a relation between a # havana transcript that shares CDS with an Ensembl transcript but the have different UTR structure 173 enst_link enst link Code to link a OTTT with an ENST when they both share the CDS of ENST # Attribute that show the start genomic position of an alternative ATG found for a transcript. Checks are made # up to 200 bases upstream for genes with no UTR, and for genes with UTR is made up to 200 bases of the UTR sequence # or to the max extent of the UTR if this is shorter than 200. 174 upstream_ATG upstream ATG Alternative ATG found upstream of the defined as start ATG for the transcript #more vega gene types 175 TPPsCount transcribed_processed_pseudogene Number of Transcribed Processed Pseudogenes 176 TUPsCount transcribed_unprocessed_pseudogene Number of Transcribed Unprocessed Pseudogenes 177 UniPsCount unitary_pseudogene Number of Unitary Pseudogenes 178 KnwnTECCount TEC_KNOWN Number of Known TEC genes 179 TotTECGeneCount TEC_all Total number of TEC genes 180 TUyPsCount transcribed_unitary_pseudogene Number of Transcribed Unitary Pseudogenes 181 PolyCount polymorphic Number of Polymorphic Genes 182 KnwnPolyCount polymorphic Number of Known Polymorphic Genes 183 KnwnTRCount TR_gene_known Number of Known TR Genes 184 TRGeneCount TR_gene Number of TR Genes 185 TRPsCount TR_pseudo Number of TR Pseudogenes # attribute that shows if a supporting evidence was also used to build a # Vega gene model 186 tp_ott_support otter protein transcript support Evidence ID that was used as supporting feature for building a gene in Vega 187 td_ott_support otter dna transcript support Evidence ID that was used as supporting feature for building a gene in Vega 188 ep_ott_support otter protein exon support Evidence ID that was used as supporting feature for building a gene in Vega 189 ed_ott_support otter dna exon support Evidence ID that was used as supporting feature for building a gene in Vega # Attributes like split_tscript, incons_strands, incons_phases, # is_folded, unwanted_evidence, exon_too_long, contains_stops, # borked_coords, low_complex, and evi_coverage are added by # BlastMiniGenewise in the build process. They can be deleted out of # transcript_attrib and attrib_type table 190 GeneNo_lincRNA lincRNA Gene Count Number of lincRNA Genes # StopGained and StopLost are transcript attributes. 191 StopGained SNP causes stop codon to be gained This transcript has a variant that causes a stop codon to be gained in at least 10 percent of a HapMap population 192 StopLost SNP causes stop codon to be lost This transcript has a variant that causes a stop codon to be lost in at least 10 percent of a HapMap population # For Ensembl Genomes dictyBase (2010-02-16). # (code 193 should have a trailing '_' as it is auto-generated) 193 GeneNo_class_I_ class_I_RNA Gene Count Number of class_I_RNA Genes 194 GeneNo_SRP_RNA SRP_RNA Gene Count Number of SRP_RNA Genes 195 GeneNo_class_II class_II_RNA Gene Count Number of class_II_RNA Genes 196 GeneNo_P_RNA RNase_P_RNA Gene Count Number of RNase_P_RNA Genes 197 GeneNo_RNase_MR RNase_MRP_RNA Gene Count Number of RNase_MRP_RNA Genes # Intron loss due to a frameshift on the query genome 198 lost_frameshift lost_frameshift Frameshift on the query sequence is lost in the target sequence #ÊAttributes added as gene_attribs to Ensembl genes that fall within a genomic region where a LRG record exists 216 GeneInLRG Gene in LRG This gene is contained within an LRG region 217 GeneOverlapLRG Gene overlaps LRG This gene is partially overlapped by a LRG region (start or end outside LRG)