# This file is intended to hold all valid attrib_type
# table entries for all ensembl databases that we release.
#
# If you use the provided upload script, comment lines and
# empty lines should be automatically removed; all
# other lines should contain tab-delimited database entries
# for the attrib_type table.

# Each attribute type should be preceded by a comment that
# describes its uses, unless its description field is deemed to be
# expressive enough.

# Need to document and find out about each attrib_type.

1	embl_acc	EMBL accession
2	status	Status
3	synonym	Synonym
4	name	Name	Alternative/long name
5	type	Type of feature

# A seq_region that is not represented in a more global coordinate system
# should get the toplevel attribute and value 1.
# If you have more than one assembly in your database, this feature will
# not work as expected. You should then explicitly request features in a specific
# coordinate system.
6	toplevel	Top Level	Top Level Non-Redundant Sequence Region

# The number of genes on each seq_region is counted and stored under this
# seq_region_attribute to be displayed on mapview. Mainly web code uses this.
7	GeneCount	Gene Count	Total Number of Genes

# Same as above for known genes.
8	KnownGeneCount	Known Gene Count	Total Number of Known Genes

# Same as above for pseudogenes. The criterion for a pseudogene is
# that the gene.type field matches /pseudogene/.
9	PseudoGeneCount	PseudoGene Count	Total Number of PseudoGenes

# SNPs on a seq_region. See above.
10	SNPCount	SNP Count	Total Number of SNPs

# Another seq_region attribute. When a seq_region should be used with a
# different codon table, this attribute's value should contain its number.
# This is a bioperl codon table; find out from there which number to use
# for your seq_region.
# Useful for mitochondria and bacteria with non-standard codon tables.
11	codon_table	Codon Table	Alternate codon table

# This is an attribute for a translation. Values describe start and end
# position of a selenocysteine in a Translation (amino acid coordinates).
# Example: "123 123 U". This is the general sequence edit format.
# Other attributes with sequence edits for different reasons will come
# up in the future.
12	_selenocysteine	Selenocysteine

13	bacend	bacend

# Contains the htg phase for clones.
14	htg	htg	High Throughput phase attribute

15	miRNA	Micro RNA	Coordinates of the mature miRNA

# A sequence region that you consider not part of the reference genome should
# be tagged as non_ref in seq_region_attrib. Chromosome 6 haplotypes in human
# are examples of that.
16	non_ref	Non Reference	Non Reference Sequence Region

17	sanger_project	Sanger Project name
18	clone_name	Clone name
19	fish	FISH location
21	org	Sequencing centre
22	method	Method
23	superctg	Super contig id
24	inner_start	Max start value
25	inner_end	Min end value
26	state	Current state of clone
27	organisation	Organisation sequencing clone
28	seq_len	Accession length
29	fp_size	FP size
30	BACend_flag	BAC end flags

# used by Vega web code to link WebFPC
31	fpc_clone_id	fpc clone

# additional gene counts for Vega (see GeneCount for general description)
32	KnwnPCCount	protein_coding_KNOWN	Number of Known Protein Coding
33	NovPCCount	protein_coding_NOVEL	Number of Novel Protein Coding
34	NovPTCount	processed_transcript_NOVEL	Number of Novel Processed Transcripts
35	PutPTCount	processed_transcript_PUTATIVE	Number of Putative Processed Transcripts
36	PredPCCount	protein_coding_PREDICTED	Number of Predicted Protein Coding
37	NovIGGeneCount	IG_gene_NOVEL	Number of Novel IG Genes
38	NovIGPsGenCount	IG_pseudogene_NOVEL	Number of Novel IG Pseudogenes
39	TotPsCount	total_pseudogene	Total Number of Pseudogenes
40	KnwnProcPsCount	processed_pseudogene	Number of Known Processed Pseudogenes
41	KnwnUnPsCount	unprocessed_pseudogene	Number of Known Unprocessed Pseudogenes
42	KnwnPCProgCount	protein_coding_in_progress_KNOWN	Number of Known Protein Coding in progress
43	NovPCProgCount	protein_coding_in_progress_NOVEL	Number of Novel Protein Coding in progress

# Vega annotation stats
44	AnnotSeqLength	Annotated sequence length	Annotated Sequence
45	TotCloneNum	Total number of clones	Total Number of Clones
46	NumAnnotClone	Fully annotated clones	Number of Fully Annotated Clones

# Acknowledgements for manual annotation of this seq_region
47	ack	Acknowledgement	Acknowledgement for manual annotation

# old clone attribute
48	htg_phase	High throughput phase	High throughput genomic sequencing phase

49	description	Description	A general descriptive text attribute
50	chromosome	Chromosome	Chromosomal location for supercontigs that are not assembled
51	nonsense	Nonsense Mutation	Strain specific nonesense mutation

# misc Vega attribs
52	author	Author	Group resonsible for Vega annotation
53	author_email	Author email address	Author email address
54	remark	Remark	Annotation remark
55	transcr_class	Transcript class	Transcript class
56	KnwnPTCount	processed_transcript_KNOWN	Number of Known Processed Transcripts
57	ccds	CCDS	CCDS identifier

# make first amino acid methionine
58	initial_met	Initial methionine	Set first amino acid to methionine

# label frameshifts modelled as short (1,2,4,5 bp) introns
59	Frameshift	Frameshift	Frameshift modelled as intron

# more gene counts for Vega
60	PTCount	processed_transcript_UNKNOWN	Number of Processed Transcripts
61	PredPTCount	processed_transcript_PREDICTED	Number of Predicted Processed Transcripts

62	ncRNA	Structure	RNA secondary structure line
63	skip_clone	skip clone	Skip clone in align_by_clone_identity.pl

# Gene counts for seq_region_stats.pl script
64	GeneNo_knwCod	known protein_coding Gene Count	Number of known protein_coding Genes
65	GeneNo_novCod	novel protein_coding Gene Count	Number of novel protein_coding Genes
66	GeneNo_rRNA	rRNA Gene Count	Number of rRNA Genes
67	GeneNo_pseudo	pseudogene Gene Count	Number of pseudogene Genes
68	GeneNo_snRNA	snRNA Gene Count	Number of snRNA Genes
69	GeneNo_snoRNA	snoRNA Gene Count	Number of snoRNA Genes
70	GeneNo_miRNA	miRNA Gene Count	Number of miRNA Genes
71	GeneNo_mscRNA	misc_RNA Gene Count	Number of misc_RNA Genes
72	GeneNo_scRNA	scRNA Gene Count	Number of scRNA Genes
73	GeneNo_MTrRNA	Mt_rRNA Gene Count	Number of Mt_rRNA Genes
74	GeneNo_MTtRNA	Mt_tRNA Gene Count	Number of Mt_tRNA Genes
75	GeneNo_RNA_pseu	scRNA_pseudogene Gene Count	Number of scRNA_pseudogene Genes
76	GeneNo_tRNA	tRNA Gene Count	Number of tRNA Genes

80	supercontig	SuperContig name	NULL
81	well_name	Well plate name	NULL

# Added by fc1 26/11/06
82	bacterial	Bacterial
83	NovelCDSCount	Novel CDS Count
84	NovelTransCount	Novel Transcript Count
85	PutTransCount	Putative Transcript Count
86	PredTransCount	Predicted Transcript Count
87	UnclassPsCount	Unclass Ps count
88	KnwnprogCount	Known prog Count
89	NovCDSprogCount	Novel CDS prog count
90	bacend_well_nam	BACend well name
91	alt_well_name	Alt well name
92	TranscriptEdge	Transcript Edge
93	alt_embl_acc	Alt EMBL acc
94	alt_org	Alt org

# anacode attribs added by ml6 29/11/06 - seen in yeast but not others
95	intl_clone_name	International Clone Name
96	embl_version	EMBL Version
97	chr	Chromosome Name	Chromosome Name Contained in the Assembly
98	equiv_asm	Equivalent EnsEMBL assembly	For full chromosomes made from NCBI AGPs
99	GeneNo_ncRNA	ncRNA Gene Count	Number of ncRNA Genes

# Ig segment gene counts for seq regions stats script ds5 2/2/07
100	GeneNo_IgSeg	Ig segment Gene Count	Number of Ig segment Genes

# cat missing atts
109	HitSimilarity	hit similarity	percentage id to parent transcripts
110	HitCoverage	hit coverage	coverage of parent transcripts
111	PropNonGap	proportion non gap	proportion non gap
112	NumStops	number of stops
113	GapExons	gap exons	number of gap exons
114	SourceTran	source transcript	source transcript
115	EndNotFound	end not found	end not found
116	StartNotFound	start not found	start not found
117	Frameshift Fra	Frameshift modelled as intron

# Other Vega attribs
118	ensembl_name	Ensembl name	Name of equivalent Ensembl chromosome
119	NoAnnotation	NoAnnotation	Clones without manual annotation
120	hap_contig	Haplotype contig	Contig present on a haplotype

# loutre attribs added by ml6
121	annotated	Clone Annotation Status
122	keyword	Clone Keyword
123	hidden_remark	Hidden Remark
124	mRNA_start_NF	mRNA start not found
125	mRNA_end_NF	mRNA end not found
126	cds_start_NF	CDS start not found
127	cds_end_NF	CDS end not found
128	write_access	Write access for Sequence Set	1 for writable , 0 for read-only
129	hidden	Hidden Sequence Set

# loutre attribs for vega production (st3)
130	vega_name	Vega name	Vega seq_region.name
131	vega_export_mod	Export mode	E (External), I (Internal) etc
132	vega_release	Vega release	Vega release number

# loutre attribs for assembly_tags (ck1)
133	atag_CLE	Clone_left_end	Clone_lef_end feature marked in GAP database
134	atag_CRE	Clone_right_end	Clone_right_end feature marked in GAP database
135	atag_Misc	Misc	miscellaneous feature marked in GAP database
136	atag_Unsure	Unsure	region of uncertain DNA sequence marked in GAP database

137	MultAssem	Multiple Assembled seq region	Part of Seq Region is part of more than one assembly
140	wgs	WGS contig	WGS contig integrated into the map
141	bac	AGP clones	tiling path of clones

# Attribute for per-gene GC percentage
142	GeneGC	Gene GC	Percentage GC content for this gene

# vega
143	TotAssemblyLeng	Finished sequence length	Length of the assembly not counting sequence gaps

# Drosophila, only where the translation provided by FlyBase differs from that in our database by ONE amino acid
144	amino_acid_sub	Amino acid substitution	In drosophila, some translations have been manually curated by FlyBase and a stop codon has been changed to an amino acid in order to prevent premature truncation.

# Drosophila. Sometimes sequences have been manually altered to remove one base, and this alters the whole translation
145	_rna_edit	rna_edit	RNA edit

# genebuild - databases of removed transcripts
146	kill_reason	Kill Reason	Reason why a transcript has been killed
147	strip_UTR	Strip UTR	Transcript needs bad UTR removing

# vega
148	TotAssLength	Finished sequence length	Finished Sequence
149	NovPsCount	novel_pseudogene	Number of Novel Pseudogenes
150	KnwnPsCount	known_pseudogene	Number of Known Pseudogenes
151	KnwnTPsCount	known_transcribed_pseudogene	Number of Known Transcribed Pseudogenes
152	TotPTCount	total_processed_transcript	Total Number of Processed Transcripts
153	TotPCCount	total_protein_coding	Total Number of Protein Coding
154	NovNcCount	novel_non_coding	Number of Novel Non Coding
155	KnwnPolyCount	known_polymorphic	Number of Known Polymorphic
156	NovPolyCount	novel_polymorphic	Number of Novel Polymorphic
157	TotIGGeneCount	total_IG_gene	Total Number of IG Genes
158	NovProcPsCount	novel_processed_pseudogene	Number of Novel Processed Pseudogenes
159	NovUnPsCount	novel_unprocessed_pseudogene	Number of Novel Unprocessed Pseudogenes
160	NovTPsCount	novel_transcribed_pseudogene	Number of Novel Transcribed Pseudogenes
161	NovTECCount	novel_TEC	Number of Novel TEC Genes
162	KnwnIGGeneCount	IG_gene_KNOWN	Number of Known IG Genes
163	KnwnIGPsGeCount	IG_pseudogene_KNOWN	Number of Known IG Pseudogenes