From 109b37792d012589c9cbaeed4326cefb7d4c7df6 Mon Sep 17 00:00:00 2001
From: edgrif <edgrif>
Date: Thu, 15 Jan 2009 17:35:10 +0000
Subject: [PATCH] first version

---
 ZMAP_LACE_PROJECT/zmap_lace.2009_01_15 | 373 +++++++++++++++++++++++++
 1 file changed, 373 insertions(+)
 create mode 100755 ZMAP_LACE_PROJECT/zmap_lace.2009_01_15

diff --git a/ZMAP_LACE_PROJECT/zmap_lace.2009_01_15 b/ZMAP_LACE_PROJECT/zmap_lace.2009_01_15
new file mode 100755
index 000000000..4482f9e67
--- /dev/null
+++ b/ZMAP_LACE_PROJECT/zmap_lace.2009_01_15
@@ -0,0 +1,373 @@
+==============================================================================
+ZMap/Otterlace Development
+
+
+Date: Thursday 4th Dec 2008
+
+Attendees: jgrg, jla1, lw2, kj2, edgrif, st3
+
+
+------------------------------------------------------------------------------
+CURRENT ITEMS
+
+
+Items Completed
+---------------
+
+
+1/ Dumping features
+
+kj2 asked if zmap could dump features; edgrif said that it could dump in GFFv2
+format but that some work was needed to dump subsets of features (e.g. dump
+all the features from a search results window); this is done via some testing
+and more importantly a tidy up of SO terms.
+
+2/ Pfam on the fly.
+
+- jla1 asked about doing pfam analysis on the fly. Rob Finn and James have
+spoken about this, Mustapha is working on this and there is a prototype in
+test_otterlace. DONE...needs testing.
+
+
+
+
+
+High priority
+-------------
+
+1/ Tick boxes for controlled vocabulary
+
+jla1 said there is an urgent need to add "tick boxes" to the lace interface to
+ensure that certain properties of annotated features can only be chosen from
+a controlled vocabulary.
+
+1a/ Locus Finished button
+
+st3 asked if there could be a tag on a Locus to say it was Finished,
+implemented via a button so that the correct tag(s) were automatically
+entered. jgrg to implement.
+
+1b/ Clone Finished button
+
+st3 would like a "Clone finished" button with the same function as the Locus Finished
+button.
jgrg to implement.
+
+
+2/ Clone summary info/Automating DE line creation / Quality Control
+
+There is a script for automating this which kj2 wrote for zebrafish;
+currently it must be run from the command line but jgrg is integrating it
+into the clone editing window in lace.
+
+Following on, jla1 also suggested that it would be good to have
+automated QC scripts trawling through the database regularly looking for
+duff data. Tina Eyre wrote one that could be co-opted and st3 also has
+some. This is becoming an important issue for Havana to ensure really
+good quality data. Add automated checking against SwissProt for CDS.
+
+jgrg said that much of the checking was done for annotation and he will circulate
+an email summarising this. QC for saving data back to Otterlace still needs doing though.
+
+
+3/ Data for zebrafish DAS tracks needs mapping between assemblies; jgrg
+said Mustapha has done this but he is not sure for which assemblies.
+
+
+4/ SNP tracks
+
+jla1 would like some of the DAS tracks currently available to be put into
+lace and hence zmap. jgrg said that this is not immediately straightforward
+as they don't all say which assembly they are based on, but some can be done
+fairly soon, e.g. comparacon ? jgrg to investigate.
+
+
+5/ Styles
+
+James is working to introduce this now; zmap code is all there.
+
+
+6/ Solexa reads
+
+kj2 and jla1 would like to get Solexa reads into the pipeline but this is a lot
+of data and will require zmap to be able to do dynamic fetches of subranges of
+data, otherwise we will be swamped by it. rds is to do some design work on the
+dynamic loading. Initially we could load only those alignments within a marked
+range. Mustapha is working on this.
+
+As an addition to this, edgrif and rds will think about how we might give some
+kind of "overview" for alignment columns that could show where the aligns are
+without drawing them all.
+
+In fact John Collins has initial data for gene models and confirmed introns
+that can be added now without code changes.
+
+
+
+7/ Aliases/renaming of Loci
+
+Old HUGO data was overwriting deliberate manual changes to loci by annotators.
+Fixed now ? lw2 to contact MGI as there are problems with IDs from them.
+
+
+8/ clone path
+
+lw2 would like the full clone extents displayed, with the non-golden sections shown.
+Do we need the clone ends information for this ? edgrif to check. Check with Leo's
+smapped example with several sections of a single clone...
+
+Would like this info. in navigator panel + navigator panel needs to display both
+the foocanvas scrolled window area _and_ the actual area on the screen, and both
+should be draggable...
+
+edgrif will do tile path information/display, rds will do navigator bit.
+
+
+9/ multi-view interactions
+
+kj2 would like a way to check positioning of multiple genes; currently this would require
+multiple lace sessions. Requires more discussion. Possible now ??
+
+kj2 would like to click on a feature in one view and see it highlighted in another
+so that she can look for genes present in more than one clone.
+
+
+
+Medium priority
+---------------
+
+0/ new column bump to show inconsistent matches
+
+Often an annotator has many matches that fit against an existing transcript; it would be good
+to have a mode that hid these and only showed the ones inconsistent with the
+transcript's splices.
+
+
+- removing evidence already used *************
+
+annotators would like to be able to remove from display homologies that
+have already been used to annotate variants etc. Does this need to be
+persistent in the database in some way ?? edgrif & jgrg will get
+together to arrange this via styles so it can persist in a natural way
+in the database.
+
+**24526: Showing which evidence has been used
+Differential coloring of matches that have been used already as evidence
+for a transcript
+
+mainly requires jgrg to mark features and then tell zmap to move the features
+to a new column or repaint them with a new style.
+
+
+1/ Locus list
+
+jgrg to provide a list of loci as another tab window. + searching on ensembl ids.
+
+
+
+2/ 5' and 3' EST read pairs
+
+we need these to be marked in zmap as in acedb; requires new tags in database in
+the same way as in worm database.
+
+edgrif explained that acedb loses the match strand information which will be needed
+to implement this cleanly. edgrif is changing acedb code so it holds this
+information and also dumps it in gff v2 and v3 (it is required for the latter).
+
+kj2 would also like DITAG information displayed.
+
+We can use worm tags but adjust to be more generic, e.g. "Read_pairs"
+
+
+
+3/ bug in acedb server
+
+jgrg raised a bug in the server which was causing it to run out of memory, edgrif
+to investigate. There is a ticket for this: 51894
+
+edgrif to make sure jgrg has up to date binaries for dotter etc.
+
+
+4/ popups/labels for transcripts
+
+jla1 said that apollo had a neat way of showing a label for a transcript
+that remained in one place on the screen as the window was scrolled. edgrif
+to investigate + look at "tool tips" for transcripts....especially with
+locus information.
+
+
+5/ Naming of Alternative Alleles
+
+- st3 asked about naming of alternative alleles in different mouse strains / human
+haplotypes. For loci that don't have HGNC/MGI names, these are incorrectly named after
+the clones on the reference sequence. jla1 suggested correctly naming them after the
+clones they are on, but making sure that the annotators can see the associated
+'reference assembly' gene.
st3 said this could be done via the alt_allele table, and
+if it were done across the board, i.e. including KNOWN genes, then this would make Vega
+prep easier.
+
+
+6/ Best in Genome matches
+
+jla1 also said she would like to see "best in genome" displayed. jgrg said this
+is not easy as Otterlace works on a clone by clone basis. It was agreed
+that it would be worthwhile to show at least "best in clone" or better to do
+a crude "best in genome".
+
+
+
+
+------------------------------------------------------------------------------
+BACK-BURNER ITEMS
+
+
+
+ZMap/acedb
+----------
+
+1/ Interface issues:
+
+
+jla1 and lw2 said they would like the marked area to be less obvious and also to
+be a "greying" out rather than blue and with less dense dots. edgrif to implement.
+
+
+jla1 said she would like to be able to click on an exon and see evidence (and
+transcripts ?) with the same splice be highlighted. laurens also wants this
+as it would often avoid having to open dotter to check.
+
+
+
+2/ Display of multiple compara alignments
+
+multiple alignments: edgrif is about a third of the way through implementing a
+more general way of displaying arbitrary blocks. This will become a high
+priority item as we move to haplotypes etc.
+
+th said this would be needed soon so it should be moved up the priority list.
+jgrg said they have mappings in lace that could be passed on to zmap easily
+and also said that annotators can already annotate assemblies from variants
+and different species alongside each other as needed.
+
+We need to decide on the format for specifying the alignments.
+
+
+
+3/ alternative translations: edgrif about half way through code to do this.
+
+edgrif is doing this as part of the protein search code since this code
+does translations itself. edgrif will talk to jgrg about how alternative
+genetic codes can be specified with acedb.
+
+We need a test database for this. jgrg said this would come soon.
+
+edgrif will add a field to the transcript feature to hold the alternative translation
+table.
+
+
+4/ Blixem enhancements
+
+two areas:
+
+- display multiple overlapping transcripts better (includes removing the many
+yellow lines introduced by this...clarify this point), have a scrolled window
+of the transcripts. jgrg said that perhaps only the transcripts made by havana
+should be displayed. jla1 said she would like to be able to dynamically update
+the transcripts displayed.
+
+- better interaction with zmap, e.g. click on things in zmap and see them
+highlighted in blixem and vice versa....
+
+we had better have a more generalised protocol for communicating with external
+programs....
+
+- blixem: dna searching is NOT DONE, edgrif to expedite. Also protein searches
+will be added.
+
+Perhaps one way to get this done would be to employ a good C programmer on a
+short contract.
+
+
+5/ acedb server performance
+
+edgrif is investigating two possibilities for improving performance:
+
+  - make sgifaceserver stream data rather than batch it up, would
+    save a lot of memory.
+
+  - deferred loading, only load features when needed and load in
+    zone requested by user....design done...now need to implement.
+
+
+6/ A new canvas
+
+rds has been looking at alternative canvas implementations which offer an MVC
+model. He has managed to get goocanvas developers to fix some bugs and make
+some changes to support our needs.
+
+the goocanvas MVC model will mean we do not have to copy data to split windows,
+meaning greatly reduced memory usage.
+
+the goocanvas will cope automatically with the X Windows window size limit; this
+combined with changes in the gtk scrolling model means we will be able to do away
+with having two scroll bars.
+
+We will introduce the new canvas this year.
+
+
+
+
+Otterlace
+---------
+
+1/ Alternative alignment programs
+
+There has been some discussion about using splice aware alignment programs.
+jgrg is waiting for a fix to exonerate to support the new pipeline Mustapha
+has written.
+
+edgrif and jgrg both commented that some changes to acedb data structures
+would be needed to represent both HSPs that are "joined up" but also
+protein matches that start part of the way through a peptide. BUT one
+possibility would be for zmap to access this data directly from a mysql
+database thus sidestepping the need to put it in acedb first. gffv3 will also
+be needed to represent this kind of joined up HSP data in a natural and
+robust way.
+
+Changes will also be required to represent codons that are spliced across
+introns as perhaps surprisingly none of the acedb programs can cope with
+this currently (and neither can zmap).
+
+
+2/ Spell checker
+
+jla1 reported a problem that free text fields and some fixed text fields
+have misspellings (is that a mis-spelling ?) and it would be good to have
+some autocorrection facility. The ideal would be to have some widget that
+allowed other dictionaries (e.g. science) to be attached to it and could thus
+be used as a general text entry tool.
+
+
+
+3/ Sequence exceptions
+
+kj2 raised the subject of how to indicate sequence exceptions,
+e.g. when bases are skipped in translations. kj2 wondered if alternative
+translations could be registered as sequence exceptions; edgrif said he would
+prefer a separate mechanism as much of the code is already done for this.
+We should therefore include a mechanism in zmap for sequence exceptions;
+this would require a similar mechanism in acedb. This is yet another reason
+for GFF 3 which has standards for frame shifts and other things.
+
+There should be a way of tagging transcripts where there are sequence
+exceptions.
+
+
+
+
+------------------------------------------------------------------------------
+Next Meeting
+
+Will be at 2pm, 23rd October 2008
+
+
+==============================================================================
-- 
GitLab