moved to 2009

Commit 22b2e2a1 authored 16 years ago by edgrif
--- a/ZMAP_LACE_PROJECT/zmap_lace.2009_01_15
+++ b/ZMAP_LACE_PROJECT/zmap_lace.2009_01_15
-==============================================================================
-ZMap/Otterlace Development
-
-
-Date:  Thursday 4th Dec 2008
-
-Attendees: jgrg, jla1, lw2, kj2, edgrif, st3
-
-
------------------------------------------------------------------------------
-CURRENT ITEMS
-
-
-Items Completed
---------------
-
-
-1/ Dumping features
-
-kj2 asked if zmap could dump features; edgrif said that it could dump in GFFv2
-format but that some work was needed to dump subsets of features (e.g. dump
-all the features from a search results window). This is done but needs some
-testing and, more importantly, a tidy-up of SO terms.
-
-2/ Pfam on the fly.
-
-jla1 asked about doing pfam analysis on the fly. Rob Finn and James have
-spoken about this; Mustapha is working on it and there is a prototype in
-test_otterlace. DONE...needs testing.
-
-
-
-
-
-High priority
-------------
-
-1/ Tick boxes for controlled vocabulary
-
-jla1 said there is an urgent need to add "tick boxes" to the lace interface to
-ensure that certain properties of annotated features can only be chosen from
-a controlled vocabulary.
-
-1a/ Locus Finished button
-
-st3 asked if there could be a tag on a Locus to say it was Finished,
-implemented via a button so that the correct tag(s) were automatically
-entered. jgrg to implement.
-
-1b/ Clone Finished button
-
-st3 would like a "Clone Finished" button with the same function as the Locus
-Finished button. jgrg to implement.
-
-
-2/ Clone summary info/Automating DE line creation / Quality Control
-
-There is a script for automating this, which kj2 wrote for zebrafish;
-currently it must be run from the command line, but jgrg is integrating it
-into the clone editing window in lace.
-
-Following on, jla1 also suggested that it would be good to have
-automated QC scripts trawling through the database regularly looking for
-duff data. Tina Eyre wrote one that could be co-opted and st3 also has
-some. This is becoming an important issue for Havana to ensure really
-good quality data. Add automated checking against SwissProt for CDS.
-
-jgrg said that much of the checking was done for annotation and he will circulate
-an email summarising this. QC for saving data back to Otterlace still needs doing, though.
-
-
-3/ Data for zebrafish DAS tracks needs mapping between assemblies; jgrg
-said Mustapha has done this but he is not sure for which assemblies.
-
-
-4/ SNP tracks
-
-jla1 would like some of the DAS tracks currently available to be put into
-lace and hence zmap. jgrg said that this is not immediately straightforward
-as they don't all say which assembly they are based on, but some can be done
-fairly soon, e.g. comparacon? jgrg to investigate.
-
-
-5/ Styles
-
-James is working on introducing this now; the zmap code is all there.
-
-
-6/ Solexa reads
-
-kj2 and jla1 would like to get Solexa reads into pipeline but this is a lot
-of data and will require zmap to be able to do dynamic fetches of subranges of
-data otherwise we will be swamped by it. rds is to do some design work on the
-dynamic loading. Initially we could only load those alignments within a marked
-range. Mustapha is working on this.
-
-As an addition to this edgrif and rds will think about how we might give some
-kind of "overview" for alignment columns that could show where the alignments are
-without drawing them all.
-
-In fact John Collins has initial data for gene models and confirmed introns
-that can be added now without code changes.
-
-
-
-7/ Alias/renaming of Loci
-
-Old HUGO data was overwriting deliberate manual changes made to loci by annotators.
-Fixed now? lw2 to contact MGI as there are problems with IDs from them.
-
-
-8/ clone path
-
-lw2 would like the full clone extents displayed, including the non-golden sections.
-Do we need the clone ends information for this? edgrif to check, using Leo's
-smapped example with several sections of a single clone.
-
-Would like this info in the navigator panel; the navigator panel also needs to display
-both the foocanvas scrolled window area _and_ the actual area on the screen, and both
-should be draggable.
-
-edgrif will do tile path information/display, rds will do navigator bit.
-
-
-9/ multi-view interactions
-
-kj2 would like a way to check the positioning of multiple genes; currently this would
-require multiple lace sessions. Requires more discussion. Possible now?
-
-kj2 would like to click on a feature in one view and see it highlighted in another
-so that she can look for genes present in more than one clone.
-
-
-
-Medium priority
---------------
-
-0/ new column bump to show inconsistent matches
-
-Annotators often have many matches that fit an existing transcript; it would be good
-to have a mode that hid these and only showed the ones inconsistent with the
-transcript's splices.
-
-
-removing evidence already used *************
-
-annotators would like to be able to remove from display homologies that
-have already been used to annotate variants etc. Does this need to be
-persistent in the database in some way ?? edgrif & jgrg will get
-together to arrange this via styles so it can persist in a natural way
-in the database.
-
-**24526: Showing which evidence has been used
-Differential coloring of matches that have been used already as evidence
-for a transcript
-
-mainly requires jgrg to mark features and then tell zmap to move the features
-to a new column or repaint them with a new style.
-
-
-1/ Locus list
-
-jgrg to provide a list of loci as another tab window, plus searching on Ensembl IDs.
-
-
-
-2/ 5' and 3' EST read pairs
-
-We need these to be marked in zmap as they are in acedb; this requires new tags in
-the database, in the same way as in the worm database.
-
-edgrif explained that acedb loses the match strand information, which will be needed
-to implement this cleanly. edgrif is changing the acedb code so it holds this
-information and also dumps it in gff v2 and v3 (it is required for the latter).
-
-kj2 would also like DITAG information displayed.
-
-We can use worm tags but adjust to be more generic, e.g. "Read_pairs"
-
-
-
-3/ bug in acedb server
-
-jgrg raised a bug in the server which was causing it to run out of memory; edgrif
-to investigate. There is a ticket for this: 51894
-
-edgrif to make sure jgrg has up-to-date binaries for dotter etc.
-
-
-4/ popups/labels for transcripts
-
-jla1 said that Apollo had a neat way of showing a label for a transcript
-that remained in one place on the screen as the window was scrolled. edgrif
-to investigate this and also look at "tool tips" for transcripts, especially with
-locus information.
-
-
-5/ Naming of Alternative Alleles
-
-st3 asked about naming of alternative alleles in different mouse strains / human
-haplotypes. For loci that don't have HGNC/MGI names, these are incorrectly named after
-the clones on the reference sequence. jla1 suggested correctly naming them after the
-clones they are on, but making sure that the annotators can see the associated
-'reference assembly' gene. st3 said this could be done via the alt_allele table, and
-if it were done across the board, i.e. including KNOWN genes, then this would make Vega
-prep easier.
-
-
-6/ Best in Genome matches
-
-jla1 also said she would like to see "best in genome" matches displayed. jgrg said this
-is not easy as Otterlace works on a clone-by-clone basis. It was agreed
-that it would be worthwhile to show at least "best in clone", or better, to do
-a crude "best in genome".
-
-
-
-
------------------------------------------------------------------------------
-BACK-BURNER ITEMS
-
-
-
-ZMap/acedb
----------
-
-1/ Interface issues:
-
-
-jla1 and lw2 said they would like the marked area to be less obvious and also to
-be a "greying" out rather than blue, with less dense dots. edgrif to implement.
-
-
-jla1 said she would like to be able to click on an exon and see evidence (and
-transcripts?) with the same splice highlighted. Laurens also wants this,
-as it would often avoid having to open dotter to check.
-
-
-
-2/ Display of multiple compara alignments
-
-multiple alignments: edgrif is about a third of the way through implementing a
-more general way of displaying arbitrary blocks.  This will become a high
-priority item as we move to haplotypes etc.
-
-th said this would be needed soon so it should be moved up the priority list.
-jgrg said they have mappings in lace that could be passed on to zmap easily
-and also said that annotators can already annotate assemblies from variants
-and different species alongside each other as needed.
-
-We need to decide on the format for specifying the alignments.
-
-
-
-3/ alternative translations: edgrif is about halfway through the code to do this.
-
-edgrif is doing this as part of the protein search code since this code
-does translations itself. edgrif will talk to jgrg about how alternative
-genetic codes can be specified with acedb.
-
-We need a test database for this. jgrg said this would come soon.
-
-edgrif will add a field to the transcript feature to hold the alternative translation
-table.
-
-
-4/ Blixem enhancements
-
-two areas:
-
- display multiple overlapping transcripts better (includes removing the many
-yellow lines introduced by this...clarify this point), have a scrolled window
-of the transcripts. jgrg said that perhaps only the transcripts made by havana
-should be displayed. jla1 said she would like to be able to dynamically update
-the transcripts displayed.
-
- better interaction with zmap, e.g. click on things in zmap and see them 
-highlighted in blixem and vice versa....
-
-we had better have a more generalised protocol for communicating with external
-programs....
-
- blixem: dna searching is NOT DONE, edgrif to expedite. Also protein searches
-will be added.
-
-Perhaps one way to get this done would be to employ a good C programmer on a
-short contract.
-
-
-5/ acedb server performance
-
-edgrif investigating two possibilities for improving performance:
-
-	- make sgifaceserver stream data rather than batch it up, would
-	  save a lot of memory.
-
-	- deferred loading, only load features when needed and load in
-	  zone requested by user....design done...now need to implement.
-
-
-6/ A new canvas
-
-rds has been looking at alternative canvas implementations which offer an MVC
-model. He has managed to get goocanvas developers to fix some bugs and make
-some changes to support our needs.
-
-The goocanvas MVC model means we will not have to copy data to split windows,
-greatly reducing memory usage.
-
-The goocanvas will cope automatically with the X Windows window size limit; this,
-combined with changes in the gtk scrolling model, means we will be able to do away
-with having two scroll bars.
-
-We will introduce the new canvas this year.
-
-
-
-
-Otterlace
---------
-
-1/ Alternative alignment programs
-
-There has been some discussion about using splice aware alignment programs.
-jgrg is waiting for a fix to exonerate to support the new pipeline Mustapha
-has written.
-
-edgrif and jgrg both commented that some changes to acedb data structures
-would be needed to represent both HSPs that are "joined up" and
-protein matches that start part of the way through a peptide. BUT one
-possibility would be for zmap to access this data directly from a mysql
-database, thus sidestepping the need to put it in acedb first. gffv3 will also
-be needed to represent this kind of joined-up HSP data in a natural and
-robust way.
-
-Changes will also be required to represent codons that are spliced across
-introns as perhaps surprisingly none of the acedb programs can cope with
-this currently (and neither can zmap).
-
-
-2/ Spell checker
-
-jla1 reported a problem that free text fields and some fixed text fields
-have misspellings (is that a mis-spelling ?) and it would be good to have
-some autocorrection facility. The ideal would be to have some widget that
-allowed other dictionaries (e.g. science) to be attached to it and could thus
-be used as a general text entry tool.
-
-
-
-3/ Sequence exceptions
-
-kj2 raised the subject of how to indicate sequence exceptions,
-e.g. when bases are skipped in translations. kj2 wondered if alternative
-translations could be registered as sequence exceptions; edgrif said he would
-prefer a separate mechanism as much of the code is already done for this.
-We should therefore include a mechanism in zmap for sequence exceptions;
-this would require a similar mechanism in acedb. This is yet another reason
-for GFF 3, which has standards for frame shifts and other things.
-
-There should be a way of tagging transcripts where there are sequence
-exceptions.
-
-
-
-
------------------------------------------------------------------------------
-Next Meeting
-
-Will be at 2pm, 29th January 2009
-
-
-==============================================================================
--- a/ZMAP_LACE_PROJECT/zmap_lace.2009_01_29
+++ b/ZMAP_LACE_PROJECT/zmap_lace.2009_01_29
-==============================================================================
-ZMap/Otterlace Development
-
-
-Date:  Thursday 29th Jan 2009
-
-Attendees: jgrg, jla1, lw2, kj2, edgrif, st3
-
-
------------------------------------------------------------------------------
-CURRENT ITEMS
-
-
-Items Completed
---------------
-
-<NONE ?>
-
-
-
-High priority
-------------
-
-1/ Tick boxes for controlled vocabulary
-
-jla1 said there is an urgent need to add "tick boxes" to the lace interface to
-ensure that certain properties of annotated features can only be chosen from
-a controlled vocabulary. lw2 to check whether "fragmented_loci" is included
-in the tags. lw2 said all other tags are in the RT ticket: NNNNNNNNN
-
-1a/ Locus Finished button
-
-st3 asked if there could be a tag on a Locus to say it was Finished,
-implemented via a button so that the correct tag(s) were automatically
-entered. jgrg to implement.
-
-1b/ Clone Finished button
-
-st3 would like a "Clone Finished" button with the same function as the Locus
-Finished button. jgrg to implement.
-
-
-2/ Clone summary info/Automating DE line creation / Quality Control
-
-jgrg has done lots of work on this and summarised his progress in an
-email circulated to us all.
-
-There is a script for automating this, which kj2 wrote for zebrafish;
-currently it must be run from the command line, but jgrg is integrating it
-into the clone editing window in lace.
-
-Following on, jla1 also suggested that it would be good to have
-automated QC scripts trawling through the database regularly looking for
-duff data. Tina Eyre wrote one that could be co-opted and st3 also has
-some. This is becoming an important issue for Havana to ensure really
-good quality data. Add automated checking against SwissProt for CDS.
-
-jgrg said that much of the checking was done for annotation and he will circulate
-an email summarising this. QC for saving data back to Otterlace still needs doing, though.
-
-
-3/ Data for zebrafish DAS tracks needs mapping between assemblies; jgrg
-said Mustapha has done this but he is not sure for which assemblies.
-
-Not currently possible because there is no finished assembly yet.
-
-
-4/ SNP tracks
-
-jla1 would like some of the DAS tracks & other data sources currently available
-to be put into lace and hence zmap. jgrg said that this is not immediately
-straightforward as they don't all say which assembly they are based on, but some
-can be done fairly soon, e.g. comparacon? jgrg to investigate.
-
-
-5/ Styles
-
-James is working on introducing this now; the zmap code is all there. jgrg is to
-set aside a week when he can work on this, and edgrif will set aside the same week.
-
-
-6/ Solexa reads
-
-kj2 and br2 would like to get Solexa reads into pipeline but this is a lot
-of data and will require zmap to be able to do dynamic fetches of subranges of
-data otherwise we will be swamped by it. rds is to do some design work on the
-dynamic loading. Initially we could only load those alignments within a marked
-range. Mustapha is working on this.
-
-As an addition to this edgrif and rds will think about how we might give some
-kind of "overview" for alignment columns that could show where the alignments are
-without drawing them all.
-
-In fact John Collins has initial data for gene models and confirmed introns
-that can be added now without code changes. This data is coming from Simon
-Whitehead.
-
-
-7/ Alias/renaming of Loci
-
-lw2 to contact MGI as there are problems with IDs from them.
-
-
-8/ clone path
-
-lw2 would like the full clone extents displayed, including the non-golden sections.
-Do we need the clone ends information for this? edgrif to check, using Leo's
-smapped example with several sections of a single clone.
-
-Would like this info in the navigator panel; the navigator panel also needs to display
-both the foocanvas scrolled window area _and_ the actual area on the screen, and both
-should be draggable.
-
-edgrif will do tile path information/display, rds will do navigator bit.
-
-
-9/ multi-view interactions
-
-kj2 would like to click on a feature in one view and see it highlighted in another
-so that she can look for genes present in more than one clone. edgrif to do this.
-
-
-10/ RT numbers
-
-It was agreed that where possible RT ticket numbers would be included in the
-meeting notes. lw2, edgrif, jgrg to look up numbers.
-
-
-11/ feature grouping tags (e.g. for 5' and 3' EST read pairs)
-
-wormdb uses paired tags specific to EST read pairs but we need a more flexible
-generalisation of this to handle multiple features and different types of
-feature. jgrg's group have been working on filtering hits in a better way and
-so have more information about grouping for display.
-
-
-
-Medium priority
---------------
-
-0/ new column bump to show inconsistent matches
-
-Annotators often have many matches that fit an existing transcript; it would be good
-to have a mode that hid these and only showed the ones inconsistent with the
-transcript's splices.
-
-
-1/ dotter error messages
-
-lw2 said that sometimes dotter just does not appear. edgrif to check that dotter
-is reporting errors properly and to make sure they are shown in dialog windows rather
-than on the terminal, which is often not available to the annotator.
-
-
-2/ removing evidence already used *************
-
-annotators would like to be able to remove from display homologies that
-have already been used to annotate variants etc. Does this need to be
-persistent in the database in some way ?? edgrif & jgrg will get
-together to arrange this via styles so it can persist in a natural way
-in the database.
-
-**24526: Showing which evidence has been used
-Differential coloring of matches that have been used already as evidence
-for a transcript
-
-mainly requires jgrg to mark features and then tell zmap to move the features
-to a new column or repaint them with a new style.
-
-
-3/ Locus list
-
-jgrg to provide a list of loci as another tab window, plus searching on Ensembl IDs.
-
-
-
-
-5/ bug in acedb server
-
-jgrg raised a bug in the server which was causing it to run out of memory; edgrif
-to investigate. There is a ticket for this: 51894
-
-edgrif to make sure jgrg has up-to-date binaries for dotter etc.
-
-
-6/ popups/labels for transcripts
-
-jla1 said that Apollo had a neat way of showing a label for a transcript
-that remained in one place on the screen as the window was scrolled. edgrif
-to investigate this and also look at "tool tips" for transcripts, especially with
-locus information.
-
-
-7/ Naming of Alternative Alleles
-
-st3 asked about naming of alternative alleles in different mouse strains / human
-haplotypes. For loci that don't have HGNC/MGI names, these are incorrectly named after
-the clones on the reference sequence. jla1 suggested correctly naming them after the
-clones they are on, but making sure that the annotators can see the associated
-'reference assembly' gene. st3 said this could be done via the alt_allele table, and
-if it were done across the board, i.e. including KNOWN genes, then this would make Vega
-prep easier.
-
-
-8/ Best in Genome matches
-
-jla1 also said she would like to see "best in genome" matches displayed. jgrg said this
-is not easy as Otterlace works on a clone-by-clone basis. It was agreed
-that it would be worthwhile to show at least "best in clone", or better, to do
-a crude "best in genome".
-
-
-
-
------------------------------------------------------------------------------
-BACK-BURNER ITEMS
-
-
-
-ZMap/acedb
----------
-
-1/ Interface issues:
-
-
-jla1 and lw2 said they would like the marked area to be less obvious and also to
-be a "greying" out rather than blue, with less dense dots. edgrif to implement.
-
-
-jla1 said she would like to be able to click on an exon and see evidence (and
-transcripts?) with the same splice highlighted. Laurens also wants this,
-as it would often avoid having to open dotter to check.
-
-
-
-2/ Display of multiple compara alignments
-
-multiple alignments: edgrif is about a third of the way through implementing a
-more general way of displaying arbitrary blocks.  This will become a high
-priority item as we move to haplotypes etc.
-
-th said this would be needed soon so it should be moved up the priority list.
-jgrg said they have mappings in lace that could be passed on to zmap easily
-and also said that annotators can already annotate assemblies from variants
-and different species alongside each other as needed.
-
-We need to decide on the format for specifying the alignments.
-
-
-
-3/ alternative translations: edgrif is about halfway through the code to do this.
-
-edgrif is doing this as part of the protein search code since this code
-does translations itself. edgrif will talk to jgrg about how alternative
-genetic codes can be specified with acedb.
-
-We need a test database for this. jgrg said this would come soon.
-
-edgrif will add a field to the transcript feature to hold the alternative translation
-table.
-
-
-4/ Blixem enhancements
-
-two areas:
-
- display multiple overlapping transcripts better (includes removing the many
-yellow lines introduced by this...clarify this point), have a scrolled window
-of the transcripts. jgrg said that perhaps only the transcripts made by havana
-should be displayed. jla1 said she would like to be able to dynamically update
-the transcripts displayed.
-
- better interaction with zmap, e.g. click on things in zmap and see them 
-highlighted in blixem and vice versa....
-
-we had better have a more generalised protocol for communicating with external
-programs....
-
- blixem: dna searching is NOT DONE, edgrif to expedite. Also protein searches
-will be added.
-
-Perhaps one way to get this done would be to employ a good C programmer on a
-short contract.
-
-
-5/ acedb server performance
-
-edgrif investigating two possibilities for improving performance:
-
-	- make sgifaceserver stream data rather than batch it up, would
-	  save a lot of memory.
-
-	- deferred loading, only load features when needed and load in
-	  zone requested by user....design done...now need to implement.
-
-
-6/ A new canvas
-
-rds has been looking at alternative canvas implementations which offer an MVC
-model. He has managed to get goocanvas developers to fix some bugs and make
-some changes to support our needs.
-
-The goocanvas MVC model means we will not have to copy data to split windows,
-greatly reducing memory usage.
-
-The goocanvas will cope automatically with the X Windows window size limit; this,
-combined with changes in the gtk scrolling model, means we will be able to do away
-with having two scroll bars.
-
-We will introduce the new canvas this year.
-
-
-
-
-Otterlace
---------
-
-1/ Alternative alignment programs
-
-There has been some discussion about using splice aware alignment programs.
-jgrg is waiting for a fix to exonerate to support the new pipeline Mustapha
-has written.
-
-edgrif and jgrg both commented that some changes to acedb data structures
-would be needed to represent both HSPs that are "joined up" and
-protein matches that start part of the way through a peptide. BUT one
-possibility would be for zmap to access this data directly from a mysql
-database, thus sidestepping the need to put it in acedb first. gffv3 will also
-be needed to represent this kind of joined-up HSP data in a natural and
-robust way.
-
-Changes will also be required to represent codons that are spliced across
-introns as perhaps surprisingly none of the acedb programs can cope with
-this currently (and neither can zmap).
-
-
-2/ Spell checker
-
-jla1 reported a problem that free text fields and some fixed text fields
-have misspellings (is that a mis-spelling ?) and it would be good to have
-some autocorrection facility. The ideal would be to have some widget that
-allowed other dictionaries (e.g. science) to be attached to it and could thus
-be used as a general text entry tool.
-
-
-
-3/ Sequence exceptions
-
-kj2 raised the subject of how to indicate sequence exceptions,
-e.g. when bases are skipped in translations. kj2 wondered if alternative
-translations could be registered as sequence exceptions; edgrif said he would
-prefer a separate mechanism as much of the code is already done for this.
-We should therefore include a mechanism in zmap for sequence exceptions;
-this would require a similar mechanism in acedb. This is yet another reason
-for GFF 3, which has standards for frame shifts and other things.
-
-There should be a way of tagging transcripts where there are sequence
-exceptions.
-
-
-
-
------------------------------------------------------------------------------
-Next Meeting
-
-Will be at 2pm, 12th February 2009
-
-
-==============================================================================