ensembl-gh-mirror / zmap
Commit 22b2e2a1
authored 16 years ago by edgrif
moved to 2009
parent ba6a8c64
Showing 2 changed files with 0 additions and 742 deletions
ZMAP_LACE_PROJECT/zmap_lace.2009_01_15: 0 additions, 373 deletions
ZMAP_LACE_PROJECT/zmap_lace.2009_01_29: 0 additions, 369 deletions

ZMAP_LACE_PROJECT/zmap_lace.2009_01_15 (deleted, 100755 → 0)
==============================================================================
ZMap/Otterlace Development
Date: Thursday 4th Dec 2008
Attendees: jgrg, jla1, lw2, kj2, edgrif, st3
------------------------------------------------------------------------------
CURRENT ITEMS
Items Completed
---------------
1/ Dumping features
kj2 asked if zmap could dump features, edgrif said that it could dump in GFFv2
format but that some work was needed to dump subsets of features (e.g. dump
all the features from a search results window), this is done via some testing
and more importantly a tidy up of SO terms.
2/ Pfam on the fly.
- jla1 asked about doing pfam analysis on the fly, Rob Finn and James have
spoken about this, Mustapha is working on this and there is a prototype in
test_otterlace. DONE...needs testing.
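
The subset dump discussed under "Dumping features" can be sketched as below. This is only an illustration in Python, not zmap's code: the feature dictionaries, the `keep` predicate and the function names are all assumptions, and the group column is simplified to a single tag with a quoted value.

```python
def gff2_line(feature):
    """Format one feature dict as a 9-column, tab-separated GFF version 2 line."""
    # GFF uses "." for columns that have no value.
    score = "." if feature.get("score") is None else str(feature["score"])
    frame = "." if feature.get("frame") is None else str(feature["frame"])
    # Simplified group column: one tag followed by a double-quoted value.
    group = f'{feature["group_tag"]} "{feature["group_name"]}"'
    return "\t".join([
        feature["seqname"], feature["source"], feature["type"],
        str(feature["start"]), str(feature["end"]),
        score, feature["strand"], frame, group,
    ])


def dump_subset(features, keep):
    """Dump only the features accepted by keep(), e.g. those in a search result."""
    lines = ["##gff-version 2"]
    lines += [gff2_line(f) for f in features if keep(f)]
    return "\n".join(lines) + "\n"
```

A caller would pass whatever selection it has, for example `dump_subset(all_features, lambda f: f["type"] == "similarity")`.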
High priority
-------------
1/ Tick boxes for controlled vocabulary
jla1 said there is an urgent need to add "tick boxes" to the lace interface to
ensure that certain properties of annotated features can only be chosen from
a controlled vocabulary.
1a/ Locus Finished button
st3 asked if there could be a tag on a Locus to say it was Finished,
implemented via a button so that the correct tag(s) were automatically
entered. jgrg to implement.
1b/ Clone Finished button
st3 would like a "Clone finished" button with the same function as the Locus Finished
button. jgrg to implement.
2/ Clone summary info/Automating DE line creation / Quality Control
There is a script for automating this which kj2 wrote for zebrafish,
currently it must be run from the command line but jgrg is integrating
into the clone editing window in lace.
Following on jla1 also suggested that it would be good to have
automated QC scripts trawling through the database regularly looking for
duff data. Tina Eyre wrote one that could be co-opted and st3 also has
some. This is becoming an important issue for Havana to ensure really
good quality data. Add automated checking against SwissProt for CDS.
jgrg said that much of the checking was done for annotation and he will circulate
an email summarising this. QC for saving data back to Otterlace still needs doing though.
3/ Data for zebrafish DAS tracks needs mapping between assemblies, jgrg
said Mustapha has done this but he is not sure for which assemblies.
4/ SNP tracks
jla1 would like some of the DAS tracks currently available to be put into
lace and hence zmap. jgrg said that this is not immediately straightforward
as they don't all say which assembly they are based on but some can be done
fairly soon. e.g. comparacon ? jgrg to investigate.
5/ Styles
James working to introduce this now, zmap code is all there.
6/ Solexa reads
kj2 and jla1 would like to get Solexa reads into pipeline but this is a lot
of data and will require zmap to be able to do dynamic fetches of subranges of
data otherwise we will be swamped by it. rds is to do some design work on the
dynamic loading. Initially we could only load those alignments within a marked
range. Mustapha is working on this.
As an addition to this edgrif and rds will think about how we might give some
kind of "overview" for alignment columns that could show where the aligns are
without drawing them all.
In fact John Collins has initial data for gene models and confirmed introns
that can be added now without code changes.
7/ Alias/renaming of Loci
Old HUGO data was overwriting deliberate manual changes to loci by annotators.
Fixed now ? lw2 to contact MGI as there are problems with IDs from them.
8/ clone path
lw2 would like the full clone extents displayed, including the non-golden sections.
Do we need the clone ends information for this, edgrif to check ?? Check with Leo's
smapped example with several sections of a single clone...
Would like this info. in navigator panel + navigator panel needs to display both
the foocanvas scrolled window area _and_ the actual area on the screen, and both
should be draggable...
edgrif will do tile path information/display, rds will do navigator bit.
9/ multi-view interactions
kj2 would like a way to check positioning of multiple genes, currently would require
multiple lace sessions, requires more discussion. Possible now ??
kj2 would like to click on a feature in one view and see it highlighted in another
so that she can look for genes present in more than one clone.
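
The two ideas under "Solexa reads" (load only the alignments inside a marked range, and summarise where alignments lie without drawing them all) can be sketched as follows. This is a hypothetical Python illustration, not the design rds or Mustapha are working on; alignments are assumed to be `(start, end)` tuples with inclusive coordinates, and the function names are made up.

```python
def in_marked_range(aligns, mark_start, mark_end):
    """Keep only alignments that overlap the inclusive range [mark_start, mark_end]."""
    return [a for a in aligns if a[0] <= mark_end and a[1] >= mark_start]


def overview(aligns, seq_start, seq_end, n_bins):
    """Count the alignments touching each of n_bins equal-width bins.

    The counts could drive a density plot for an alignment column in place
    of drawing every individual alignment.
    """
    width = (seq_end - seq_start + 1) / n_bins
    counts = [0] * n_bins
    for start, end in aligns:
        if end < seq_start or start > seq_end:
            continue  # entirely outside the displayed sequence
        first = max(0, int((start - seq_start) // width))
        last = min(n_bins - 1, int((end - seq_start) // width))
        for b in range(first, last + 1):
            counts[b] += 1
    return counts
```

In a real dynamic fetch the range filter would be applied by the data source rather than after loading everything, so that the full data set is never held in memory.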
Medium priority
---------------
0/ new column bump to show inconsistent matches
Often an annotator has many matches that fit against an existing transcript; it would be
good to have a mode that hid these and only showed the ones inconsistent with the
transcript's splices.
- removing evidence already used *************
annotators would like to be able to remove from display homologies that
have already been used to annotate variants etc. Does this need to be
persistent in the database in some way ?? edgrif & jgrg will get
together to arrange this via styles so it can persist in a natural way
in the database.
**24526: Showing which evidence has been used
Differential coloring of matches that have been used already as evidence
for a transcript
mainly requires jgrg to mark features and then tell zmap to move the features
to a new column or repaint them with a new style.
1/ Locus list
jgrg to provide a list of loci as another tab window. + searching on ensembl ids.
2/ 5'and 3' EST read pairs
we need these to be marked in zmap as in acedb, requires new tags in database in
the same way as in worm database.
edgrif explained that acedb loses the match strand information which will be
needed to implement this cleanly. edgrif is changing acedb code so it holds this
information and also dumps it in gff v2 and v3 (it is required for the latter).
kj2 would also like DITAG information displayed.
We can use worm tags but adjust to be more generic, e.g. "Read_pairs"
3/ bug in acedb server
jgrg raised a bug in the server which was causing it to run out of memory, edgrif
to investigate. There is a ticket for this: 51894
edgrif to make sure jgrg has up-to-date binaries for dotter etc.
4/ popups/labels for transcripts
jla1 said that apollo had a neat way of showing a label for a transcript
that remained in one place on the screen as the window was scrolled. edgrif
to investigate + look at "tool tips" for transcripts....especially with
locus information.
5/ Naming of Alternative Alleles
st3 asked about naming of alternative alleles in different mouse strains / human
haplotypes. Loci that don't have HGNC/MGI names are incorrectly named after
the clones on the reference sequence. jla1 suggested correctly naming them after the
clones they are on, but making sure that the annotators can see the associated
'reference assembly' gene. st3 said this could be done via the alt_allele table, and
if it were done across the board, i.e. including KNOWN genes, then this would make Vega
prep easier.
6/ Best in Genome matches
jla1 also said she would like to see "best in genome" displayed. jgrg said this
is not easy as Otterlace works on a clone by clone basis. It was agreed
that it would be worthwhile to show at least "best in clone" or better to do
a crude "best in genome".
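
One possible reading of the "inconsistent matches" bump mode in item 0/ is sketched below in Python. It assumes a match is consistent when every gap between its aligned blocks coincides exactly with an intron of the transcript; that rule, the data layout and the function names are assumptions for illustration only. Under this simple rule a single-block match is always treated as consistent, even if it runs across an intron.

```python
def introns(blocks):
    """Return the gaps between consecutive (start, end) blocks, inclusive coords."""
    blocks = sorted(blocks)
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])]


def is_consistent(match_blocks, transcript_exons):
    """True if every gap in the match is exactly an intron of the transcript."""
    known = set(introns(transcript_exons))
    return all(gap in known for gap in introns(match_blocks))


def inconsistent_matches(matches, transcript_exons):
    """Names of the matches that the bump mode would leave on display."""
    return [name for name, blocks in matches.items()
            if not is_consistent(blocks, transcript_exons)]
```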
------------------------------------------------------------------------------
BACK-BURNER ITEMS
ZMap/acedb
----------
1/ Interface issues:
jla1 and lw2 said they would like the marked area to be less obvious and also to
be a "greying" out rather than blue and with less dense dots. edgrif to implement.
jla1 said she would like to be able to click on an exon and see evidence (and
transcripts ?) with the same splice be highlighted. laurens also wants this
as it would often avoid having to open dotter to check.
2/ Display of multiple compara alignments
multiple alignments: edgrif is about a third of the way through implementing a
more general way of displaying arbitrary blocks. This will become a high
priority item as we move to haplotypes etc.
th said this would be needed soon so it should be moved up the priority list.
jgrg said they have mappings in lace that could be passed on to zmap easily
and also said that annotators can already annotate assemblies from variants
and different species alongside each other as needed.
We need to decide on the format for specifying the alignments.
3/ alternative translations: edgrif about half way through code to do this.
edgrif is doing this as part of the protein search code since this code
does translations itself. edgrif will talk to jgrg about how alternative
genetic codes can be specified with acedb.
We need a test database for this. jgrg said this would come soon.
edgrif will add field to transcript feature to hold alternative translation
table.
4/ Blixem enhancements
two areas:
- display multiple overlapping transcripts better (includes removing the many
yellow lines introduced by this...clarify this point), have a scrolled window
of the transcripts. jgrg said that perhaps only the transcripts made by havana
should be displayed. jla1 said she would like to be able to dynamically update
the transcripts displayed.
- better interaction with zmap, e.g. click on things in zmap and see them
highlighted in blixem and vice versa....
we had better have a more generalised protocol for communicating with external
programs....
- blixem: dna searching is NOT DONE, edgrif to expedite. Also protein searches
will be added.
Perhaps one way to get this done would be to employ a good C programmer on a
short contract.
5/ acedb server performance
edgrif investigating two possibilities for improving performance:
- make sgifaceserver stream data rather than batch it up, would
save a lot of memory.
- deferred loading, only load features when needed and load in
zone requested by user....design done...now need to implement.
6/ A new canvas
rds has been looking at alternative canvas implementations which offer an MVC
model. He has managed to get goocanvas developers to fix some bugs and make
some changes to support our needs.
the goocanvas MVC model will mean we do not have to copy data to split windows
meaning greatly reduced memory usage.
the goocanvas will cope automatically with the X Windows window size limit, this
combined with changes in the gtk scrolling model means we will be able to do away
with having two scroll bars.
We will introduce the new canvas this year.
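
The memory argument for streaming in item 5/ can be illustrated with a small Python sketch. It is not sgifaceserver code (which is C); the record layout and function names are assumptions. The batched version builds the whole reply before sending anything, while the streamed version only ever holds one formatted record.

```python
def format_feature(name, start, end):
    """Format one feature as a tab-separated text record."""
    return f"{name}\t{start}\t{end}\n"


def reply_batched(features):
    """Build the complete reply in memory before it is sent."""
    return "".join(format_feature(*f) for f in features)


def reply_streamed(features, write):
    """Send each record as soon as it is formatted; returns the record count.

    `features` can be any iterable, including a lazy database cursor, and
    `write` is whatever pushes bytes to the client (a hypothetical callback).
    """
    count = 0
    for f in features:
        write(format_feature(*f))
        count += 1
    return count
```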
Otterlace
---------
1/ Alternative alignment programs
There has been some discussion about using splice aware alignment programs.
jgrg is waiting for a fix to exonerate to support the new pipeline mustapha
has written.
edgrif and jgrg both commented that some changes to acedb data structures
would be needed to represent both HSPs that are "joined up" and also
protein matches that start part of the way through a peptide. BUT one
possibility would be for zmap to access this data directly from a mysql
database thus sidestepping the need to put it in acedb first. gffv3 will also
be needed to represent this kind of joined up HSP data in a natural and
robust way.
Changes will also be required to represent codons that are spliced across
introns as perhaps surprisingly none of the acedb programs can cope with
this currently (and neither can zmap).
2/ Spell checker
jla1 reported a problem that free text fields and some fixed text fields
have misspellings (is that a mis-spelling ?) and it would be good to have
some autocorrection facility. The ideal would be to have some widget that
allowed other dictionaries (e.g. science) to be attached to it and could thus
be used as a general text entry tool.
3/ Sequence exceptions
kj2 raised the subject of how to indicate sequence exceptions,
e.g. when bases are skipped in translations. kj2 wondered if alternative
translations could be registered as sequence exceptions, edgrif said he
would prefer a separate mechanism as much of the code is already done for this.
We should therefore include a mechanism in zmap for sequence exceptions,
this would require a similar mechanism in acedb. This is yet another reason
for GFF 3 which has standards for frame shifts and other things.
There should be a way of tagging transcripts where there are sequence
exceptions.
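
For the codons spliced across introns mentioned under "Alternative alignment programs", a minimal Python sketch of the idea is to join the exon sequences before translating, so that a codon split by an intron is read as one unit. This assumes forward-strand exons with 1-based inclusive coordinates and the standard genetic code; the function names are hypothetical and this is not how acedb or zmap represent the data.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def spliced_cds(genome, exons):
    """Concatenate exon sequences; exons are 1-based inclusive (start, end)."""
    return "".join(genome[start - 1:end] for start, end in sorted(exons))


def translate(cds, table=STANDARD):
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(table[cds[i:i + 3]] for i in range(0, usable, 3))


# Exon 1 ends with "GC" and exon 2 starts with "T": the codon GCT spans the intron.
genome = "ATGGC" + "GTAAAG" + "TTAA"
peptide = translate(spliced_cds(genome, [(1, 5), (12, 15)]))
```

Passing a different `table` is one way an alternative genetic code could be plugged in.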
------------------------------------------------------------------------------
Next Meeting
Will be at 2pm, 29th January 2009
==============================================================================

ZMAP_LACE_PROJECT/zmap_lace.2009_01_29 (deleted, 100755 → 0)
==============================================================================
ZMap/Otterlace Development
Date: Thursday 29th Jan 2009
Attendees: jgrg, jla1, lw2, kj2, edgrif, st3
------------------------------------------------------------------------------
CURRENT ITEMS
Items Completed
---------------
<NONE ?>
High priority
-------------
1/ Tick boxes for controlled vocabulary
jla1 said there is an urgent need to add "tick boxes" to the lace interface to
ensure that certain properties of annotated features can only be chosen from
a controlled vocabulary. lw2 to check whether "fragmented_loci" is included
in the tags. lw2 said all other tags are in the RT ticket: NNNNNNNNN
1a/ Locus Finished button
st3 asked if there could be a tag on a Locus to say it was Finished,
implemented via a button so that the correct tag(s) were automatically
entered. jgrg to implement.
1b/ Clone Finished button
st3 would like a "Clone finished" button with the same function as the Locus Finished
button. jgrg to implement.
2/ Clone summary info/Automating DE line creation / Quality Control
jgrg has done lots of work on this and summarised his progress in an
email circulated to us all.
There is a script for automating this which kj2 wrote for zebrafish,
currently it must be run from the command line but jgrg is integrating
into the clone editing window in lace.
Following on jla1 also suggested that it would be good to have
automated QC scripts trawling through the database regularly looking for
duff data. Tina Eyre wrote one that could be co-opted and st3 also has
some. This is becoming an important issue for Havana to ensure really
good quality data. Add automated checking against SwissProt for CDS.
jgrg said that much of the checking was done for annotation and he will circulate
an email summarising this. QC for saving data back to Otterlace still needs doing though.
3/ Data for zebrafish DAS tracks needs mapping between assemblies, jgrg
said Mustapha has done this but he is not sure for which assemblies.
Not currently possible because there is no currently finished assembly.
4/ SNP tracks
jla1 would like some of the DAS tracks & other data sources currently available
to be put into lace and hence zmap. jgrg said that this is not immediately
straightforward as they don't all say which assembly they are based on but some
can be done fairly soon. e.g. comparacon ? jgrg to investigate.
5/ Styles
James working to introduce this now, zmap code is all there. jgrg is to set a week
when he can work on this and edgrif will set aside the week also.
6/ Solexa reads
kj2 and br2 would like to get Solexa reads into pipeline but this is a lot
of data and will require zmap to be able to do dynamic fetches of subranges of
data otherwise we will be swamped by it. rds is to do some design work on the
dynamic loading. Initially we could only load those alignments within a marked
range. Mustapha is working on this.
As an addition to this edgrif and rds will think about how we might give some
kind of "overview" for alignment columns that could show where the aligns are
without drawing them all.
In fact John Collins has initial data for gene models and confirmed introns
that can be added now without code changes. This data is coming from Simon
Whitehead.
7/ Alias/renaming of Loci
lw2 to contact MGI as there are problems with IDs from them.
8/ clone path
lw2 would like the full clone extents displayed with the non-golden sections displayed.
Do we need the clone ends information for this, edgrif to check ?? Check with Leo's
smapped example with several sections of a single clone...
Would like this info. in navigator panel + navigator panel needs to display both
the foocanvas scrolled window area _and_ the actual area on the screen, and both
should be draggable...
edgrif will do tile path information/display, rds will do navigator bit.
9/ multi-view interactions
kj2 would like to click on a feature in one view and see it highlighted in another
so that she can look for genes present in more than one clone. edgrif to do this.
10/ RT numbers
It was agreed that where possible RT ticket numbers would be included in the
meeting notes. lw2, edgrif, jgrg to look up numbers.
11/ feature grouping tags (e.g. for 5'and 3' EST read pairs)
wormdb uses paired tags specific to EST read pairs but we need a more flexible
generalisation of this to handle multiple features and different types of
feature. jgrg's group have been working on filtering hits in a better way and
so have more information about grouping for display.
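
The generalised grouping asked for in item 11/ could look something like the Python sketch below: any number of features, of any type, share a group identifier under one generic tag. The tag name `Group`, the feature dictionaries and the function name are purely hypothetical and do not reflect an agreed database schema.

```python
from collections import defaultdict


def group_features(features, tag="Group"):
    """Collect feature names by the value of a shared grouping tag.

    Returns (groups, ungrouped): a dict of group id -> feature names, and
    the names of features that carry no such tag.
    """
    groups = defaultdict(list)
    ungrouped = []
    for feature in features:
        key = feature.get("tags", {}).get(tag)
        if key is None:
            ungrouped.append(feature["name"])
        else:
            groups[key].append(feature["name"])
    return dict(groups), ungrouped
```

A 5'/3' EST read pair is then just a group with two members, while other groupings can have more.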
Medium priority
---------------
0/ new column bump to show inconsistent matches
Often an annotator has many matches that fit against an existing transcript; it would be
good to have a mode that hid these and only showed the ones inconsistent with the
transcript's splices.
1/ dotter error messages
lw2 said that sometimes dotter just does not appear. edgrif to check that dotter
is reporting errors properly and to make sure they show in dialog windows not on
the terminal which is often not available to the annotator.
2/ removing evidence already used *************
annotators would like to be able to remove from display homologies that
have already been used to annotate variants etc. Does this need to be
persistent in the database in some way ?? edgrif & jgrg will get
together to arrange this via styles so it can persist in a natural way
in the database.
**24526: Showing which evidence has been used
Differential coloring of matches that have been used already as evidence
for a transcript
mainly requires jgrg to mark features and then tell zmap to move the features
to a new column or repaint them with a new style.
3/ Locus list
jgrg to provide a list of loci as another tab window. + searching on ensembl ids.
5/ bug in acedb server
jgrg raised a bug in the server which was causing it to run out of memory, edgrif
to investigate. There is a ticket for this: 51894
edgrif to make sure jgrg has up-to-date binaries for dotter etc.
6/ popups/labels for transcripts
jla1 said that apollo had a neat way of showing a label for a transcript
that remained in one place on the screen as the window was scrolled. edgrif
to investigate + look at "tool tips" for transcripts....especially with
locus information.
7/ Naming of Alternative Alleles
st3 asked about naming of alternative alleles in different mouse strains / human
haplotypes. Loci that don't have HGNC/MGI names are incorrectly named after
the clones on the reference sequence. jla1 suggested correctly naming them after the
clones they are on, but making sure that the annotators can see the associated
'reference assembly' gene. st3 said this could be done via the alt_allele table, and
if it were done across the board, i.e. including KNOWN genes, then this would make Vega
prep easier.
8/ Best in Genome matches
jla1 also said she would like to see "best in genome" displayed. jgrg said this
is not easy as Otterlace works on a clone by clone basis. It was agreed
that it would be worthwhile to show at least "best in clone" or better to do
a crude "best in genome".
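
For the "dotter error messages" item, the general pattern of capturing a child program's failure and handing it to a dialog instead of the terminal can be sketched in Python as below. zmap itself is C/GTK, so this is only an illustration; `show_error` stands in for whatever opens a dialog window and is an assumed callback.

```python
import subprocess
import sys


def run_and_report(argv, show_error):
    """Run an external program; report any failure through show_error().

    Returns True on success. Nothing is written to the terminal, which the
    annotator often cannot see.
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True)
    except OSError as exc:
        # The program could not be started at all (missing binary, permissions).
        show_error(f"Could not start {argv[0]}: {exc}")
        return False
    if result.returncode != 0:
        detail = result.stderr.strip() or "no error output"
        show_error(f"{argv[0]} exited with status {result.returncode}: {detail}")
        return False
    return True


# Usage: collect messages in a list where a GUI would pop up a dialog.
messages = []
failing = [sys.executable, "-c", "import sys; sys.stderr.write('bad input'); sys.exit(3)"]
ok = run_and_report(failing, messages.append)
```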
------------------------------------------------------------------------------
BACK-BURNER ITEMS
ZMap/acedb
----------
1/ Interface issues:
jla1 and lw2 said they would like the marked area to be less obvious and also to
be a "greying" out rather than blue and with less dense dots. edgrif to implement.
jla1 said she would like to be able to click on an exon and see evidence (and
transcripts ?) with the same splice be highlighted. laurens also wants this
as it would often avoid having to open dotter to check.
2/ Display of multiple compara alignments
multiple alignments: edgrif is about a third of the way through implementing a
more general way of displaying arbitrary blocks. This will become a high
priority item as we move to haplotypes etc.
th said this would be needed soon so it should be moved up the priority list.
jgrg said they have mappings in lace that could be passed on to zmap easily
and also said that annotators can already annotate assemblies from variants
and different species alongside each other as needed.
We need to decide on the format for specifying the alignments.
3/ alternative translations: edgrif about half way through code to do this.
edgrif is doing this as part of the protein search code since this code
does translations itself. edgrif will talk to jgrg about how alternative
genetic codes can be specified with acedb.
We need a test database for this. jgrg said this would come soon.
edgrif will add field to transcript feature to hold alternative translation
table.
4/ Blixem enhancements
two areas:
- display multiple overlapping transcripts better (includes removing the many
yellow lines introduced by this...clarify this point), have a scrolled window
of the transcripts. jgrg said that perhaps only the transcripts made by havana
should be displayed. jla1 said she would like to be able to dynamically update
the transcripts displayed.
- better interaction with zmap, e.g. click on things in zmap and see them
highlighted in blixem and vice versa....
we had better have a more generalised protocol for communicating with external
programs....
- blixem: dna searching is NOT DONE, edgrif to expedite. Also protein searches
will be added.
Perhaps one way to get this done would be to employ a good C programmer on a
short contract.
5/ acedb server performance
edgrif investigating two possibilities for improving performance:
- make sgifaceserver stream data rather than batch it up, would
save a lot of memory.
- deferred loading, only load features when needed and load in
zone requested by user....design done...now need to implement.
6/ A new canvas
rds has been looking at alternative canvas implementations which offer an MVC
model. He has managed to get goocanvas developers to fix some bugs and make
some changes to support our needs.
the goocanvas MVC model will mean we do not have to copy data to split windows
meaning greatly reduced memory usage.
the goocanvas will cope automatically with the X Windows window size limit, this
combined with changes in the gtk scrolling model means we will be able to do away
with having two scroll bars.
We will introduce the new canvas this year.
Otterlace
---------
1/ Alternative alignment programs
There has been some discussion about using splice aware alignment programs.
jgrg is waiting for a fix to exonerate to support the new pipeline mustapha
has written.
edgrif and jgrg both commented that some changes to acedb data structures
would be needed to represent both HSPs that are "joined up" and also
protein matches that start part of the way through a peptide. BUT one
possibility would be for zmap to access this data directly from a mysql
database thus sidestepping the need to put it in acedb first. gffv3 will also
be needed to represent this kind of joined up HSP data in a natural and
robust way.
Changes will also be required to represent codons that are spliced across
introns as perhaps surprisingly none of the acedb programs can cope with
this currently (and neither can zmap).
2/ Spell checker
jla1 reported a problem that free text fields and some fixed text fields
have misspellings (is that a mis-spelling ?) and it would be good to have
some autocorrection facility. The ideal would be to have some widget that
allowed other dictionaries (e.g. science) to be attached to it and could thus
be used as a general text entry tool.
3/ Sequence exceptions
kj2 raised the subject of how to indicate sequence exceptions,
e.g. when bases are skipped in translations. kj2 wondered if alternative
translations could be registered as sequence exceptions, edgrif said he
would prefer a separate mechanism as much of the code is already done for this.
We should therefore include a mechanism in zmap for sequence exceptions,
this would require a similar mechanism in acedb. This is yet another reason
for GFF 3 which has standards for frame shifts and other things.
There should be a way of tagging transcripts where there are sequence
exceptions.
------------------------------------------------------------------------------
Next Meeting
Will be at 2pm, 12th February 2009
==============================================================================