Class UniprotProxySequenceReader<C extends Compound>
java.lang.Object
org.biojava3.core.sequence.loader.UniprotProxySequenceReader<C>
- Type Parameters:
C -
- All Implemented Interfaces:
Iterable<C>, DatabaseReferenceInterface, FeaturesKeyWordInterface, Accessioned, ProxySequenceReader<C>, Sequence<C>, SequenceReader<C>
public class UniprotProxySequenceReader<C extends Compound>
extends Object
implements ProxySequenceReader<C>, FeaturesKeyWordInterface, DatabaseReferenceInterface
Pass in a Uniprot ID. When this ProxySequenceReader is passed to a ProteinSequence, it retrieves the
sequence data and the other data elements that Uniprot associates with the protein. This is an
example of how to map external databases of proteins and features to the BioJava3
ProteinSequence. It is important to call setUniprotDirectoryCache to enable caching of the XML files so
that they do not need to be reloaded each time. This class does not manage the cache.
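The caching behaviour described above can be pictured with a small standalone sketch. It is not the BioJava implementation: the class name `XmlDirectoryCacheSketch`, the `accession + ".xml"` file naming, and the `fetcher` function that stands in for the remote download are all assumptions made for illustration. Like the class documented here, the sketch only reads and writes cache files; it never expires or deletes them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

/** Hypothetical illustration of a directory-backed XML cache; not part of BioJava. */
final class XmlDirectoryCacheSketch {
    private final Path cacheDir;
    // Stands in for the remote download; the real reader fetches the XML from Uniprot.
    private final Function<String, String> fetcher;

    XmlDirectoryCacheSketch(Path cacheDir, Function<String, String> fetcher) {
        this.cacheDir = cacheDir;
        this.fetcher = fetcher;
    }

    /** Returns the XML for an accession, reading the cached file when it exists. */
    String load(String accession) {
        // The file naming scheme is an assumption made for this sketch.
        Path file = cacheDir.resolve(accession + ".xml");
        try {
            if (Files.exists(file)) {
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            }
            String xml = fetcher.apply(accession);
            Files.createDirectories(cacheDir);
            Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
            return xml;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a cache directory configured, a second request for the same accession is served from disk and the fetcher is not called again. Removing stale files is left to the caller.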
-
Constructor Summary
Constructors
Constructor, Description
UniprotProxySequenceReader(String accession, CompoundSet<C> compoundSet)
    The uniprot id is used to retrieve the uniprot XML which is then parsed as a DOM object so we know everything about the protein.
-
Method Summary
Modifier and Type, Method, Description
int countCompounds(C... compounds)
    Returns the number of times we found a compound in the Sequence
getAccession()
    Returns the AccessionID this location is currently bound with
getAsList()
    Returns the Sequence as a List of compounds
getCompoundAt(int position)
    Returns the Compound at the given biological index
getCompoundSet()
    Gets the compound set used to back this Sequence
getDatabaseReferences()
    The Uniprot mappings to other database identifiers for this sequence
getGeneName()
    Get the gene name associated with this sequence.
int getIndexOf(C compound)
    Scans through the Sequence looking for the first occurrence of the given compound
getInverse()
    Does the right thing to get the inverse of the current Sequence.
getKeyWords()
    Pull uniprot key words, which are a mixed bag of words associated with this sequence
int getLastIndexOf(C compound)
    Scans through the Sequence looking for the last occurrence of the given compound
int getLength()
    The sequence length
getOrganismName()
    Get the organism name assigned to this sequence
getSequenceAsString()
    Returns the String representation of the Sequence
getSequenceAsString(Integer bioBegin, Integer bioEnd, Strand strand)
getSubSequence(Integer bioBegin, Integer bioEnd)
    Returns a portion of the sequence from the different positions.
static String getUniprotbaseURL()
    The current Uniprot URL to deal with caching issues. www.uniprot.org is load balanced, but you can access pir.uniprot.org directly.
static String getUniprotDirectoryCache()
    Local directory cache of XML that can be downloaded
iterator()
static void main
void setCompoundSet(CompoundSet<C> compoundSet)
void setContents(String sequence)
    Once the sequence is retrieved set the contents and make sure everything is valid
static void setUniprotbaseURL(String aUniprotbaseURL)
static void setUniprotDirectoryCache(String aUniprotDirectoryCache)
toString()
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Constructor Details
-
UniprotProxySequenceReader
The uniprot id is used to retrieve the uniprot XML, which is then parsed as a DOM object so we know everything about the protein. If an error occurs, an exception is thrown; the cause may be a bad uniprot id or a network error.
- Parameters:
accession -
compoundSet -
- Throws:
Exception
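The constructor description mentions parsing the retrieved XML as a DOM object. The following sketch shows that general technique with the JDK's own DOM parser. It is not the BioJava code: the class `UniprotXmlDomSketch`, its helper methods, and the hand-written `SAMPLE_XML` fragment (including the placeholder accession `X00001`) are assumptions for illustration, and real Uniprot XML contains many more elements than this fragment.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical illustration of DOM parsing; not part of BioJava. */
final class UniprotXmlDomSketch {
    // Hand-written, heavily simplified fragment; not a real Uniprot record.
    static final String SAMPLE_XML =
        "<uniprot><entry>"
      + "<accession>X00001</accession>"
      + "<sequence>MKT\nAYI AKQ</sequence>"
      + "</entry></uniprot>";

    /** Parses an XML string into a DOM Document, failing loudly on malformed input. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Returns the text content of the first element with the given tag name. */
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("No <" + tag + "> element found");
        }
        return nodes.item(0).getTextContent();
    }

    /** Extracts the sequence text and strips the whitespace used for line wrapping. */
    static String sequenceOf(Document doc) {
        return firstText(doc, "sequence").replaceAll("\\s+", "");
    }
}
```

Once the document is held as a DOM tree, further lookups such as the accession or the sequence are plain element queries, which is why a single parse can serve several accessor methods.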
-
-
Method Details
-
setCompoundSet
- Specified by:
setCompoundSet in interface SequenceReader<C extends Compound>
-
setContents
Once the sequence is retrieved set the contents and make sure everything is valid
- Specified by:
setContents in interface SequenceReader<C extends Compound>
- Parameters:
sequence -
-
getLength
public int getLength()
The sequence length
-
getCompoundAt
Description copied from interface: Sequence
Returns the Compound at the given biological index
- Specified by:
getCompoundAt in interface Sequence<C extends Compound>
- Parameters:
position -
- Returns:
-
getIndexOf
Description copied from interface: Sequence
Scans through the Sequence looking for the first occurrence of the given compound
- Specified by:
getIndexOf in interface Sequence<C extends Compound>
- Parameters:
compound -
- Returns:
-
getLastIndexOf
Description copied from interface: Sequence
Scans through the Sequence looking for the last occurrence of the given compound
- Specified by:
getLastIndexOf in interface Sequence<C extends Compound>
- Parameters:
compound -
- Returns:
-
toString
-
getSequenceAsString
Description copied from interface: Sequence
Returns the String representation of the Sequence
- Specified by:
getSequenceAsString in interface Sequence<C extends Compound>
- Returns:
-
getAsList
Description copied from interface: Sequence
Returns the Sequence as a List of compounds
-
getInverse
Description copied from interface: Sequence
Does the right thing to get the inverse of the current Sequence. This means reversing the Sequence and optionally complementing it.
- Specified by:
getInverse in interface Sequence<C extends Compound>
- Returns:
-
getSequenceAsString
- Parameters:
bioBegin -
bioEnd -
strand -
- Returns:
-
getSubSequence
Description copied from interface: Sequence
Returns the portion of the sequence between the given positions. Positions are indexed from 1.
- Specified by:
getSubSequence in interface Sequence<C extends Compound>
- Parameters:
bioBegin -
bioEnd -
- Returns:
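Because getSubSequence and getCompoundAt are described in terms of biological positions indexed from 1, it may help to see how such positions map onto Java's 0-based strings. The sketch below works on a plain String and is only an illustration: the class `BioIndexSketch` is hypothetical, and treating `bioEnd` as inclusive is an assumption that this page does not state.

```java
/** Hypothetical illustration of 1-based biological indexing; not part of BioJava. */
final class BioIndexSketch {

    /** Returns residues bioBegin..bioEnd, 1-based, with bioEnd assumed to be inclusive. */
    static String subSequence(String sequence, int bioBegin, int bioEnd) {
        if (bioBegin < 1 || bioEnd > sequence.length() || bioBegin > bioEnd) {
            throw new IndexOutOfBoundsException(
                "Invalid range " + bioBegin + ".." + bioEnd + " for length " + sequence.length());
        }
        // Shift the start by one; the inclusive 1-based end equals the exclusive 0-based end.
        return sequence.substring(bioBegin - 1, bioEnd);
    }

    /** Returns the residue at a 1-based biological position. */
    static char compoundAt(String sequence, int position) {
        if (position < 1 || position > sequence.length()) {
            throw new IndexOutOfBoundsException("Invalid position " + position);
        }
        return sequence.charAt(position - 1);
    }
}
```

Under these assumptions, `subSequence("MKTAYIAKQ", 1, 3)` yields the first three residues and position 0 is rejected rather than silently mapped to the first residue.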
-
iterator
-
getCompoundSet
Description copied from interface: Sequence
Gets the compound set used to back this Sequence
- Specified by:
getCompoundSet in interface Sequence<C extends Compound>
- Returns:
-
getAccession
Description copied from interface: Accessioned
Returns the AccessionID this location is currently bound with
- Specified by:
getAccession in interface Accessioned
- Returns:
-
countCompounds
Description copied from interface: Sequence
Returns the number of times we found a compound in the Sequence
- Specified by:
countCompounds in interface Sequence<C extends Compound>
- Parameters:
compounds -
- Returns:
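Since countCompounds accepts a variable number of compounds, one plausible reading is that it returns the total number of positions holding any of the given compounds. The sketch below illustrates that reading on a plain String. The class `CountCompoundsSketch` is hypothetical, and both the use of `char` in place of Compound objects and the summing behaviour are assumptions, not statements about the BioJava implementation.

```java
/** Hypothetical illustration of counting compounds; not part of BioJava. */
final class CountCompoundsSketch {

    /** Counts the positions in the sequence that hold any of the given compounds. */
    static int countCompounds(String sequence, char... compounds) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char residue = sequence.charAt(i);
            for (char compound : compounds) {
                if (residue == compound) {
                    count++;
                    break; // count each position once, even if a compound is listed twice
                }
            }
        }
        return count;
    }
}
```

Calling the method with no compounds returns 0, because no position can match an empty set.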
-
getUniprotbaseURL
The current Uniprot URL to deal with caching issues. www.uniprot.org is load balanced, but you can access pir.uniprot.org directly.
- Returns:
the uniprotbaseURL
-
setUniprotbaseURL
- Parameters:
aUniprotbaseURL - the uniprotbaseURL to set
-
getUniprotDirectoryCache
Local directory cache of XML that can be downloaded
- Returns:
the uniprotDirectoryCache
-
setUniprotDirectoryCache
- Parameters:
aUniprotDirectoryCache - the uniprotDirectoryCache to set
-
main
-
getGeneName
Get the gene name associated with this sequence.
- Returns:
- Throws:
Exception
-
getOrganismName
Get the organism name assigned to this sequence
- Returns:
- Throws:
Exception
-
getKeyWords
Pull uniprot key words, which are a mixed bag of words associated with this sequence
- Specified by:
getKeyWords in interface FeaturesKeyWordInterface
- Returns:
- Throws:
Exception
-
getDatabaseReferences
The Uniprot mappings to other database identifiers for this sequence
- Specified by:
getDatabaseReferences in interface DatabaseReferenceInterface
- Returns:
- Throws:
Exception
-