biogene
[Original poster]:
[Solved] Splitting documents, part 2 (splitting a .gb file from NCBI)
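Sequence records from the National Center for Biotechnology Information (NCBI) are also available in another format. The downloaded file has the extension .gb and holds many sequence records in a single file, which makes it awkward to work with. Could you please check whether it can be split with the same approach used for splitting plain text files? (The .gb file can be opened with WordPad.)
Note: every sequence record begins with 'LOCUS ' and ends with a '//' line, after which the next record starts.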
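The record boundaries described above can be sketched as follows. This is a minimal Python illustration, not the batch solution the thread asks for. `iter_records` is a hypothetical helper name, and it assumes a record starts at a line beginning with `LOCUS` and ends at a line containing only `//`.

```python
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each record as a list of lines, from 'LOCUS' through '//'."""
    record: List[str] = []
    for line in lines:
        stripped = line.rstrip("\r\n")
        if not record and not stripped.startswith("LOCUS"):
            continue  # skip blank lines or stray text between records
        record.append(stripped)
        if stripped.strip() == "//":
            yield record
            record = []
    if record:
        yield record  # tolerate a final record with no '//' terminator


# Shortened stand-in for the sample text (assumption: real records are longer).
sample = [
    "LOCUS AY727883 1662 bp RNA linear VRL 13-DEC-2005",
    "ORIGIN",
    "//",
    "",
    "LOCUS AY727882 1662 bp RNA linear VRL 13-DEC-2005",
    "//",
]
records = list(iter_records(sample))
print(len(records))
print(records[0][0])
```

Because the generator keys on `LOCUS` and `//`, it gives the same result whether or not blank lines sit between records.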
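Requirements:
1. Split the file at the blank lines into individual files, one per record.
2. Extract the string that follows 'LOCUS ' in each record, e.g. 'AY727883', and use it as the file name.
3. Give the split files the extension .seq.
Many thanks to everyone.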
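The three requirements can be sketched together in Python. This is only an illustration under stated assumptions, not the batch script the thread is asking for. `split_genbank` is a hypothetical function name. The sketch ends a record at its `//` line rather than at a blank line, since the sample text shows `//` terminators, and it replaces any character in the LOCUS name that might be unsafe in a file name.

```python
import os
import re
import tempfile


def split_genbank(path: str, out_dir: str, ext: str = ".seq") -> list:
    """Write each LOCUS...// record to <out_dir>/<locus name><ext>."""
    written = []
    out = None
    with open(path, "r", encoding="ascii", errors="replace") as src:
        for line in src:
            if out is None:
                m = re.match(r"LOCUS\s+(\S+)", line)
                if not m:
                    continue  # blank line or text outside any record
                # Keep only characters that are safe in a file name.
                name = re.sub(r"[^A-Za-z0-9._-]", "_", m.group(1))
                target = os.path.join(out_dir, name + ext)
                out = open(target, "w", encoding="ascii", errors="replace")
                written.append(target)
            out.write(line)
            if line.strip() == "//":
                out.close()  # record finished; wait for the next LOCUS
                out = None
    if out is not None:
        out.close()  # last record had no '//' terminator
    return written


# Demo on a tiny stand-in file (assumption: real records are much longer).
demo_dir = tempfile.mkdtemp()
demo_gb = os.path.join(demo_dir, "sequences.gb")
with open(demo_gb, "w", encoding="ascii") as f:
    f.write("LOCUS AY727883 1662 bp RNA linear VRL 13-DEC-2005\nORIGIN\n//\n\n"
            "LOCUS AY727882 1662 bp RNA linear VRL 13-DEC-2005\nORIGIN\n//\n")
files = split_genbank(demo_gb, demo_dir)
print([os.path.basename(p) for p in files])
```

Streaming line by line keeps memory use flat, which matters little for a 2,125 KB file but costs nothing. If two records shared a LOCUS name, the later one would overwrite the earlier file, so a real script might want to check for that.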
Sample file download: http://biogene.ys168.com/?jdfwkey=wwmfl1, file name sequences.gb, 2,125 KB, "China DOS Union help request 2: NCBI gb-format file"
Sample text:
LOCUS AY727883 1662 bp RNA linear VRL 13-DEC-2005
DEFINITION Newcastle disease virus isolate 88T.00 fusion protein (F) gene,
complete cds.
ACCESSION AY727883
VERSION AY727883.1 GI:52145385
KEYWORDS .
SOURCE Newcastle disease virus
ORGANISM Newcastle disease virus
Viruses; ssRNA negative-strand viruses; Mononegavirales;
Paramyxoviridae; Paramyxovirinae; Avulavirus.
REFERENCE 1 (bases 1 to 1662)
AUTHORS Zanetti,F., Berinstein,A., Pereda,A., Taboga,O. and Carrillo,E.
TITLE Molecular characterization and phylogenetic analysis of Newcastle
disease virus isolates from healthy wild birds
JOURNAL Avian Dis. 49 (4), 546-550 (2005)
PUBMED 16404997
REFERENCE 2 (bases 1 to 1662)
AUTHORS Zanetti,F., Berinstein,A. and Carrillo,E.
TITLE Direct Submission
JOURNAL Submitted (18-AUG-2004) INTA, Castelar, Instituto de Biotecnologia,
CICVyA, Los Reseros y Las Cabanas, Buenos Aires 1712, Argentina
FEATURES Location/Qualifiers
source 1..1662
/organism="Newcastle disease virus"
/mol_type="genomic RNA"
/isolate="88T.00"
/host="flamingo"
/db_xref="taxon:11176"
/country="Argentina"
gene 1..1662
/gene="F"
CDS 1..1662
/gene="F"
/note="structural glycoprotein; contains fusion protein
cleavage site"
/codon_start=1
/product="fusion protein"
/protein_id="AAU29401.1"
/db_xref="GI:52145386"
/translation="MGSRPSTKNPAPMMLTIRVALVLSCICPANSIDGRPLAAAGIVV
TGDKAVNIYTSSQTGSIIVKLLPNLPKDKEACAKAPLDAYNRTLTTLLTPLGDSIRRI
QESVTTSGGGRQGRLIGAIIGGVALGVATAAQITAAAALIQAKQNAANILRLKESIAA
TNEAVHEVTDGLSQLAVAVGKMQQFVNDQFNKTAQELDCIKIAQQVGVELNLYLTELT
TVFGPQITSPALNKLTIQALYNLAGGNMDYLLTKLGVGNNQLSSLIGSGLITGNPILY
DSQTQLLGIQVTLPSVGNLNNMRATYLETLSVSTTRGFASALVPKVVTQVGSVIEELD
TSYCIETDLDLYCTRIVTFPMSPGIYSCLSGNTSACMYSKTEGALTTPYMTIKGSVIA
NCKMTTCRCVNPPGIISQNYGEAVSLIDKQSCNVLSLGGITLRLSGEFDVTYQKNISI
QDSQVIITGNLDISTELGNVNNSISNALNKLEESNRKLDKVNVKLTSTSALITYIVLT
IISLVFGILSLILACYLMYKQKAQQKTLLWLGNNTLDQMRATTKM"
ORIGIN
1 atgggctcca gaccttctac caagaaccca gcacctatga tgctgactat ccgggttgcg
61 ctggtactga gttgcatctg tccggcaaac tccattgatg gcaggcctct tgcagctgca
121 ggaattgtgg ttacaggaga caaagccgtc aacatataca cctcatccca gacaggatca
181 atcatagtta agctcctccc gaatctgccc aaggataagg aggcatgtgc gaaagccccc
241 ttggatgcat acaacaggac attgaccact ttgctcaccc cccttggtga ctctatccgt
301 aggatacaag agtctgtgac tacatctgga ggggggagac aggggcgcct tataggcgcc
361 attattggcg gtgtggctct tggggttgca actgccgcac aaataacagc ggccgcagct
421 ctgatacaag ccaaacaaaa tgctgccaac atcctccgac ttaaagagag cattgccgca
481 accaatgagg ctgtgcatga ggtcactgac ggattatcgc aactagcagt ggcagttggg
541 aagatgcagc agtttgttaa tgaccaattt aataaaacag ctcaggaatt agactgcatc
601 aaaattgcac agcaagttgg tgtagagctc aacctgtacc taaccgaatt gactacagta
661 ttcggaccac aaatcacttc acctgcttta aacaagctga ctattcaggc actttacaat
721 ctagctggtg gaaatatgga ttacttattg actaagttag gtgtagggaa caatcaactc
781 agctcattaa tcggtagcgg cttaatcacc ggtaacccta ttctatacga ctcacagact
841 caactcttgg gtatacaggt aactctacct tcagtcggga acctaaataa tatgcgtgcc
901 acctacttgg aaaccttatc cgtaagcaca accaggggat ttgcctcggc acttgtccca
961 aaagtggtga cacaggtcgg ttctgtgata gaagaacttg acacctcata ctgtatagaa
1021 actgacttag atttatattg tacaagaata gtaacgttcc ctatgtcccc tggtatttat
1081 tcctgcttga gcggcaatac gtcggcctgt atgtactcaa agaccgaagg cgcacttact
1141 acaccataca tgactatcaa aggttcagtc atcgccaact gcaagatgac aacatgtaga
1201 tgtgtaaacc ccccgggtat catatcgcaa aactatggag aagccgtgtc tctaatagat
1261 aaacaatcat gcaatgtttt atccttaggc gggataactt taaggctcag tggggaattc
1321 gatgtaactt atcagaagaa tatctcaata caagattctc aagtaataat aacaggcaat
1381 cttgatatct caactgagct tgggaatgtc aacaactcga tcagtaatgc tttgaataag
1441 ttagaggaaa gcaacagaaa actagacaaa gtcaatgtca aactgactag cacatctgct
1501 ctcattacct atatcgtttt gactatcata tctcttgttt ttggtatact tagcctgatt
1561 ctagcatgct acctaatgta caagcaaaag gcgcaacaaa agaccttatt atggcttggg
1621 aataatactc tagatcagat gagagccact acaaaaatgt ga
//
LOCUS AY727882 1662 bp RNA linear VRL 13-DEC-2005
DEFINITION Newcastle disease virus isolate 126C.00 fusion protein (F) gene,
complete cds.
ACCESSION AY727882
VERSION AY727882.1 GI:52145383
KEYWORDS .
SOURCE Newcastle disease virus
ORGANISM Newcastle disease virus
Viruses; ssRNA negative-strand viruses; Mononegavirales;
Paramyxoviridae; Paramyxovirinae; Avulavirus.
REFERENCE 1 (bases 1 to 1662)
AUTHORS Zanetti,F., Berinstein,A., Pereda,A., Taboga,O. and Carrillo,E.
TITLE Molecular characterization and phylogenetic analysis of Newcastle
disease virus isolates from healthy wild birds
JOURNAL Avian Dis. 49 (4), 546-550 (2005)
PUBMED 16404997
REFERENCE 2 (bases 1 to 1662)
AUTHORS Zanetti,F., Berinstein,A. and Carrillo,E.
TITLE Direct Submission
JOURNAL Submitted (18-AUG-2004) INTA, Castelar, Instituto de Biotecnologia,
CICVyA, Los Reseros y Las Cabanas, Buenos Aires 1712, Argentina
FEATURES Location/Qualifiers
source 1..1662
/organism="Newcastle disease virus"
/mol_type="genomic RNA"
/isolate="126C.00"
/host="swan"
/db_xref="taxon:11176"
/country="Argentina"
gene 1..1662
/gene="F"
CDS 1..1662
/gene="F"
/note="structural glycoprotein; contains fusion protein
cleavage site"
/codon_start=1
/product="fusion protein"
/protein_id="AAU29400.1"
/db_xref="GI:52145384"
/translation="MGSGSSTKIPIPLMLTVRVALALSCARLASSLDGRPLAAAGIVV
TGDKAVNIYTSSQTGSIIVKLLPNMPKDKEACAKAPLEAYNRTLTTLLTPLGDSIHRI
QESVTTSGGGKQGRLIGAIIGGVALGVATAAQITAASALIQANQNAANILRLKESIAA
TNEAVHEVTDGLSQLAVAVGKMQQFVNDQFNKTAQELDCIKIAQQVGVELNLYLTELT
TVFGPQITSPALTQLTIQALYNLAGGNMDYLLTKLGVGNNQLSSLIGTGLITGNPILY
DSQTQLLGIQVTLPSVGNLNNMRATYLETLSVSTTKGFASALVPKVVTQVGSVIEELD
TSYCIETDLDLYCTRIVTFPMSPGIYSCLSGNTSACMYSQTEGALTTPYMTLKGSVIA
KSKMTTCRCADPPGIISQNYGEAVSLIDKQSCSILSLDGITLRLSGEFDATYQKNISI
QDSQVIVTGNLDISTELGNVNNSISNALDKLEESNSKLDKVNVKLTSTSALITYIVLT
VISLICGILSLVLACYLMYKQKAQQKTLLWLGNNTLDQMRATTKM"
ORIGIN
1 atgggctccg gatcttctac caagattccg atacctctga tgctgaccgt ccgggtcgca
61 ctggcactaa gttgcgcccg tctggcaagc tctcttgatg gcaggcctct tgccgctgcg
121 ggaattgtgg tgacaggaga caaggcagtc aacatatata cctcatctca gactgggtca
181 atcatagtca agctactccc gaatatgccc aaggataaag aggcatgtgc aaaagccccc
241 ttagaggcat acaacaggac attgaccact ttgctcaccc cccttggtga ttccattcat
301 aggatacaag agtctgtgac cacatctgga ggagggaaac agggacgcct gataggtgct
361 attattggcg gtgtagctct tggggttgca actgccgcac aaataacagc agcctcggct
421 ctgatacaag caaaccaaaa tgctgctaac atcctccggc ttaaagaaag cattgccgct
481 accaatgagg ctgtgcatga ggtcactgac ggattatccc aactagcagt ggcagttggg
541 aagatgcagc agtttgttaa tgaccaattt aataaaacag ctcaggaact agactgtatc
601 aaaattgccc agcaggttgg tgtagagctc aacctgtacc taactgaatt gactacggta
661 tttgggccac aaatcacttc tcctgcccta acccagctga ctatccaggc actttataat
721 ctagctggtg ggaatatgga ttatttgttg actaagttag gtgtgggaaa caatcaactc
781 agttcattaa ttggtaccgg cttaatcact ggcaacccta ttctgtatga ctcacaaact
841 caactcttgg gtatacaggt gaccttaccc tcagtcggga acctaaataa tatgcgtgcc
901 acctacttag agaccctgtc cgtaagtaca accaaaggat ttgcctcagc actcgtgcct
961 aaagtggtga ctcaggtcgg ttctgtgata gaagaacttg acacctcata ttgtatagaa
1021 accgacttag atttatattg tacaaggata gtgacattcc ctatgtctcc tggtatctat
1081 tcctgcctga gcggcaatac atcggcttgc atgtactcgc agactgaagg tgcacttacc
1141 acgccatata tgactctcaa aggttcagtt attgccaagt ctaagatgac aacatgtaga
1201 tgtgcggacc ccccgggtat catatcacag aattatggag aagccgtgtc tctgatagat
1261 aagcagtcat gcagtatcct atctttagac gggataactt tgaggctcag tggggaattt
1321 gacgctactt atcagaagaa tatctcaata caggactctc aagtaatagt aacaggtaat
1381 cttgatatct caactgaact tgggaatgtt aacaactcga taagtaatgc tttagataaa
1441 ttagaagaaa gcaacagcaa actagacaaa gtcaacgtca aattgactag cacatctgct
1501 ctcattacct atatcgtttt aactgtcata tctcttattt gtggtatact tagtctagtt
1561 ctagcgtgct acttaatgta caagcagaag gcacaacaaa agaccttatt atggcttggg
1621 aataatactc tagatcagat gagagccacc acaaaaatgt ga
//
LOCUS AY727881 1662 bp RNA linear VRL 13-DEC-2005
DEFINITION Newcastle disease virus isolate 32C/T.98 fusion protein (F) gene,
complete cds.
ACCESSION AY727881
VERSION AY727881.1 GI:52145381
KEYWORDS .
SOURCE Newcastle disease virus
ORGANISM Newcastle disease virus
Viruses; ssRNA negative-strand viruses; Mononegavirales;
Paramyxoviridae; Paramyxovirinae; Avulavirus.
REFERENCE 1 (bases 1 to 1662)
AUTHORS Zanetti,F., Berinstein,A., Pereda,A., Taboga,O. and Carrillo,E.
TITLE Molecular characterization and phylogenetic analysis of Newcastle
disease virus isolates from healthy wild birds
JOURNAL Avian Dis. 49 (4), 546-550 (2005)
PUBMED 16404997
REFERENCE 2 (bases 1 to 1662)
AUTHORS Zanetti,F., Berinstein,A. and Carrillo,E.
TITLE Direct Submission
JOURNAL Submitted (18-AUG-2004) INTA, Castelar, Instituto de Biotecnologia,
CICVyA, Los Reseros y Las Cabanas, Buenos Aires 1712, Argentina
FEATURES Location/Qualifiers
source 1..1662
/organism="Newcastle disease virus"
/mol_type="genomic RNA"
/isolate="32C/T.98"
/host="duck"
/db_xref="taxon:11176"
/country="Argentina"
gene 1..1662
/gene="F"
CDS 1..1662
/gene="F"
/note="structural glycoprotein; contains fusion protein
cleavage site"
/codon_start=1
/product="fusion protein"
/protein_id="AAU29399.1"
/db_xref="GI:52145382"
/translation="MGSGSSTRIPIPLMLTIRVALALSCVCLASSLDGRPLAAAGIVV
TGDKAVNIYTSSQTGSIIVKLLPNMPKDKEACAKAPLEAYNRTLTTLLTPLGDSIHRI
QESVTTSGGGKQGRLIGAIIGGVALGVATAAQITAASALIQANQNAANILRLKESIAA
TNEAVHEVTDGLSQLAVAVGKMQQFVNDQFNKTAQELDCIKIAQQVGVELNLYLTELT
TVFGPQITSPALTQLTIQALYNLAGGNMDYPLTKLGVGNNQLSLLIGSGLITGHPILY
DSQTQLLGIQVTLPSVGNLNNMRATYLETLSVSTTRGFASALVPKVVTQVGSVIEELD
TSYCIETDLDLYCTRIVTFPMSPGIYSCLSGNTSACMYSQTEGALTTPYMTLKGAVIA
NCKMTTCRCADPPGIISQNYGEAVSLIDKQSCSILSLDGITLRLSGEFDATYQKNISI
QDSQVIVTGNLDISTELGNVNNSISNALDKLEESNSKLDKVNVKLTSTSALITYIVLT
VISLVCGILSLVLACYLMYKQKAQQKTLLWLGNNTLDQMRATTKM"
ORIGIN
1 atgggctccg gatcttctac caggattccg atacctctga tgctgaccat ccgggtcgca
61 ctggcactaa gttgcgtctg tctggcaagc tctcttgatg gcaggcctct tgcagctgca
121 ggaattgtgg tgacaggaga caaggcagtc aacatatata cctcgtctca gactgggtca
181 atcatagtca agctactccc gaatatgccc aaggataaag aggcatgtgc aaaagccccc
241 ttagaggcat acaacaggac attgaccact ttgctcaccc cccttggtga ttccattcat
301 aggatacagg agtctgtgac cacatctgga ggagggaaac agggacgcct gataggcgct
361 attattggcg gtgtagctct tggggttgca actgccgcac aaataacagc agcctcggct
421 ttgatacaag caaaccaaaa tgctgctaac atcctccggc ttaaagaaag cattgccgct
481 accaatgagg ctgtgcatga ggtcactgac ggattatccc aactagcagt ggcagttggg
541 aagatgcagc agtttgttaa tgaccaattt aataaaacag ctcaggaact agactgtatc
601 aaaattgccc agcaggttgg tgtagagctc aacctgtacc taactgaatt gactacggta
661 ttcgggccac aaatcacttc acctgcctta acccagctga ctatccaggc actttataat
721 ctagctggtg ggaatatgga ctacccgttg actaagttag gtgtgggaaa caatcagctc
781 agcttattaa ttggtagcgg cttaatcact ggtcacccta ttctgtatga ctcacaaact
841 caactcttgg gtatacaggt gaccttaccc tcagtcggga acctaaataa tatgcgtgcc
901 acctacttgg agaccttgtc cgtaagtaca accaggggat ttgcctcagc actcgtacct
961 aaagtggtga ctcaggtcgg ctctgtgata gaagaacttg acacctcata ttgtatagaa
1021 accgacttag atttatattg tacaaggata gtgacattcc ctatgtctcc tggtatctat
1081 tcctgcctga gcggcaatac atcggcttgc atgtactcgc agactgaagg tgcacttacc
1141 acgccatata tgactctcaa aggcgcagtt attgccaact gtaagatgac aacatgtaga
1201 tgtgcggacc ccccgggtat catatcacag aattatggag aagccgtgtc tctgatagat
1261 aagcagtcat gcagtatcct atccttagac gggataactt tgaggctcag tggggaattt
1321 gacgctactt atcagaagaa tatctcaata caggactctc aagtaatagt aacaggtaat
1381 cttgatatct caactgaact tgggaatgtt aacaactcga taagtaatgc tttagataaa
1441 ttagaagaaa gcaacagcaa actagacaaa gtcaacgtca aattgactag cacatctgct
1501 ctcattacct atatcgtttt aactgtcata tctcttgttt gtggtatact tagtctagtt
1561 ctagcgtgct acttaatgta caagcagaag gcacaacaaa agaccttatt atggcttggg
1621 aataatactc tagatcagat gagagccacc acaaaaatgt ga
//
...
[ Last edited by biogene on 2009-6-29 at 07:10 ]