Tuesday, 3 September 2013

Python print lines from to


I would like to solve the following problem: when the conditions are met for a line, print all lines starting from that line up to that line + a given value.
My code looks like this:
import re
##
def round_down(num):
    return num - (num % 100000)  ### reduce search space
##
##
##
##def Filter(infile, outfile):
##out = open(outfile,'w')
infile = open('AT_rich', 'r')
cov = open('30x_good_ok_bad_0COV', 'r')  ### File with non platinum regions
#platinum_region = [row for row in Pt]
platinum_region = {}  ### create dictionary for non platinum regions. Works fast
platinum_region['chrM'] = {}
platinum_region['chrM'][0] = []
ct = 0
for region in infile:
    (chr, start, end, types, length) = region.strip().split()
    start = int(start)
    end = int(end)
    length = int(length)
    rounded_start = round_down(start)
    ##
    if not (chr in platinum_region):
        platinum_region[chr] = {}
    if not (rounded_start in platinum_region[chr]):
        platinum_region[chr][rounded_start] = []
    platinum_region[chr][rounded_start].append({'start': start, 'end': end, 'length': length})
##
##c=0
for vcf_line in cov:  ### process file with indels
    ## if (c % 1000 ==0):print "c ",c
    ## c=c+1
    vcf_data = vcf_line.strip().split()
    vcf_chrom = vcf_data[0]
    vcf_pos = int(vcf_data[1])
    vcf_end = int(vcf_data[2])
    coverage = int(vcf_data[3])
    rounded_vcf_position = round_down(vcf_pos)  ### round positions to reduce search space
    ## print vcf_chrom
    ## for vcf_line in infile: ###process file with indels
    ## if (c % 1000 ==0):print "c ",c
    overlapping = 'false'
    if vcf_chrom in platinum_region and rounded_vcf_position in platinum_region[vcf_chrom]:
        for region in platinum_region[vcf_chrom][rounded_vcf_position]:
            if vcf_pos == region['start']:  # and vcf_end == region['end'] and (vcf_end > region['start'] and vcf_end < region['end'])
                if vcf_chrom != 'chrX' and vcf_chrom != 'chrY':
                    print(vcf_data)
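The round_down() bucketing in the code above can be written more compactly with collections.defaultdict. This is a minimal sketch, assuming the five-column AT_rich layout (chrom, start, end, type, length); build_index and regions_starting_at are hypothetical helper names, not part of the original code.

```python
from collections import defaultdict

BUCKET = 100000  # same bucket size as round_down() in the question


def round_down(num):
    """Round a position down to the start of its 100 kb bucket."""
    return num - (num % BUCKET)


def build_index(region_lines):
    """Map chrom -> bucket start -> list of region dicts.

    Each line is assumed to have five whitespace-separated columns:
    chrom, start, end, type, length.
    """
    index = defaultdict(lambda: defaultdict(list))
    for line in region_lines:
        chrom, start, end, _type, length = line.split()
        start, end, length = int(start), int(end), int(length)
        index[chrom][round_down(start)].append(
            {'start': start, 'end': end, 'length': length})
    return index


def regions_starting_at(index, chrom, pos):
    """Return the regions on chrom whose start equals pos."""
    # .get() does not trigger the defaultdict factories, so failed lookups
    # do not insert empty entries into the index.
    buckets = index.get(chrom, {})
    return [r for r in buckets.get(round_down(pos), []) if r['start'] == pos]


if __name__ == '__main__':
    sample = [
        'chr17 29126299 29126325 AT_rich 26',
        'chr17 29150075 29150097 AT_rich 22',
    ]
    idx = build_index(sample)
    # Both sample regions fall into bucket 29100000; only one starts at 29150075.
    print(regions_starting_at(idx, 'chr17', 29150075))
```

Using .get() for lookups keeps the nested defaultdicts from growing an empty entry for every chromosome or bucket that is queried but absent.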
Both files are just sets of start-end intervals; the first column (index 0) contains the chromosome, e.g. 'chr1'.
AT_rich file (read as infile in the code):
chr17 29126299 29126325 AT_rich 26
chr17 29150075 29150097 AT_rich 22
chr17 29152367 29152397 AT_rich 30
The last column is region['length'].
Coverage file 30x_good_ok_bad_0COV (read as cov in the code):
chrM 10508 10509 4247
chrM 10509 10510 4251
chrM 10510 10511 4254
chrM 10511 10513 4253
chrM 10513 10515 4254
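A minimal sketch for parsing the two line formats into named records, assuming the column order the code above unpacks (five columns for AT_rich, four for the coverage file); the Region and Coverage field names are hypothetical.

```python
from collections import namedtuple

# Field names are illustrative assumptions based on how the question's code
# unpacks each line; they are not defined anywhere in the original files.
Region = namedtuple('Region', 'chrom start end type length')
Coverage = namedtuple('Coverage', 'chrom start end coverage')


def parse_region(line):
    """Parse a five-column AT_rich line."""
    chrom, start, end, rtype, length = line.split()
    return Region(chrom, int(start), int(end), rtype, int(length))


def parse_coverage(line):
    """Parse a four-column coverage line."""
    chrom, start, end, cov = line.split()
    return Coverage(chrom, int(start), int(end), int(cov))


if __name__ == '__main__':
    print(parse_region('chr17 29126299 29126325 AT_rich 26'))
    print(parse_coverage('chrM 10508 10509 4247'))
```

In the three AT_rich sample rows the last column equals end - start (for example 29126325 - 29126299 = 26).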
I would like to print all lines starting from vcf_data (when the conditions are met) up to vcf_data + region['length']. How can I add this to my code?
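One way to do this, sketched under the assumption that "vcf_data + region['length']" means the matching line plus the lines that follow it, region['length'] lines in total. The coverage file is read into a list so any index can be sliced; windows_after_matches is a hypothetical helper, and the {(chrom, start): length} dictionary is a simplified stand-in for the platinum_region lookup.

```python
def windows_after_matches(cov_lines, region_lengths, skip_chroms=('chrX', 'chrY')):
    """Yield the block of coverage lines that starts at each matching line.

    cov_lines: list of lines from the coverage file.
    region_lengths: {(chrom, start): length}, a simplified stand-in for the
        question's platinum_region lookup (an assumption).
    A line matches when its (chrom, start) pair is a region start, as in the
    question's `vcf_pos == region['start']` test. Each block holds `length`
    lines beginning with the matching line; slice to i + length + 1 instead
    if the line at i + length should be included as well.
    """
    for i, line in enumerate(cov_lines):
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        chrom, pos = fields[0], int(fields[1])
        if chrom in skip_chroms:
            continue
        length = region_lengths.get((chrom, pos))
        if length is not None:
            # Slicing never raises, even if i + length runs past the end.
            yield cov_lines[i:i + length]


if __name__ == '__main__':
    cov = [
        'chrM 10508 10509 4247',
        'chrM 10509 10510 4251',
        'chrM 10510 10511 4254',
        'chrM 10511 10513 4253',
        'chrM 10513 10515 4254',
    ]
    regions = {('chrM', 10509): 3}  # hypothetical region of length 3
    for block in windows_after_matches(cov, regions):
        print('\n'.join(block))
```

With a real file, cov_lines = open('30x_good_ok_bad_0COV').read().splitlines() loads every line once, so lines printed in one block can still start a later match. If the file is too large for memory, itertools.islice(cov, n) inside the original for loop pulls the next n lines straight from the open file, but the outer loop then skips those lines, so they are never checked for matches themselves.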
