Dear all,

I am developing a package for my own projects, and I have written a couple of utility functions for parsing BED files in R. My goal is to parse and analyze multiple BED files in parallel. In the ideal case we have three samples from ChIP-seq experiments, each of a different length, and the goal is to process these samples in parallel.

My input is a set of BED files. I take the first BED file as the querySample, and the rest of the BED files serve as targetSamples. I use the findOverlaps function from the GenomicRanges package: all features of the querySample are overlapped with all features of the targetSamples, the overlapping peaks are reported, and new BED files are generated to save them. Then the second BED file is chosen as the querySample and the others as targetSamples, and the process above is repeated.

Here is my question; I hope list members can give me some ideas on how to solve this problem:

*How do I iterate over and index a set of BED files?*

FYI, I carefully read the posting guide on how to ask questions on the Bioc-devel mailing list. If I have made a mistake, I would appreciate it if someone reminded me. Many thanks to all of you.

I think there is a set of combinations, such as below:

    bed.1 maps in parallel to (bed.2, bed.3, bed.4)
    bed.2 maps in parallel to (bed.1, bed.3, bed.4)
    bed.3 maps in parallel to (bed.1, bed.2, bed.4)
    bed.4 maps in parallel to (bed.1, bed.2, bed.3)
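To make the combinations above concrete, here is a minimal sketch in base R of how each file could take a turn as the query while all the others act as targets. The names pairSamples and processPair are hypothetical, and processPair is only a stand-in for the real overlap step (loading the files and calling findOverlaps); it is meant as an illustration of the indexing, not as a finished implementation.

```r
# Build one (query, targets) pairing per BED file.
# pairSamples is a hypothetical helper name, not part of any package.
pairSamples <- function(bedFiles) {
  pairs <- lapply(seq_along(bedFiles), function(i) {
    # negative indexing drops the i-th file, leaving all the others
    list(query = bedFiles[i], targets = bedFiles[-i])
  })
  names(pairs) <- basename(bedFiles)
  pairs
}

# Stand-in for the real work (load the files, find overlaps, write output).
# Here it only reports which files would be compared.
processPair <- function(pair) {
  paste(pair$query, "vs", paste(pair$targets, collapse = ", "))
}

bedFiles <- c("bed.1", "bed.2", "bed.3", "bed.4")
pairs <- pairSamples(bedFiles)

# Sequential version. Each pairing is independent of the others, so
# lapply could be swapped for a parallel apply function.
results <- lapply(pairs, processPair)
print(results[["bed.1"]])
```

Because every pairing is self-contained, the sequential lapply call could probably be replaced by parallel::mclapply(pairs, processPair) on Unix-alikes, which takes the same first two arguments; whether that is the best choice for a package is a separate question.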
For example, this is my R code:

    indexSample <- function(bedFiles, desDir = getwd(), verbose = FALSE) {
      # if a directory is given, collect the BED files it contains
      if (length(bedFiles) == 1L && dir.exists(bedFiles)) {
        bedFiles <- list.files(path = bedFiles, pattern = "\\.bed$", full.names = TRUE)
      }
      for (j in seq_along(bedFiles)) {
        qSample <- bedFiles[j]  # chosen querySample BED file
        if (!is(qSample, "GRanges")) {
          qSample.gr <- loadSample(qSample)  # loadSample (my helper) reads a BED file as a GRanges object
        } else {
          qSample.gr <- qSample
        }
        # there is code that accesses all features of qSample [I have done this already]
        for (jj in seq_along(bedFiles)[-j]) {
          tSample <- bedFiles[jj]  # one of the remaining BED files
          # there is code that puts all features of tSample in a GNCList object [I have done this]
        }
        # then call findOverlaps from the GenomicRanges package
      }
      # return result of first case
    }

--
Jurat Shahidin
Ph.D. candidate
Dipartimento di Elettronica, Informazione e Bioingegneria
Politecnico di Milano
Piazza Leonardo da Vinci 32 - 20133 Milano, Italy
Mobile : +39 3279366608

Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel