---
output: github_document
author: Marco Cacciabue, Melina Obregón, Axel N. Fenoglio
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
app_name <- "voRtex"
```
# **`r app_name `** <img src='man/Figures/hex2.png' style="float:right; height:200px;"/>
## **`r app_name `** is an R package
voRtex is a package created to help analyze large sets of foot-and-mouth disease RNA sequences.
This package contains a collection of functions designed to manage and analyze sample data efficiently.
It handles VCF files, BED files, FASTA files, DNAStringSet objects, and more.
## Examples
#### VCFToDataFrame
VCFToDataFrame.R creates a data frame from a VCF file, containing the allele position,
allele frequency, and depth of coverage.
``` r
file <- system.file("extdata", "variant_file.vcf", package = "voRtex", mustWork = TRUE)
vcf_data <- VariantAnnotation::readVcf(file)
vcfdataframe <- VCFToDataFrame(vcf_data)
vcfdataframe
```
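To make the three reported columns concrete, here is a minimal base R sketch that pulls position, allele frequency, and depth out of two made-up VCF body lines. It is not how `VCFToDataFrame` is implemented. The helper `info_value`, the toy values, and the assumption that frequency and depth live in the `AF` and `DP` keys of the INFO column are illustrative only.

```r
# Toy VCF body lines (tab-separated); all values are made up for illustration.
vcf_lines <- c(
  "ref1\t120\t.\tA\tG\t50\tPASS\tDP=1500;AF=0.12",
  "ref1\t845\t.\tC\tT\t60\tPASS\tDP=2300;AF=0.47"
)

vcf <- read.table(text = vcf_lines, sep = "\t", stringsAsFactors = FALSE,
                  col.names = c("CHROM", "POS", "ID", "REF", "ALT",
                                "QUAL", "FILTER", "INFO"))

# Hypothetical helper: extract one numeric KEY=value entry from the INFO column.
info_value <- function(info, key) {
  pattern <- paste0("(^|;)", key, "=([^;]+)")
  hits <- regmatches(info, regexec(pattern, info))
  # Each hit is c(full match, separator, value); missing keys become NA.
  values <- vapply(hits,
                   function(m) if (length(m) == 3) m[3] else NA_character_,
                   character(1))
  as.numeric(values)
}

# Assumption: AF holds the allele frequency and DP the depth of coverage.
variants <- data.frame(
  position  = vcf$POS,
  frequency = info_value(vcf$INFO, "AF"),
  depth     = info_value(vcf$INFO, "DP")
)
variants
```

Real VCF files can carry several alternate alleles per record, so a single `AF` value per row is a simplification.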
#### compute_coverage
With compute_coverage.R we can create a data frame containing the average position and coverage of a sequence, using a .bed file and a window size of our choosing.
``` r
FilePath <- system.file("extdata", "SRR12664421_full_coverage.bed",
                        package = "voRtex", mustWork = TRUE)
data <- read.table(FilePath, col.names = c("reference", "startpos", "endpos", "coverage"))
data_processed <- compute_coverage(data, 50, TRUE)
```
```
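The idea of averaging position and coverage over fixed windows can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the toy table, the window size of 5, and the choice of consecutive non-overlapping windows are all hypothetical.

```r
# Toy per-base coverage table using the same column names as the example above.
bed <- data.frame(
  reference = "ref1",
  startpos  = 0:9,
  endpos    = 1:10,
  coverage  = c(10, 12, 14, 16, 18, 20, 30, 40, 50, 60)
)

window_size <- 5  # hypothetical window size for this toy example

# Assign each row to a consecutive, non-overlapping window (0, 1, 2, ...).
bed$window <- (seq_len(nrow(bed)) - 1) %/% window_size

# Average the start position and the coverage within each window.
windowed <- aggregate(cbind(startpos, coverage) ~ window, data = bed, FUN = mean)
windowed
```

Each row of `windowed` summarizes one window, which is the kind of compact table a heatmap can be drawn from.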
#### ggplot_heatmap
Then, with ggplot_heatmap.R, we can create a heatmap from the data frame produced by compute_coverage:
``` r
color <- c("#D53E4F","#F46D43","#FDAE61","#FEE08B","#E6F598","#ABDDA4","#66C2A5","#3288BD")
Rplot <- ggplot_heatmap(inputdata = data_processed,
                        color_pal = color)
Rplot
```
This produces the following heatmap:
![SRR12664421_Heatmap](Rplot.png)
#### NreadFilter
With NreadFilter we take the coverage of a sequence, read from a .bed file into a data frame, and filter its rows by a threshold value. It returns a list containing the filtered data frame and the percentage of bases that passed the filter.
``` r
FilePath <- system.file("extdata", "SRR12664421_full_coverage.bed",
                        package = "voRtex", mustWork = TRUE)
OGDataFrame <- read.table(FilePath,
                          col.names = c("reference", "startpos", "endpos", "nreads"))
salida <- NreadFilter(OGDataFrame, 5000)
```
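The filtering step described above can be sketched in base R like this. The helper `filter_by_reads` is hypothetical and is not the package function. Two assumptions are made here: rows pass when `nreads` is greater than or equal to the cutoff, and each row covers exactly one base.

```r
# Toy coverage table; each row covers one base, values are made up.
coverage_df <- data.frame(
  reference = "ref1",
  startpos  = 0:4,
  endpos    = 1:5,
  nreads    = c(100, 6000, 7500, 4000, 9000)
)

# Hypothetical helper (not NreadFilter itself): keep rows at or above the cutoff
# and report the share of rows that passed.
filter_by_reads <- function(df, cutoff) {
  kept <- df[df$nreads >= cutoff, , drop = FALSE]
  list(
    filtered       = kept,
    percent_passed = 100 * nrow(kept) / nrow(df)
  )
}

result <- filter_by_reads(coverage_df, cutoff = 5000)
result$percent_passed
```

Returning a list keeps the filtered rows and the summary percentage together, mirroring the two-part return value described for NreadFilter.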
#### NCountFilter
With NCountFilter we take a DNAStringSet object and filter it by a threshold on the number of "N" bases (bases that could not be read).
``` r
FilePath <- system.file("extdata",
                        "renamed_all.fasta",
                        package = "voRtex",
                        mustWork = TRUE)
DNASequence <- Biostrings::readDNAStringSet(FilePath)
NCountFilter(DNASequence, 1000)
```
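Counting "N" bases per sequence and filtering on that count can be illustrated with plain character strings in base R. This sketch does not use Biostrings and is not the package's implementation. The sequences, the threshold `max_n`, and the assumption that sequences with at most `max_n` "N" bases are kept are all hypothetical.

```r
# Toy sequences as plain strings (a real workflow would use a DNAStringSet).
sequences <- c(
  seq_a = "ACGTNNACGT",
  seq_b = "NNNNNNACGT",
  seq_c = "ACGTACGTAC"
)

max_n <- 2  # hypothetical threshold: maximum number of "N" bases allowed

# Count the "N" characters in each sequence.
n_counts <- vapply(strsplit(sequences, ""),
                   function(bases) sum(bases == "N"),
                   integer(1))

# Assumption: keep sequences whose "N" count does not exceed the threshold.
kept <- sequences[n_counts <= max_n]
names(kept)
```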