@@ -20,44 +20,48 @@ Use the `readsas` function to read the file. The result is a dictionary of vari
 ```julia
 julia> using SASLib

-julia> x = readsas("test1.sas7bdat")
-Read data set of size 10 x 100 in 0.019 seconds
+julia> x = readsas("productsales.sas7bdat")
+Read data set of size 1440 x 10 in 2.0 seconds
 Dict{Symbol,Any} with 16 entries:
-  :filename => "test1.sas7bdat"
-  :page_length => 65536
-  :file_encoding => "wlatin1"
+  :filename => "productsales.sas7bdat"
+  :page_length => 8192
+  :file_encoding => "US-ASCII"
   :system_endianness => :LittleEndian
-  :ncols => 100
-  :column_types => DataType[Float64, String, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64 … Float64, Float64 …
-  :data => Dict{Any,Any}(Pair{Any,Any}(:Column60, [2987.0, 8194.0, 9820.0, 8252.0, 9640.0, 9168.0, 7547.0, 1419.0, 4884.0, NaN]) …
-  :perf_type_conversion => 0.0052096
-  :page_count => 1
-  :column_names => String["Column60", "Column42", "Column68", "Column35", "Column33", "Column1", "Column41", "Column16", "Column72", "Co…
-  :column_symbols => Symbol[:Column60, :Column42, :Column68, :Column35, :Column33, :Column1, :Column41, :Column16, :Column72, :Column19 ……
-  :column_lengths => [8, 9, 8, 8, 8, 9, 8, 8, 8, 9 … 8, 8, 8, 5, 8, 8, 8, 9, 8, 8]
+  :ncols => 10
+  :column_types => Type[Float64, Float64, Union{AbstractString, Missings.Missing}, Union{AbstractString, Missings.Missing}, Union{AbstractString, …
+  :data => Dict{Any,Any}(Pair{Any,Any}(:QUARTER, [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0 … 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, …
+  :perf_type_conversion => 0.0262305
+  :page_count => 18
+  :column_names => String["QUARTER", "YEAR", "COUNTRY", "DIVISION", "REGION", "MONTH", "PREDICT", "ACTUAL", "PRODTYPE", "PRODUCT"]
+  :column_symbols => Symbol[:QUARTER, :YEAR, :COUNTRY, :DIVISION, :REGION, :MONTH, :PREDICT, :ACTUAL, :PRODTYPE, :PRODUCT]
+  :column_lengths => [8, 8, 10, 10, 10, 10, 10, 8, 8, 8]
   :file_endianness => :LittleEndian
-  :nrows => 10
-  :perf_read_data => 0.00612195
-  :column_offsets => [0, 600, 8, 16, 24, 609, 32, 40, 48, 618 … 536, 544, 552, 795, 560, 568, 576, 800, 584, 592]
+  :nrows => 1440
+  :perf_read_data => 0.00639309
+  :column_offsets => [0, 8, 40, 50, 60, 70, 80, 16, 24, 32]
 ```

 The numbers of columns and rows are returned in `:ncols` and `:nrows`, respectively.

 The data, referenced by the `:data` key, is represented as a Dict object with the column symbol as the key.

 ```julia
-julia> x[:data][:Column1]
-10-element Array{Float64,1}:
- 0.636
- 0.283
- 0.452
- 0.557
- 0.138
- 0.948
- 0.162
- 0.148
- NaN
- 0.663
+julia> x[:data][:ACTUAL]
+1440-element Array{Float64,1}:
+ 925.0
+ 999.0
+ 608.0
+ 642.0
+ 656.0
+ 948.0
+ 612.0
+ 114.0
+ 685.0
+ 657.0
+ 608.0
+ 353.0
+ 107.0
+ ⋮
 ```
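
To illustrate the structure described above, here is a minimal sketch that is not part of the README change. It uses a small hand-built stand-in for the Dict returned by `readsas` (the stand-in and its three-row size are assumptions; the values are taken from the README's sample output) and walks the columns in file order via `:column_symbols`:

```julia
# Hand-built stand-in for the Dict returned by `readsas` (an assumption for
# illustration; the real result has 16 entries and 1440 rows).
x = Dict{Symbol,Any}(
    :nrows => 3,
    :ncols => 2,
    :column_symbols => [:PREDICT, :ACTUAL],
    :data => Dict{Any,Any}(
        :PREDICT => [850.0, 297.0, 846.0],
        :ACTUAL  => [925.0, 999.0, 608.0],
    ),
)

# Walk the columns in file order and look each one up by its symbol.
for sym in x[:column_symbols]
    col = x[:data][sym]
    println(sym, ": ", length(col), " values, first = ", first(col))
end
```

Because `:data` is a plain Dict, it carries no column order of its own; iterating over `:column_symbols` is one way to keep the order used in the file.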

 If you prefer to work with a DataFrame, you can convert the data easily:
@@ -67,26 +71,53 @@ julia> using DataFrames

 julia> df = DataFrame(x[:data]);

-julia> df[:, 1:5]
-10×5 DataFrames.DataFrame
-│ Row │ Column1 │ Column10    │ Column100 │ Column11 │ Column12   │
-├─────┼─────────┼─────────────┼───────────┼──────────┼────────────┤
-│ 1   │ 0.636   │ "apple"     │ 3230.0    │ NaN      │ 1986-07-20 │
-│ 2   │ 0.283   │ "apple"     │ 4904.0    │ 22.0     │ 1983-07-15 │
-│ 3   │ 0.452   │ "apple"     │ NaN       │ 7.0      │ 1973-11-27 │
-│ 4   │ 0.557   │ "dog"       │ 8566.0    │ 26.0     │ 1967-01-20 │
-│ 5   │ 0.138   │ "crocodile" │ 894.0     │ 11.0     │ 1970-11-29 │
-│ 6   │ 0.948   │ "crocodile" │ 6088.0    │ 27.0     │ 1963-01-09 │
-│ 7   │ 0.162   │ ""          │ 6122.0    │ NaN      │ 1979-10-18 │
-│ 8   │ 0.148   │ "crocodile" │ 2570.0    │ 5.0      │ 1961-03-15 │
-│ 9   │ NaN     │ "pear"      │ 2709.0    │ 12.0     │ 1964-06-15 │
-│ 10  │ 0.663   │ "pear"      │ NaN       │ 16.0     │ 1985-01-28 │
+julia> head(df, 5)
+5×10 DataFrames.DataFrame
+│ Row │ ACTUAL │ COUNTRY │ DIVISION  │ MONTH      │ PREDICT │ PRODTYPE  │ PRODUCT │ QUARTER │ REGION │ YEAR   │
+├─────┼────────┼─────────┼───────────┼────────────┼─────────┼───────────┼─────────┼─────────┼────────┼────────┤
+│ 1   │ 925.0  │ CANADA  │ EDUCATION │ 1993-01-01 │ 850.0   │ FURNITURE │ SOFA    │ 1.0     │ EAST   │ 1993.0 │
+│ 2   │ 999.0  │ CANADA  │ EDUCATION │ 1993-02-01 │ 297.0   │ FURNITURE │ SOFA    │ 1.0     │ EAST   │ 1993.0 │
+│ 3   │ 608.0  │ CANADA  │ EDUCATION │ 1993-03-01 │ 846.0   │ FURNITURE │ SOFA    │ 1.0     │ EAST   │ 1993.0 │
+│ 4   │ 642.0  │ CANADA  │ EDUCATION │ 1993-04-01 │ 533.0   │ FURNITURE │ SOFA    │ 2.0     │ EAST   │ 1993.0 │
+│ 5   │ 656.0  │ CANADA  │ EDUCATION │ 1993-05-01 │ 646.0   │ FURNITURE │ SOFA    │ 2.0     │ EAST   │ 1993.0 │
+```
+
+If you only need to read a few columns, pass an `include_columns` argument:
+
+```
+julia> head(DataFrame(readsas("productsales.sas7bdat", include_columns=[:YEAR, :MONTH, :PRODUCT, :ACTUAL])[:data]))
+Read data set of size 1440 x 4 in 0.004 seconds
+6×4 DataFrames.DataFrame
+│ Row │ ACTUAL │ MONTH      │ PRODUCT │ YEAR   │
+├─────┼────────┼────────────┼─────────┼────────┤
+│ 1   │ 925.0  │ 1993-01-01 │ SOFA    │ 1993.0 │
+│ 2   │ 999.0  │ 1993-02-01 │ SOFA    │ 1993.0 │
+│ 3   │ 608.0  │ 1993-03-01 │ SOFA    │ 1993.0 │
+│ 4   │ 642.0  │ 1993-04-01 │ SOFA    │ 1993.0 │
+│ 5   │ 656.0  │ 1993-05-01 │ SOFA    │ 1993.0 │
+│ 6   │ 948.0  │ 1993-06-01 │ SOFA    │ 1993.0 │
+```
+
+Likewise, you can read all columns except those listed in the `exclude_columns` argument:
+
+```
+julia> head(DataFrame(readsas("productsales.sas7bdat", exclude_columns=[:YEAR, :MONTH, :PRODUCT, :ACTUAL])[:data]))
+Read data set of size 1440 x 6 in 0.031 seconds
+6×6 DataFrames.DataFrame
+│ Row │ COUNTRY │ DIVISION  │ PREDICT │ PRODTYPE  │ QUARTER │ REGION │
+├─────┼─────────┼───────────┼─────────┼───────────┼─────────┼────────┤
+│ 1   │ CANADA  │ EDUCATION │ 850.0   │ FURNITURE │ 1.0     │ EAST   │
+│ 2   │ CANADA  │ EDUCATION │ 297.0   │ FURNITURE │ 1.0     │ EAST   │
+│ 3   │ CANADA  │ EDUCATION │ 846.0   │ FURNITURE │ 1.0     │ EAST   │
+│ 4   │ CANADA  │ EDUCATION │ 533.0   │ FURNITURE │ 2.0     │ EAST   │
+│ 5   │ CANADA  │ EDUCATION │ 646.0   │ FURNITURE │ 2.0     │ EAST   │
+│ 6   │ CANADA  │ EDUCATION │ 486.0   │ FURNITURE │ 2.0     │ EAST   │
 ```
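
The two keyword arguments are complementary when given the same list. The following plain-Julia sketch (no SASLib calls; it only illustrates how the two column sets relate and makes no claim about how the package implements the filtering) uses the column symbols reported in `:column_symbols`; the variable names are chosen for illustration:

```julia
# Column symbols of productsales.sas7bdat in file order, as listed in :column_symbols.
all_columns = [:QUARTER, :YEAR, :COUNTRY, :DIVISION, :REGION,
               :MONTH, :PREDICT, :ACTUAL, :PRODTYPE, :PRODUCT]

# The list passed to include_columns / exclude_columns in the examples.
selected = [:YEAR, :MONTH, :PRODUCT, :ACTUAL]

# Excluding `selected` leaves the set difference of the two lists.
remaining = setdiff(all_columns, selected)

println(length(selected), " columns when included, ", length(remaining), " when excluded")
println(sort(remaining))
```

Every column lands in exactly one of the two results: 4 + 6 = 10, which matches the `1440 x 4` and `1440 x 6` sizes reported in the sample output.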

 If you need to read files incrementally:

 ```julia
-handler = SASLib.open("test1.sas7bdat")
+handler = SASLib.open("productsales.sas7bdat")
 results = SASLib.read(handler, 3)   # read 3 rows
 results = SASLib.read(handler, 4)   # read next 4 rows
 SASLib.close(handler)               # remember to close the handler when done