Commit a865410

Merge pull request #12 from tk3369/issue-3
Issue 3 implemented
2 parents b2ccb3a + 71c1ad1 commit a865410

8 files changed: +469 -125 lines changed

README.md

+74-43
@@ -20,44 +20,48 @@ Use the `readsas` function to read the file. The result is a dictionary of vari
2020
```julia
2121
julia> using SASLib
2222

23-
julia> x = readsas("test1.sas7bdat")
24-
Read data set of size 10 x 100 in 0.019 seconds
23+
julia> x = readsas("productsales.sas7bdat")
24+
Read data set of size 1440 x 10 in 2.0 seconds
2525
Dict{Symbol,Any} with 16 entries:
26-
:filename => "test1.sas7bdat"
27-
:page_length => 65536
28-
:file_encoding => "wlatin1"
26+
:filename => "productsales.sas7bdat"
27+
:page_length => 8192
28+
:file_encoding => "US-ASCII"
2929
:system_endianness => :LittleEndian
30-
:ncols => 100
31-
:column_types => DataType[Float64, String, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64 Float64, Float64
32-
:data => Dict{Any,Any}(Pair{Any,Any}(:Column60, [2987.0, 8194.0, 9820.0, 8252.0, 9640.0, 9168.0, 7547.0, 1419.0, 4884.0, NaN])
33-
:perf_type_conversion => 0.0052096
34-
:page_count => 1
35-
:column_names => String["Column60", "Column42", "Column68", "Column35", "Column33", "Column1", "Column41", "Column16", "Column72", "Co…
36-
:column_symbols => Symbol[:Column60, :Column42, :Column68, :Column35, :Column33, :Column1, :Column41, :Column16, :Column72, :Column19 ……
37-
:column_lengths => [8, 9, 8, 8, 8, 9, 8, 8, 8, 9 … 8, 8, 8, 5, 8, 8, 8, 9, 8, 8]
30+
:ncols => 10
31+
:column_types => Type[Float64, Float64, Union{AbstractString, Missings.Missing}, Union{AbstractString, Missings.Missing}, Union{AbstractString,
32+
:data => Dict{Any,Any}(Pair{Any,Any}(:QUARTER, [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0,
33+
:perf_type_conversion => 0.0262305
34+
:page_count => 18
35+
:column_names => String["QUARTER", "YEAR", "COUNTRY", "DIVISION", "REGION", "MONTH", "PREDICT", "ACTUAL", "PRODTYPE", "PRODUCT"]
36+
:column_symbols => Symbol[:QUARTER, :YEAR, :COUNTRY, :DIVISION, :REGION, :MONTH, :PREDICT, :ACTUAL, :PRODTYPE, :PRODUCT]
37+
:column_lengths => [8, 8, 10, 10, 10, 10, 10, 8, 8, 8]
3838
:file_endianness => :LittleEndian
39-
:nrows => 10
40-
:perf_read_data => 0.00612195
41-
:column_offsets => [0, 600, 8, 16, 24, 609, 32, 40, 48, 618 … 536, 544, 552, 795, 560, 568, 576, 800, 584, 592]
39+
:nrows => 1440
40+
:perf_read_data => 0.00639309
41+
:column_offsets => [0, 8, 40, 50, 60, 70, 80, 16, 24, 32]
4242
```
4343
4444
Number of columns and rows are returned as in `:ncols` and `:nrows` respectively.
4545
4646
The data, reference by `:data` key, is represented as a Dict object with the column symbol as the key.
4747
4848
```juia
49-
julia> x[:data][:Column1]
50-
10-element Array{Float64,1}:
51-
0.636
52-
0.283
53-
0.452
54-
0.557
55-
0.138
56-
0.948
57-
0.162
58-
0.148
59-
NaN
60-
0.663
49+
julia> x[:data][:ACTUAL]
50+
1440-element Array{Float64,1}:
51+
925.0
52+
999.0
53+
608.0
54+
642.0
55+
656.0
56+
948.0
57+
612.0
58+
114.0
59+
685.0
60+
657.0
61+
608.0
62+
353.0
63+
107.0
64+
6165
```
6266
6367
If you really like DataFrame, you can easily convert as such:
@@ -67,26 +71,53 @@ julia> using DataFrames
6771

6872
julia> df = DataFrame(x[:data]);
6973

70-
julia> df[:, 1:5]
71-
10×5 DataFrames.DataFrame
72-
│ Row │ Column1 │ Column10 │ Column100 │ Column11 │ Column12 │
73-
├─────┼─────────┼─────────────┼───────────┼──────────┼────────────┤
74-
│ 1 │ 0.636 │ "apple" │ 3230.0 │ NaN │ 1986-07-20 │
75-
│ 2 │ 0.283 │ "apple" │ 4904.0 │ 22.0 │ 1983-07-15 │
76-
│ 3 │ 0.452 │ "apple" │ NaN │ 7.0 │ 1973-11-27 │
77-
│ 4 │ 0.557 │ "dog" │ 8566.0 │ 26.0 │ 1967-01-20 │
78-
│ 5 │ 0.138 │ "crocodile" │ 894.0 │ 11.0 │ 1970-11-29 │
79-
│ 6 │ 0.948 │ "crocodile" │ 6088.0 │ 27.0 │ 1963-01-09 │
80-
│ 7 │ 0.162 │ "" │ 6122.0 │ NaN │ 1979-10-18 │
81-
│ 8 │ 0.148 │ "crocodile" │ 2570.0 │ 5.0 │ 1961-03-15 │
82-
│ 9 │ NaN │ "pear" │ 2709.0 │ 12.0 │ 1964-06-15 │
83-
│ 10 │ 0.663 │ "pear" │ NaN │ 16.0 │ 1985-01-28 │
74+
julia> head(df, 5)
75+
5×10 DataFrames.DataFrame
76+
│ Row │ ACTUAL │ COUNTRY │ DIVISION │ MONTH │ PREDICT │ PRODTYPE │ PRODUCT │ QUARTER │ REGION │ YEAR │
77+
├─────┼────────┼─────────┼───────────┼────────────┼─────────┼───────────┼─────────┼─────────┼────────┼────────┤
78+
│ 1 │ 925.0 │ CANADA │ EDUCATION │ 1993-01-01 │ 850.0 │ FURNITURE │ SOFA │ 1.0 │ EAST │ 1993.0 │
79+
│ 2 │ 999.0 │ CANADA │ EDUCATION │ 1993-02-01 │ 297.0 │ FURNITURE │ SOFA │ 1.0 │ EAST │ 1993.0 │
80+
│ 3 │ 608.0 │ CANADA │ EDUCATION │ 1993-03-01 │ 846.0 │ FURNITURE │ SOFA │ 1.0 │ EAST │ 1993.0 │
81+
│ 4 │ 642.0 │ CANADA │ EDUCATION │ 1993-04-01 │ 533.0 │ FURNITURE │ SOFA │ 2.0 │ EAST │ 1993.0 │
82+
│ 5 │ 656.0 │ CANADA │ EDUCATION │ 1993-05-01 │ 646.0 │ FURNITURE │ SOFA │ 2.0 │ EAST │ 1993.0 │
83+
```
84+
85+
If you only need to read few columns, just pass an `include_columns` argument:
86+
87+
```
88+
julia> head(DataFrame(readsas("productsales.sas7bdat", include_columns=[:YEAR, :MONTH, :PRODUCT, :ACTUAL])[:data]))
89+
Read data set of size 1440 x 4 in 0.004 seconds
90+
6×4 DataFrames.DataFrame
91+
│ Row │ ACTUAL │ MONTH │ PRODUCT │ YEAR │
92+
├─────┼────────┼────────────┼─────────┼────────┤
93+
│ 1 │ 925.0 │ 1993-01-01 │ SOFA │ 1993.0 │
94+
│ 2 │ 999.0 │ 1993-02-01 │ SOFA │ 1993.0 │
95+
│ 3 │ 608.0 │ 1993-03-01 │ SOFA │ 1993.0 │
96+
│ 4 │ 642.0 │ 1993-04-01 │ SOFA │ 1993.0 │
97+
│ 5 │ 656.0 │ 1993-05-01 │ SOFA │ 1993.0 │
98+
│ 6 │ 948.0 │ 1993-06-01 │ SOFA │ 1993.0 │
99+
```
100+
101+
Likewise, you can read all columns except the ones you don't want as specified in `exclude_columns` argument:
102+
103+
```
104+
julia> head(DataFrame(readsas("productsales.sas7bdat", exclude_columns=[:YEAR, :MONTH, :PRODUCT, :ACTUAL])[:data]))
105+
Read data set of size 1440 x 6 in 0.031 seconds
106+
6×6 DataFrames.DataFrame
107+
│ Row │ COUNTRY │ DIVISION │ PREDICT │ PRODTYPE │ QUARTER │ REGION │
108+
├─────┼─────────┼───────────┼─────────┼───────────┼─────────┼────────┤
109+
│ 1 │ CANADA │ EDUCATION │ 850.0 │ FURNITURE │ 1.0 │ EAST │
110+
│ 2 │ CANADA │ EDUCATION │ 297.0 │ FURNITURE │ 1.0 │ EAST │
111+
│ 3 │ CANADA │ EDUCATION │ 846.0 │ FURNITURE │ 1.0 │ EAST │
112+
│ 4 │ CANADA │ EDUCATION │ 533.0 │ FURNITURE │ 2.0 │ EAST │
113+
│ 5 │ CANADA │ EDUCATION │ 646.0 │ FURNITURE │ 2.0 │ EAST │
114+
│ 6 │ CANADA │ EDUCATION │ 486.0 │ FURNITURE │ 2.0 │ EAST │
84115
```
85116
86117
If you need to read files incrementally:
87118
88119
```julia
89-
handler = SASLib.open("test1.sas7bdat")
120+
handler = SASLib.open("productsales.sas7bdat")
90121
results = SASLib.read(handler, 3) # read 3 rows
91122
results = SASLib.read(handler, 4) # read next 4 rows
92123
SASLib.close(handler) # remember to close the handler when done
