<?xml version="1.0" encoding="UTF-8"?>
<chapter id="ch_concepts">
<title>SMOKE Concepts</title>
<section>
<title>Introduction</title>
<para>The purpose of SMOKE is to convert the resolution of the data in an emission inventory to the resolution needed by an air quality model (AQM). Emission inventories typically have an annual-total emissions value for each emissions source, or perhaps an average-day emissions value. AQMs, however, typically require emissions data on an hourly basis, for each model grid cell (and perhaps model layer), and for each model species. Consequently, to meet the input requirements of the AQM, emissions processing must (at a minimum) transform inventory data by temporal allocation, chemical speciation, spatial allocation, and perhaps layer assignment.</para>
<para>In addition to changing the resolution of the data, SMOKE must also provide the AQM input files in the correct file format. SMOKE can create the Input/Output Applications Programming Interface (I/O API) Network Common Data Form (NetCDF) output format needed by the CMAQ model. It can also create the Fortran binary format for the 2-D emissions needed by UAM and CAM<subscript>X</subscript>, and the ASCII elevated-point-source format used by the Ptsrce preprocessor for these models. File format is also important for the input files used by SMOKE, most of which are ASCII files, but some of which are I/O API NetCDF or CF-compliant NetCDF format files.</para>
<para>In this chapter, we introduce you to various concepts that are critical to understanding the technical description of emissions processing, as well as provide more detail about the processing capabilities of SMOKE. (Later, <xref linkend="ch_utilities" />, <xref linkend="ch_programs" />, <xref linkend="ch_quality_assurance" />, <xref linkend="ch_input_files" />, <xref linkend="ch_intermediate_files" />, and <xref linkend="ch_output_files" /> give more specifics about each program’s capabilities and each file’s format.) This chapter provides the context and framework for the rest of the user’s manual. To assist you in reading and using this chapter, we provide <xref linkend="app_glossary" /> for definitions of emissions inventory and emissions modeling terminology.</para>
</section>
<comment>
<section id="sect_concepts_assigns">
<title>Assigns file and environment variables</title>
<para>The Assigns file is a script used to set up the parameters of a SMOKE run. The file configures the UNIX environment so that all of the correct input, intermediate, and output directories and files can be identified and used by the SMOKE programs. It also sets things like the name of the grid and the time period for which you will run SMOKE for a given case. It does this by setting many UNIX environment variables, explained in the next paragraph. The Assigns file also uses environment variables to configure compiler options, so that SMOKE can be compiled on operating systems other than the ones provided with the SMOKE distribution. More information on the Assigns file is provided in <xref linkend="sect_scripts_assigns_files" />.</para>
<para>Environment variables are aliases that can be set by a UNIX operating system. These variables are defined during a user’s UNIX session, usually defined by an <command>xterm</command> or other UNIX terminal window. The environment variables that SMOKE uses store the input, intermediate, and output files and directories. For example, the environment variable for the directory that is the SMOKE root directory is <envar>SMKROOT</envar>. At the UNIX prompt, this environment variable could be defined to an actual path such as <filename class="directory">/home/mylogin/smoke</filename>. To set an environment variable, the UNIX <command>setenv</command> command is needed. In this example, the command to define <envar>SMKROOT</envar> as the given path is:</para>
<para><userinput><command>setenv</command> <envar>SMKROOT</envar> <filename class="directory">/home/mylogin/smoke</filename></userinput></para>
<para>After this command is issued, the <envar>SMKROOT</envar> environment variable stores the characters <filename class="directory">/home/mylogin/smoke</filename> as its value. To use the value of an environment variable, the dollar sign must precede the variable name at the UNIX prompt. In the following example, we give the UNIX command <command>echo</command> to print the contents of the <envar>SMKROOT</envar> environment variable at the UNIX prompt. Note the use of the dollar sign before the <envar>SMKROOT</envar> variable name.</para>
<para><userinput><command>echo</command> <envar>$SMKROOT</envar></userinput></para>
<para>When the UNIX system executes this command, the following is displayed at the UNIX prompt:</para>
<para><computeroutput>/home/mylogin/smoke</computeroutput></para>
<para>The environment variables set by the Assigns file for directories are described in <xref linkend="ch_dirs_files" />. The variables used by the SMOKE scripts for controlling SMOKE execution are described in <xref linkend="sect_scripts_script_settings" />. Finally, the environment variables that control program behavior are described in <xref linkend="ch_utilities" />, <xref linkend="ch_programs" />, and <xref linkend="ch_quality_assurance" />.</para>
</section>
</comment>
<section id="sect_concepts_emis_inv">
<title>Emission inventories</title>
<para>Emission inventories are the key input files to SMOKE and emissions modeling. The data types that these inventories contain are called inventory pollutants (e.g., carbon monoxide, ammonia, mercury). By itself, SMOKE does not require specific data types in the inventory files it reads. However, the AQMs that SMOKE supports do require certain input data, called model species, which in turn requires SMOKE to use certain inventory pollutants.</para>
<para>In this section, we focus on the inventory files that SMOKE uses. <xref linkend="sect_concepts_inv_data_types" /> describes the major inventory types useable by SMOKE. In <xref linkend="sect_concepts_inv_source_categories" />, we describe the inventory source categories, and in <xref linkend="sect_concepts_inv_file_formats" /> we discuss the inventory file formats. The remaining sections describe the various codes used in specific inventory sources: <xref linkend="sect_concepts_costcy_codes" />, <xref linkend="sect_concepts_scc_codes" />, <xref linkend="sect_concepts_sic_codes" />, <xref linkend="sect_concepts_mact_codes" />, <xref linkend="sect_concepts_section_112" /> and <xref linkend="sect_concepts_source_type_codes" />.</para>
<section id="sect_concepts_inv_data_types">
<title>Inventory data types</title>
<para>SMOKE processes criteria, particulate, toxics, and activity data inventories. Activity data will be discussed along with on-road mobile sources in the next section. By criteria inventories, we mean inventories containing EPA’s criteria pollutants: carbon monoxide (CO), nitrogen oxides (NO<subscript>x</subscript>), and volatile organic compounds (VOC) or total organic gases (TOG). Particulate inventories contain ammonia (NH<subscript>3</subscript>), sulfur dioxide (SO<subscript>2</subscript>), particulate matter (PM) of size 10 microns or less (PM<subscript>10</subscript>), and PM of size 2.5 microns or less (PM<subscript>2.5</subscript>).</para>
<para>Additionally, SMOKE can process inventories with pre-speciated criteria and/or particulate emissions. For example, elemental carbon of size 2.5 microns or less can be provided as input to SMOKE directly, instead of letting SMOKE’s speciation step compute it from the PM<subscript>2.5</subscript> total emissions. To ensure that SMOKE correctly processes the data when you are using pre-speciated emissions, other input files must be configured in specific ways<comment>, as explained in <xref linkend="sect_scripts_change_speciation" /></comment>.</para>
<para>The toxics inventories that SMOKE can process are data from the National Emission Inventory (NEI) for Hazardous Air Pollutants (HAPs). This inventory contains hundreds of specific compounds representing the 188 HAPs defined by the Clean Air Act. The original list of 189 HAPs and modifications representing the current list are available from the <ulink url="http://www.epa.gov/ttn/atw/orig189.html">EPA’s web site</ulink>. The inventory contains many more than 188 pollutants because several entries on the list of 188 are pollutant groups, such as polycyclic organic matter, cyanide compounds, and numerous metal compounds including chromium compounds, cadmium compounds, manganese compounds, and others. Note that because of these groups, the specific compounds reported in one inventory year may not exactly match those in another inventory year. For example, lead oxide may be reported in one year but not in a subsequent year. However, those compounds not belonging to compound groups are likely to be in the inventory year after year, particularly the common gaseous HAPs emitted by mobile sources such as benzene, 1,3-butadiene, acrolein, formaldehyde, and acetaldehyde.</para>
</section>
<section id="sect_concepts_inv_source_categories">
<title>Inventory source categories</title>
<section>
<title>Overview</title>
<para>Emission inventories are divided into several source categories. These divisions stem from both differing methods for preparing the inventories and from different characteristics and attributes of the categories (more on these terms later). Generally, emission inventories are divided into the following source categories:</para>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Stationary area/Nonpoint sources:</emphasis> Sources that are treated as being spread over a spatial extent (usually a county or air district) and that are not moveable (as compared to nonroad mobile and on-road mobile sources). Because it is not possible to collect the emissions at each point of emission, they are estimated over larger regions. The EPA introduced the term <quote>nonpoint</quote> to replace <quote>stationary area</quote> in order to avoid confusion with the term <quote>area source</quote>, which is used as a regulatory term in the toxics realm. However, <quote>nonpoint</quote> has not gained acceptance (thus far) by the criteria inventory/modeling community. Thus, in this manual we will use the term <quote>stationary area</quote> to refer to these sources when they are in criteria inventories, while we use the term <quote>nonpoint</quote> to refer to these sources when they are in toxics inventories. Examples of nonpoint or stationary area sources are residential heating and architectural coatings. Numerous sources, such as dry cleaning facilities, may be treated either as stationary area/nonpoint sources or as point sources; in particular, the toxics inventory contains numerous small sources (based on emissions) that are not inventoried as nonpoint sources because their locations are known and are provided.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Nonroad mobile sources:</emphasis> Vehicular and otherwise movable sources that do not include vehicles that travel on roadways. These sources are also computed as being spread over a spatial extent (again, a county or air district). Examples of nonroad mobile sources include locomotives, lawn and garden equipment, construction vehicles, and boating emissions. These sources are included in both criteria and toxics inventories.</para>
</listitem>
<listitem>
<para><emphasis role="bold">On-road mobile sources:</emphasis> Vehicular sources that travel on roadways. These sources can be computed either as being spread over a spatial extent or as being assigned to a line location (called a link). Data in on-road inventories can be either emissions or activity data. Activity data consist of vehicle miles traveled (VMT) and, optionally, vehicle speed. Activity data are used when SMOKE will be computing emission factors via another model such as MOVES. Examples of on-road mobile sources include light-duty gasoline vehicles and heavy-duty diesel vehicles. On-road mobile sources are included in both criteria and toxics inventories.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Point sources:</emphasis> These are sources that are identified by point locations, typically because they are regulated and their locations are available in regulatory reports. Point sources are often further subdivided into electric generating utilities (EGUs) and non-EGU sources, particularly in criteria inventories in which EGUs are a primary source of NO<subscript>x</subscript> and SO<subscript>2</subscript>. Examples of non-EGU point sources include chemical manufacturers and furniture refinishers. Point sources are included in both the criteria and toxics inventories.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Wildfire sources:</emphasis> Traditionally, wildfire emissions have been treated as stationary area sources. More recently, data have also been developed for point locations, with day-specific emissions and hour-specific plume rise (vertical distribution of emissions). In this case, the wildfire emissions are processed by SMOKE as point sources.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Biogenic land use data:</emphasis> Biogenic land use data characterize the type of vegetation present, as either county-total or grid-cell values. The biogenic land use data for North America are available using two different sets of land use categories: the Biogenic Emissions Landcover Database (BELD) version 2 (BELD2), and the BELD version 3 (BELD3).</para>
</listitem>
</itemizedlist>
<para>Emission processing in SMOKE is divided into four processing categories: area, biogenic, mobile, and point. The definitions of these categories that SMOKE uses are different from those used for defining emission inventories. <xref linkend="tbl_concepts_inv_categories" /> lists the inventory source categories, the types of inventories (activity data, criteria, particulates, and toxics) that SMOKE can process, the temporal resolution that is acceptable to SMOKE, and the SMOKE processing category that should be used for processing the inventory.</para>
<table id="tbl_concepts_inv_categories">
<title>Inventory source categories and SMOKE processing capabilities and categories</title>
<tgroup cols="6">
<colspec colname="c1" colwidth="20*" />
<colspec colname="c2" colwidth="10*" />
<colspec colname="c3" colwidth="10*" />
<colspec colname="c4" colwidth="10*" />
<colspec colname="c5" colwidth="10*" />
<colspec colname="c6" colwidth="15*" />
<thead>
<row>
<entry morerows="1" valign="bottom" align="center">Inventory source category</entry>
<entry namest="c2" nameend="c5" align="center">Temporal resolution that SMOKE can process*</entry>
<entry morerows="1" valign="bottom" align="center">SMOKE processing category</entry>
</row>
<row>
<entry align="center">Activity data</entry>
<entry align="center">Criteria</entry>
<entry align="center">Particulates</entry>
<entry align="center">Toxics</entry>
</row>
</thead>
<tbody>
<row>
<entry>Nonpoint or stationary area</entry>
<entry align="center">N/A</entry>
<entry align="center">A, S, D, H</entry>
<entry align="center">A, S, D, H</entry>
<entry align="center">A, S, D, H</entry>
<entry>Area</entry>
</row>
<row>
<entry>Nonroad mobile</entry>
<entry align="center">N/A</entry>
<entry align="center">A, S, D, H</entry>
<entry align="center">A, S, D, H</entry>
<entry align="center">A, S, D, H</entry>
<entry>Area</entry>
</row>
<row>
<entry>On-road mobile (MOBILE 6)</entry>
<entry align="center">A</entry>
<entry align="center">A, S, D, H</entry>
<entry align="center">A, S, D, H</entry>
<entry align="center">A, S, D, H</entry>
<entry>Mobile</entry>
</row>
<row>
<entry>On-road mobile (MOVES)</entry>
<entry align="center">A</entry>
<entry align="center">H</entry>
<entry align="center">H</entry>
<entry align="center">H</entry>
<entry>Mobile</entry>
</row>
<row>
<entry>EGU</entry>
<entry align="center">N/A</entry>
<entry align="center">A, S, D, H</entry>
<entry align="center">A, S, D, H</entry>
<entry align="center">A, S, D, H</entry>
<entry>Point</entry>
</row>
<row>
<entry>Non-EGU</entry>
<entry align="center">N/A</entry>
<entry align="center">A, S, D, H</entry>
<entry align="center">A, S, D, H</entry>
<entry align="center">A, S, D, H</entry>
<entry>Point</entry>
</row>
<row>
<entry>Wildfire with precomputed plume rise</entry>
<entry align="center">N/A</entry>
<entry align="center">D, H</entry>
<entry align="center">D, H</entry>
<entry align="center">N/A</entry>
<entry>Point</entry>
</row>
<row>
<entry>Wildfire with internal plume rise calculation</entry>
<entry align="center">N/A</entry>
<entry align="center">D</entry>
<entry align="center">D</entry>
<entry align="center">N/A</entry>
<entry>Point</entry>
</row>
<row>
<entry>Biogenic land use</entry>
<entry align="center">N/A</entry>
<entry align="center">X</entry>
<entry align="center">N/A</entry>
<entry align="center">N/A</entry>
<entry>Biogenic</entry>
</row>
<row>
<entry namest="c1" nameend="c6" align="center">* <emphasis role="bold">A</emphasis> = Supports annual data; <emphasis role="bold">S</emphasis> = Supports average-day data; <emphasis role="bold">D</emphasis> = Supports day-specific data; <emphasis role="bold">H</emphasis> = Supports hourly data; <emphasis role="bold">X</emphasis> = Supports available data</entry>
</row>
</tbody>
</tgroup>
</table>
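<para>The mapping in <xref linkend="tbl_concepts_inv_categories" /> can also be read as a simple lookup. The following Python sketch is illustrative only and is not part of SMOKE: the dictionary name, the category keys, and the helper function are hypothetical, and only the criteria-inventory column and the processing-category column of the table are transcribed.</para>
<programlisting language="python"><![CDATA[
```python
# Illustrative sketch only; not part of SMOKE. All names are hypothetical.
# Values transcribe the "Criteria" column and the processing-category
# column of the table. A = annual, S = average-day, D = day-specific,
# H = hourly, X = available data.
CRITERIA_SUPPORT = {
    "nonpoint or stationary area": ("area", {"A", "S", "D", "H"}),
    "nonroad mobile": ("area", {"A", "S", "D", "H"}),
    "on-road mobile (MOBILE 6)": ("mobile", {"A", "S", "D", "H"}),
    "on-road mobile (MOVES)": ("mobile", {"H"}),
    "EGU": ("point", {"A", "S", "D", "H"}),
    "non-EGU": ("point", {"A", "S", "D", "H"}),
    "wildfire with precomputed plume rise": ("point", {"D", "H"}),
    "wildfire with internal plume rise calculation": ("point", {"D"}),
    "biogenic land use": ("biogenic", {"X"}),
}


def processing_category(source_category, resolution):
    """Return the SMOKE processing category for a criteria inventory.

    Raises ValueError when the table does not list the requested
    temporal resolution for that inventory source category.
    """
    category, supported = CRITERIA_SUPPORT[source_category]
    if resolution not in supported:
        raise ValueError(
            f"{source_category!r} does not list resolution {resolution!r}"
        )
    return category
```
]]></programlisting>
<para>For example, under these assumptions <literal>processing_category("EGU", "A")</literal> returns <literal>"point"</literal>, while requesting annual criteria data for the MOVES on-road row raises an error because the table lists only hourly data there.</para>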
</section>
<section>
<title>Detailed source category descriptions</title>
<para>Each inventory source category has source characteristics, source attributes, data values, and data attributes. <emphasis>Source characteristics</emphasis> are unique to each inventory source category and also distinguish one source in the inventory from another. <emphasis>Source attributes</emphasis> further describe the sources with other information that is useful for emissions processing, such as point-source flue gas exit height and temperature. The <emphasis>data values</emphasis> are either emissions values or activity values. The <emphasis>data attributes</emphasis> are additional information about the data values, such as the percentage reduction in emission from controls already applied to the source. In the following subsections, we summarize the source characteristics and attributes and the data values and attributes that are used by SMOKE for each of the inventory categories.</para>
<section>
<title>Nonpoint/stationary area and nonroad mobile (SMOKE category: area)</title>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Source characteristics:</emphasis> For all typical inventories, the source characteristics that identify these sources are country/state/county code, SCC and/or GEOCODE_LEVEL[1-4]. See
<xref linkend="sect_concepts_costcy_codes" /> and <xref linkend="sect_concepts_scc_codes" /> and <xref linkend="sect_concepts_geocodes" /> for further information.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Optional source characteristics:</emphasis> SMOKE can also use pregridded data from the same modeling domain as a SMOKE area source; this is described in more detail in <xref linkend="sect_concepts_pregridded_data" />. In this case, the source characteristics and attributes (country/state/county code and SCC) are <emphasis>not</emphasis> used in SMOKE. SMOKE can also use pregridded data from a different modeling domain along with geographical codes (GEOCODE_LEVEL[1-4]) and source information from the ARINV file to specify the source characteristics and associated source attributes for each grid cell.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Source attributes:</emphasis> The inventory year is associated with all sources in the inventory input files. In addition, SMOKE assigns a time zone (see <xref linkend="sect_concepts_assign_countries" />) and an approach for normalization of temporal profiles (see <xref linkend="sect_concepts_set_weekday_approach" />). In the nonpoint toxics inventory only, Standard Industrial Classification (SIC) codes, Maximum Achievable Control Technology (MACT) codes, and North American Industrial Classification System (NAICS) codes are optional source attributes; the NAICS code is read by SMOKE but not otherwise used at this time. Additionally, a <quote>source type</quote> field is available in the nonpoint inventory to identify major and Clean Air Act (CAA) section 112 area sources. See <xref linkend="sect_concepts_sic_codes" /> for a description of SIC codes, <xref linkend="sect_concepts_mact_codes" /> for more about MACT codes, and <xref linkend="sect_concepts_section_112" /> for more about source types. We will refer to the CAA section 112 area sources as simply <quote>section-112 area sources</quote>.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Data:</emphasis> SMOKE can read emissions data for criteria, particulate, and toxics pollutants for nonpoint/stationary area and nonroad inventories. The SMOKE system is not constrained with regard to the pollutants read (although typical examples were given in <xref linkend="sect_concepts_inv_data_types" />). SMOKE accepts annual emissions data, average-day emissions data, or both (though not all input formats support all types). An emission factor value can also be read by SMOKE, but SMOKE does nothing with it.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Data attributes:</emphasis> Inventories for nonpoint/stationary area and nonroad mobile sources can contain control efficiency, rule penetration, and rule effectiveness information for each pollutant. SMOKE will use these data if provided; otherwise it will set default values that indicate that no control-based adjustments have been applied to the inventory pollutant data. The defaults are listed in the file formats in <xref linkend="ch_input_files" />.</para>
</listitem>
</itemizedlist>
</section>
<section>
<title>On-road mobile (SMOKE category: mobile)</title>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Source characteristics:</emphasis> For on-road mobile inventories, the minimum source characteristics that identify these sources are country/state/county code and either SCC <emphasis>or</emphasis> road class and vehicle type codes. When the SCC is provided, it must follow a specific pattern in order to contain the road class and vehicle type codes (see <xref linkend="sect_concepts_onroad_sccs" />). When road class and vehicle type codes are provided to SMOKE directly, SMOKE translates these to SCC values.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Optional source characteristics:</emphasis> A link code may also identify on-road sources. This code must be unique within each county and SCC (or road class/vehicle type combination).</para>
</listitem>
<listitem>
<para><emphasis role="bold">Source attributes:</emphasis> The inventory year is associated with all sources in the inventory input files. In addition, SMOKE assigns a time zone (see <xref linkend="sect_concepts_assign_countries" />) and an approach for normalization of temporal profiles (see <xref linkend="sect_concepts_set_weekday_approach" />). For sources with link codes, SMOKE will use the starting and ending coordinates of the link, using either latitude-longitude (lat-lon) values or coordinates in the Universal Transverse Mercator (UTM) coordinate system.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Data:</emphasis> Emissions data for criteria, particulate, and toxics pollutants can be read for on-road mobile inventories. SMOKE is not constrained with regard to the pollutants read (although typical examples were given in <xref linkend="sect_concepts_inv_data_types" />). SMOKE accepts annual emissions data, average-day emissions data, or both (though not all input formats support all types).</para>
<para>Additionally, on-road mobile inventories can contain VMT and average speed activity data, which are needed when users would like SMOKE to run MOVES to compute emissions. A combination of precomputed emissions and VMT data is also acceptable for input to SMOKE, but you are responsible for preventing duplication of emissions. Duplication could occur if you input precomputed emissions for the same sources for which SMOKE computes emissions on the fly, by multiplying the on-road emission factors from MOVES by hourly VMT and the off-network emission factors from MOVES by annual vehicle populations.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Data attributes:</emphasis> No data attributes are associated with on-road mobile sources.</para>
</listitem>
</itemizedlist>
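<para>One way to guard against the duplication described above is to check, before a run, whether any source appears in both the precomputed-emissions inventory and the activity (VMT) inventory. The following Python sketch is a hypothetical pre-check, not a SMOKE feature. It assumes each inventory record has already been reduced to a (county code, SCC) pair; the function names, the sample codes, and the units (grams per mile and miles) are assumptions made for illustration.</para>
<programlisting language="python"><![CDATA[
```python
# Hypothetical pre-check; not part of SMOKE.

def onroad_emissions(emission_factor_g_per_mile, vmt_miles):
    """On-network emissions as emission factor times VMT (assumed units)."""
    return emission_factor_g_per_mile * vmt_miles


def find_duplicated_sources(emission_keys, activity_keys):
    """Return the sorted source keys present in both inventories."""
    return sorted(set(emission_keys) & set(activity_keys))


# Made-up (county code, SCC) pairs for illustration only.
precomputed = [("037001", "2201001110"), ("037001", "2230070110")]
activity = [("037001", "2201001110"), ("037063", "2201001110")]

# Sources listed here would be counted twice if both inputs were used.
overlap = find_duplicated_sources(precomputed, activity)
```
]]></programlisting>
<para>Any key returned in <literal>overlap</literal> identifies a source whose emissions would be both read directly and computed from activity data, so one of the two inputs should be removed for that source.</para>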
</section>
<section>
<title>Point sources (SMOKE category: point)</title>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Source characteristics:</emphasis> The source characteristics for point sources depend on the inventory input format. The SMOKE one-record-per-line (ORL) and Flat File 10 (FF10) formats identify sources by country/state/county code, plant code, point code, stack code, segment code, and SCC.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Optional source characteristics:</emphasis> SMOKE can support up to five location identifiers within a plant, although the most used in any currently implemented input file format is four.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Source attributes:</emphasis> As with other source categories, inventory year is associated with all sources in the inventory input files. SMOKE also assigns a time zone (see <xref linkend="sect_concepts_assign_countries" />) and an approach for normalization of temporal profiles (see <xref linkend="sect_concepts_set_weekday_approach" />). In addition, point sources have the following required source attributes not associated with other source categories: latitude, longitude, stack height, stack diameter (at the exit location), flue gas exit velocity, and flue gas exit temperature. Finally, the following optional source attributes are also used by SMOKE: SIC codes, MACT codes, plant descriptions, emissions release point type (e.g., horizontal stack, fugitive), source type (major or section-112 area), Office of Regulatory Information Systems (ORIS) identification codes, and boiler identification codes. The MACT code and source type are supported only by the ORL format. See <xref linkend="sect_concepts_sic_codes" />, <xref linkend="sect_concepts_mact_codes" />, and <xref linkend="sect_concepts_section_112" /> for more information.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Data:</emphasis> Emissions data for criteria, particulate, and toxics pollutants can be read for point inventories. SMOKE is not constrained with regard to the pollutants read (although typical examples were given in <xref linkend="sect_concepts_inv_data_types" />). SMOKE accepts annual emissions data, average-day emissions data, or both.</para>
<para>Optionally, point-source emissions data can be provided using day-specific or hour-specific records. The formats for these data are described in <xref linkend="sect_input_ptday" /> and <xref linkend="sect_input_pthour" />.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Data attributes:</emphasis> EGU and non-EGU point sources can contain control efficiency and rule effectiveness information for each pollutant. SMOKE will use these data if provided; otherwise it will set default values that indicate that no control-based adjustments have been applied to the inventory pollutant data. The defaults are listed in the file formats in <xref linkend="ch_input_files" />.</para>
</listitem>
</itemizedlist>
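<para>Because the ORL and FF10 formats identify a point source by the combination of characteristics listed above, two records describe the same source only when all of those fields match. The following Python sketch illustrates the idea with a hypothetical record type. The class name, field names, and sample values are assumptions made for illustration and do not correspond to SMOKE's internal data structures; grouping by summation is likewise only an example, not a statement of how SMOKE treats repeated records.</para>
<programlisting language="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointSourceKey:
    """Hypothetical composite key built from the source characteristics."""
    region_code: str  # country/state/county code
    plant: str
    point: str
    stack: str
    segment: str
    scc: str


# Made-up sample values. a and b match on every field; c differs in stack.
a = PointSourceKey("037001", "P001", "1", "S1", "1", "10100202")
b = PointSourceKey("037001", "P001", "1", "S1", "1", "10100202")
c = PointSourceKey("037001", "P001", "1", "S2", "1", "10100202")

# Group illustrative emission values by source key.
emissions_by_source = {}
for key, tons in [(a, 10.0), (b, 2.5), (c, 4.0)]:
    emissions_by_source[key] = emissions_by_source.get(key, 0.0) + tons
```
]]></programlisting>
<para>Here the first two records collapse onto one key, while the third remains a separate source because its stack code differs.</para>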
</section>
<section>
<title>Wildfire with precomputed plume rise (SMOKE category: point)</title>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Source characteristics:</emphasis> Wildfires with precomputed plumes are identified by the country/state/county code and the fire name.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Optional source characteristics:</emphasis> There are no optional source characteristics for wildfire sources.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Source attributes:</emphasis> Like other source categories, inventory year is associated with all sources in the inventory input files. SMOKE also assigns a time zone (see <xref linkend="sect_concepts_assign_countries" />) and an approach for normalization of temporal profiles (see <xref linkend="sect_concepts_set_weekday_approach" />). In addition, wildfire sources require the latitude and longitude source attributes. Finally, additional hour-specific source attributes for wildfire sources <emphasis>must</emphasis> be provided for the fraction of emissions in the surface layer, the height of the bottom of the plume, and the height of the top of the plume. These hour-specific attributes are provided to SMOKE using the point source hour-specific formats described in <xref linkend="sect_input_pthour" />.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Data:</emphasis> Wildfire source inventories can contain criteria and particulate pollutants. SMOKE is not constrained with regard to the pollutants read (although typical examples were given in <xref linkend="sect_concepts_inv_data_types" />). These data must be provided as day-specific or hour-specific emissions values using point source formats specified in <xref linkend="sect_input_ptday" /> and <xref linkend="sect_input_pthour" />.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Data attributes:</emphasis> No data attributes are associated with wildfire sources.</para>
</listitem>
</itemizedlist>
</section>
<section>
<title>Wildfires with internal plume rise calculation (SMOKE category: point)</title>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Source characteristics:</emphasis> Wildfires with internal plume rise calculation are identified by the country/state/county code, fire identification, fire name, location identification, and SCC.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Optional source characteristics:</emphasis> There are optional source characteristics for fire sources, such as material burned, vegetation types, size of area burned, fuel loading, and fire start/end hour. The size of area burned and fuel loading are used for computing the fire-specific plume rise. Fire starting and ending hours are needed to adjust the hourly temporal profiles for the emissions.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Source attributes:</emphasis> Like other source categories, inventory year is associated with all sources in the inventory input files. SMOKE also assigns a time zone (see <xref linkend="sect_concepts_assign_countries" />) and will re-normalize temporal profiles based on the starting and ending hours of the fire. In addition, wildfire sources require the latitude and longitude source attributes to locate the fire. Note that all emissions for a fire are assumed to come from the single grid cell that contains the latitude and longitude of the fire. Finally, the additional day-specific source attributes listed above for fire sources <emphasis>must</emphasis> be provided for calculating the heat flux of each fire, which is used to estimate the fraction of emissions in the surface layer, the height of the bottom of the plume, and the height of the top of the plume. <comment>See more information about how to process at <xref linkend="sect_ptfire_emis_cmaq"/></comment></para>
</listitem>
<listitem>
<para><emphasis role="bold">Data:</emphasis> Fire source inventories can contain criteria and particulate pollutants. SMOKE is not constrained with regard to the pollutants read (although typical examples were given in <xref linkend="sect_input_ptinv_fire" />). These data must be provided as day-specific emissions values using point source formats specified in <xref linkend="sect_input_ptday_fireemis" />.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Data attributes:</emphasis> No data attributes are associated with wildfire sources.</para>
</listitem>
</itemizedlist>
</section>
<section>
<title>Biogenic land use (SMOKE category: biogenic)</title>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Source characteristics:</emphasis> Biogenic emission data do not fit as neatly into the source-characteristic paradigm as the previously described source types. Emissions for biogenic sources are estimated starting with land use data, which are available for both BELD2 and BELD3 processing. The BELD2 data are available either by U.S. state/county and BELD2 land use category or by grid cell and BELD2 land use category. BELD3 land use data are available by 1-km grid cell over North and Central America and by BELD3 land use category.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Optional source characteristics:</emphasis> Biogenic land use data do not include optional source characteristics. The data are either by state/county or by grid cell.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Source attributes:</emphasis> There are no source attributes for biogenic land use data.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Data:</emphasis> The biogenic land use data consist of fractions associated with each land use type within a county or grid cell.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Data attributes:</emphasis> There are no data attributes for biogenic land use data.</para>
</listitem>
</itemizedlist>
</section>
</section>
</section>
<section id="sect_concepts_inv_file_formats">
<title>Inventory file formats</title>
<para>SMOKE supports a variety of inventory formats for criteria, particulate, toxics, and activity data inventories, which are described in detail in <xref linkend="sect_input_inventory" />. Here, we provide a brief introduction to these formats, which will be helpful as you read more about SMOKE in the remainder of this chapter and the chapters before <xref linkend="ch_input_files" />. All formats described here are text files. To convert your data to these formats, the best approach is to use a database or spreadsheet program to reformat and output the data in the requested format. There is not a standard format-conversion method that comes with SMOKE.</para>
<para> In the following paragraphs, we describe the formats available for nonpoint/stationary area, nonroad mobile, on-road mobile, point, and point-wildfire sources.</para>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Nonpoint/stationary area sources:</emphasis> SMOKE supports two formats for nonpoint/stationary area sources. The ORL and FF10 (Flat File 10) formats are list directed (comma or semicolon delimited), and both may be used to represent many different sources. The header of the file indicates what source data are in the file.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Nonroad mobile sources:</emphasis> There are three available inventory formats for nonroad mobile sources. The FF10 (Flat File 10) format is list directed (comma or semicolon delimited), and the header of the file indicates that the file contains nonroad mobile source data.</para>
</listitem>
<listitem>
<para><emphasis role="bold">On-road mobile sources:</emphasis> The Flat File 10 (FF10) format is list directed (comma or semicolon delimited) and contains activity inventory such as VMT, speed, and vehicle population data. This format requires VMT, SPEED, and VPOP inventory data.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Point sources:</emphasis> SMOKE has formats for annual or average-day inventories, for day-specific inventories, and for hour-specific inventories. For annual or average-day inventories, the ORL and FF10 formats can be used for criteria, particulate, and toxics inventories. In addition, the CEM data format can be used for day-specific or hour-specific data: SMOKE uses the ORIS codes and boiler codes in the annual inventory files to match sources from the CEM data files.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Wildfire sources:</emphasis> Two approaches are available for providing wildfire data, which are treated as point sources, to SMOKE using the ORL and FF10 point-source formats.</para>
<itemizedlist spacing="compact">
<listitem><emphasis role="bold">Precomputed plume rise approach:</emphasis> Certain fields must be left blank (such as stack parameters) because they do not apply to wildfire sources. When using wildfire data provided as point sources, you must also provide day-specific or hour-specific wildfire emissions and hour-specific precomputed plume rise using the FF10 day-specific and hour-specific formats.</listitem>
<listitem><emphasis role="bold">Internal plume rise calculation approach:</emphasis> Requires two separate inventory files that are provided in a modified ORL format: (1) a list of fires with fire-specific characteristics including country/state/county, fire identification, location coordinates, fire name, SCC, and others, as described in <xref linkend="sect_input_ptinv_fire" />, and (2) day-specific fire data including size of area burned, fuel loading, and start/end hour of the fire (<xref linkend="sect_input_ptday_fireemis" />). Unlike the approach listed above, this approach internally estimates the plume rise using the size of the area burned and fuel loading, and it adjusts temporal profiles using the start and end hours of the fire. <comment>See detail at <xref linkend="sect_ptfire_emis_cmaq" /></comment></listitem>
</itemizedlist>
</listitem>
</itemizedlist>
</section>
<section id="sect_concepts_costcy_codes">
<title>Country, state, and county codes</title>
<para>SMOKE uses a 6-digit integer code to identify a country, state (or province), and county (or other region) for a particular source. Most U.S. inventories input to SMOKE have the 5-digit U.S. Federal Information Processing Standards (FIPS) state and county codes. All inventory input formats have been adapted to include a special header record with which you can specify the country, effectively allowing the inventories to be provided with the 6-digit code that SMOKE uses. The 6-digit system was designed for use in the United States with states and counties, as well as Canada and Mexico, but it can be adapted for other uses. The format used by SMOKE for the codes is:</para>
<informalfigure>
<mediaobject>
<imageobject role="pdf">
<imagedata width="3in" fileref="images/concepts/costcy_pdf.jpg" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/concepts/costcy_html.jpg" />
</imageobject>
</mediaobject>
</informalfigure>
<para>The SMOKE installation is set up to use U.S.-centered codes as defined in the <envar>COSTCY</envar> or the <envar>GEOCODE_LEVEL4</envar> (if USE_EXP_GEOCODES Y) file, which contains the codes and their associated names and time zone settings. In this file, the U.S. country code is zero, which allows the U.S. country/state/county codes to be the same as the FIPS state/county codes that appear in U.S. inventories. See <xref linkend="sect_input_costcy" /> for more information on the <envar>COSTCY</envar> file format.</para>
<para>To change the meaning of the country, state, or county codes in SMOKE, the <envar>COSTCY</envar> or the <envar>GEOCODE_LEVEL4</envar> (if USE_EXP_GEOCODES Y) file must be modified to use different names associated with each country, state, county or tribal number. All SMOKE input files must also use this new numbering scheme, including inventory files and cross-reference files.</para>
<para>Acceptable values in SMOKE for the country code are 0 through 9. Acceptable values of the state code are 1 through 99. Acceptable values of the county code are 1 through 999. No alphabetic codes are accepted, since SMOKE stores these values as integers.</para>
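<para>The following Python fragment is a minimal sketch of how such a code can be composed and decomposed. It is not part of SMOKE; the function names are hypothetical, and the digit layout (one country digit, two state digits, three county digits) is an assumption inferred from the accepted ranges given above.</para>
<programlisting language="python">def make_region_code(country, state, county):
    """Combine country, state, and county numbers into one 6-digit integer.

    Assumed layout (not taken from SMOKE source code): C SS YYY.
    """
    if country not in range(0, 10):
        raise ValueError("country code must be 0 through 9")
    if state not in range(1, 100):
        raise ValueError("state code must be 1 through 99")
    if county not in range(1, 1000):
        raise ValueError("county code must be 1 through 999")
    return country * 100000 + state * 1000 + county


def split_region_code(code):
    """Split a 6-digit integer code back into (country, state, county)."""
    country, rest = divmod(code, 100000)
    state, county = divmod(rest, 1000)
    return country, state, county


# Durham County, North Carolina has FIPS code 37063. With a U.S. country
# code of zero, the combined integer equals the FIPS code itself.
durham = make_region_code(0, 37, 63)
print(f"{durham:06d}", split_region_code(durham))  # prints: 037063 (0, 37, 63)</programlisting>
<para>Because the country digit occupies the highest decimal place, a country code of zero leaves the 5-digit FIPS value unchanged, which is consistent with the behavior described for the U.S. codes above.</para>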
</section>
<section id="sect_concepts_scc_codes">
<title>Source Classification Codes</title>
<para>EPA uses Source Classification Codes (SCCs) and area and mobile source (AMS) codes to classify different types of anthropogenic emission activities. SCCs have 8 digits for point sources, while AMS codes have 10 digits, and sometimes include a leading <quote>A</quote> as an eleventh character. In SMOKE, we refer to both kinds of codes as <quote>SCCs</quote>, and we ignore the leading <quote>A</quote> in the area and mobile codes. Additionally, SMOKE permits the nonpoint and point toxics inventories to use both 8-digit and 10-digit SCCs in the same inventory input file, because both 8- and 10-digit codes are contained in the nonpoint and point inventories in the 1999 NEI for HAPs. The maximum field width in SMOKE and its input files for SCCs is 20 characters as of SMOKE v4.0. The 8- and 10-digit SCCs are still supported, but the SCC hierarchical approach is not applied to SCCs longer than 10 characters.</para>
<para>For SCCs of 10 characters or fewer, the codes use a hierarchical system in which the definition of the code becomes more specific as you move from left to right. (NOTE: if the SCC is longer than 10 characters, the hierarchical system is not used.) It is important to understand the hierarchy of these codes, because you can take advantage of it in building cross-reference files that assign emissions processing factors to inventory emission sources. In the diagrams below, level 1 is the least specific and level 4 is the most specific.</para>
<para>The code structure for the 8-digit point-source codes is:</para>
<informalfigure>
<mediaobject>
<imageobject role="pdf">
<imagedata width="3in" fileref="images/concepts/scc_point_pdf.jpg" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/concepts/scc_point_html.jpg" />
</imageobject>
</mediaobject>
</informalfigure>
<para>An example point-source activity and corresponding SCC can be taken directly from SMOKE’s SCC description file (<envar>SCCDESC</envar>): <quote>External Combustion Boilers; Electric Generation; Lignite; Spreader Stoker</quote> is represented by 10100306. Below we have mapped the levels of this description with the levels of the SCC:</para>
<informalfigure>
<mediaobject>
<imageobject role="pdf">
<imagedata width="3in" fileref="images/concepts/scc_example_pdf.jpg" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/concepts/scc_example_html.jpg" />
</imageobject>
</mediaobject>
</informalfigure>
<para>Similarly, the code structure for the 10-digit area- and mobile-source codes is:</para>
<informalfigure>
<mediaobject>
<imageobject role="pdf">
<imagedata width="3in" fileref="images/concepts/scc_nonpoint_pdf.jpg" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/concepts/scc_nonpoint_html.jpg" />
</imageobject>
</mediaobject>
</informalfigure>
<para>SMOKE treats SCCs as character strings, though in practice the values in the inventories and cross-reference files are usually numeric. In <xref linkend="sect_concepts_cross_referencing" /> on cross-references and profiles, we explain how these hierarchies are used by SMOKE and how you should use them in preparing SMOKE input files.</para>
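<para>As an illustration of the hierarchy, the sketch below returns the level 1 through level 4 prefixes of an SCC. It is not SMOKE code. The function name is hypothetical, and the prefix lengths (1, 3, 6, and 8 characters for point-source codes; 2, 4, 7, and 10 characters for area- and mobile-source codes) are an assumption based on the common EPA layout, so check them against the diagrams above before relying on them.</para>
<programlisting language="python"># Cumulative prefix lengths for levels 1 to 4, keyed by SCC length.
# These values are an assumption, not read from SMOKE itself.
LEVEL_PREFIXES = {
    8: (1, 3, 6, 8),     # 8-digit point-source SCCs
    10: (2, 4, 7, 10),   # 10-digit area- and mobile-source SCCs
}


def scc_levels(scc):
    """Return the level 1..4 prefixes of an SCC, or [] if no hierarchy applies."""
    scc = scc.strip()
    # Ignore the optional leading "A" of an 11-character AMS code.
    if len(scc) == 11 and scc[0].upper() == "A":
        scc = scc[1:]
    if len(scc) not in LEVEL_PREFIXES:
        return []  # other lengths are treated as having no hierarchy
    return [scc[:n] for n in LEVEL_PREFIXES[len(scc)]]


# The point-source example from the text: each longer prefix corresponds
# to one more term of the four-part SCC description.
print(scc_levels("10100306"))     # ['1', '101', '101003', '10100306']
print(scc_levels("A2104008000"))  # ['21', '2104', '2104008', '2104008000']</programlisting>
<para>Treating the code as a character string, as SMOKE does, keeps leading zeros intact and makes each level a simple prefix comparison.</para>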
<para>For on-road mobile sources, SCCs are treated somewhat differently than for other source categories. We explain more about this in <xref linkend="sect_concepts_onroad_sccs" />.</para>
</section>
<section id="sect_concepts_sic_codes">
<title>Standard Industrial Classification codes</title>
<para>Although SIC codes are being replaced by NAICS codes in building emission inventories at EPA, SIC codes are still used in emissions processing. As of SMOKE v4.0, SIC codes may be up to 20 characters in length, but the hierarchical system is not used for SIC codes longer than 4 digits. For 4-digit SIC codes, SMOKE recognizes a 2-level hierarchical system for the application of growth, control, and chemical speciation factors. The two code levels are illustrated below.</para>
<informalfigure>
<mediaobject>
<imageobject role="pdf">
<imagedata width="3in" fileref="images/concepts/sic_pdf.jpg" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/concepts/sic_html.jpg" />
</imageobject>
</mediaobject>
</informalfigure>
</section>
<section id="sect_concepts_geocodes">
<title>Geographical Code (GEOCODE_LEVEL[1-4])</title>
<para>Geographical codes may be specified to the user's desired level of detail using the GEOCODE_LEVEL[1-4] files. GEOCODE_LEVEL1 contains three-character codes for each country in the inventory (CCC). GEOCODE_LEVEL2 contains six-character codes for each state that the user would like to track in the inventory (CCCSSS). GEOCODE_LEVEL3 contains nine-character codes for each county that the user would like to track in the inventory (CCCSSSYYY). GEOCODE_LEVEL4 contains twelve-character codes for each tribal region that the user would like to track in the inventory (CCCSSSYYYTTT).</para>
<informalfigure>
<mediaobject>
<imageobject role="pdf">
<imagedata width="3in" fileref="images/concepts/geocode_pdf.jpg" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/concepts/geocode_html.jpg" />
</imageobject>
</mediaobject>
</informalfigure>
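<para>Because each level extends the previous one by three characters, the code for any coarser level is a prefix of the finer one. The sketch below illustrates this nesting; the function name and the sample code value are hypothetical and are not taken from any SMOKE file.</para>
<programlisting language="python">LEVEL_NAMES = ("country", "state", "county", "tribal")


def geocode_levels(code):
    """Map each available level name to its prefix of a geographical code.

    Accepts codes of 3, 6, 9, or 12 characters (CCC, CCCSSS, CCCSSSYYY,
    CCCSSSYYYTTT) and keeps them as strings so leading zeros survive.
    """
    n_levels, remainder = divmod(len(code), 3)
    if remainder != 0 or n_levels not in range(1, 5):
        raise ValueError("expected a code of 3, 6, 9, or 12 characters")
    return {LEVEL_NAMES[i]: code[: 3 * (i + 1)] for i in range(n_levels)}


# Hypothetical nine-character (county-level) code.
print(geocode_levels("001037063"))
# {'country': '001', 'state': '001037', 'county': '001037063'}</programlisting>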
</section>
<section id="sect_concepts_mact_codes">
<title>Maximum Achievable Control Technology codes</title>
<para>The following passage from EPA explains what MACT codes are and why they are used in some inventories and not others:</para>
<blockquote>
<para>To evaluate EPA’s progress in reducing air toxic emissions through the Maximum Achievable Control Technology (MACT) standards and to identify sources that may be modeled as part of residual risk assessments, operations within facilities that are subject to MACT standards are identified in the NTI by 4-digit MACT codes. <emphasis>[note that the term NTI (National Toxics Inventory) has since been replaced with NEI and that the codes are now 6 digits]</emphasis></para>
<para>A MACT category is one for which emissions limitations have been or are being developed under section 112(d) of the Clean Air Act (National Emissions Standards for Hazardous Air Pollutants). EPA sets source category, technology based standards through its MACT program that sharply reduce emissions of HAPs. EPA’s ATW web site includes information on the MACT source categories and the MACT program (www.epa.gov/ttn/atw/eparules.html). The tagging of data with MACT codes allows EPA to determine reductions attributable to the MACT program. The NTI associates MACT codes corresponding to MACT source categories with stationary major and [section-112] area source data. MACT codes are assigned at the process level or at the site level in the point source data, e.g., the MACT code for municipal waste combustors (MWCs) is assigned at the site level whereas the MACT code for petroleum refinery catalytic cracking is assigned at the process level. MACT codes are also assigned to source categories in the nonpoint source file.</para>
</blockquote>
<para>In SMOKE, MACT codes are treated as 6-character strings, with no internal hierarchy associated with the number.</para>
</section>
<section id="sect_concepts_section_112">
<title>Source types: major and section-112 area sources</title>
<para>For point and nonpoint toxics inventories, each source can be labeled as <quote>major</quote> or <quote>section-112 area</quote> for input to SMOKE (the following paragraph explains how the term <quote>area</quote> can be applied to a point-source inventory). The Clean Air Act defines major sources as those stationary facilities that emit or have the potential to emit 10 tons per year or more of any one toxic air pollutant or 25 tons per year or more of any combination of toxic air pollutants. Section-112 area sources include facilities that have air toxics emissions below the major source threshold as defined in section 112 of the Clean Air Act and thus emit less than 10 tons per year of a single toxic air pollutant and less than 25 tons per year combined of multiple toxic air pollutants. Another source type exists in principle for nonpoint sources: the <quote>other</quote> source type; an example of this source type is wildfires. However, these sources are not labeled differently from the section-112 area sources in the nonpoint toxics inventories, so the <quote>other</quote> source type has not been included in SMOKE to date.</para>
<para>A note about the confusing use of <quote>area</quote> terminology to describe point sources: The designation of sources in the point inventories as section-112 area sources has no relationship whatsoever to SMOKE’s area processing category. The point sources that are section-112 area sources are still processed by SMOKE as point sources using a lat-lon location and stack parameters.</para>
<para>In practice, all <quote>major</quote> sources should appear only in the point toxics inventory, but in some cases, <quote>major</quote> sources have shown up in the nonpoint inventory (specifically in inventory year 1996, in the July 2001 version of that inventory). Thus, the source type designation is provided in both the point and nonpoint toxics input formats.</para>
<para>The major and section-112 area designations are used when applying MACT-based control factors. These control factors are assigned based on a source’s MACT code and may be applied to major sources only, to section-112 area sources only, or to both types of sources regardless of designation.</para>
</section>
<section id="sect_concepts_source_type_codes">
<title>Source types: nonroad and onroad mobile sources</title>
<para>The nonroad and onroad mobile source type designations are used when applying MACT-based control factors. These control factors are assigned based on a source’s MACT code.</para>
</section>
</section>
<section id="sect_concepts_cross_referencing">
<title>Cross-referencing and profiles</title>
<para>The emission inventories described in <xref linkend="sect_concepts_emis_inv" /> can contain hundreds of thousands or even millions of sources. Collecting specific information for each source about its temporal allocation, chemical speciation, and spatial allocation is not practical. Therefore, a part of emissions processing involves assuming that many sources share the same factors for these major processing steps. For example, we apply monthly, day-of-week, and hourly temporal factors (called profiles) to convert from an annual emissions value to an hour-specific emissions value. A limited set of monthly, day-of-week, and hourly diurnal profiles are available from various studies, and these profiles each have their own unique profile number (also called profile code or profile ID). This limited set of profiles is assigned to the much more numerous inventory sources using an approach called cross-referencing, which is implemented using cross-reference files.</para>
<para>The cross-reference files assign the profiles based on source characteristics such as country, state, and county codes and/or SCCs, using the profile numbers to associate source characteristics with the profiles. While the profile numbers are unique in the profile files, they will appear many times in the cross-reference; this is how SMOKE is able to group the sources to treat them in the same manner. This approach is used for temporal allocation profiles, chemical speciation profiles and the spatial <quote>profiles</quote>, which are called spatial (or gridding) surrogates.</para>
<para>The cross-reference tables are applied to the sources in a stepwise manner, such that the most specific entry available is always applied. For example, if a cross-reference entry were available that matched a source by state, county, and SCC, SMOKE would apply that entry instead of a different cross-reference entry that matched that source only by SCC. The hierarchy that describes how each cross-reference file is applied to the inventory is described for each program in <xref linkend="ch_programs" />.</para>
<para><xref linkend="fig_concepts_xref" /> provides a generic example of how cross-reference files and profile files work together. In the example, the profile to be used for most of North Carolina is profile ID 16. Durham and Orange counties, however, are assigned profile 15, which would be preferentially applied to all sources in Durham and Orange counties, instead of using the general North Carolina profile. South Carolina sources would be assigned profile 17.</para>
<figure id="fig_concepts_xref">
<title>Generic example of how cross-reference files and profiles work together</title>
<mediaobject>
<imageobject role="pdf">
<imagedata width="3.5in" fileref="images/concepts/xref_pdf.jpg" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/concepts/xref_html.jpg" />
</imageobject>
</mediaobject>
</figure>
<para>This example does not correspond to a particular processing step (i.e., temporal allocation, chemical speciation, or spatial allocation), but rather assigns generic <quote>factors</quote> from profiles 15, 16, and 17 based on the state and county information in the cross-reference file. (Note that we have used the state and county names in this example, whereas real cross-reference files would use the country, state, and county codes according to the file format of the actual cross-reference files.)</para>
<para>SMOKE handles cross-references and profile application in a very efficient manner. In reading a cross-reference file, SMOKE first sorts the cross-reference entries using the same sort criteria as are used for the inventory sources (e.g., by country/state/county code, then by SCC, then by remaining source characteristics if any). Next, the cross-reference entries are grouped according to the <quote>level</quote> of matching of each of the entries. For example, all entries that could match to the inventory using only state and county codes would be in one group, while entries that could match to the inventory using only SCCs would be in another group. Once the cross-reference entries are grouped, SMOKE processes each source in the inventory and attempts to find a matching entry in one of the cross-reference groups. The most specific groups are searched first, and when a match is found for a particular source, the other groups are not searched, which increases efficiency. In addition, because the cross-reference entries are sorted within each group, an efficient searching algorithm can be used for each individual search. When a match to one of the cross-reference groups has been found, SMOKE continues to the next source in the inventory until all sources have been processed.</para>
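<para>The grouped, most-specific-first search described above can be sketched as follows. This is an illustration only, not SMOKE's actual implementation: the function names and data layout are hypothetical, binary search stands in for whatever search routine SMOKE uses, and only two matching levels (state plus county, and state only) are modeled. The entries reproduce the generic example of <xref linkend="fig_concepts_xref" />, using FIPS codes 37 (North Carolina), 45 (South Carolina), 063 (Durham County), and 135 (Orange County).</para>
<programlisting language="python">import bisect

# Cross-reference entries as (state, county, profile ID).
# A county of None means the entry applies to the whole state.
XREF = [
    (37, None, 16),   # North Carolina, all counties
    (37, 63, 15),     # Durham County, NC
    (37, 135, 15),    # Orange County, NC
    (45, None, 17),   # South Carolina, all counties
]


def build_groups(entries):
    """Group entries by matching level, most specific first, sorted by key."""
    county = sorted((s, c, p) for s, c, p in entries if c is not None)
    state = sorted((s, p) for s, c, p in entries if c is None)
    return [
        (lambda src: (src[0], src[1]), [e[:2] for e in county], [e[2] for e in county]),
        (lambda src: src[0], [e[0] for e in state], [e[1] for e in state]),
    ]


def find_profile(groups, source):
    """Return the profile of the most specific matching entry, or None."""
    for make_key, keys, profiles in groups:
        key = make_key(source)
        i = bisect.bisect_left(keys, key)      # binary search in a sorted group
        if i != len(keys) and keys[i] == key:
            return profiles[i]                 # stop at the first (most specific) hit
    return None


groups = build_groups(XREF)
print(find_profile(groups, (37, 63)))    # Durham County, NC: 15
print(find_profile(groups, (37, 183)))   # Wake County, NC: falls back to 16
print(find_profile(groups, (45, 79)))    # Richland County, SC: 17</programlisting>
<para>Searching the county-level group before the state-level group is what makes the Durham and Orange entries override the general North Carolina entry; a source with no match at any level returns no profile in this sketch.</para>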
<para>Cross-references and profiles are used in the following SMOKE processing steps. These steps and their associated programs (listed in parentheses) will be described in the sections to come.</para>
<itemizedlist spacing="compact">
<listitem>Inventory import (<command>Smkinven</command>)
<itemizedlist spacing="compact">
<listitem>cross-references: <envar>NHAPEXCLUDE</envar>, <envar>VMTMIX</envar>, <envar>PSTK</envar>, <envar>ARTOPNT</envar></listitem>
<listitem>profiles: none (factors are included in the cross-reference files when needed)</listitem>
</itemizedlist>
</listitem>
<listitem>Temporal allocation (<command>Temporal</command>)
<itemizedlist spacing="compact">
<listitem>cross-references: <envar>ATREF</envar>, <envar>MTREF</envar>, <envar>PTREF</envar></listitem>
<listitem>profiles: <envar>ATPRO</envar>, <envar>MTPRO</envar>, <envar>PTPRO</envar></listitem>
</itemizedlist>
</listitem>
<listitem>Chemical speciation (<command>Spcmat</command>)
<itemizedlist spacing="compact">
<listitem>cross-references: <envar>GSREF</envar>, <envar>GSCNV</envar></listitem>
<listitem>profiles: <envar>GSPRO</envar></listitem>
</itemizedlist>
</listitem>
<listitem>Spatial allocation (<command>Grdmat</command>)
<itemizedlist spacing="compact">
<listitem>cross-references: <envar>AGREF</envar>, <envar>MGREF</envar></listitem>
<listitem>profiles: <envar>AGPRO</envar>, <envar>MGPRO</envar> (<command>* Note</command>)</listitem>
</itemizedlist>
</listitem>
<listitem>Growth and controls (<command>Cntlmat</command>)
<itemizedlist spacing="compact">
<listitem>cross-references: <envar>GCNTL</envar></listitem>
<listitem>profiles: none (factors are included in the cross-reference files)</listitem>
</itemizedlist>
</listitem>
<listitem>Mobile-source speed assignment (<command>Movesmrg</command>)
<itemizedlist spacing="compact">
<listitem>cross-references: <envar>MCXREF</envar>, <envar>MFMREF</envar></listitem>
<listitem>profiles: <envar>SPDPRO</envar></listitem>
</itemizedlist>
</listitem>
</itemizedlist>
<para>The hierarchies that each SMOKE program uses to assign cross-reference entries to sources are provided in <xref linkend="ch_programs" />, where the programs are described at length. The file contents and formats are described in more detail in <xref linkend="ch_input_files" />.</para>
<para><command>Note</command>: The environment variables <envar>AGPRO</envar> (area spatial surrogate file) and <envar>MGPRO</envar> (mobile spatial surrogate file) have been discontinued. Two new environment variables have been introduced to SMOKE: <envar>SRGPRO_PATH</envar> (spatial surrogate profile file location) and <envar>SRGDESC</envar> (description file with the specific list of available surrogates located in <envar>SRGPRO_PATH</envar>). See <xref linkend="fig_programs_grdmat" />. The surrogate files located in <envar>SRGPRO_PATH</envar> are refinements of the old [A|M]GPRO files. They have the same format as the old files; however, there may now be one or more surrogate files. <command>Grdmat</command> now processes each surrogate separately. On domains with large cell counts, this approach limits memory usage at the expense of slightly longer run times.</para>
</section>
<section id="sect_concepts_basic_formats">
<title>Input and output file types</title>
<para>Before we describe more about the SMOKE processing, we first need to explain the types of files you will encounter in this documentation. SMOKE primarily uses two types of file formats: ASCII files and I/O API files. In addition, the output file format for the UAM-based air quality model is a Fortran binary file format. <xref linkend="ch_input_files" />, <xref linkend="ch_intermediate_files" />, and <xref linkend="ch_output_files" /> describe all input, intermediate, and output files, including the file format for each one. Input files are files that are read by at least one core SMOKE program (listed in <xref linkend="ch_programs" />), but are not written by a core program. Intermediate files are files that are written by a core program and read by at least one other core program. Output files are files output by a SMOKE core program but not read by any of them; these files include reports, log files, and the model-ready files to be input to an air quality model. (Exception: one intermediate file [used by a core program] is also an output file [used by an AQM]: the <envar>STACK_GROUPS</envar> file, described in <xref linkend="sect_intmed_stack_groups" />.) In this section, we further describe the ASCII and I/O API files, and then provide information about the two approaches for formatting the model-ready output files produced by SMOKE (the CMAQ/Models-3 approach and the UAM-based approach).</para>
<para>SMOKE’s input files are primarily ASCII files, although a few I/O API files are used. The intermediate files in SMOKE are primarily I/O API files, although there are several important ASCII files as well. The output files from SMOKE are primarily I/O API files and Fortran binary files for model-ready emissions files, and ASCII files for reports and logs.</para>
<section>
<title>ASCII files</title>
<para>ASCII files are simply the text files with which most computer users are familiar. The ASCII files input by SMOKE come in two structures: <emphasis>column-specific</emphasis> and <emphasis>list-directed</emphasis>.</para>
<section id="sect_concepts_column_specific">
<title>Column-specific ASCII files</title>
<para>In column-specific files, each field must appear in specific columns of the line. Each character on a line occupies a single column. The lines below represent a column-specific ASCII data file:</para>
<programlisting>TEST 1 2 3
Additional data</programlisting>
<para>The letters <literal>TEST</literal> are in columns 1 through 4 of the file and the numbers 1, 2, and 3 are in columns 6, 8, and 10 respectively:</para>
<programlisting>123456789012345
TEST 1 2 3
Additional data</programlisting>
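<para>To make the column positions concrete, the following Python sketch cuts the example line into fields by column number. It is only an illustration, not SMOKE code: the field names and the <literal>FIELDS</literal> layout table are hypothetical and simply mirror the columns described above.</para>

```python
# Hypothetical layout: (field name, first column, last column), 1-based and
# inclusive, mirroring the TEST example above. Not taken from any SMOKE file.
FIELDS = [("label", 1, 4), ("a", 6, 6), ("b", 8, 8), ("c", 10, 10)]


def parse_fixed(line, fields=FIELDS):
    """Return a dict of stripped field values cut from fixed columns."""
    record = {}
    for name, first, last in fields:
        # Columns are 1-based; Python slices are 0-based and end-exclusive.
        record[name] = line[first - 1:last].strip()
    return record


print(parse_fixed("TEST 1 2 3"))
# {'label': 'TEST', 'a': '1', 'b': '2', 'c': '3'}
```

<para>If the numbers are shifted by even one character, the same slices return blanks, which is why the column positions matter in this type of file.</para>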
</section>
<section id="sect_concepts_list_directed">
<title>List-directed ASCII files</title>
<para>In list-directed files, the exact positioning of the fields on a line is not important, but the order of the fields on that line is crucial. The fields must be delimited (separated) by special characters called delimiters; in SMOKE, valid delimiters are <emphasis role="bold">spaces</emphasis>, <emphasis role="bold">commas</emphasis>, or <emphasis role="bold">semicolons</emphasis>. If a particular field happens to contain any of these delimiters within it, then that field must be surrounded by single or double quotes in the input file.</para>
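<para>The splitting rule can be sketched in Python with the standard <literal>shlex</literal> module. This is an approximation for illustration only and is not how SMOKE reads its files: the sample line is made up, and the sketch collapses runs of delimiters into one separator, so it says nothing about how empty fields are handled.</para>

```python
import shlex


def split_fields(line):
    """Split on spaces, commas, or semicolons, honoring single and double quotes."""
    lexer = shlex.shlex(line, posix=True)
    lexer.whitespace = " \t\r\n,;"   # treat the three delimiters as separators
    lexer.whitespace_split = True    # split only on those separators
    lexer.commenters = ""            # do not treat '#' as a comment marker
    return list(lexer)


# Hypothetical input line mixing all three delimiters and both quote styles.
print(split_fields("A1 'two words' 3.5;\"x,y\""))
# ['A1', 'two words', '3.5', 'x,y']
```

<para>The quoted fields keep their embedded space and comma, while the unquoted delimiters separate the fields; only the order of the fields matters.</para>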
</section>
</section>
<section>
<title>I/O API files</title>
<para>I/O API files are read and written by the I/O API library used by SMOKE and other Models-3 programs. A library is a set of routines that have been created and compiled for use by multiple programs. The I/O API library, in turn, is built upon yet another library called the NetCDF library. For this reason, I/O API files are also referred to as I/O API NetCDF files. More information on both of these libraries is available at the <ulink url="http://www.baronams.com/products/ioapi/">I/O API web site</ulink>. <comment><xref linkend="sect_install_compile" /> contains instructions for obtaining the I/O API and NetCDF libraries.</comment></para>
<para>The I/O API files cannot be viewed with a text editor because they are binary files. These binary files use less disk space than ASCII files containing the same data. They also allow much more efficient input and output of the data, and the I/O API library provides many quality assurance (QA) features useful for all input and output (I/O), including I/O for emissions processing.</para>
<para>The basic I/O API file has a limitation of 120 variables per file. To overcome this, SMOKE uses a wrapper called the FileSetAPI that creates and manages multiple I/O API files when more than 120 variables are needed in a single I/O API dataset in SMOKE. For example, if the SMOKE speciation matrix requires 140 pollutant-to-species variables, SMOKE will open by default two standard I/O API files: one with 120 variables and one with 20 variables. This resulting <quote>file set</quote> will be treated by other SMOKE programs as a single file, which enables processing of any number of pollutants and species in a single run, despite the I/O API variable limitation.</para>
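<para>The arithmetic behind a file set can be sketched as a simple chunking of the variable list. The function below is a hypothetical illustration and is not part of the FileSetAPI; the variable names are placeholders.</para>

```python
def split_into_files(variable_names, limit=120):
    """Chunk a variable list into groups no longer than the per-file limit."""
    return [variable_names[i:i + limit]
            for i in range(0, len(variable_names), limit)]


# 140 placeholder variable names, as in the speciation matrix example above.
names = [f"VAR{i:03d}" for i in range(1, 141)]
print([len(group) for group in split_into_files(names)])
# [120, 20]
```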
<para>Some I/O API files can be viewed by the <ulink url="http://www.verdi-tool.org">Visualization Environment for Rich Data Interpretation</ulink> (VERDI). In SMOKE, any gridded output file from the <command>Smkmerge</command>, <command>Mrggrid</command>, or <command>Smk2emis</command> programs can be viewed by VERDI.</para>
<para>In some cases, it can be helpful to directly view the contents of the I/O API files in text form. This provides a quick way to check grid settings, time period, or species names in the model-ready output files. By viewing the text version of the model-ready output files produced by SMOKE, you can easily confirm that the correct species have been created or that the emission units are correct. To convert the I/O API files to text, one can use a combination of the NetCDF-provided <command>ncdump</command> utility and UNIX commands. The <command>ncdump</command> utility is created when you compile the NetCDF library, or you can download it from the <ulink url="http://www.unidata.ucar.edu/packages/netcdf/">NetCDF web site</ulink>. The command to convert the files to text format is:</para>
<para><userinput><command>ncdump</command> <replaceable>&lt;infile&gt;</replaceable> | <command>cut -c1-80</command> &gt; <replaceable>&lt;outfile&gt;</replaceable></userinput></para>
<para>Replace <replaceable>&lt;infile&gt;</replaceable> in the command above with your input I/O API file name, and <replaceable>&lt;outfile&gt;</replaceable> with your desired ASCII output file name. The output file contains all the applicable data stored in the I/O API file including grid information, time period, variable names, etc.</para>
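<para>If you prefer to drive the same conversion from a script, a minimal Python sketch follows. It assumes that <command>ncdump</command> is on your path; the function names are hypothetical, and the truncation step simply mimics what <command>cut -c1-80</command> does.</para>

```python
import subprocess


def truncate_lines(text, width=80):
    """Keep the first `width` characters of each line, like `cut -c1-80`."""
    return "\n".join(line[:width] for line in text.splitlines())


def dump_to_text(infile, outfile, width=80):
    """Run ncdump on infile and write the truncated text to outfile."""
    result = subprocess.run(["ncdump", infile],
                            capture_output=True, text=True, check=True)
    with open(outfile, "w") as handle:
        handle.write(truncate_lines(result.stdout, width) + "\n")
```

<para>Calling <literal>dump_to_text</literal> with an input and output file name should then produce the same kind of text file as the command line above.</para>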
</section>
<section id="sect_concepts_model_ready_files">
<title>Model-ready files</title>
<para>SMOKE supports two major approaches for formatting its output files that are used as inputs to air quality models (i.e., model-ready files): the CMAQ/Models-3 approach and the UAM-based approach. The CMAQ/Models-3 approach is used for the CMAQ model, and the UAM-based approach is used for the UAM models and CAM<subscript>X</subscript>.</para>
<para>The CMAQ/Models-3 approach uses one required 3-D I/O API file that contains the gridded, hourly, speciated, and vertically distributed emissions. In SMOKE, it is called the <envar>EGTS3D_L</envar> file. To create the 3-D model-ready emissions file, SMOKE computes plume rise for some or all point sources. For CMAQ, two additional optional files can be provided for plume-in-grid (PinG) processing. The first must contain locations and stack parameters for PinG sources and is called the <envar>STACK_GROUPS</envar> file. The second must contain the hourly, speciated emissions for the same PinG sources in a file called the <envar>PINGTS_L</envar> file.</para>
<para>The UAM-based approach has two required files: (1) a 2-D emissions Fortran binary file with all sources other than point sources and all low-level point sources, and (2) an elevated-point-source Fortran binary file. The SMOKE program <command>Smk2emis</command> can create the 2-D emissions Fortran binary file (called the <envar>UAM_EGTS</envar> file) by converting a 2-D <envar>EGTS_L</envar> file from an I/O API format. To obtain the elevated-point-source Fortran binary file, the SMOKE program <command>Smkmerge</command> can create an ASCII elevated-point-source file, which can then be converted to the required binary format using the UAM preprocessor <ulink url="http://www.remsad.com/ptsrce.htm">Ptsrce</ulink>.</para>
</section>
</section>
<section>
<title>Modeling parameters</title>
<para>Emissions modeling requires information about the subsequent air quality modeling that will be done. For example, to produce appropriate model-ready files using SMOKE, you must know which AQM will be used, the model grid and map projection, the episode dates, and the chemical mechanism to be used. In this manual, we refer to these settings collectively as <quote>modeling parameters</quote>. In this section, we provide information on what these modeling parameters are and SMOKE’s capabilities to support them.</para>
<para>SMOKE reads in the modeling parameters from both script settings (environment variables) and input files. In the subsections below, we provide the relevant settings and files that control the modeling parameters<comment>. More information about how to configure your scripts and files to change these parameters can be found in <xref linkend="sect_scripts_how_use_smoke" /></comment>; how the settings affect the programs is described in <xref linkend="ch_utilities" /> and <xref linkend="ch_programs" />.</para>
<section>
<title>Map projections and model grids</title>
<para>A map projection is the mathematical representation of the spherical surface of the earth on a 2-D plane. SMOKE supports Lambert conformal, lat-lon, UTM, and polar stereographic map projections. There are many different settings that you may use to define your Lambert conformal, UTM, and polar stereographic projections, to make these projections match the one being used by your meteorology model and AQM. (Lat-lon is a fixed projection and cannot be changed.)</para>
<para>A model grid is a two-dimensional region overlaid on a map projection. It is defined by the starting <emphasis>x-y</emphasis> coordinates, the number of grid cells in each direction, and the physical size of the grid cells. <xref linkend="fig_concepts_grid" /> shows an example of a model grid that includes most of the eastern U.S. This example has <comment>starting coordinates of,</comment> 81 grid cells in the <emphasis>x</emphasis>-direction, 75 grid cells in the <emphasis>y</emphasis>-direction, and each grid cell is 36 by 36 kilometers. Each set of 10 cells by 10 cells (counting from the starting coordinates) is enclosed in black grid lines.</para>
<figure id="fig_concepts_grid">
<title>Example model grid</title>
<mediaobject>
<imageobject role="pdf">
<imagedata width="5in" fileref="images/concepts/grid_pdf.jpg" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/concepts/grid_html.jpg" />
</imageobject>
</mediaobject>
</figure>
<para>The model grid is set in SMOKE using the <envar>IOAPI_GRIDNAME_1</envar> setting to select a grid and map projection from among those defined in the <envar>GRIDDESC</envar> input file. The name of the grid set with the <envar>IOAPI_GRIDNAME_1</envar> setting must match a grid name in the <envar>GRIDDESC</envar> file to allow SMOKE to obtain the grid and map projection parameters from the <envar>GRIDDESC</envar> file.</para>
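<para>The grid definition described above (starting coordinates, cell counts, and cell size) can be sketched as a small Python class. This is an illustration only: the class and field names are hypothetical, the starting coordinates are assumed to mark the lower-left corner, and the origin of (0, 0) is a placeholder rather than the origin of the example grid in the figure.</para>

```python
from dataclasses import dataclass


@dataclass
class Grid:
    x_orig_km: float  # starting x coordinate (assumed lower-left corner)
    y_orig_km: float  # starting y coordinate (assumed lower-left corner)
    ncols: int        # number of cells in the x-direction
    nrows: int        # number of cells in the y-direction
    cell_km: float    # cell size in kilometers

    def extent_km(self):
        """Return the (width, height) of the grid in kilometers."""
        return self.ncols * self.cell_km, self.nrows * self.cell_km

    def cell_of(self, x_km, y_km):
        """Return the 1-based (column, row) containing a point, or None if outside."""
        col = int((x_km - self.x_orig_km) // self.cell_km) + 1
        row = int((y_km - self.y_orig_km) // self.cell_km) + 1
        if col in range(1, self.ncols + 1) and row in range(1, self.nrows + 1):
            return col, row
        return None


# Cell counts and cell size from the example grid; the origin is a placeholder.
grid = Grid(0.0, 0.0, ncols=81, nrows=75, cell_km=36.0)
print(grid.extent_km())          # (2916.0, 2700.0)
print(grid.cell_of(40.0, 10.0))  # (2, 1)
```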
</section>
<section>
<title>Base year and past/future years</title>
<para>For any modeling effort, the emissions base year and future year are key modeling parameters needed for performing emissions processing. The base year is usually the year for which the air quality model is being run in order to compare modeling results with observed air quality data. Such comparisons allow modelers to tune the emissions data and air quality model, to ensure that the AQM is performing adequately during the modeling episode.</para>
<para>The base year is most often a year for which an emission inventory is available. This is usually the same year for which the meteorology model has been run to prepare input to SMOKE and an AQM and for which air quality observations are available. Of course there are exceptions to this principle, but generally that is how one establishes a base year. <comment>At the time of this writing, versions of the 1996 criteria and particulate inventory and 1999 criteria, particulate, and toxics inventories are available from EPA.</comment></para>
<para>Several different files and settings are used to set the base year in SMOKE, and all of them should be consistent with one another for ideal results.</para>
<itemizedlist>
<listitem>
<para>The <envar>YEAR</envar> setting in the SMOKE Assigns file is the reference point used by the scripts to determine the base year and set the names of various year-specific input files.</para>
</listitem>
<listitem>
<para>The episode and run settings (see <xref linkend="sect_concepts_modeling_episodes" />) determine the base year that will be used in the model-ready output files. This base year must match the <envar>YEAR</envar> setting so that the correct input files are used.</para>
</listitem>
<listitem>
<para>The input emissions files should ideally contain data for the same base year, and the #YEAR header setting in those files should be consistent with the <envar>YEAR</envar> environment variable in the Assigns file. If the years in the annual inventory files are not consistent with each other, SMOKE will determine the year used by the most sources and set that as the base year. If day-specific or hour-specific data are used, all years in those files must be consistent with the base year of the annual emissions.</para>
</listitem>
<listitem>
<para>The MOVES input data, if they are being used, should also be consistent with the base year. SMOKE is capable of running MOVES with inputs from a different year, but certain inputs may not be correct.</para>
</listitem>
<listitem>
<para>Finally, the dates in the I/O API meteorology data from the Meteorology-Chemistry Interface Processor (MCIP) must be consistent with both the base year and the episode and run settings.</para>
</listitem>
</itemizedlist>
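<para>The rule for inconsistent inventory years (use the year reported by the most sources) can be sketched in Python as follows. This is a hypothetical illustration, not SMOKE code; in particular, how SMOKE resolves a tie is not described here, and this sketch simply returns the first of the tied years.</para>

```python
from collections import Counter


def majority_year(source_years):
    """Return the year reported by the largest number of sources."""
    year, _count = Counter(source_years).most_common(1)[0]
    return year


# Made-up list with one inventory year per source.
print(majority_year([1999, 1999, 1996, 1999, 1996]))  # 1999
```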
<para>The future (or past) year is a chosen year in the future (or past) for which a modeler needs to run an air quality model; for example, to model the future effects of particular emission control strategies. To model a future year with SMOKE, you must have either an inventory that has been computed for a future year, or growth and control factors to project the base-year inventory to the future year. The settings and files that must be considered are as follows:</para>
<itemizedlist>
<listitem>
<para>The setting <envar>FYEAR</envar> is set in the run script and is used by the script to automatically assign the name of the <command>Cntlmat</command> input file <envar>GCNTL</envar>, which contains the growth factors. <envar>FYEAR</envar> must be set to the future year even if you are not creating a future-year inventory because one has already been provided to you.</para>
</listitem>
<listitem>
<para>If you already have a future-year inventory and so do not need to use SMOKE to project one from the base year inventory, then the emissions data year must match the future year, and the #YEAR header in the inventory file must match that year as well. In this case, the <envar>SMK_BASEYR_OVERRIDE</envar> setting must also be used to indicate what the base year is (which will be the same as the year of the meteorology data).</para>
</listitem>
<listitem>
<para>The MOVES input data, if they are being used, must also include the correct settings for the future year of interest.</para>
</listitem>
<listitem>
<para>The episode and run settings, meteorology files, and day- or hour-specific inventories should <emphasis>not</emphasis> match the future year, but rather should use the base-year episode dates.</para>
</listitem>
</itemizedlist>
</section>
<section id="sect_concepts_modeling_episodes">
<title>Modeling episodes</title>
<para>The modeling episode is the total time period for which you will run SMOKE and your AQM. Unless the episode is just a few days long, users typically set up SMOKE to create emissions files of a shorter duration than their modeling episode, often creating one-day files for each day of their episode. Though SMOKE can create a single file for an entire episode, the file often becomes too large for some computers to handle (the limit for 32-bit operating systems is 2 GB files), so necessity rather than preference dictates that smaller files (usually one-day files) be created by SMOKE. We use the term <quote>run period</quote> to distinguish between these shorter durations and the full modeling episode; unless otherwise noted, we will assume that the run period is one day. For example, a typical SMOKE episode might cover July 1, 1996 through July 31, 1996. There will be 31 run periods (days) within this episode, the first starting on July 1, 1996 and the last starting on July 31, 1996.</para>
<comment><para>In the SMOKE Assigns file, there are several settings that you need to change to cause SMOKE to create emissions for the episode of interest. <xref linkend="sect_scripts_how_use_smoke" /> provides more guidance on the particular form and approaches needed for using these settings.</para></comment>
<itemizedlist>
<listitem>
<para>The episode start date (<envar>EPI_STDATE</envar>), episode start time (<envar>EPI_STTIME</envar>), episode duration in hours (<envar>EPI_RUNLEN</envar>), and the episode number of days (<envar>EPI_NDAY</envar>) all must be set to cover the modeling episode. Note that SMOKE can only be run for periods contained within a single calendar year. It cannot, for example, start in December of 1996 and run through January of 1997. Two separate episodes would need to be set up in this case, with the first ending on December 31, 1996, and the second starting on January 1, 1997.</para>
</listitem>
<listitem>
<para>The start date of the first run period needs to be set using the <envar>G_STDATE</envar> and <envar>ESDATE</envar> settings. The <envar>G_STDATE</envar> is the year and Julian day setting used by the SMOKE programs; in our example above, <envar>G_STDATE</envar> would be set to 1996183, since July 1 is the 183rd day of 1996. The <envar>ESDATE</envar> is the Gregorian date used in naming the SMOKE intermediate and output files; for our example, <envar>ESDATE</envar> would be 19960701. The SMOKE scripts will use the <envar>EPI_NDAY</envar> setting to automatically loop through the number of run periods in the episode, starting with the first <envar>G_STDATE</envar> value in the Assigns file. The <envar>G_STDATE</envar> and <envar>ESDATE</envar> settings are changed for each run period.</para>
</listitem>
<listitem>
<para>The run period start time (<envar>G_STTIME</envar>) and duration (<envar>G_RUNLEN</envar>) must also be set to indicate the start time and length of each run period. Both values use an HHMMSS (hours, minutes, seconds) format.</para>
<para>The run period duration (<envar>G_RUNLEN</envar>) is usually not the same as the episode duration (<envar>EPI_RUNLEN</envar>). For example, if the episode length is 30 days (720 hours), the run period duration setting could be just 1 day (25 hours), 2 days (49 hours), or 3 days (73 hours) (the reason for the extra hour in each case is explained below). In the first case, SMOKE would create thirty 25-hour files; in the second case, fifteen 49-hour files; and in the third case, ten 73-hour files.</para>
</listitem>
<listitem>
<para>The <envar>NDAYS</envar>, <envar>MSDATE</envar>, and <envar>MDAYS</envar> settings are used for naming files. The <envar>NDAYS</envar> setting should be set to the number of days in each run period, and is used by default for naming time-based files. The <envar>NDAYS</envar> setting is also used along with the <envar>EPI_NDAY</envar> setting to loop through the run periods in the episode. The <envar>MSDATE</envar> and <envar>MDAYS</envar> settings can be used for naming the meteorology input files, but are not being used by the default Assigns file provided with SMOKE.</para>
</listitem>
</itemizedlist>
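<para>The date arithmetic behind these settings can be checked with a short Python sketch. The function names are hypothetical and the SMOKE scripts do not work this way internally; the sketch only reproduces the values that <envar>G_STDATE</envar>, <envar>ESDATE</envar>, and <envar>G_RUNLEN</envar> take in the July 1996 example.</para>

```python
from datetime import date, timedelta


def run_periods(first_day, n_days):
    """Yield (G_STDATE-style, ESDATE-style) strings for each one-day run period."""
    for offset in range(n_days):
        day = first_day + timedelta(days=offset)
        # %Y%j is year plus Julian day; %Y%m%d is the Gregorian date.
        yield day.strftime("%Y%j"), day.strftime("%Y%m%d")


def hours_to_hhmmss(hours):
    """Format a whole number of hours as HHMMSS, e.g. 25 becomes 250000."""
    return f"{hours:02d}0000"


periods = list(run_periods(date(1996, 7, 1), 31))
print(periods[0])           # ('1996183', '19960701')
print(periods[-1])          # ('1996213', '19960731')
print(hours_to_hhmmss(25))  # 250000
```

<para>A 30-day episode processed in one-day run periods would produce thirty such date pairs, each with a run length of 250000.</para>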
<para>There are a few key things to remember when you are verifying that you have the correct episode settings:</para>
<itemizedlist>
<listitem>
<para>SMOKE cannot process emissions over a calendar-year break. Thus, the longest run that can be done covers one full calendar year (365 days, or 366 days in a leap year), with the episode start date being January 1. If a modeling episode spans multiple years, then a different Assigns file, script, and set of input files must be created for each year.</para>
</listitem>
<listitem>
<para>The AQMs supported by SMOKE always need one extra hour in each emissions input file due to how they calculate boundary conditions. Therefore, if you are inputting emissions to run a 24-hour period, the <envar>G_RUNLEN</envar> setting should be 250000 for 25 hours.</para>
</listitem>
<listitem>
<para>The CMAQ and CAM<subscript>X</subscript> models can accept emissions files for multiple days, but the UAM must have 25-hour files only. As stated earlier, however, all of these models are often run using 25-hour files, with one file for each day of the episode.</para>
</listitem>
<listitem>
<para>All times are associated with a time zone, including the episode and run period start time settings. These settings must be consistent with the time zone of the meteorology files. If the meteorology data were created using MM5, the time zone is most likely Greenwich Mean Time (GMT); therefore, the <envar>EPI_STDATE</envar>, <envar>EPI_STTIME</envar>, <envar>G_STDATE</envar>, and <envar>G_STTIME</envar> settings would have to be provided in that same time zone. Whatever time zone is inherent in the meteorology files and these date settings will also be the time zone of the dates and times in the output emissions files from SMOKE. This ensures that the dates and times of the emissions and meteorology files are consistent for input to the AQM.</para>
</listitem>
</itemizedlist>
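<para>Because an episode cannot cross a calendar-year break, a multi-year modeling period has to be divided into one episode per year. A minimal Python sketch of that split follows; the function name and the example dates are hypothetical.</para>

```python
from datetime import date


def split_by_year(start, end):
    """Split an inclusive date range into pieces that each stay within one year."""
    pieces = []
    for year in range(start.year, end.year + 1):
        piece_start = max(start, date(year, 1, 1))
        piece_end = min(end, date(year, 12, 31))
        pieces.append((piece_start, piece_end))
    return pieces


# Hypothetical period running from mid-December 1996 to mid-January 1997.
for first, last in split_by_year(date(1996, 12, 15), date(1997, 1, 15)):
    print(first.isoformat(), last.isoformat(), (last - first).days + 1)
# 1996-12-15 1996-12-31 17
# 1997-01-01 1997-01-15 15
```

<para>Each piece would then need its own Assigns file, script, and input files, as noted in the list above.</para>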
</section>
<section>
<title>Chemical mechanisms</title>
<para>SMOKE can accommodate a variety of chemical mechanisms for the models it supports. From the emissions processing perspective, the chemical mechanism is the mapping of the pollutants provided in the emissions inventory to the species needed by the AQM of interest. For example, the input files for five chemical mechanisms for the CMAQ model are available for download from the EPA; these mechanisms are Carbon Bond 6 (CB6), CB6 with particulates, Regional Acid Deposition Model, version 2 (RADM2), RADM2 with particulates, and a research version of CB6 with toxics.</para>
<comment><para>In <xref linkend="sect_scripts_change_speciation" />, we provide the settings needed in the Assigns file to use a different chemical mechanism with SMOKE. SMOKE is not constrained to the files available for download. If you need to process other data (e.g., a tracer species) with SMOKE, they can be added to several input files, including the chemical mechanism file, to be output to the AQM. Some additions to chemical mechanisms are easier than others, and we explain how to determine whether you can create the files you need for your situation. We also give instructions on how to add species to the chemical mechanism files and how to make sure that the inventory pollutants are mapped to the correct chemical species.</para></comment>
<para>SMOKE users must know what chemical mechanism will be used in the AQM for which the SMOKE output emissions are intended. Once that has been determined, the following files must be configured to be consistent with the inventory being used and the chemical mechanism: the inventory table (<envar>INVTABLE</envar>), speciation profiles (<envar>GSPRO</envar>), speciation cross-reference (<envar>GSREF</envar>), and the mobile processes file (<envar>MEPROC</envar>) when creating on-road mobile emissions with MOVES through SMOKE.</para>
</section>
<section>
<title>Layer structures</title>
<para>SMOKE needs information on layer structures for processing elevated point sources’ plume rise in the <command>Laypoint</command> program and creating the ASCII elevated-point-source file (<envar>ELEVTS_L</envar> or <envar>ELEVTS_S</envar>) with the <command>Smkmerge</command> program. The way SMOKE obtains the layer information differs depending on whether you are creating emissions using a CMAQ-based or UAM-based approach (see <xref linkend="sect_concepts_model_ready_files" />). For the CMAQ-based approach, SMOKE determines the layer structure from the structure included in the header of the <envar>GRID_CRO_3D</envar> meteorology file. For the UAM-based approach, SMOKE does not really need to know the layer structure, except to output it to the ASCII elevated-point-source file. In this case, there are many settings obtained by <command>Smkmerge</command> from environment variable names starting with <envar>UAM_</envar>.</para>
</section>
</section>
<section id="sect_concepts_sparse_matrix">
<title>Sparse matrix approach to emissions modeling</title>
<para>The paradigm for atmospheric emissions models prior to SMOKE was a network of pipes and filters. This means that at any given stage in the processing, an emissions file includes self-contained records describing each source and <emphasis>all</emphasis> of the attributes acquired from previous processing stages. Each processing stage acts as a filter that inputs a stream of these fully-defined records, combines it with data from one or more support files, and produces a new stream of these records. Redundant data are passed down the pipe at the cost of extra I/O, storage, data processing, and program complexity. Using this method, all processing is performed one record at a time, without necessarily a structure or order to the records.</para>
<para>This old paradigm came about as a way to avoid repeatedly searching through data files for needed information, which would be very inefficient. It is admirably suited to older computer architectures with very small available memories and tape-only storage, but is not suitable for current desktop machines or high-performance computers. SMOKE developers demonstrated this when the Emissions Preprocessor System (EPS) 2.0 was run on a Cray Y-MP. It ran four times slower on the Cray machine (a much faster computer) than on a desktop 150 MHz DEC Alphastation 3000/300. This paradigm also fostered a serial approach to the emissions processing steps, as shown in <xref linkend="fig_concepts_serial_approach" />.</para>
<figure id="fig_concepts_serial_approach">
<title>Serial approach to emissions processing</title>
<mediaobject>
<imageobject role="pdf">
<imagedata width="6.5in" fileref="images/concepts/serial_approach_pdf.jpg" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/concepts/serial_approach_html.jpg" />
</imageobject>
</mediaobject>
</figure>
<para>The new paradigm implemented in SMOKE came about from analyses indicating that emissions computations should be quite adaptable to high-performance computing if the paradigm were appropriately changed. For each SMOKE processing category (i.e., area, biogenic, mobile, and point sources), the following tasks are performed:</para>
<itemizedlist>
<listitem>
<para>read emissions inventory data files</para>
</listitem>
<listitem>
<para>optionally grow emissions from the base year to the (future or past) modeled year (except biogenic sources)</para>
</listitem>
<listitem>
<para>transform inventory species into chemical mechanism species defined by an AQM</para>
</listitem>
<listitem>
<para>optionally apply emissions controls (except for biogenic sources)</para>
</listitem>
<listitem>
<para>model the temporal distribution of the emissions, including any meteorology effects</para>
</listitem>
<listitem>
<para>model the spatial distribution of the emissions</para>
</listitem>
<listitem>
<para>merge the various source categories of emissions to form input files for the AQM</para>
</listitem>
<listitem>
<para>at every step of the processing, perform quality assurance on the input data and the results</para>
</listitem>
</itemizedlist>
<para>Each processing category has its particular complexities and deviations from the above list; these are described in <xref linkend="sect_concepts_processing_summaries" />. For all categories, however, most of the needed processing steps are <emphasis>factor-based</emphasis>; they are linear operations that can be represented as multiplication by matrices. Further, some of the matrices are <emphasis>sparse</emphasis> matrices (i.e., most of their entries are zeros).</para>
<para>SMOKE is designed to take advantage of these facts by formulating emissions modeling in terms of sparse matrix operations, which can be performed by optimized sparse matrix libraries. Specifically, the inventory emissions are arranged as a vector of emissions sorted in a particular order, with associated vectors that include characteristics about the sources such as the state/county and SCCs. SMOKE then creates matrices that apply the control, gridding, and speciation factors to the vector of emissions. In many cases, these matrices are independent from one another, and can therefore be generated in parallel and applied to the inventory in a final <quote>merge</quote> step, which combines the inventory emissions vector (now an hourly inventory file) with the control, speciation, and gridding matrices to create model-ready emissions. <xref linkend="fig_concepts_parallel_approach" /> shows how the matrix approach allows for a more parallel approach to emissions processing, in which fewer steps depend on other needed steps.</para>
<para>Note that in <xref linkend="fig_concepts_parallel_approach" />, temporal allocation outputs hourly emissions instead of a temporal matrix. This is because of some peculiarities with temporal modeling for point sources, which can use hourly emissions as input data. To be able to overwrite the inventory emissions with these hourly emissions, the temporal allocation step must output the emissions data. The matrix approach is used internally in the temporal allocation step.</para>
<para>The growth and controls steps shown in <xref linkend="fig_concepts_parallel_approach" /> are optional. If the inventory is not grown to a future or past year, then the temporal allocation step uses the original inventory vectors to calculate the hourly emissions.</para>
<figure id="fig_concepts_parallel_approach">
<title>Parallel approach to emissions processing</title>
<mediaobject>
<imageobject role="pdf">
<imagedata width="6.5in" fileref="images/concepts/parallel_approach_pdf.jpg" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/concepts/parallel_approach_html.jpg" />
</imageobject>
</mediaobject>
</figure>
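<para>The matrix formulation can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not SMOKE's implementation: all numbers are made up, the species names are placeholders, each sparse matrix is stored as a dictionary that holds only its nonzero entries, and the <literal>merge</literal> function is a hypothetical stand-in for the merge step.</para>

```python
# All numbers below are made up for illustration.
emissions = [10.0, 4.0, 6.0]  # one inventory pollutant, three sources
control = [1.0, 0.5, 1.0]     # 50 percent reduction on source 1 only

# Sparse matrices stored as {(row, source): factor}; absent entries are zero.
speciation = {("NO", 0): 0.9, ("NO2", 0): 0.1,
              ("NO", 1): 0.9, ("NO2", 1): 0.1,
              ("NO", 2): 0.9, ("NO2", 2): 0.1}
gridding = {(0, 0): 1.0,                 # source 0 lies entirely in cell 0
            (0, 1): 0.25, (1, 1): 0.75,  # source 1 straddles cells 0 and 1
            (2, 2): 1.0}                 # source 2 lies entirely in cell 2


def merge(emissions, control, speciation, gridding):
    """Combine the independent factors into {(cell, species): emissions}."""
    by_source = {}
    for (species, src), frac in speciation.items():
        by_source.setdefault(src, []).append((species, frac))
    result = {}
    for (cell, src), share in gridding.items():
        for species, frac in by_source.get(src, []):
            key = (cell, species)
            value = share * frac * control[src] * emissions[src]
            result[key] = result.get(key, 0.0) + value
    return result


fine = merge(emissions, control, speciation, gridding)

# A different grid needs only a new gridding matrix; the other inputs are reused.
one_cell = {(0, 0): 1.0, (0, 1): 1.0, (0, 2): 1.0}
coarse = merge(emissions, control, speciation, one_cell)
print(round(sum(fine.values()), 6), round(sum(coarse.values()), 6))  # 18.0 18.0
```

<para>Because the control, speciation, and gridding factors are built separately and only combined in the final step, changing one of them does not require recomputing the others, and the total mass is the same on either grid.</para>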
<para>Several benefits can be realized from this more parallel approach. For example, given a single emissions inventory, temporal modeling is performed only once per inventory and episode (though in practice, this step is often performed once per episode day). Also, gridding matrices typically need only be calculated once per inventory and model grid definition, without having to reprocess other steps. As shown in <xref linkend="fig_concepts_additional_grid" />, SMOKE usually needs to rerun only the gridding and merge steps to process a different grid for the same inventory. The merge step in the figure will read the previously created results from the temporal allocation, chemical speciation, and control processing steps.</para>
<figure id="fig_concepts_additional_grid">
<title>Processing steps for running an additional grid in SMOKE</title>
<mediaobject>
<imageobject role="pdf">
<imagedata width="6.5in" fileref="images/concepts/additional_grid_pdf.jpg" />
</imageobject>
<imageobject role="html">