The two-dimensional simulations presented in this report were run on
the University of Oklahoma ECAS Cray J-90 series computer. This computer
is composed of 8 processors and 256 megawords of main memory. Model performance
was measured using the -perf compiler option on the Cray FORTRAN 77 compiler.
For the performance test, the domain size was 64 x 64 x 53 grid points in the
x, y, and z directions. The test was run on one processor using the -Zv
compiler option. The large and small time steps were 20 and 4 seconds,
respectively. The grid spacing was dx = dy = 2000 m and dz = 250 m, and the
base-state wind was a constant 10 m/s. The vertically implicit solution
technique and the upper radiation boundary condition were used. Table D-1
presents a compilation of the performance statistics by subroutine for this
three-dimensional mountain wave simulation. The overall code rating for this
test is 95.6 MFLOPS.
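The grid spacings and time steps above fix the Courant numbers of the test. The short sketch below computes them. The speed of sound (340 m/s) is an assumed representative value, not a parameter given in this report.

```python
# Courant numbers for the Cray J-90 mountain wave test described above.
# Grid spacings, time steps and base-state wind are from the text; the
# sound speed is an assumed representative value, not a model parameter.
DX = 2000.0      # horizontal grid spacing dx = dy (m)
DZ = 250.0       # vertical grid spacing (m)
DT_LARGE = 20.0  # large (advective) time step (s)
DT_SMALL = 4.0   # small (acoustic) time step (s)
U0 = 10.0        # constant base-state wind (m/s)
C_SOUND = 340.0  # assumed speed of sound (m/s)


def courant(speed: float, dt: float, dx: float) -> float:
    """Return the Courant number speed * dt / dx."""
    return speed * dt / dx


advective = courant(U0, DT_LARGE, DX)
acoustic_h = courant(C_SOUND, DT_SMALL, DX)
acoustic_v = courant(C_SOUND, DT_SMALL, DZ)

print(f"small steps per large step:            {DT_LARGE / DT_SMALL:.0f}")
print(f"advective Courant number (large step): {advective:.2f}")
print(f"horizontal acoustic Courant number:    {acoustic_h:.2f}")
print(f"vertical acoustic Courant number:      {acoustic_v:.2f}")
```

Under these assumptions the horizontal acoustic Courant number on the small step is below 1 (0.68). The vertical one is well above 1 (5.44) because dz is much smaller than dx. This is consistent with treating the vertical acoustic terms implicitly.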
The fraction of the total time spent in the small time step solver depends
on the ratio of the large time step to the small time step. The larger this
ratio, the more small time steps are taken per large time step, and the larger
the share of CPU time required by the small time step solver. Figure D.1
presents a pie chart of the most significant contributors to the total CPU
time of the model. Approximately 35% of the time is spent in the small time
step solvers dwpim3d and tri3d. The subroutine arpi3d also contains small
time step calculations for u and v.
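The dependence of the solver's share on the time step ratio can be illustrated with a simple cost model. This model is an assumption, not something stated in the report. Each large step is taken to cost C_L for the large-step terms plus n * C_S for its n small steps, so the small-step fraction is n r / (1 + n r) with r = C_S / C_L. The sketch calibrates r from the measured 35% at n = 20 s / 4 s = 5.

```python
# Minimal cost model (an assumption, not taken from the report): each large
# time step costs C_L for the large-step terms plus n * C_S for the n small
# steps it contains. Only the ratio r = C_S / C_L matters for the fraction.


def small_step_fraction(n: float, r: float) -> float:
    """Fraction of CPU time spent on small steps for n small steps per large step."""
    return n * r / (1.0 + n * r)


def calibrate_ratio(n: float, fraction: float) -> float:
    """Invert small_step_fraction: find r = C_S / C_L from one measured fraction."""
    return fraction / (n * (1.0 - fraction))


# Calibrate with the measured test: dt_large / dt_small = 20 s / 4 s = 5,
# with about 35% of the CPU time in dwpim3d and tri3d.
r = calibrate_ratio(5.0, 0.35)

for n in (2, 5, 10, 20):
    print(f"n = {n:2d}: small-step fraction ~ {small_step_fraction(n, r):.2f}")
```

Because arpi3d also performs small time step work for u and v, the 35% figure understates the true small-step share. The calibrated r is therefore best read as a lower bound.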
Table D-1. Performance statistics for ARPI3D three-dimensional mountain
wave simulation on a Cray J-90 series computer using a single processor
and the vector compiler option.
Figure D.1. Pie chart of CPU time requirements for a three-dimensional mountain wave simulation using ARPI3D and the ECAS Cray J-90 computer.
Tests conducted during the initial model development phase measured the efficiency of different terrain transformations and pressure equation formulations. The simple chain rule terrain formulation was found to be significantly faster (>33%, without turbulence) than the strong conservation form used in a number of models, including the ARPS. This is primarily due to the computationally expensive floating-point divisions required by the strong conservation formulation. Recasting the system of equations in terms of non-dimensional pressure rather than pressure also improves the computational efficiency of the code, as does the use of the advective form of the equations. In the dimensional pressure system of equations, an additional term in the buoyancy relation, arising from a power series approximation, must be computed on the small time step. The effect of this term was not explicitly determined but is estimated to be on the order of a few percent of the total solution time.
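For reference, the non-dimensional pressure used in place of pressure is the Exner function, pi = (p / p0)^(Rd / cp), and potential temperature is theta = T / pi. The sketch below assumes standard textbook values for p0, Rd, and cp; the report does not list the constants ARPI3D actually uses. The example point is the surface level of the Wangara Day 33 sounding listed in this appendix.

```python
# Non-dimensional pressure (Exner function) and potential temperature.
# P0, RD and CP are standard textbook values assumed here; the report does
# not list the constants ARPI3D uses.
P0 = 100000.0  # reference pressure (Pa)
RD = 287.04    # gas constant for dry air (J kg-1 K-1)
CP = 1004.5    # specific heat of dry air at constant pressure (J kg-1 K-1)
KAPPA = RD / CP


def exner(p: float) -> float:
    """Non-dimensional pressure pi = (p / P0) ** (Rd / cp)."""
    return (p / P0) ** KAPPA


def potential_temperature(t_kelvin: float, p: float) -> float:
    """Potential temperature theta = T / pi."""
    return t_kelvin / exner(p)


# Example: surface level of the Wangara Day 33 sounding (102300 Pa, 5.5 C).
p_sfc, t_sfc = 102300.0, 5.5 + 273.15
print(f"pi    = {exner(p_sfc):.4f}")
print(f"theta = {potential_temperature(t_sfc, p_sfc):.2f} K")
```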
Another way to estimate computational efficiency is to compare the model with other established mesoscale numerical models. A rough comparison of ARPI3D with ARPS Version 4.0 was made for a number of simple tests; only two of these comparisons are presented here. In 2-D mode, ARPI3D requires on the order of 12 to 15 times less CPU time than a similarly configured ARPS simulation. In fairness to the ARPS, this is primarily because the ARPS has only a pseudo two-dimensional option. Because of boundary condition requirements, the ARPS 2-D mode computes 4 vertical slices, while the ARPI3D 2-D mode computes only 1 vertical slice. A more realistic test involves a three-dimensional cold bubble dropped over a symmetric mountain. Both models were run without moisture, since ARPI3D currently uses a dry formulation. On a Cray J-90 computer, the ARPS simulation time is approximately 3 times that required by ARPI3D. This large discrepancy is likely due to ARPI3D's use of a simple coordinate transformation (chain rule), the advective form of the advection terms, the non-dimensional pressure formulation, and the absence of operator subroutines. The memory requirements of the two models are of comparable magnitude, with ARPI3D requiring approximately half the memory of the ARPS.
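A rough consistency check on these two comparisons can be made as follows. If the 2-D CPU time is assumed to scale linearly with the number of vertical slices computed (a simplification), the 4-to-1 slice count can be divided out of the measured 2-D ratio. The remainder can then be set beside the roughly 3x found in the 3-D cold bubble test.

```python
# Rough attribution of the 2-D ARPS/ARPI3D CPU-time ratio. Assumes (a
# simplification) that cost scales linearly with the number of vertical
# slices computed in 2-D mode.
ARPS_SLICES = 4    # ARPS pseudo 2-D mode
ARPI3D_SLICES = 1  # ARPI3D 2-D mode


def per_slice_ratio(measured_ratio: float,
                    arps_slices: int = ARPS_SLICES,
                    arpi3d_slices: int = ARPI3D_SLICES) -> float:
    """CPU-time ratio left over after removing the slice-count factor."""
    return measured_ratio / (arps_slices / arpi3d_slices)


for measured in (12.0, 15.0):  # reported range of 2-D CPU-time ratios
    print(f"measured {measured:4.1f}x -> {per_slice_ratio(measured):.2f}x per vertical slice")
```

Under this assumption the per-slice ratio is 3.00x to 3.75x, which is of the same order as the approximately 3x difference in the 3-D test.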
The three-dimensional experiments presented in this report were performed on the Pittsburgh Supercomputing Center's Cray T3D and T3E massively parallel computers and on the University of Oklahoma Hitachi SR2201C parallel supercomputer. During the winter of 1996, the source code was upgraded to include Message Passing Interface (MPI) subroutine calls. MPI was chosen over the Parallel Virtual Machine (PVM) message passing library because it is more efficient at passing packets of similar size. Message passing allows the code to run on massively parallel computer platforms. The advantage of this approach is that it removes the memory limitation of the Cray J90 and other symmetric multiprocessor (SMP) platforms. Tests were conducted on the T3D in which the per-processor model grid arrays were held constant while the number of processors was increased. This experiment tests the scalability of ARPI3D on a specific machine type: as the number of processors increases, the global domain size also increases. A perfectly scalable code on a machine with no communication overhead would register the same wall clock time regardless of the number of processors. The relation between the number of grid points per processor and the global domain size is:
(D.2)
Figure D.2. Plot of the normalized wall clock time as a function of processor configuration for simulations with a 20x12x115 per-processor grid. Tests were conducted on the PSC Cray T3D computer.
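The bookkeeping behind a fixed-size-per-processor (weak scaling) test like the one in Figure D.2 can be sketched as below. The global size here simply multiplies the 20x12 per-processor interior by the processor grid and leaves z undecomposed. Both are assumptions: this is not relation (D.2), which may include boundary or overlap points. The class and function names are hypothetical.

```python
# Weak-scaling bookkeeping for a fixed per-processor grid. The global size
# below simply multiplies the per-processor interior by the processor grid;
# it ignores any boundary or overlap points that relation (D.2) accounts for.
from dataclasses import dataclass


@dataclass(frozen=True)
class Decomposition:
    px: int        # processors in x
    py: int        # processors in y
    nx: int = 20   # per-processor points in x (Figure D.2 test)
    ny: int = 12   # per-processor points in y
    nz: int = 115  # points in z (assumed not decomposed)

    @property
    def processors(self) -> int:
        return self.px * self.py

    @property
    def global_points(self) -> tuple:
        return (self.px * self.nx, self.py * self.ny, self.nz)


def normalized_time(wall_time: float, reference_time: float) -> float:
    """Wall clock time relative to the reference run; 1.0 is perfect weak scaling."""
    return wall_time / reference_time


def weak_scaling_efficiency(wall_time: float, reference_time: float) -> float:
    """Efficiency = reference time / wall time; 1.0 is ideal."""
    return reference_time / wall_time
```

For example, `Decomposition(px=4, py=8)` describes 32 processors and a 80 x 96 x 115 global grid under these assumptions. A normalized time that stays near 1.0 as processors are added indicates good scalability.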
Sounding data for Wangara Day 33 simulations.
Sounding filename = wang.snd
1-D Sounding Input for ARPI3D
Sounding Data collected at Wangara Surface Experiment,
34.5 South 144.93 East, Australia
Date: 9:00am August 16, 1967
Sounding obtained from Yamada and Mellor, 1975.
Surface Height = 0.0 m, Surface Pressure = 102,300 Pa
Number of Levels = 23
Pressure Temp. Qv U V
15000 -65.0 .00000 35.00 00.00
35000 -40.0 .00023 30.00 00.00
48000 -15.0 .00023 25.00 00.00
62300 -5.0 .00026 15.00 00.00
72300 -1.5 .00031 7.00 00.00
79900 -0.2 .00060 .50 1.10
82000 1.4 .00070 -.70 1.72
84000 1.7 .00080 -1.19 .26
86100 2.0 .00080 -1.45 .07
88300 2.3 .00150 -1.93 -.90
89000 2.6 .00180 -2.29 -1.41
90500 2.5 .00200 -2.55 -1.16
91600 2.9 .00220 -2.28 -.76
92800 3.5 .00250 -2.45 -.48
93900 3.8 .00290 -2.43 -.35
95100 4.7 .00320 -2.79 -.26
96300 5.8 .00330 -2.49 -.37
97400 6.8 .00330 -3.20 -.47
98600 7.4 .00370 -3.12 -.51
99800 7.5 .00380 -2.79 -.57
101100 5.4 .00380 -2.92 -.38
101700 5.1 .00370 -2.84 .03
102300 5.5 .00420 0.0 0.00
Sounding data for the January 11, 1972 Boulder, Colorado windstorm simulations.
Sounding filename = bld2.snd
1-D Sounding Input for ARPI3D
Sounding Data collected at Grand Junction, Colorado
Date: 12Z Jan. 11, 1972
Sounding estimated from Figure 10 of Durran and Klemp (1983)
The top two layers were taken from Peltier and Clark (1979)
Surface Height = 0.0 m, Surface Pressure = 82000 Pa
Number of Levels = 13
Pressure Pt Qv U V
100.00000 1481.0000 0.00000 20.00 0.00
1000.00000 764.00000 0.00000 20.00 0.00
11000.00000 388.00000 0.00000 20.00 0.00
16000.00000 350.00000 0.00000 22.00 0.00
18500.00000 328.50000 0.00000 31.00 0.00
22000.00000 321.50000 0.00000 44.00 0.00
24000.00000 319.50000 0.00000 53.00 0.00
30000.00000 317.00000 0.00000 46.00 0.00
40000.00000 313.00000 0.00000 38.50 0.00
53000.00000 308.50000 0.00000 31.00 0.00
62500.00000 296.50000 0.00000 20.00 0.00
68000.00000 293.00000 0.00000 17.00 0.00
82000.00000 293.00000 0.00000 9.00 0.00
Sounding data for the January 9, 1989 Boulder, Colorado 2305 UTC simulations.
Sounding filename = cl2d.snd
1-D Sounding Input for ARPI3D taken from Clark et al. (1994)
Data collected at Craig, Colorado
Date: 15Z January 9, 1989
Surface Height 0.0 m, Surface Pressure 100000 Pa
Number of Levels = 20
Pressure Temp. Qv U V
500.00000 -55.70000 0.00000 30.00 0.00
2500.00000 -55.70000 0.00000 30.00 0.00
5000.00000 -55.70000 0.00000 30.00 0.00
9810.00000 -55.70000 0.00000 30.00 0.00
11880.00000 -55.80000 0.00000 31.09 0.00
15090.00000 -56.90000 0.00000 31.26 0.00
19980.00000 -60.90000 0.00000 40.57 0.00
24970.00000 -57.10000 0.00000 39.28 0.00
29920.00000 -47.20000 0.00000 34.74 0.00
35000.00000 -41.90000 0.00000 29.77 0.00
40030.00000 -35.00000 0.00000 29.07 0.00
45000.00000 -28.80000 0.00000 27.14 0.00
50170.00000 -22.60000 0.00000 26.11 0.00
55210.00000 -20.30000 0.00000 27.99 0.00
60290.00000 -15.30000 0.00000 25.50 0.00
69460.00000 -11.90000 0.00000 23.26 0.00
70220.00000 -11.00000 0.00000 13.34 0.00
75420.00000 -6.80000 0.00000 9.96 0.00
81160.00000 -6.00000 0.00000 3.75 0.00
100000.0000 -6.00000 0.00000 3.75 0.00
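The three sounding listings share a common layout: free-form header lines, a "Number of Levels = N" line, one column-header line, then N rows of five numbers ordered from the top level down to the surface. The minimal reader below is inferred from these listings only. It is not taken from the ARPI3D input routines, and the `read_sounding` and `Sounding` names are hypothetical.

```python
# Minimal reader for the 1-D sounding layout shown above (an assumption
# based on these three listings, not on the ARPI3D source): free-form
# header lines, a "Number of Levels = N" line, one column-header line,
# then N whitespace-separated rows of numbers.
import re
from dataclasses import dataclass


@dataclass
class Sounding:
    columns: list  # column names, e.g. ["Pressure", "Temp.", "Qv", "U", "V"]
    rows: list     # one list of floats per level, top level first

    def column(self, name: str) -> list:
        """Return every value in the named column."""
        idx = self.columns.index(name)
        return [row[idx] for row in self.rows]


def read_sounding(text: str) -> Sounding:
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = re.search(r"Number of Levels\s*=\s*(\d+)", line)
        if not match:
            continue  # still in the free-form header
        n_levels = int(match.group(1))
        if len(lines) < i + 2 + n_levels:
            raise ValueError("file ends before all levels are read")
        columns = lines[i + 1].split()
        rows = [
            [float(value) for value in lines[i + 2 + k].split()]
            for k in range(n_levels)
        ]
        if any(len(row) != len(columns) for row in rows):
            raise ValueError("row length does not match column header")
        return Sounding(columns, rows)
    raise ValueError("no 'Number of Levels = N' line found")
```

Applied to bld2.snd, for instance, `read_sounding(text).column("Pt")` would return the 13 potential temperature values. The level count in the header is used to stop reading, so trailing text after the data rows is ignored.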