HAl PC Excel/Office SIG  Feb. 18, 2000

Copyright 2000: Dr. Stephen M. Rasey, Texaco Exploration, Houston, TX.
Email ids:   raseysm@wiserways.com

 

Spreadsheet Design: Best Practices

Overall Design Philosophy

Multiple Input Data workbook.   Each workbook is a different case.

1 and only 1 instance of the calculation workbook.    THIS IS VITAL!   It is the only source of calculation control you have on the process.

Multiple Output Workbook pages.    Might be added to the Input workbook or the Input could be duplicated in the output.   Output workbooks should be devoid of links.

Simple macros move data from Input Workbooks to the Calculation module to the Output module.

Constants page

Typical "Constants" that change!

Tax Rates, Discount Rates, Versions, Calendar Year, Company Name.mDivions Name

Status Variables:

These might be the values held in range names.    Show the current values of these on the Constants page so that : A) you can quickly see what the value is, and B) other people can see what range names are important!

Show the links used in workbook here to.   

use =Cell("filename",linked reference)

Nobody hard codes these, right?

Prices, Costs

Styles

Input Cells

adopt a consistent style for input values in the workbook.

My favorite is BoldBlue.   Sometimes I use a yellow background and BoldBlue for input.

Input Section

Keep all input toghether

Avoid like the plague mixing calculations even a sum within the input section.

Numeric-Only Input

The values  =1999 and ="1999" display exactly the same way, but they have enormous differences in all calculations.   It is vital to highlight text mistakenly entered into numberic cells.

In all Numeric cells, use the custom format:

0.00; -0.00; " - "; [red]"!!Text!!"

any value that is held as text will be shown as a red !!TEXT!! value.  The actual text cannot be seen until you go to the cell to edit it.   But who cares, whatever it is it is wrong.

Output: Most important data on 1 printed page.

This is in fact where you should start the design.   

Concentrate on putting all CRITICAL Input data and Output Results on 1 piece of paper.

Input

Your most important input data should be displayed here.   This is NOT the inputs section.   It is formulas to the Input section.   

Where possible, it should LOOK like the input data.   Same arrangement.

Not all the input can be put on one page, but enough to make the output (results) meaningful.

Output

The reason for the report itself.   Critical output.   Also any note of error flags.

Location of File

in one of the corners put the full path of the file that printed the report.

User and Date.

Put the date and time of the ANALYSIS not when it was printed.

See VBA function for getting Userid

Charts: Important data on 1 page

Absolutely nothing worse for your credibility is to show a chart of correct data, but labeled wrong!

The prime culprit is that people link the chart series to data in the workbook (you have to), but type in the Chart Headers as static text.

Solution:  In the chart Title, Y-Axis, X-Axis, enter formula references to complete text strings that are built RIGHT NEXT TO THE DATA!.

Note, formulas in charts are pretty dumb.   Basically they can only be a simple reference to one cell.   If you have to concatinate text or combine text and numbers in your plot title, you must do all that building in the worksheet cell.

Liberally use text boxes on charts with a formula to worksheet cells conveying more identifying information.

Do everything you can to insure that the plotted data and the text that describe the data are in sync.   Formulas to worksheet cells are the best way to do this.

Finally, use a textbox with an Arial 6 font with a formula that refers to a cell with the following formula "=cell("filename",A1)"

 

Data from other Workbooks: use a "Loading Dock"

In the overview I stressed that there is 1 and only 1 calculation engine but there are many input cases.    To make this work you must get the Input data into the Calculation Engine as easily as possible.

Whether you use a manual Copy & Paste, an automated Copy & Paste (my preferred method) or Links, the key is to have a Loading Dock on the Calculation Sheet where data is brought in for processing.

Bring in links pure:

When you use links, do not use them within calculations!    Chances are you will need them again!   Bring them in once, cleanly into the loading dock.   Once there you can refer to those data time and time again.

Automated Copy & Paste is best

Links are hard to maintain.    If you use a rangename in your links (and you SHOULD!) then the source workbook often needs to be open for the link to work.    Having multiple workbooks open slows down your calculation steps.

It is far better to copy and paste values from the source into the loading dock.   Then the source document can be closed to speed up the calculation routine.

Access Queries

A loading doc is essential for data from an Access Query.

The simplest way to bring data from Access into Excel is either to

Copy the rows of a query and paste it into an Excel Worksheet or

Drag the Query Icon onto the Excel Worksheet

 

I have a 1 page VBA routine that does the following:   In two input cells, enter the full path to the Access database, enter the query name to retrieve.   Press a button and that query will be brought down into the Active sheet.     This uses a DAO method of getting data from Access.

 

Advanced Filters

using Data, Advanced filters, is an excellent way to make a small database query from a large EXCEL table and paste a copy of the results somewhere else.    To make full use of this, you need to organize your tables as described in "Verticals Arrays"

MS Query

I have had bad luck using MS Query.   I find it fragile, difficult to edit, and very un-Excel like it its retrieval of data.

Create Names Friendly orgainizations:

Scalers:

Description; RangeName; Value; Units

Horizonal arrays

Text Formula, Description, Units, RangeName, Value Array

 

Verticle Arrays - Also the layout of all tables

Description as a Report Heading; Units; FormatFormula row; Blank row, RangeName; Value Array

The best layout I have devised for tables is as follows:

 

1Column Headings that People Read.

2Units

3an optional blank row

4a Match row or Constants row.

5a formula/format row, to be copied down to all rows of the 6table at some strategic time

7a required blank row

8column header row.   These are names to be us a database field names and/or Excel Range names.   OneWordUpperAndLowerCase.

9the first line of data.    

 

With the table orginized this way, page headers include elements 1 and 2 so elements 3 thru 8 are hidden from the user on the print out.

 

But the data is organized for easy use with pivot tables and Autofilters.   Selecting any cell within the table and selecting Current Area selects the table and the column Header Row.  

 

Favorite Names

Where "X" is a letter usually to determine a step in processing.

"DataAreaX" = typically a vertical array table ready for pivot and autofilter work.

"DataAreaXHead" the 1 row of data area headers.

"DataAreaXSort"  The array of data that does not include the headers.   this is what you would sort.   (now that there is an option to include or exclude headers, this one is less important than before"

Pad with Zeros

It is often desirable in some processing to have the first record and the last record of a table composed of zeros.   this is especially true if the calcuations are not limited to their own row.

Often this is not possible nor practical.    A database query is one place where you can seldom pad the table with zeros.

RowLock ,ColumnLock

"Insert, Cell, down" is a great way to destroy a table!

To prevent it at the bottom of a table, create another row composed of a Horizontal array the width of the table with the entry   ="Column Lock".

This then means that you can insert into a table only if you insert across all columns of the table.  

Likewise to the right of a table, create a vertical array the height of the table with the contents ="Row Lock".  This prevents any cells from being inserted unless the entire table gets a new column.

Note,  Row lock and Column lock are both used, they prevent anything from being inserted.

 

Options

Edit in Cell: off  (for the mathematically inclided)

Model Results 2 or more ways.

"Holistic Programming"

Examples: MultiField9xG2!McKevNPV2:

Compare several tightly complex array formula with 1 broken up function to see the intermediate parts.   You might use the complex formula on hundreds of cells, but you have one cell you have modeled a different way to check you work.   If that cell does not agree, why should you trust the others?

 

Reduces chance for major errors going unnoticed.

Mars Polar Orbiter.

Horror Stories

Dead Formulas

Referencing the Wrong cell

Out of date constants

Data inside Formulas


A Reprint from a 1995, 1996, and 1997 Compuserve Excel forum

Rasey  970404

Attached is a repost of my own principles that I wrote Dec. 1995 and revised in May, 1996.   And revised again now.[April '97 -Rasey]

 

It is short on advice for multi-sheet workbook design, VBA style, and ODBC datasources.  So I am interested in advice and standards in these areas in particular.   Also I lack good rules for outlining.  I have extensive experience in multi-sheet, macro driven financial planning processess with 500K calculation models, 800K of XLM macros, and 20 MB of Excel Database files.  These are the lessons I learned over the last 5 years.

 

(Reposted from Oct, 1996)

My design principles:

 

Use Column $A or a "History" Sheet for a Version history to document what you did and what others may have done to the sheet or workbook.   Frankly, I prefer a History column on EVERY sheet.  $A$1 on one of those sheet is named "Version" for easy reference.

 

Use Columns $B:$E to define all important Constants by Range Name. A Constants Sheet could also be used depending upon the circumstances.  If a constants sheet is used, then all ranges must be global.  Four columns:

             The values go in $D 

             the range name labels go in $C. 

             The text describing the constant goes in $B

             The UNITS of the constant goes in $E

 

These constants include not just numeric constants like "DiscountRate" and the like, but all Filenames, SheetNames, and Directories that will need to be accessed.

 

When arranged in this manner, you can easily create and maintain your range names by highlighting $C:$D, Menu, Insert, Names, Create, Left Column

 

NAME ALL RANGES used in addresses between sheets and files.  This rule is broken all the time which causes lots of unseen bugs.   I cannot stress this enough.  Any absolute cell address to another sheet guarantees bugs.

 

Even within a single sheet, using Range Names for for formula references is a good habit to use whenever possible.

 

Range Names should be written in UpperAndLowerCaseWithoutUnderlines.  Personal preference coupled with the fact that the Underline and the minus look too much alike and come from the same key.

 

Use Array Arithematic as a habit.  It has several advantages:

             (a)  It is secure.  You or another user cannot mistakenly change one element of an array so that there is a camophlaged  "0" inside a large relative reference formula area.

             (b)  It is FAST!  MS did something right in its array processing on x86 machines. 

             (c)  It helps you structure the problem better.   By thinking about arrays, you naturally start to organize your sheet according to the type of array you a building.

             (d)  Coupled with Range Names, it is much easier to understand what the formula does.   A 48 element array with the formula {=UnitsSoldByMo*UnitPriceByMo} is pretty clearly 4 years of Revenue by Month.

 

For involved calculations, I try to use horizontal arrays.  When I do this, I try to document the logic this way:

             If the Array formula is in E10:P10,

             then D10 will contain the Range name of the Array. 

             C10 will be a TEXT version of the formula, using subscripts if necessary.  C10 can be narrow with most of the text hidden, but it there if you need it.   This structure is similar to the way the constants section is organized.

 

 

As part of the Constants area, build concatenated strings for full path filenames.   If you instead create statements like:

   =Open(DirectoryA&"\"&FileA)

you are making an implicit assumption DEEP WITHIN THE CODE that FileA will always be found in DirectoryA.   Much better to have that assumption made JUST ONCE in the constants column $C

   FilePathA=DirectoryA&"\"&FileA

where you will be confronted with it.

 

Whether you use cross-sheet and cross-file links or use the Copy/PasteValue method of transfering data, I strongly recommend that you reserve places on your sheets where the data comes in and leaves.  Think of it as a "Shipping & Receiving" area on a complicated sheet.   This allows you to

             (a) locate the data that's coming in and going out quickly, and

             (b) allows for Array formulas for your links.

 

If you adopt the link method, keep the links as simple references and save the arithmetic operations for dependent cells.   Whenever I've done something like this:

    {=[FileA]SheetB!RangeC*[FileD]SheetE!RangeF}

I've always regretted it.   Invariably, I'll need something in RangeC uncontaminated by RangeF sometime later. 

 

This leads to another rule that supports the "Shipping & Receiving" method.  Bring in data from another sheet no more than once.   If you make the same cross-sheet reference in more than one formula, you create complexity and slow the process down (both the recalc and your debugging!).

 

Arrays are wonderfully fast in x86/Windows.  The Mac kernel is not as well designed; I've seen the same array-intensive model recalc anywhere from 5 to 1000+ times slower on a Mac than on equivalent PC's.  (Haven't compared PowerPC Mac or Windows.  Nor have I compared Excel 5.).  So if you need your model to run on more than one type of machine, Excel Version, and OpSys, test the model's cross platform performance EARLY in the prototyping.

 

Arrays, even in Windows, sometimes are finicky.   A sort within the domain of the array references can cause inefficiencies in the internal memory that will slow down the recalc time substantially (100x in one case).  However, if the file was saved, closed, and reopened, the recalc time would be speedy again.  I can give details if there is interest.

 

Other lessons for designing large, complex systems:

 

The overall design should be

1. Data Input in one or more workbooks that can be copied and modified for many projects.   Each workbook should be of a different type of information.  There may be many instances of each type saved as different files.

2. ONE, repeat ONE, copy of any calculation engine.

3. One Master sheet for each report with links to the calculation engine (2).

4. All Output saved in UNLINKED dumb sheets with formats and values.

 

It takes a simple macro to control this.  

             Copy Data Input from files in (1)

             Paste into Input range in (2).  

             Calculate,   

             Copy Results from (3) and

             Paste value and Format into a new file for  in (4).

 

One well designed block copy from one sheet into a receiving range on another is much more easily maintained than lots of cross sheet links.   It depends upon the application, but my experience is that if you need to worry about changing links, you should think instead about copy/paste through macro control.

 

Starting with Excel 5, every Excel Database range needs to go on its own sheet.  I still use alot of =Data.Delete() operations in XLM code to localize deletions in the middle of a sheet, but thats only because MS changed the rules of Data operations starting in XL5.  I don't know how much longer I will be able to do that.

 

Reports and Sheets built via Query,  Data.Extract, or Adv. Autofilter should be structured this way:

 

(1) some blank rows

(2) Optional column summation rows

(3) Print Header Rows (for beauty)

(4)        row of column constants

(5)        row of column formulae needed to be built.

(6) Database Column Headers for query or extraction.

(7) top row of data extraction.

.....

(8) bottom row of data extraction

 

(6) is the Extract range, Query destination, or Autofilter extraction target.  Its headers are must be directly on top of the data, especially if this data is to be used as another Database area.  The column headers are generally not for human consumption.

 

Most of the time the extraction is not sufficient for a report.  Usually you need to do arithematic ops on the data.  Even rearrange it.   (5) contains a range of formula that can be copied down for all rows of extracted data.

 

(4) serves (5) as a place for column constants referenced in formulas in (5).

 

(3) This is where you put the multi-line, well formatted column headers for your printed report.   The page data headers will cover the rows in (2):(3).

 

(2) As a matter of preference, my column total appear at the top of the report (above the headers).  I can't say eveyone likes it, but it has two advantages: 

             The summation is promenently on the 1st page (as well as all others)

             It is much easier to program, the position never moves and the formulation is usually be a range name who length is a function of the extraction size.

 

Give range names to (6):(8) and to (7):(8).  The former is a range name that can be used in other filters, pivot tables, or as an upload to Access.  (7):(8) is good for sorting, and Index references.

 

 

Steve Rasey

Houston, TX