Excellent Analysis

Delivering information as a report may not be enough for your users–they may want to manipulate, analyze, and otherwise explore the data. Unfortunately, for that you need Excel. Editor Peter Vogel shows you how to create an Excel spreadsheet that includes the most powerful analysis tool that Excel contains: the PivotTable.

When I was head of an IT department, one of the most important functions that my department performed was to deliver usable information to our users. For most users that was a simple report. However, for many of our users, a simple report wasn’t going to do the job. They wanted the information but they also wanted to be able to manipulate the information, to generate new numbers based on the existing data, and to re-organize the data to see new relationships. In other words, they wanted Excel. However, the data that our users wanted required some processing before it was ready for use, and that massaging was beyond the capabilities of end-user reporting.

Access came to the rescue, of course. A combination of VBA code and SQL allowed me to extract the data that our users needed. But when it came to delivering the report, the best that my department could do for our users was to save an Access dataset or report as an Excel spreadsheet and then e-mail the spreadsheet to those users who needed the information. However, my users were then obliged to build the Excel functionality that they needed into the spreadsheet every time they got the report.

More could have been done, I suppose, including using automation to manipulate Excel to create the spreadsheets that the users really needed. However, starting Excel on the computer that was generating these reports wasn’t a trivial task. Excel can start slowly and require a lot of memory to do its job, and manipulating Excel from Access can run very, very slowly. More importantly, using automation to manipulate Excel at the server isn’t supported by Microsoft (and probably violates Office licensing).

That was a very long time ago (pre-Web, among other things). Now rather than e-mail the spreadsheet, I’d want to create a Web page that users could access to get the report or spreadsheet that they wanted. In addition to the problems that I had before, moving a binary spreadsheet across the Web is problematic.

XML to the rescue

As I discussed in my article in the September 2003 issue of Smart Access (“Creating Spreadsheets Without Excel”), you can generate Excel spreadsheets as XML text files without ever involving Excel (at least at runtime). In addition to avoiding licensing problems and reducing the load on the server, text files can be sent reliably across the Web. These XML spreadsheets are supported by versions of Excel from 2002 on.

In my September article, I looked at the simplest workbook, which includes a spreadsheet but no data or formulas:

<?xml version="1.0"?>

<?mso-application progid="Excel.Sheet"?>

<Workbook

 xmlns="urn:schemas-microsoft-com:office:spreadsheet"

 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">

<Worksheet ss:Name="Sheet1">

</Worksheet>

</Workbook>

Adding rows, cells, data, and formulas isn’t that much more complicated. This example puts the number 1 in cell A1, the number 2 in cell A2, and a formula that adds the two numbers together in A3:

<Worksheet ss:Name="Sheet1">

  <ss:Table>

     <ss:Row>

         <ss:Cell>

             <ss:Data ss:Type="Number">1</ss:Data>

         </ss:Cell>

     </ss:Row>

         <ss:Cell>

             <ss:Data ss:Type="Number">2</ss:Data>

         </ss:Cell>

     </ss:Row>

     <ss:Row>

         <ss:Cell ss:Formula="=R1C1+R1C2">

             <ss:Data ss:Type="Number">3</ss:Data>

         </ss:Cell>

     </ss:Row>

  </ss:Table>

</Worksheet>

Those samples could be created with standard string-handling functions or with an XML parser. Once the spreadsheet is constructed, I could deliver the spreadsheet via e-mail (as I did before) or save the spreadsheet as a file to the Web site’s directory and let the users pick it up with their browser at their convenience (Internet Explorer will start up Excel to read the spreadsheet automatically).

Building a PivotTable

But what if you want to send something more than a simple spreadsheet? What if you want to send that most useful of tools, a PivotTable? (See Figure 1 for an example of a worksheet with a PivotTable.) An Excel PivotTable is a table within the spreadsheet. Like a table, it has a set of data along the top row, a set of data down the outside column, and a third set of data in the middle of the table, organized by the data in the top row and outside column. It’s a tremendously powerful tool, sort of an Access crosstab query on steroids. For instance, a PivotTable allows the users to choose which set of data is in used in the top row and outside column by dragging a label describing the data from a holding area outside the table. Users can drag multiple sets of category data to the top row and outside row if they want. They can also set up a label that acts as Page data, which sets criteria that allow the users to select values that control which data is in the table (for example, what year’s data is displayed in the PivotTable).


Figure 1

You could probably figure out the tags required to create an XML spreadsheet containing a PivotTable, but it wouldn’t be easy. Here are just selections out of the significant parts of a simple PivotTable in an XML spreadsheet:

<PivotTable

  xmlns="urn:schemas-microsoft-com:office:excel">

   <Name>PivotTable1</Name>

   <ImmediateItemsOnDrop/>

   <ShowPageMultipleItemLabel/>

   <Location>R1C1:R12C6</Location>

   <VersionLastUpdate>1</VersionLastUpdate>

   <DefaultVersion>1</DefaultVersion>

   <PivotField>

       <Name>ReorderLevel</Name>

       <Orientation>Row</Orientation>

       <Position>1</Position>

       <DataType>Integer</DataType>

          <PivotItem>

            <Name>0</Name>

        </PivotItem>

        <PivotItem>

            <Name>5</Name>

        </PivotItem>

    …more PivotItems…

   </PivotField>

   …more PivotFields…

   <PTLineItems>

     <PTLineItem>

         <Item>0</Item>

      </PTLineItem>

      <PTLineItem>

         <Item>1</Item>

      </PTLineItem>

      …more PTLineItems…

    </PTLineItems>

</PivotTable>

With Office 2003, Microsoft has provided the schema for XML in Excel–or, rather, the 15 schemas that are used by Excel. In addition, there’s an 84-page overview of the Excel XML dialect, of which 13 pages are a not particularly clear description of some of the tags used to create a PivotTable (my guess is that well over half the tags in Excel XML are used for PivotTables). You can download the complete package, which includes all of the other Office 2003 XML schemas, from www.microsoft.com/office/xml. A quick look makes my point: You don’t want to try to figure out the structure of a PivotTable on your own.

Fortunately, you don’t have to. The strategy that I’ve been using is to create an Excel spreadsheet with the design that I want, save it as an XML spreadsheet, open the resulting document in Notepad, and then copy the text that I need into my program.

To convert the bewildering tags that I just showed you into a spreadsheet, all that’s necessary is to enclose them in the Workbook tags that you saw in my first example. The PivotTable element goes inside the Worksheet element and after the Table element:

<Workbook

 xmlns="urn:schemas-microsoft-com:office:spreadsheet"

 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">

  <Worksheet ss:Name="Sheet1">

    <ss:Table>

     </ss:Table>

     <PivotTable>

       …PivotTable tags…

     </PivotTable>

  </Worksheet>

</Workbook>

With this XML present, the user will get the spreadsheet shown in Figure 2. No tags are required inside the Table element to define the PivotTable. Clicking on the Refresh Data button on the PivotTable menu bar will cause Excel to rebuild the PivotTable and give your user the display that you saw in Figure 1.


Figure 2

You can make it easier for your users to refresh the PivotTable if the current cell is inside the PivotTable. With the current cell inside the PivotTable, the user can either click on the Refresh Data button on the PivotTable menu bar or select Data | Refresh Data from the Excel menus to cause Excel to rebuild the table. The position of the current cell is handled through the Panes element in the WorksheetOptions element. The WorksheetOptions element follows the Table element and precedes the PivotTable element:

</Table>

  <WorksheetOptions

       xmlns="urn:schemas-microsoft-com:office:excel">

    <Panes>

      <Pane>

        <Number>3</Number>

        <ActiveRow>4</ActiveRow>

        <ActiveCol>1</ActiveCol>

      </Pane>

    </Panes>

  </WorksheetOptions>

  <PivotTable>

Some explanation is required for these tags: An Excel worksheet can be broken up into as many as four panes. By default only one pane is displayed, and that’s pane number 3 (as indicated in the Number element). The ActiveRow and ActiveColumn elements specify the current cell within the pane. So my previous example makes the cell at row 4, column 1 (that is, cell A4) the current cell. It would be convenient if you could also ensure that the PivotTable menu bar was visible, but that doesn’t seem to be controlled by the XML in a spreadsheet.

Where’s the data?

The next step is to include the data that’s displayed in the PivotTable. This is the value-added part of the process: If the data just exists in a table somewhere, your users could just use a database query from within Excel to retrieve the data that they wanted.

The first thing that you need to do is decide where in your Excel spreadsheet you’re going to put the data for your PivotTable. You specify that location using the ConsolidationReference element, which appears in the PTSource element after the end of the PTLineItems element (right at the end of the PivotTable element). The ConsolidationReference element contains the FileName element (which holds the name of the spreadsheet) and the Reference element (which contains a reference to the area of the spreadsheet that holds the data):

</PTLineItems>

   <PTSource>

      <ConsolidationReference>

        <FileName>[Excel XML.xml]Sheet1</FileName>

        <Reference>R26C4:R103C7</Reference>

      </ConsolidationReference>

    </PTSource>

  </PivotTable>

In this example, the spreadsheet is called Excel XML.xml (since the spreadsheet name has spaces in it, I’ve had to put the name inside square brackets). The reference points to cells D26 to G103 (row 2, column 4 to row 103, column 7).

The first row of your data must contain the names of the columns used in your PivotTable. These can be found Name element of the PivotField elements in the spreadsheet. This example sets the labels for my data area:

<Row ss:Index="26">

 <Cell ss:Index="4">

   <Data ss:Type="String">ReorderLevel</Data>

 </Cell>

 <Cell>

    <Data ss:Type="String">Order Date</Data>

 </Cell>

 <Cell>

    <Data ss:Type="String">Company</Data>

 </Cell>

 <Cell>

    <Data ss:Type="String">UnitsInStock</Data>

 </Cell>

</Row>

The Row element encloses all the cells for a single row. The index attribute specifies what row in the spreadsheet I’m defining (in this case, row number 26). Within the Row element, Cell elements specify each cell in the row. Again, the Index attribute specifies which Cell I’m defining. My first Cell element has its index attribute set to 4, indicating that this is the cell in column D. Subsequent cells don’t need an Index attribute, as long as the cells follow each other in the spreadsheet without gaps (the same is true of rows).

Within a Cell, the Data element holds the data for the cell and its Type attribute specifies the data type. Since I’m defining the labels for my data here, the data type for all of these cells is set to String. Within the first Data element, I have my first label “ReorderLevel”, which will appear in cell D26.

Inserting data into your spreadsheet is only slightly more complicated. Here’s a row with four fields (since this row would immediately follow the previous row, I don’t have to specify an Index attribute):

<Row>

  <Cell ss:Index="4">

    <Data ss:Type="Number">10</Data>

  </Cell>

  <Cell>

     <Data ss:Type="DateTime">

       1997-05-20T00:00:00.000

     </Data>

  </Cell>

  <Cell>

     <Data ss:Type="String">ALFKI</Data>

  </Cell>

  <Cell>

      <Data ss:Type="Number">39</Data>

  </Cell>

</Row>

The good news is that Excel only has five data types: Number, DateTime, Boolean, String, and Error. The bad news is that DateTime values must include the time, which follows the date and is separated from it by the letter “T”.

Widening and styling

The last thing that you’ll probably want to do is apply some style to your table. The first thing you’ll notice, for instance, is that your columns aren’t wide enough for much of the data that you’re displaying. Setting your column width is easy, however: Just add a Column element after the Table element and use its Width attribute to set the width of the column in points. This example sets column 5 (column E) to 68 points:

<Table>

 <Column ss:Index="5" ss:Width="68"/>

Setting a style for a cell or row isn’t that much harder. First, you need to define the style you want to use. Again, I do this through the Excel interface and save the results in an XML spreadsheet. I open the spreadsheet in Notepad, find my style, and copy it into my program. I define all of my styles in a single spreadsheet that effectively acts as a library of available styles for me. Here’s one example of a style that selects a type of font (Swiss) and turns on bolding:

<Style ss:ID="s21">

   <Font x:Family="Swiss" ss:Bold="1"/>

</Style>

Styles are held inside the Styles element, which comes immediately before the Worksheet element:

<Styles>

  <Style ss:ID="bolded">

    <Font x:Family="Swiss" ss:Bold="1"/>

  </Style>

</Styles>

<Worksheet …

The ID attribute on the style uniquely identifies each style. To assign a style to a cell, you reference the style ID in the cell’s StyleId property. To bold my labels, for instance, I’d use this XML:

<Cell ss:Index="4" ss:StyleID="bolded">

   <Data ss:Type="String">ReorderLevel</Data>

</Cell>

In the accompanying Download, you’ll find a sample database that generates an Excel XML spreadsheet with a PivotTable that displays data from the Northwind database. I’m willing to bet that just a few minutes of playing with that PivotTable will convince you that that this is a powerful way to deliver data to your users.

 

Other Pages That You Might Like To Read

Excel Automation
Build Word 2003 Docs with XML
Four Ways to Merge to Word
Outputting Flexible Data

About Peter Vogel

After 10 years of editing Smart Access, Peter continues to develop with Access (though he also does a lot of .NET stuff). Peter is also still editing other people's article--he even self-published a book on writing user manuals ("rtfm*") with a blog here.
This entry was posted in Office. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *


*


This site uses Akismet to reduce spam. Learn how your comment data is processed.