Penn Computing

Penn Computing

Computing Menu Computing A-Z
Computing Home Information Systems & Computing Penn

  
  UPDATES
Current Load Status
Regular Availability
  
  INFORMATION
FAQs & Tips
Password Changer
Support services
Security
About the Data Warehouse
Data Administration
  
  DATA COLLECTIONS
Advancement
Assets
BRIM
Express Mail
Facilities
Faculty
GAR
General Ledger
Infrastructure
ISSS-iOffice
Learning Management
Position Inventory
Research-PennERA Proposals
Salary Management
Student
Travel Expense Management
Tuition Distribution
Cross-Collection

Guidelines for Writing Queries

Queries often require complex conditions (or "query filters") to return appropriate results. WHERE clauses (or conditions) reduce the amount of data to be processed in a SELECT statement (comprised of desired result objects or columns) by specifying that only those rows meeting the criteria in the WHERE clause are displayed. Depending upon which tool you use to query the Data Warehouse, your degree of control over the query language and operators will vary.

  • BusinessObjects and Microsoft Access employ graphical interfaces that construct appropriate SQL behind the scenes based on the tables and joins the user indicates.
  • Oracle SQL*Plus, on the other hand, requires the user to write his or her own SQL statments.
  • The Oracle database system, which forms the foundation of the Data Warehouse, permits the use of powerful SQL operators, some of which may also be available as post-query operators in your desktop tool.

The following guidelines are for all users of the Data Warehouse, regardless of query tool, and provide good practices for efficient querying.

Refer to the help documentation.

The main difficulty most people have with writing queries is knowing which table to use. If you are unsure about which table to use, refer to the table help that sections "Common Uses" and "Cautions." These sections may help you decide part of the data collection documentation on the web. Use the search functionality on the main page of each collection to help find exactly what you need.

Take advantage of indexes.

If possible, include an indexed data element in your condition statement. A query with a record selection condition using an indexed data element tells the system to go directly to the rows in the table that contain the value indicated and to stop retrieving data when the value is no longer found. If a query does not select records based on an indexed data element in its record selection condition, the system starts searching at the first row in the table and works through every row until it reaches the last row in the table. Indexed columns are noted in each collection's documentation.

Certain operators or query segments are processed by the system without the use of indexes, even if the column in the condition is indexed. It may, of course, be necessary for you to construct your query in this manner to retrieve correct results, but in considering alternatives in query construction you may wish to keep in mind the following situations where indexes may not be used:

  1. Negative comparisons such as Not Equal (represented by =! in SQL), Different From, or Not In. Avoid negative phrasing of condition statements as much as possible. In general, it is easier (both for the system and for you) to interpret a positive phrase than a negative phrase. For example, instead of the condition statement "If term is not greater than 1998A," rephrase the statement to "If term is less than or equal to 1998A." Or, if practical, eliminate the condition from the query and filter your results on the desktop.
  2. Nulls such as Is Null or Is Not Null.
  3. Like or Matches Pattern comparison with a date or number column. For example, to retrieve employee payments from March (of any year) use "FISCAL_MONTH_SEQ = '09'" rather than "CHECK_DATE Like 03/%".
  4. Wildcards at the beginning of a string. Avoid matching patterns beginning with a wildcard (Like %...). A wildcard at the end of a pattern is definitely appropriate and can be very efficient (e.g., Where Fund Like 5% will retrieve all Funds 500000 - 599999).
  5. Indexed columns modified by an expression or function (e.g., rather than concatenating all Chart of Account segments as COA_CNAC||COA_ORG||COA_BC||COA_FUND||COA_OBJECT||COA_PROGRAM||COA_CREF, select the column COA_ACCOUNT, which is indexed). Also, comparing an indexed column to another indexed column using Greater Than (>) or Less Than (<).
  6. Check the "and/or" qualifiers in the records selection criteria of the condition statement.
    • For example, a query coded to get students with the following conditions statement will actually return every student in COL for 1998A and every student for 1998C regardless of the division:
      • If division is equal to 'COL' and term is equal to '1998A' or term is equal to '1998C'
    • The query coded to get students with the following conditions statement will return every student in COL for 1998A and 1998C:
      • If division is equal to 'COL' and (term is equal to '1998A' or term is equal to '1998C')

If your access to data is restricted, do not force the security system to select records for you.

For example, if you are authorized to access data only from a particular department, one of your record selection conditions should state "If Organization='My Organization'," where organization is the code for your department.

Review your query before executing it.

Check to make sure that your query is as precise as possible. This includes selecting the tables that will give the best results, reviewing selection conditions and sort criteria, and if it makes sense to do so, including at least one indexed data element in the conditions statement. For example, if you want to find all undergraduate freshmen and their names, choose the Person table rather than the ADDRESS table. This is because a student can have multiple addresses, and choosing the ADDRESS table would return a name for each address the student has listed.

Be aware of data that is subject to change and its effect on your results.

For example, a grade change can affect a student's grade point average (GPA). A query executed before and after the grade change may or may not result in a changed GPA. In addition, keep in mind that there is a "data delay" between Warehouse collections and their respective source systems. Refresh schedules are noted in each collection's documentation.

Give the query time to execute.

Queries can take many minutes to execute; complex queries can take longer. It is not uncommon for a query to take 5 to 10 minutes to complete. In general, let the query run until it finishes. If the query takes longer than 1 hour to complete, contact Data Administration.

Note the date and time of the query when creating reports or communicating results to others.

top

Information Systems and Computing
University of Pennsylvania
Comments & Questions


Penn Computing University of Pennsylvania
Information Systems and Computing, University of Pennsylvania