IRS Statistics of Income Division (SOI) Exempt Organization Files (Technical Note)

Are variables weighted?

All organizations with end-of-year total assets of more than $10 mil. (files prior to 2000 or 2001) or $30 mil. (more recent files) are included with a weight of 1.

Weights for all other organizations are designed to match the populations of six asset classes:

31 - Less than $250,000

32 - $250,000 to $499,999

33 - $500,000 to $999,999

34 - $1 mil. to under $2.5 mil.

35 - $2.5 mil. to under $5 mil.

36 - $5 mil. to under $10 mil.

Source: Cecilia Higard (SOI).
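A minimal sketch of how the asset classes above could be assigned, and of how weights enter a population estimate. It assumes the older scheme with the $10 mil. certainty threshold (the classes used in the $30 mil. files are not listed here). The record layout and the `weight` field name are hypothetical; check the actual weight variable in the file you are using.

```python
from typing import Iterable, Mapping, Optional

# Hypothetical encoding of the six asset classes listed above
# (older files, $10 mil. certainty threshold). Upper bounds are exclusive.
ASSET_CLASSES = [
    (250_000, 31),     # Less than $250,000
    (500_000, 32),     # $250,000 to $499,999
    (1_000_000, 33),   # $500,000 to $999,999
    (2_500_000, 34),   # $1 mil. to under $2.5 mil.
    (5_000_000, 35),   # $2.5 mil. to under $5 mil.
    (10_000_000, 36),  # $5 mil. to under $10 mil.
]
CERTAINTY_THRESHOLD = 10_000_000


def asset_class(total_assets_eoy: float) -> Optional[int]:
    """Return the asset class (31-36), or None for a weight-1 (certainty) return."""
    # The note says "more than $10 mil."; a return at exactly $10 mil. fits
    # neither description, so this sketch treats it as certainty (assumption).
    if total_assets_eoy >= CERTAINTY_THRESHOLD:
        return None
    for upper_bound, code in ASSET_CLASSES:
        if total_assets_eoy < upper_bound:
            return code
    # Only reached for values that compare false everywhere, such as NaN.
    raise ValueError(f"cannot classify total assets {total_assets_eoy!r}")


def weighted_total(records: Iterable[Mapping[str, float]], field: str,
                   weight_field: str = "weight") -> float:
    """Population estimate of `field`: each sampled return counts `weight` times."""
    return sum(r[field] * r[weight_field] for r in records)


sample = [
    {"revenue": 100.0, "weight": 1.0},   # certainty return
    {"revenue": 50.0, "weight": 20.0},   # stands in for 20 similar organizations
]
print(asset_class(300_000))               # 32
print(weighted_total(sample, "revenue"))  # 1100.0
```

Certainty returns simply contribute their own values once; sampled returns are scaled up to represent their whole asset class.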

NOTE: SOI weighting categories are periodically adjusted. See FAQ link:

What returns are included in a 1993 file?

Any tax period BEGINNING in 1993. Thus a return covering 12/93-11/94 would be considered a 1993 return. Note that the E07-1 and E07-3 fields indicate the ENDING period, not the beginning period.
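The assignment rule can be sketched as follows. Because E07-1 and E07-3 give the ENDING period, the beginning year has to be inferred. This sketch assumes you have already extracted the ending year and month (how those fields are encoded is not described here) and that the return covers a full 12-month period; short-period returns need their actual beginning date instead. The function name is illustrative only.

```python
def soi_file_year(end_year: int, end_month: int) -> int:
    """File year of a full 12-month tax period, given its ENDING year and month.

    A return belongs to the file for the year in which its tax period BEGINS.
    """
    if not 1 <= end_month <= 12:
        raise ValueError(f"invalid month: {end_month}")
    # A 12-month period ending in December began in January of the same year;
    # a period ending in any other month began in the previous calendar year.
    return end_year if end_month == 12 else end_year - 1


print(soi_file_year(1994, 11))  # 1993 (the 12/93-11/94 example above)
print(soi_file_year(1993, 12))  # 1993 (calendar-year 1993 return)
```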

The SOI Bulletins contain a section on data sources written by Cecilia Higard and her statistician. This is the best (and only) published source for technical notes on the weighting process and related matters.

Source: Cecilia Higard (SOI).

The following are descriptions of variables used by SOI for internal processing:

C004: Document Locator Number (DLN): This is a unique code assigned to all IRS service center transactions. It is helpful for us in tracking down missing returns. The 4th and 5th digits of the DLN represent the Document Code (DocCd in Core files). Possible values of DocCd are:

09 = Form 990-EZ revision 2007 and earlier

90 = Form 990 revision 2007 and earlier

91 = Form 990-PF

92 = Form 990-EZ revision 2008 and later

93 = Form 990 revision 2008 and later
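Extracting DocCd from a DLN is a simple slice. A minimal sketch, assuming the DLN is stored as a plain string of digits with no hyphens or spaces; the function names and the sample DLN are made up for illustration.

```python
# DocCd values and form types, as listed above.
DOC_CODES = {
    "09": "Form 990-EZ, revision 2007 and earlier",
    "90": "Form 990, revision 2007 and earlier",
    "91": "Form 990-PF",
    "92": "Form 990-EZ, revision 2008 and later",
    "93": "Form 990, revision 2008 and later",
}


def doc_code(dln: str) -> str:
    """Return the Document Code: the 4th and 5th digits of the DLN."""
    dln = dln.strip()
    if len(dln) < 5 or not dln.isdigit():
        raise ValueError(f"not a usable DLN: {dln!r}")
    return dln[3:5]  # 0-based slice covering the 4th and 5th characters


def form_type(dln: str) -> str:
    """Map a DLN to the form type its DocCd indicates."""
    code = doc_code(dln)
    return DOC_CODES.get(code, f"unknown DocCd {code}")


# Made-up DLN, for illustration only.
print(form_type("29193123456789"))  # Form 990, revision 2008 and later
```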

C1001: Editor Code: This code identifies which editor (tax examiner) performed the data entry at the Ogden Service Center.

E005: Sample Code: This code is assigned to returns on the Masterfile based on the posted (fair market value) asset amount. Each code has its own sample rate. (Do not confuse with E1007.) I can send you more info on our sample design if you're interested.

E1006: SCPL: This is a unique code that shows: (1) the Service Center where the return was processed. (Currently all returns are processed in Ogden, UT, so the first two digits are always 29.) (2) The Cycle (year+week) the return posted to the IRS Masterfile. (3) & (4) The Page and Line number from the shipping transmittal. The SCPL is the code we use most in querying our database, because while there may be duplicate EINs, each return will have a unique SCPL.
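Because EINs can repeat while SCPLs do not, a table of returns is most safely keyed on the SCPL. A minimal sketch, assuming each return is a dict with hypothetical `scpl` and `ein` fields held as strings. Only the leading "29" (Ogden) is taken from the description above; the widths of the cycle, page, and line parts are not given here, so the sketch does not split them out, and the sample values are made up.

```python
from collections import defaultdict


def index_by_scpl(returns):
    """Map SCPL -> return record, refusing duplicate SCPLs."""
    by_scpl = {}
    for r in returns:
        if r["scpl"] in by_scpl:
            raise ValueError(f"duplicate SCPL: {r['scpl']}")
        by_scpl[r["scpl"]] = r
    return by_scpl


def eins_with_multiple_returns(returns):
    """Return {EIN: [SCPL, ...]} for every EIN that appears on more than one return."""
    scpls_by_ein = defaultdict(list)
    for r in returns:
        scpls_by_ein[r["ein"]].append(r["scpl"])
    return {ein: s for ein, s in scpls_by_ein.items() if len(s) > 1}


def processed_in_ogden(scpl: str) -> bool:
    """Returns processed in Ogden have an SCPL beginning with 29."""
    return scpl.startswith("29")


# Made-up records: two returns share an EIN but have distinct SCPLs.
returns = [
    {"scpl": "2900010001", "ein": "123456789"},
    {"scpl": "2900010002", "ein": "123456789"},
    {"scpl": "2900020001", "ein": "987654321"},
]
print(len(index_by_scpl(returns)))          # 3
print(eins_with_multiple_returns(returns))  # {'123456789': ['2900010001', '2900010002']}
```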

E1007: Generated Sample Code: This code is assigned to the return based on its asset amount AFTER editing. It is usually, but not always, the same as E005.
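Since E1007 usually, but not always, matches E005, it can be useful to count how often editing moved returns between sample codes. A minimal sketch, assuming each return is a dict keyed by the field names `E005` and `E1007`; the code values in the example are placeholders, not actual SOI sample codes.

```python
from collections import Counter


def sample_code_changes(returns):
    """Count (E005, E1007) pairs for returns whose sample code changed after editing."""
    return Counter(
        (r["E005"], r["E1007"]) for r in returns if r["E005"] != r["E1007"]
    )


# Placeholder records for illustration.
returns = [
    {"E005": "A", "E1007": "A"},
    {"E005": "A", "E1007": "B"},
    {"E005": "A", "E1007": "B"},
    {"E005": "C", "E1007": "B"},
]
print(sample_code_changes(returns))  # Counter({('A', 'B'): 2, ('C', 'B'): 1})
```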

P792, P794, and P795: These are adjustment fields from Part VII (Analysis of Income Producing Activities). In this section of the return, SOI captures the data exactly as the filers report it. If the detail lines don't add up to the total, the difference is put in P792 (Column B), P794 (Column D), and/or P795 (Column E).
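The adjustment amounts can be reproduced from the reported figures. A minimal sketch, assuming the adjustment is the reported column total minus the sum of its detail lines (the sign convention SOI uses is not stated here) and that details and totals are held in plain dicts keyed by column letter; the function name and the figures are illustrative.

```python
# Adjustment field for each Part VII column, per the description above.
ADJUSTMENT_FIELDS = {"B": "P792", "D": "P794", "E": "P795"}


def part_vii_adjustments(details, totals):
    """Return {adjustment field: total - sum(detail lines)} for columns B, D, and E.

    details: column letter -> list of detail-line amounts as reported
    totals:  column letter -> reported column total
    """
    return {
        field: totals[col] - sum(details[col])
        for col, field in ADJUSTMENT_FIELDS.items()
    }


details = {"B": [100, 200], "D": [10, 20], "E": [5]}
totals = {"B": 300, "D": 40, "E": 5}
print(part_vii_adjustments(details, totals))  # {'P792': 0, 'P794': 10, 'P795': 0}
```

A nonzero value flags a column whose reported detail does not add up to its reported total.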

Source: Paul Arnsberger 12/1/99

Added 03/05/2002 by tpollak, Modified 02/22/2010 by jdurnford

