flowchart TB
%% command center
A["main"]
%% pipelines
B["pipeline: usa_ass"]
C["pipeline: usa_con"]
D["pipeline: ali"]
E["pipeline: awardee_lookup"]
%% output
AA["file: grants-assistance.parquet"]
AB["file: contracts.parquet"]
AC["file: ali.parquet"]
AD["file: awardee_lookups.parquet.parquet"]
AE["file: silver.parquet"]
%% flow
A --> B --> AA --> AE
A --> C --> AB --> AE
A --> D --> AC --> AE
A --> E --> AD --> AE
Data Dictionary
Silver
Creating the silver dataset
There are several input or raw data files:
- two USA Spend files
- Assistance/ Grants and Contracts
- a file provided by ALI that is similar to the web-scraped results
- a CSV file for state populations
The first step in is to join the ies short data with the usa spend assistance/grant data.
Next, the ies short data is joined with the usa spend contract data.
Next, the ies short data is joined with the ali data.
The next step is a bit more involved. Thus far three separate data sources have been matched to ies short using award_number_clean. The results of these joins are then used to create a lookup table based on awardees and recipient geographic info such as state, county, zip, etc. Some awardees in this lookup table have multiple geographies. Only awardees with distinct or singular geographic data are retained in the lookup table. The awardee lookup table is then joined to ies short on awardee. We call the results of this join awardee geo lookup. For example, let’s say that in the first three steps Florida State University (FSU) was identified as an awardee and it had distinct geo data - that is, all awards with FSU as the awardee have the same state code, county FIPS, zip code, etc. Then, using this info, all unmatched awards where the awardee is FSU are collected and added to the final silver data.
To create the silver dataset all of the aforementioned results are combined into a single file.
There are also some post-processing steps. Annual state population is added as a new field. Several lookup tables are created using the silver data to populate missing recipient geography fields. These lookup tables use zip code, state code, and county FIPS as the primary keys.
File and output structure
Schema overview
flowchart TB
%% Databases
bronze[(Database Bronze)]
silver[(Database Silver)]
%% Data
B["
USA Spend
Assistance/Grants & Contracts
"]
C[ALI]
D[Awardee Lookup]
E[State populations]
F[Awardee lookup table]
G[Matched awards]
%% Silver
bronze --> B --> G
bronze --> C --> G
bronze --> D --> G
bronze --> E --> G
bronze --> F --> G
G --> silver
Details
### Totals ###
IES Short awards: 2,996
- Grants/ assistance: 2,509 awards (84%) of total
- Contracts: 487 awards (16%) of total
USA Spend Grants/ Assistance: 1,592 awards
USA Spend Contracts: 1,069 awards
ALI DB: 1,125 awards
### Matches ###
Matched awards between IES Short and USA Spend Grants/ Assistance: 1,425 awards
- 57% of IES Short grants (1,425 out of 2,509)
- 90% of USA Spend Grants/ Assistance (1,425 out of 1,592)
Matched awards between IES Short and USA Spend Contracts: 182 awards
- 37% of IES Short contracts (182 out of 487)
- 17% of USA Spend Contracts (182 out of 1,069)
Matched awards between IES Short and ALI DB: 10 awards
- 0% of IES Short (10 out of 2,996)
- 1% of ALI DB (10 out of 1,125)
Awards identified using awardees with unique geo info in previously matched IES Short awards: 828
- 60% of unmatched IES Short (828 out of 1,379)
### silver ###
There are 2,445 awards in the final results
- 82% of IES Short (2,445 out of 2,996)
### Unmatched ###
Unmatched IES Short awards:
- 18% (551 out of 2,996)
Unmatched USA Spend Grants/ Assistance awards:
- 43% of IES Short grant (1,084 out of 2509)
- 10% of USA Spend Grants/ Assistance (167 out of 1592)
Unmatched USA Spend Contracts awards:
- 63% of IES Short contract (305 out of 487)
- 83% of USA Spend Contract (887 out of 1592)
Schemas
USA spend assistance/ grants
- primary key:
award_number
ibis.Schema {
award_number string
award_number_clean string
status string
award_type string
name string
amount int64
award_year int64
award_date_range_start date
award_date_range_end date
awardee string
project_type string
principal_investigator string
office array<string>
program array<string>
program_topics array<string>
evaluation_topics array<string>
as_of date
awarding_agency_code string
awarding_agency_name string
awarding_sub_agency_code string
awarding_sub_agency_name string
recipient_uei string
recipient_name string
recipient_parent_uei string
recipient_parent_name string
recipient_country_code string
recipient_country_name string
recipient_county_name string
recipient_city_name string
recipient_state_code string
recipient_state_name string
recipient_zip_code string
primary_place_of_performance_country_code string
primary_place_of_performance_country_name string
primary_place_of_performance_city_name string
primary_place_of_performance_county_name string
primary_place_of_performance_state_name string
primary_place_of_performance_zip_code string
recipient_county_fips_code string
recipient_state_fips_code string
primary_place_of_performance_county_fips_code string
primary_place_of_performance_state_fips_code string
usaspending_permalink string
parent_award_id_piid string
}
USA spend contracts
- primary key:
award_number
ibis.Schema {
award_number string
award_number_clean string
status string
award_type string
name string
amount int64
award_year int64
award_date_range_start date
award_date_range_end date
awardee string
project_type string
principal_investigator string
office array<string>
program array<string>
program_topics array<string>
evaluation_topics array<string>
as_of date
parent_award_id_piid string
awarding_agency_code string
awarding_agency_name string
awarding_sub_agency_code string
awarding_sub_agency_name string
recipient_uei string
recipient_name string
recipient_parent_uei string
recipient_parent_name string
recipient_country_code string
recipient_country_name string
recipient_county_name string
recipient_city_name string
recipient_state_code string
recipient_state_name string
recipient_zip_code string
primary_place_of_performance_country_code string
primary_place_of_performance_country_name string
primary_place_of_performance_city_name string
primary_place_of_performance_county_name string
primary_place_of_performance_state_name string
primary_place_of_performance_zip_code string
recipient_county_fips_code string
recipient_state_fips_code string
primary_place_of_performance_county_fips_code string
primary_place_of_performance_state_fips_code string
usaspending_permalink string
}
ALI
- primary key:
award_number
ibis.Schema {
award_number string
award_number_clean string
status string
award_type string
name string
amount int64
award_year int64
award_date_range_start date
award_date_range_end date
awardee string
project_type string
principal_investigator string
office array<string>
program array<string>
program_topics array<string>
evaluation_topics array<string>
as_of date
awarding_agency_code string
awarding_agency_name string
awarding_sub_agency_code string
awarding_sub_agency_name string
recipient_uei string
recipient_name string
recipient_parent_uei string
recipient_parent_name string
recipient_country_code string
recipient_country_name string
recipient_county_name string
recipient_city_name string
recipient_state_code string
recipient_state_name string
recipient_zip_code string
primary_place_of_performance_country_code string
primary_place_of_performance_country_name string
primary_place_of_performance_city_name string
primary_place_of_performance_county_name string
primary_place_of_performance_state_name string
primary_place_of_performance_zip_code string
recipient_county_fips_code string
recipient_state_fips_code string
primary_place_of_performance_county_fips_code string
primary_place_of_performance_state_fips_code string
usaspending_permalink string
parent_award_id_piid string
}
Awardee lookup
- primary key:
awardee
ibis.Schema {
award_number string
award_number_clean string
status string
award_type string
name string
amount int64
award_year int64
award_date_range_start date
award_date_range_end date
awardee string
project_type string
principal_investigator string
office array<string>
program array<string>
program_topics array<string>
evaluation_topics array<string>
as_of date
awarding_agency_code string
awarding_agency_name string
awarding_sub_agency_code string
awarding_sub_agency_name string
recipient_uei string
recipient_name string
recipient_parent_uei string
recipient_parent_name string
recipient_country_code string
recipient_country_name string
recipient_county_name string
recipient_city_name string
recipient_state_code string
recipient_state_name string
recipient_zip_code string
primary_place_of_performance_country_code string
primary_place_of_performance_country_name string
primary_place_of_performance_city_name string
primary_place_of_performance_county_name string
primary_place_of_performance_state_name string
primary_place_of_performance_zip_code string
recipient_county_fips_code string
recipient_state_fips_code string
primary_place_of_performance_county_fips_code string
primary_place_of_performance_state_fips_code string
usaspending_permalink string
parent_award_id_piid string
}