┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━┓
┃ len   ┃ n     ┃ rel_freq ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━┩
│ int32 │ int64 │ float64  │
├───────┼───────┼──────────┤
│    11 │   935 │ 0.831111 │
│    14 │     1 │ 0.000889 │
│    13 │   172 │ 0.152889 │
│    12 │    17 │ 0.015111 │
└───────┴───────┴──────────┘
Exploring IES short and ALI DB
ALI DB
Exploring award number character lengths
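A length check of this kind can be sketched in plain Python as below. This is a minimal illustration, not the code that produced this report: the function name `length_distribution` and the sample award numbers are hypothetical, and the output columns simply mirror a `len` / `n` / `rel_freq` frequency table.

```python
from collections import Counter


def length_distribution(award_numbers):
    """Return one row per character length with its count and relative frequency."""
    # Skip missing values so they are not counted as a length of their own.
    lengths = [len(a) for a in award_numbers if a is not None]
    counts = Counter(lengths)
    total = sum(counts.values())
    return [
        {"len": length, "n": n, "rel_freq": n / total}
        for length, n in sorted(counts.items())
    ]


# Hypothetical award numbers, for illustration only.
sample = ["R305A120001", "R305A120002", "R305A1200034", None]
rows = length_distribution(sample)
for row in rows:
    print(row)
```

Lengths that occur only rarely are worth inspecting by hand, since they may point to typos or to a different numbering scheme.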
Joins
- Left anti join to find unmatched IDs in IES short
- Left join against ALI DB
In the IES short data, 1,686 of 2,996 records (56.28%) are unmatched.
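The anti join and the unmatched share can be sketched as follows. This is a plain-Python illustration under stated assumptions: the helper `left_anti_join`, the field name `award_id`, and the sample rows are all hypothetical and stand in for whatever the actual tables contain.

```python
def left_anti_join(left, right, key):
    """Return the rows of `left` whose `key` value does not appear in `right`."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]


# Hypothetical rows; `award_id` is an assumed field name.
ies_short = [{"award_id": "A1"}, {"award_id": "A2"}, {"award_id": "A3"}]
ali_db = [{"award_id": "A2"}]

unmatched = left_anti_join(ies_short, ali_db, "award_id")
share = len(unmatched) / len(ies_short)
print(f"{len(unmatched):,} unmatched of {len(ies_short):,} ({share:.2%})")
```

The same formatting applied to the reported counts, `f"{1686 / 2996:.2%}"`, gives the 56.28% quoted in the text.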
on award ID clean
0 of the 1,686 unmatched records joined on the cleaned award ID.
Because no records joined, there are no key dimensions (award number, year, type) to compare.
on award ID
0 of the 1,686 unmatched records joined on the original award ID.
Because no records joined, there are no key dimensions (award number, year, type) to compare.
Better to use original award number.
on title
2 of the 1,686 unmatched records joined on title.
Both of these records have different values on key dimensions such as award number, year, and type, which suggests a mismatch.
Amounts and years don’t match on a few IDs.
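The join-then-compare check used for each candidate key can be sketched as below. This is an illustration only: `join_and_flag`, the field names in `KEY_DIMENSIONS`, and the sample rows are assumptions, and the sketch keeps only the rows that found a match rather than reproducing a full left join.

```python
KEY_DIMENSIONS = ("award_number", "year", "type")  # assumed field names


def join_and_flag(unmatched, reference, key):
    """Match `unmatched` rows to `reference` on `key` and list the dimensions that differ."""
    # Index the reference rows by join key; one key may map to several rows.
    by_key = {}
    for row in reference:
        by_key.setdefault(row[key], []).append(row)

    results = []
    for left in unmatched:
        for right in by_key.get(left[key], []):
            diffs = [d for d in KEY_DIMENSIONS if left.get(d) != right.get(d)]
            results.append({"key": left[key], "differs_on": diffs})
    return results


# Hypothetical rows, for illustration only.
unmatched_rows = [
    {"title": "Reading Study", "award_number": "X1", "year": 2012, "type": "Grant"},
    {"title": "Math Study", "award_number": "X2", "year": 2014, "type": "Grant"},
]
ali_rows = [
    {"title": "Reading Study", "award_number": "Y9", "year": 2013, "type": "Grant"},
]

flagged = join_and_flag(unmatched_rows, ali_rows, "title")
suspect = [r for r in flagged if r["differs_on"]]
print(f"{len(flagged)} joined, {len(suspect)} differ on key dimensions")
```

A joined row with a non-empty `differs_on` list is a candidate false match, which is the pattern seen for the title join.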
prod
Unmatched columns that will be populated with NULLs.
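One way to list such columns and fill them with NULLs is sketched below. The column names, the helper `align_to_schema`, and the idea of a fixed target column list are assumptions made for illustration; the actual production schema is not shown in this report.

```python
def align_to_schema(rows, target_columns):
    """Reshape each row to `target_columns`, using None for columns the row lacks.

    Source columns that are not in `target_columns` are dropped.
    """
    return [{col: row.get(col) for col in target_columns} for row in rows]


# Hypothetical source rows and target schema.
source_rows = [{"award_number": "X1", "title": "Reading Study"}]
target_columns = ["award_number", "title", "amount", "year"]

# Columns present in the target but absent from the source.
source_columns = set().union(*(row.keys() for row in source_rows))
missing = [col for col in target_columns if col not in source_columns]

aligned = align_to_schema(source_rows, target_columns)
print("Columns filled with NULL:", missing)
```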