I have a series of files I've downloaded from the [Texas Ethics Commission](https://www.ethics.state.tx.us/dfs/search_CF.htm) (a misnamed . You can see an example of their [layout in the ReadMe.txt](https://www.ethics.state.tx.us/tedd/CFS-ReadMe.txt). I've made an [tag:ETL] library which processes the [ReadMe.txt](https://www.ethics.state.tx.us/tedd/CFS-ReadMe.txt) to generate the [SQL DDL](https://en.wikipedia.org/wiki/Data_definition_language) to create this schema, and load it from the CSVs. The thing is, I *think* this is a standardized format. Having worked with such systems before, I imagine it's backed by something like PICK (which is a BASIC database) or something COBOL-esque, and that the format itself is something like a [MARC](https://en.wikipedia.org/wiki/MARC_standards), ANSI, or ISO standard.
I'd like to potentially abstract out my ETL script to benefit others who use this format.
Some identifying features of the format are that it supports
* Arrays and internal one-to-many relations on the record
* at least the types BigDecimal, Long, Date, String
* the export is labeled "Flat File Architecture Record Listing"
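To make the DDL step concrete, here is a minimal sketch of how the four types might be turned into SQL column definitions. The SQL types chosen (`NUMERIC`, `BIGINT`, `DATE`, `VARCHAR`) and the helper names `column_ddl` and `create_table_ddl` are my own assumptions for illustration; nothing in the ReadMe prescribes them.

```python
from typing import List, Tuple

# Assumed mapping from the ReadMe's type names to SQL types.
# This is a guess for illustration, not something the format specifies.
SQL_TYPES = {
    "BigDecimal": "NUMERIC",
    "Long": "BIGINT",
    "Date": "DATE",
}


def column_ddl(name: str, type_name: str, length: int) -> str:
    """Build one column definition from a field's name, type, and length."""
    if type_name == "String":
        # Strings carry a maximum length in the ReadMe, so use it.
        return f"{name} VARCHAR({length})"
    return f"{name} {SQL_TYPES[type_name]}"


def create_table_ddl(table: str, fields: List[Tuple[str, str, int]]) -> str:
    """Build a CREATE TABLE statement from (name, type, length) tuples."""
    columns = ",\n  ".join(column_ddl(*field) for field in fields)
    return f"CREATE TABLE {table} (\n  {columns}\n);"


if __name__ == "__main__":
    print(create_table_ddl(
        "loan_guarantor",  # hypothetical table name
        [("guarantorPersentTypeCd", "String", 30),
         ("guarantorNameLast", "String", 100)],
    ))
```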
Arrays
=====
For example, in the ReadMe.txt you see this array declaration,
Array 4050
loanGuarantorLoanPersent[5/ROW_MAJOR] CsvPublicExportLoanGuarantorLoanPersent 810 Guarantors for the loan (maximum 5)
46 guarantorPersentTypeCd String 30 Type of guarantor name data - INDIVIDUAL or ENTITY
47 guarantorNameOrganization String 100 For ENTITY, the guarantor organization name
48 guarantorNameLast String 100 For INDIVIDUAL, the guarantor last name
That defines a structure called loanGuarantorLoanPersent and essentially declares that there are five of them. So the export CSV will have a header something like
guarantorPersentTypeCd1,guarantorNameOrganization1,guarantorNameLast1,guarantorPersentTypeCd2,guarantorNameOrganization2,guarantorNameLast2,guarantorPersentTypeCd3,guarantorNameOrganization3,guarantorNameLast3...
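As a rough sketch of how I read the `[5/ROW_MAJOR]` declaration: every field of element 1 comes first, then every field of element 2, and so on. The function names `expand_array_columns` and `split_array_records` are hypothetical, and the rule that an all-empty group means "no such element" is my assumption, not something the ReadMe states.

```python
from typing import Dict, List


def expand_array_columns(fields: List[str], count: int) -> List[str]:
    """Flatten an array declaration into CSV column names in ROW_MAJOR order:
    all fields of element 1, then all fields of element 2, and so on."""
    return [f"{name}{i}" for i in range(1, count + 1) for name in fields]


def split_array_records(row: Dict[str, str], fields: List[str],
                        count: int) -> List[Dict[str, str]]:
    """Turn the flattened columns of one CSV row back into child records.
    Groups where every field is empty are skipped (an assumption)."""
    records = []
    for i in range(1, count + 1):
        record = {name: row.get(f"{name}{i}", "") for name in fields}
        if any(record.values()):
            records.append(record)
    return records


if __name__ == "__main__":
    guarantor_fields = [
        "guarantorPersentTypeCd",
        "guarantorNameOrganization",
        "guarantorNameLast",
    ]
    print(",".join(expand_array_columns(guarantor_fields, 5)))
```

The second function is what would feed a one-to-many child table: each non-empty group in the parent row becomes one child record.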
You can see an example of this data here,
* https://github.com/EvanCarroll/db-Texas-Ethics-Commission/blob/master/data/TEC_CF_CSV/ReadMe.txt
* https://github.com/EvanCarroll/db-Texas-Ethics-Commission/blob/master/data/TEC_LA_CSV/LobbyLAR-ReadMe.txt
Asked by Evan Carroll
(65502 rep)
May 21, 2018, 03:31 AM
Last activity: Aug 7, 2024, 05:35 PM