IEM :: NWS Text back to 1983

Link: https://mesonet.agron.iastate.edu/wx/afos/list.phtml

Although the year is 2020, the National Weather Service (NWS) continues to use ASCII text files as its primary and official data format for most of the products it issues to the public. While it is trivial to find real-time sources of these text files on the Internet, archives are not easy to come by. For that reason, the IEM maintains a very large archive of these text products and is excited to announce a back-fill to the year 1983!

Notes on IEM Archive Sources

In 2008, the IEM began systematically archiving the data, and the archive should be fairly complete from then on.

For the 2001-2008 period, the NCEI Service Record Retention System (SRRS) was used to back-fill the database. Data from this upstream service was rather difficult to work with, and code was written to attempt to work through corrupted data files and the like. Archive coverage for this period is decent, but I have less confidence in it than in the period from 2008 onward.

For the 1995-2001 period, a colleague at the University of Wisconsin provided some NOAAPort daily archive files that contained a lot of text products that were parsed out and dumped into the database. These provided files were saved to the MTArchive website and can be found in the 'noaaport' folders within the daily directory trees, for example 1 Jan 1998.

Over the years, I have known of NWS employees who contacted NCEI directly to request text data for years prior to what is available on the public website, which starts in 2001 (see SRRS above). One employee suggested I contact NCEI and see if they would dump the entire archive to me for liberation :) And sure enough, they did! The files provided were in a binary format called TD9949, and unfortunately the documentation did not exactly match the actual format. NCEI support noted that nobody there used or knew the format anymore :) I was able to reverse engineer some quirks of the format and wrote a Python script that I think does a good job of getting nearly all the ASCII text out of the binary format files. The raw files are in the same noaaport folders mentioned above, with file names starting with 9949.

IEM Archive Quirks

The feed of NWS text data includes products in SHEF format, which is generally hydrological data. The data volume is way, way, way too much for me to archive within a database. The IEM text archives retain only the most recent seven days of SHEF format data; anything older is deleted. I have some means to manually dump archived SHEF data files, but alas. Of course, the processed SHEF atomic data can be downloaded here.
Besides SHEF, some other high-volume text files are purged after seven days as well. If you are curious, this Python script contains the SQL query that purges these products. They are generally of little public interest and so are purged to conserve database space.
The database stores product metadata along with the unaltered text. This metadata allows for easier searching. One such piece of metadata is the identifier of the NWS center that issued the product. These identifiers have changed over the years as offices have opened and closed. For better or worse, the IEM rectifies these identifiers to present-day WFO identifiers to simplify searching. As a practical example, there used to be a WSO in Waterloo, IA that used the identifier KALO. Since that WSO is now long closed and the location is covered by the KDMX (Des Moines, IA) office, these products can be found under the KDMX identifier. A Python script in the IEM GitHub repository contains a cross reference of the rectification that I did. If you are struggling to find old text from a closed center, please contact us for help!
Since three or more sources of text were merged into this archive, there may be product duplication. I ran some scripts that attempted to deduplicate, but I am sure some issues remain.
Another key piece of metadata stored within the database is the "AWIPS ID", sometimes called the six character AFOS identifier. The TD9949 format explicitly provided these IDs, so it was rather easy to dump them to the database. For other data sources, the ID had to be parsed from the raw text, and as such is fraught with pain. If you are struggling to find a product, it could be because this ID is not exactly what you expected. Please contact us for help and we'll try to find it!
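
To make the two metadata quirks above a bit more concrete, here is a minimal Python sketch of pulling the issuing center and the AWIPS ID out of raw product text and then rectifying the center. It is not the IEM's actual code. It assumes the product carries a conventional WMO abbreviated heading line (for example `FXUS63 KDMX 011200`) with the AWIPS ID alone on the very next line. The names `RECTIFY`, `ProductMeta` and `parse_metadata` are made up for illustration, and the only cross-reference entry shown is the KALO to KDMX example from the text.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical cross reference of retired identifiers to present-day WFOs.
# Only the KALO -> KDMX entry comes from the text above.
RECTIFY = {"KALO": "KDMX"}

# Assumed WMO heading layout: TTAAii CCCC DDHHMM, with an optional BBB group.
WMO_HEADER = re.compile(
    r"^(?P<ttaaii>[A-Z]{4}\d{2})\s+(?P<cccc>[A-Z]{4})\s+(?P<ddhhmm>\d{6})"
    r"(?:\s+(?P<bbb>[A-Z]{3}))?$"
)
# Assumed AWIPS ID layout: four to six letters or digits alone on a line.
AWIPS_ID = re.compile(r"^[A-Z0-9]{4,6}$")


class ProductMeta(NamedTuple):
    source: str              # issuing center, after rectification
    awips_id: Optional[str]  # None when no plausible ID line was found


def parse_metadata(text: str) -> Optional[ProductMeta]:
    """Return metadata for the first WMO heading found, or None if absent."""
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        match = WMO_HEADER.match(line)
        if match is None:
            continue
        cccc = match.group("cccc")
        source = RECTIFY.get(cccc, cccc)
        # Only the line directly after the heading is considered.
        following = lines[i + 1] if i + 1 < len(lines) else ""
        awips_id = following if AWIPS_ID.match(following) else None
        return ProductMeta(source, awips_id)
    return None


sample = "000\nFXUS63 KALO 011200\nAFDALO\n\nAREA FORECAST DISCUSSION\n"
print(parse_metadata(sample))
```

Note how loose the ID pattern is: any short all-caps line directly under the heading would be accepted as an ID, which hints at why parsing these from raw text is error prone.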

In summary, this is a very cool and uniquely available dataset that I hope the community really enjoys combing through. To reiterate, if you are having trouble finding something you think should be in the archives, please contact us. Enjoy!