Browse the [modern] web in Internet Explorer 5 & 6

Here's something fun: Enable HTTPS browsing on IE5/Windows 98 with a Flask proxy that handles SSL and rewrites web content.

Windows 98 IE 6 Screenshot

For whatever reason, I have a Windows 98 virtual machine in UTM (which is fantastic on Apple Silicon) that I like to boot into and take a trip down memory lane. It's fun and painful at the same time. But the modern web has moved on from Internet Explorer 6 (and Internet Explorer in general, but we're not celebrating that in this post). Lost in the dust, Internet Explorer of the Windows 98 days doesn't quite work anymore. IE6, released in 2001, came with support for SSL 2.0 and 3.0, and later updates added support for TLS 1.0. At the time, this was sufficient. The web was a different place, less sophisticated in both the technology it used and the threats it faced (generally speaking, of course).

Fast forward to the present, and the world has dramatically changed. We have toasters with wifi and fridges you can talk to. 
The versions of SSL and the initial iterations of TLS (1.0 and 1.1) are now considered insecure due to numerous vulnerabilities. Heartbleed, POODLE, and other less charmingly named vulnerabilities have led to a consensus in the community: SSL and early TLS versions are out. The internet has collectively moved on to TLS 1.2 and 1.3, which offer significantly improved security through stronger encryption algorithms and better protocols for ensuring privacy and data integrity.

For IE 5 and 6, this means being unable to establish secure connections with the vast majority of modern websites, which now mandate the use of TLS 1.2 or higher for all secure browsing.

Furthermore, the push for more secure web standards has led to widespread adoption of HTTP Strict Transport Security (HSTS), a policy that forces browsers to connect to websites over HTTPS only, using modern encryption protocols. 

So, wanting to get the modern web on Windows 98, and therefore IE5 (then IE6 after updating), I wrote a simple proxy in Flask that rewrites URLs to get around HTTPS requirements. Run the Flask server, then navigate to your server's address and append the URL you'd like to visit (see the screenshot above).


This mostly works, and works well enough for websites that don't load natively on IE6 (which is pretty much everything). You can check it out on Github here:

I've read about similar projects that others have written for their 68k Macs as well, to do the same thing: act as a proxy that serves web content, subverting modern security requirements. I've been wanting to do something similar with my Macintosh Plus, but first I've got to purchase a ZuluSCSI before the floppy drive fails, hah.

A few projects of note that I'd recommend checking out:

Failing HTTPS proxy with Ngrok on

I've been a fan of free code-to-cloud deployment services (PaaS) like and to launch my fly-by-night ideas. They both offer generous free tiers that let you run your code (Node, Python, etc.) on their platform and host it for whatever use. The downside of all these different PaaS companies is that they all come with their own CLI that you must learn, each with different syntax. Regardless, deploying from Git is easy enough and in fact simpler. Anyway.


I've been working on a new project that I think I'll turn into a SaaS eventually. Part of the project uses Google OAuth to log in to the user dashboard. Setting up a new application in Google Cloud Platform is easy enough: make your keys, set your permission scopes, and voila, instant Google SSO for your custom application. Part of this is specifying your redirect and callback URLs.


If you're developing locally, of course you'd use Since we're dealing with authentication here, Google is going to be picky and request HTTPS, of course. What do you do? Generate a self-signed SSL cert to get around Chrome's annoying popups. But what if you want to share your cool, fun new app with your friends and get them to beta test your half-baked idea? ngrok is a reverse proxy that tunnels your local webserver to a free domain, letting you share your local application with anyone who has the generated link. Gracefully, ngrok provides an SSL certificate for the service too, so you can focus on building your app.


I regularly use Flask for my Python projects, dabbling in FastAPI if I need something straightforward, or Django if I feel like spending 5 days debugging why my routes don't work (I'm joking about the last part). I was encountering a weird issue where, after loading my ngrok site over HTTPS and logging in with Google SSO, it would deny my request. Google displays the redirect URI for you to debug, and I could see the redirect URI was http:// instead of https://. Why did that redirect happen when the HTTP domain was never accessed? Well, our webserver still serves content over HTTP; ngrok's reverse proxy does the magic and gives us that HTTPS part.


After some Googling, I was left with this solution:

from werkzeug.middleware.proxy_fix import ProxyFix

app.wsgi_app = ProxyFix(app.wsgi_app, x_proto=1, x_host=1)


Why does this work and why is it needed?


When using ngrok, it acts as a reverse proxy, creating a secure HTTPS tunnel to our local HTTP server. This is where ProxyFix plays a pivotal role. It's akin to an interpreter that correctly translates the communication between the secure ngrok layer and our Flask app. By configuring the wsgi_app attribute of the Flask app with ProxyFix, it aligns the external HTTPS requests with the internal HTTP environment of the Flask server. Without that alignment, the Flask app misinterprets the secure HTTPS requests as insecure HTTP, leading to issues like the one I faced with Google SSO. The middleware specifically trusts the headers set by the proxy (X-Forwarded-Proto and X-Forwarded-Host) via the x_proto=1 and x_host=1 arguments. This ensures that even though the Flask server itself is running on HTTP, it recognizes and correctly handles requests forwarded through the HTTPS tunnel provided by ngrok.


Simple fix that allowed me to use HTTPS fully with ngrok and Flask. Now, when deploying to, the situation is the same, except with a distinction:
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_port=1)

  1. x_for=1: This parameter ensures that your application uses the first 'X-Forwarded-For' header to determine the original client IP address. This is crucial in a cloud environment like Railway, where your app might be behind multiple layers of proxies.
  2. x_port=1: This indicates that the app should trust the first 'X-Forwarded-Port' header, which is important for accurately identifying the port number used in the client's original request. This can be essential for constructing URLs and for certain security checks.
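To see what this kind of middleware does under the hood, here's a stripped-down, hand-rolled version of the same idea (an illustrative sketch, not werkzeug's actual implementation): a WSGI wrapper that promotes the forwarded headers into the request environ before the app ever sees it.

```python
def simple_proxy_fix(app):
    """Trust X-Forwarded-Proto/Host headers set by the proxy in front of us."""
    def middleware(environ, start_response):
        proto = environ.get("HTTP_X_FORWARDED_PROTO")
        host = environ.get("HTTP_X_FORWARDED_HOST")
        if proto:
            environ["wsgi.url_scheme"] = proto  # http -> https
        if host:
            environ["HTTP_HOST"] = host         # local host -> public tunnel host
        return app(environ, start_response)
    return middleware

# A toy WSGI app that echoes the scheme it believes it is served over
def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["wsgi.url_scheme"].encode()]

wrapped = simple_proxy_fix(app)
environ = {"wsgi.url_scheme": "http", "HTTP_X_FORWARDED_PROTO": "https"}
body = wrapped(environ, lambda status, headers: None)
print(b"".join(body))  # b'https'
```

This is exactly why the Google redirect URI came back as https once ProxyFix was in place: url_for and friends build URLs from the environ's scheme and host.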

Understanding x_host=1

  • What it does: Setting x_host=1 tells Flask to trust the X-Forwarded-Host header provided by a proxy server. This header indicates the original host requested by the client.
  • Why it's insecure: The primary security concern with trusting the X-Forwarded-Host header arises from the possibility of header spoofing. If the proxy isn't configured properly to overwrite or discard this header from incoming requests, a malicious user could inject a false host header. This might lead to incorrect URL generation, misleading redirects, or in worst cases, security vulnerabilities like open redirects or host header injection attacks.

While the change is small, the distinction matters depending on whether you're deploying locally or remotely. I hope this post helps any other developer who lands in my place ;)



Search multiple grocery stores at once

tl;dr check out


The price of groceries in Canada has absolutely skyrocketed over the past six months. Every time I go to the grocery store, I am no longer shocked at the price increases. In fact, it's almost a bit of a game finding something that hasn't gone up.

A great example is Loblaw's No Name potato chips. On sale they were 97 cents, regularly priced at 99 cents. Great value for something so simple. I went to an Independent Grocer two weeks ago, expecting to be shocked, but what I received was a heart attack instead. The price increased by over 150%! The sale price is now $5.00 for 2 bags.

Absolutely frustrated with running across town to five different grocery stores to get a deal, I decided to get to some hacking. What I wanted to do is create a central search engine, where I can compare unit prices. I also need the ability to change which grocery-chain store I'm querying.

(for info about the API endpoints themselves, see

President's Choice

This was the easiest API to reverse engineer of all of the grocery stores I attempted. Intercepting the HTTP traffic from the PC Mobile app using a proxy (like BurpSuite) makes recreating API requests trivial.

No authentication necessary, but you will have to grab the X-Apikey from the request.

Example request:

curl -X POST \ \
  -H 'Host:' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Site-Banner: superstore' \
  -H 'X-Apikey: 1im1hL52q9xvta16GlSdYDsTsG0dmyhF' \
  -H 'Content-Type: application/json' \
  -H 'Origin:' \
  -d '{
        "pagination": {"from": 0, "size": 48},
        "banner": "superstore",
        "cartId": "228fb500-b46f-43d2-a6c4-7b498d5be8a9",
        "lang": "en",
        "date": "05122022",
        "storeId": "your_store_number_here",
        "pcId": false,
        "pickupType": "STORE",
        "offerType": "ALL",
        "term": "your_search_query_here",
        "userData": {
            "domainUserId": "b3a34376-3ccf-4932-8816-7017bd33f2fc",
            "sessionId": "5580cec2-5622-4b34-8491-d94f9dd48480"
        }
      }'

But how can we search specific stores? Thankfully, the iOS PC Express mobile app has flyer search functionality. Even better, it's an easy REST API with an included private API key. This means we won't have to worry about user authentication, i.e. logging into the mobile app.

mobile flyer search

With this flyer search, we can search for Loblaw store IDs by postal code. Just what we need to put together a universal grocery store search ;)
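The curl example above translates directly into a small payload builder. This is a sketch: the store number and date are placeholders, and some captured fields (cart and session IDs) are omitted for brevity.

```python
def build_pc_search_body(term: str, store_id: str, date: str) -> dict:
    """Assemble the JSON body for the product-search request shown above.

    store_id and date are placeholders -- in practice they come from the
    intercepted mobile-app traffic (date is ddmmyyyy, e.g. "05122022").
    """
    return {
        "pagination": {"from": 0, "size": 48},
        "banner": "superstore",
        "lang": "en",
        "date": date,
        "storeId": store_id,
        "pcId": False,
        "pickupType": "STORE",
        "offerType": "ALL",
        "term": term,
    }

body = build_pc_search_body("potato chips", "1517", "05122022")
print(body["term"])
# The actual call would then be something like:
# requests.post(SEARCH_URL, json=body, headers={"X-Apikey": API_KEY, ...})
```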

SaveOn Foods

Nothing much to say here, I used the same method for capturing web traffic and recreated the request in Python. Again, no authentication from the front end needed to query these APIs.

Example store query:

curl '{store_number}/preview?popularTake=30&q={search_query}' \
  -H 'X-Correlation-Id: b0bb5f7c-5c00-4cac-ae8a-f34712d0daad' \
  -H 'X-Shopping-Mode: 11111111-1111-1111-1111-111111111111' \
  -H 'X-Site-Host:' \
  -H 'Sec-Ch-Ua: 1' \
  -H 'Client-Route-Id: 26186555-b0d7-4251-91e1-fca38fd364aa' \
  -H 'Sec-Ch-Ua-Mobile: 1' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36' \
  -H 'Sec-Ch-Ua-Platform: 1' \
  -H 'Sec-Fetch-Site: same-site' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Origin:' \
  -H 'Accept: application/json; charset=utf-8'


Safeway

Safeway doesn't offer online grocery delivery in all cities, just Montreal, Toronto and Vancouver. They launched a relatively new service,, for grocery delivery. Unfortunately, I can't say whether the prices returned by a query apply to stores outside those cities.

Example request:

curl '<SEARCH_QUERY>' \
  -H 'Sec-Ch-Ua: 1' \
  -H 'Client-Route-Id: 26186555-b0d7-4251-91e1-fca38fd364aa' \
  -H 'Sec-Ch-Ua-Mobile: 1' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36' \
  -H 'Sec-Ch-Ua-Platform: 1' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Origin:' \
  -H 'Accept: application/json; charset=utf-8' \
  -H 'Cookie: VISITORID=NzEyMmMzZTEtOTYzNy00MmIwLWI2NTAtNjY0NjBlZWVhOTVjOjE2NzAyMzA1NDM2NjE=; global_sid=LvSLAl2jV2YrN3AeAIbMt_Tl8DWedrNo3lJ59CxyIMI0NeYPYfzDxY2UP7FJhEdl5xSWPxf6uvxynINrmMq5p1agATEZlVMM'


Walmart

Oh boy, this one took a bit of work to recreate. All the other APIs were a simple POST or GET request, but not Walmart. I can only guess why they chose GraphQL over other common web service architectures. Walmart routinely has the lowest prices, so this was something I wanted to attack.

Here's what our request body looks like:

        walmart_data_body = {
            "operationName": "getPreso",
            "query": 'query getPreso($qy: String, $cId: String, $miPr: String, $mxPr: String, $srt: Sort, $ft: String, $intS: IntentSource, $pg: Int, $ten: String!, $pT: String!, $gn: Boolean, $pos: Int, $sT: String, $sQ: String, $rS: String, $sp: Boolean, $aO: AffinityOverride, $dGv: Boolean, $pap: String, $ptss: String, $bSId: String, $ps: Int, $fSP: JSON, $fFp: JSON, $dId: String, $iCLS: Boolean! = true, $aQP: JSON, $vr: String, $fE: Boolean! = false, $iT: Boolean! = false, $tempo: JSON, $p13n: JSON) {\n  search(\n    query: $qy\n    prg: ios\n    cat_id: $cId\n    min_price: $miPr\n    max_price: $mxPr\n    sort: $srt\n    facet: $ft\n    intentSource: $intS\n    page: $pg\n    tenant: $ten\n    channel: "Mobile"\n    pageType: $pT\n    guided_nav: $gn\n    pos: $pos\n    s_type: $sT\n    src_query: $sQ\n    recall_set: $rS\n    spelling: $sp\n    affinityOverride: $aO\n    displayGuidedNav: $dGv\n    pap: $pap\n    ptss: $ptss\n    ps: $ps\n    _be_shelf_id: $bSId\n    dealsId: $dId\n    additionalQueryParams: $aQP\n  ) {\n    __typename\n    query\n    searchResult {\n      __typename\n      ...SearchResultFragment\n    }\n  }\n  contentLayout(\n    channel: "Mobile"\n    pageType: $pT\n    tenant: $ten\n    version: $vr\n    searchArgs: {query: $qy, cat_id: $cId, facet: $ft, _be_shelf_id: $bSId, prg: ios}\n  ) @include(if: $iCLS) {\n    __typename\n    modules(p13n: $p13n, tempo: $tempo) {\n      __typename\n      schedule {\n        __typename\n        priority\n      }\n      name\n      version\n      type\n      moduleId\n      matchedTrigger {\n        __typename\n        pageId\n        zone\n        inheritable\n      }\n      triggers @include(if: $iT) {\n        __typename\n        zone\n        pageId\n        inheritable\n      }\n      configs {\n        __typename\n        ... [TRIMMED]...
            "variables": {
                "aQP": {"isMoreOptionsTileEnabled": "true"},
                "dGv": True,
                "fE": False,
                "fFp": {"powerSportEnabled": "true"},
                "fSP": {
                    "additionalQueryParams": {"isMoreOptionsTileEnabled": "true"},
                    "channel": "Mobile",
                    "displayGuidedNav": "true",
                    "page": "1",
                    "pageType": "MobileSearchPage",
                    "prg": "ios",
                    "query": self.search_query,
                    "tenant": "CA_GLASS",
                },
                "iCLS": True,
                "iT": True,
                "p13n": {
                    "page": "1",
                    "reqId": "6E9F7A17-ACE0-4D5F-AEC0-62522C13DB35",
                    "userClientInfo": {"callType": "CLIENT", "deviceType": "IOS"},
                    "userReqInfo": {
                        "refererContext": {"query": self.search_query},
                        "vid": "8B95354D-6FE8-4F18-904F-4ED9AE73EE24",
                    },
                },
                "pg": 1,
                "pT": "MobileSearchPage",
                "qy": self.search_query,
                "tempo": {},
                "ten": "CA_GLASS",
                "vr": "v1",
            },
        }

A large portion of these parameters are unknown to me, such as dGv, fE, and iCLS. If you happen to know what they stand for, feel free to leave a comment.

With all these APIs reverse engineered, I can finally query four grocery stores at once.

What I discovered is that each grocery store returns different product data. Walmart and SaveOn return a unit price ($1.37/100g) but Safeway and President's Choice do not. The best way to deal with this is to normalize all the data so it's uniform in our search application.

I wrote a parser that does just that: feed in our raw data and return the "cleaned" data.
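As a sketch of what such a parser can look like (the field names here are hypothetical, not the project's actual schema), each store's raw product dict is mapped onto one common shape, deriving the unit price when the store doesn't provide one:

```python
def normalize(store: str, raw: dict) -> dict:
    """Map a store-specific product dict onto a common schema.

    Walmart/SaveOn already supply a unit price; for the other stores we
    derive $/100g from the price and package weight when both are present.
    """
    item = {
        "store": store,
        "name": raw.get("name", ""),
        "price": float(raw.get("price", 0)),
        "unit_price": raw.get("unit_price"),
    }
    if item["unit_price"] is None and raw.get("grams"):
        # normalize to dollars per 100 g
        item["unit_price"] = round(item["price"] / raw["grams"] * 100, 2)
    return item

print(normalize("safeway", {"name": "Chips", "price": 5.00, "grams": 200}))
```

With every store's results in the same shape, sorting by unit price across all four becomes trivial.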

Parallel threads

The next problem I encountered was speed. Sending four HTTP requests one after another is not fast. So, how can we send them in parallel? ThreadPoolExecutor to the rescue! We can asynchronously execute multiple tasks and fetch the results using futures (also called promises). A future is a way of getting a result from a task that may or may not still be executing.

Let's set the scene. We have four stores to search, each having their own function. Some store APIs will be faster than others. What we can do is stick all those functions in a list, and call them as needed. If we're not querying Safeway for example, don't include that function!

Here's an example (search_superstore and search_saveon are stand-in names for the per-store query functions):

    # Set up a list of functions to send requests to
    functions = [search_superstore, search_saveon]

We have our two stores we are querying. Let's now send requests in parallel:

    # Use a ThreadPoolExecutor to send the requests in parallel
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Start the load operations and mark each future with its function
        future_to_function = {executor.submit(func): func for func in functions}
        results = {}
        for future in concurrent.futures.as_completed(future_to_function):
            func = future_to_function[future]
            try:
                result = future.result()
            except Exception as exc:
                print(f"Function {func.__name__} generated an exception: {exc}")
                results[func.__name__] = exc
            else:
                print(f"Function {func.__name__} returned result: {result}")
                results[func.__name__] = result

A future object represents the result of an asynchronous operation that has not yet completed. In this case, each future object corresponds to a function that is being executed in a separate thread. By submitting all the functions to the executor, we allow it to manage the creation and management of threads, and ensure that each function runs asynchronously.

Once all the functions have been submitted to the executor, we use the concurrent.futures.as_completed() method to iterate over the futures as they complete. This method returns an iterator that yields completed future objects, allowing us to retrieve the results of each function as they become available.
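The same pattern boiled down to a runnable toy (with sleeps standing in for the HTTP calls) shows that results arrive in completion order, not submission order:

```python
import time
import concurrent.futures

def slow_store():
    time.sleep(0.2)   # pretend this store's API is slow
    return "slow"

def fast_store():
    time.sleep(0.05)  # this store responds quickly
    return "fast"

functions = [slow_store, fast_store]
order = []
with concurrent.futures.ThreadPoolExecutor() as executor:
    future_to_function = {executor.submit(f): f for f in functions}
    for future in concurrent.futures.as_completed(future_to_function):
        order.append(future.result())

print(order)  # ['fast', 'slow'] -- the quicker "store" comes back first
```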


There is still much to be done (this is always true) but I'm happy with where the project stands. Users can query for items, change stores and sort by prices/name/unit price.

Python init system for Snacklinux

I've been thinking about this for a while. It's not really practical, just for fun: essentially rewriting the userspace tools needed for a minimal Linux distro, on top of just the kernel, in Python. I found this PyCon presentation (video on YouTube) about this very subject. Unfortunately there's no mention of it past 2006, but oh well. Another use for such a thing would be something similar to Docker, but with the build process of SnackLinux. Being able to launch a customizable Python image with a custom kernel is complete overkill, but that's what makes programming fun, I think.

Generating images for your podcast

I listen to the Legends of the Old West podcast, a western-themed episodic podcast centered around outlaws. The narrator is great and the character actions are descriptive, but I'm left wanting more.

With the surge of Stable Diffusion projects, I was inspired to make something AI generated art themed.

What I ended up with is transcribing the podcast audio into text, and then generating images based off of that. Take a look below for an example.

The bulk of the work is done by Vosk, an offline open source speech recognition toolkit. We convert the input MP3 to wav, send it through Vosk and receive a generated JSON output file.

from vosk import Model, KaldiRecognizer
import wave
import json

wav_file = "tk150-split.wav"
model_path = "vosk-model-en-us-0.22"
model = Model(model_path)
wf =, "rb")
rec = KaldiRecognizer(model, wf.getframerate())

text_lst = []

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break  # end of file

    if rec.AcceptWaveform(data):
        word = json.loads(rec.Result())["text"]
        if len(word) > 0:
            text_lst.append(word)  # collect each recognized sentence

if text_lst:
    with open("output-audio-file.txt", "w") as filehandle:
        json.dump(text_lst, filehandle)

With our output transcribed text, we can run it through whatever AI generating service we desire. For the purpose of speed, I used

import requests
import json
import time

# Get lexica AI generated art by prompt
def image(prompt):
    """Return the first matching image for a prompt."""

    results = requests.get("", params={"q": prompt})
    if results.status_code != 200:
        print(f"Requested URL: {results.url}")
        print(f"Content: {results.content}")
    results = results.json()
    if results and results["images"]:
        response = {
            "src": results["images"][0]["src"],
            "alt": results["images"][0]["prompt"],
        }
        return response
    return {}

# Open output file for reading
with open("output-audio-file.txt", "r") as filehandle:
    output_audio_text = json.load(filehandle)

# Generate an image for each value from our transcribed audio
images_list = []
for line in output_audio_text:
    images_list.append(image(line))

# Write image list to file
if images_list:
    with open("images-generated.json", "w") as f:
        json.dump(images_list, f, ensure_ascii=False)

For the demo, I used Flask to create a small web app and serve the images and text. 
Code on Github: podcast-image


Tuya Smart Home Temperature Monitor + Python (tinytuya)

Why go outside and check the temperature when you can automate it with Python? I purchased an inexpensive wifi-enabled temperature monitor from AliExpress for approximately $20 CAD. The goal was to log the temperature and humidity throughout the seasons and make a graph at the end of the year.

During the setup of the device, you have to use their cloud smart home app. You can only guess what they do with your data (hey, maybe they’ll see how cold it gets here in winter and feel bad for me). You’ll have to use the app to connect the developer account to your device.

tinytuya is a Python module for interacting with Tuya smart home devices.


python -m pip install tinytuya

Scan for devices

python -m tinytuya scan

Developer Account

First things first: create a developer account. Next, you'll have to create a cloud project, link your device, and get your secrets. Here are the steps, copied from:

  1. TUYA ACCOUNT - Set up a Tuya Account (see PDF Instructions):
    • Create a Tuya Developer account on and log in. NOTE: Tuya makes changes to their portal and this process frequently so details may vary. Please create an issue or pull request with screenshots if we need to update these instructions.
    • Click on "Cloud" icon -> "Create Cloud Project"
      1. Remember the "Data Center" you select. This will be used by TinyTuya Wizard (screenshot).
      2. Skip the configuration wizard but remember the Authorization Key: API ID and Secret for below (screenshot).
    • Click on "Cloud" icon -> Select your project -> Devices -> Link Tuya App Account (see screenshot)
    • Click Add App Account (screenshot) and it will display a QR code. Scan the QR code with the Smart Life app on your Phone (see step 1 above) by going to the "Me" tab in the Smart Life app and clicking on the QR code button [..] in the upper right hand corner of the app. When you scan the QR code, it will link all of the devices registered in your Smart Life app into your Tuya IoT project.
    • NO DEVICES? If no devices show up after scanning the QR code, you will need to select a different data center and edit your project (or create a new one) until you see your paired devices from the Smart Life App show up. (screenshot). The data center may not be the most logical. As an example, some in the UK have reported needing to select "Central Europe" instead of "Western Europe".
    • SERVICE API: Under "Service API" ensure these APIs are listed: IoT Core, Authorization and Smart Home Scene Linkage (NOTE: I didn't need to add this). To be sure, click subscribe again on every service. Very important: disable popup blockers, otherwise subscribing won't work without providing any indication of failure. Make sure you authorize your Project to use those APIs:
      • Click "Service API" tab
      • Click "Go to Authorize" button
      • Select the API Groups from the dropdown and click Subscribe (screenshot)
  2. WIZARD - Run Setup Wizard:
    • Tuya has changed their data center regions. Make sure you are using the latest version of TinyTuya (v1.2.10 or newer).
    • From your Linux/Mac/Win PC run the TinyTuya Setup Wizard to fetch the Local_Keys for all of your registered devices:
      python -m tinytuya wizard   # use -nocolor for non-ANSI-color terminals e.g. Windows cmd
    • The Wizard will prompt you for the API ID key, API Secret, API Region (cn, us, us-e, eu, eu-w, or in) from your Tuya IoT project as set in Step 3 above.
      • To find those again, go to, choose your project and click Overview
        • API Key: Access ID/Client ID
        • API Secret: Access Secret/Client Secret
    • It will also ask for a sample Device ID. Use one from step 2 above or found in the Device List on your Tuya IoT project.
    • The Wizard will poll the Tuya IoT Cloud Platform and print a JSON list of all your registered devices with the "name", "id" and "key" of your registered device(s). The "key"s in this list are the Devices' Local_Key you will use to access your device.
    • In addition to displaying the list of devices, Wizard will create a local file devices.json that TinyTuya will use to provide additional details for scan results from tinytuya.deviceScan() or when running python -m tinytuya scan. The wizard also creates a local file tuya-raw.json that contains the entire payload from Tuya Cloud.
    • The Wizard will ask if you want to poll all the devices. If you do, it will display the status of all devices on record and create a snapshot.json file with these results.

Once we have our local key, it's time to get our data points.


Let’s use tinytuya to query our temperature monitor.

DPS explained:

Tuya devices send back data points (DPS), also called device function points, in a JSON string. The DPS attributes define the state of the device. Each key in the DPS dictionary is a key-value pair: the key is the DP ID and its value is the dpValue. You can refer to the Tuya developer platform for definitions of the function points for each product.

import tinytuya

# The device ID, LAN IP address and Local_Key all come from the wizard's devices.json
d = tinytuya.OutletDevice('DEVICE_ID_HERE', 'DEVICE_IP_HERE', 'LOCAL_KEY_HERE')
d.set_version(3.3)
data = d.status()
print(data['dps'])

{'1': 40, '2': 262, '4': 100, '9': 'c', '10': 1500, '11': -190, '12': 832, '13': 0, '14': 'cancel', '15': 'cancel', '16': 0, '17': 30, '18': 31, '21': True, '23': 0, '24': 0}

What you'll get back is a nice JSON string of all the available data points. I wasn't able to find any information on this temperature monitor, so here's what I mapped out.



DPS lookup table for Tuya Temperature Monitor

Note: For all temperatures, pad a decimal from the right. 452=45.2

| DPS | Example value | Description | Type | Notes |
|-----|---------------|-------------|------|-------|
| 2 | 452 | Humidity (%) | integer | |
| 4 | 100 | Battery | integer | 3.7V/500mAh, built-in battery usage time: ≤4 hours (fully charged) |
| 9 | c | Temperature unit | c/f | |
| 10 | 1500 | Alarm temperature upper limit | integer | Temperature alarm setting |
| 11 | -190 | Alarm temperature lower limit | integer | Temperature alarm setting |
| 12 | 1000 | Humidity upper limit | integer | Humidity alarm setting |
| 13 | 244 | Humidity lower limit (24.4%) | integer | Humidity alarm setting |
| 14 | cancel | Temperature alarm | on/cancel | |
| 15 | cancel | Humidity alarm | on/cancel | |
| 16 | 0 | Light intensity | integer | Lux |
| 17 | 30 | Temperature report | integer | Alarm duration in seconds |
| 18 | 30 | Humidity report | integer | Alarm duration in seconds |
| 21 | true | Battery switch | true/false | Enable battery backup |
| 23 | 0 | Temperature calibration | C +/- | Manually set positive or negative offset |
| 24 | 0 | Humidity calibration | C +/- | Manually set positive or negative offset |
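Since the temperature and humidity values come back as integers with one implied decimal, a tiny helper (hypothetical name, not part of tinytuya) makes the readings usable:

```python
def dps_scaled(value: int) -> float:
    """Convert a Tuya integer DP value with one implied decimal: 452 -> 45.2."""
    return value / 10

# Sample DPS values as returned by d.status()
dps = {'1': 40, '2': 262, '11': -190}
print(dps_scaled(dps['2']))   # 26.2 -- humidity in percent
print(dps_scaled(dps['11']))  # -19.0 -- alarm lower limit in degrees
```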

Merge big CSV files with Pandas and Python

I deal with large CSV files at work, mostly database dumps. Google Sheets and Numbers/Excel just can't keep up with formula changes for 300k+ lines. What do we do? We use Python. The use case here is replicating the VLOOKUP function with a left join: we want to pull the matching criteria from our reference CSV file, but only the matching rows (otherwise that would be a full outer join). We want to look up the lat/long in our reference file and join on zip code; note that both files have a matching "zip" column. Here's the code:

# import pandas
import pandas as pd
# read csv data
df1 = pd.read_csv('locations.csv',dtype={"zip": float})

# Our zipcodes
df2 = pd.read_csv('US_zips.csv')
Left_join = pd.merge(df1,
                     df2,
                     on ='zip',
                     how ='left')

Let's take a look:
df1 = pd.read_csv('locations.csv',dtype={"zip": float})
Here we're reading from our locations file; the "zip" column is read as data type float.
Left_join = pd.merge(df1,
                     df2,
                     on ='zip',
                     how ='left')
We have several options for how we'd like to join: {'left', 'right', 'outer', 'inner', 'cross'}, with 'inner' as the default. If the columns we wanted to join on had different names, we could use:
Left_join = pd.merge(df1,
                     df2,
                     left_on ='zip',
                     right_on = 'zip_code',
                     how ='left')
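A minimal self-contained version of the same join, with toy data standing in for the CSV files, looks like this:

```python
import pandas as pd

# Stand-ins for locations.csv and US_zips.csv
df1 = pd.DataFrame({"name": ["Store A", "Store B"],
                    "zip": [90210.0, 10001.0]})
df2 = pd.DataFrame({"zip": [90210.0, 10001.0, 60601.0],
                    "lat": [34.09, 40.75, 41.89],
                    "lng": [-118.41, -73.99, -87.62]})

# Left join: keep every row of df1, pull in lat/lng where zip matches
left_join = pd.merge(df1, df2, on="zip", how="left")
print(left_join)
```

The result has one row per location, with lat/lng filled in wherever the zip was found in the reference file, which is exactly what VLOOKUP gives you in a spreadsheet.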
When we run the code, this is our result: In this use case, I was mapping locations for use with and as such it requires latitude and longitude. A few resources haven't been updated in years regarding zip codes, but I was able to find this: You'll find all the zip codes with additional geographical names, great for looking up data.

I didn't know much about zip codes (since we have postal codes up here) but I found this tidbit interesting:

"ZIP codes are a very messy kind of geography. They were created by the U.S. Postal Service as a tool to help deliver the mail more efficiently. ("ZIP" is actually an acronym for "Zone Improvement Plan", where "Zone" is a reference to the 2-digit postal zones that were used by the post office prior to implementing nationwide ZIP codes back in the early 1960's. Because it is an acronym we always use the uppercase for it.)

ZIP codes have been adopted by marketing people and by all kinds of other researchers as a standard geographic area, like a city or a county. We see maps of ZIP codes in telephone books and from commercial vendors that make us think of them as spatially defined areas with precise boundaries, similar to counties. But, from the perspective of the agency that defines them, the U.S. Postal Service, ZIP codes are not and never have been such spatial entities. They are simply categories for grouping mailing addresses.

As such, ZIP codes do in most cases resemble spatial areas since they are comprised of spatially clustered street ranges. But not always. In rural areas, ZIP codes can be collections of lines (rural delivery routes) that in reality do not look much like a closed spatial area. In areas where there is no mail delivery (deserts, mountains, lakes, much of Nevada and Utah) ZIP codes are not really defined. You may see maps that show ZIP code boundaries that include such areas, but these are not post-office-defined official definitions. An area will not be assigned a ZIP code until there is a reason for it, i.e. until there needs to be mail delivered there.

So the actual definition of a ZIP code "boundary" is quite fuzzy at best, and a purely extrapolated guess (at what it would be if someone were to start receiving mail there) at worst. If you have an application that requires extreme geographic precision, especially in sparsely populated areas, then you need to avoid using ZIP codes."