Search multiple grocery stores at once

tl;dr check out https://grocerygoose.ca

Code: https://github.com/snacsnoc/grocery-app

The price of groceries in Canada has absolutely skyrocketed over the past six months. Every time I go to the grocery store, I'm no longer shocked by the price increases. In fact, it's almost a bit of a game finding something that hasn't gone up.

A great example is Loblaw's No Name potato chips. On sale they were 97 cents, regularly priced at 99 cents. Great value for something so simple. I went to an Independent Grocer two weeks ago expecting to be shocked, but what I received was a heart attack instead: the sale price is now $5.00 for 2 bags. At $2.50 a bag, that's an increase of roughly 150%.

Absolutely frustrated with running across town to five different grocery stores to get a deal, I decided to do some hacking. What I wanted was a central search engine where I could compare unit prices, with the ability to change which grocery chain and store I'm querying.

(for info about the API endpoints themselves, see https://github.com/snacsnoc/grocery-app/blob/main/HACKING.md)

President's Choice

This was the easiest API to reverse engineer of all the grocery stores I attempted. Intercepting the HTTP traffic from the PC Mobile app using a proxy (like Burp Suite) makes recreating API requests trivial.

No authentication necessary, but you will have to grab the X-Apikey from an intercepted request.

Example request:

curl -X POST \
  https://api.pcexpress.ca/product-facade/v3/products/search \
  -H 'Host: api.pcexpress.ca' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Site-Banner: superstore' \
  -H 'X-Apikey: 1im1hL52q9xvta16GlSdYDsTsG0dmyhF' \
  -H 'Content-Type: application/json' \
  -H 'Origin: https://www.realcanadiansuperstore.ca' \
  -d '{
        "pagination": {"from": 0, "size": 48},
        "banner": "superstore",
        "cartId": "228fb500-b46f-43d2-a6c4-7b498d5be8a9",
        "lang": "en",
        "date": "05122022",
        "storeId": "your_store_number_here",
        "pcId": false,
        "pickupType": "STORE",
        "offerType": "ALL",
        "term": "your_search_query_here",
        "userData": {
            "domainUserId": "b3a34376-3ccf-4932-8816-7017bd33f2fc",
            "sessionId": "5580cec2-5622-4b34-8491-d94f9dd48480"
        }
    }'
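The same request is easy to recreate in Python with requests. Here's a rough sketch of how that could look; note that I've dropped the cart/session-style fields (cartId, date, userData) from the body above, so you may need to add them back, and the store number, search term and API key are placeholders you'd swap in from your own intercepted traffic:

import requests

PC_SEARCH_URL = "https://api.pcexpress.ca/product-facade/v3/products/search"


def query_pc(term, store_id, api_key):
    """Search the President's Choice product API for a given store."""
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Content-Type": "application/json",
        "Site-Banner": "superstore",
        "X-Apikey": api_key,
        "Origin": "https://www.realcanadiansuperstore.ca",
    }
    body = {
        "pagination": {"from": 0, "size": 48},
        "banner": "superstore",
        "lang": "en",
        "storeId": store_id,
        "pcId": False,
        "pickupType": "STORE",
        "offerType": "ALL",
        "term": term,
    }
    response = requests.post(PC_SEARCH_URL, json=body, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json()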

But how can we search specific stores? Thankfully the iOS PC Express mobile app has a flyer search feature. Even better, it's a simple REST API with the private API key baked in, which means we don't have to worry about the user authenticating or logging into the mobile app.
 

[Screenshot: mobile flyer search in the PC Express app]


With this flyer search, we can search for Loblaw store IDs by postal code. Just what we need to put together a universal grocery store search ;)

SaveOn Foods

Nothing much to say here: I used the same method of capturing web traffic and recreated the request in Python. Again, no authentication from the front end is needed to query these APIs.

Example store query:

curl 'https://storefrontgateway.saveonfoods.com/api/stores/{store_number}/preview?popularTake=30&q={search_query}' \
  -H 'X-Correlation-Id: b0bb5f7c-5c00-4cac-ae8a-f34712d0daad' \
  -H 'X-Shopping-Mode: 11111111-1111-1111-1111-111111111111' \
  -H 'X-Site-Host: https://www.saveonfoods.com' \
  -H 'Sec-Ch-Ua: 1' \
  -H 'Client-Route-Id: 26186555-b0d7-4251-91e1-fca38fd364aa' \
  -H 'Sec-Ch-Ua-Mobile: 1' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36' \
  -H 'Sec-Ch-Ua-Platform: 1' \
  -H 'Sec-Fetch-Site: same-site' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Origin: https://www.saveonfoods.com' \
  -H 'Accept: application/json; charset=utf-8'

Safeway

Safeway doesn't offer online grocery delivery in all cities, just Montreal, Toronto and Vancouver. They launched a relatively new service, Voila.ca, for grocery delivery. Unfortunately, I can't say whether the prices returned by a query apply to stores outside those cities.

Example request:

curl 'https://voila.ca/api/v5/products/search?limit=5&offset=0&sort=favorite&term=<SEARCH_QUERY>' \
  -H 'Sec-Ch-Ua: 1' \
  -H 'Client-Route-Id: 26186555-b0d7-4251-91e1-fca38fd364aa' \
  -H 'Sec-Ch-Ua-Mobile: 1' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36' \
  -H 'Sec-Ch-Ua-Platform: 1' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Origin: https://voila.ca' \
  -H 'Accept: application/json; charset=utf-8' \
  -H 'Cookie: VISITORID=NzEyMmMzZTEtOTYzNy00MmIwLWI2NTAtNjY0NjBlZWVhOTVjOjE2NzAyMzA1NDM2NjE=; global_sid=LvSLAl2jV2YrN3AeAIbMt_Tl8DWedrNo3lJ59CxyIMI0NeYPYfzDxY2UP7FJhEdl5xSWPxf6uvxynINrmMq5p1agATEZlVMM'

Walmart

Oh boy, this one took a bit of work to recreate. All the other APIs were a simple POST or GET request, but not Walmart's. I can only guess why they chose GraphQL over other common web service architectures. Walmart routinely has the lowest prices, so this was an API I really wanted to crack.

Here's what our request body looks like:

        walmart_data_body = {
            "operationName": "getPreso",
            "query": 'query getPreso($qy: String, $cId: String, $miPr: String, $mxPr: String, $srt: Sort, $ft: String, $intS: IntentSource, $pg: Int, $ten: String!, $pT: String!, $gn: Boolean, $pos: Int, $sT: String, $sQ: String, $rS: String, $sp: Boolean, $aO: AffinityOverride, $dGv: Boolean, $pap: String, $ptss: String, $bSId: String, $ps: Int, $fSP: JSON, $fFp: JSON, $dId: String, $iCLS: Boolean! = true, $aQP: JSON, $vr: String, $fE: Boolean! = false, $iT: Boolean! = false, $tempo: JSON, $p13n: JSON) {\n  search(\n    query: $qy\n    prg: ios\n    cat_id: $cId\n    min_price: $miPr\n    max_price: $mxPr\n    sort: $srt\n    facet: $ft\n    intentSource: $intS\n    page: $pg\n    tenant: $ten\n    channel: "Mobile"\n    pageType: $pT\n    guided_nav: $gn\n    pos: $pos\n    s_type: $sT\n    src_query: $sQ\n    recall_set: $rS\n    spelling: $sp\n    affinityOverride: $aO\n    displayGuidedNav: $dGv\n    pap: $pap\n    ptss: $ptss\n    ps: $ps\n    _be_shelf_id: $bSId\n    dealsId: $dId\n    additionalQueryParams: $aQP\n  ) {\n    __typename\n    query\n    searchResult {\n      __typename\n      ...SearchResultFragment\n    }\n  }\n  contentLayout(\n    channel: "Mobile"\n    pageType: $pT\n    tenant: $ten\n    version: $vr\n    searchArgs: {query: $qy, cat_id: $cId, facet: $ft, _be_shelf_id: $bSId, prg: ios}\n  ) @include(if: $iCLS) {\n    __typename\n    modules(p13n: $p13n, tempo: $tempo) {\n      __typename\n      schedule {\n        __typename\n        priority\n      }\n      name\n      version\n      type\n      moduleId\n      matchedTrigger {\n        __typename\n        pageId\n        zone\n        inheritable\n      }\n      triggers @include(if: $iT) {\n        __typename\n        zone\n        pageId\n        inheritable\n      }\n      configs {\n        __typename\n        ... [TRIMMED]...
            "variables": {
                "aQP": {"isMoreOptionsTileEnabled": "true"},
                "dGv": True,
                "fE": False,
                "fFp": {"powerSportEnabled": "true"},
                "fSP": {
                    "additionalQueryParams": {"isMoreOptionsTileEnabled": "true"},
                    "channel": "Mobile",
                    "displayGuidedNav": "true",
                    "page": "1",
                    "pageType": "MobileSearchPage",
                    "prg": "ios",
                    "query": self.search_query,
                    "tenant": "CA_GLASS",
                },
                "iCLS": True,
                "iT": True,
                "p13n": {
                    "page": "1",
                    "reqId": "6E9F7A17-ACE0-4D5F-AEC0-62522C13DB35",
                    "userClientInfo": {"callType": "CLIENT", "deviceType": "IOS"},
                    "userReqInfo": {
                        "refererContext": {"query": self.search_query},
                        "vid": "8B95354D-6FE8-4F18-904F-4ED9AE73EE24",
                    },
                },
                "pg": 1,
                "pT": "MobileSearchPage",
                "qy": self.search_query,
                "tempo": {},
                "ten": "CA_GLASS",
                "vr": "v1",
            },

A large portion of these parameters are unknown to me, such as dGv, fE, and iCLS. If you happen to know what they stand for, feel free to leave a comment.

With all these APIs reverse engineered, I can finally query four grocery stores at once.

What I discovered is that each grocery store returns different product data. Walmart and SaveOn return a unit price ($1.37/100g) but Safeway and President's Choice do not. The best way to deal with this is to normalize all the data into one consistent shape for our search application.

I wrote a parser that does just that: feed in the raw data, get back the "cleaned" data.
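Here's a minimal sketch of what that normalization looks like. The keys on the raw side are illustrative (each store's real response fields differ), but the idea is one common record shape regardless of the source:

def normalize_product(store, raw):
    """Map one store-specific product dict onto a common shape."""
    if store in ("walmart", "saveon"):
        # These two already give us a unit price (e.g. $1.37/100g)
        unit_price = raw.get("unitPrice")
    else:
        # Safeway and President's Choice don't, so leave it empty
        unit_price = None

    return {
        "store": store,
        "name": raw.get("name"),
        "price": raw.get("price"),
        "unit_price": unit_price,
    }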

Parallel threads

The next problem I encountered was speed. Sending four HTTP requests one after another is not fast. So, how can we send them in parallel? ThreadPoolExecutor to the rescue! We can execute multiple tasks asynchronously and fetch the results using futures (also called promises). A future is a way of getting a result from a task that may or may not have finished executing.

Let's set the scene. We have four stores to search, each with its own function. Some store APIs will be faster than others. What we can do is stick all those functions in a list and call them as needed. If we're not querying Safeway, for example, we simply don't include that function!

Here's an example:

    # Set up a list of functions to send requests to
    functions = [
        products_data.query_saveon,
        products_data.query_pc,
    ]

We have our two stores we are querying. Let's now send requests in parallel:

    # Use a ThreadPoolExecutor to send the requests in parallel
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Start the load operations and mark each future with its function
        future_to_function = {executor.submit(func): func for func in functions}
        results = {}
        for future in concurrent.futures.as_completed(future_to_function):
            func = future_to_function[future]
            try:
                result = future.result()
            except Exception as exc:
                print(f"Function {func.__name__} generated an exception: {exc}")
                results[func.__name__] = exc
            else:
                print(f"Function {func.__name__} returned result: {result}")
                results[func.__name__] = result

A future object represents the result of an asynchronous operation that has not yet completed. In this case, each future corresponds to a function being executed in a separate thread. By submitting all the functions to the executor, we let it handle creating and managing the threads, and ensure that each function runs asynchronously.

Once all the functions have been submitted to the executor, we use concurrent.futures.as_completed() to iterate over the futures as they complete. It returns an iterator that yields completed future objects, allowing us to retrieve the result of each function as it becomes available.

Next

There is still much to be done (this is always true) but I'm happy with where the project stands. Users can query for items, change stores and sort by price, name or unit price.
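Sorting the normalized results is then a one-liner. For example, cheapest unit price first, assuming the unit_price field from the parser sketch above has already been parsed down to a plain number (all_products here stands for the combined list from every store):

# Cheapest unit price first; items without a unit price sort to the end
by_unit_price = sorted(
    all_products,
    key=lambda p: (p["unit_price"] is None, p["unit_price"] or 0),
)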

Python init system for SnackLinux

I've been thinking about this for a while. It's not really practical, just for fun: essentially rewriting the tools needed for a minimal Linux distro, with just the kernel underneath. I found a PyCon presentation (video on YouTube) about this very subject; unfortunately there's no mention of it past 2006, but oh well. Another use for such a thing would be something similar to Docker, but with the build process of SnackLinux. Being able to launch a customizable Python image with a custom kernel is complete overkill, but that's what makes programming fun, I think.

Generating images for your podcast

I listen to the Legends of the Old West podcast, a western-themed episodic podcast centered around outlaws. The narrator is great and the character actions are descriptive, but I'm left wanting more.

With the surge of Stable Diffusion projects, I was inspired to make something themed around AI-generated art.

What I ended up with was transcribing the podcast audio into text, then generating images based off of that. Take a look below for an example.


The bulk of the work is done by Vosk, an offline open source speech recognition toolkit. We convert the input MP3 to wav, send it through Vosk and receive a generated JSON output file.
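For the MP3-to-wav step I just shell out to ffmpeg (which needs to be installed separately). The input filename below is a placeholder, and downmixing to 16 kHz mono PCM is simply a safe format for Vosk:

import subprocess

# Convert the podcast MP3 to 16 kHz mono PCM wav for Vosk
subprocess.run(
    ["ffmpeg", "-i", "episode.mp3", "-ac", "1", "-ar", "16000", "tk150-split.wav"],
    check=True,
)

With a wav file in hand, the transcription itself: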

from vosk import Model, KaldiRecognizer
import wave
import json


wav_file = "tk150-split.wav"
model_path = "vosk-model-en-us-0.22"
model = Model(model_path)
wf = wave.open(wav_file, "rb")
rec = KaldiRecognizer(model, wf.getframerate())
rec.SetWords(True)


text_lst = []


while True:

    data = wf.readframes(4000)
    if len(data) == 0:
        break

    if rec.AcceptWaveform(data):
        word = json.loads(rec.Result())["text"]
        if len(word) > 0:
            text_lst.append(word)
            print("....new sentence...")

    else:
        print("....processing.......")


if text_lst:
    with open("output-audio-file.txt", "w") as filehandle:
        json.dump(text_lst, filehandle)

With our transcribed text, we can run it through whatever AI image-generation service we desire. For the sake of speed, I used Lexica.art.

import requests
import json
import time

# Get Lexica AI-generated art by prompt
def image(prompt):
    """Return the first Lexica image result for a prompt."""

    results = requests.get("https://lexica.art/api/v1/search", params={"q": prompt})
    if results.status_code != 200:
        print(f"Requested URL: {results.url}")
        print(f"Content: {results.content}")
        results.raise_for_status()
    results = results.json()
    if results and results["images"]:
        response = {
            "src": results["images"][0]["src"],
            "alt": results["images"][0]["prompt"],
        }
        return response
    return {}


# Open output file for reading
with open("output-audio-file.txt", "r") as filehandle:
    output_audio_text = json.load(filehandle)

# Generate images per each value from our transcribed audio
images_list = []
for line in output_audio_text:
    a = image(line)
    images_list.append(a)
    time.sleep(1)
# Write image lists to file
if images_list:
    with open("images-generated.json", "w") as f:
        json.dump(images_list, f, ensure_ascii=False)

For the demo, I used Flask to create a small web app and serve the images and text. 
Code on Github: podcast-image
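The Flask piece is only a few lines. Here's a rough sketch of the idea (the route and template name are made up for illustration; the real code is in the repo above):

import json

from flask import Flask, render_template

app = Flask(__name__)

# Pair each transcribed sentence with its generated image
with open("output-audio-file.txt") as f:
    sentences = json.load(f)
with open("images-generated.json") as f:
    images = json.load(f)


@app.route("/")
def index():
    # slides.html would loop over the (sentence, image) pairs and render them
    return render_template("slides.html", slides=list(zip(sentences, images)))


if __name__ == "__main__":
    app.run(debug=True)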


Tuya Smart Home Temperature Monitor + Python (tinytuya)

Why go outside and check the temperature when you can automate it with Python? I purchased this inexpensive wifi-enabled temperature monitor from AliExpress for approximately $20 CAD. The goal was to log the temperature and humidity throughout the seasons and make a graph at the end of the year.

During the setup of the device, you have to use their cloud smart home app. You can only guess what they do with your data (hey, maybe they’ll see how cold it gets here in winter and feel bad for me). You’ll have to use the app to connect the developer account to your device.

tinytuya is a Python module for interacting with Tuya smart home devices.

Install

python -m pip install tinytuya

Scan for devices

python -m tinytuya scan

Developer Account

First things first: create a developer account. Next you'll have to create a cloud project, link your device and get your secrets. Here are the steps, copied from https://pypi.org/project/tinytuya/:

  1. TUYA ACCOUNT - Set up a Tuya Account (see PDF Instructions):
    • Create a Tuya Developer account on iot.tuya.com and log in. NOTE: Tuya makes changes to their portal and this process frequently so details may vary. Please create an issue or pull request with screenshots if we need to update these instructions.
    • Click on "Cloud" icon -> "Create Cloud Project"
      1. Remember the "Data Center" you select. This will be used by TinyTuya Wizard (screenshot).
      2. Skip the configuration wizard but remember the Authorization Key: API ID and Secret for below (screenshot).
    • Click on "Cloud" icon -> Select your project -> Devices -> Link Tuya App Account (see screenshot)
    • Click Add App Account (screenshot) and it will display a QR code. Scan the QR code with the Smart Life app on your Phone (see step 1 above) by going to the "Me" tab in the Smart Life app and clicking on the QR code button [..] in the upper right hand corner of the app. When you scan the QR code, it will link all of the devices registered in your Smart Life app into your Tuya IoT project.
    • NO DEVICES? If no devices show up after scanning the QR code, you will need to select a different data center and edit your project (or create a new one) until you see your paired devices from the Smart Life App show up. (screenshot). The data center may not be the most logical. As an example, some in the UK have reported needing to select "Central Europe" instead of "Western Europe".
    • SERVICE API: Under "Service API" ensure these APIs are listed: IoT Core, Authorization and Smart Home Scene Linkage (NOTE: I didn't need to add this). To be sure, click subscribe again on every service. Very important: disable popup blockers, otherwise subscribing won't work without providing any indication of a failure. Make sure you authorize your Project to use those APIs:
      • Click "Service API" tab
      • Click "Go to Authorize" button
      • Select the API Groups from the dropdown and click Subscribe (screenshot)
  2. WIZARD - Run Setup Wizard:
    • Tuya has changed their data center regions. Make sure you are using the latest version of TinyTuya (v1.2.10 or newer).
    • From your Linux/Mac/Win PC run the TinyTuya Setup Wizard to fetch the Local_Keys for all of your registered devices:
      python -m tinytuya wizard   # use -nocolor for non-ANSI-color terminals e.g. Windows cmd
      
    • The Wizard will prompt you for the API ID key, API Secret, API Region (cn, us, us-e, eu, eu-w, or in) from your Tuya IoT project as set in Step 3 above.
      • To find those again, go to iot.tuya.com, choose your project and click Overview
        • API Key: Access ID/Client ID
        • API Secret: Access Secret/Client Secret
    • It will also ask for a sample Device ID. Use one from step 2 above or found in the Device List on your Tuya IoT project.
    • The Wizard will poll the Tuya IoT Cloud Platform and print a JSON list of all your registered devices with the "name", "id" and "key" of your registered device(s). The "key"s in this list are the Devices' Local_Key you will use to access your device.
    • In addition to displaying the list of devices, Wizard will create a local file devices.json that TinyTuya will use to provide additional details for scan results from tinytuya.deviceScan() or when running python -m tinytuya scan. The wizard also creates a local file tuya-raw.json that contains the entire payload from Tuya Cloud.
    • The Wizard will ask if you want to poll all the devices. If you do, it will display the status of all devices on record and create a snapshot.json file with these results.

Once we have our local key, it's time to get our data points.

Get DPS

Let’s use tinytuya to query our temperature monitor.

DPS explained:

Tuya devices send back data points (DPS), also called device function points, in a JSON string. The DPS attributes define the state of the device. Each entry in the DPS dictionary is a key-value pair: the key is the DP ID and its value is the dpValue. You can refer to the Tuya developer platform for the definitions of the function points for your product.

import tinytuya


d = tinytuya.OutletDevice(
    dev_id='a21a4b4ad3',
    address='192.168.1.137',
    local_key='deadb33f',
    version=3.4)


print(d.detect_available_dps())

Result:

{'1': 40, '2': 262, '4': 100, '9': 'c', '10': 1500, '11': -190, '12': 832, '13': 0, '14': 'cancel', '15': 'cancel', '16': 0, '17': 30, '18': 31, '21': True, '23': 0, '24': 0}

What you'll get back is a nice JSON string of all the available data points. I wasn't able to find any information on this temperature monitor, so here's what I mapped out:

Github

Code: https://github.com/snacsnoc/py-tuya-temp

DPS lookup table for Tuya Temperature Monitor

Note: for all temperature and humidity values, pad a decimal from the right: 452 = 45.2.

| DPS | Value | Description | Unit | Info |
|-----|-------|-------------|------|------|
| 1 | 60 | Temperature | integer | |
| 2 | 452 | Humidity (%) | integer | |
| 4 | 100 | Battery | integer | 3.7V/500mAh, Built-in battery usage time: ≤4 hours (fully charged) |
| 9 | c | Temperature unit | c/f | |
| 10 | 1500 | Alarm Temperature upper limit | integer | Temperature alarm setting |
| 11 | -190 | Alarm Temperature lower limit | integer | Temperature alarm setting |
| 12 | 1000 | Humidity upper limit | integer | Humidity alarm setting |
| 13 | 244 | Humidity lower limit (24.4%) | integer | Humidity alarm setting |
| 14 | cancel | Temperature Alarm | on/cancel | |
| 15 | cancel | Humidity Alarm | on/cancel | |
| 16 | 0 | Light intensity | integer | Lux |
| 17 | 30 | Temperature report | integer | Alarm duration in seconds |
| 18 | 30 | Humidity report | integer | Alarm duration in seconds |
| 21 | true | Battery Switch | true/false | Enable battery backup |
| 23 | 0 | Temperature calibration | C +/- | Manually set positive or negative offset |
| 24 | 0 | Humidity Calibration | C +/- | Manually set positive or negative offset |
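With that table, logging the readings becomes a small script you can run from cron. A rough sketch (the device ID, IP and local key are placeholders from the wizard step, and note the decimal padding on the temperature and humidity values):

import csv
from datetime import datetime

import tinytuya

# Placeholders: use the id, address and local_key the wizard gave you
d = tinytuya.OutletDevice(
    dev_id='a21a4b4ad3',
    address='192.168.1.137',
    local_key='deadb33f',
    version=3.4)

dps = d.detect_available_dps()

# DPS 1 and 2 need a decimal padded from the right: 262 -> 26.2
temperature = dps["1"] / 10
humidity = dps["2"] / 10

# Append one row per reading; graph the CSV at the end of the year
with open("temperature-log.csv", "a", newline="") as f:
    csv.writer(f).writerow([datetime.now().isoformat(), temperature, humidity])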

Merge big CSV files with Pandas and Python

I deal with large CSVs at work, mostly database dumps. Google Sheets and Numbers/Excel just can't keep up with formula changes for 300k+ lines. What do we do? We use Python. The use case here is replicating the VLOOKUP function with a left join: we want to pull the matching rows from our reference CSV file, and only the matching ones (otherwise it would be a full outer join). We have a list of locations and want to look up the lat/long from our reference file, joining on zip code. Note the matching "zip" column names in both files. Here's the code:
#!/bin/python3

# import pandas
import pandas as pd

# Read our list of locations; "zip" is read as a float
df1 = pd.read_csv('locations.csv', dtype={"zip": float})

# Our zipcodes (with lat/long)
df2 = pd.read_csv('US_zips.csv')

# VLOOKUP-style left join on the shared "zip" column
Left_join = pd.merge(df1,
                     df2,
                     on='zip',
                     how='left')

Left_join.to_csv('combined-retail.csv')
Let's take a look:
df1 = pd.read_csv('locations.csv',dtype={"zip": float})
Here we're reading from our list of locations file; the "zip" column is read as a float.
Left_join = pd.merge(df1, 
                     df2, 
                     on ='zip', 
                     how ='left')
We have several options for how to join: {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, with ‘inner’ being the default. If the columns we wanted to join on had different names, we could use:
Left_join = pd.merge(df1, 
                     df2, 
                     left_on ='zip', 
                     right_on = 'zip_code',
                     how ='left')
When we run the code, we get our combined CSV with the latitude and longitude columns joined on. In this use case, I was mapping locations for use with Kepler.gl, which requires latitude and longitude. A few resources regarding zip codes haven't been updated in years, but I was able to find this: http://download.geonames.org/export/zip/. You'll find all the zip codes with additional geographical names, great for looking up data.

I didn't know much about zip codes (since we have postal codes up here), but I found this tidbit interesting:

"ZIP codes are a very messy kind of geography. They were created by the U.S. Postal Service as a tool to help deliver the mail more efficiently. ("ZIP" is actually an acronym for "Zone Improvement Plan", where "Zone" is a reference to the 2-digit postal zones that were used by the post office prior to implementing nationwide ZIP codes back in the early 1960's. Because it is an acronym we always use the uppercase for it.) ZIP codes have been adopted by marketing people and by all kinds of other researchers as a standard geographic area, like a city or a county. We see maps of ZIP codes in telephone books and from commercial vendors that make us think of them as spatially defined areas with precise boundaries, similar to counties. But, from the perspective of the agency that defines them, the U.S. Postal Service, ZIP codes are not and never have been such spatial entities. They are simply categories for grouping mailing addresses. As such, ZIP codes do in most cases resemble spatial areas since they are comprised of spatially clustered street ranges. But not always. In rural areas, ZIP codes can be collections of lines (rural delivery routes) that in reality do not look much like a closed spatial area. In areas where there is no mail delivery (deserts, mountains, lakes, much of Nevada and Utah) ZIP codes are not really defined. You may see maps that show ZIP code boundaries that include such areas, but these are not post-office-defined official definitions. An area will not be assigned a ZIP code until there is a reason for it, i.e. until there needs to be mail delivered there. So the actual definition of a ZIP code "boundary" is quite fuzzy at best, and a purely extrapolated guess (at what it would be if someone were to start receiving mail there) at worst. If you have an application that requires extreme geographic precision, especially in sparsely populated areas, then you need to avoid using ZIP codes."