tl;dr check out https://grocerygoose.ca
Code: https://github.com/snacsnoc/grocery-app
The price of groceries in Canada has absolutely skyrocketed over the past six months. Every time I go to the grocery store, I am no longer shocked at the price increase. In fact, it's almost a bit of a game finding something that hasn't gone up.
A great example is Loblaw's No Name potato chips. On sale they were 97 cents, regularly priced at 99 cents. Great value for something so simple. I went to an Independent Grocer two weeks ago, expecting to be shocked but what I received was a heart attack instead. The price increased by 150%! The sale price now is $5.00 for 2 bags.
Absolutely frustrated with running across town to five different grocery stores to get a deal, I decided to get some hacking done. What I wanted was a central search engine where I can compare unit prices. I also need the ability to change which grocery-chain store I'm querying.
(for info about the API endpoints themselves, see https://github.com/snacsnoc/grocery-app/blob/main/HACKING.md)
President's Choice
This was the easiest API to reverse engineer of all of the grocery stores I attempted. Intercepting the HTTP traffic from the PC Mobile app using a proxy (like Burp Suite) makes recreating API requests trivial.
No authentication necessary, but you will have to grab the X-Apikey from the request.
Example request:
curl -X POST \
https://api.pcexpress.ca/product-facade/v3/products/search \
-H 'Host: api.pcexpress.ca' \
-H 'Accept: application/json, text/plain, */*' \
-H 'Site-Banner: superstore' \
-H 'X-Apikey: 1im1hL52q9xvta16GlSdYDsTsG0dmyhF' \
-H 'Content-Type: application/json' \
-H 'Origin: https://www.realcanadiansuperstore.ca' \
-d '{
"pagination": {"from": 0, "size": 48},
"banner": "superstore",
"cartId": "228fb500-b46f-43d2-a6c4-7b498d5be8a9",
"lang": "en",
"date": "05122022",
"storeId": "your_store_number_here",
"pcId": false,
"pickupType": "STORE",
"offerType": "ALL",
"term": "your_search_query_here",
"userData": {
"domainUserId": "b3a34376-3ccf-4932-8816-7017bd33f2fc",
"sessionId": "5580cec2-5622-4b34-8491-d94f9dd48480"
}
}'
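The same request can be sketched in Python. This is only an illustration built from the curl command above: the function name build_pc_search_request is made up, the date is assumed to be formatted as DDMMYYYY, and it is an untested assumption that the API accepts freshly generated UUIDs for cartId, domainUserId and sessionId. The request is built but not sent.

```python
import json
import urllib.request
import uuid
from datetime import date

PC_SEARCH_URL = "https://api.pcexpress.ca/product-facade/v3/products/search"


def build_pc_search_request(term, store_id, api_key, banner="superstore"):
    """Build (but do not send) a product search request for one store."""
    payload = {
        "pagination": {"from": 0, "size": 48},
        "banner": banner,
        # Assumption: any UUID works here; the captured request used fixed ones.
        "cartId": str(uuid.uuid4()),
        "lang": "en",
        # Assumption: the API expects DDMMYYYY, as "05122022" suggests.
        "date": date.today().strftime("%d%m%Y"),
        "storeId": store_id,
        "pcId": False,
        "pickupType": "STORE",
        "offerType": "ALL",
        "term": term,
        "userData": {
            "domainUserId": str(uuid.uuid4()),
            "sessionId": str(uuid.uuid4()),
        },
    }
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Site-Banner": banner,
        "X-Apikey": api_key,  # grabbed from the intercepted traffic
        "Content-Type": "application/json",
        "Origin": "https://www.realcanadiansuperstore.ca",
    }
    return urllib.request.Request(
        PC_SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholder store number and API key, for illustration only.
request = build_pc_search_request("potato chips", "1234", "YOUR_API_KEY")
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before hitting the real API.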
But how can we search specific stores? Thankfully the iOS PC Express mobile app has a flyer search feature. Even better, it's an easy REST API with an included private API key. This means users of our search won't have to log into the mobile app to authenticate.
With this flyer search, we can search for Loblaw store IDs by postal code. Just what we need to put together a universal grocery store search ;)
SaveOn Foods
Nothing much to say here. I used the same method for capturing web traffic and recreated the request in Python. Again, no authentication from the front end is needed to query these APIs.
Example store query:
curl 'https://storefrontgateway.saveonfoods.com/api/stores/{store_number}/preview?popularTake=30&q={search_query}' \
-H 'X-Correlation-Id: b0bb5f7c-5c00-4cac-ae8a-f34712d0daad' \
-H 'X-Shopping-Mode: 11111111-1111-1111-1111-111111111111' \
-H 'X-Site-Host: https://www.saveonfoods.com' \
-H 'Sec-Ch-Ua: 1' \
-H 'Client-Route-Id: 26186555-b0d7-4251-91e1-fca38fd364aa' \
-H 'Sec-Ch-Ua-Mobile: 1' \
-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36' \
-H 'Sec-Ch-Ua-Platform: 1' \
-H 'Sec-Fetch-Site: same-site' \
-H 'Sec-Fetch-Mode: cors' \
-H 'Sec-Fetch-Dest: empty' \
-H 'Origin: https://www.saveonfoods.com' \
-H 'Accept: application/json; charset=utf-8'
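Since the store number and the search term both end up in the URL, it helps to build it with proper escaping. A minimal sketch, assuming a hypothetical helper name saveon_preview_url and a made-up store number:

```python
from urllib.parse import quote, urlencode

SAVEON_BASE = "https://storefrontgateway.saveonfoods.com/api/stores"


def saveon_preview_url(store_number, search_query, popular_take=30):
    """Build the preview/search URL for one SaveOn store."""
    # urlencode escapes spaces and special characters in the search term.
    params = urlencode({"popularTake": popular_take, "q": search_query})
    return f"{SAVEON_BASE}/{quote(str(store_number), safe='')}/preview?{params}"


# 1982 is a made-up store number used only for illustration.
url = saveon_preview_url(1982, "potato chips")
print(url)
```

The headers from the curl command above would still need to be sent along with the request.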
Safeway
Safeway doesn't offer online grocery delivery in all cities, just Montreal, Toronto and Vancouver. They launched a relatively new service, Voila.ca, for grocery delivery. Unfortunately, I cannot say if these prices returned by a query are applicable to other stores outside those cities.
Example request:
curl 'https://voila.ca/api/v5/products/search?limit=5&offset=0&sort=favorite&term=<SEARCH_QUERY>' \
-H 'Sec-Ch-Ua: 1' \
-H 'Client-Route-Id: 26186555-b0d7-4251-91e1-fca38fd364aa' \
-H 'Sec-Ch-Ua-Mobile: 1' \
-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36' \
-H 'Sec-Ch-Ua-Platform: 1' \
-H 'Sec-Fetch-Site: same-origin' \
-H 'Sec-Fetch-Mode: cors' \
-H 'Sec-Fetch-Dest: empty' \
-H 'Origin: https://voila.ca' \
-H 'Accept: application/json; charset=utf-8' \
-H 'Cookie: VISITORID=NzEyMmMzZTEtOTYzNy00MmIwLWI2NTAtNjY0NjBlZWVhOTVjOjE2NzAyMzA1NDM2NjE=; global_sid=LvSLAl2jV2YrN3AeAIbMt_Tl8DWedrNo3lJ59CxyIMI0NeYPYfzDxY2UP7FJhEdl5xSWPxf6uvxynINrmMq5p1agATEZlVMM'
Walmart
Oh boy, this took a bit of work to recreate. All the other APIs were a simple POST or GET request, but not Walmart. I can only guess why they chose to use GraphQL over other common web service architectures. Walmart routinely has the lowest prices, so this was something I wanted to attack.
Here's what our request body looks like:
walmart_data_body = {
"operationName": "getPreso",
"query": 'query getPreso($qy: String, $cId: String, $miPr: String, $mxPr: String, $srt: Sort, $ft: String, $intS: IntentSource, $pg: Int, $ten: String!, $pT: String!, $gn: Boolean, $pos: Int, $sT: String, $sQ: String, $rS: String, $sp: Boolean, $aO: AffinityOverride, $dGv: Boolean, $pap: String, $ptss: String, $bSId: String, $ps: Int, $fSP: JSON, $fFp: JSON, $dId: String, $iCLS: Boolean! = true, $aQP: JSON, $vr: String, $fE: Boolean! = false, $iT: Boolean! = false, $tempo: JSON, $p13n: JSON) {\n search(\n query: $qy\n prg: ios\n cat_id: $cId\n min_price: $miPr\n max_price: $mxPr\n sort: $srt\n facet: $ft\n intentSource: $intS\n page: $pg\n tenant: $ten\n channel: "Mobile"\n pageType: $pT\n guided_nav: $gn\n pos: $pos\n s_type: $sT\n src_query: $sQ\n recall_set: $rS\n spelling: $sp\n affinityOverride: $aO\n displayGuidedNav: $dGv\n pap: $pap\n ptss: $ptss\n ps: $ps\n _be_shelf_id: $bSId\n dealsId: $dId\n additionalQueryParams: $aQP\n ) {\n __typename\n query\n searchResult {\n __typename\n ...SearchResultFragment\n }\n }\n contentLayout(\n channel: "Mobile"\n pageType: $pT\n tenant: $ten\n version: $vr\n searchArgs: {query: $qy, cat_id: $cId, facet: $ft, _be_shelf_id: $bSId, prg: ios}\n ) @include(if: $iCLS) {\n __typename\n modules(p13n: $p13n, tempo: $tempo) {\n __typename\n schedule {\n __typename\n priority\n }\n name\n version\n type\n moduleId\n matchedTrigger {\n __typename\n pageId\n zone\n inheritable\n }\n triggers @include(if: $iT) {\n __typename\n zone\n pageId\n inheritable\n }\n configs {\n __typename\n ... [TRIMMED]...
"variables": {
"aQP": {"isMoreOptionsTileEnabled": "true"},
"dGv": True,
"fE": False,
"fFp": {"powerSportEnabled": "true"},
"fSP": {
"additionalQueryParams": {"isMoreOptionsTileEnabled": "true"},
"channel": "Mobile",
"displayGuidedNav": "true",
"page": "1",
"pageType": "MobileSearchPage",
"prg": "ios",
"query": self.search_query,
"tenant": "CA_GLASS",
},
"iCLS": True,
"iT": True,
"p13n": {
"page": "1",
"reqId": "6E9F7A17-ACE0-4D5F-AEC0-62522C13DB35",
"userClientInfo": {"callType": "CLIENT", "deviceType": "IOS"},
"userReqInfo": {
"refererContext": {"query": self.search_query},
"vid": "8B95354D-6FE8-4F18-904F-4ED9AE73EE24",
},
},
"pg": 1,
"pT": "MobileSearchPage",
"qy": self.search_query,
"tempo": {},
"ten": "CA_GLASS",
"vr": "v1",
},
}
A large portion of these parameters are unknown to me, such as dGv, fE, and iCLS. If you happen to know what these stand for, feel free to leave a comment.
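The search term shows up in three places in the variables, so it is convenient to wrap the body in a function. This is a rough sketch rather than the code from the repository: build_walmart_body is a hypothetical name, graphql_query stands for the full query string captured from the app (not reproduced here), and the reqId and vid values are simply copied from the captured request.

```python
import json


def build_walmart_body(search_query, graphql_query):
    """Return the GraphQL request body with the search term filled in."""
    variables = {
        "aQP": {"isMoreOptionsTileEnabled": "true"},
        "dGv": True,
        "fE": False,
        "fFp": {"powerSportEnabled": "true"},
        "fSP": {
            "additionalQueryParams": {"isMoreOptionsTileEnabled": "true"},
            "channel": "Mobile",
            "displayGuidedNav": "true",
            "page": "1",
            "pageType": "MobileSearchPage",
            "prg": "ios",
            "query": search_query,  # first copy of the search term
            "tenant": "CA_GLASS",
        },
        "iCLS": True,
        "iT": True,
        "p13n": {
            "page": "1",
            "reqId": "6E9F7A17-ACE0-4D5F-AEC0-62522C13DB35",
            "userClientInfo": {"callType": "CLIENT", "deviceType": "IOS"},
            "userReqInfo": {
                "refererContext": {"query": search_query},  # second copy
                "vid": "8B95354D-6FE8-4F18-904F-4ED9AE73EE24",
            },
        },
        "pg": 1,
        "pT": "MobileSearchPage",
        "qy": search_query,  # third copy, bound to $qy in the query
        "tempo": {},
        "ten": "CA_GLASS",
        "vr": "v1",
    }
    return {
        "operationName": "getPreso",
        "query": graphql_query,
        "variables": variables,
    }


# A stand-in query string; the real one is the long getPreso query.
body = build_walmart_body("potato chips", "query getPreso { __typename }")
payload = json.dumps(body)  # what would be sent as the POST body
```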
With all these APIs reverse engineered, I can finally query four grocery stores at once.
What I discovered is that each grocery store returns different product data. Walmart and SaveOn return a unit price (for example $1.37/100g) but Safeway and President's Choice do not. The best way to deal with this is to normalize all the data so every store's results have the same shape in our search application.
I wrote a parser that does just that: feed in our raw data and return the "cleaned" data.
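To give an idea of what such normalization can look like, here is a minimal sketch. It is not the parser from the repository: the field names (name, price, unit_price, size) and the function names are assumptions made for illustration, and real store responses are shaped differently. When a store supplies a unit price it is parsed, otherwise one is derived from the price and package size.

```python
import re

# Conversion factors to a base unit (grams or millilitres).
_UNIT_FACTORS = {"g": 1, "kg": 1000, "ml": 1, "l": 1000}


def parse_unit_price(text):
    """Parse a string like '$1.37/100g' into dollars per 100 g (or 100 ml)."""
    match = re.search(
        r"\$?\s*(\d+(?:\.\d+)?)\s*/\s*100\s*(?:g|ml)", text, re.IGNORECASE
    )
    return float(match.group(1)) if match else None


def compute_unit_price(price, size_text):
    """Derive dollars per 100 g (or 100 ml) from a price and a size like '200 g'."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(kg|g|ml|l)\b", size_text, re.IGNORECASE)
    if not match:
        return None
    amount = float(match.group(1)) * _UNIT_FACTORS[match.group(2).lower()]
    if amount == 0:
        return None
    return round(price / amount * 100, 2)


def normalize(store, raw):
    """Map one raw product (hypothetical fields) to a common shape."""
    unit_price = None
    if raw.get("unit_price"):
        unit_price = parse_unit_price(raw["unit_price"])
    if unit_price is None and raw.get("size"):
        unit_price = compute_unit_price(raw["price"], raw["size"])
    return {
        "store": store,
        "name": raw["name"],
        "price": float(raw["price"]),
        "unit_price_per_100": unit_price,
    }


# One store that supplies a unit price and one that does not.
with_unit = normalize("walmart", {"name": "Chips", "price": 2.74, "unit_price": "$1.37/100g"})
without_unit = normalize("pc", {"name": "Chips", "price": 2.74, "size": "200 g"})
```

With every product in the same shape, sorting by unit price across stores becomes a plain sort on one key.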
Parallel threads
The next problem I encountered was speed. Sending four HTTP requests one after another is not fast. So, how can we send them in parallel? ThreadPoolExecutor to the rescue! We can execute multiple tasks asynchronously and fetch each result using a future (also called a promise). A future is a handle for the result of a task that may or may not have finished yet.
Let's set the scene. We have four stores to search, each having their own function. Some store APIs will be faster than others. What we can do is stick all those functions in a list, and call them as needed. If we're not querying Safeway for example, don't include that function!
Here's an example:
# Set up a list of functions to send requests to
functions = [
    products_data.query_saveon,
    products_data.query_pc,
]
We have our two stores we are querying. Let's now send requests in parallel:
import concurrent.futures

# Use a ThreadPoolExecutor to send the requests in parallel
with concurrent.futures.ThreadPoolExecutor() as executor:
    # Start the load operations and mark each future with its function
    future_to_function = {executor.submit(func): func for func in functions}
    results = {}
    # Handle each future as soon as it finishes, fastest store first
    for future in concurrent.futures.as_completed(future_to_function):
        func = future_to_function[future]
        try:
            result = future.result()
        except Exception as exc:
            # Keep the exception so one failing store doesn't sink the search
            print(f"Function {func.__name__} generated an exception: {exc}")
            results[func.__name__] = exc
        else:
            print(f"Function {func.__name__} returned result: {result}")
            results[func.__name__] = result
A future object represents the result of an asynchronous operation that may not have completed yet. In this case, each future corresponds to a function running in a worker thread from the pool. By submitting all the functions to the executor, we let it create and manage the threads and run the functions concurrently.
Once all the functions have been submitted to the executor, we use the concurrent.futures.as_completed() function to iterate over the futures as they complete. It returns an iterator that yields each future as soon as it finishes, allowing us to retrieve the results of each function as they become available.
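To try the pattern without touching any real API, here is a self-contained sketch. The three query functions are stand-ins that sleep or fail on purpose, and STORE_FUNCTIONS and search_stores are hypothetical names, not the ones used in the project.

```python
import concurrent.futures
import time


def query_saveon():
    time.sleep(0.2)  # simulate a slow API
    return ["saveon result"]


def query_pc():
    time.sleep(0.1)  # simulate a faster API
    return ["pc result"]


def query_walmart():
    raise RuntimeError("simulated API failure")


# Map store names to their query functions so stores can be toggled.
STORE_FUNCTIONS = {
    "saveon": query_saveon,
    "pc": query_pc,
    "walmart": query_walmart,
}


def search_stores(selected):
    """Run the query function of every selected store in parallel."""
    functions = [STORE_FUNCTIONS[name] for name in selected]
    results = {}
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_to_function = {executor.submit(f): f for f in functions}
        for future in concurrent.futures.as_completed(future_to_function):
            func = future_to_function[future]
            try:
                results[func.__name__] = future.result()
            except Exception as exc:
                # Store the exception instead of aborting the whole search.
                results[func.__name__] = exc
    return results


start = time.perf_counter()
results = search_stores(["saveon", "pc", "walmart"])
elapsed = time.perf_counter() - start
print(f"{len(results)} stores answered in {elapsed:.2f}s")
```

The elapsed time should be close to the slowest single call rather than the sum of all of them, which is the whole point of sending the requests in parallel.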
Next
There is still much to be done (this is always true) but I'm happy with where the project stands. Users can query for items, change stores and sort by prices/name/unit price.