Step 5: Downloading Data
APIs help you collect and analyze data about your products or related categories across various distribution channels. In this section we'll discuss how that data gets delivered and how you can download it.
Delivery Process
Walmart provides the requested data feeds at the Target Refresh Time agreed between Walmart and the supplier, via four category endpoints: Snapshot, History, Status, and Incremental.
Walmart uses Cloud Storage buckets to deliver data feeds over encrypted Signed URLs. Signed URLs let users authenticate access to secured information using query-string parameters rather than additional credentials.
The following data delivery pipeline should be developed for sharing and receiving data:
- After access is verified, and after new releases, the history feeds become available. Consumers/suppliers should consume the history feeds using the History endpoints.
  Note: Consuming the history feeds is a mandatory step before consuming any incremental feeds; it maintains the accuracy of the historical data.
- Once the history data is consumed, the incremental and snapshot feeds become available starting the next day.
- Consumers/suppliers should verify the status of every feed using the Status endpoint before calling the corresponding API. Based on the response status, consumers/suppliers can decide which feed(s) to process. If a feed is available, the status returns "available"; otherwise it is blank.
- After verifying that a feed is available, consumers can invoke (call) the APIs to access and download the data.
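The status check in the pipeline above can be sketched in shell. The endpoint URL, authorization header, and `feedStatus` field name below are assumptions for illustration only; substitute the actual values from your onboarding materials:

```shell
#!/bin/sh
# Sketch of the pre-download status check. The endpoint path and the
# "feedStatus" field name are hypothetical placeholders, not the real
# API contract.
STATUS_URL="https://example.walmart.com/status"   # assumed endpoint

# A live call would look something like:
#   response=$(curl -s -H "Authorization: Bearer $TOKEN" "$STATUS_URL")
# For illustration, use a canned response instead:
response='{"feedStatus":"available"}'

# Proceed to download only when the feed is reported as available;
# an unavailable feed comes back blank.
if echo "$response" | grep -q '"available"'; then
  echo "feed available - safe to download"
else
  echo "feed not ready - skip this run"
fi
```

In a production pipeline, this check would run on a schedule aligned with the agreed Target Refresh Time, gating each day's incremental download.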
Downloading Data
To learn how to download the data, continue reading; the download commands are included below.
Large split files
Please be aware that Walmart may split very large files into multiple downloads. This is usually the case with history runs, which can be up to 100 GB in size.
When files are split into multiple smaller files, you will receive multiple Signed URLs from which to download.
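Handling a split delivery can be sketched as a loop over the returned Signed URLs. The URL values and output filenames below are made up for illustration; note the quoting around each URL so the `&` characters in the signed query string are not interpreted by the shell:

```shell
#!/bin/bash
# Hypothetical signed URLs for a history feed split into two parts.
urls=(
  "https://storage.googleapis.com/examplebucket/part-0000?X-Goog-Signature=abc"
  "https://storage.googleapis.com/examplebucket/part-0001?X-Goog-Signature=def"
)

i=0
for url in "${urls[@]}"; do
  # -O names the output file; quoting "$url" keeps the shell from
  # treating '&' in the query string as a background operator.
  # Remove the leading 'echo' to perform the real downloads.
  echo wget -O "history_part_${i}.orc" "$url"
  i=$((i + 1))
done
```

Each part is a complete file in your requested format, so the parts can be downloaded in parallel and processed independently.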
Once you receive the API response, you can generate an ORC or Parquet file by:
- Copying the full download URL from the data feed API response (see example below).
Response Download URL Example:
{
  ...
  "downloadUrls": [
    "https://storage.googleapis.com/examplebucket/cat.jpeg?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=example%e"
  ]
}
- Opening a terminal and running one of the following commands, selecting the file format you requested when initially onboarding. Use `wget -O` (capital O) to write the download to the named file, and quote the URL so the `&` characters in its query string are not interpreted by the shell:
  - For ORC: wget -O itemattributes.orc 'downloadURL'
  - For Parquet: wget -O itemattributes.parquet 'downloadURL'
Generate File Command Example:
wget -O itemattributes.orc 'https://storage.googleapis.com/examplebucket/cat.jpeg?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=example%e'
If successful, the file is downloaded in your chosen format and is ready for further analysis/processing.
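As a quick sanity check on the result, you can inspect the file's magic bytes: ORC files begin with the 3-byte marker `ORC`, and Parquet files begin (and end) with `PAR1`. The filename and stand-in contents below are illustrative:

```shell
#!/bin/sh
# Stand-in file so the example is self-contained; a real download from
# the signed URL would produce this file instead.
printf 'ORCexample-payload' > itemattributes.orc

# ORC files start with the magic bytes "ORC"; Parquet files with "PAR1".
magic=$(head -c 3 itemattributes.orc)
if [ "$magic" = "ORC" ]; then
  echo "itemattributes.orc looks like an ORC file"
else
  echo "unexpected format: got '$magic'"
fi

rm -f itemattributes.orc   # clean up the stand-in file
```

A truncated or interrupted download (for example, one cut off by the Signed URL's TTL) will typically fail this check, which makes it a cheap guard before handing the file to downstream processing.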
Please remember:
- The incremental and history feeds are maintained for a maximum of 45 days.
- Signed URLs have a Time-to-Live (TTL) of 10 minutes. Once a URL expires, new sessions are not allowed; however, existing sessions are not stopped while a download is in progress.
- All endpoints are governed by a limited number of transactions per endpoint (see Throttling).
©️Walmart | All Rights Reserved | Confidential