Without using a database, how would one store and access item information by its GTIN?
We need a folder storage convention that is easy to access without sacrificing performance.
Say you have a fictitious GTIN, 00123456789012. The folder path for it would be: 123/456/789/00123456789012/
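The path convention can be sketched in Python. This is a minimal sketch, not the project's actual code: the function name gtin_folder_path is hypothetical, and the rule itself (zero-pad to 14 digits, skip the first two digits, split the next nine into groups of three) is an assumption inferred from the single example above.

```python
def gtin_folder_path(gtin: str) -> str:
    """Build a folder path such as 123/456/789/00123456789012/ from a GTIN."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) > 14:
        raise ValueError(f"not a valid GTIN: {gtin!r}")
    # Assumption: shorter codes (UPC-A, EAN-13) are zero-padded to 14 digits.
    gtin14 = gtin.zfill(14)
    # Assumption: the first two digits are skipped and the next nine are
    # split into three groups of three, matching the example path.
    body = gtin14[2:11]
    groups = [body[i:i + 3] for i in range(0, 9, 3)]
    return "/".join(groups + [gtin14]) + "/"


print(gtin_folder_path("00123456789012"))  # 123/456/789/00123456789012/
```

Splitting the digits into short groups keeps the number of entries per folder level small (at most 1000 under this rule), which is one common reason for this kind of layout.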
Does this kind of convention create a security issue? What if I want my data to be private?
Security can be achieved by adding an API Key to the service, and disabling public access to the cloud bucket. Obscurity is not Security!
A Content Delivery Network (CDN) goes a long way toward improving the performance of Cloud Storage access. AWS provides a CDN through its CloudFront service.
This section introduces pulling data from various vendors. We segment each Vendor’s storage path to help with data retrieval and/or purging if the Vendor requests it.
Primary
APIs with essentially unlimited calls that return an image. You can set up a job to pre-cache your data based on a GTIN database.
- Syndigo - https://api.syndigo.com/ui/product/?skip=0&take=1
- Kwikee - https://api.kwikee.com/public/v3/data/gtin/%s
- Tesco - https://dev.tescolabs.com/product/?gtin=%s
- EAN Data - https://eandata.com/feed/?v=3&find=ean13&keycode=apikey&mode=json
- [x] Open Food Facts - https://(world|us).openfoodfacts.org/api/v0/product/%s.json
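The %s in each URL above is a placeholder for the GTIN. As a rough illustration of how such a template could be filled in, here is a Python sketch; the names PRIMARY_TEMPLATES and vendor_url are hypothetical, and only two of the listed templates are included. Authentication (API keys, headers) is vendor-specific and not shown.

```python
# URL templates copied from the list above; %s is replaced by the GTIN.
PRIMARY_TEMPLATES = {
    "kwikee": "https://api.kwikee.com/public/v3/data/gtin/%s",
    "tesco": "https://dev.tescolabs.com/product/?gtin=%s",
}


def vendor_url(vendor: str, gtin: str) -> str:
    """Return the lookup URL for a vendor and GTIN."""
    # Only digits are accepted, so the GTIN needs no URL escaping.
    if not (gtin.isascii() and gtin.isdigit()):
        raise ValueError(f"not a valid GTIN: {gtin!r}")
    return PRIMARY_TEMPLATES[vendor] % gtin


print(vendor_url("tesco", "00123456789012"))
```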
Secondary
APIs with low daily/monthly limits. These vendors can be used on demand, since using them for pre-caching would be costly. Also, some of these, such as the USDA and Nutrition APIs, do not include images.
- DigitEyes - https://www.digit-eyes.com/gtin/v2_0/?upcCode=%s&field_names=all&language=en&app_key=%s&signature=%s
- Google Shopping - https://www.google.com/search?tbm=shop&tbs=vw:l,new:1&q=%s (scrape the Google Shopping web results). Note: the results seem to change constantly and are unreliable. Google may also have algorithms to defeat scraping.
- Search UPC - http://www.searchupc.com/handlers/upcsearch.ashx?request_type=3&upc=%s&access_token=%s
- UPC ItemDB - https://api.upcitemdb.com/prod/trial/lookup?upc=%s
- Barcodable - https://www.barcodable.com/api/v1/%s/%s
- Walmart - https://api.walmartlabs.com/v1/items?apiKey=%s&upc=%s
- USDA - https://ndb.nal.usda.gov/ndb/search/list?qlookup=%s
- Buycott - https://www.buycott.com/upc/%s, example: https://www.buycott.com/upc/078732004245
- eBay
- Best Buy
- Amazon
- Target
Please feel free to submit any API or Web scraping integration request. We can discuss in the issue how to integrate them: Primary/Secondary/WebScraping etc…
Possibly integrate proxycrawl or a similar service?
POST or GET to store the GTIN data on AWS S3. The POST body becomes index.json, and the file at the url query string parameter is downloaded as index.jpg. The optional vendor parameter identifies that the request stores Vendor-specific data. type can be media to store additional media (image/video) in the media/ folder.
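To make the parameters concrete, the following Python sketch maps a storage request to the object keys it might write. It is an illustration under stated assumptions, not the service's real logic: the function name storage_keys is hypothetical, placing Vendor data in a sub-folder named after the vendor is an assumption, and naming a media file after the last segment of the url is also an assumption.

```python
from posixpath import basename
from typing import List, Optional
from urllib.parse import urlparse


def storage_keys(gtin_folder: str, has_body: bool, url: Optional[str] = None,
                 vendor: Optional[str] = None,
                 type_: Optional[str] = None) -> List[str]:
    """List the object keys a request would write under the GTIN folder."""
    prefix = gtin_folder.rstrip("/") + "/"
    if vendor:
        # Assumption: Vendor data lives in a sub-folder named after the vendor.
        prefix += f"{vendor}/"
    if type_ == "media":
        if not url:
            return []
        # Assumption: media keeps the file name from the source URL.
        name = basename(urlparse(url).path) or "file"
        return [f"{prefix}media/{name}"]
    keys = []
    if has_body:
        keys.append(prefix + "index.json")  # the POST body
    if url:
        keys.append(prefix + "index.jpg")   # the image downloaded from url
    return keys


print(storage_keys("123/456/789/00123456789012/", True,
                   url="https://example.com/a.jpg"))
```

Keeping each Vendor's objects under its own prefix is what makes a per-Vendor purge a simple prefix delete.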
What we found during our API integration with regard to how others are storing their GTIN data
This project’s API integration code is written very generically, in a way that may violate certain Vendors’ API and Data Usage Policies. This is why we segment Vendor data, so that the User can purge it at the request of any Vendor. We also take the additional step of segmenting API types, Primary/Secondary, to help the User comply with the majority of APIs.
We are not responsible for any misuse of a Vendor’s API. Users of our code must understand, fully comply with, and be responsible for all external Vendors’ API and Data Usage Policies.
See LICENSE file.