Where does Brickset's data come from?
Posted by Huw,
A couple of weeks ago, after I'd added some links to other websites, I said I would create a diagram showing where Brickset gets its data. I've now finished the first draft and it's available for your perusal as a PDF.
(Update December 2019: the document needs updating but I have neither a Visio license nor the original source document to be able to do so!)
It may surprise you to see just how little is actually maintained locally. Of course the core of the database -- information about sets, collected and curated since 1997 -- is, but almost all the peripheral data is imported from elsewhere. For simplicity, many largely static lookup tables that support the main data sets, such as the list of currencies, haven't been included, and neither has user-generated data such as reviews and collections.
The diagram shows some 'data flows' that don't actually go anywhere near the Brickset database but appear to the user as if they have come from it, such as Rebrickable inventories.
I don't suppose this will stem the number of emails we receive requesting that we add this minifig to that set, or telling us that inventories are incomplete, but I live in hope...
26 comments on this article
great work!
Very nice @Huw. It is not very often that a developer documents willingly :)
I'm not a developer, but I work in IT in the NHS and to have this level of sharing and integration of data at this speed in my role would be a dream come true! - This is one of my favourite web sites, not just because I like LEGO, but because the site is so good!! Cheers Huw
^ Thanks!
It's really a great work! Managing information is not that easy!
This is so elegant! Your site is an example of how different internet data can be combined and presented to a wider audience. Some other industries could learn a thing from your network, I guess. Thanks for all your effort and good work!
Flowcharts are cool. Nice, easy to understand diagram, Huw.
Good job!
This is very nice. I was very curious how this site works under the hood. Great job putting all of this together in one place in such a useful system.
I think I want that printed up on a Brickset.com t-shirt!
Interesting chart. Could you tell me where in the menu to find the PaB info? Also, are you considering integrating http://www.brickbuildr.com/view/pab/ as well, if that's even possible?
And where is website 1000steine.de where you store your images? ;)
^^ PaB availability is shown on individual parts details pages, such as this one, http://brickset.com/parts/300101, on the right hand side. I suspect Brickbuildr duplicates Wall of Bricks data to some extent but if there's an API or a data export function, and the site owner wants to collaborate, I'd be happy to integrate it too.
^ The image repository is not really part of the database as such, and thus not on the diagram, but it is of course an extremely important part of the site and I am grateful to Rene at 1000steine for providing the hosting and bandwidth.
Very nice, makes me like this site even more (if that were possible).
Very nice Huw, is there another one showing which sites pull data from Brickset?
Very cool.
Interesting diagram. Love the site, thanks for your work. I always click through when I buy from Lego or Amazon. I'm still hoping we can get a time stamp on the Amazon discount page like we used to have? I know you said you were adding one, but I've never seen it. This is the page where I'd like to see the stamp: http://brickset.com/buy/vendor-amazon/order-percentdiscount/country-us/mycollection-Wanted
Huw, you put Brickimedia.com instead of Brickimedia.org. :O No big deal, lol. :P Great PDF though! I myself didn't know that so many third parties were involved. The only thing it doesn't include is when we manually add information, which I do kinda frequently. :P
It looks like Brickset is the center of the Lego universe... :P :D
^^ Sorry, I'll get that corrected.
The manually maintained information is that in the orange circles. The effort we all put in to maintaining it should not be underestimated, should it!
@OTISsoft, no, that would be a good addition, but I'm not sure I know all that do.
Great work Huw.
@bjtpro, the date stamp isn't on that page, because it's generic to a number of suppliers and there's no obvious place to put it, but it is on the Amazon price comparison page, http://brickset.com/buy/amazon, now.
Very cool Huw. Great to be able to see (and for you to share) the level of complexity that you deal with on this site, so that the work that goes into maintaining the site and database is understood. Keep up the great work!
Very nice Huw!! I'm a data warehouse developer here in the States and actually love to create Visio diagrams of the data movement processes I create. I love the behind the scenes look at processes as it gives you a different perspective and better picture of the complexity and/or simplicity. That is very cool. Thanks for everything you do! This has been my favorite site for the Lego addiction.
^^ & ^ Thanks!
Yes, as noted above I will change it in the next version.