Determining the sets you have in a box of random parts
Posted by Huw,
Ben Nicholson (bnic99) is currently in his 3rd year at Newcastle University studying Computer Science. For his dissertation project he has created a piece of software to solve a problem many of us have encountered at one time or another, and he'd appreciate your feedback:
Picture this: you buy a box of random LEGO parts from a second-hand seller, and you want to know which LEGO sets you now possess. Where would you start? What would you do? A large box of parts can be very daunting, especially if you don’t know what you are looking for...
This was the situation I was in one day when looking through a box of LEGO at my grandma’s. After a short while scavenging through various parts I found part 2626 Boat, Bow Brick 6x6x1 in old light grey, which was only ever available in 2 sets.
After a bit more searching, and finding 6104 wing 8X8 in yellow and 30356 wing 6X12, left in old light grey, I was reasonably confident that the set it had come in was 7141 Naboo Fighter.
This gave me an idea: would it be possible to write a program in which you enter the parts you find and it calculates the chances of which sets you might have?
When it was time to decide what to do for my University dissertation that is what I did, and I have made a system where the user can enter parts using drop down menus to search by type. For example, for part 2626 the user could search by tags for “Wedge”, “Brick”, and “Slope”.
After selecting the correct part, the user also has to select its colour before adding it.
Once a number of parts have been added users can then click the “show probability” button to get the system’s analysis on what sets are in the box based on what parts they have added.
Viewing the percentages allows users to view a picture of each suggested set and confirm that it is in the box. Once a set is confirmed, the system assumes that all of the found parts belonging to that set will be used to reconstruct it and takes them out of consideration when making new calculations.
I have been testing this system by getting others to try to identify which 10 sets I have broken up and put into a box. I have found that the system is commonly able to identify 80%+ of the sets being tested in just over an hour, with the sets not found being small polybag-type sets containing just a few parts, none of which are rare.
The system works by communicating with the Rebrickable database to gather lists of sets that a part is in once the user has entered one. It will then check if any of these sets have already had one of their parts found and if they have it will update the chance of that set.
The program then calculates the probability of sets being owned proportional to the rarity of parts that have been found contained within it. For example a unique part would give the set it is from a 100% probability of being owned as the part could not have come from any other set, whereas common bricks will only raise the probability of sets a smaller amount.
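The rarity weighting described above could be sketched roughly as follows. This is not Ben's actual code: the way evidence is combined (a "noisy-OR", where each found part contributes 1 / number of sets it appears in), the function name `set_probabilities`, the placeholder set names `OTHER-A` to `OTHER-D`, and every count except the "2 sets" for part 2626 are assumptions made purely for illustration.

```python
from collections import defaultdict


def set_probabilities(found_parts, sets_containing):
    """Estimate how likely each set is to be in the box.

    found_parts: iterable of (part_number, colour) tuples the user has entered.
    sets_containing: dict mapping (part_number, colour) -> set of set numbers.
    Assumption: evidence from different parts is combined with a noisy-OR.
    """
    # For every candidate set, track the chance that none of its found
    # parts actually came from it.
    miss = defaultdict(lambda: 1.0)
    for part in found_parts:
        candidates = sets_containing.get(part, set())
        if not candidates:
            continue  # part appears in no known set, so it carries no evidence
        weight = 1.0 / len(candidates)  # rarer part -> larger weight
        for set_num in candidates:
            miss[set_num] *= 1.0 - weight
    return {set_num: 1.0 - m for set_num, m in miss.items()}


# Hypothetical inventory data; only "2626 appears in 2 sets" comes from the article.
sets_containing = {
    ("2626", "Old Light Grey"): {"7141", "OTHER-A"},
    ("6104", "Yellow"): {"7141", "OTHER-B", "OTHER-C"},
    ("unique-part", "Red"): {"OTHER-D"},
}
probs = set_probabilities(sets_containing.keys(), sets_containing)
for set_num in sorted(probs, key=probs.get, reverse=True):
    print(set_num, round(probs[set_num], 2))
```

With these made-up counts a part unique to one set pushes that set to 100%, while two moderately rare parts together raise 7141 to roughly two thirds, which matches the behaviour described in the article.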
At present the program only has a small subset of parts to search and add from. This is because I currently have to write them out manually, along with a list of tags that the user can search by. In the future I plan to expand this subset so that many more parts are available.
I also plan to rework the user interface to make it more user-friendly, since the current one is just a prototype: I was mainly interested in whether the maths would work and the sets could be identified. Another reason for the current interface is that I initially planned on using image recognition as the input method, but I was not able to get that working and had to resort to manual input as a back-up.
I am also considering creating a website to provide convenience and easy access for people who wish to use it.
So, my questions to you are:
- Is this an analyser that you would find useful and would use?
- What ways would you like to see this system develop if I continue working on it and make it available for everyone to use?
Any feedback given will help in my dissertation. Thank you!
81 comments on this article
1 - I would definitely use this. I'm currently trying to rebuild all my old sets for eBay, and a lot don't have boxes, so it's hard to be sure what I own.
2 - Can you search by part numbers? Or is it just dropdown menus? I use Brickset a lot to identify parts, so I'm used to their categories and naming conventions. I don't know if Rebrickable's database is any different, and it might be quicker to search Brickset, then use the part number in this program.
Oh yes! This would be super handy. I’ve used both BrickLink and Brickset to locate where a part is from. Usually I type in a set with the same piece in another colour, then go to the other colours the part comes in and compare it to the one I’ve got, but anything to aid in set sorting is useful.
For me personally, I'd never use this as I'm not going to buy random boxes of parts in the hopes they'll contain a full set, as usually in my experience sets are sold in 'random bulk parts lots' because bits are missing, or have been used in other projects that have been kept. Nor am I someone who's ever going to disassemble and mix up multiple sets.
That said however, I can see this being very useful to a lot of people as there are those that fall into the groups I'm not in. Also parents or teens trying to figure out what parts from their kids'/childhood LEGO boxes go with what sets so they can clear them out/move them on/reassemble them etc.
I'd assume once you've got a probable set, you'd have some option to list the full parts list for that set, and then tick them off as you find them so you do know if you've got a complete or partial set.
This function already exists on Brickset. I am able to look up a part and can change the results so that only sets I own with said part appear. While this is harder to do with older parts (as not all older parts are in Brickset's inventory), I'm still able to do it pretty easily.
But then again, I really know my Lego. I'm able to find part numbers easily, while someone else who buys bricks in bulk at a garage sale or something might not be able to. I can see this being most useful for those who aren't really into Lego, but I am unsure of how you could set up search parameters to make it easy for them to search for parts without using part numbers.
So while I wouldn't use it, I can see others might - I just doubt if it will be worth the effort
This sounds like a fantastic dissertation project, great idea!
I would also definitely make use of a program like this if, like you mention, the UI was a little more conducive to speedy processing on the manual end. @EstragonHelmer's idea above of using design/element IDs could be a helpful addition (though complicated by the occasional subtle changes to parts which result in new IDs - you'd need to decide how precise you want to make the program in light of whether or not users will identify the exact version of a part).
Going it alone to add parts to the database sounds like a very labour-intensive proposition. You may find an active and willing community to help crowdsource a lot of that if the demand for this program is generally high and if you can find a way to implement it.
Best of luck as you finish your dissertation and near completion of your degree!
Does it work with printed parts? They are often unique and could speed things up considerably.
Having undertaken this type of exercise a couple of times recently with bought loose collections I know that it is a lengthy process even with BrickLink/Brickset and the benefit of experience of the likely rare elements or telltale minifig parts.
Perhaps you should tread carefully with your idea of taking “already identified” parts out of the equation, particularly where the loose collection has multiple sets from the same theme.
The additional benefit of being able to solve this type of problem quickly and in a structured, recorded way is springboarding from the outcome to BrickLink searches to obtain any missing elements.
I would have welcomed this when I was first setting up my Brickset inventory - trying to track down old sets that some of the unusual parts came from. I spent days doing this exact search & eliminate method manually.
I'm also one of those who buys bulk lots and am interested in the source and rarity of some of the parts. It's useful not only to see if I happened upon complete sets but also whether it would be worthwhile to part out the missing bits.
Input is currently only via the dropdowns as it is a prototype system, but I am definitely looking to add more usable ways of adding parts. Searching by part name/number is something I am investigating implementing.
Printed parts do work, however parts with stickers on them do not.
Personally, I usually use the 'Appears In' option on Bricklink, but I do think this is a good idea!
I start with print and sticker pieces, then minifigs, to get a general idea what's in the box
I have a small enough collection and large amount of LEGO set knowledge, that I'll find little use of this program in the near future.
However, what I think would be cool and beneficial is if the program actively suggested a piece you could find, so that it could narrow down its search faster (Akinator style). The piece in question should be chosen by how unique it is to a particular set, as well as how easy it is for you to spot it (mostly its size).
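A rough sketch of how such an "Akinator style" suggestion might be scored, assuming the candidate sets' inventories are already known. The scoring rule (part size divided by the number of candidate sets containing the part), the function name `suggest_next_part`, and the sample part names and sizes are all hypothetical and not part of the described system.

```python
def suggest_next_part(candidate_inventories, found_parts, part_size=None):
    """Pick the not-yet-found part that best narrows down the candidates.

    candidate_inventories: dict mapping set number -> set of part names.
    found_parts: set of part names the user has already entered.
    part_size: optional dict of part name -> rough size (bigger = easier to spot).
    """
    part_size = part_size or {}
    counts = {}
    for inventory in candidate_inventories.values():
        for part in inventory:
            if part not in found_parts:
                counts[part] = counts.get(part, 0) + 1
    if not counts:
        return None  # nothing left to ask about

    def score(part):
        # Fewer candidate sets containing the part and a larger size both help.
        return part_size.get(part, 1) / counts[part]

    # sorted() makes tie-breaking deterministic (alphabetical).
    return max(sorted(counts), key=score)


# Hypothetical candidates and sizes.
inventories = {
    "SET-A": {"brick_2x4", "boat_bow_6x6"},
    "SET-B": {"brick_2x4", "wing_8x8"},
}
sizes = {"boat_bow_6x6": 36, "brick_2x4": 8, "wing_8x8": 64}
suggestion = suggest_next_part(inventories, {"wing_8x8"}, sizes)
print(suggestion)
```

Here the common 2x4 brick appears in both candidates, so the large bow piece that only one candidate contains is suggested instead.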
Combine this with a conveyor belt and that new brick identifier and you've got a nice system...
Slightly off topic, but if there ever was a tool where any 5 year old could find the design number of a piece in the least amount of time (preferably without image recognition), that would be something...
@bnic99, Very interesting idea. Thanks for sharing.
I wonder if this type of problem has been dealt with in the statistics literature or in the statistical CS literature rather than the pure CS corpus.
One flaw in the formulation is this:
@swifty said:
"Perhaps you should tread carefully with you idea of taking “already identified” parts out of the equation, particularly where the loose collection has multiple sets from the same theme."
You have a possible snowballing problem from the human error of mis-confirming sets. Let’s say someone makes a mistake and confirms a set incorrectly. That will then impact the identification of the other sets that, in turn, will influence the identification of more sets and so on. So after mis-confirming set A as set X, instead of identifying sets A, B, C and D, you wind up thinking you have part of sets X, Y, Z and a load of pieces from no definite set.
Sounds interesting, for sure! But I don't think I'd use it, I prefer the additional challenge of using only Bricklink's database to figure out what I have from the parts; for me, a system that does that for me would take away half the fun of exercise xD
Though that said, I don't do this sort of thing very often; for someone who made a living out of doing this to sell the sets, rather than having it as a hobby, I can imagine it would vastly speed up the process ^^
I would use this! Our childhood Lego consisted of several crates of Lego that my dad bought for us second hand in the mid 1980's (no instructions). This pile of Lego has been sitting in my parents attic for decades now. I still plan to inventory the pile and figure out which sets we had, so software like this would be very welcome.
Given the size of the brick pile I need to process, some sort of bulk input method would be welcome.
Gonna be so much easier than searching the internet for the parts to identify the sets
That sounds really cool. I would definitely use it.
Or you can just make a new loose parts list on Rebrickable, and use the build function to find sets you can build.
This is how I sorted a box of old LEGO from my childhood when I had next to no memory of what sets we had. A lot of pieces were missing but I got most of the pieces sorted (one piece is still a mystery, probably a quick grab of a piece from someone as a kid, so no sets, just a piece X"D bc it's quite unusual but I'm pretty sure we didn't have THAT set)
I love reading these articles and seeing the amazing things people are creating for school/work/life as a result of their Lego hobby. Congrats Ben on this fantastic idea and good luck with your studies, it sounds like you are doing really great work! :)
@ThatBionicleGuy said:
"Sounds interesting, for sure! But I don't think I'd use it, I prefer the additional challenge of using only Bricklink's database to figure out what I have from the parts; for me, a system that does that for me would take away half the fun of exercise xD
Though that said, I don't do this sort of thing very often; for someone who made a living out of doing this to sell the sets, rather than having it as a hobby, I can imagine it would vastly speed up the process ^^"
I agree with you. For me the best part of buying a random box of parts is the detective work. The joy of finding rare parts and figuring out what set they come from is the best.
I think it’s a great idea. My only worry is, if any release to the public isn’t accurate enough to begin with, it will soon fail as a concept, due to the people trying it getting frustrated with it and giving up. Good luck with it though.
@V_14 Interesting idea with having it suggest bricks to look for, might have to look into implementing something like that.
@Zander thanks for pointing out the snowball effect of confirming sets incorrectly. This is definitely an issue with the system that I need to look at, as it did cause issues in one of my user tests.
As someone who wants to buy salvaged parts one day this will be golden.
I would love to use this, additionally a mode where it only focuses on sets that are marked as owned would be really helpful for me.
Can't wait, will be a massive help
Always wanted this! I thought Bricklink would eventually get this type of functionality, but if you get there first good on you, a brilliant piece of work
I really like this idea...both from the LEGO application perspective and from just a plain nerdy perspective. I have thousands of pieces I have purchased over the years, along with parts to many sets. Analyzing the pieces to determine which sets can be created and ordering those sets from most to least available pieces is a great idea. The only drawback for me is the manual work required to catalog the pieces I have available. Beyond that, this is really great!
EDIT - While the idea is interesting and useful, I think there is a way to simplify it (at least for me). Just have the algorithm analyze all parts against all sets and then list (in order) the most complete to the least complete sets possible. Allow the user to select a set they want to create from the loose pieces, and then run the algorithm again without the parts for the set that was selected...and repeat. Have the algorithm indicate the missing pieces for the selected set. Then create a BrickLink list of all the required pieces to complete all the sets that were selected. My thoughts/comments here probably aren't dissertation material, but definitely an awesome tool for me. You'd think TLG would love this because it would definitely drive sales in BrickLink. Perhaps analyze part costs in BL to provide an estimated total cost for any generated parts list?
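The "most complete to least complete" ranking suggested in this comment could look something like the sketch below. It is only an illustration of the commenter's idea, not the dissertation software: the function name `rank_by_completeness` and the sample inventories are made up, and part quantities are handled with `collections.Counter`.

```python
from collections import Counter


def rank_by_completeness(loose_parts, set_inventories):
    """Rank sets by the share of their parts present in the loose pile.

    loose_parts: dict of part -> quantity in the box.
    set_inventories: dict of set number -> dict of part -> quantity needed.
    Returns (set number, completeness 0..1, missing parts) tuples, best first.
    """
    loose = Counter(loose_parts)
    results = []
    for set_num, inventory in set_inventories.items():
        needed = Counter(inventory)
        total = sum(needed.values())
        if total == 0:
            continue  # skip empty inventories to avoid dividing by zero
        have = needed & loose      # per-part minimum of needed and available
        missing = needed - have    # what would have to be bought
        results.append((set_num, sum(have.values()) / total, dict(missing)))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Hypothetical data for illustration only.
loose = {"a": 2, "b": 1}
inventories = {"S1": {"a": 2, "b": 2}, "S2": {"c": 1}}
ranked = rank_by_completeness(loose, inventories)
print(ranked)
```

To "run the algorithm again without the parts for the set that was selected", the caller could subtract the chosen set's inventory from the loose pile (again with `Counter` subtraction) and call the function once more.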
I wonder if the algorithm allows for color variations of part numbers? If so, it would be great to indicate in each set what colors were substituted where in the set.
@bnic99 said:
" @Zander thanks for pointing out the snowball effect of confirming sets incorrectly. This is definitely an issue with the system that I need to look at. As it did cause issues in one of my user tests"
No problem ;~)
My gut feeling - without having done a proper literature search - is that you should check what has already been done in the field of cluster analysis with imperfect data. The data are imperfect because a) the sets may be incomplete, b) a part may be common to several or all sets and c) errors in identifying parts. For the purpose of any cluster analytic technique, the strength of the relationship between any two parts can be taken as their known co-occurrence in sets.
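As a small illustration of the relationship strength mentioned here, the sketch below counts in how many sets each pair of parts occurs together. It only builds the co-occurrence counts and does not perform any clustering; the function name `cooccurrence` and the sample inventories are assumptions for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(set_inventories):
    """Count, for every pair of parts, how many sets contain both."""
    counts = Counter()
    for parts in set_inventories.values():
        # sorted(set(...)) gives each unordered pair one canonical key.
        for a, b in combinations(sorted(set(parts)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical inventories.
pairs = cooccurrence({"S1": ["x", "y", "z"], "S2": ["x", "y"]})
print(pairs[("x", "y")], pairs[("x", "z")])
```

A table like this could then be handed to whichever cluster-analysis technique turns out to suit incomplete and noisy data.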
If only it could help me find the parts I need to rebuild my phantom lol
@Sandinista said:
"This function already exists on Brickset. I am able to look up a part and can change the results so that only sets I own with said part appear. While this is harder to do with older parts (as not all older parts are in Brickset's inventory), I'm still able to do it pretty easily.
But then again, I really know my Lego. I'm able to find part numbers easily, while someone else who buys bricks in bulk at a garage sale or something might not be able to. I can see this being most useful for those who aren't really into Lego, but I am unsure of how you could setup search parameters to make it easy for them to search for parts without using part numbers.
So while I wouldn't use it, I can see others might - I just doubt if it will be worth the effort"
I didn't know this!!! So handy. Now I know which sets to raid when I need a key part!!! Thank you!
Couple of issues... many bulk buys have mixed/split collections. So you’ll get false positives of “complete” sets.
Bricklink works well to determine what set is likely in a bulk collection. Find a part you think is fairly rare and enter it. Usually narrows it down to 10 or less sets. Usually you can tell from the image or figs which set is in the collection. If the rare part is still connected to another part, even easier. Use the appears with and you’ll get the match for sure.
I only skimmed the article.
My question is what happens in null events?
Not talking about Nick Null & Tracy Lightman (known criminals), but about rare pieces that never appeared in any set.
An example would be the Dark Blue version of Mickey's Hat from Disney's castle. It was NOT found in any set, but found in Build a Minifigure selections.
(I own the hat & entire MF)
So a real part that doesn't exist in any set.
It would also be cool to analyze a picture, or a series of pictures, to figure out which parts are there. It would also be useful to have the app flag unidentified parts for more pictures, i.e. have an overlay (HUD) that color-codes the parts identified.
Similar to the HUD of aircraft / cars used to ID incoming missiles or aircraft (jet HUD) or deer / other hazards on the road (car HUD)
@milflinn
It uses the Rebrickable API to request the list of sets a part has been in once one has been entered. If this returns nothing (as it would in the case of that dark blue hat) then the system just ignores the part, as it then has nothing to work from. I might add a pop-up at some point that could inform the user when this occurs.
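The handling described in this reply might be sketched as below. The `lookup_sets` callable is a stand-in for the Rebrickable request and is not a real API call; `add_part`, the `unmatched` list used to drive a possible pop-up, and the sample data are likewise hypothetical.

```python
def add_part(part, lookup_sets, candidates, unmatched):
    """Record a found part against every set it appears in.

    lookup_sets: callable returning the set numbers a part appears in
                 (a placeholder for the real database request).
    candidates: dict of set number -> list of found parts, updated in place.
    unmatched: list collecting parts that appear in no set, so the user
               can be told about them instead of silently ignoring them.
    Returns True if the part contributed evidence, False otherwise.
    """
    sets_found = lookup_sets(part)
    if not sets_found:
        unmatched.append(part)
        return False
    for set_num in sets_found:
        candidates.setdefault(set_num, []).append(part)
    return True


# Hypothetical lookup table standing in for the database.
fake_db = {"boat_bow_old_light_grey": ["7141", "OTHER-A"]}
candidates, unmatched = {}, []
ok = add_part("boat_bow_old_light_grey", lambda p: fake_db.get(p, []), candidates, unmatched)
ignored = add_part("dark_blue_hat", lambda p: fake_db.get(p, []), candidates, unmatched)
print(candidates, unmatched)
```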
The bins of random Lego I have acquired rarely include 100% of any given set, so some of your underlying assumptions as well as the testing method aren’t quite accurate to my situation. That said, this seems like a really cool tool and I would love to see where it goes.
@bnic99 said:
" @milflinn
It uses the Rebrickable API to request the list of sets a part has been in once one has been entered. If this returns nothing (as it would in the case of that dark blue hat) then the system just ignores the part, as it then has nothing to work from. I might add a pop-up at some point that could inform the user when this occurs.
"
17349pb02
listed on bricklink as part of PAB 2019.
My background is in IT. May I suggest you look at Delicious Monster as an example of inventory software with an exceptionally good user interface? I hate feeling like I am using a Windows 3.1 beta application in 2021.
Good bones is one thing, a finished product is another.
Delicious Monster inventories various things: books and other collectibles.
Isn’t this just what rebrickable does? Enter the part in a parts list, run the build tool and see what sets you can build. Once you’ve found a set you combine the parts into the set to remove the loose parts and add the set to your collection.
I may be missing something but I’m not really sure how this is different.
This would be an interesting thing. A coworker just gave me all of her kids' LEGO sets, so I got a giant box of LEGO. From memory I was able to pick out over two dozen sets by eye alone. There are a few that had interesting parts that I then looked up and figured out from there. Of course there were missing parts, so I then went to Bricklink and picked up a few of them to complete the sets that I wanted complete.
@jaredhinton
Rebrickable rates sets based on completeness, i.e. how much of a set you have, so 1 part out of 100.
This does calculations based on part rarity, so a unique part could only have come from one set.
Good idea, and it looks like good execution. Thumbs up. Personally, I am in the camp of folks who enjoy the forensic work of manually sorting through a random box and cross-referencing the pieces on Brickset or Bricklink. As noted by @ELH2806, it's relatively easy to start with printed pieces and stickers, then minifigures, then large pieces, etc. to determine a set of origin. For example, stickers that depict vehicle license plates usually just use the set number. Otherwise, one minifigure and a somewhat unusual part/color are usually enough to identify the set. But it does take time!
This would be a great tool for me. When I buy "random" parts, it is usually from a single source and not by weight. Since they come from an individual household in most cases, it will be a mix of a few to a large number of sets. Identifying potential sets will allow me to complete each set, either by sorting through the parts or by ordering them. I have been using existing resources but your effort appears to give additional options.
Currently I am sorting through a large tub of parts. (I have already culled 100s of minifigures and expect the number of sets to be in double digits.) While I have been able to identify sets from several large parts, there are assemblies which I would prefer to leave intact, but using existing resources I have not been able to identify some of them. An example is a partial assembly that includes a large 1/4 circle blue piece with a wedge edge. Having the ability to search on key words will be very useful.
If you want volunteers as beta testers, count me in!
I would definitely use this
I really like to buy used Lego, at flea markets for example. I normally do these things by hand, but a tool would be appreciated.
@bnic99 yup making it specific & unique. There aren't official sets other than star wars that have R2-D2 for example. (at least that I know of)
Whitelist (includes all sets)
Blacklist (isn't included in)
Bluelist (list of extreme overlap)
Example: sets that share lots of similar parts, but not necessarily "average parts", e.g. a 1x2 headlight brick in common core colors
Redlist (list of limited overlap)
Example yellow 2 x 6 plate is less common than others in same color / style but 2x6 in orange is smarter to sort by as less common.
Yellow list (exceptionally common pieces not generally considered good to ID a set from Red 2x4 brick)
You could increase the speed of the query by eliminating significant overlap with a weighted sort (not sorting the yellow list first)
Order of sort should be
White list (all possible)
Red List (limited overlap)
Bluelist (Common overlap)
Yellow list (Extremely common)
Blacklist (parts not made by lego)
Basically find the "average" parts etc. then only use limited variance in the large group sort (don't look for common parts until last, for the fine-grained sort)
He could exploit the dates of part availability to speed it up too. You don't need to look in 1989 for a color not introduced until 2013, for example.
Basically process of elimination married to unique rapid identification.
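The date-based elimination suggested in this comment could be as simple as the filter below. It is a sketch only: the function name `filter_by_year`, the rule of keeping sets whose release year is unknown, and the sample years are assumptions for illustration.

```python
def filter_by_year(candidate_sets, set_year, first_year_available):
    """Drop candidate sets released before the part/colour existed.

    candidate_sets: list of set numbers still under consideration.
    set_year: dict of set number -> release year (may be incomplete).
    first_year_available: first year the found part/colour was produced.
    Sets with an unknown year are kept rather than wrongly discarded.
    """
    return [
        s for s in candidate_sets
        if set_year.get(s) is None or set_year[s] >= first_year_available
    ]


# Hypothetical release years.
years = {"OLD-SET": 1989, "NEW-SET": 2015}
remaining = filter_by_year(["OLD-SET", "NEW-SET", "UNKNOWN-SET"], years, 2013)
print(remaining)
```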
Programmer doesn't seem like an idiot and this feels useful
Idea: someone needs to make a Lego brick search where the user answers a few questions and it gives a list of bricks with those characteristics: How many studs? How many anti-studs? Does it have a slope? Is it curved? Is it transparent? Is it cylindrical?
Definitely useful. If it helps at all I've used a similar site before at http://www.spoofdata.com/setfinder.
Being able to put a set number in and having the unique and rare parts listed at top with pictures of each part would be most useful. Currently, getting to this in brickset requires some drilling down or manipulating the url path. This feature helps me decide which sets I am going to buy when new sets come out.
I think most of what I was going to suggest has already been mentioned. Ask the user to start with the oddest element they can find, provide a list of elements from other sets to choose from, etc…
I would suggest grouping the colours into families if possible. Yellows, reds, browns, greys, and so forth. Some colours can be difficult enough to tell apart when they are in your hands, never mind trying to hold each of them up to a likely un-calibrated screen for identification.
Speaking of holding things up to your screen; in the first screen shot where the user selects a colour, the thumbnails for Light Bluish Grey and Pearl Light Grey are indistinguishable from each other. There are other examples of this, such as Light Pink v Light Salmon, etc. It may be beneficial to replace those thumbnails with images of actual parts or with 3D renders as Rebrickable does on their color page.
Sounds like a fun project. I wish you luck in your endeavour.
According to what I see on facebook LEGO groups EVERY SINGLE DAY, I'd say you'd fill the niche that's so needed for some.
1. I would most definitely use this feature as I commonly run into this problem.
2. As there are a significant number of Bricklink users as well as are Rebrickable, it would be beneficial to have those specific part numbers in the inventory.
I would love to see a feature like this in the future.
Rebrickable "build" already does this...
Idea is nice, and could be great for sellers. For me, finding which sets are in a lot is part of the fun of collecting. It's a game of hunt and puzzle-solving which I will not want to do via software.
I did this a couple of years ago when I visited my parents. In addition to using Brickset’s inventory of parts, I also dug out my childhood memories. As I got closer to the end, I had a few unique pieces I didn’t recognize, but again, using the inventory, I was able to put together a few more sets I wasn’t aware of. Now I’m left with a small box of extra pieces that I don’t know what they’re for.
I think this is a great idea.
@Norikins said:
"Rebrickable "build" already does this..."
Yes, but Rebrickable's build feature is only useful once you've already catalogued what sets or parts you have in your collection. This project seems to be aiming for identifying a set starting from a single part most probable to be unique to the set and before any cataloging of the acquired bin of parts has been attempted yet.
Like @GrizBe, I’m not in the market for bulk used parts, so I doubt I’d ever use this. Several members of my LUG do buy quite a few bulk lots, and often compile a few sets to resell and cover their cost. If it saves time in that process, I’m sure they’d be interested in at least trying it. I might even get less questions about minifig parts if they can identify the sets and see what minifigs came with them.
“So, my questions to you are:
Is this an analyser that you would find useful and would use?”
Absolutely! This is one of those “for fun & profit” products: it's useful for casual exploration, or for processing bulk lot purchases from second hand sources (I’m looking at you, @gicwecommerce). I hope you build this out for public use!
“What ways would you like to see this system develop if I continue working on it and make it available for everyone to use?”
Layout- user interface is everything. Or get it embedded in Bricklink or here, at brickset, let them do the design work.
Sorting results by multiple tags, factors- might already have this function?
Saving part searches
Removing or denoting parts in a saved search after identifying them as a potential set.
Whatever you do with your research and program, it’s a great idea! Solid foundation!
Now to get mindstorms to sort my bulk bins for me...
@Slave2lego:
In my observation, they’ve been pushing Seller requests to the back burner for years now while they chase new Buyers. This is geared more towards Sellers, and is probably not even at the top of their list of requests, so I wouldn’t hold your breath on that one.
@rab1234:
Even that can be helpful. Many people who buy bulk lots want to identify the sets to either sell or assemble, and to do either you’d need to identify which pieces you’re missing first.
@milflinn:
You’ll get one of two things. Either the database you’re working from will recognize BAM/PAB parts and note them (Bricklink has started cataloging parts as being from BAM due to the high number of exclusive parts they release each year now), or it will note that they don’t belong to any set. But the purpose of this isn’t to identify which set a part belongs to, but which sets you have in a pile of parts. If the part didn’t come in a set, there’s nothing to identify.
@jaredhinton:
I can see an immediate difference. Rebrickable is geared towards taking a list of sets you own and identifying every set you don’t own but can build if you part out your entire collection. With this, you don’t care about a full list of every set that you can build one at a time, but what combined list of sets you’re probably looking at. As you identify a set, you can trim the list of parts and see what’s probably in the leftover pile.
@bnic99:
I have some notes on other notes you’ve received. You already have access to part numbers that pair with part descriptions. Just add a switch that lets you change from one to the other on the input side. Or add an input box where you can type in a part number and get a list of associated parts instead of having to hunt through the entire database.
For misidentified parts, rather than removing the pieces completely, “park” them in a list of identified sets. As pieces for that set are added, use them to fill in any gaps, and then drop any remainders in the list of parts left to identify. If things don’t make sense, give users the ability to release any or all sets back to the part list so they can try a different combination of sets until they find one that works.
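One way the "park and release" idea in this comment might be modelled is sketched below. The class name `PartPool`, its methods, and the sample quantities are hypothetical; this is an illustration of the suggestion rather than anything in the existing prototype.

```python
from collections import Counter


class PartPool:
    """Loose parts plus parts parked against sets the user has confirmed."""

    def __init__(self, loose):
        self.loose = Counter(loose)   # parts still waiting to be identified
        self.parked = {}              # set number -> Counter of parked parts

    def park(self, set_num, inventory):
        """Move the parts a set needs out of the loose pile; return what is missing."""
        needed = Counter(inventory)
        taken = needed & self.loose   # only take what is actually available
        self.loose -= taken
        self.parked[set_num] = taken
        return needed - taken

    def release(self, set_num):
        """Undo a confirmation, returning the parked parts to the loose pile."""
        self.loose += self.parked.pop(set_num, Counter())


# Hypothetical quantities.
pool = PartPool({"a": 3, "b": 1})
missing = pool.park("SET-X", {"a": 2, "b": 2})
after_park = Counter(pool.loose)
pool.release("SET-X")
print(missing, after_park, pool.loose)
```

Because nothing is deleted, a wrongly confirmed set can be released and a different combination of sets tried, which also limits the snowballing problem raised earlier in the thread.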
This article was not what I expected it to be. I was looking for an article on someone's process with existing tools to assess their childhood collection or another collection that has been obtained with limited knowledge of the original sets.
That being said, I love this concept but am dubious of its practicality for me as I am usually dealing with multiple thousands of unsorted bricks from which I want to determine the original sets. Entering that many pieces in to be sure of eliminating pieces from the search seems impractical. The biggest aid in searching is recognizing unique/rare parts and colors. Second may be finding larger quantities of specific parts. I will say one big curveball would be the basic creator brick sets that are out there with an assortment of generic pieces. 4105 Creator Imagine and Build is one example I know I have in my possession because of the ubiquitous red bucket in which it was sold. A couple of the parts in there are relatively unique printed parts but still would make for a difficult narrowing down from within a larger set of pieces.
Over the past couple years I have retrieved my brothers' and my Lego from our childhood home and cleaned and sorted them. I had built some custom bits but recently broke all of the custom bits apart and sorted them and have begun the fun but arduous task of turning the sets back into individual units to return the ones I can identify to my brothers for their and their children's future enjoyment and to get an inventory of what I have to assess a keep/"sell to buy more" list. My older brother's sets are easy enough to distinguish from that of my little brother and my own as he was mostly done buying/receiving Lego before my brother and I had begun.
That being said, Brickset and Bricklink have been very helpful. I wonder if a short article on my process would be of interest or help as I have been thinking over writing such if there is interest. My current strategy is to get all of my most recent purchases built from the sorted bits using the manuals I've kept. I will rule out all the newest acquisitions, then work backwards in time to build out all of mine and my little brother's sets then move on down the timeline to my older brother's sets last since there was the most time allotted for those pieces to break or disappear. @Huw, let me know if you'd like to see something like this.
Aha! Something about which I am passionate enough to log in and comment.
I spend the majority of my LEGO time buying bulk bins and sorting through them -- sometimes for my kids, sometimes for myself -- so this kind of product would save me countless hours if it were even semi-automated. I probably could build something like this myself as a statistician of 20 years, but even if I had one I think I'd prefer to do it by hand as there are things I do that no software probably could without 3D part recognition. I even started a YouTube channel to document my process.
https://www.youtube.com/channel/UCiHGa-bOSwjmwCeui6q1NDA
In short, the best bins yield the best sets when well chosen. I look hard at the listing pictures to see if there is anything there I like to see. Transparent yellow plates or bricks, for example, will almost certainly yield me Space sets. I don't often buy any old bin without scrutiny and the times I do, I make sure I get it really cheap (both scrutiny and cheap bins come a little less easy these days).
Here is my method:
0) Remove all non-LEGO pieces. If I suspect a piece might be LEGO or I haven't seen it before I put it in a temporary "maybe" pile.
1) Pull out all assemblies whether they look obvious as a set build or not, even if it is only a few pieces put together. Often I will be able to identify a set by comparing an assembly with a set photo, especially if there are no instructions (which is the case for me well over half the time). The exceptions to keeping assemblies intact are the obvious single-width towers of bricks or plates.
2) Sort the remaining single pieces by colour, picking out decorated elements and minifig parts and accessories along the way. If I have a large volume I will also sort by brick, plate, Technic and so on to reduce used up table space.
3) Start identifying sets by minifig parts (if they exist, for many sellers pull minifigs) and decorated elements. Stickers are great for this as often they will appear in only one or two sets. Printed elements are a little more common but not by much. Then I will look for assemblies that support that hypothesis. This nets me results at least 90% of the time.
4) If there are no minifigs and few decorated elements, go by uncommon elements or elements in uncommon colours. This requires some a priori knowledge, but that comes with experience.
5) In the absence of minifigs and many printed elements, I then go by stampings on the undersides of parts, the rationale being that the same set will have more or less the same stamping pattern for multiples of the same part in that set. This is not a bulletproof method and I do know that LEGO itself will put different variants of the same part in the same set. I have yet to calculate probabilities of this particular method, but I have high confidence in it. It also helps to keep from putting all of the same part in a single set, having none left for later sets. If the bulk bin is small enough, I will often knoll out all of the parts of a type by underside stamping.
Normally this gets me to within 95% of the parts of a bin assigned to a set. The best success I have ever had from a bulk bin using this method left me with only enough unassigned parts to fit in the palm of my hand, and the box was quite large!
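Steps 3 and 4 amount to starting with the elements that appear in the fewest sets. A minimal sketch of that ordering, assuming a hypothetical `part_to_sets` lookup (part description to the list of sets containing it) built from whatever inventory source is at hand:

```python
def most_distinctive_first(found_parts, part_to_sets):
    """Order found parts so those appearing in the fewest sets come first.

    Parts missing from the (hypothetical) lookup are skipped, since they
    cannot narrow anything down.
    """
    known = [p for p in found_parts if part_to_sets.get(p)]
    return sorted(known, key=lambda p: (len(part_to_sets[p]), p))
```

A stickered element listed against a single set would sort ahead of a plain brick listed against hundreds, which matches the order in which the steps above check them.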
Thank you for the opportunity to contribute.
Awesome idea!
Sorry if this is a repeat suggestion, but I don't get to read everybody's comments.
If it was possible to have camera recognition, plus be built to run on mobile devices, this would be amazing for me personally.
I did just attempt to read the part number on a minifig baseplate after taking a pic with my iPhone, but no amount of zooming could make it legible, so I don't like your chances there.
Good luck with it. I'll buy a copy of it regardless of what final form it takes, just so I can support this great community.
Not sure what happened to my post which is too long to recreate but, an absolute yes! I will use it.
I'm a volunteer for beta testing if you are looking for any.
@DrKold:
Once you open a page, there's a time limit on how long you have to post before it'll just erase the comment box when you hit the button. Refreshing the page resets that time limit. I don't remember how long the timer runs (Huw did increase it in the last year), but if you suspect you've let the page sit too long, copy the post before you submit it.
Note that there's also a limit on how long a post can be. If it's too long, it'll just cut off at the limit. So if you write a post that feels really long, copy it before you post. Then you can either split it into two comments, or trim it down to fit in one.
Certainly sounds like an interesting (and yes, useful) project!
Bricklink inventories are (or were, before the TLG takeover) probably the most accurate in general, but creating a way to check against the "official" Lego inventories as used on Brickset as well might aid in set identification and be a very useful trick for people who don't want to have two sets of windows up while they do their searching. Also a function that equates a Bricklink part number to the appropriate "official" part number would be very useful--and for the maximum bonus points, translating the descriptions from one system to another! I realize this adds an extra layer of complexity to the process, as well as to the user interface, but I personally would find it very useful, and I doubt I'm the only one who finds having two different systems confusing. (Of course, it's possible that TLG is forcing Bricklink to replace their system with the "official" one, in which case all the above is wasted breath...frankly, I haven't been doing as much on Bricklink as I used to, mainly because I'm now mostly buying new sets for my collection, not trying to fill in gaps from before I began collecting!)
This is something I would definitely use. Adding the possibility to search on the Lego part number would be a useful addition.
The method I use after sorting non-lego parts out, is sorting by colour, keeping odd parts apart.
Then I check the part number on an odd part then go to Bricklink and check in which sets this part is used.
When there is more than 1 set, look for other parts used in the 'probable' sets and continue to search those. This way I found 95% of the sets. When I am sure I found a set I print an inventory list from Bricklink and complete the set. After completing this search I know what parts are missing and add these to a Bricklink wish list.
This leaves less bricks remaining to search for the next set.
At this moment I am going through 5 bins and found already 75 sets this way.
I also noticed that a lot from one household mostly contains sets from a certain period. This narrows down the number of sets to be searched. The lot I am sorting now contains sets between 2008-2012. A previous lot contained sets between 1985-1993.
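The lookup-and-narrow routine described above could be sketched like this. It is an illustration under stated assumptions, not a Bricklink API: `part_to_sets` and `set_years` are hypothetical dictionaries the user would have to fill from an inventory source, and the function name is made up.

```python
def candidate_sets(odd_parts, part_to_sets, set_years=None, year_range=None):
    """Rank sets by how many of the found odd parts they contain.

    odd_parts    -- part numbers picked out of the lot
    part_to_sets -- hypothetical lookup: part number -> sets containing it
    set_years    -- hypothetical lookup: set number -> release year
    year_range   -- optional (first, last) period the lot seems to come from
    """
    scores = {}
    for part in odd_parts:
        for set_no in part_to_sets.get(part, ()):
            scores[set_no] = scores.get(set_no, 0) + 1

    if set_years is not None and year_range is not None:
        first, last = year_range
        # Keep only sets known to fall inside the household's period.
        scores = {
            s: n for s, n in scores.items()
            if s in set_years and first <= set_years[s] <= last
        }

    # Most matching parts first; ties broken by set number for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Passing something like `year_range=(2008, 2012)` mirrors the observation that a household lot tends to come from one period, which shrinks the list of sets to check by hand.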
That programme would actually destroy my hobby :D What I like the best about this hobby is buying a second hand collection and then identifying what sets might be in there. The identification is the fun part! Why would you let a computer do that part?
Wow! I was only talking to my wife about this idea last night, saying that if someone could make an algorithm that could identify your kits from the parts you have it would be awesome! I've got into buying bulk Lego, and while I am COMPLETELY ADDICTED to buying in bulk as I have scored some AMAZING sets, I now have about 50KG of random Lego for which I have no definite list of sets, nor have I cleaned and sorted it all yet. I basically go through the rough batch and pick out identifiable pieces and check out Bricklink, Rebrickable and Brickset and see what I can identify, but it's not a great process. Also, it only identifies the major sets, the ones that have really identifiable parts. What about the leftovers once I've picked out those sets' parts?
I'd love to see this worked up more.
Judging by the number of set identification questions asked in various Lego oriented Facebook groups and on Bricks.SE, there is definitely a demand for this kind of application. On the other hand, the askers of these questions are usually not versed enough to know about the relevant resources, so I guess this software would mainly be used by AFOLs trying to identify their bulk lots or by regular answerers of questions like this: https://bricks.stackexchange.com/q/16086/3631
I'm seconding the idea of kyrodes above about a feature based, step by step brick identification system too, that would be an enormous help in answering such questions.
Just out of curiosity, and pardon me if someone else has thought of this already, but have you considered using an item recognition program? I've seen android apps that use the phone camera and scan for matches (works well with plants and insects) and this would certainly help with your database. The user just takes a snapshot of their Legos and then gets the results. This even solves your entire tagging problem.
@EricJC
This was my first idea, and what I initially attempted, but I wasn't able to get it working, which is why I went to searching via drop-downs as a backup.
@sklamb:
I still trust Bricklink inventories more. Brickset inventories are subject to post-editing. If a part is permanently retired, they will often sub in a different part in inventories for retired sets, which automatically replaces the correct inventory on Brickset with one that is anachronistic.
Back in the late 1990s, I entered my second Dark Age and had put some assembled sets into a plastic tub for storage. Most sets had been put into ZipLock bags and those were put into boxes or at least into tubs with boxes separate...but this big tub must have been packed in a hurry. Five moves later (including going cross country TWICE), I opened the tub and, as expected, it was all broken down into small pieces. I don't recall which sets were in this tub, but I do have a general idea of which ones it could be. I would love to use a tool like this to narrow it down to start building them. ...and every one I built would likely make it easier to build the remaining ones.
I often buy from Goodwill, Yard Sales, etc, and would certainly use this wonderful tool!
Perhaps the awesome gang at Brickset could host it?
@Lego_mini_fan said:
I subbed!
Thank you, that is the best news I've heard all day!
I think what would be most useful is a way of identifying old, long-discontinued bricks. I already figured out pretty much everything I have from my childhood collection (it helped that I still had many of the instructions and some sets were built or at least partially built). However, my biggest challenge was not identifying the sets, but identifying the parts.
For whatever reason, Brickset seems to use non-standard part numbers for stuff that has been out of production since the 80’s. Bricklink, however, seems to be better with this. But regardless, there is no good way of identifying bricks. I have some fairly unique stuff, but my biggest problem was searching for it. For example, a helicopter blade construction with so many blades built on top of a certain sized brick and in a certain color. Entering information like that should narrow down the possibilities very easily, yet there is nothing available that allows you to do this.
Next step: add a conveyor belt, drop the mystery box contents on it, add a camera to scan the bricks and use AI to solve the sets!
I would find that very useful, so yes, I am interested. Web site or an app would be good.
Oh yes...got a massive bag of lego from a friend with LOTS of parts, some look really old...so keen to try this software too
I would have loved to have this 4 years ago. Around the time my kid was born I bought a giant lot of over 60KG containing over 100 sets, with about a third of the instructions included. I spent days and days trying to identify which sets were in it. Just going through the pile and adding the numbers of the rarer/special pieces to an app that would return which sets I likely have could have saved me a lot of time.
Great idea