Here's a problem that sounds simple and isn't: how much does a single skin add to a Fortnite account?
You can't just look it up, because skins aren't sold on their own. They come attached to whole accounts. Every price you see in the wild is the price of a bundle — hundreds of cosmetics, a platform, an account level, an email status — all rolled into one number. Nobody ever lists "one Black Knight, $50."
So if you want to know what Black Knight is worth by itself, you have to work backwards from thousands of bundles. That's exactly what our current model does. Recently we went a step further and explored a very different way to do it — a method called hierarchical Bayesian inference — and the early results looked promising enough that we wanted to share what we learned. This post explains what that is — no statistics degree required — and shows why some well-known skins moved quite a lot in our prototype.
To be upfront: this is an exploration, not a shipped change. We built a prototype to see whether the approach holds up. We haven't rolled it out — the values you see on the site today still come from our existing model. If the experiment keeps looking this good, putting it live is the natural next step.
The grocery receipt problem
Imagine you've got a stack of thousands of grocery receipts. Each receipt shows a basket of items and one total at the bottom. None of them lists individual prices. Can you figure out the price of milk?
Surprisingly, yes. If one receipt has milk, bread, and eggs for $9, and another has bread and eggs for $6, then milk is about $3. Do that across thousands of overlapping receipts and the individual prices fall out.
Fortnite accounts are grocery receipts. The "items" are skins, the "total" is the asking price, and we're trying to recover the price of each item from how the totals change as the baskets change.
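For the curious, here's the receipt puzzle as a tiny least-squares solve in Python. It's a minimal sketch, not our production code. `solve_least_squares` is a hypothetical helper, and the bread-only receipt is an extra assumption added so that every item's price is pinned down.

```python
def solve_least_squares(rows, totals):
    """Least-squares item prices for 0/1 basket rows, via the normal equations.

    Illustrative helper (hypothetical, not production code). Builds the augmented
    matrix [A^T A | A^T b] and solves it with Gauss-Jordan elimination.
    Assumes A^T A is invertible, i.e. every item can be told apart.
    """
    n = len(rows[0])
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, totals))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(n):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [a - factor * b for a, b in zip(m[k], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


items = ["milk", "bread", "eggs"]
receipts = [
    ([1, 1, 1], 9.00),  # milk + bread + eggs
    ([0, 1, 1], 6.00),  # bread + eggs
    ([0, 1, 0], 2.50),  # bread alone (assumed extra receipt)
]
rows = [basket for basket, _ in receipts]
totals = [total for _, total in receipts]
prices = dict(zip(items, solve_least_squares(rows, totals)))
print(prices)  # {'milk': 3.0, 'bread': 2.5, 'eggs': 3.5}
```

With thousands of receipts the same function simply gets more rows; least squares finds the prices that fit all the totals best at once.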
Why our current model double-counts
Our current model handles each skin one at a time. To value a skin, it compares accounts that have it against accounts that don't, and chalks up the difference to that skin.
That works fine for milk. It breaks for items that always travel together.
Think about it: if every receipt with caviar also has champagne, you can never tell whether the high total is because of the caviar or the champagne. The one-at-a-time method blames both: it hands the full premium to caviar, then turns around and hands the same premium to champagne. The premium gets counted twice.
Fortnite is full of "caviar and champagne" sets. The Season 2 Battle Pass skins — Black Knight, Blue Squire, Royale Knight, Sparkle Specialist — show up together constantly, because anyone who finished that pass got all of them. The current model sees an expensive account, notices Sparkle Specialist on it, and credits Sparkle Specialist with a big chunk of value that actually belongs to Black Knight sitting right next to it.
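Here's a toy version of that failure, assuming the simplest possible one-at-a-time rule: mean price with the skin minus mean price without it. The accounts and dollar figures are made up. We assume Black Knight adds $50, Sparkle Specialist adds nothing, and the two always appear together. The current model is more involved than this, so treat it as an illustration of the trap rather than a reproduction.

```python
from statistics import mean

# Toy accounts as (skins on the account, asking price). Assumed truth: a $20
# base, Black Knight adds $50, Sparkle Specialist adds $0, and the two skins
# always appear together, as they would for anyone who finished the pass.
accounts = (
    [({"Black Knight", "Sparkle Specialist"}, 70.0)] * 5
    + [(set(), 20.0)] * 5
)


def one_at_a_time_value(skin, accounts):
    """Credit a skin with (mean price of accounts with it) - (mean price without it)."""
    with_skin = [price for skins, price in accounts if skin in skins]
    without_skin = [price for skins, price in accounts if skin not in skins]
    return mean(with_skin) - mean(without_skin)


for skin in ("Black Knight", "Sparkle Specialist"):
    print(skin, one_at_a_time_value(skin, accounts))
# Black Knight 50.0
# Sparkle Specialist 50.0
```

Both skins get the full $50, so the rule hands out $100 of credit for a $50 premium.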
The idea: estimate every skin at once
The approach we explored doesn't go skin by skin. It solves for every skin's value simultaneously, the way you'd solve a giant system of equations — each account is one equation, and all the skin prices have to add up consistently across all of them at the same time.
Now the set members have to compete to explain the price. If Black Knight and Sparkle Specialist always appear together, the model figures out which one actually moves the total by looking at the rarer accounts where they appear apart — and it splits the credit honestly instead of paying it twice.
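A minimal sketch of the joint approach on made-up data: most accounts have both Season 2 skins or neither, and just two accounts hold one without the other. Fitting all the values together by ordinary least squares (a simple stand-in here for the full Bayesian fit) lets those two accounts settle the split. `solve_least_squares` is a hypothetical helper, and the assumed true values are a $20 base, +$50 for Black Knight, and $0 for Sparkle Specialist.

```python
from statistics import mean


def solve_least_squares(rows, totals):
    """Least-squares coefficients via the normal equations (hypothetical helper).

    Gauss-Jordan elimination on [A^T A | A^T b]; assumes A^T A is invertible.
    """
    n = len(rows[0])
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, totals))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(n):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [a - factor * b for a, b in zip(m[k], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Columns: [base, Black Knight, Sparkle Specialist]. Toy data built from an
# assumed truth of base $20, Black Knight +$50, Sparkle Specialist +$0.
data = (
    [([1, 1, 1], 70.0)] * 10     # both skins (the common case)
    + [([1, 0, 0], 20.0)] * 10   # neither skin
    + [([1, 1, 0], 70.0)]        # Black Knight alone (rare)
    + [([1, 0, 1], 20.0)]        # Sparkle Specialist alone (rare)
)
rows = [row for row, _ in data]
totals = [price for _, price in data]

# Every account is one equation; all values are solved at the same time.
base, black_knight, sparkle = solve_least_squares(rows, totals)


def one_at_a_time(col):
    """Mean price with the skin minus mean price without it (the double-counting rule)."""
    with_skin = [price for row, price in data if row[col]]
    without_skin = [price for row, price in data if not row[col]]
    return mean(with_skin) - mean(without_skin)


print(f"joint:         base={base:.2f}  BK={black_knight:.2f}  SS={sparkle:.2f}")
print(f"one-at-a-time: BK={one_at_a_time(1):.2f}  SS={one_at_a_time(2):.2f}")
```

On this toy data the joint fit recovers the assumed values (up to floating-point rounding), while the one-at-a-time rule still credits Sparkle Specialist with roughly $41.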
This single change is the heart of the idea. Here are the three pieces that make it work:
What "Bayesian" actually means here
The word Bayesian sounds intimidating. The idea behind it is something you already do every day: you combine what you expected with what you observed.
Say a brand-new restaurant has two five-star reviews. Do you believe it's the best restaurant in town? Of course not — two reviews isn't enough. You mentally blend those two glowing reviews with your general sense of what a typical new restaurant is like, and you land somewhere in the middle. As more reviews come in, you lean more on the actual reviews and less on your prior expectation.
That blending is Bayesian reasoning, and it suits skins well, because the amount of data we have per skin is wildly uneven. Some skins appear on huge numbers of accounts; others show up only a handful of times. The famous ones are easy. The long tail is where naive methods fall apart and start reporting nonsense from tiny samples.
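The restaurant example can be written as a precision-weighted average: the expectation and the reviews each count in proportion to how certain they are (one over their variance). This sketch assumes normally distributed ratings with a known spread, and the 3.5-star expectation and both spreads are made-up numbers.

```python
def blend(prior_mean, prior_sd, ratings, rating_sd):
    """Combine an expectation with observed ratings, weighting each by its precision (1/variance).

    Assumes a normal prior and normally distributed ratings with known spread.
    """
    prior_weight = 1 / prior_sd**2
    data_weight = len(ratings) / rating_sd**2  # more reviews -> more weight
    data_mean = sum(ratings) / len(ratings) if ratings else 0.0
    return (prior_weight * prior_mean + data_weight * data_mean) / (prior_weight + data_weight)


print(blend(3.5, 0.5, [5, 5], 1.0))    # 4.0: two glowing reviews only move us partway
print(blend(3.5, 0.5, [5] * 50, 1.0))  # about 4.89: fifty reviews mostly override the expectation
```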
The "hierarchical" part: rare skins borrow strength
Here's the clever bit, and it's where "hierarchical" comes in.
When a skin is rare, we don't guess its value from its handful of sales alone. We let it borrow strength from similar skins — ones from the same rarity and the same era of the game. If we've barely seen a particular rare 2018 skin, but we have seen plenty of other rare 2018 skins, the model starts from "what's a rare 2018 skin usually worth?" and then nudges that estimate based on the little data it does have.
It's the restaurant logic again, one level up: a new restaurant with few reviews gets judged partly by how restaurants in its neighbourhood tend to do. Skins live in neighbourhoods too — rarity, release season, whether they're an OG skin — and rare skins lean on their neighbours.
The payoff is that every skin gets a sensible value, not just the popular ones. Skins too rare for our current model to score at all get a reasonable estimate in the prototype, clearly labelled as lower-confidence.
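A sketch of where a barely-seen skin starts out, assuming neighbourhoods are simply (rarity, release year) pairs and that we already have values for some well-observed skins. The skin names and log-dollar values are invented. A full hierarchical model estimates these group levels jointly with everything else rather than by plain averaging, so this only shows the shape of the idea.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical well-observed skins: (name, rarity, release year, value in log-dollars).
known_skins = [
    ("Skin A", "rare", 2018, 3.2),
    ("Skin B", "rare", 2018, 2.8),
    ("Skin C", "rare", 2018, 3.0),
    ("Skin D", "epic", 2021, 1.4),
    ("Skin E", "epic", 2021, 1.0),
]


def neighbourhood_priors(skins):
    """Average value per (rarity, year) neighbourhood: 'what's a rare 2018 skin usually worth?'"""
    groups = defaultdict(list)
    for _name, rarity, year, value in skins:
        groups[(rarity, year)].append(value)
    return {key: mean(values) for key, values in groups.items()}


def starting_point(rarity, year, priors, overall):
    """Where a barely-seen skin starts: its neighbourhood average, else one level up."""
    return priors.get((rarity, year), overall)


priors = neighbourhood_priors(known_skins)
overall = mean([value for *_, value in known_skins])

print(starting_point("rare", 2018, priors, overall))  # the rare-2018 average
print(starting_point("epic", 2019, priors, overall))  # no such group: overall average
```

The skin's own handful of sales would then nudge it away from this starting point.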
The model in one picture
The whole idea — borrowing strength, honest ranges — comes down to one movement: we start from a wide guess about all skins, narrow it to skins of the same kind, then narrow it again to the single skin in front of us. At each step the average sharpens and the spread shrinks. Here's that movement drawn out:
If you only take away one thing, take the shape: a rare skin's value isn't read off its own handful of sales in isolation. It starts from what its group is worth, and only moves away from that group average as far as its own data justifies — a little for a skin we've barely seen, a lot for one we've seen thousands of times. That "pull toward the group" is exactly what keeps the long tail sane.
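One way to write that picture down as equations, assuming a simple normal hierarchy with the controls and heavy-tailed prices described below (platform and marketplace terms omitted for brevity). The symbols are our own shorthand, not the prototype's notation: $\mu_0$ is the all-skins level, $\mu_g$ a neighbourhood's level, $\theta_s$ one skin's value, $\delta_s$ its personal adjustment, and $n_a$ the skin count of account $a$.

$$
\begin{aligned}
\mu_g &\sim \mathcal{N}\!\left(\mu_0,\ \sigma_{g}^2\right)
  && \text{wide: what a group of skins is worth} \\
\theta_s &= \mu_{g[s]} + \delta_s, \qquad \delta_s \sim \mathcal{N}\!\left(0,\ \tau^2\right)
  && \text{narrower: one skin within its group} \\
\log p_a &\sim t_\nu\!\Big(\alpha + \beta \log n_a + \textstyle\sum_{s \in a} \theta_s,\ \sigma\Big)
  && \text{each account's log price}
\end{aligned}
$$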
A closer look at each layer
For readers who want the actual machinery, here's what each layer is doing and why we built it that way.
The controls come first. Before any skin gets credit, the model accounts for the boring stuff that moves price regardless of which skins are present: how big the locker is (more cosmetics, higher price), the platform, and the marketplace. We feed in the log of the skin count rather than the raw number, because going from 10 to 20 skins matters far more than going from 510 to 520. Stripping out these effects first is what stops a skin from looking valuable simply because it tends to sit in large, expensive lockers.
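A sketch of the control columns for one account, assuming a hypothetical platform list whose first entry is the baseline; a marketplace indicator would be built the same way. The log is what makes the size effect taper off.

```python
import math

PLATFORMS = ("pc", "playstation", "xbox", "switch")  # hypothetical category list


def control_row(skin_count, platform):
    """Control features for one account: intercept, log locker size, platform one-hot.

    The first platform is the baseline, so it gets no column (this avoids
    collinearity with the intercept).
    """
    if skin_count < 1:
        raise ValueError("skin_count must be at least 1 to take a log")
    row = [1.0, math.log(skin_count)]
    row += [1.0 if platform == p else 0.0 for p in PLATFORMS[1:]]
    return row


# Doubling a small locker moves the size feature far more than +10 on a big one.
small_jump = math.log(20) - math.log(10)
big_jump = math.log(520) - math.log(510)
print(round(small_jump, 3), round(big_jump, 3))  # 0.693 0.019
```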
Each skin's value is a small nudge on a group expectation. This is the partial-pooling trick, written out. A skin's value is its neighbourhood prior plus a personal adjustment, and how far skins are allowed to stray from their group is itself something the model learns. Skins with thousands of sales can earn a large personal adjustment and drift as far from their group as their data supports. Skins with three sales get an adjustment so small they stay close to the group. A skin with zero sales simply is its group prediction. Nobody has to choose a cutoff for "enough data": the math slides smoothly from "trust the group" to "trust the skin" as evidence accumulates.
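The slide from "trust the group" to "trust the skin" can be made concrete with the standard normal-normal shrinkage weight, n·τ² / (n·τ² + σ²). This is a simplified sketch that treats τ (how far skins typically sit from their group) and σ (how noisy one sale is) as already known; in a full hierarchical fit τ is estimated from the data. The values τ = 0.3 and σ = 1.0 are arbitrary.

```python
def shrinkage_weight(n_sales, tau, sigma):
    """Fraction of the way a skin moves from its group prior toward its own sales average.

    tau:   how far skins typically sit from their group (learned from data in a real fit)
    sigma: how noisy a single sale is around the skin's true value
    """
    return n_sales * tau**2 / (n_sales * tau**2 + sigma**2)


def pooled_value(group_prior, own_mean, n_sales, tau, sigma):
    """Group prior plus a personal adjustment scaled by the shrinkage weight."""
    return group_prior + shrinkage_weight(n_sales, tau, sigma) * (own_mean - group_prior)


for n in (0, 3, 30, 3000):
    print(n, round(shrinkage_weight(n, tau=0.3, sigma=1.0), 3))
```

With these numbers the weight is 0 with no sales, about 0.21 at three sales, and above 0.99 at three thousand, with no cutoff anywhere.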
Prices are heavy-tailed, and the model is built to expect that. A handful of accounts sell for hundreds or thousands of dollars — orders of magnitude above the typical listing. A naive model treats those as catastrophic errors and bends the whole fit trying to chase them, distorting everyone else's value in the process. We instead draw prices from a heavy-tailed bell curve (a Student-t), which treats the occasional blockbuster account as expected rather than shocking. The extreme sales stop dragging the ordinary skins around.
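One way to see why a Student-t tames blockbuster accounts is to compare how hard a single residual pulls on the fit (the derivative of its negative log-likelihood). Under a normal curve the pull grows in proportion to the miss; under a Student-t it rises and then fades. The choice of ν = 3 degrees of freedom is an assumption for illustration.

```python
def normal_influence(residual, sigma=1.0):
    """Pull of one residual under a normal likelihood: grows without bound."""
    return residual / sigma**2


def student_t_influence(residual, nu=3.0, sigma=1.0):
    """Pull of one residual under a Student-t likelihood: rises, then fades for extreme misses."""
    return (nu + 1) * residual / (nu * sigma**2 + residual**2)


for r in (1, 2, 10):
    print(f"residual {r:>2}: normal pull {normal_influence(r):5.2f}, "
          f"Student-t pull {student_t_influence(r):5.2f}")
```

At a residual of 10 the normal pull is 10, while the Student-t pull has dropped to about 0.39, below its value at a residual of 2.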
The whole thing is one formula. Because every layer is linear on the log-price scale, the finished model isn't a black box you have to query — it's a closed-form equation. That's a nice property: it would let a tool like our locker pricer score a basket of skins instantly in the browser, just by adding up a base, a size term, and the value of each skin, then converting back from log-dollars. The heavy lifting happens once, when the model is fit; scoring an account afterwards is plain arithmetic.
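Here's what scoring could look like once the coefficients are fit. Every number and name below is hypothetical; the point is that pricing a basket is just a sum on the log scale followed by one `exp`.

```python
import math

# Hypothetical fitted coefficients, all on the log-dollar scale (illustrative only).
BASE = 1.5
SIZE_COEF = 0.4
SKIN_VALUES = {"Black Knight": 0.35, "Wildcat": 0.6, "Sparkle Specialist": 0.02}


def score_account(skin_count, skins):
    """Closed-form price: base + size term + each skin's value, then back to dollars."""
    log_price = BASE + SIZE_COEF * math.log(skin_count)
    log_price += sum(SKIN_VALUES[skin] for skin in skins)  # KeyError for unknown skins
    return math.exp(log_price)


print(round(score_account(150, ["Black Knight", "Wildcat"]), 2))
```

Because the sum happens on the log scale, each skin's term multiplies the final dollar price by e raised to its value.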
What we found — and why some skins moved
In our prototype, once skins had to compete instead of double-dipping, the rankings reshuffled in ways that line up with what the community already believes:
| Skin | Current model | Bayesian prototype | Why it moved |
|---|---|---|---|
| Wildcat | +$15 | +$101 | Standalone exclusive — was badly undervalued |
| IKONIK | +$47 | +$72 | Standalone exclusive — credit no longer split |
| Galaxy | +$29 | +$45 | Standalone exclusive |
| Sparkle Specialist | +$59 | +$4 | Always travels with bigger Season 2 skins |
| Royale Knight | +$39 | +$1 | Same Season 2 set — credit was double-counted |
| Elite Agent | +$21 | +$1 | Same Season 3 set |
| Astro Jack | +$23 | ~$0 | Rides alongside Travis Scott |
Two patterns jump out. Set skins fell — Sparkle Specialist, Royale Knight, and Elite Agent dropped toward zero, because their value was never really theirs; it belonged to the headline skins they ride alongside. Standalone exclusives rose — Wildcat, IKONIK, and Galaxy went up, because they don't come bundled with a famous set, so once the double-counting stopped, they kept all of their (substantial) credit.
Wildcat is the headline surprise. Accounts carrying it sell for a lot even after you account for how stuffed those lockers tend to be — it's a region-locked console exclusive, genuinely rarer than most people assume. The current model buries that signal. Here's what the prototype put at the top of the table:
Every value now comes with a range
This is the part we're most excited about. Our current model gives you a single number and no sense of how sure it is. A "+$50" backed by thousands of sales looks identical to a "+$50" backed by five.
The Bayesian approach reports a confidence range for every skin. When there's lots of data, the range is tight. When there isn't, it's honestly wide — a signal to take the number with a grain of salt rather than a false promise of precision.
One important note on what that range means: it's the uncertainty about a skin's typical contribution across the market — not a prediction that your specific account will sell in exactly that window. Any single account swings much more than that, depending on everything else in the locker. For a whole-account estimate, the calculator is still the right tool.
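Bayesian fits usually report a range by summarising draws from the posterior. This sketch fakes those draws with random normal samples (a stand-in for real sampler output; the centres and spreads are invented) and takes the 5th and 95th percentiles as a 90% range.

```python
import random
from statistics import quantiles


def ninety_percent_range(draws):
    """5th and 95th percentiles of posterior draws: a central 90% credible range."""
    cuts = quantiles(draws, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


rng = random.Random(0)
# Stand-ins for sampler output (invented): both skins centre on +$50, but the
# rarely seen skin's posterior is far more spread out because it has little data.
well_observed = [rng.gauss(50, 3) for _ in range(4000)]
rarely_seen = [rng.gauss(50, 25) for _ in range(4000)]

for label, draws in (("well observed", well_observed), ("rarely seen", rarely_seen)):
    lo, hi = ninety_percent_range(draws)
    print(f"{label:>13}: ${lo:.0f} to ${hi:.0f}")
```

The well-observed skin gets a range roughly $10 wide; the rarely seen one's is roughly $80 wide, even though both centre on $50.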
Why we focus on one marketplace
A quieter change matters just as much: we now build this model on a single marketplace rather than blending several.
Different marketplaces are different worlds — different buyers, different price levels, different habits in how listings get written. Pooling them and hoping a single adjustment evens it out doesn't make the prices truly comparable; it just smears two distributions together. In our experiments, training on one consistent marketplace actually made the prototype more accurate, not less — it explained roughly 79% of the variation in account prices, with a typical miss of only single-digit dollars on the skins where we have real data.
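For reference, here's how those two kinds of numbers are typically computed. We assume "typical miss" means median absolute error; R² is the standard 1 − SS_res / SS_tot. The four data points are toy values.

```python
from statistics import mean, median


def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def typical_miss(actual, predicted):
    """Median absolute error: a 'typical' miss that a few huge outliers can't inflate."""
    return median([abs(a - p) for a, p in zip(actual, predicted)])


actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]
print(round(r_squared(actual, predicted), 3), typical_miss(actual, predicted))
```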
Being honest about the limits
No model that reads whole-account prices can perfectly isolate every skin, and we'd rather say so:
- Some skins are genuinely inseparable. A few cosmetics always appear together and never apart. The model can't fully split those — and the wide ranges on them are the honest signal that it can't.
- Listings are asking prices, not confirmed sales. They track the market closely, but they're what sellers hope to get.
- The rare tail leans on the prior. A skin we've barely seen is mostly being estimated from its neighbours. That's far better than a wild guess from three data points — but it's not the same as deep evidence, which is exactly why we show the range.
What's next
To be clear about where this stands: it's a prototype, and it isn't live yet. The values on our per-skin pages, the calculator, and the locker pricer all still come from our current model — nothing on the site has changed. This was an experiment to see whether the hierarchical Bayesian approach could fix the double-counting problem, value the long tail of rarer skins the old approach couldn't reach, and attach an honest confidence range to every number. So far it's done all three, which is why we're optimistic about it.
If the results keep holding up, the natural next step is to roll these values into the live per-skin pages and pricing tools. We'll share more as we go.
The big idea is simple, even if the math behind it isn't: value every skin together, let the rare ones learn from the common ones, and always be honest about how sure you are. That's the whole philosophy — applied to Fortnite lockers instead of grocery receipts.
Curious what your account is worth? Try the calculator, or browse standalone skin values on the skins pages.