this post was submitted on 13 Jun 2024

TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 1 year ago

MODERATORS

dgerard@awful.systems

using GitHub CoPilot leads to the obvious consequence (archive.is)

submitted 5 months ago by dgerard@awful.systems to c/techtakes@awful.systems


[–] Imacat@lemmy.dbzer0.com 5 points 5 months ago (4 children)

UUIDs make great primary keys in some applications. If you generated 100 trillion UUID4s, there’s about a 1 in a billion chance of finding a duplicate. That's usually good enough for my databases.
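
The 1-in-a-billion figure can be sanity-checked with the usual birthday approximation. This is only a rough sketch: it assumes all 122 random bits of a version 4 UUID are uniformly random, and `collision_probability` is a name made up for this example.

```python
import math


def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate chance of at least one duplicate among n random IDs.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2 * space)).
    """
    space = 2 ** random_bits
    # expm1 keeps precision when the exponent is tiny, as it is here.
    return -math.expm1(-n * (n - 1) / (2 * space))


p = collision_probability(10 ** 14)  # 100 trillion UUID4s
print(f"collision probability: {p:.2e}")
```

The result comes out a little under 1e-9, which matches the "about 1 in a billion" claim.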

The issue here was that they used a single UUID instead of generating a new one for each record.
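
The article's code is not quoted in this thread, so the following is only a sketch of that kind of bug, assuming the default was computed once when the model was defined. `Column` here is a hypothetical stand-in written for this example, not SQLAlchemy's class.

```python
import uuid


class Column:
    """Hypothetical stand-in for an ORM column whose default is a value or a callable."""

    def __init__(self, default):
        self.default = default

    def new_value(self):
        # A callable default is invoked per record; a plain value is reused as-is.
        return self.default() if callable(self.default) else self.default


# Buggy: uuid4() runs once, right here, and every record shares the result.
buggy = Column(default=str(uuid.uuid4()))

# Fixed: the lambda runs each time a record needs an ID.
fixed = Column(default=lambda: str(uuid.uuid4()))

print(buggy.new_value() == buggy.new_value())  # True: same ID every time
print(fixed.new_value() == fixed.new_value())  # False: fresh ID per record
```

With a unique constraint on the column, the buggy version would let the first insert through and reject every one after it.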

[–] gnomicutterance@awful.systems 18 points 5 months ago

There are countless issues here. They didn't do exception handling, they used a string to store their UUIDs (even if this was a DB constraint, you use sqlalchemy.Uuid and let the ORM and DB handle the translation), and as the person you're replying to stressed, they're using non-monotonic UUIDs. Also if you have a unique user_id and you're never exposing your primary keys, you don't need to get fancy, just let the ORM handle it with auto-incrementing, for most use cases. And so many other tragic things about this one tiny blog post.

tl;dr if you're going to copy code you don't understand, copy it from the docs, not from everything in the kitchen thrown into a blender.

[–] mawhrin@awful.systems 13 points 5 months ago* (last edited 5 months ago) (2 children)

they also stored this thing as a fucking string. looking up strings is costly.
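
A small sketch of what the string form costs, using only the standard `uuid` module. The specific UUID value is arbitrary, and how a given database actually stores or compares these is outside what this shows.

```python
import uuid

u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

as_text = str(u)     # canonical hex-and-dashes form
as_bytes = u.bytes   # the underlying 128-bit value

print(len(as_text), len(as_bytes))  # 36 16

# The text form also has more than one spelling for the same ID.
shouty = as_text.upper()
print(shouty == as_text)       # False: different strings
print(uuid.UUID(shouty) == u)  # True: same UUID
```

So a string column carries more than twice the bytes per key and has to worry about case, while a native UUID type compares the 16-byte value directly.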

[–] froztbyte@awful.systems 13 points 5 months ago (1 children)

naw bro we've got indexes bro it'll be fine bro

[–] froztbyte@awful.systems 10 points 5 months ago

can't wait for the Clever Idea to offload costly string indices to an external source composed of a redis box and some shitfuck app doing tf-idf after an Extensive Research into how to make string lookups be faster

[–] ChairmanMeow@programming.dev 3 points 5 months ago (4 children)

This sounds like a case of premature optimization to me. We have plenty of databases using strings as Ids and they're all more than fast enough for any of our purposes. And that's with considerable volume going through.

I've never seen bad performance from string ids be an issue.

[–] sinedpick@awful.systems 13 points 5 months ago (2 children)

so we're calling "not doing pointless unnecessary work" premature optimization now? cool cool

[–] 200fifty@awful.systems 13 points 5 months ago

Making me learn how to do things the right way is premature optimization

[–] ChairmanMeow@programming.dev 2 points 5 months ago (1 children)

Have you never worked on a large distributed system before? There are good reasons not to use integer ids:

Generating ids on multiple machines is a hassle and requires careful configuration, which is not necessary when using uuid/cuid2 or something.
Ids have to be generated by the database, which is a huge degradation of overall system performance. If the application can generate the id, then the database insert is not a blocking operation anymore and you can just continue.
The performance difference is highly negligible, as it's massively outweighed by fetching rows for example. With a proper database design, the difference is anywhere between 1-5%. If that makes the difference for your application, you've already made poor design decisions elsewhere that are far more important.

We use prefixed incrementing base63 uuids. It's highly performant and we can generate it in the application, saving a lot of time in many processes because we don't have to wait for the database anymore.

I'm sure doing int indexes over strings was once considered the gold standard but that's not been true for years now. Yes, it's slightly better for database performance. No, it's not better overall for a slew of reasons, including system performance.

[–] self@awful.systems 6 points 5 months ago

ok shut up now

[–] ebu@awful.systems 11 points 5 months ago (2 children)

"what are you talking about? a hammer removes bolts just fine. i personally don't have an issue with the tiny bit of extra elbow grease to wedge the claw around the bolt-head and twist; if anything, it's saving me effort from having to use a wrench."

[–] froztbyte@awful.systems 2 points 5 months ago

I really should reread that sometime

Also some Mickens

[–] ChairmanMeow@programming.dev 1 points 5 months ago

See https://programming.dev/comment/10515517 There's good reasons to use something like a uuid over integers.

[–] mawhrin@awful.systems 7 points 5 months ago

why is it always programming dot dev?

[–] V0ldek@awful.systems 7 points 5 months ago

> I’ve never seen bad performance from string ids be an issue.

You haven't seen shit then, simple as that.

[–] froztbyte@awful.systems 11 points 5 months ago (1 children)

You’re missing the entire point of the post you replied to

[–] Imacat@lemmy.dbzer0.com 5 points 5 months ago (2 children)

I was reading it as an endorsement for autoincrementing int primary keys and a condemnation of uuids in general which is a genuine stance I’ve known people to take. Is that not it?

[–] froztbyte@awful.systems 9 points 5 months ago (1 children)

indeed, that is not it

hint: don't try to "read in" any extra meanings. just read the actual statement that was posted.

[–] froztbyte@awful.systems 8 points 5 months ago (1 children)

second hint: throw "monotonic UUIDs" into your search engine of choice

[–] Imacat@lemmy.dbzer0.com 3 points 5 months ago (1 children)

Would they not have monotonic uuids after altering the code in the article to use a function or lambda as they suggested?

[–] ebu@awful.systems 7 points 5 months ago* (last edited 5 months ago) (2 children)

~~you might know what "monotonic" means if you had googled it, which would also give you the answer to your question~~

edit: this was far too harsh of a reply in retrospect, apologies. the question is answered below, but i'll echo it: a "monotonic UUID" is one that numerically increases as new UUIDs are generated. this has an advantage when writing new UUIDs to indexed database columns, since most database index structures are more efficient when inserting at the end than at a random point (which is what happens with non-monotonic UUIDs).
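
To make the "inserting at the end" point concrete, here is a rough sketch. `uuid7_like` is a hand-rolled, hypothetical UUIDv7-style generator (millisecond timestamp in the top 48 bits, random bits after), and a sorted Python list stands in for a database index, which is a big simplification of a real B-tree.

```python
import bisect
import os
import uuid


def uuid7_like(unix_ms: int) -> uuid.UUID:
    """Hypothetical UUIDv7-style ID: 48-bit millisecond timestamp first, random bits after."""
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80     # timestamp in the most significant bits
    value |= 0x7 << 76                            # version nibble
    value |= rand_a << 64
    value |= 0b10 << 62                           # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


def insert_positions(keys):
    """Insert keys into a sorted list; return (position, size_before) for each insert."""
    index, positions = [], []
    for key in keys:
        pos = bisect.bisect_left(index, key)
        positions.append((pos, len(index)))
        index.insert(pos, key)
    return positions


timestamps = range(1_700_000_000_000, 1_700_000_000_100)  # 100 distinct milliseconds
monotonic = insert_positions([uuid7_like(ms) for ms in timestamps])
random_v4 = insert_positions([uuid.uuid4() for _ in timestamps])

at_end_v7 = sum(pos == size for pos, size in monotonic)
at_end_v4 = sum(pos == size for pos, size in random_v4)
print(f"v7-style: {at_end_v7}/{len(monotonic)} inserts landed at the end")
print(f"v4:       {at_end_v4}/{len(random_v4)} inserts landed at the end")
```

Every v7-style insert is an append, while the v4 inserts scatter across the list. Note the sketch feeds each ID a distinct timestamp; two IDs made in the same millisecond would fall back to random ordering unless the generator adds a counter.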

[–] Imacat@lemmy.dbzer0.com 4 points 5 months ago (2 children)

I’ve more of a math background than cs so monotonic is a word I know well but it apparently means something slightly different to me. Monotonicity isn’t mentioned anywhere in that link.

[–] gnomicutterance@awful.systems 7 points 5 months ago (2 children)

okay, for some reason, I feel the need to help.

The given link defines the function that creates a UUID:

> uuid.uuid4(): Generate a random UUID.

In mathematics, can you generate a monotonic function by generating random numbers?

[–] Imacat@lemmy.dbzer0.com 6 points 5 months ago

Thanks for trying to explain it. I was hung up on thinking all UUIDs looked like UUID v4. I read up a little on UUID v7 and it’s making sense. Probably should’ve done that sooner.

[–] ebu@awful.systems 5 points 5 months ago

you are probably a better person than i am for actually giving an explanation

[–] froztbyte@awful.systems 6 points 5 months ago

just stop digging, sheesh

[–] JackbyDev@programming.dev 4 points 5 months ago* (last edited 5 months ago) (3 children)

Everything after this is so pointlessly condescending and confusing. Even if someone knows what monotonic ids are it doesn't automatically mean they're going to have any clue about what that means with regards to index performance. In the spirit of not being an asshole, I'll write it out here based on my research since everyone else just seems interested in putting others down rather than being helpful.

"Monotonic" implies something that is always increasing (or decreasing). You'll never get a result that's lower than one you've gotten before (or higher if you're dealing with monotonically decreasing stuff).
Random UUIDs are not monotonic because they're random.
Even time-based UUIDs are not monotonic because of the format. Rather than being stored high, medium, low, they're stored low, medium, high. Think of it like storing 321 as "1 20 300"; 322 would be "2 20 300". To make it worse, the end of them is "random" (a MAC address). So, not monotonic at all, because MAC addresses can change. (See here for proposed new formats, where they mention this as a problem: https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html)
Monotonic primary keys are useful because they're more easily inserted into an index: you're always inserting into one specific part of the index rather than at random points.
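
The low, medium, high layout of time-based (version 1) UUIDs can be shown with the standard `uuid` module. `uuid1_like` is a hypothetical helper written for this sketch; it builds a version 1 UUID from an explicit timestamp instead of the clock, with the clock sequence and node fixed at zero.

```python
import uuid


def uuid1_like(timestamp: int, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Build a version 1 UUID from an explicit 60-bit timestamp (hypothetical helper)."""
    time_low = timestamp & 0xFFFFFFFF                          # lowest 32 bits come FIRST
    time_mid = (timestamp >> 32) & 0xFFFF                      # then the middle 16 bits
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | (1 << 12)  # highest 12 bits plus version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


earlier = uuid1_like(0x0_FFFF_FFFF)  # low 32 bits all set
later = uuid1_like(0x1_0000_0000)    # one tick later: low bits roll over to zero

print(earlier)
print(later)
print(later.time > earlier.time)  # True: generated later
print(later < earlier)            # True: yet it sorts first
```

Because the fastest-changing bits lead, a UUID generated one tick later can sort before its predecessor, which is the problem the newer time-ordered formats set out to fix.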

[–] ebu@awful.systems 10 points 5 months ago* (last edited 5 months ago)

putting my 2¢ forward: this is a forum for making fun of overconfident techbros. i work in tech, and it is maddening to watch a massively overvalued industry buy into yet another hype bubble, kept inflated by seemingly endless amounts of money from investors and VCs. and as a result it's rather cathartic to watch (and sneer at) said industry's golden goose shit itself to death over and over again due to entirely foreseeable consequences of the technology they're blindly putting billions of dollars into. this isn't r/programming, this is Mystery Science Theater 3000.

i do not care if someone does or does not understand the nuances of database administration, schema design, indexing and performance, and different candidates for the types of primary keys. hell, i barely know just enough SQL to shoot myself in the foot, which is why i don't try to write my own databases, in the hypothetical situation where i try to engineer a startup that "extracts web data at scale with multimodal codegen", whatever that means.

if someone doesn't understand, and they come in expressing confusion or asking for clarification? that's perfectly fine -- hell, if anything, i'd welcome bringing people up to speed so they can join in the laughter.

but do not come in here clueless and confidently (in)correct the people doing the sneering and expect to walk away without a couple rotten tomatoes chucked at you. if you want to do that, reddit and hacker news are thataway.

[–] slopjockey@awful.systems 7 points 5 months ago (2 children)

Yeah, I'm all for dunking on promplets, but just being wrong about best practice isn't a big deal. The reaction here is excessively harsh.

[–] self@awful.systems 10 points 5 months ago (1 children)

agreed. we’ve veered a bit too close to slashdot’s tone on this one.

with that said, I’m also acutely aware of the tactics that programming.dev reply guys use to generate these kinds of responses. to our guests: it’s best to take your questions about database best practices literally anywhere else but here.

[–] froztbyte@awful.systems 6 points 5 months ago

> with that said, I’m also acutely aware of the tactics that programming.dev reply guys

I wasn't actually aware of this, and will be taking note of it in future. for my part I tried to make my reply "uhh go look at $x and learn" post without, y'know, overtly making things into a not-meant-for-here debate setup, but that didn't seem to have worked out entirely well :)

[–] V0ldek@awful.systems 9 points 5 months ago (1 children)

Just to be clear, if a person is wrong about best practices then it's not a big deal.

In context of spicy autocomplete as coding assistance, it better output immaculate, robust code every fucking time or we should be clowning on it with zero remorse.

[–] slopjockey@awful.systems 6 points 5 months ago

Wait a second...to err is to be human. Programmers err sometimes. ChatGPT shits itself all the time...😟. Yud et al. were right

[–] froztbyte@awful.systems 6 points 5 months ago (1 children)

Read the sidebar. This is literally not the place.

[–] JackbyDev@programming.dev 1 points 5 months ago (2 children)

The fuck is a side bar? My app doesn't have that. Be more specific, please.

[–] blakestacey@awful.systems 11 points 5 months ago

If your "app" cannot show basic information about the forum to which you are posting, your "app" is bad.

[–] self@awful.systems 10 points 5 months ago

no, programming.dev, let’s fucking not

[–] Hexarei@programming.dev 4 points 5 months ago* (last edited 5 months ago) (1 children)

They're good for large, distributed applications for sure. Better than incrementing integers for those kinds of applications at the very least.

For the folks in the article though? lol they were making no good decisions

[–] gnomicutterance@awful.systems 14 points 5 months ago* (last edited 5 months ago) (1 children)

when you do not yet have (1) customers, (B) unit tests, (ג) developers who can write their own code, or (IV) exception handling, the term-of-art that comes to mind for doing anything besides auto-incrementing primary keys is YAGNI. (Especially because nobody who is making thoughtful, careful database tuning decisions is using chat-gippity to convert their models. And more to the point, they aren't using SQLAlchemy of all things to make large, distributed applications that need UUID primary keys.)

[–] Hexarei@programming.dev 3 points 5 months ago* (last edited 5 months ago) (1 children)

Oh for sure, the article folks are inept and absolutely not the people I was talking about. I'm just talking about stuff more like Discord or Steam that are huge distributed systems that don't use centralized databases.

Edit: that don't use centralized databases. I blame the ADHD.

Edit 2: I am agreeing with this person

[–] ebu@awful.systems 9 points 5 months ago (2 children)

> I'm just talking about stuff more like Discord or Steam that are huge distributed systems that don't use databases.

huh???

[–] Hexarei@programming.dev 8 points 5 months ago (1 children)

Whoops, I flubbed that message hard and didn't catch it at the time: Meant to say "don't use centralized databases." They definitely use databases lmao. No idea how I screwed that message up so hard. I blame ADHD for not proofreading.

Just so we're on the same page, let me be more specific. I'm saying the individuals in the article were making terrible decisions. Lots of them.

I am also saying that UUIDs are good primary keys for very specific purposes: Large, distributed systems that handle large amounts of small data, powered by databases like Cassandra that are designed to handle millions of record insertions per hour across several hundred nodes, to the point where inserts are very likely to happen at the exact same time on two different replicas of the same schema.

Hope that makes more sense than my previous flub. lol

[–] ebu@awful.systems 5 points 5 months ago* (last edited 5 months ago)

okay that's a little more sensible lol

i think the original comment that this thread is in reply to is about avoiding non-monotonic UUIDs. i don't think anyone is contesting that autoincrementing ints create headaches when trying to distribute the database

[–] froztbyte@awful.systems 5 points 5 months ago (2 children)

See, reason being is they use aethernet - that’s the only way you get to scale it like this. Without that, communication and storage would just be impossible!

[–] Hexarei@programming.dev 8 points 5 months ago (1 children)

I accidentally a word in the original comment, it was supposed to say they don't use *centralized databases. Instead it said I'm a moron lmao.

[–] froztbyte@awful.systems 5 points 5 months ago

Bit of a whoopsie, that :)

[–] froztbyte@awful.systems 6 points 5 months ago

And I just saw what that poster’s domain is, fuck me