this post was submitted on 04 Jan 2024

Programming

cross-posted from: https://programming.dev/post/8121843

~n (@nblr@chaos.social) writes:

This is fine...

"We observed that participants who had access to the AI assistant were more likely to introduce security vulnerabilities for the majority of programming tasks, yet were also more likely to rate their insecure answers as secure compared to those in our control group."

[Do Users Write More Insecure Code with AI Assistants?](https://arxiv.org/abs/2211.03622)

top 16 comments
[–] t3rmit3 21 points 10 months ago (3 children)

This is just an extension of the larger issue of people not understanding how AI works, and trusting it too much.

AI is, and has always been, about exchanging accuracy for speed. It excels in cases where slow, methodical work isn't given sufficient time anyway, so accuracy is already low(er) to begin with (e.g. overworked doctors examining CT scans).

But it should never be treated as the final word on something; it's the first ~70%.

[–] scrubbles@poptalk.scrubbles.tech 11 points 10 months ago

I feel like I've been screaming this for so long and you're someone who gets it. AI stuff right now is pretty neat. I'll use it to get jumping off points and new ideas on how to build something.

I would never ever push something written by it to production without scrutinizing the hell out of it.

[–] sonori 7 points 10 months ago (1 children)

Didn’t it turn out that the CT scan analysis thing was just the model figuring out the rough age of the machine, because older machines tend to be in poorer places with more cancer and are more likely to be used only for serious illnesses?

[–] ericjmorey@programming.dev 2 points 10 months ago (1 children)

If taking into account the older machines results in better healthcare, that seems like a great thing to be discovered as a result of the use of machine learning.

Your summary sounds like it may be inaccurate, but it's interesting enough for me to want to know more.

[–] sonori 4 points 10 months ago (1 children)

I believe it was from a study on detecting tuberculosis, but unfortunately Google hasn’t been very helpful for me.

The problem is that “people in poorer areas are more at risk from TB” is not a new discovery. A model intended and billed as detecting TB from a scan should ideally not be using a factor like “this hospital is old and poor” to decide whether a scan shows diseased tissue. That intrinsically means the model is more likely to miss the disease in patients at better hospitals while over-diagnosing it in poorer ones, and of course at-risk people can still go to newer hospitals.

A doctor will take risk factors into consideration, but will also know that just because their hospital got a new machine doesn’t mean their patients are now less likely to have a potentially fatal disease. The result is worse diagnoses, even if the model technically scores better on the training set.

[–] ericjmorey@programming.dev 3 points 10 months ago (1 children)

A Doctor will take risk factors into consideration

Unfortunately we see that the data doesn't support this assumption. Poor populations are not given the same attention by doctors. Black populations in particular receive worse healthcare in the US after adjusting for many factors like income and family medical history.

[–] sonori 2 points 10 months ago* (last edited 10 months ago)

It’s unfortunately not certain that they will take such measures with their patients, even though most try, and indeed ethnic discrepancies are one of the things likely to be made worse by machine learning, given how little thought or training data is devoted to them. But the age of a hospital’s machine is not a good proxy for risk factors: it might be statistically correlated with risk, but an individual patient’s risk isn’t determined by it. Less at-risk people may go to a cheaper hospital, and more at-risk people might live in a city that also has a very up-to-date hospital.

[–] ericjmorey@programming.dev 4 points 10 months ago

It's a decent first screen for pattern recognition, for sure, but its speed is where I see most of its value. It can process information that people would never get to.

[–] Artyom@lemm.ee 20 points 10 months ago (1 children)

Anyone who's going to copy and paste code that they don't understand is inherently a security vulnerability.

[–] thebardingreen@lemmy.starlightkel.xyz 14 points 10 months ago (1 children)

People are including AI generated code in their projects without fully reading it or understanding how it works.

[–] ericjmorey@programming.dev 13 points 10 months ago

The same ones that were blindly copying and pasting from StackOverflow previously found a more convenient way to make their code "work".

[–] TheFriendlyArtificer 14 points 10 months ago (1 children)

My argument is thus:

LLMs are decent at boilerplate. They're good at rephrasing things so that they're easier to understand. I had a student who struggled for months to wrap her head around how pointers work, two hours with GPT and the ability to ask clarifying questions and now she's rockin'.

I like being able to plop in a chunk of Python and say, "type annotate this for me and none of your sarcasm this time!"
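To make that concrete, here's a made-up before/after of the kind of annotation pass I mean (the function itself is just an illustration, not from any real project):

```python
# Before: the un-annotated chunk you'd paste in.
def tally(counts, items):
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


# After: what a good "type annotate this" response looks like --
# same behavior, but the contract is now explicit.
def tally_annotated(counts: dict[str, int], items: list[str]) -> dict[str, int]:
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts
```

It's mechanical work, which is exactly why it's a good fit: easy to verify, tedious to type.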

But if you're using an LLM as a problem solver and not as an accelerator, you're going to lack some of the deep understanding of what happens when your code runs.

[–] jherazob 7 points 10 months ago

The thing is that this is NOT what the marketers are selling. They're not selling this as "buy access to our service so that your products will be higher quality"; they're selling it as "this will replace many of your employees". Which it can't; it's very clear by now that it just can't.

[–] jarfil 5 points 10 months ago* (last edited 10 months ago) (1 children)

People tend to deify LLMs, because of the vast amounts of knowledge trained into them, but their answers are more like a single "reasoning iteration".

How many human coders are capable of sitting down, typing a bunch of code at 100 WPM out of the blue, and ending up with zero security flaws or errors? Essentially none, not even if they get updated requirements, and the same holds for LLMs. Coding is an iterative job, not a "zero shot" one.

Have an LLM iterate several times over the same piece of code ("think" about it), have it explain what it's doing each time ("reason" about it)... then test run it, fix any compiler errors... run a test suite, fix any non-passing tests... then ask it to take into account a context of best practices and security concerns. Only then can the code be compared to that of a serious human coder.

But that takes running the AI over and over and over with a large context, while AIs are being marketed as "single run, magic bullet"... so we can expect a lot of shit to happen in the near future.

On the bright side, anyone willing to run an LLM a hundred times over every piece of code, like in a CI workflow, in an error seeking mode, could catch flaws that would otherwise take dozens of humans to spot.
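A rough sketch of what that CI step could look like, assuming you have some `llm_review` call into a model API (stubbed out here, since the details depend entirely on your provider; the flagging logic is purely illustrative):

```python
def llm_review(source: str, focus: str) -> list[str]:
    """Placeholder for a real model call. A real CI job would send the
    source plus a focus prompt to an API; here we flag one obvious
    pattern so the loop has something deterministic to collect."""
    return [f"{focus}: avoid eval()"] if "eval(" in source else []


def error_seeking_pass(source: str, iterations: int = 100) -> list[str]:
    """Run the model many times over one piece of code, rotating the
    focus prompt each run, and collect the unique findings -- the
    'iterate and reason' loop rather than a single zero-shot answer."""
    findings: set[str] = set()
    focuses = ["security", "error handling", "best practices"]
    for i in range(iterations):
        findings.update(llm_review(source, focuses[i % len(focuses)]))
    return sorted(findings)
```

The point isn't the stub, it's the shape: many cheap runs with different framings, deduplicated, instead of one run treated as gospel.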

[–] ericjmorey@programming.dev 2 points 10 months ago

Excellent points!