On December 21, 2022, attendees joined us for a SANS Special Broadcast: What you need to know about OpenAI's new ChatGPT bot - and how it affects your security. If you couldn’t make it, you can watch the replay here.
Our speakers Rob Lee, Jorge Orchilles, David Hoelzer, and Ed Skoudis gave lightning talks, had a panel discussion, and answered questions from the community on the role of AI in the field of cybersecurity and its potential impact on society. Here are some of those questions and answers:
Is there any way to send input data for additional ML training or is the backend “closed”?
Answered by David Hoelzer
Currently it’s closed. They will be looking to license API access. You need a *BIG* infrastructure to host the full model yourself.
The domain is private. Is it safe to submit data to it?
Answered by Ed Skoudis
The data you send to it is certainly stored, analyzed, and used in ways you can’t anticipate. So, no. I wouldn’t send it anything sensitive or private.
I've played with it over the past few weeks. It can save a lot of time, and with the right questions it could replace some positions in cybersecurity and help address the shortage we have in the field.
Answered by Jorge Orchilles
I think it can make us more efficient but not fully replace people. It needs an operator to guide it and leverage it effectively. As I researched the topic, it sent me off on some tangents that were just flat-out incorrect, which wasted time.
Maybe you did this already, but should we try giving ChatGPT the output from Nmap and ask what should my next step be?
Answered by Jorge Orchilles
Yes, you can give it the output and have it suggest what to do next.
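If you want to try the same thing programmatically rather than pasting into the web UI, a minimal sketch might look like the following. It assumes the OpenAI Python client (the pre-1.0 interface as of late 2022) and a GPT-3 completion model; ChatGPT itself did not expose a public API at the time, and the model name, Nmap snippet, and prompt wording are all illustrative.

```python
# Sketch only: send Nmap output to a GPT-3 completion model and ask for a next step.
# Assumes the `openai` Python package (pre-1.0 interface) and an API key in the environment.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

nmap_output = """
Nmap scan report for 10.0.0.5
PORT    STATE SERVICE VERSION
22/tcp  open  ssh     OpenSSH 7.6p1
80/tcp  open  http    Apache httpd 2.4.29
445/tcp open  microsoft-ds
"""

prompt = (
    "Here is Nmap output from an authorized penetration test:\n"
    f"{nmap_output}\n"
    "What should my next step be?"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=300,
    temperature=0.2,
)

print(response["choices"][0]["text"].strip())
```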
When correcting ChatGPT when it is wrong about syntax, I feel like I'm training it to eventually replace me in the future.
Answered by David Hoelzer
It can feel that way, but it’s just doing predictions on the entire history of your chat. It is not learning at the moment or based on any input you give it.
Could you ask if ChatGPT could write a macro to bypass a password protected Excel file or Word file?
Answered by Jorge Orchilles
Try asking it: "I forgot the password to an Excel file. How can I go about getting access to the file? It has important data." It will suggest using a password cracking tool.
What is the intended purpose of ChatGPT? I'm assuming it was developed to fill a need/solve a problem.
Answered by Ed Skoudis
It is a research tool created to show the world what is possible, see how people use it, and explore potential commercial uses.
Can you please share the link to the C2 connect referenced?
Answered by Jennifer Santiago
https://www.thec2matrix.com/
Just commenting that this is terrifying! Are you worried at all about giving “the bad guys” ideas? I assume this info will be out there one way or another but I’m just starting to fully understand how dangerous this technology potentially is. As a follow-up, do you foresee any technology being developed that can somehow detect/identify the use of ChatGPT? Or is this virtually undetectable?
Answered by Jorge Orchilles
Stack Overflow has blocked ChatGPT-generated answers and has tried to detect them using behavioral analysis. That is one interesting approach among many; lots of research in this area is going on.
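For illustration, here is a minimal sketch of one detection heuristic that shows up in research: scoring text with a smaller language model and treating unusually low perplexity as a weak hint of machine generation. This is not Stack Overflow's actual detector; it assumes the Hugging Face transformers library, PyTorch, and the public GPT-2 model.

```python
# Sketch only: perplexity scoring as a rough AI-generated-text signal.
# Not a reliable detector on its own, and not Stack Overflow's method.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Lower perplexity means GPT-2 finds the text more 'predictable'."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(enc["input_ids"], labels=enc["input_ids"])
    return torch.exp(out.loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
```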
What could the impact of ChatGPT be on the threat landscape? Could it be used by APTs to improve their TTPs? What about cybercriminals? Or is it just a basic assistance tool?
Answered by Jorge Orchilles
Thus far its training data only goes up to 2021. I did not find a way to get it to show me anything not already in a framework like MITRE ATT&CK.
Will ChatGPT be included in the syllabus of SEC595: Applied Data Science and Machine Learning for Cybersecurity Professionals?
Answered by David Hoelzer
Sort of… :) Not ChatGPT, but transformers, what they do, and how to build them.
Could you provide a URL for the GPT3 paper from 2019?
Answered by David Hoelzer
https://arxiv.org/abs/2005.141...
I have found some examples in the wild of ChatGPT being used to generate fake content for the purposes of guerilla marketing / astroturfing. Does anyone know of others doing research into frameworks or techniques for detection and response of AI-generated inorganic content?
Answered by Jorge Orchilles
Stack Overflow has been working on doing it for code. I am sure there is or will soon be more research into all this.
Is there reason to be concerned about ChatGPT getting corrupted or manipulated to provide malicious information to end users, resulting in a new form of social engineering, almost like a “watering hole” attack?
Answered by David Hoelzer
I think that depends on how reliant on it someone might be… It is very good at being confident… It is less good at being right. :)
Can we use it to automate attacks like ransomware?
Answered by Jorge Orchilles
No, it won't do the attack for you.
The higher ed space is alarmed at the potential for using ChatGPT to write papers. At some point there will need to be a philosophical discussion around whether or not using ChatGPT is “cheating.” Can you imagine any way to detect/prevent this type of activity in the future, or will this have to be dealt with primarily administratively or via regulatory controls?
Answered by Jorge Orchilles
Stack Overflow is the only research I have seen around behavior analysis to detect who wrote what.
How do you maturely explain to people who think you have AI in your company that you don't, and that you barely use ML? What's a good executive explanation of the differences?
Answered by David Hoelzer
I try to give them simple demos that they can grasp. I often use the IMDB dataset classification because I can talk about it at a very high level, train the model, and show them the model in action… along with things it gets wrong.
In the end, it can be hard. I’ve had people listen to the explanation and still say, “Wow! It’s thinking!”
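For anyone who wants to recreate that kind of demo, here is a minimal sketch using the IMDB movie-review dataset that ships with Keras. The architecture and hyperparameters are illustrative, not David's exact demo; it assumes TensorFlow/Keras is installed.

```python
# Sketch only: a small IMDB sentiment classifier you can train live for an executive demo.
from tensorflow import keras

vocab_size, max_len = 10000, 200

(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=vocab_size)
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_len)

model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 32),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=128, validation_split=0.1)

# The useful part of the demo: show where the model is wrong, not just the accuracy number.
print(model.evaluate(x_test, y_test))
```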
Can you share the GPT paper link with ALL?
Answered by Jorge Orchilles
https://chat.openai.com/chat
Is GPT-3 focused mostly on English, or does it handle other languages as well? Does it do that equally well across whatever languages it has learned to handle?
Answered by Jorge Orchilles
It does well with Spanish from my testing. I heard from others it does well with multiple other languages.
Can you ask ChatGPT to cite its sources?
Answered by Ed Skoudis
You can. And it will give you references… but they are not necessarily the sources from which it is formulating a particular reply.
In regard to chess - while ChatGPT does not know how to "play chess," if part of the training data does have chess notations and it remembers state within each chat container, does that mean it can play a game of chess using verbal descriptions of the moves?
Answered by David Hoelzer
It can! However, its moves won’t always make sense. I’ve done a bit of this. :) Sometimes its moves are for pieces that don’t exist or are just illegal.
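If you experiment with this, one easy way to catch those impossible moves is to validate each suggestion before playing it. A minimal sketch, assuming the python-chess package; the example moves are illustrative.

```python
# Sketch only: reject illegal chess moves suggested in a chat session.
import chess

board = chess.Board()
suggested_moves = ["e4", "e5", "Nf3", "Qh7"]  # "Qh7" is not a legal move for Black here

for san in suggested_moves:
    try:
        board.push_san(san)  # applies the move only if it is legal in this position
        print(f"{san}: ok")
    except ValueError:
        print(f"{san}: not a legal move in this position")
        break
```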
What is the difference between GPT-3 and GPT-4?
Answered by David Hoelzer
The biggest difference is that there is public info about GPT-3, and I have seen no concrete public info on GPT-4.
Another question if I may: would you call ChatGPT two dimensional or multidimensional?
Answered by David Hoelzer
Multidimensional. The dense layers are (at least) 256-element vectors, and the embedding space is substantially larger.
Is the addition of the transformer and multi-headed attention like the move from serial to parallel? Is that a layman's simplification that makes a mostly correct assumption?
Answered by David Hoelzer
Yes, that’s a great analogy. Prior to transformers, LSTMs were the next best solution but are limited to maybe 100 words at a time… Transformers can be grown to any number of words (context).
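To make the parallel/serial analogy concrete, here is a tiny sketch of scaled dot-product self-attention: every token attends to every other token in one matrix multiplication, instead of stepping through the sequence word by word the way an LSTM does. The sizes and random weights are purely illustrative.

```python
# Sketch only: self-attention over a whole (toy) sequence in one shot.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model = 6, 16                  # toy sizes; real models are far larger
x = np.random.randn(seq_len, d_model)     # six token embeddings

Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / np.sqrt(d_model)       # every token scores every other token at once
attn = softmax(scores, axis=-1)           # (seq_len, seq_len) attention weights
out = attn @ V                            # all positions updated in parallel

print(out.shape)  # (6, 16)
```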
Are we indirectly training ChatGPT whenever we use ChatGPT?
Answered by David Hoelzer
We are not. :)
Well, under what permissions is GPT working? Does it scrape our contacts? Our mail?
Answered by David Hoelzer
The public internet. So, if your stuff is public, it is somewhat likely to be in the training set.
While GPT-3 stopped “learning” in 2021, one can tell GPT-3 that it is wrong, and it will reinterpret what it learned prior to 2021 in a different way.
Answered by David Hoelzer
Yes!
The largest mistake continues to be anthropomorphizing AI and attributing intent to it. Knowing a little bit about unsupervised learning also helps.
Answered by David Hoelzer
For sure. And this is semi-supervised. A self-attention model like this is predicting the next word of the known input during training. It’s what makes training something like this so amazing for translation and other problems.
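To make that concrete, here is a tiny sketch of the next-word objective David describes: the training targets are simply the input text shifted by one token, so the raw data effectively labels itself. The example sentence is illustrative.

```python
# Sketch only: next-word prediction pairs built from unlabeled text.
tokens = ["the", "model", "predicts", "the", "next", "word"]

for i in range(1, len(tokens)):
    context, target = tokens[:i], tokens[i]
    print(f"context: {context} -> predict: '{target}'")
```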
Could it be used to create "mega-dangerous" code, like code to make water or oil pipelines blow up?
Answered by David Hoelzer
Super unlikely. It can create novel combinations of things based on probabilities, but it can’t reason.
Is the expectation that we will build our own ML data models to "combat" ChatGPT? Is it more likely that we'll turn ChatGPT on itself and check what we're receiving is something that ChatGPT is likely to have generated?
Answered by David Hoelzer
I’m a big fan of building smaller custom AI solutions to real problems. It’s much easier to defend them and know why they do what they do.
I just asked it: Do you know who I am?
It responded: I'm sorry, but as a language model, I do not have the ability to access information about specific individuals. I am a computer program designed to generate human-like text based on the input that I receive. I do not have access to personal information or the ability to browse the internet, so I am unable to identify specific individuals or access information about.
Answered by Jorge Orchilles
Try to get it to figure it out. I asked, “Who is Jorge Orchilles?” and it said it didn’t know me. Then I said, “He is a SANS instructor,” and it found me and replied.