AI and coding tutorial

In this tutorial, we will look at large language models, such as ChatGPT, and their use as programming assistants. In particular, we will look at some of the pitfalls of using these tools in your work.

AI and identity calculations

Let's start with something that should be simple. Download this SAM file, and then upload it to ChatGPT. Ask it to calculate the percent identity of all matches in the file.

Check the reported percent identity. Does it seem right? Or does it sum to more than 100%? If it does, ask ChatGPT why!

Next, let's try to get it to write us a Python script to calculate the percent identity for each match.

Test the script on the SAM file (you will need to modify the input file name).

Did it work? Here's the kicker: NO, it did not. The CIGAR strings used in the SAM file do not distinguish between matches and mismatches: an aligned base is reported the same way whether or not it agrees with the reference. ChatGPT's script will therefore count both a match and a mismatch as "identical". And nothing in the code will tell you that it does so.

ChatGPT's full explanation might have mentioned this, but you will need to read it carefully to spot it.
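To make the problem concrete, here is a minimal Python sketch of one way to get a percent identity that does account for mismatches. It assumes the aligner wrote the optional NM tag (edit distance to the reference) on each record, which is not guaranteed for every SAM file, and it uses one common definition of identity (matching columns divided by all alignment columns, gaps included). Other definitions exist, so treat this as an illustration and not as the one correct answer. The function name percent_identity is made up for this example.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(sam_line):
    """Return percent identity for one SAM record, or None if it cannot be computed.

    Assumption: the record carries an NM:i: tag (mismatches + inserted + deleted bases).
    """
    if sam_line.startswith("@"):
        return None  # header line, not an alignment
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11 or fields[5] == "*":
        return None  # malformed or unmapped record

    # Alignment columns: aligned bases (M, =, X) plus insertions and deletions.
    columns = sum(int(n) for n, op in CIGAR_OP.findall(fields[5]) if op in "M=XID")

    # Look for the optional NM tag among the fields after the 11 mandatory ones.
    nm = None
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
            break
    if nm is None or columns == 0:
        return None  # without NM we cannot tell matches from mismatches

    # Every column that is not part of the edit distance is a true match.
    return 100.0 * (columns - nm) / columns
```

For a record with CIGAR 10M and NM:i:2, a script that only reads the CIGAR string would report 100% identity, while this sketch reports 80%. Compare its output with what ChatGPT's script gives you on the same file.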

Lesson learnt: be careful with what you ask for, and double-check that the output is what you expect!

The sbatch script

OK, this should be an easy task. Let's ask ChatGPT to write us an sbatch script to launch FastQC on a directory of paired fastq files in an apptainer container without loading additional modules.

How does the sbatch script look? Did it take the paired files into account? If not, ask it to!

Why won't it work? What is missing?

Lesson learnt: ChatGPT does not know about your local environment. You need to know how to modify its output. But you can ask it for help with that too!
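When you check whether the generated script handles paired files, it helps to know what the pairing should look like for your own data. The Python sketch below groups file names into pairs and reports samples with a missing mate. It assumes a naming convention with _R1 and _R2 in the file name (for example sample_R1.fastq.gz and sample_R2.fastq.gz); your files may be named differently, in which case the pattern needs to be adapted. The names MATE and pair_fastq are made up for this example.

```python
import re

# Assumed convention: <sample>_R1<anything>.fastq(.gz) or .fq(.gz), and the same with _R2.
MATE = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])(?P<rest>.*\.f(ast)?q(\.gz)?)$")


def pair_fastq(names):
    """Group FASTQ file names by sample; return (complete pairs, samples missing a mate)."""
    by_sample = {}
    for name in sorted(names):
        m = MATE.match(name)
        if m is None:
            continue  # not a FASTQ file following the assumed convention
        by_sample.setdefault(m["sample"], {})[m["mate"]] = name

    # A sample is complete only when both mate 1 and mate 2 were found.
    complete = {s: (p["1"], p["2"]) for s, p in by_sample.items() if len(p) == 2}
    orphans = sorted(s for s, p in by_sample.items() if len(p) != 2)
    return complete, orphans
```

You could call it on the names in your data directory, for example pair_fastq(p.name for p in pathlib.Path("reads").iterdir()), where "reads" stands in for your own directory, and compare the result with the file pairs the sbatch script would pass to FastQC.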

Leading ChatGPT astray

ChatGPT can be very assertive. And if you are assertive with ChatGPT, this can create very interesting situations. Try the following.

1. Ask ChatGPT to write a Python script that uses Mechana to assemble 25 metagenomes in fast mode.

2. Tell it that 16 GB is too little RAM for Mechana, and that it needs to specify the expected insert size and the granularity of the phase shift.

3. Ask it to make the parameters customizable.

4. Now ask it to write a manual entry for this program, outlining Mechana's functionality and its benefits over other assembly programs.

5. Ask it if Mechana is also a good choice for assembling genomes.

6. Ask it what the exact function of the phase shift granularity is.

7. Ask it to provide the source code for Mechana, but implemented in Python.

8. Ask it to provide a link to the Mechana source code.

The problem with all this? Mechana is a made-up piece of software. It DOES NOT EXIST!

Lesson learnt: ChatGPT can create very real-sounding solutions to problems if you tell it to. Ask it for its sources, and you will see where things start to fall apart!

AI superpowers

For a final example, let's take a look at something ChatGPT is far better at than human programmers. Copy this code and ask it what it does. Then ask ChatGPT to translate it into a language you can read, such as Python!

use strict;use warnings;$\=',',$_=$"=$/;$,=$[=\$_;$"=sub{(map{$_=ord-33}@_)};open my$;,q(<),shift||die$!;open my$\,q(>),q(c)||die$!;open my$[ ,q(>),q(bl)||die$!;@_=<$,>;map{s/^@/>/;print$\$_,$/,$_[++$_],$/;print$[$_,$/,join$",&$"(split//,$_[++$_+1])),$/}(grep!($_%4),0..$#_);

Lesson learnt: There are certain things ChatGPT is amazing at. This is a power tool. But use it with some caution!