AI & LLMs
LLMs detecting logical fallacies, LLM speed tests, LLM summarisation, and prompt engineering

Recently I was experimenting with a relatively new side of AI - Large Language Models (LLMs) - and ran several tests, checking how different LLMs detect logical fallacies, how fast they are, and how well they summarise.
In the photo here we have a llama in the server room. Llamas are associated with LLMs because:
- LLM sounds very close to Llama
- There is a set of prominent models from Meta - Llama 2, Llama 3, and now Llama 3.1, etc.
- There is a nice tool for hosting LLMs - Ollama
1 Detecting Logical Fallacies with LLMs
While working on my other hobby - an automatic system for detecting informal logic errors - I checked how various versions of different LLMs recognise logical fallacies in texts. They are very capable, but the suppliers' training datasets clearly differ slightly.
I compared Llama 3 8b, Phi3 3.8b & 14b, Mistral 7b, Gemma 7b, and Qwen 7b, 14b and 32b - all of them with various quantizations - and checked whether they correctly detect logical fallacies in a sample text. The winner was phi3:14b-medium-4k-instruct-q6_K; see here for more details: Logical Fallacy Detection with LLMs. In second place was phi3:3.8b-mini-4k-instruct-q8_0.
Recently I tested how the new models Gemma2 from Google, Qwen2 from Alibaba, and Mistral Nemo from Mistral AI and NVIDIA perform on similar tasks. They do well, but in my tests phi3 was still better. Please see the results: Gemma2 vs Qwen2 vs Mistral Nemo vs…
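To give an idea of how such a check looks in practice, here is a minimal sketch that asks a locally hosted model, via Ollama's REST API on the default localhost:11434, to name the fallacy in a short text. The prompt wording, the sample sentence, and the model names are illustrative, not the exact setup from the linked article.

```python
# A minimal sketch: ask a locally hosted model (via Ollama's REST API,
# assumed to be running on localhost:11434) to name the logical fallacy
# in a short text. Prompt and sample text are illustrative.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

PROMPT_TEMPLATE = (
    "You are an expert in informal logic. Read the text below and name the "
    "logical fallacy it contains, if any. Answer with the fallacy name and a "
    "one-sentence explanation.\n\nText: {text}"
)

def detect_fallacy(model: str, text: str) -> str:
    """Ask one model to classify the fallacy in the given text."""
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "prompt": PROMPT_TEMPLATE.format(text=text),
            "stream": False,                 # get one complete JSON answer
            "options": {"temperature": 0},   # keep the answer deterministic
        },
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["response"]

sample = "Everyone I know uses this diet, so it must be the healthiest one."
for model in ["phi3:14b-medium-4k-instruct-q6_K", "llama3:8b"]:
    print(model, "->", detect_fallacy(model, sample))
```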
2 Comparing speed of different LLMs
The next test was - how fast are different LLMs? I compared LLM speed and memory consumption.
Really smart models are memory-hungry, and if they don't fit into GPU VRAM they become painfully slow. On a GPU a model performs one task in 2-9 seconds; on a CPU it takes on average about 20 times longer. Please see the linked report for the details.
The conclusion here is: if you need a self-hosted LLM, prepare a GPU with at least 16GB of VRAM - better 24GB, or even 48GB so that llama3 70b would fit. This activity is not extremely budget-friendly.
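For reference, here is roughly how such a speed measurement can be done against a local Ollama instance: the /api/generate response reports eval_count and eval_duration (in nanoseconds), which gives an approximate tokens-per-second figure. The prompt and model names below are illustrative.

```python
# A rough speed-comparison sketch, assuming the models are already pulled
# into a local Ollama instance on the default port.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
PROMPT = "Explain in three sentences why the sky is blue."

def benchmark(model: str) -> float:
    """Return an approximate tokens-per-second figure for one model."""
    r = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=600,
    )
    r.raise_for_status()
    data = r.json()
    # eval_duration is reported in nanoseconds
    return data["eval_count"] * 1e9 / max(data["eval_duration"], 1)

for model in ["phi3:3.8b-mini-4k-instruct-q8_0", "llama3:8b", "llama3:70b"]:
    print(f"{model}: {benchmark(model):.1f} tokens/s")
```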
3 Comparing LLM Summarising Abilities
In standard demos we can see how LLMs summarise long texts. I Compared Summarising Abilities of Different LLMs.
First we need to know that an LLM has what is called a Context Window - that is how much text it can hold in mind to summarise from. A basic context window of 4k tokens is a very small one, still manageable in some scenarios. But not all models can handle larger contexts, and a larger context consumes more memory. That's a topic for another test.
In my tests the best was llama3:8b-instruct-fp16, then llama3:70b-instruct-q6_K, and llama3:8b-instruct-q4_0 & llama3:8b-instruct-q8_0 were also very good. Phi3, unfortunately, disappointed.
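Below is a minimal summarisation sketch against a local Ollama instance. The num_ctx option is Ollama's knob for the context window discussed above: text beyond the window simply cannot be taken into account, and a larger window costs more memory. The model name, window size, and input file are illustrative.

```python
# A minimal summarisation sketch using a local Ollama instance.
# num_ctx sets the context window size in tokens; text beyond it is not seen.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def summarise(model: str, text: str, num_ctx: int = 8192) -> str:
    r = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "prompt": f"Summarise the following text in 5 bullet points:\n\n{text}",
            "stream": False,
            "options": {"num_ctx": num_ctx},  # context window in tokens
        },
        timeout=600,
    )
    r.raise_for_status()
    return r.json()["response"]

# "article.txt" is just a placeholder for whatever long text you want summarised
long_text = open("article.txt", encoding="utf-8").read()
print(summarise("llama3:8b-instruct-q8_0", long_text))
```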
4 How to write effective prompts for LLMs
Another experiment - how to write prompts for LLMs. Different LLMs react to the same prompt in different ways, but the common approach is the same. See how to write effective prompts for LLMs. In short, the concentrated info:
- Crafting Effective Prompts: Be clear and concise, use specific examples, vary prompts, test and refine, and use feedback.
- Explicit Instructions: Provide detailed instructions to limit responses and produce better results.
- Stylization: Explain complex concepts in a simple way, like a children’s educational network show.
- Prompting using Zero- and Few-Shot Learning:
  - Zero-shot prompting: No examples are provided.
  - Few-shot prompting: Specific examples are provided to improve accuracy (see the sketch after this list).
- Role-Based Prompts: Create prompts based on the role or perspective of the person or entity being addressed.
- Chain of Thought Technique: Provide a series of prompts or questions to guide thinking and generate more coherent responses.
- Self-Consistency: Select the most frequent answer from multiple generations to improve accuracy (also demonstrated in the sketch after this list).
- Retrieval-Augmented Generation: Include information retrieved from an external database to incorporate facts into language model applications.
- Program-Aided Language Models: Leverage language models’ ability to generate code to solve calculation tasks.
- Limiting Extraneous Tokens: Use roles, rules, and restrictions to generate responses without extraneous tokens.
- Reduce Hallucinations: Provide clear context and information to avoid hallucination in language models.
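To make two of these techniques concrete - few-shot prompting and self-consistency - here is a small sketch against a local Ollama chat endpoint. The classification task, the example messages, and the model name are illustrative, not taken from the linked guide.

```python
# A sketch combining few-shot prompting (worked examples in the prompt) with
# self-consistency (generate several answers and keep the most frequent one).
# Assumes a local Ollama instance on the default port; the task is illustrative.
from collections import Counter
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

# Few-shot examples: a system instruction plus two solved user/assistant pairs
FEW_SHOT = [
    {"role": "system", "content": "Classify the sentiment of a sentence as positive, negative or neutral. Answer with one word."},
    {"role": "user", "content": "The battery died after two days."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Setup took five minutes and everything just worked."},
    {"role": "assistant", "content": "positive"},
]

def classify(model: str, sentence: str) -> str:
    """One classification attempt with the few-shot examples prepended."""
    r = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "messages": FEW_SHOT + [{"role": "user", "content": sentence}],
            "stream": False,
            "options": {"temperature": 0.8},  # some randomness so samples differ
        },
        timeout=300,
    )
    r.raise_for_status()
    return r.json()["message"]["content"].strip().lower()

def self_consistent_classify(model: str, sentence: str, samples: int = 5) -> str:
    # Self-consistency: run the same prompt several times and take a majority vote.
    answers = [classify(model, sentence) for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_classify("llama3:8b", "The screen is fine, I guess."))
```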
5 Install and configure Ollama
Ollama is a program/service that allows self-hosting of different LLMs. It has versions available for most operating systems and is pretty easy to install.
But if you download not just one but several models - even though a single model can take 5, 15 or 50GB of SSD/HDD space - together they can occupy a lot.
So the idea of moving the models from the user home folder on Windows to some other drive like D:\ comes to you quickly. To save you some googling and digging through the scarce docs, here it is - How to Move Ollama Models to Different Drive or Folder.
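Related to that disk space concern: the small sketch below lists the locally pulled models and their total size using Ollama's /api/tags endpoint (assuming the service runs on the default localhost:11434). The actual relocation of the models folder is covered in the linked how-to.

```python
# List locally pulled Ollama models and their total disk footprint.
# /api/tags returns the local models together with their sizes in bytes.
import requests

r = requests.get("http://localhost:11434/api/tags", timeout=30)
r.raise_for_status()
models = r.json()["models"]

total = 0
for m in models:
    total += m["size"]
    print(f"{m['name']}: {m['size'] / 1e9:.1f} GB")

print(f"Total: {total / 1e9:.1f} GB across {len(models)} models")
```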