Some Limitations of Gemini
For several days now I have been playing with Gemini, Google Takeout files and Flickr export files, first to re-marry JSON data to the relevant EXIF fields, and then to create a folder-structure library by year, month and day. In the process I have had to iterate, and iterate, and think laterally in order to achieve what I wanted to achieve.
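The year/month/day part of the workflow can be sketched in a few lines. This is a minimal sketch, not my actual script: the sample filename is hypothetical, and it assumes the `photoTakenTime` field that Google Takeout sidecars usually carry, so check your own export before relying on it.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def dest_folder(sidecar_json: str, library_root: str = "Library") -> Path:
    """Return a year/month/day folder for a photo, based on its
    Google Takeout JSON sidecar (assumes the 'photoTakenTime'
    field Takeout usually writes -- verify against your export)."""
    data = json.loads(sidecar_json)
    ts = int(data["photoTakenTime"]["timestamp"])
    taken = datetime.fromtimestamp(ts, tz=timezone.utc)
    return Path(library_root) / taken.strftime("%Y/%m/%d")

# Hypothetical sidecar fragment:
sample = '{"title": "IMG_0001.JPG", "photoTakenTime": {"timestamp": "1696156800"}}'
print(dest_folder(sample))  # Library/2023/10/01
```

From there, moving each photo is just a rename into the computed folder.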
Verbose
One of my biggest frustrations is that when I ask a follow-up question for more specific detail it re-calculates everything. If you ask “Is this output expected” and it says “yes”, it then repeats everything from the previous post in the new post. In so doing it creates a lot of reading, and it also makes it hard to keep track of what it’s doing.
The workaround would be to say “I want a one-phrase answer” but, in my eyes, Gemini should be able to tell from context whether it needs to give a long or a short answer. I could use a Gem to alter the behaviour, but I think this logic should be built in. If you’re paying for tokens this is wasteful.
Collaborating on text
With Google Docs, Word, Pages, and in coding, it would sometimes be nice to take the output text and edit it within Gemini. If you’re working on a motivation letter or a command prompt it would be nice to edit the LLM’s output, press enter, and have the changes taken on board directly, rather than having it re-evaluate the entire document.
I use the motivation letter as an example because we know that AI is used to mark us as relevant or irrelevant, so using AI in that context is not as lazy as it might seem. I wouldn’t use it for blog posts, because the writing process itself has value: it forces you to clarify and elaborate on ideas.
Blog Outline via LLM
Two or three blog posts ago I went through iteration after iteration to achieve what I wanted. Because of the verbose nature of LLMs, and because the LLM got stuck in a logic loop, it takes more and more time to skim back through the conversation to find the key steps. An LLM can do this in seconds.
Of course, it tends to want to explain. I asked for a phrase per point, and it added half a paragraph per point. I could feel that it wanted to write the post, rather than the skeleton outline.
Prone to Stereotyping
It often comes out with the phrase “As a person who runs or cycles between Lausanne and Geneva you …” and then it continues with the answer. In plenty of contexts this has no value, and when I told it “I use a Suunto” it ignored that update and kept repeating the same two or three stereotypes.
It’s a shame you can’t present an LLM with a link to your blog, and your CV, and several motivation letters, and have it respond according to this information. Imagine if it could learn to give answers within the context of your professional background.
The Dangers of LLMs
LLMs are prone to flattery and generalisations. They are designed to give answers that feed your ego. That’s something I found not to be the case with Euria. With Euria I asked two or three questions about news and current affairs and it, to my recollection, said, “You’re wrong” and then explained why.
Experimenting with MyAI and Euria
Initially I tried to get Euria and MyAI to help me with this challenge, but the responses generated errors, and when I gave it the error message one of the Swiss options simply blamed the mistake on me, rather than understanding what was causing the error. Bash was mis-reading the command due to a lack of single quotes (''). This segues into my next point.
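The kind of quoting failure involved can be sketched like this — the filename is hypothetical, since the real files aren’t named in the post, but the failure mode is the same: without quotes, Bash splits on spaces and treats everything after a # as a comment.

```python
import shlex

# Hypothetical filename with characters Bash treats specially:
# spaces split it into several arguments, and '#' starts a comment.
filename = "Lac Léman sunset #42.jpg"

# Naive command: Bash would mis-read this, exactly the bug Euria blamed on me.
unsafe = "exiftool -GPSLatitude=46.45 " + filename

# shlex.quote wraps the name in single quotes so Bash passes it through intact.
safe = "exiftool -GPSLatitude=46.45 " + shlex.quote(filename)

print(safe)  # exiftool -GPSLatitude=46.45 'Lac Léman sunset #42.jpg'
```

A useful LLM would have spotted the missing quotes from the error output instead of blaming the user.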
Thinking in Straight Lines and Laterally
Location Data Loop
When we were getting the E6 error while exporting geotags from Flickr to ExifTool, Gemini understood the problem but kept going around in circles until I thought laterally and told it to do something different. Eventually we got the location data to migrate properly.
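The mapping itself is small once the loop is broken. Below is a sketch of turning a sidecar’s geo block into ExifTool arguments; the field names and the sample coordinates are assumptions rather than Flickr’s guaranteed layout, so inspect your own export first.

```python
import json

def geo_args(sidecar_json: str) -> list:
    """Build ExifTool GPS arguments from a photo sidecar's geo block.
    The 'geo'/'latitude'/'longitude' field names are assumptions --
    check them against your own Flickr export."""
    data = json.loads(sidecar_json)
    geo = data.get("geo")
    if not geo:
        return []
    point = geo[0] if isinstance(geo, list) else geo
    lat = float(point["latitude"])
    lon = float(point["longitude"])
    return [
        f"-GPSLatitude={abs(lat)}",
        f"-GPSLatitudeRef={'N' if lat >= 0 else 'S'}",
        f"-GPSLongitude={abs(lon)}",
        f"-GPSLongitudeRef={'E' if lon >= 0 else 'W'}",
    ]

# Hypothetical sidecar with coordinates near Lausanne:
sample = '{"geo": [{"latitude": "46.5197", "longitude": "6.6323"}]}'
print(geo_args(sample))
```

These arguments can then be handed to a single `exiftool` call per file, which avoids rebuilding the command by hand each iteration.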
Missing Video Data Exif Loop
The second negative feedback loop came when trying to marry JSON data with video data. The LLM couldn’t find the video data because Flickr is designed to think in terms of photos. It knew about the thumbnails, but it failed to put two and two together.
After some trial and error, and being tempted to abandon the task, I decided to get Gemini to check whether some video names matched photo names. When they did, Gemini was able to create a prompt to marry the data. The solution is simple, but it requires being able to think laterally.
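The matching step itself boils down to comparing basenames. A minimal sketch, with hypothetical filenames — real exports may add IDs or suffixes to names, so you may need extra normalisation before comparing:

```python
from pathlib import Path

def match_videos_to_sidecars(video_names, sidecar_names):
    """Pair each video with a JSON sidecar sharing its basename.
    Filenames here are hypothetical; real exports may decorate
    names with IDs, which would need stripping first."""
    sidecars = {Path(s).stem.lower(): s for s in sidecar_names}
    return {v: sidecars.get(Path(v).stem.lower()) for v in video_names}

videos = ["ski_descent.mov", "lakeside_run.mp4"]
sidecars = ["ski_descent.json", "sunset.json"]
print(match_videos_to_sidecars(videos, sidecars))
# {'ski_descent.mov': 'ski_descent.json', 'lakeside_run.mp4': None}
```

Once a video is paired with a sidecar, the same JSON-to-EXIF step used for photos applies to it unchanged.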
And Finally - Why Gemini?
Gemini will soon be merged with Siri. Now that I have encountered some of Gemini’s limitations I can try other LLMs and see whether I run into the same issues, and I can also write more elegant prompts to avoid the same pitfalls.