THE THREAD OF DOOM

#12
by jukofyork - opened

Just realised I deleted the old "thread of doom" as it was attached to the earliest alpha version of the control vectors :(

jukofyork pinned discussion

Okay, I was wondering if we crossed some sort of line.

Anyway.. the INCREDIBLY important thing I was saying before the thread disappeared was... I have a feeling it is going to be just like they say. They are going to be liberal with grants. I suspect they will target people who are using the space outside the purpose that was intended... somewhere out there, someone has all their RAW 8k videos of their cats...

Yeah, it's a pity it got deleted (I should have checked more carefully what was linked), but it was getting a bit out of hand with all that scrolling so perhaps not such a bad thing.

I'm just gonna keep up the models that people have downloaded the most and get rid of all the "experimental, but likely broken" stuff with 15 downloads as they really weren't serving much of a purpose.

Also, all the old versions of the control vectors were vastly inferior to the final version due to me figuring out how to get them working as I went along, so it's probably better to just keep up the final v3.0 ones to avoid a lot of the confusion.


It looks a lot more like I'm just uploading quality models that people like/use now at least... The creative-writer-v0.1-35b and creative-writer-v0.2-35b models will be going as soon as I get the v1.0 version uploaded, and possibly Dusk-Miqu-70B if they do set a hard limit (I still think Dark-Miqu-70B is worth keeping whatever though).


Also, if anybody really misses any of the models I've uploaded, then I can in theory recreate them and upload a LoRA created from the delta using extract_lora.py, but I strongly suspect nobody will even notice most of them have gone... Of all the models I've created, I've only ever used Dark-Miqu-70B myself!
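
(For anyone curious, the idea behind that is just a truncated SVD of the weight delta - a rough sketch below, applied per linear layer; the real extract_lora.py may differ in the details.)

```python
# Illustrative sketch only: approximate (W_tuned - W_base) with a rank-r product B @ A.
import torch

def extract_lora(w_base: torch.Tensor, w_tuned: torch.Tensor, rank: int = 16):
    delta = (w_tuned - w_base).float()
    # Truncated SVD of the weight delta.
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    U, S, Vh = U[:, :rank], S[:rank], Vh[:rank, :]
    B = U * S.sqrt()             # (out_features, rank)
    A = S.sqrt()[:, None] * Vh   # (rank, in_features)
    return A, B                  # reconstruct with: w_base + B @ A
```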

:( Damn there was some good info in that thread.

If you've still got Firefox tabs open somewhere, you'll be able to save some of the thread.

Unfortunately, I cleaned my browser tabs up about an hour ago.

And yeah, if people were using it as free cloud storage then it makes sense. I just think they could have gone about it better, rather than having us wake up and see the limit.

I'm curious, did your quota drop after deleting that? I wonder if all the PNG files attached there were "billed" to you.

@jukofyork I think you're good man. If they start enforcing it, you'll get an exemption for sure.

I come across your contributions randomly all over the place, even on GitHub repos like some fine-tuning tool lol

I should probably deduplicate my quants. Often I was making one because I could not find what I was looking for, then it would turn out a few of us just happened to be making them at the same time. Then I started getting requests. So I just decided I would make a bunch. Need a Huggingverse quant global dedupe...

@jukofyork Thanks for the advice, but how's the intelligence affected by it? I'd rather not dumb the model down. For me, intelligence > speed.

Just to make sure I'm reading that right, is Mixtral 8x22b v0.3 also a true base model?

Yes. The non-instruct one is a base model, but not a great one.

[screenshot: top token probabilities for DeepSeek-V3 instruct]
Here are the top probs for DeepSeek-V3 instruct. 42% for K, that's a new record for overconfidence. As you can see, DeepSeek is clearly arenamaxxing; they included fancy markdown (*) to get extra votes.

I've played around a bit more with DS. It has superior knowledge of trivia (I was surprised that it got quite an obscure reference right) compared to Largestral, and it is much better at solving riddles and following instructions. The writing style, however, is pure GPTslop, and it is more difficult to break it out of it than Largestral. What's worse is that while Largestral remained relatively unslopped in the other language that I tested, DS just wrote the same shit it wrote in English, with the same sentence structure and all, which sounded very awkward and unnatural, as if the text had been pulled through Google Translate.

Same story as with base, for comparison.

The sun was a merciless beast, its fiery breath scorching the earth and turning the once-thriving town into a desolate wasteland. The cultist, named Elias, staggered into the ghost town, his yellow robes tattered and filthy, clinging to his sweat-soaked skin like a second layer of torment. His vision blurred, the edges of his sight darkening with every step. The arrow lodged in his shoulder throbbed with a pain that seemed to echo the rhythm of his erratic heartbeat. 

The town was eerily silent, save for the occasional gust of wind that stirred the sand and rattled the broken shutters of the abandoned buildings. Elias's parched throat screamed for water, but he knew better than to hope for such luxuries. His mind was a chaotic storm of delirium and fear. The cult had branded him a traitor, and the memory of their twisted rituals and the malevolent whispers of their leader haunted him. He clutched the small, mysterious mirror tightly in his hand, its surface cool against his feverish skin. It was his only possession, stolen in a moment of desperation from the cult's forbidden vault. He didn't fully understand its power, but he knew it was more than just a trinket.

Elias stumbled into the shadow of a crumbling building, leaning heavily against the wall as he tried to catch his breath. His thoughts were a jumbled mess, fragments of memories and hallucinations intertwining. He could hear the voice of the cult leader, a rasping, venomous whisper that slithered into his mind. 

"You think you can escape us, Elias? You are ours. Always have been, always will be."

"Shut up," Elias muttered, squeezing his eyes shut in a futile attempt to block out the voice. "I'm not going back. I won't."

The mirror in his hand seemed to pulse faintly, a subtle vibration that made his skin crawl. He glanced down at it, his reflection distorted and ghostly in the cracked surface. For a moment, he thought he saw something move within the glass, a shadowy figure that twisted and writhed. He quickly looked away, his heart pounding in his chest.

"You're losing it," he whispered to himself, his voice trembling. "You're losing your mind."

He forced himself to move, pushing away from the wall and staggering down the deserted street. The town seemed to stretch on endlessly, each building a mirror of the last, decaying and lifeless. His strength was fading fast, and he knew he needed to find shelter before the sun fully claimed him.

As he approached what appeared to be an old inn, the door creaked open on its own accord, the sound grating against the silence. Elias hesitated, his instincts screaming at him to turn and run. But where would he go? He was in no condition to travel any further. With a resigned sigh, he stepped inside, the cool darkness enveloping him like a shroud.

The interior was dimly lit by the sunlight filtering through the cracks in the boarded-up windows. The air was thick with dust and the scent of decay. Elias collapsed onto a wooden bench, his body finally giving in to the exhaustion. He set the mirror down beside him, its presence both comforting and unnerving.

As he leaned back, his mind drifted to the events that had led him here. The escape from the cult, the frantic chase through the desert, the arrow that had found its mark in his shoulder. He had thought he was free, but now, alone in this forsaken town, he couldn't shake the feeling that he was still a prisoner.

The mirror shimmered faintly, catching the dim light in a way that made it seem almost alive. Elias reached out to touch it, his fingers trembling. As soon as his skin made contact, a sharp, icy pain shot through his arm, and the world around him seemed to dissolve into darkness.

When he opened his eyes, he was no longer in the inn. He stood in a vast, desolate landscape, the sky a swirling mass of black clouds and crimson light. Before him stood a figure, cloaked in shadows, its eyes gleaming with an otherworldly intensity.

"Welcome, Elias," the figure said, its voice a deep, resonant echo that seemed to come from everywhere and nowhere at once. "You have brought the mirror. Good. There is much we need to discuss."

Elias felt a surge of fear and confusion. "Who are you? What is this place?"

The figure stepped closer, the shadows around it shifting and writhing like living things. "I am the one who has been waiting for you. And this is just the beginning."

Before Elias could respond, the world around him began to dissolve once more, and he found himself back in the inn, gasping for breath. The mirror lay before him, its surface now still and unremarkable.

He stared at it, his mind racing. What had just happened? Was it a hallucination, a product of his fevered mind? Or was it real? He didn't know, but one thing was certain: the mirror was far more dangerous than he had ever imagined.

Markdown asterisks as top probabilities 🤦‍♂️
The cultist, named Elo*ra....
I tried RP and creativity with DS FP8. It was rough in my experience. The amount of prompt and sampler voodoo to squeeze anything out was ridiculous. I also kept coming across really egregious grammar issues even with pretty vanilla samplers. I had zero control and visibility of the backend configuration though, so I don't know if I can blame the model.

https://huggingface.co/MiniMaxAI/MiniMax-Text-01
https://filecdn.minimax.chat/_Arxiv_MiniMax_01_Report.pdf
A new 456B MoE model (~46B active) came out, claiming 4M(!) context.

[screenshot: MiniMax-01 creative-writing leaderboard]
They made an outrageous claim by ranking Sonnet the lowest on creative writing, but one look at their evaluators makes it very clear why:

The lyrics are effective due to their vivid imagery, emotional depth, and narrative structure. They create a mysterious and atmospheric setting with phrases like "moonbeams" and "ancient walls," while also conveying the emotional journey of the traveler. The repetition in the chorus reinforces the central theme, making the song memorable. The poetic language and space for interpretation add layers of intrigue and emotional resonance, making the song both engaging and thought-provoking.

The story demonstrates strong world-building and an engaging narrative. The concept of Aetheria is imaginative, with vivid descriptions of floating mountains, crystal rivers, and mystical creatures that evoke a sense of wonder. The protagonist, Elara, is well-developed, with a clear arc from curiosity to heroism, which makes her relatable and inspiring. The pacing is effective, with a balanced mix of adventure, emotional growth, and moments of tension. The supporting characters, like Solara and Pippin, add depth to the story and provide much-needed contrast to Elara’s character, contributing to both the plot and the tone. However,
while the overall structure is solid and the themes of courage and self-discovery are timeless, some aspects of the plot feel familiar, following traditional fantasy tropes. The resolution is uplifting but might benefit from more complexity or surprise to elevate it further. Overall, the story shows strong creative potential, with an imaginative world, a compelling heroine, and an uplifting message.

This poem is powerful for its rich imagery and balance between change and continuity. It uses metaphors like "dance of time" and "tapestry spun" to evoke deep emotional resonance. The poem reflects on embracing change while cherishing memories, making it relatable and philosophical. Its rhythmic flow and universal themes of acceptance and personal growth create a harmonious and reflective reading experience.

Their human evaluators decided to cheat the system and offloaded all of their work to GPT-4, making that leaderboard essentially a GPT-4 preference benchmark. I don't have high expectations for this model.

Time to run RULER on that sucker...

The protagonist,

<*"Ela**ra"*>
makes me shiver, yet feel warm...

I guess I should have looked before I ran my mouth.. they did run RULER 🙈

@jukofyork Thanks for the advice, but how's the intelligence affected by it? I'd rather not dumb the model down. For me, intelligence > speed.

Yeah, it will make it a bit dumber and probably best to keep at Q8 if not bothered about the speed.

Sorry for not updating on the creative writer models, but I've been busy and not happy enough with the outcomes to release anything yet (masking the gradients and labels of \n and \n\n tokens just caused other weird tokens like space-newline to start appearing).
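
(By "masking the labels" I mean something along these lines - a rough sketch assuming the Hugging Face convention that positions with label -100 are ignored by the cross-entropy loss; the token ids shown are placeholders, not the real vocabulary values.)

```python
import torch

IGNORE_INDEX = -100  # assumed HF convention: these positions contribute no loss/gradient

def mask_token_labels(labels: torch.Tensor, banned_token_ids: list[int]) -> torch.Tensor:
    """Set the labels of the given token ids to IGNORE_INDEX so the loss skips them."""
    labels = labels.clone()
    for tok_id in banned_token_ids:
        labels[labels == tok_id] = IGNORE_INDEX
    return labels

# e.g. hypothetical ids for "\n" and "\n\n" - look them up in the actual tokenizer:
# labels = mask_token_labels(labels, banned_token_ids=[198, 271])
```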

I think fundamentally I've got two problems:

  1. My data is a steaming pile of shit, full of tables of contents, author's notes and so on.
  2. The sample size is too small by about an order of magnitude.

I hope to fix (2) by starting with 15k books instead of 1k books like before.

I'm fixing (1) now by:

  1. Aggressively filtering the 15k ebooks for obvious crap (bad PDF scans somebody has converted to EPUB, EPUBs with every line bolded and/or italicised, EPUBs which have lost the paragraph information and have everything on a couple of huge lines, and so on).
  2. Using another LLM (currently gpt-4o-mini called from a Bash script using curl, then gpt-4o to redo all the failures) to prune away "front matter" and "end matter" (a rough sketch of this call is just after the list).
  3. Then calling the LLM again to classify each book as fiction or nonfiction (I only want to train on fiction for now).
  4. Finally deduping by stripping all but alphabetical characters and then hashing every line in every book to find overlap (i.e. removing not just outright dupes, but also books/stories within anthologies, etc.) - see the hashing sketch further below.
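
(The actual scripts call the API with curl from Bash; the equivalent call sketched in Python looks roughly like this - the prompt wording is just illustrative, not what I actually use.)

```python
import os
import requests

def classify_matter(chunk: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model whether a chunk is front matter, end matter or story text."""
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": model,
            "messages": [
                {"role": "system",
                 "content": "Answer FRONT_MATTER, END_MATTER or STORY for the text given."},
                {"role": "user", "content": chunk[:8000]},
            ],
            "temperature": 0,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()
```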

I'm about 1/2 through now and down to 11k books remaining (I decided to do the "front matter" and "end matter" removal twice: once on the HTML that comes from the EPUB --> HTML conversion, then again on the Markdown that comes from the HTML --> Markdown conversion) - to be extra sure not to repeat the "disclaimer" problem again!
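
(The dedupe step boils down to something like this - an illustrative sketch; the minimum line length and overlap threshold are made-up values.)

```python
import hashlib
import re

def line_hashes(text: str) -> set[str]:
    """Strip all but alphabetical characters and hash each remaining line."""
    hashes = set()
    for line in text.splitlines():
        stripped = re.sub(r"[^A-Za-z]", "", line).lower()
        if len(stripped) >= 20:  # skip very short lines (headings, page numbers, etc.)
            hashes.add(hashlib.sha1(stripped.encode()).hexdigest())
    return hashes

def overlap_fraction(a: set[str], b: set[str]) -> float:
    """Fraction of the smaller book's lines that also appear in the other book."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# e.g. treat overlap_fraction(...) > 0.5 as a dupe or an anthology containing the other book.
```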


The mistake I made with the sample size was in how I compared against the sample size used for the control vectors:

  • Each set of control vectors uses 3 x hidden_dim sample sequences, but we ultimately only look at the single next token generated.
  • I've been counting a sequence of 8192 tokens as 8192 samples, when it should really be counted as a single sample (yes, it's longer than the control vectors' sequences, but for the control vectors I used the trick of subtracting from a baseline and also had much better data to work with).

So for command-r, which has a hidden_dim of 8192:

8192 samples of sequences of length 8192 = 64M tokens

Each LoRA rank uses 2 vectors instead of the 1 used for the control vectors:

64M * 2 = 128M

Each control vector was made from two samples (subtracted from a baseline but we'll ignore that for now):

128M * 2 = 256M

So the 200M tokens I got from 1k books aren't really enough to train even a rank-1 Multiplicative-LoRA with a similar number of sequences to what the control vectors used!!!
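
Written out as a quick sanity check (same numbers as above, with 1M = 1024^2):

```python
hidden_dim = 8192  # command-r hidden size
seq_len = 8192     # training sequence length

tokens = hidden_dim * seq_len  # 8192 sequences x 8192 tokens = 64M tokens
tokens *= 2                    # each LoRA rank uses 2 vectors, not 1     -> 128M
tokens *= 2                    # each control vector came from 2 samples  -> 256M

print(f"{tokens / 1024**2:.0f}M tokens needed vs ~200M available from 1k books")
```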

This is likely why I'm finding it so hard to get anything useful and the comparatively huge rank-16 or even rank-64 LoRAs are just learning weird shit to do with the formatting of my (terrible) dataset!


I hope to have around 2B tokens after the above process, and may try dropping the sequence length to 4096 or even 2048.

This is just about the limit of what I can train using 6 x A6000 in a sensible amount of time.

I obviously won't be able to release the dataset, but I will tidy up and then open the GitHub repo with the Bash scripts and prompt templates I've worked out.

I'm pretty much done with trying to use Python for any large-scale data handling now (and basically anything else, unless I absolutely have to use it for stuff like PyTorch).
