
Spinning text with Python

As you may know I started programming because I work in SEO.

I’ve been doing SEO for over 12 years, and when I started I was using a piece of software called ScrapeBox. It was really effective at discovering blogs based on the keywords you provided and then posting comments on people’s websites.

With this, and the discovery of BlackHatWorld, I did a lot of damage.

Fast forward to now, and I’m building a piece of tech that needs text to be spun and then processed to rank each variation’s suitability in context. The idea is to create a somewhat intelligent content creation engine.

A common way to define spun variations is to wrap the alternatives in curly braces and separate them with pipes, for example {option one|option two}.
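
To make the syntax concrete, here is a minimal sketch that produces a single random variation by replacing each {...} group with one of its options. It assumes the groups are flat (no nested braces), and spin_once is a hypothetical helper name used only for illustration.

```python
import random
import re

# Matches one flat spin group such as {Dave|Tom}; nested braces are not supported.
SPIN_GROUP = re.compile(r"\{([^{}]*)\}")

def spin_once(text, rng=random):
    """Return one variation, choosing a random option for every {a|b|c} group."""
    return SPIN_GROUP.sub(lambda m: rng.choice(m.group(1).split("|")), text)

comment = "Hello {Dave|Tom} it's a {lovely|beautiful|rainy} day today"
print(spin_once(comment))  # prints one randomly chosen variation
```

This gives you one variation at a time; the rest of the post is about generating all of them.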

So I found myself needing to turn

"Hello {Dave|Tom} it's a {lovely|beautiful|rainy} day today, perfect for going on a {long walk|bike ride}"

into variations such as

"Hello Dave it's a lovely day today, perfect for going on a bike ride"
"Hello Tom it's a beautiful day today, perfect for going on a bike ride"
"Hello Dave it's a rainy day today, perfect for going on a long walk"
"Hello Dave it's a beautiful day today, perfect for going on a long walk"
etc
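
Because each group is chosen independently, the total number of variations is the product of the option counts in each group: 2 × 3 × 2 = 12 for the sentence above. A small sketch to compute that, again assuming flat groups (count_variations is a hypothetical name, and math.prod needs Python 3.8 or later):

```python
import math
import re

def count_variations(text):
    """Multiply together the number of options in every {a|b|...} group."""
    groups = re.findall(r"\{([^{}]*)\}", text)
    # math.prod of an empty sequence is 1, so text with no groups has one variation.
    return math.prod(len(g.split("|")) for g in groups)

comment = "Hello {Dave|Tom} it's a {lovely|beautiful|rainy} day today, perfect for going on a {long walk|bike ride}"
print(count_variations(comment))  # 12
```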

You can tell that some of these are more relevant than others; we’ll address how to handle that in a future post.

Here is how I did it.

Step 1) Split the text string into chunks using a regular expression

import re

comment = "Hello {Dave|Tom} it's a {lovely|beautiful|rainy} day today, perfect for going on a {long walk|bike ride}"
chunk = re.split(r'(\{[^\}]+\}|[^\{\}]*)', comment)

print(chunk)

['', 'Hello ', '', '{Dave|Tom}', '', " it's a ", '', '{lovely|beautiful|rainy}', '', ' day today, perfect for going on a ', '', '{long walk|bike ride}', '']

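The parentheses around the regex form a capturing group, which tells re.split to keep the matched text (the spin statements and the plain text between them) in the result instead of throwing it away. This is important, as we use those chunks as identifiers in the next part. (The output above comes from an older Python. Since Python 3.7, re.split also splits on the empty match at the end of the string, so you’ll see a couple of extra empty strings at the end of the list. They don’t change the final result.)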
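
To see what the capturing group does, compare splitting with and without it. This sketch uses a simpler pattern, r'(\{[^{}]*\})', that only matches the {...} groups, so the plain text just ends up between them. It is an alternative for illustration, not the pattern used in the rest of the post.

```python
import re

comment = "Hello {Dave|Tom} it's a {lovely|beautiful|rainy} day"

# Without a capturing group, the matched {...} text is dropped from the result.
print(re.split(r"\{[^{}]*\}", comment))
# ['Hello ', " it's a ", ' day']

# With a capturing group, the matched text is kept between the other pieces.
print(re.split(r"(\{[^{}]*\})", comment))
# ['Hello ', '{Dave|Tom}', " it's a ", '{lovely|beautiful|rainy}', ' day']
```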

The next step is to go over each chunk in this list and, wherever the chunk is a spin statement, split it into its options. The result is a list of lists.

import re

def options(s):
    # If the chunk starts with '{' it is a spin statement:
    # strip the braces and split its options on '|'
    if s.startswith('{'):
        return s[1:-1].split('|')
    return [s]

comment = "Hello {Dave|Tom} it's a {lovely|beautiful|rainy} day today, perfect for going on a {long walk|bike ride}"
chunk = re.split(r'(\{[^\}]+\}|[^\{\}]*)', comment)

# Build a list of lists of variations that can be combined
opt_lists = [options(frag) for frag in chunk]

print(opt_lists)
[[''], ['Hello '], [''], ['Dave', 'Tom'], [''], [" it's a "], [''], ['lovely', 'beautiful', 'rainy'], [''], [' day today, perfect for going on a '], [''], ['long walk', 'bike ride'], ['']]

Now that we have our list of lists, we need to loop over it and create every combination. In this answer, Micheal used a great function to do this, which you can read about here. My original code was much clunkier, but once I saw his answer I admittedly adapted my approach, as his generator is much cleaner.
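
As a minimal illustration of what itertools.product does with a list of lists: it yields one tuple per combination, taking one item from each inner list, with the rightmost list changing fastest, much like nested for-loops. The short lists here are made up for the example.

```python
import itertools

opt_lists = [["Hello "], ["Dave", "Tom"], [" it's a "], ["lovely", "rainy"], [" day"]]

# product(*opt_lists) behaves like one nested for-loop per inner list.
for combo in itertools.product(*opt_lists):
    print("".join(combo))
# Hello Dave it's a lovely day
# Hello Dave it's a rainy day
# Hello Tom it's a lovely day
# Hello Tom it's a rainy day
```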

There you have it: your very own text spinner. I’ll post updates on how I then score the variations. Here is the full code:

import re
import itertools

def options(s):
    # If the chunk starts with '{' it is a spin statement:
    # strip the braces and split its options on '|'
    if s.startswith('{'):
        return s[1:-1].split('|')
    # Wrap plain text in a list too, so every chunk is a list of options
    return [s]

comment = "Hello {Dave|Tom} it's a {lovely|beautiful|rainy} day today, perfect for going on a {long walk|bike ride}"
chunk = re.split(r'(\{[^\}]+\}|[^\{\}]*)', comment)

# Build a list of lists of variations that can be combined
opt_lists = [options(frag) for frag in chunk]

# Join one option from each list for every possible combination
for spec in itertools.product(*opt_lists):
    print(''.join(spec))
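
If you want to reuse the spinner elsewhere, the same steps can be wrapped in a generator function. This is only a sketch: spin_variations is a hypothetical name, it swaps in the simpler split pattern r'(\{[^{}]*\})' instead of the one above, and it assumes the spin groups are not nested.

```python
import itertools
import re

def spin_variations(text):
    """Yield every variation of a {a|b} style string, one at a time."""
    # Keep the {...} groups in the split result by capturing them.
    chunks = re.split(r"(\{[^{}]*\})", text)
    # Turn each chunk into a list of options; plain text becomes a one-item list.
    opt_lists = [
        chunk[1:-1].split("|") if chunk.startswith("{") else [chunk]
        for chunk in chunks
    ]
    # product() yields combinations one by one, so the full list of
    # variations is only built if the caller asks for it.
    for combo in itertools.product(*opt_lists):
        yield "".join(combo)

comment = "Hello {Dave|Tom} it's a {lovely|beautiful|rainy} day today, perfect for going on a {long walk|bike ride}"
variations = list(spin_variations(comment))
print(len(variations))  # 12
print(variations[0])    # Hello Dave it's a lovely day today, perfect for going on a long walk
```

Because it is a generator, you can stop early, for example with itertools.islice, when you only need a handful of variations.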

Will Cecil: Digital Marketer, Python Tinkerer & Tech Enthusiast.