Coloring text in HTML and React

Adding markup to text by hand is easy. You can mark up the text right here on Habr and then copy it to your site, or do a search and replace in Notepad++ or Atom.

That works when there is a single text. When there are many texts, you want a tool that wraps text fragments in HTML tags or generates source code for React. In Python this is not difficult (a few lines of code per color).



If you know Python and regular expressions, follow the link: it has examples and source code.

A detailed description follows below the cut.

Text markup, using the coloring of JavaScript source code as an example


Consider the function:

def jsToHtml(s):

It takes source text as input and returns HTML.

We define variables that hold the attributes of the blocks. The example uses inline styles for clarity; later it makes more sense to replace them with classes.

comm = 'style="color: red;"' # comments
blue = 'style="color: blue;"' # keywords
...

Markup.

The first step is to escape the characters '&', '<' and '>':

s = s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')

'&' is escaped so that the character sequences '&lt;', '&gt;' and other '&...;' entities in the source are displayed literally. The characters '<' and '>' are escaped so that they do not conflict with tags.

Many more characters could be escaped but, in my opinion, with UTF-8 this is enough.
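
A small sketch of why the order of the replacements matters. The name escape_html is introduced here only for illustration; the standard library call html.escape(s, quote=False) performs the same three replacements.

```python
import html

def escape_html(s):
    # '&' has to be replaced first; otherwise the '&' produced by
    # '&lt;' and '&gt;' would be escaped a second time.
    return s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')

src = 'if (a < b && b > c) { s = "&lt;"; }'
print(escape_html(src))
# The result matches the standard library helper when quotes are left alone.
print(escape_html(src) == html.escape(src, quote=False))
```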

Markup algorithm:

  • Take a regular expression and find all the text fragments that match it.
  • Cut out each fragment, add markup to it, and save the marked-up fragment in an array (save the original text as well: it will come in handy).
  • Insert a numbered stub in its place.
  • Repeat for every color.
  • When everything is colored, replace the stubs with the colored fragments from the array.

The stub must be unique, and after escaping our text does not contain a single '<' or '>' character.

Making a stub:

f'<{i}>'

where i is the stub number. Nothing like it can occur in the escaped text.
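
The whole round trip (escape, cut out, stub, restore) can be sketched for a single color as follows. This is a minimal illustration, not the author's function: the name color_strings, the default attribute, and the simplified pattern for double-quoted strings are assumptions made for the example.

```python
import re

def color_strings(s, attr='style="color: green;"'):
    # After escaping, the text contains no '<' or '>', so '<N>' is a safe stub.
    s = s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
    ls = []  # marked-up fragments; the list index is the stub number
    for block in set(re.findall(r'"[^"]*"', s)):
        ls.append(f'<span {attr}>{block}</span>')
        s = s.replace(block, f'<{len(ls) - 1}>')
    # Put the colored fragments back in place of the stubs.
    for i in range(len(ls) - 1, -1, -1):
        s = s.replace(f'<{i}>', ls[i])
    return s

print(color_strings('a = "x < y";'))
```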

What if the fragment already contains a stub?

For example, the source text contained the lines:

`a string: with /* a comment inside */ `
/* a comment, with `a string inside` */

There are two options:

  1. Ignore it. In this case the string will be colored twice.
  2. Find the nested stubs and replace them with the original (unmarked) text.

We make a function that implements all this (10 lines of code, 10 lines of comment):

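def oneRe(reStr, s, attr, ls, multiColor=False):
    '''
    In the string s, finds the fragments that match reStr, stores them
    with markup in ls, and replaces them with stubs (<0>, <1> ...<1528>...).
    Returns the string with the stubs.
    
    reStr - regular expression
    s - text to process
    attr - style/class/etc
    ls - list of marked-up fragments
    multiColor=False - replace nested stubs with the original text
    '''
    for block in set(re.findall(reStr, s)):
        toArr = block
        if not multiColor: # multiColor==False: remove nested coloring
            for prev in set(re.findall(r'<[\d]+>', block)): # nested stubs: <0> ... <21>
                iPrev = int(prev[1:-1], 10)                # stub number
                toArr = toArr.replace(prev, ls[iPrev][1])  # put the original text back in place of the stub
        ls.append([f'<span {attr}>{toArr}</span>', toArr]) # ls keeps 2 values: the marked-up and the original text
        s = s.replace(block, f'<{len(ls)-1}>')
    return s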
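
To see the difference between the two options, here is a self-contained sketch. oneRe repeats the logic of the function above so that the block runs on its own; restore and demo are hypothetical helpers added for the illustration, and class attributes are used instead of styles to keep the output short.

```python
import re

def oneRe(reStr, s, attr, ls, multiColor=False):
    # Same logic as the function above, repeated so the sketch is runnable.
    for block in set(re.findall(reStr, s)):
        toArr = block
        if not multiColor:
            for prev in set(re.findall(r'<\d+>', block)):
                toArr = toArr.replace(prev, ls[int(prev[1:-1])][1])
        ls.append([f'<span {attr}>{toArr}</span>', toArr])
        s = s.replace(block, f'<{len(ls) - 1}>')
    return s

def restore(s, ls):
    # Hypothetical helper: expand the stubs, last one first.
    for j in range(len(ls) - 1, -1, -1):
        s = s.replace(f'<{j}>', ls[j][0])
    return s

def demo(multiColor):
    ls = []
    s = "/* see 'name' */ x = 'name';"
    s = oneRe(r"'[\s\S]*?'", s, 'class="str"', ls)
    s = oneRe(r'/\*[\s\S]+?\*/', s, 'class="comm"', ls, multiColor=multiColor)
    return restore(s, ls)

print(demo(False))  # the string inside the comment keeps only the comment color
print(demo(True))   # the string inside the comment is colored twice
```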

This is a demo: regular expressions may be marked up incorrectly, and escaped characters (\', \") are not handled.

For reference, the expression

s = s.replace(/A + B/g, 'A - B');

is colored differently by the Notepad++ and Atom editors.

Now that we have oneRe, coloring fragments is simple. An example that colors strings in apostrophes and in quotation marks:

s = oneRe(r"'[\s\S]*?'", s, green, ls, multiColor=False) # strings in '' go into ls
s = oneRe(r'"[\s\S]*?"', s, green, ls, multiColor=False) # strings in "" go into ls

A more complex example: in JavaScript we also need to color the expressions inside multi-line (template) strings:
`    ${A + B}  ${A - B}`


i = len(ls) # number of the next stub
for mStr in set(re.findall(r'`[\s\S]+?`', s)): # template strings
    newFstr = mStr
    for val in set(re.findall(r"\$\{[\s\S]+?\}", mStr)): # ${...} expressions inside them
        ls.append([f'<span {darkRed}>{val}</span>', val])
        newFstr = newFstr.replace(val, f'<{i}>')
        i += 1
    s = s.replace(mStr, newFstr)

s = oneRe(r'`[\s\S]+?`', s, green, ls, multiColor=True) # strings in `` go into ls

First we find the template strings: re.findall(r'`[\s\S]+?`', s) returns a list of the blocks of text between backquote characters.

Each block consists of whitespace or non-whitespace characters ([\s\S], i.e. anything).

The block length is 1 or more (+).

The match is non-greedy ("?" makes the block end at the first closing "`", so there is no "`" inside the block).

We copy the found block (the mStr variable) to the newFstr variable and look for sub-blocks with ${...} expressions inside it.

re.findall(r"\$\{[\s\S]+?\}", mStr) returns a list of such sub-blocks. We save the markup in the ls array and replace the sub-block with a stub in the newFstr variable. When the sub-blocks run out, we replace the original block in the source string s with the new value.

The set is not superfluous. If findall returns several identical blocks, then processing the first one replaces all identical blocks in the source text with stubs at once. By the time the second identical block is processed, it is no longer in the source text. set removes the duplicates.
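
A quick check of both points, non-greedy matching and duplicate removal, on a made-up input string (the variable names here are chosen for the illustration):

```python
import re

s = "a = `x`; b = `x`; c = `y`;"

# Non-greedy: each block ends at the nearest closing backquote.
blocks = re.findall(r'`[\s\S]+?`', s)

# Greedy, for comparison: one block from the first backquote to the last.
greedy = re.findall(r'`[\s\S]+`', s)

# findall reports the repeated block twice; set keeps one copy of each,
# so every distinct block gets exactly one entry and one stub.
unique = set(blocks)

print(blocks)
print(greedy)
print(sorted(unique))
```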

JsToHtml.py file
# -*- coding: utf-8 -*- 
'''
AON 2020

'''

import re

# *** *** ***

def oneRe(reStr, s, attr, ls, multiColor=False):
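    '''
    In the string s, finds the fragments that match reStr, stores them
    with markup in ls, and replaces them with stubs (<0>, <1> ...<1528>...).
    Returns the string with the stubs.
    
    reStr - regular expression
    s - text to process
    attr - style/class/etc
    ls - list of marked-up fragments
    multiColor=False - replace nested stubs with the original text
    '''
    i = len(ls) # number of the next stub
    for block in set(re.findall(reStr, s)):
        toArr = block
        if not multiColor: # multiColor==False: remove nested coloring
            for prev in set(re.findall(r'<[\d]+>', block)): # nested stubs: <0> ... <21>
                iPrev = int(prev[1:-1], 10)                 # stub number
                toArr = toArr.replace(prev, ls[iPrev][1])   # put the original text back in place of the stub
        ls.append([f'<span {attr}>{toArr}</span>', toArr])  # ls keeps 2 values: the marked-up and the original text
        s = s.replace(block, f'<{i}>')              # replace the fragment with a stub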
        i += 1
    return s

# *** *** ***

def operColor(s, ls, color):
    '''
    Colors operators.
    Multi-character operators are replaced first, then single characters.
    '''
    i = len(ls)
    for c in ['&lt;=', '&gt;=', '=&lt;', '=&gt;', '&lt;', '&gt;', '&amp;&amp;', '&amp;',
              '===', '!==', '==', '!=', '+=', '-=', '++', '--', '||']:
        ls.append([f'<span {color}>{c}</span>',0])
        s = s.replace(c, f'<{i}>')
        i += 1
    for c in '!|=+-?:,.[](){}%*/':
        ls.append([f'<span {color}>{c}</span>',0])
        s = s.replace(c, f'<{i}>')
        i += 1
    return s

# *** *** ***

def jsToHtml(s):
    '''
    Main function.
    Takes source text and returns HTML with the fragments wrapped in <span>.
    '''

    black = '''style="font-family: 'Courier New', monospace;
        background: #fff; 
        color: black;
        font-weight: bold;
        border: 1px solid #ddd;
        padding: 5px;
        text-align: left;
        white-space: pre;"'''
    comm = 'style="color: red;"'
    green = 'style="color: green; font-style: italic;"'
    blue = 'style="color: blue;"'
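    red2 = 'style="color: #840;"'
    darkRed = 'style="color: #800;"'   # assumed value: used below but not defined in the original listing
    darkBlue = 'style="color: #008;"'  # assumed value: used below but not defined in the original listing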

    s = s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')   # escape '&', '<', '>'

    ls = []

    i = 0
    for mStr in set(re.findall(r'`[\s\S]+?`', s)): # template strings
        newFstr = mStr
        for val in set(re.findall(r"\$\{[\s\S]+?\}", mStr)): # ${...} expressions inside them
            ls.append([f'<span {darkRed}>{val}</span>', val])
            newFstr = newFstr.replace(val, f'<{i}>')
            i += 1
        s = s.replace(mStr, newFstr)

    s = oneRe(r'`[\s\S]+?`', s, green, ls, multiColor=True) # strings in `` go into ls
    s = oneRe(r"'[\s\S]*?'", s, green, ls, multiColor=False) # strings in '' go into ls
    s = oneRe(r'"[\s\S]*?"', s, green, ls, multiColor=False) # strings in "" go into ls
    s = oneRe(r'/[\s\S].*?/g\b', s, green, ls, multiColor=False) # regular expressions go into ls
    s = oneRe(r'/[\s\S].*?/\.', s, green, ls, multiColor=False) # regular expressions go into ls
    s = oneRe(r'/\*[\s\S]+?\*/', s, comm, ls, multiColor=False) # /* */ comments go into ls
    s = oneRe(r'//[\s\S]*?\n', s, comm, ls, multiColor=False) # // comments go into ls

    i = len(ls)

    # keywords
    for c in ['new', 'JSON', 'Promise', 'then', 'catch', 'let', 'const', 'var', 'true', 'false', 'class', 'from', 'import', 'set', 'list', 'for', 'in', 'if', 'else', 'return', 'null']:
        ls.append([f'<span {blue}>{c}</span>',0])
        s = re.sub(r'\b%s\b' % c, f'<{i}>', s)
        i += 1

    # other highlighted names
    for c in ['window', 'doc', 'cmd', 'init','init2', 'recalc', 'hide', 'readOnly', 'validate']:
        ls.append([f'<span {darkRed}>{c}</span>',0])
        s = re.sub(r'\b%s\b' % c, f'<{i}>', s)
        i += 1

    s = operColor(s, ls, darkBlue) # operators

    for j in range(len(ls), 0, -1):  # expand the stubs from the last to the first, so nested stubs are expanded too
        s = s.replace(f'<{j-1}>', ls[j-1][0])

    return f'<div {black}>{s}</div>'

# *** *** ***


HTML to React


You can convert HTML to React source code at htmltoreact.com. There is also a link to GitHub there.

This did not suit me: first, it does not produce exactly what I need, and second, how would I drag this miracle onto my server?

So I wrote my own.

Install the lxml library (pip install lxml or pip3 install lxml).

We import:

from xml.dom.minidom import parseString
from lxml import html, etree

Convert the HTML text to XHTML. It is almost the same thing, but all tags are closed.

doc = html.fromstring(htmlText)
ht = etree.tostring(doc, encoding='utf-8').decode()

We parse the resulting XHTML into a DOM tree using minidom.

dom = parseString(ht)

We write a function that recursively walks the nodes and generates the result as React source code.

After the parseString call, the DOM tree is a parent node that has child nodes, which in turn have children, and so on.

Each node is an object that can be viewed as a dictionary describing it:

  • nodeName - node name, string
  • childNodes - children nodes, list
  • attributes - attributes, dictionary
  • A node called #text has nodeValue (string)
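
A minimal sketch of walking such a tree with the standard library only. The helper name dump is hypothetical, and the input is written as well-formed XHTML (with a self-closed <br/>) because minidom cannot parse plain HTML.

```python
from xml.dom.minidom import parseString

def dump(node, depth=0):
    """Return one line per node: its name plus attributes or text value."""
    pad = '  ' * depth
    if node.nodeName == '#text':
        return [f'{pad}#text {node.nodeValue!r}']
    # Element attributes form a NamedNodeMap; items() yields (name, value) pairs.
    attrs = dict(node.attributes.items()) if node.attributes else {}
    lines = [f'{pad}{node.nodeName} {attrs}']
    for child in node.childNodes:
        lines.extend(dump(child, depth + 1))
    return lines

dom = parseString('<div class="A" style="color: red;">Red of course,<br/> </div>')
root = dom.childNodes[0]
print('\n'.join(dump(root)))
```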

Example:

<div class="A" style="color: red;">Red of course,<br> </div>

After the transformations we get:
{ 'nodeName':'div',
  'attributes': {'style': 'color: red;', 'class': 'A'},
  'childNodes': [
    {'nodeName':'#text', 'nodeValue': 'Red of course,'},
    {'nodeName':'br'},
    {'nodeName':'#text', 'nodeValue': ' '},
  ],
}

Converting the DOM to a string is easy (there is pprint). When generating the React code, I replaced class with className and reworked the style attribute.
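
The style conversion can be sketched on its own. The names camel and style_to_react are hypothetical; unlike the sU helper in the listing below, camel converts every separator, so border-top-width becomes borderTopWidth. The sketch assumes simple declarations: values that contain ';' or double quotes would need extra handling.

```python
def camel(name, sep='-'):
    # 'font-weight' -> 'fontWeight', 'border-top-width' -> 'borderTopWidth'
    head, *rest = name.strip().split(sep)
    return head + ''.join(p[:1].upper() + p[1:] for p in rest)

def style_to_react(style):
    # 'color: red; font-weight: bold;' -> style={{color: "red", fontWeight: "bold"}}
    pairs = []
    for decl in style.split(';'):
        if decl.strip():
            prop, _, value = decl.partition(':')
            pairs.append(f'{camel(prop)}: "{value.strip()}"')
    return 'style={{' + ', '.join(pairs) + '}}'

print(style_to_react('color: red; font-weight: bold;'))
```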

In text nodes, '{', '}', '<', '>' are escaped.
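
One way to do this escaping in a single pass is sketched here. It is an alternative to the temporary '_{_' marker used in the listing below, not the author's code; the name escape_jsx_text is made up for the example. Each special character becomes a JSX expression that holds a string literal.

```python
import re

def escape_jsx_text(tx):
    # '{' -> {"{"}, '<' -> {"<"} and so on. A single regex pass never
    # re-escapes the braces that the replacement itself introduces.
    return re.sub(r'[{}<>]', lambda m: '{"' + m.group(0) + '"}', tx)

print(escape_jsx_text('if (a < b) { return; }'))
```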

HtmlToReact.py file
# -*- coding: utf-8 -*- 

import re
from xml.dom.minidom import parseString
from lxml import html, etree

# *** *** ***

_react = ''

def htmlToReact(buf):
    '''
    buf - HTML text
    Returns the ReactJS source text
    '''
    global _react
    _react = ''

    try:
        r = re.search('<[\\s\\S]+>', buf)
        if r:
            doc = html.fromstring(r.group(0))
            ht = etree.tostring(doc, encoding='utf-8').decode()
            xHtmlToReact(parseString(ht).childNodes[0], '')
            return _react
        else:
            return '<empty/>'
    except Exception as ex:
        s = f'htmlToReact: \n{ex}'
        print(s)
        return s

# *** *** ***

def sU(a, c):
    '''
    xlink:show   ->  xlinkShow
    font-weight  ->  fontWeight
    '''
    l, _, r = a.partition(c)
    return ( (l + r[0].upper() + r[1:]) if r else a).strip()

# *** *** ***

def xHtmlToReact(n, shift='\n    '):
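    '''
    Recursively walks the DOM nodes.
    In attribute names, the letter after ':' or '-' is converted to upperCase.
    The result is accumulated in the global _react
    '''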
    global _react

    if n.nodeName.lower() in ['head', 'script']:
        return
    
    _react += shift + '<' + n.nodeName.lower()
    if n.attributes:
        for k, v in n.attributes.items():
            if k == 'style':
                style = ''
                for s in v.split(';'):
                    if s.strip():
                        l, _, r = s.partition(':')
                        style += f'''{sU(l, '-')}: "{r.strip()}", '''
                if style:
                    _react += ' style={{' + style + '}}'
            elif k == 'class':
                _react += f' className="{v}"'
            else:
                kk = k.replace('xlink:href', 'href') # deprecated
                _react += f''' {sU( sU(kk, ':'), '-' )}="{v}"'''
        
    _react += '>'
    if n.childNodes:
        for child in n.childNodes:
            if  child.nodeName == '#text':
                tx = child.nodeValue
                for x in ['{', '}', '<', '>']:
                    tx = tx.replace(x, '{"' + x + '"_{_')
                tx = tx.replace('_{_', '}')
                if tx.endswith(' '):  # also safe for an empty text node
                    tx = tx[:-1] + '\xa0'
                _react += tx.replace('\n', '<br/>')
            else:
                xHtmlToReact(child)
                
    _react += f'{shift}</{n.nodeName.lower()}>'

# *** *** ***



Examples and source code here.

P.S. On Habr the coloring of Python code (and maybe of other languages) is not ideal. If a Python string contains &amp;, it is displayed as &. In my listings I tweaked the code to make it look right. If Habr fixes the bug, my listings will break.
