This code is designed to extract song lyrics from a...

July 1, 2025 at 04:55 AM

+        # Genius has the lyrics either in multiple divs with class attributes
+        # beginning with "Lyrics__Container", or in a single div with class
+        # attribute "lyrics"
+        lyric_tag = html.find("div", class_="lyrics")
+        if lyric_tag is None:
+            class_matcher = re.compile("^Lyrics__Container")
+            lyric_tags = html.find_all("div", class_=class_matcher)
+            if not lyric_tags:
+                self._log.debug(u'Genius page {0} has no lyric tags', page_url)
+                return None
+            lyrics = u'\n\n'.join(tag.get_text() for tag in lyric_tags)
+        else:
+            lyrics = lyric_tag.get_text()
+        # remove leading and trailing whitespace
+        return lyrics.strip()
 
     def fetch(self, artist, title):
         search_url = self.base_url + "/search"

This code is designed to extract song lyrics from a Genius.com page (or another web page with a similar structure). Here's a summary of what the code does:

It attempts to find lyrics embedded in an HTML document (html) provided as input:
- First, it looks for a <div> element with the class attribute "lyrics".
- If such an element is found (lyric_tag is not None), it retrieves the element's text content using get_text().

If the "lyrics" class is not found (lyric_tag is None):
- It looks for all <div> elements with a class name that begins with "Lyrics__Container". These are matched using a regular expression (re.compile("^Lyrics__Container")).
- If no such elements are found, it logs a debug message noting that the page has no lyric tags and returns None.

If one or more <div> elements with a "Lyrics__Container" class are found:
- It joins the text content of those elements into a single string, separating each block of text with two newline characters (\n\n).

After locating and combining the lyrics, it removes any leading or trailing whitespace using .strip() and returns the cleaned lyrics as a string.
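
To make the prefix match concrete, here is a minimal sketch using only the standard re module. When Beautiful Soup is given a compiled pattern as a class_ filter, it tests the pattern with search() against each individual class name, which is why the "^" anchor matters. The class names below are invented for illustration and are not taken from a real Genius page.

```python
import re

# Same pattern as in the diff; "^" anchors the match to the start of a class name.
class_matcher = re.compile("^Lyrics__Container")

# Hypothetical class names, made up for this example.
candidates = [
    "Lyrics__Container-abc123",    # prefix followed by a generated suffix
    "Lyrics__Container",           # exact prefix
    "lyrics",                      # the older single-div class (lowercase)
    "Sidebar__Lyrics__Container",  # contains the text, but not at the start
]

# search() is used here because that is how a compiled pattern is applied as a filter.
matched = [name for name in candidates if class_matcher.search(name)]
print(matched)  # ['Lyrics__Container-abc123', 'Lyrics__Container']
```

Without the "^" anchor, the last candidate would also match, because search() accepts a match anywhere in the string.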

The purpose of this code is likely to handle different HTML structures for lyrics on Genius (or similar websites) by accounting for two common patterns:

- A single container with the class "lyrics"
- One or more containers with classes starting with "Lyrics__Container"

It gracefully handles cases where no lyrics are found by returning None.
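
The branching described above can be sketched on plain strings, without an HTML parser. This is an illustration under stated assumptions, not the plugin's code: combine_lyrics is a hypothetical helper, and its arguments stand in for the text that get_text() would return for each matching tag.

```python
from typing import List, Optional


def combine_lyrics(single_block: Optional[str],
                   container_blocks: List[str]) -> Optional[str]:
    """Hypothetical helper mirroring the diff's branching on plain strings."""
    if single_block is not None:
        # Pattern 1: a single div with class "lyrics"
        lyrics = single_block
    elif container_blocks:
        # Pattern 2: one or more "Lyrics__Container" divs, joined by a blank line
        lyrics = "\n\n".join(container_blocks)
    else:
        # Neither pattern matched; the caller has to handle None
        return None
    # Remove leading and trailing whitespace only; interior newlines are kept
    return lyrics.strip()


print(repr(combine_lyrics(None, ["  Verse one", "Chorus  "])))  # 'Verse one\n\nChorus'
print(combine_lyrics(None, []))  # None
```

A caller should check for None before using the result, for example by treating it as "no lyrics found" and moving on to another source.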

Built by @thebuilderjr