How to remove CDATA from title and description? (stitcherFM and w3validator) #97

vincentntang · 2020-11-25T16:23:04Z

How do I remove the CDATA from here? Here's the RSS feed we're using: https://www.codechefs.dev/rss.xml

<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
<channel>
<title>
<![CDATA[ Code Chefs - Hungry Web Developer Podcast ]]>
</title>
<description>
<![CDATA[ Looking to expand your skills as a Web Developer? Vincent Tang and German Gamboa break down topics in Javascript, NodeJS, CSS, DevOps, AWS, and career development! ]]>
</description>

I'm running into issues getting this feed accepted by https://www.stitcher.com/, which uses https://validator.w3.org/ to check it.

I'm using GatsbyJS, whose gatsby-plugin-feed uses node-rss behind the scenes.

I read elsewhere that removing the CDATA from the title should fix the RSS feed issues. #71

williamwgant · 2021-05-02T15:36:48Z

I'm seeing the same issue. Did you have any luck sorting it out?

dmythro · 2021-05-02T16:07:39Z

Just passing by... but I have no idea how removing CDATA would fix validation: the feed is already valid, CDATA is a standard approach, and it has worked like that for ages (it was the same 20 years ago). Are you sure it is the actual problem?

williamwgant · 2021-05-02T18:14:56Z

To be honest, I'm not sure. I've seen different things on this, and the validator I just tried to use didn't call it out. So maybe it's ok?

The validator did find a lot of other dumb stuff I did (with more complicated issues), so I would expect that it is ok. But I haven't tried pushing it into stitcher yet, as I'm moving an existing podcast feed.

dmythro · 2021-05-07T06:58:02Z

A long time ago I wrote a specialised CMS with RSS feeds in PHP, and back then all the output, including the RSS, was generated by hand. All the plain text/HTML was inside CDATA and it still works like that, with no issues in Feedly or the other readers I have used over time. So if a validator flags it, that's weird. I checked the feed with the W3C validator and it reports no issues with CDATA. It does report other issues, since the feed is a bit outdated, but that's alright :)
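To illustrate why a conforming XML parser should not care which form is used, here is a minimal sketch. `textOf` is a hypothetical helper written for this comment, not part of node-rss or any library, and it only understands CDATA sections and the five predefined XML entities. It shows that the CDATA form and the entity-escaped form carry the same character data.

```javascript
// Hypothetical helper: returns the character data of a small XML text fragment.
// It handles only CDATA sections and the five predefined XML entities.
function textOf(fragment) {
  const entities = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };
  // Decode entity references in a single pass so "&amp;lt;" is not decoded twice.
  const decode = (s) =>
    s.replace(/&(amp|lt|gt|quot|apos);/g, (_, name) => entities[name]);

  const cdata = /<!\[CDATA\[([\s\S]*?)\]\]>/g;
  let out = "";
  let pos = 0;
  let match;
  while ((match = cdata.exec(fragment)) !== null) {
    out += decode(fragment.slice(pos, match.index)); // text before the section
    out += match[1]; // CDATA contents are taken literally
    pos = match.index + match[0].length;
  }
  return out + decode(fragment.slice(pos));
}

const viaCdata = textOf("<![CDATA[Tom & Jerry <live>]]>");
const viaEscapes = textOf("Tom &amp; Jerry &lt;live&gt;");
console.log(viaCdata === viaEscapes); // true
```

If a service rejects one form but accepts the other, that points at the service's parser rather than at the feed.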

tazwar9t63 · 2022-04-04T03:08:21Z

How did you solve it? I'm facing the same issue, @vincentntang.

FerrariAndrea · 2023-04-17T14:19:00Z

Same issue here, some RSS validators don't like the <![CDATA[.
(For now I'm testing the string output, not the XML served from a URL; at the end of this message you will see why I point that out.)

If I end up needing to remove them, I will write an algorithm that finds and removes <![CDATA[ from the output string (should be easy to do).

Something like:

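  // `feed` is the node-rss RSS instance built elsewhere.
  const rss_data = feed.xml({ indent: true });
  // Escape the characters that CDATA was protecting ("&" has to go first).
  const escape_text = (s) =>
    s.replaceAll("&", "&amp;").replaceAll("<", "&lt;").replaceAll(">", "&gt;");
  let offset = 0;
  let buffer = "";
  const skip_item = false;
  let eof_target = rss_data.length;
  if (skip_item) {
    // Stop before the first <item>; if there is none, process the whole feed.
    const first_item = rss_data.indexOf("<item>");
    if (first_item > -1) eof_target = first_item;
  }
  while (offset < eof_target) {
    const start_i_title = rss_data.indexOf("<title>", offset);
    const start_i_desc = rss_data.indexOf("<description>", offset);
    // Take whichever tag comes first, ignoring a tag that was not found (-1).
    let start_i = start_i_title;
    let xml_tag = "</title>";
    if (start_i_desc > -1 && (start_i_title === -1 || start_i_desc < start_i_title)) {
      start_i = start_i_desc;
      xml_tag = "</description>";
    }
    const end_i = start_i > -1 ? rss_data.indexOf(xml_tag, start_i + 1) : -1;
    if (start_i === -1 || start_i >= eof_target || end_i === -1) break;
    const text = rss_data.substring(start_i, end_i);
    // Replace each CDATA section with its escaped contents.
    const cleaned = text.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, (m, inner) => escape_text(inner));
    buffer += rss_data.substring(offset, start_i) + cleaned;
    offset = end_i;
  }
  // Copy whatever is left (closing tags, or the untouched items) unchanged.
  buffer += rss_data.substring(offset);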

buffer will contain the XML string with no <![CDATA[ inside the titles and descriptions.
If skip_item is true, the algorithm will not iterate inside item tags.

Guys, be careful with special characters: if you remove <![CDATA[, you need to escape the text yourself, as I did in the algorithm for "&" with .replaceAll("&", "&amp;"). A literal "<" needs the same treatment ("&lt;").
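The same idea can also be sketched more compactly with regular expressions. This is only a sketch under assumptions: `stripCdata` and `escapeXml` are names made up for this comment (they are not part of node-rss), the input is assumed to be the string returned by `feed.xml()`, and only plain `<title>` and `<description>` tags without attributes are handled.

```javascript
// Escape the characters that the CDATA wrapper was protecting ("&" first).
const escapeXml = (s) =>
  s.replaceAll("&", "&amp;").replaceAll("<", "&lt;").replaceAll(">", "&gt;");

// Hypothetical helper: unwrap CDATA inside <title> and <description> only.
// The element body is matched as a run of CDATA sections and plain text, so a
// "</title>" that happens to sit inside a CDATA section does not end the match.
function stripCdata(xml) {
  const element =
    /<(title|description)>((?:<!\[CDATA\[[\s\S]*?\]\]>|[^<])*)<\/\1>/g;
  return xml.replace(element, (whole, tag, body) => {
    const unwrapped = body.replace(
      /<!\[CDATA\[([\s\S]*?)\]\]>/g,
      (m, inner) => escapeXml(inner)
    );
    return `<${tag}>${unwrapped}</${tag}>`;
  });
}

const sample =
  "<channel><title><![CDATA[Tom & Jerry]]></title>" +
  "<link>https://example.com/?a=1&amp;b=2</link>" +
  "<description><![CDATA[Uses <b>HTML</b>]]></description></channel>";

console.log(stripCdata(sample));
```

Only the text that was inside CDATA gets escaped, so entities that are already escaped elsewhere in the feed (the `&amp;` in the link above) are left alone.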
I still need to study the RSS standard, but I think the validators are wrong, not this repo.
For example this one: https://www.rssfeedexpert.com/ToolsRSSFeedXMLFormatter.aspx
I noticed that if you validate the normal output text of "feed.xml" there, the validator says: This is NOT a valid XML document
But if you edit the text, for example by adding a space somewhere, and then click on "collect", it shows:
"Success - See reformatted text below" and it parses the XML correctly 😓

I will update you once I have tested the RSS output directly from the URL instead of validating the string output.

Update:
I'm using Google News as a reader of the RSS output directly from the URL, and there everything is fine.
