How to remove CDATA from title and description? (stitcherFM and w3validator) #97

Open
vincentntang opened this issue Nov 25, 2020 · 6 comments

Comments

@vincentntang

How do I remove the CDATA from here? Here's the RSS feed we're using: https://www.codechefs.dev/rss.xml

<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
<channel>
<title>
<![CDATA[ Code Chefs - Hungry Web Developer Podcast ]]>
</title>
<description>
<![CDATA[ Looking to expand your skills as a Web Developer? Vincent Tang and German Gamboa break down topics in Javascript, NodeJS, CSS, DevOps, AWS, and career development! ]]>
</description>

I'm running into issues getting this feed accepted by https://www.stitcher.com/, which validates feeds with https://validator.w3.org/.

I'm using GatsbyJS, whose gatsby-plugin-feed uses node-rss behind the scenes.

I read elsewhere that removing the CDATA from the title should fix the RSS feed issues. #71

@williamwgant

I'm seeing the same issue. Did you have any luck sorting it out?

@dmythro

dmythro commented May 2, 2021

Just passing by... but I have no idea how removing CDATA would fix validation: the feed is already valid, and CDATA is a standard approach that has worked this way for ages (it was the same 20 years ago). Are you sure it is the actual problem?
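
To illustrate the point above, here is a minimal sketch of why a CDATA section and entity-escaped text are interchangeable to a conforming XML parser. The `decodeText` helper is hypothetical and hand-rolled for this example (it only handles CDATA sections and the five predefined XML entities); it is not part of node-rss or of any validator.

```javascript
// Hypothetical helper: decode the text content of one element body.
// CDATA sections are taken literally; entities elsewhere are decoded.
function decodeText(body) {
  const entities = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&quot;": '"',
    "&apos;": "'",
  };
  return body.replace(
    /<!\[CDATA\[([\s\S]*?)\]\]>|&(?:amp|lt|gt|quot|apos);/g,
    (match, cdata) => (cdata !== undefined ? cdata : entities[match])
  );
}

const withCdata = "<![CDATA[Tips & Tricks for <div> layouts]]>";
const withEntities = "Tips &amp; Tricks for &lt;div&gt; layouts";

console.log(decodeText(withCdata) === decodeText(withEntities)); // true
```

Both spellings carry the same text, so a consumer that rejects one of them is being stricter than XML itself requires.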

@williamwgant

To be honest, I'm not sure. I've seen different things on this, and the validator I just tried to use didn't call it out. So maybe it's ok?

The validator did find a lot of other dumb stuff I did (with more complicated issues), so I would expect that it is ok. But I haven't tried pushing it into stitcher yet, as I'm moving an existing podcast feed.

@dmythro

dmythro commented May 7, 2021

A long time ago I wrote a specialised CMS with RSS feeds in PHP, and back then all the output, including the RSS, was generated by hand. All the plain text/HTML went inside CDATA, and it still works like that, with no issues in Feedly or the other readers I have used over time. So if a validator flags it, that's weird. I checked the feed with the W3C validator and it reports no issues with CDATA. It does report other issues, since the feed is a bit outdated, but that's alright :)

@tazwar9t63

How did you solve it? I'm facing the same issue, @vincentntang.

@FerrariAndrea

FerrariAndrea commented Apr 17, 2023

Same issue here: some RSS validators don't like the <![CDATA[.
(For now, I'm testing the string output, not the XML served from a URL; the end of this message explains why I stress that.)

If I end up needing to remove them for my use case, I will write an algorithm that finds and removes <![CDATA[ from the output string (it should be easy to do).

Something like:

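  // `feed` is the node-rss feed object; work on its serialized output.
  const rss_data = feed.xml({ indent: true });
  const skip_item = false; // true: leave everything from the first <item> untouched
  let eof_target = rss_data.length;
  if (skip_item) {
    const first_item = rss_data.indexOf("<item>");
    if (first_item > -1) eof_target = first_item;
  }
  // Escape the characters that CDATA was protecting ("&" first).
  const escape_xml = (s) =>
    s.replaceAll("&", "&amp;").replaceAll("<", "&lt;").replaceAll(">", "&gt;");
  let offset = 0;
  let buffer = "";
  while (offset < eof_target) {
    const start_i_title = rss_data.indexOf("<title>", offset);
    const start_i_desc = rss_data.indexOf("<description>", offset);
    // Pick whichever tag comes first, ignoring one that was not found (-1).
    const title_first =
      start_i_title > -1 && (start_i_desc === -1 || start_i_title < start_i_desc);
    const start_i = title_first ? start_i_title : start_i_desc;
    const xml_tag = title_first ? "</title>" : "</description>";
    const end_i = start_i > -1 ? rss_data.indexOf(xml_tag, start_i + 1) : -1;
    if (start_i === -1 || start_i >= eof_target || end_i === -1) break;
    const text = rss_data.substring(start_i, end_i);
    // Unwrap each CDATA section and escape what was inside it.
    const cleaned = text.replace(
      /<!\[CDATA\[([\s\S]*?)\]\]>/g,
      (_match, inner) => escape_xml(inner)
    );
    buffer += rss_data.substring(offset, start_i) + cleaned;
    offset = end_i;
  }
  // Copy whatever is left (closing tags, skipped items) unchanged.
  buffer += rss_data.substring(offset);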
  • buffer will contain the XML string with no <![CDATA[ inside the titles and descriptions.
  • if skip_item is true, the algorithm will not iterate inside item tags.

Be careful with special characters: if you remove <![CDATA[, you need to escape the string yourself, as the algorithm does for "&" with .replaceAll("&", "&amp;").
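
The escaping step can be sketched on its own as follows. The `escapeXml` name is made up for this example and is not a node-rss function. It assumes the input is the raw text taken out of a CDATA section; applying it to text that is already escaped (for instance the whole serialized feed) would escape the existing entities a second time.

```javascript
// Escape the characters that a CDATA section was protecting.
// "&" must be replaced first so the entities added afterwards stay intact.
function escapeXml(text) {
  return text
    .replaceAll("&", "&amp;")
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;");
}

console.log(escapeXml("Q&A <live>")); // Q&amp;A &lt;live&gt;

// Escaping text that is already escaped corrupts it:
console.log(escapeXml("Q&amp;A")); // Q&amp;amp;A
```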
I still need to study the RSS standard, but I think the validators are wrong, not this repo.
For example, this one: https://www.rssfeedexpert.com/ToolsRSSFeedXMLFormatter.aspx
I noticed that if you validate the unmodified output of "feed.xml" there, the validator reports: This is NOT a valid XML document
But if you edit the text, for example by adding a space somewhere, and then click on "collect", it shows:
"Success - See reformatted text below" and it parses the XML correctly 😓

I will post an update once I have tested the RSS output directly from the URL instead of validating the string output.

Update:
I'm using Google News as a reader of RSS output directly from the URL and here all is fine.
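
For anyone who wants to check what a served feed actually contains, here is a rough sketch. Both function names are invented for this example, the regular expression is a simplification rather than a real XML parser, and `checkFeed` assumes Node 18+ for the global `fetch`.

```javascript
// Report which <title>/<description> elements in a feed string contain CDATA.
function findCdataFields(xml) {
  const found = [];
  for (const m of xml.matchAll(/<(title|description)>([\s\S]*?)<\/\1>/g)) {
    if (m[2].includes("<![CDATA[")) found.push(m[1]);
  }
  return found;
}

// Fetch a feed from its URL and list its CDATA-wrapped fields.
async function checkFeed(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return findCdataFields(await response.text());
}

// Usage: checkFeed("https://www.codechefs.dev/rss.xml").then(console.log);
```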

5 participants