Friday, September 9, 2016

HTML Scrubbing!

I recently helped out one of my UI developers (an Angular one :-)) and thought I'd share a small piece of C# code with the community.

Context: As with many legacy applications, we ran into one that had persisted a lot of garbage HTML, copy-pasted from a rich text editor, without the application validating much of it. When that accumulated HTML was read and displayed using the Angular.js framework, it did not show the right content. Getting the right content meant stripping out all the HTML tags, including &nbsp; as well.

The little C# static function below does exactly that.

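        // Requires: using System.Text.RegularExpressions;
        public static string ScrubHtml(string value)
        {
            // Step 1 strips HTML tags and &nbsp; entities; step 2 collapses runs of whitespace.
            var step1OfScrubbedHtml = Regex.Replace(value, @"<[^>]+>|&nbsp;", "").Trim();
            var step2OfScrubbedHtml = Regex.Replace(step1OfScrubbedHtml, @"\s{2,}", " ");
            return step2OfScrubbedHtml;
        }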
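
To see the two steps in action, here is a minimal, self-contained sketch that wraps the same logic in a console program. The class name HtmlScrubberDemo and the sample input are made up for illustration; they are not from the original application.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, used only for this demo.
public static class HtmlScrubberDemo
{
    // Same two-step approach as ScrubHtml in the post.
    public static string ScrubHtml(string value)
    {
        // Step 1: remove anything that looks like a tag, plus literal &nbsp; entities.
        var withoutTags = Regex.Replace(value, @"<[^>]+>|&nbsp;", "").Trim();
        // Step 2: collapse any run of two or more whitespace characters into one space.
        return Regex.Replace(withoutTags, @"\s{2,}", " ");
    }

    public static void Main()
    {
        // Made-up sample of the kind of markup a rich text editor might leave behind.
        var raw = "<p>Hello&nbsp;&nbsp; <b>world</b></p>\n\n<div>  again </div>";
        Console.WriteLine(ScrubHtml(raw)); // prints: Hello world again
    }
}
```

One thing to keep in mind: &nbsp; is removed rather than replaced with a space, so two words separated only by &nbsp; end up joined together. If that matters for your data, replacing the entity with a space in step 1 and letting step 2 collapse the extra whitespace may be a better fit.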

Have fun!

