Tuesday, November 1, 2011

Archeology of HTML

Repost | 11/01/2011

I was helping Vanguard Technology (VT) migrate content from a small association's Website to a content-managed redesign. VT is using the Telerik Sitefinity CMS, which has really impressed me. I was given a step-by-step procedure and am pretty much following it in "robot mode." I have Sitefinity and the Website open in Chrome, and the spreadsheet matrix of all the pages I'm working on open in Explorer via Google Docs, with the windows arranged so I can click from one to the other. Underneath them all I'm running Spotify and listening to music, as I have done throughout my career. I think it's my ADD solution for concentration: overwhelm the senses and focus ... something like that.

My big question: whatever happened to macros? Word allows you to create macros, but it's so much smarter than me. I load an HTML file into it and, without my consent, it "value adds" more than two cents' worth of absolutely unnecessary code when my project calls for refining the code to the bare essentials. Give me the pre-Windows WordPerfect, where, back in the day, the ability to create macros meant that in mere seconds you had a file transformed exactly the way you wanted it rather than Microsoft's idea of "what you really wanted." So it goes, and so I have a routine that uses Notepad to delete all the unwanted formatting codes.

I load the HTML content into Notepad and, starting from the bottom, use the up-arrow key to scan the left-hand margin for the various abuses of p-breaks, or of non-breaking spaces between p-breaks, used to create line spacing (thank you, WordPress, for not letting me use codes here). I need to find each span used for underlining and other abuses before I can whack the span breaks, and so on.
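For anyone who would rather script that first pass than arrow through Notepad, here is a minimal Python sketch of the spacer-paragraph hunt. It is not what I actually ran; the function name `drop_spacer_paragraphs` is made up for illustration, and it assumes the spacers are p tags holding nothing but whitespace, non-breaking spaces, or line breaks.

```python
import re

# A paragraph whose only content is whitespace, &nbsp; entities,
# literal non-breaking spaces, or <br> tags (assumed spacer pattern).
SPACER_P = re.compile(
    r"<p\b[^>]*>(?:\s|&nbsp;|\u00a0|<br\s*/?>)*</p>",
    re.IGNORECASE,
)

def drop_spacer_paragraphs(html: str) -> str:
    """Remove paragraphs that exist only to fake line spacing."""
    return SPACER_P.sub("", html)

if __name__ == "__main__":
    sample = "<p>One</p><p>&nbsp;</p><p>Two</p>"
    print(drop_spacer_paragraphs(sample))
```

A regular expression is a blunt tool for HTML, so I would still eyeball the result before pasting it into the CMS.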

As I go from 2003 to the present, it's interesting to see the code getting less exasperating, for the most part, and I wonder if the early versions of these newsletters suffered from being done in Word, with all the erratic coding that it often introduced. Example: the presence of extensive style codes and color designations tells me Word has been fussy about fonts. Just saying. The association's Web editors seem to have tumbled to the excesses of Word, and so the files get cleaner as we near the present day. (One of the last ones I worked on took half an hour because almost every p-break was formatted, requiring a hunt for every instance of "span" and "style=" I could find.) Of course, we're migrating the content: the final shakedown in repurposing old text-based documents, dragged kicking and screaming into the new age of electronic publications.
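The hunt for every "span" and "style=" could be sketched in Python along these lines. Again, this is an illustration and not my actual routine: `strip_word_formatting` is a hypothetical name, and it assumes every span and every inline style attribute is unwanted, which was true for these newsletters but won't be for every file.

```python
import re

# Opening and closing span tags, with or without attributes.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)

# A quoted style="..." or style='...' attribute, including its leading space.
# Caveat: this would also match the same text if it appeared in body copy.
STYLE_ATTR = re.compile(r"\s+style\s*=\s*(\"[^\"]*\"|'[^']*')", re.IGNORECASE)

def strip_word_formatting(html: str) -> str:
    """Drop span wrappers and inline style attributes, keeping the text."""
    html = SPAN_TAG.sub("", html)
    return STYLE_ATTR.sub("", html)

if __name__ == "__main__":
    sample = ('<p style="color:#000">'
              '<span style="text-decoration:underline">Hi</span> there</p>')
    print(strip_word_formatting(sample))
```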

I'm not going to say it was fun, but it beat the hell out of filling my day scanning Indeed.com for work.

Note added 11-2-11: I decided to see if Codelobster had macro capabilities. It does not, but it saves all the items I've searched for, so I can use a dropdown rather than type them in, and the list also serves as a reminder of what I need to hunt down. It highlights search items in yellow and gives me a count of replaced items, so when I get rid of a specific span item (they used underline a lot) and then nuke the end-span codes, I know that if the numbers don't match, I need to go after more spans. I also don't have to worry about the word-wrap issues that Notepad presents. Big thumbs up for Codelobster and a head scratch for why it took me so long to tumble to using it.
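The count-matching trick works outside Codelobster too. Here is a small Python sketch, with the hypothetical helper name `span_counts`, that tallies opening and closing span tags so a mismatch shows up before anything gets deleted.

```python
import re

def span_counts(html: str) -> tuple[int, int]:
    """Return (opening span tags, closing span tags) found in the text."""
    opens = len(re.findall(r"<span\b[^>]*>", html, re.IGNORECASE))
    closes = len(re.findall(r"</span\s*>", html, re.IGNORECASE))
    return opens, closes

if __name__ == "__main__":
    sample = '<span style="a">x</span><span class="b">y</span></span>'
    opens, closes = span_counts(sample)
    if opens != closes:
        print(f"Mismatch: {opens} opening vs {closes} closing spans")
```

If the two numbers differ, there is a stray span somewhere that still needs hunting down, which is the same signal the replace counts give me.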