Monday 25 July 2011

JavaScript: get data from an xml file (like a Blogger backup file) and display it (or print it) - Part 2

   


As promised, in this article I will explain the code published in my previous post. I don't know how many of you are interested in explanations, but because I think that understanding things is better than merely copying examples, I will torture you with them. You've been warned!


The code
I will explain the code bit by bit, so that you can modify it to suit your needs.
<body>
<div align="center"><img src="http://4.bp.blogspot.com/-R9bApN8j4zM/TdOPdENmWKI/AAAAAAAAAH4/zNtY4Q-oWTQ/s1600/twt.gif" alt="The Web Thought"><br>
  A place where I can share my thoughts on web development and programming. The web is such a big place...<br>
  <br>
</div>
This first part is just the heading of our web page. It contains an image and a subtitle. Nothing special here. After that comes the JavaScript snippet:
<script type="text/javascript">
var xmlhttp;
if (window.XMLHttpRequest)
  {// code for IE7+, Firefox, Chrome, Opera, Safari
  xmlhttp=new XMLHttpRequest();
  }
else
  {// code for IE6, IE5
  xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }
// false = synchronous request: the script waits here until thought.xml is loaded
xmlhttp.open("GET","thought.xml",false);
xmlhttp.send();
var xmlDoc=xmlhttp.responseXML;
With the above code we fetch the data from the xml file. The request object is created in one of two ways, depending on the browser: XMLHttpRequest for modern browsers, and an ActiveXObject for IE5 and IE6. I must say I tested it with Firefox only, but I believe there are no problems with other browsers. Please have a look at the xmlhttp.open line: it contains the name of your xml file (in the example it is "thought.xml"). Remember to change it to match your xml file. The third parameter, false, makes the request synchronous: the script waits until the file has been loaded, so xmlDoc already holds the parsed document when the following lines run.
I repeat this important note: because we are using XMLHttpRequest, the source xml file and the html page we are building should be on the same domain; otherwise your browser will give you an Access Denied error. However, this is NOT always the case when working locally: in my experience Firefox and Safari open the local files without problems, while Chrome and Internet Explorer need the files to be served by a web server (such as IIS) on the same domain.
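To make the "same domain" rule a bit more concrete, here is a small sketch. Strictly speaking, browsers apply the same-origin policy, which compares the scheme, the host and the port of the two addresses, not only the domain name. The function isSameOrigin is a hypothetical helper written for this example (it is not part of the script) and relies on the standard URL class; the example.com and other.com addresses are made up.

```javascript
// Hypothetical helper: would a request from pageUrl to xmlUrl be same-origin?
// The origin is made of scheme + host + port, not just the domain.
function isSameOrigin(pageUrl, xmlUrl) {
  const page = new URL(pageUrl);
  // Resolve relative paths such as "thought.xml" against the page address
  const target = new URL(xmlUrl, pageUrl);
  return page.origin === target.origin;
}

console.log(isSameOrigin("http://example.com/print.html", "thought.xml"));               // true
console.log(isSameOrigin("http://example.com/print.html", "http://other.com/a.xml"));    // false
console.log(isSameOrigin("http://example.com/print.html", "https://example.com/a.xml")); // false
```

Pages opened from the local disk (file:// addresses) are handled with different rules by each browser, which may explain the local behaviour described above.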
var x=xmlDoc.getElementsByTagName("entry");
The above line gets all the <entry> nodes by their tag name and stores the resulting list in the variable x. Blogger backup files have no id attributes on their tags, so we have to work directly with tag names.
for (var i=x.length-1;i>=1;i--)
  {
We then start a for loop. The tricky part is that it runs backwards, starting from the node with the highest index, because we want to show the posts starting from the oldest one. Why i>=1 and not i>=0? Because we want to skip the first <entry>, which is the "rubbish" mentioned in my previous article.
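The backwards loop can be tried on its own, with a plain array standing in for the <entry> list. This is only a sketch: the array contents are invented, and it assumes (as the script does) that the first entry is the one to skip and that the remaining entries go from newest to oldest.

```javascript
// Hypothetical stand-in for the <entry> list: assumes the entry to skip
// comes first, followed by posts from newest to oldest.
const entries = ["rubbish", "newest post", "middle post", "oldest post"];

// Walk backwards and stop before index 0, like
// for (var i=x.length-1;i>=1;i--) in the script.
function oldestFirst(list) {
  const result = [];
  for (let i = list.length - 1; i >= 1; i--) {
    result.push(list[i]);
  }
  return result;
}

console.log(oldestFirst(entries).join(" | "));
// oldest post | middle post | newest post
```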
 var sea = x[i].getElementsByTagName("category")[0].getAttribute("term");
  if (sea.search("comment")==-1 && sea.search("page")==-1)
  {
Here we check the term attribute of the first <category> node of each entry, searching it for the words "comment" and "page". search returns -1 when a word is not found, so if neither word is present we know that the <entry> is a post (and not a comment or a page), and we can display it.
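Here is the category test isolated in a small function, so you can check which term values it accepts. The name isPost and the sample term values are assumptions for illustration (I'm assuming terms shaped like the "kind" URLs found in Blogger exports). The sketch uses indexOf, which, like search, returns -1 when the word is not found; search treats its argument as a regular expression, which makes no difference for these plain words.

```javascript
// Same rule as the script: a term containing neither "comment" nor "page"
// is treated as a post.
function isPost(term) {
  return term.indexOf("comment") === -1 && term.indexOf("page") === -1;
}

// Assumed shape of the term attribute values
const kindPrefix = "http://schemas.google.com/blogger/2008/kind#";
console.log(isPost(kindPrefix + "post"));    // true
console.log(isPost(kindPrefix + "comment")); // false
console.log(isPost(kindPrefix + "page"));    // false
```

Keep in mind that the test only excludes comments and pages: an entry of any other kind passes it.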
  document.write("<h2>");
  document.write(x[i].getElementsByTagName("title")[0].childNodes[0].nodeValue);
  document.write("</h2>");
Here we start building the core of our page: we write the title of the current post inside <h2> tags.
  var pubdatetime=(x[i].getElementsByTagName("published")[0].childNodes[0].nodeValue);
  var pubdate = pubdatetime.substr(0,10);
  var pubtime = pubdatetime.substr(11,8);
We then deal with the publishing date, which has a rather unusual format (something like 2011-07-25T10:30:00.000+01:00). We store the whole string in a variable and then keep only the parts we want: the first 10 characters are the date, and the 8 characters after the "T" are the time. The result is two variables, pubdate and pubtime, holding the publishing date and time respectively.
  document.write("Published on "+pubdate);
  document.write(" at "+pubtime);
This part is simply inserting the two aforementioned variables.
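The date and time extraction can be tested in isolation too. The sample timestamp and the function name splitPublished are hypothetical; the sketch uses substring() with end positions, which gives the same result as the substr(start, length) calls in the script.

```javascript
// Cuts an Atom-style timestamp the same way the script does:
// substring(0, 10) equals substr(0,10), substring(11, 19) equals substr(11,8).
function splitPublished(pubdatetime) {
  return {
    pubdate: pubdatetime.substring(0, 10),  // characters 0-9: YYYY-MM-DD
    pubtime: pubdatetime.substring(11, 19)  // characters 11-18: hh:mm:ss (skips the "T")
  };
}

// Assumed sample value of a <published> node
const sample = splitPublished("2011-07-25T10:30:00.000+01:00");
console.log("Published on " + sample.pubdate + " at " + sample.pubtime);
// Published on 2011-07-25 at 10:30:00
```

The time zone offset at the end of the string (+01:00 here) is simply discarded, so the time is shown exactly as written in the file.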
  document.write("<br><br><div style='position: relative;'>");
  document.write("<br><br>");
We then open the container for the post body. I gave it a relative position so that any absolutely positioned elements inside the post stay inside this container instead of being placed relative to the whole page.
  var kids = x[i].getElementsByTagName("content")[0].childNodes.length;
With the above code we count the child nodes of the <content> node. Why is that? Well, I really banged my head on this one, but to make a long story short, Firefox and Opera seem to split long text into several text nodes of at most 4096 characters each. Reading only the first child therefore gave me truncated post bodies. To solve the issue, we need to know how many children the node has, and output them all.
  for (var j=0;j<kids;j++)
  {
  document.write(x[i].getElementsByTagName("content")[0].childNodes[j].nodeValue);
  }
The second for loop exists for the reason explained above: we go through every child and write it out. The <content> node holds the post's HTML as escaped text, so each nodeValue is a string of HTML markup; document.write inserts that string into the page as HTML, so the tags inside your posts are interpreted and rendered correctly.
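The effect of the split text nodes can be simulated without a browser. In this sketch the child nodes are plain objects with a nodeValue property (an assumption made so the code runs anywhere), and joinChildren is a hypothetical helper that does what the second for loop does.

```javascript
// Hypothetical stand-in for the childNodes of <content>: a long text
// split into text nodes of at most 4096 characters.
const longText = "x".repeat(10000);
const childNodes = [
  { nodeValue: longText.slice(0, 4096) },
  { nodeValue: longText.slice(4096, 8192) },
  { nodeValue: longText.slice(8192) }
];

// Reading only the first child gives a truncated body...
console.log(childNodes[0].nodeValue.length); // 4096

// ...while concatenating every child, as the second for loop does,
// rebuilds the full text.
function joinChildren(nodes) {
  let text = "";
  for (let j = 0; j < nodes.length; j++) {
    text += nodes[j].nodeValue;
  }
  return text;
}
console.log(joinChildren(childNodes).length); // 10000
```

In a browser you could also let the DOM do the joining: the standard textContent property returns the concatenated text of all the text nodes inside an element, and normalize() merges adjacent text nodes into one. Older versions of Internet Explorer do not support textContent, though, which is a good reason to keep the explicit loop.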
  document.write("</div><br><hr><br>");
  }
  }
  </script>
</body>
We finally close the post body container, add a horizontal rule with some spacing, close the if block and the for loop, and close the script.
That's all.

Wow, that was something. I did all this because I wanted to use the Blogger backup file to display my blog for printing. It should be clear, though, that you can use the code with any xml file: you will obviously need to change the node names and possibly rearrange things. If you followed me, you have probably understood the general idea behind the above example.

I hope you liked the article; please let me know whether you found it useful or not.
