XML

Archived Posts from this Category

Invalid XML from a .NET web service

Posted by on 01 Oct 2007 | Tagged as: .NET, ASP.NET, C#, Development, Web Services, XML


One of my basic assumptions about the .NET framework was proved incorrect last week. Until then, I had believed that when you use the built-in framework classes to expose web services, the output was always guaranteed to be valid XML. Sadly that turns out to be untrue and, to make matters worse, .NET will happily output XML over a web service that even its own .NET web service clients can’t read.

Certain characters are forbidden in XML by the official specification. In XML 1.0 these include the ASCII control characters below 0x20, such as NUL (0x00) and EOT (0x04); only tab, line feed and carriage return are allowed from that range. It’s important to note that this is not just a case of the unescaped/unencoded versions of these characters being disallowed: the encoded forms (character references such as &#x0;) are disallowed as well.
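The rule can be written down as a small predicate. This is a minimal sketch of the Char production from the XML 1.0 specification; the class and method names (XmlCharCheck, IsLegalXmlCodePoint) are made up for illustration and are not part of the framework.

```csharp
using System;

public static class XmlCharCheck
{
    // Hypothetical helper: true if the Unicode code point is allowed by the
    // XML 1.0 "Char" production. Takes an int so that code points above
    // 0xFFFF can be tested as well.
    public static bool IsLegalXmlCodePoint(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void Main()
    {
        Console.WriteLine(IsLegalXmlCodePoint(0x0));  // False (NUL)
        Console.WriteLine(IsLegalXmlCodePoint(0x4));  // False (EOT)
        Console.WriteLine(IsLegalXmlCodePoint(0x9));  // True  (tab)
        Console.WriteLine(IsLegalXmlCodePoint('A'));  // True
    }
}
```

Because character references must also resolve to a legal character, the same check applies whether the character appears literally or in encoded form.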

The problem isn’t that the .NET framework doesn’t understand these rules. It manages just fine when acting as a client to a web service that serves these characters in content, throwing nice exceptions explaining that the characters are invalid. Additionally, XmlTextReader has a ‘Normalization’ property which, when set to false, makes the reader more liberal by skipping the character range check on numeric character entities, but this option is not used within the automatically generated Web Service Client.
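For hand-written reading code, as opposed to the generated client, the liberal behaviour can be requested explicitly. The following is a rough sketch, assuming the documented behaviour of XmlTextReader.Normalization (false disables range checking of numeric character entities). The Payload and LenientRead names and the sample XML are invented for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type used only for this sketch
public class Payload
{
    public string Test = "";
}

public static class LenientRead
{
    public static Payload Read(string xml)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Payload));
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));

        // false is the XmlTextReader default; set here to make the intent explicit
        reader.Normalization = false;

        // Passing our own reader avoids the one the serializer would create itself
        return (Payload)xs.Deserialize(reader);
    }

    public static void Main()
    {
        // &#x4; is a character reference to EOT, which XML 1.0 forbids
        Payload p = Read("<Payload><Test>a&#x4;b</Test></Payload>");
        Console.WriteLine(p.Test.Length);
    }
}
```

This only changes what the reader will tolerate. The document is still not valid XML, so other parsers are entitled to reject it.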

This problem isn’t limited to web services: standard XML serialisation suffers from it too. Here are two bits of code that illustrate the problem, a web method and a small console application:

[WebMethod()]
public string InvalidCharacter()
{
   return "" + (char)4;
}

using System;

public class MyClass
{
   // NUL (0x00) is not a legal XML character
   public string Test = "" + (char)0;

   public static void Main()
   {
      MyClass c = new MyClass();
      System.Xml.Serialization.XmlSerializer xs = new System.Xml.Serialization.XmlSerializer(typeof(MyClass));
      System.Text.StringBuilder sb = new System.Text.StringBuilder();
      System.IO.StringWriter sw = new System.IO.StringWriter(sb);

      // Serialising succeeds even though the result is not valid XML
      xs.Serialize(sw, c);
      Console.WriteLine(sb.ToString());

      // Reading the same text back fails
      System.IO.StringReader sr = new System.IO.StringReader(sb.ToString());
      try
      {
         c = (MyClass)xs.Deserialize(sr);
      }
      catch (System.Exception ex)
      {
         Console.WriteLine(ex.ToString());
      }
   }
}

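The little console application gives output like this:

<?xml version="1.0" encoding="utf-16"?>
<MyClass xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Test>&#x0;</Test>
</MyClass>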
System.InvalidOperationException: There is an error in XML document (3, 12).
System.Xml.XmlException: '.', hexadecimal value 0x00, is an invalid character.
Line 3, position 12.

Unfortunately I haven’t yet found a fix for this. My only solution is to work around the problem, either by ensuring that these invalid characters can’t get into the system in the first place or by cleaning the text in the getter of each property of the serialized objects.
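The getter-side cleaning might look something like the sketch below. The Comment class, its Body property and the XmlText.Clean helper are hypothetical names. The filter works on UTF-16 code units and deliberately lets surrogates through so that characters outside the Basic Multilingual Plane survive; it does not check that surrogates are correctly paired.

```csharp
using System;
using System.Text;

public static class XmlText
{
    // Hypothetical helper: drops characters that XML 1.0 does not allow
    public static string Clean(string input)
    {
        if (input == null) return null;

        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            bool allowed = c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xFFFD);
            if (allowed) sb.Append(c);
        }
        return sb.ToString();
    }
}

// Hypothetical serialized type: the getter cleans, the setter stores as-is
public class Comment
{
    private string body = "";

    public string Body
    {
        get { return XmlText.Clean(body); }
        set { body = value; }
    }
}

public static class Demo
{
    public static void Main()
    {
        Comment c = new Comment();
        c.Body = "ok" + (char)0 + "!";
        Console.WriteLine(c.Body);  // ok!
    }
}
```

Cleaning in the getter means the serializer never sees the bad characters, at the cost of silently changing the data. Cleaning in the setter instead would keep them out of the object altogether.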

Things like this really worry me – our frameworks shouldn’t be outputting things that they can’t read – let alone outputting things that completely contravene the specifications. As always, any suggestions on an alternative solution to this problem are welcome.

Selecting nodes when namespaces are involved

Posted by on 07 Jan 2007 | Tagged as: .NET, Development, XML

I needed to select XML nodes from a document that used namespaces and wasn’t getting the results I expected from my real code, so using Snippet Compiler I cooked up a simple example to get it working. That simple example is presented below, mostly as a reminder to myself in case I need it again.

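using System;
using System.Xml;

public class XPathWithNamespaces
{
    public static void Main()
    {
        XmlDocument xd = new XmlDocument();
        // The markup in this literal is an assumed stand-in: only the namespace
        // URI, the element name "theone" and the text "test" are known.
        xd.LoadXml("<?xml version=\"1.0\"?>"
            + "<root xmlns=\"http://foo.bar/mouse\"><theone>test</theone></root>");

        // Bind a prefix to the namespace URI so the XPath query can refer to it
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(xd.NameTable);
        nsmgr.AddNamespace("foo", "http://foo.bar/mouse");

        XmlNode xn = xd.SelectSingleNode("//foo:theone", nsmgr);
        if (xn != null)
        {
            Console.WriteLine(xn.Name);
        }
        Console.ReadLine();
    }
}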
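
To see why the namespace manager is needed, here is a small sketch that runs the same query with and without a prefix. The document literal and the prefix "m" are assumptions chosen for illustration; the prefix used in the query does not have to match anything in the document, only the URI matters.

```csharp
using System;
using System.Xml;

public class PrefixedVersusUnprefixed
{
    public static void Main()
    {
        XmlDocument xd = new XmlDocument();
        // Assumed sample document with a default namespace
        xd.LoadXml("<root xmlns=\"http://foo.bar/mouse\"><theone>test</theone></root>");

        // An unprefixed name in XPath 1.0 only matches elements in no namespace
        XmlNode plain = xd.SelectSingleNode("//theone");
        Console.WriteLine(plain == null);          // True

        // Binding any prefix to the URI makes the element reachable
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(xd.NameTable);
        nsmgr.AddNamespace("m", "http://foo.bar/mouse");
        XmlNode qualified = xd.SelectSingleNode("//m:theone", nsmgr);
        Console.WriteLine(qualified.InnerText);    // test
    }
}
```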