Welcome to CSharp Labs

Validating Top Level Domain Names with IANA

Tuesday, July 16, 2013

A top level domain (TLD) is the last label in a fully qualified domain name. To be valid, this label must be one of the names in the current list published by the Internet Assigned Numbers Authority (IANA). I have created the TopLevelDomainNameValidatorContext class, an Entity Framework DbContext, to download and parse that list and store the TLDs in a database for the purpose of domain name validation.
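
Because only the last label matters, validating a full domain name starts by isolating it. The helper below is a minimal sketch and not part of the downloadable source: the TopLevelLabel class and its Extract method are hypothetical names, and it assumes the input is a bare host name with no scheme, port or path.

```csharp
using System;

// Hypothetical helper (not part of the downloadable source): returns the
// top level label of a domain name, e.g. "com" for "www.example.com".
public static class TopLevelLabel
{
    public static string Extract(string domainName)
    {
        if (domainName == null)
            throw new ArgumentNullException("domainName");

        // a fully qualified name may end with the root dot ("example.com."),
        // so drop it before looking for the last label
        string name = domainName.Trim().TrimEnd('.');

        if (name.Length == 0)
            throw new ArgumentException("The domain name contains no labels.", "domainName");

        // the TLD is everything after the last dot, or the whole name for a single label
        int lastDot = name.LastIndexOf('.');
        return lastDot < 0 ? name : name.Substring(lastDot + 1);
    }
}
```

If the host name is in Punycode form, the extracted label should be converted with IdnMapping.GetUnicode before it is checked, since the database stores decoded names.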

How it Works

IANA publishes the top level domains as a line-delimited ASCII text file: a comment line beginning with # followed by one upper-case name per line. Internationalized top level domains are listed in their Punycode (xn--) form. Updating the database involves downloading the list to a temporary file, reading each line, skipping comments and blank lines, and converting each name to Unicode:

        /// <summary>
        /// Updates all the top level domains in the database.
        /// </summary>
        public static void UpdateTopLevelDomains()
        {
            //remote file path to top level domain names
            string remoteFile = ConfigurationManager.AppSettings["TopLevelDomainNamesRemotePath"];

            //the application setting is missing or empty
            if (string.IsNullOrWhiteSpace(remoteFile))
                throw new SettingsPropertyNotFoundException("The TopLevelDomainNamesRemotePath application setting was not found or is invalid.");

            //ensure database initialized
            using (TopLevelDomainNameValidatorContext context = new TopLevelDomainNameValidatorContext())
                context.Database.Initialize(false);

            //used to decode non-ASCII characters:
            IdnMapping map = new IdnMapping();
            //collection of top level domain names
            HashSet<string> topLevelDomainNames = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
            //temp file path
            string temp = null;

            try
            {
                //create temp file
                temp = Path.GetTempFileName();

                //create a web client to download top level domain names
                using (WebClient client = new WebClient())
                    //download file to temp
                    client.DownloadFile(remoteFile, temp);

                //open the downloaded file (the IANA list is plain ASCII)
                using (StreamReader reader = new StreamReader(temp, Encoding.ASCII))
                {
                    string line;

                    //read until the end of the file
                    while ((line = reader.ReadLine()) != null)
                    {
                        //trim the line
                        line = line.Trim();

                        //skip empty lines and comments
                        if (line.Length == 0 || line.StartsWith("#"))
                            continue;

                        //decode Punycode (xn--) labels to Unicode and add to the set;
                        //HashSet.Add ignores duplicates
                        topLevelDomainNames.Add(map.GetUnicode(line));
                    }
                }
            }
            finally
            {
                //if the temporary file was created, delete it
                if (temp != null && File.Exists(temp))
                    File.Delete(temp);
            }

            //insert top level domain names
            InsertTopLevelDomainNames(topLevelDomainNames);
        }
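
The parsing loop does not depend on the download, so it can be separated from the network and file handling and exercised against an in-memory sample. The following is a sketch under that assumption; TldListParser is a hypothetical name, and unlike the method above it lower-cases each line before decoding so that every stored name has the same case.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical refactoring of the loop in UpdateTopLevelDomains: it accepts any
// TextReader (a StreamReader over the downloaded file, or a StringReader in a test).
public static class TldListParser
{
    public static HashSet<string> Parse(TextReader reader)
    {
        if (reader == null)
            throw new ArgumentNullException("reader");

        // decodes Punycode ("xn--") labels to Unicode
        IdnMapping map = new IdnMapping();
        HashSet<string> names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();

            // skip blank lines and the "#" comment header
            if (line.Length == 0 || line.StartsWith("#", StringComparison.Ordinal))
                continue;

            // IANA lists names in upper case; lower-casing first gives one stored form.
            // HashSet.Add returns false and keeps the set unchanged for duplicates.
            names.Add(map.GetUnicode(line.ToLowerInvariant()));
        }

        return names;
    }
}
```

In UpdateTopLevelDomains the loop would then reduce to a call such as TldListParser.Parse(reader).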

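Entries are bulk inserted into SQL Server through a user-defined table type and a stored procedure, both created by a database initializer. The procedure receives every name in a single table-valued parameter and replaces the contents of the TopLevelDomainNames table inside one transaction, which keeps updates fast and atomic: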

    /// <summary>
    /// The <see cref="TopLevelDomainNameInitializer"/> creates the top level domain name database.
    /// </summary>
    public sealed class TopLevelDomainNameInitializer : CreateDatabaseIfNotExists<TopLevelDomainNameValidatorContext>
    {
        protected override void Seed(TopLevelDomainNameValidatorContext context)
        {
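            //use the simple recovery model; ALTER DATABASE cannot run inside a
            //transaction, so under EF6 (which wraps ExecuteSqlCommand in one by
            //default) pass TransactionalBehavior.DoNotEnsureTransaction as the first argument
            context.Database.ExecuteSqlCommand(@"
DECLARE @alter AS NVARCHAR(256);
SET @alter = N'ALTER DATABASE ' + QUOTENAME(DB_NAME()) + N' SET RECOVERY SIMPLE';
EXEC(@alter);
");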

            //create a type for bulk inserts
            context.Database.ExecuteSqlCommand(@"
CREATE TYPE [dbo].[TopLevelDomainNamesTable] AS TABLE (
  [Name] nvarchar(128) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
  PRIMARY KEY CLUSTERED ([Name])
)
");

            //creates a procedure for bulk inserts
            context.Database.ExecuteSqlCommand(@"
CREATE PROCEDURE dbo.InsertTopLevelDomainNames
@TLDTable TopLevelDomainNamesTable readonly
AS
SET XACT_ABORT ON;
BEGIN
	BEGIN TRANSACTION;
	DELETE FROM TopLevelDomainNames;
	INSERT INTO TopLevelDomainNames (Name) SELECT Name FROM @TLDTable;
	COMMIT TRANSACTION;
END
");

            base.Seed(context);
        }
    }

A DataTable whose single Name column matches the table type is populated with the top level domain names. A SqlCommand targeting the stored procedure then passes the DataTable as a table-valued parameter:

        /// <summary>
        /// Inserts a set of top level domains to the database.
        /// </summary>
        /// <param name="topLevelDomainNames">A set of top level domain names.</param>
        private static void InsertTopLevelDomainNames(HashSet<string> topLevelDomainNames)
        {
            //populate a data table with top level domain names
            using (DataTable tbl = new DataTable("TopLevelDomainNamesTable"))
            using (DataColumn col = new DataColumn("Name", typeof(string)))
            {
                col.MaxLength = 128;
                tbl.Columns.Add(col);

                //add all top level domain names
                foreach (string entry in topLevelDomainNames)
                    tbl.Rows.Add(entry);

                //insert into database
                InsertTopLevelDomainNames(tbl);
            }
        }

        /// <summary>
        /// Inserts a table of top level domains to the database.
        /// </summary>
        /// <param name="topLevelDomainNameTable">A DataTable of top level domain names.</param>
        private static void InsertTopLevelDomainNames(DataTable topLevelDomainNameTable)
        {
            using (SqlConnection conn = new SqlConnection(GetConnectionString()))
            using (SqlCommand cmd = new SqlCommand("InsertTopLevelDomainNames", conn))
            {
                conn.Open();

                cmd.CommandType = CommandType.StoredProcedure;
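                //pass the DataTable as a table-valued parameter of the user-defined table type
                SqlParameter tvp = cmd.Parameters.AddWithValue("@TLDTable", topLevelDomainNameTable);
                tvp.SqlDbType = SqlDbType.Structured;
                tvp.TypeName = "dbo.TopLevelDomainNamesTable";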

                cmd.ExecuteNonQuery();
            }
        }
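
SQL Server only needs a matching Name column to map the DataTable onto the table type, but mirroring the type's constraints locally makes bad input fail before the round trip. A sketch, assuming the 128-character limit and primary key from the initializer; TldTableBuilder is a hypothetical helper, not part of the original source.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: builds a DataTable that mirrors [dbo].[TopLevelDomainNamesTable].
public static class TldTableBuilder
{
    public static DataTable Create(IEnumerable<string> names)
    {
        DataTable table = new DataTable("TopLevelDomainNamesTable");

        // key comparisons ignore case, roughly matching the _CI_ collation of the type
        table.CaseSensitive = false;

        DataColumn name = table.Columns.Add("Name", typeof(string));
        name.MaxLength = 128;               // nvarchar(128)
        name.AllowDBNull = false;           // NOT NULL
        table.PrimaryKey = new[] { name };  // PRIMARY KEY ([Name])

        // Rows.Add throws ArgumentException for names over 128 characters
        // and ConstraintException for duplicate names
        foreach (string entry in names)
            table.Rows.Add(entry);

        return table;
    }
}
```

The DataTable comparison uses the table's culture rather than the SQL Server collation, so the two are not guaranteed to agree for every non-ASCII character.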

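To determine whether a top level domain is valid, the TopLevelDomainNameValidatorContext.IsValid instance method queries the TopLevelDomainNames set and returns true if a matching name exists. Because names are stored in their decoded Unicode form, internationalized TLDs must be passed in Unicode rather than Punycode, and because the comparison runs in SQL Server, case sensitivity follows the collation of the Name column: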

        /// <summary>
        /// Determines if the specified top level domain is valid.
        /// </summary>
        /// <param name="tld">The top level domain name to validate.</param>
        /// <returns>true if the top level domain is valid; otherwise, false.</returns>
        public bool IsValid(string tld)
        {
            if (tld == null)
                throw new ArgumentNullException("tld");

            if (tld == string.Empty)
                throw new ArgumentException("Please specify a valid top level domain name.", "tld");

            //create query
            var q = from t in TopLevelDomainNames
                    where t.Name == tld
                    select t;

            //return if any element found
            return q.Any();
        }
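
IsValid issues a database query on every call. When many names are validated, one option is to load the set once and check candidates in memory. The class below is a hypothetical sketch of that approach, not part of the original source; it also accepts a leading dot and Punycode input (converted to Unicode to match the stored form), and returns false for empty or malformed input instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical in-memory validator: names are loaded once, for example from the
// TopLevelDomainNames table, and each lookup is a hash probe instead of a query.
public sealed class TldCache
{
    private readonly HashSet<string> _names;

    public TldCache(IEnumerable<string> unicodeNames)
    {
        if (unicodeNames == null)
            throw new ArgumentNullException("unicodeNames");

        // same case-insensitive comparison that UpdateTopLevelDomains uses
        _names = new HashSet<string>(unicodeNames, StringComparer.OrdinalIgnoreCase);
    }

    public bool IsValid(string tld)
    {
        if (tld == null)
            throw new ArgumentNullException("tld");

        // accept ".com" as well as "com"
        string label = tld.Trim().Trim('.');
        if (label.Length == 0)
            return false;

        // names are stored decoded, so convert Punycode input to Unicode first
        if (label.StartsWith("xn--", StringComparison.OrdinalIgnoreCase))
        {
            try
            {
                label = new IdnMapping().GetUnicode(label.ToLowerInvariant());
            }
            catch (ArgumentException)
            {
                // malformed Punycode cannot be a listed TLD
                return false;
            }
        }

        return _names.Contains(label);
    }
}
```

The cache would need rebuilding after UpdateTopLevelDomains runs, for example by constructing a new TldCache from context.TopLevelDomainNames.Select(t => t.Name).
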

Using

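Using the TopLevelDomainNameValidatorContext requires adding the connection string, application setting and Entity Framework database initializer shown below to the application's web.config. Replace the [your connection string] and [your assembly] placeholders, and adjust the namespaces in the type attributes if the classes live elsewhere: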

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <connectionStrings>
    <!-- TopLevelDomainNameConnection connection defines the connection source to the top level domain database. -->
    <add name="TopLevelDomainNameConnection" connectionString="[your connection string]" providerName="System.Data.SqlClient" />
  </connectionStrings>
  <appSettings>
    <!-- TopLevelDomainNamesRemotePath application setting defines the remote path to a list of top level domain names -->
    <add key="TopLevelDomainNamesRemotePath" value="http://data.iana.org/TLD/tlds-alpha-by-domain.txt"/>
  </appSettings>
  <entityFramework>
    <contexts>
      <context type="System.Data.Entity.TopLevelDomainNameValidatorContext, [your assembly]">
        <!-- Defines the database initializer for the TopLevelDomainNameValidatorContext -->
        <databaseInitializer type="System.Web.Mvc.TopLevelDomainNameInitializer, [your assembly]" />
      </context>
    </contexts>
  </entityFramework>
</configuration>

To download the current top level domain names and update the database, call the static TopLevelDomainNameValidatorContext.UpdateTopLevelDomains method. Candidate top level domain names can then be validated with the IsValid method of a TopLevelDomainNameValidatorContext instance; since the context is disposable, create it in a using block.

Source Code

Download TopLevelDomainNameValidatorContext and Supporting Classes
