php getting domain name from subdomain


Tags: php, dns, subdomain

I need to write a function to parse variables which contain domain names. It's best I explain this with an example, the variable could contain any of these things:

here.example.com
example.com
example.org
here.example.org


But when passed through my function all of these must return either example.com or example.co.uk, the root domain name basically. I'm sure I've done this before but I've been searching Google for about 20 minutes and can't find anything. Any help would be appreciated.

EDIT: Ignore the .co.uk, presume that all domains going through this function have a 3 letter TLD.
    

asked Sep 30, 2015 by pradip.bhoge
0 votes
225 views




16 Answers

0 votes

Stackoverflow Question Archive:


print get_domain("http://somedomain.co.uk"); // outputs 'somedomain.co.uk'

function get_domain($url)
{
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  // Named group "domain": one label followed by a suffix of 2 to 6 letters/dots
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}
answered Sep 30, 2015 by rolvyrf
0 votes

I would do something like the following:

// hierarchical array of top level domains
$tlds = array(
    'com' => true,
    'uk' => array(
        'co' => true,
        // …
    ),
    // …
);
$domain = 'here.example.co.uk';
// split domain
$parts = explode('.', $domain);
$tmp = $tlds;
// traverse the tree in reverse order, from right to left
foreach (array_reverse($parts) as $key => $part) {
    if (isset($tmp[$part])) {
        $tmp = $tmp[$part];
    } else {
        break;
    }
}
// build the result
var_dump(implode('.', array_slice($parts, - $key - 1)));
answered Sep 30, 2015 by deven.bendale
0 votes

Almost certainly, what you're looking for is this:

http://www.dkim-reputation.org/regdom-libs/

It's a PHP library that uses the (as nearly as is practical) complete list of TLDs collected at publicsuffix.org/list/ and wraps it up in a spiffy little function.

Once the library is included, it's as easy as:

$registeredDomain = getRegisteredDomain( $domain );

answered Sep 30, 2015 by 20shahi
0 votes

I ended up using the database Mozilla has.

Here's my code:

fetch_mozilla_tlds.php contains the caching algorithm. This line is important:

$mozillaTlds = file('http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1');

The main file used inside the application is this:

function isTopLevelDomain($domain)
{
    $domainParts = explode('.', $domain);
    if (count($domainParts) == 1) {
        return false;
    }

    $previousDomainParts = $domainParts;
    array_shift($previousDomainParts);

    $tld = implode('.', $previousDomainParts);

    return isDomainExtension($tld);
}

function isDomainExtension($domain)
{
    $tlds = getTLDs();

    /**
     * direct hit
     */
    if (in_array($domain, $tlds)) {
        return true;
    }

    if (in_array('!'. $domain, $tlds)) {
        return false;
    }

    $domainParts = explode('.', $domain);

    if (count($domainParts) == 1) {
        return false;
    }

    $previousDomainParts = $domainParts;

    array_shift($previousDomainParts);
    array_unshift($previousDomainParts, '*');

    $wildcardDomain = implode('.', $previousDomainParts);

    return in_array($wildcardDomain, $tlds);
}

function getTLDs()
{
    static $mozillaTlds = array();

    if (empty($mozillaTlds)) {
        require 'fetch_mozilla_tlds.php';
        /* @var $mozillaTlds array */
    }

    return $mozillaTlds;
}

UPDATE:
The database has evolved and is now available at its own website - http://publicsuffix.org/
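
For readers who want to see how the list's three rule types (plain, `*.` wildcard, and `!` exception) combine into a registered domain, here is a minimal sketch. It is not the code from this answer: the function name `registrable_domain` and the five-rule excerpt in `$rules` are assumptions made for illustration, and a real implementation would load the full list from publicsuffix.org.

```php
<?php
// Hypothetical excerpt of the rule list, stored as array keys for fast lookup.
$rules = array_flip(array('com', 'uk', 'co.uk', '*.ck', '!www.ck'));

/**
 * Returns the public suffix plus one label, or false when the host
 * is itself a public suffix (or a single label).
 */
function registrable_domain($host, array $rules)
{
    $labels    = explode('.', strtolower(rtrim($host, '.')));
    $count     = count($labels);
    $suffixLen = 1;     // default rule "*": the last label is the suffix
    $found     = false;

    // Walk candidate suffixes from longest to shortest.
    for ($i = 0; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        $rest      = implode('.', array_slice($labels, $i + 1));
        $wildcard  = '*' . ($rest === '' ? '' : '.' . $rest);

        if (isset($rules['!' . $candidate])) {
            // Exception rules win: the suffix is the rule minus its leftmost label.
            $suffixLen = $count - $i - 1;
            break;
        }
        if (!$found && (isset($rules[$candidate]) || isset($rules[$wildcard]))) {
            // The first hit is the longest matching rule.
            $suffixLen = $count - $i;
            $found = true;
        }
    }

    if ($count <= $suffixLen) {
        return false;
    }
    return implode('.', array_slice($labels, -($suffixLen + 1)));
}
```

With the excerpt above, `registrable_domain('here.example.co.uk', $rules)` gives `example.co.uk`, and `registrable_domain('co.uk', $rules)` gives `false` because nothing is left once the suffix is removed.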

answered Sep 30, 2015 by girisha
0 votes

This script generates a Perl file containing a single function, get_domain from the ETLD file. So say you have hostnames like img1, img2, img3, ... in .photobucket.com. For each of those get_domain $host would return photobucket.com. Note that this isn't the fastest function on earth, so in my main log parser that's using this, I keep a hash of host to domain mappings and only run this for hosts that aren't in the hash yet.

#!/bin/bash

cat << 'EOT' > suffixes.pl
#!/bin/perl

sub get_domain {
  $_ = shift;
EOT

wget -O - http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1 \
  | iconv -c -f UTF-8 -t ASCII//TRANSLIT \
  | egrep -v '/|^$' \
  | sed -e 's/^\!//' -e "s/\"/'/g" \
  | awk '{ print length($0),$0 | "sort -rn"}' | cut -d" " -f2- \
  | while read SUFF; do
      STAR=`echo $SUFF | cut -b1`
      if [ "$STAR" = '*' ]; then
        SUFF=`echo $SUFF | cut -b3-`
        echo "  return \"\$1\.\$2\.$SUFF\" if /([a-zA-Z0-9\-]+)\.([a-zA-Z0-9\-]+)\.$SUFF\$/;"
      else
        echo "  return \"\$1\.$SUFF\" if /([a-zA-Z0-9\-]+)\.$SUFF\$/;"
      fi
    done >> suffixes.pl

cat << 'EOT' >> suffixes.pl
}

1;
EOT
answered Sep 30, 2015 by ankitarajoria4
0 votes

Here is how you reduce any URL to its domain. I wrote this code for my site, http://internet-portal.me/, where it is in use as a working solution.

$host is the URL to be parsed. The code is simple and, compared to everything else I have seen, reliable: it has worked on every URL I have tried. See it parse the page you are looking at right now:
http://internet-portal.me/domain/?dns=questions/1201194/php-getting-domain-name-from-subdomain/6320437#6320437

================================================================================

$host = filter_var($_GET['dns']);
$host = $host . '/'; // needed if URL does not have trailing slash

// Strip www, http, https header ;

$host = str_replace( 'http://www.' , '' , $host );
$host = str_replace( 'https://www.' , '' , $host );

$host = str_replace( 'http://' , '' , $host );
$host = str_replace( 'https://' , '' , $host );
$pos = strpos($host, '/'); // find any sub directories
$host = substr( $host, 0, $pos );  //strip directories

$hostArray = explode (".", $host); // count parts of TLD
$size = count ($hostArray) -1; // really only need to know if not a single level TLD
$tld = $hostArray[$size]; // do we need to parse the TLD any further - 
                          // remove subdomains?

if ($size > 1) {
    if ($tld == "aero" or $tld == "asia" or $tld == "biz" or $tld == "cat" or
        $tld == "com" or $tld == "coop" or $tld == "edu" or $tld == "gov" or
        $tld == "info" or $tld == "int" or $tld == "jobs" or $tld == "me" or
        $tld == "mil" or $tld == "mobi" or $tld == "museum" or $tld == "name" or
        $tld == "net" or $tld == "org" or $tld == "pro" or $tld == "tel" or
        $tld == "travel" or $tld == "tv" or $tld == "ws" or $tld == "xxx") {

        $host = $hostArray[$size -1].".".$hostArray[$size]; // parse to 2 level TLD
    } else {
         // parse to 3 level TLD
        $host = $hostArray[$size -2].".".$hostArray[$size -1].".".$hostArray[$size] ;
    }
}
answered Sep 30, 2015 by ankitarajoria4
0 votes

This is a short way of accomplishing that:

$host = $_SERVER['HTTP_HOST'];
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";
answered Sep 30, 2015 by rajeshujade
0 votes

I think your problem is that you haven't clearly defined what exactly you want the function to do. From your examples, you certainly don't want it to just blindly return the last two, or last three, components of the name, but just knowing what it shouldn't do isn't enough.

Here's my guess at what you really want: there are certain second-level domain names, like co.uk., that you'd like to be treated as a single TLD (top-level domain) for purposes of this function. In that case I'd suggest enumerating all such cases and putting them as keys into an associative array with dummy values, along with all the normal top-level domains like com., net., info., etc. Then whenever you get a new domain name, extract the last two components and see if the resulting string is in your array as a key. If not, extract just the last component and make sure that's in your array. (If even that isn't, it's not a valid domain name.) Either way, whatever key you do find in the array, take that plus one more component off the end of the domain name, and you'll have your base domain.
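
The lookup described above could be sketched roughly as follows. The function name `base_domain` and the tiny `$suffixes` table are hypothetical and for illustration only; a usable table would need many more entries.

```php
<?php
// Hypothetical, deliberately tiny suffix table: keys matter, values are dummies.
$suffixes = array(
    'com' => true, 'net' => true, 'org' => true, 'info' => true,
    'uk'  => true, 'co.uk' => true, 'org.uk' => true,
);

/**
 * Returns the known suffix plus one more component, or false when the
 * name has no known suffix or consists of nothing but the suffix.
 */
function base_domain($name, array $suffixes)
{
    $labels = explode('.', strtolower(rtrim($name, '.')));
    $count  = count($labels);

    // Try the last two components first, then fall back to the last one.
    foreach (array(2, 1) as $len) {
        if ($count < $len) {
            continue;
        }
        $suffix = implode('.', array_slice($labels, -$len));
        if (isset($suffixes[$suffix])) {
            if ($count === $len) {
                return false; // the name is only a suffix, e.g. "co.uk"
            }
            return implode('.', array_slice($labels, -($len + 1)));
        }
    }

    return false; // unknown suffix: not treated as a valid domain name
}
```

For example, `base_domain('here.example.co.uk', $suffixes)` gives `example.co.uk`, while a name ending in a suffix missing from the table gives `false`.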

You could, perhaps, make things a bit simpler by writing a function, instead of using an associative array, to tell whether the last two components should be treated as a single "effective TLD." The function would probably look at the next-to-last component and, if it's shorter than 3 characters, decide that it should be treated as part of the TLD.
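
A rough sketch of that length-based heuristic follows; the function name `guess_base_domain` is made up for illustration. It needs no table, but it is only a guess and gets some names wrong.

```php
<?php
/**
 * Guesses the base domain without a suffix table. If the next-to-last
 * component is shorter than 3 characters, the last two components are
 * treated as a single "effective TLD"; otherwise only the last one is.
 */
function guess_base_domain($name)
{
    $labels = explode('.', strtolower(rtrim($name, '.')));
    $count  = count($labels);
    if ($count < 2) {
        return false;
    }

    $suffixLen = strlen($labels[$count - 2]) < 3 ? 2 : 1;
    if ($count <= $suffixLen) {
        return false; // nothing left in front of the guessed suffix
    }

    return implode('.', array_slice($labels, -($suffixLen + 1)));
}
```

This turns `here.example.co.uk` into `example.co.uk`, but it misfires in both directions: `www.db.de` is returned unchanged because `db` is short, and `www.example.com.br` comes out as `com.br` because `com` is three characters long.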

answered Sep 30, 2015 by android_master
0 votes

To do it well, you'll need a list of the second-level and top-level domains, from which you can build an appropriate list of regular expressions. A good list of second-level domains is available at https://wiki.mozilla.org/TLD_List. Another test case, apart from the aforementioned CentralNic .uk.com variants, is the Vatican: its website is technically at http://va, and that's a difficult one to match on!

answered Sep 30, 2015 by vibhorsingh
0 votes

Building on Jonathan's answer:

function main_domain($domain) {
  if (preg_match('/([a-z0-9][a-z0-9\-]{1,63})\.([a-z]{3}|[a-z]{2}\.[a-z]{2})$/i', $domain, $regs)) {
    return $regs;
  }

  return false;
}

His expression might be a bit better, but this interface seems more like what you're describing.

answered Sep 30, 2015 by yogeshplv
0 votes

Ah - if you just want to handle three character top level domains - then this code works:

<?php
$domains = array('http://here.example.org/longtest?string=here');
foreach ($domains as $domain) {
    testdomain($domain);
}

function testdomain($url) {
    // Groups: 1/2 = subdomain and/or protocol, 3 = domain name, 4 = TLD, 5 = path
    if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{3})(\/.*)?$/', $url, $matches)) {
        print 'Domain is: '.$matches[3].'.'.$matches[4]."\n";
    } else {
        print 'Domain not found in '.$url."\n";
    }
}
?>

$matches[1]/$matches[2] will contain any subdomain and/or protocol, $matches[3] contains the domain name, $matches[4] the top level domain and $matches[5] contains any other URL path information.
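
To see the capture groups concretely, the same pattern can be run against the sample URL and the result dumped. This is only an illustrative snippet; the variable names `$pattern` and `$url` are not from the answer.

```php
<?php
// Same three-letter-TLD pattern as in the answer above.
$pattern = '/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{3})(\/.*)?$/';
$url     = 'http://here.example.org/longtest?string=here';

if (preg_match($pattern, $url, $matches)) {
    // $matches[1] = 'http://here.'  (protocol and subdomain, with trailing dot)
    // $matches[2] = 'http://here'   (the same without the dot)
    // $matches[3] = 'example'       (domain name)
    // $matches[4] = 'org'           (top level domain)
    // $matches[5] = '/longtest?string=here' (remaining path)
    print_r($matches);
}
```

Note that the protocol is not split off: it ends up in the same group as the subdomain.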

To match most common top level domains you could try changing it to:

if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{2,6})(\/.*)?$/',$url,$matches)) {

Or to get it coping with everything:

if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.(co\.uk|me\.uk|org\.uk|com|org|net|int|eu)(\/.*)?$/',$url,$matches)) {

etc etc

answered Sep 30, 2015 by 20shahi
0 votes

Based on http://www.cafewebmaster.com/find-top-level-domain-international-urls-php

function find_tld($url){

$purl  = parse_url($url);
$host  = strtolower($purl['host']);

$valid_tlds = ".ab.ca .bc.ca .mb.ca .nb.ca .nf.ca .nl.ca .ns.ca .nt.ca .nu.ca .on.ca .pe.ca .qc.ca .sk.ca .yk.ca .com.cd .net.cd .org.cd .com.ch .net.ch .org.ch .gov.ch .co.ck .ac.cn .com.cn .edu.cn .gov.cn .net.cn .org.cn .ah.cn .bj.cn .cq.cn .fj.cn .gd.cn .gs.cn .gz.cn .gx.cn .ha.cn .hb.cn .he.cn .hi.cn .hl.cn .hn.cn .jl.cn .js.cn .jx.cn .ln.cn .nm.cn .nx.cn .qh.cn .sc.cn .sd.cn .sh.cn .sn.cn .sx.cn .tj.cn .xj.cn .xz.cn .yn.cn .zj.cn .com.co .edu.co .org.co .gov.co .mil.co .net.co .nom.co .com.cu .edu.cu .org.cu .net.cu .gov.cu .inf.cu .gov.cx .edu.do .gov.do .gob.do .com.do .org.do .sld.do .web.do .net.do .mil.do .art.do .com.dz .org.dz .net.dz .gov.dz .edu.dz .asso.dz .pol.dz .art.dz .com.ec .info.ec .net.ec .fin.ec .med.ec .pro.ec .org.ec .edu.ec .gov.ec .mil.ec .com.ee .org.ee .fie.ee .pri.ee .eun.eg .edu.eg .sci.eg .gov.eg .com.eg .org.eg .net.eg .mil.eg .com.es .nom.es .org.es .gob.es .edu.es .com.et .gov.et .org.et .edu.et .net.et .biz.et .name.et .info.et .co.fk .org.fk .gov.fk .ac.fk .nom.fk .net.fk .tm.fr .asso.fr .nom.fr .prd.fr .presse.fr .com.fr .gouv.fr .com.ge .edu.ge .gov.ge .org.ge .mil.ge .net.ge .pvt.ge .co.gg .net.gg .org.gg .com.gi .ltd.gi .gov.gi .mod.gi .edu.gi .org.gi .com.gn .ac.gn .gov.gn .org.gn .net.gn .com.gr .edu.gr .net.gr .org.gr .gov.gr .com.hk .edu.hk .gov.hk .idv.hk .net.hk .org.hk .com.hn .edu.hn .org.hn .net.hn .mil.hn .gob.hn .iz.hr .from.hr .name.hr .com.hr .com.ht .net.ht .firm.ht .shop.ht .info.ht .pro.ht .adult.ht .org.ht .art.ht .pol.ht .rel.ht .asso.ht .perso.ht .coop.ht .med.ht .edu.ht .gouv.ht .gov.ie .co.in .firm.in .net.in .org.in .gen.in .ind.in .nic.in .ac.in .edu.in .res.in .gov.in .mil.in .ac.ir .co.ir .gov.ir .net.ir .org.ir .sch.ir .gov.it .co.je .net.je .org.je .edu.jm .gov.jm .com.jm .net.jm .com.jo .org.jo .net.jo .edu.jo .gov.jo .mil.jo .co.kr .or.kr .com.kw .edu.kw .gov.kw .net.kw .org.kw .mil.kw .edu.ky .gov.ky .com.ky .org.ky .net.ky .org.kz .edu.kz .net.kz .gov.kz .mil.kz .com.kz .com.li .net.li 
.org.li .gov.li .gov.lk .sch.lk .net.lk .int.lk .com.lk .org.lk .edu.lk .ngo.lk .soc.lk .web.lk .ltd.lk .assn.lk .grp.lk .hotel.lk .com.lr .edu.lr .gov.lr .org.lr .net.lr .org.ls .co.ls .gov.lt .mil.lt .gov.lu .mil.lu .org.lu .net.lu .com.lv .edu.lv .gov.lv .org.lv .mil.lv .id.lv .net.lv .asn.lv .conf.lv .com.ly .net.ly .gov.ly .plc.ly .edu.ly .sch.ly .med.ly .org.ly .id.ly .co.ma .net.ma .gov.ma .org.ma .tm.mc .asso.mc .org.mg .nom.mg .gov.mg .prd.mg .tm.mg .com.mg .edu.mg .mil.mg .com.mk .org.mk .com.mo .net.mo .org.mo .edu.mo .gov.mo .org.mt .com.mt .gov.mt .edu.mt .net.mt .com.mu .co.mu .aero.mv .biz.mv .com.mv .coop.mv .edu.mv .gov.mv .info.mv .int.mv .mil.mv .museum.mv .name.mv .net.mv .org.mv .pro.mv .com.mx .net.mx .org.mx .edu.mx .gob.mx .com.my .net.my .org.my .gov.my .edu.my .mil.my .name.my .edu.ng .com.ng .gov.ng .org.ng .net.ng .gob.ni .com.ni .edu.ni .org.ni .nom.ni .net.ni .gov.nr .edu.nr .biz.nr .info.nr .com.nr .net.nr .ac.nz .co.nz .cri.nz .gen.nz .geek.nz .govt.nz .iwi.nz .maori.nz .mil.nz .net.nz .org.nz .school.nz .com.pf .org.pf .edu.pf .com.pg .net.pg .com.ph .gov.ph .com.pk .net.pk .edu.pk .org.pk .fam.pk .biz.pk .web.pk .gov.pk .gob.pk .gok.pk .gon.pk .gop.pk .gos.pk .com.pl .biz.pl .net.pl .art.pl .edu.pl .org.pl .ngo.pl .gov.pl .info.pl .mil.pl .waw.pl .warszawa.pl .wroc.pl .wroclaw.pl .krakow.pl .poznan.pl .lodz.pl .gda.pl .gdansk.pl .slupsk.pl .szczecin.pl .lublin.pl .bialystok.pl .olsztyn.pl .torun.pl .biz.pr .com.pr .edu.pr .gov.pr .info.pr .isla.pr .name.pr .net.pr .org.pr .pro.pr .edu.ps .gov.ps .sec.ps .plo.ps .com.ps .org.ps .net.ps .com.pt .edu.pt .gov.pt .int.pt .net.pt .nome.pt .org.pt .publ.pt .net.py .org.py .gov.py .edu.py .com.py .com.ru .net.ru .org.ru .pp.ru .msk.ru .int.ru .ac.ru .gov.rw .net.rw .edu.rw .ac.rw .com.rw .co.rw .int.rw .mil.rw .gouv.rw .com.sa .edu.sa .sch.sa .med.sa .gov.sa .net.sa .org.sa .pub.sa .com.sb .gov.sb .net.sb .edu.sb .com.sc .gov.sc .net.sc .org.sc .edu.sc .com.sd .net.sd .org.sd .edu.sd 
.med.sd .tv.sd .gov.sd .info.sd .org.se .pp.se .tm.se .parti.se .press.se .ab.se .c.se .d.se .e.se .f.se .g.se .h.se .i.se .k.se .m.se .n.se .o.se .s.se .t.se .u.se .w.se .x.se .y.se .z.se .ac.se .bd.se .com.sg .net.sg .org.sg .gov.sg .edu.sg .per.sg .idn.sg .edu.sv .com.sv .gob.sv .org.sv .red.sv .gov.sy .com.sy .net.sy .ac.th .co.th .in.th .go.th .mi.th .or.th .net.th .ac.tj .biz.tj .com.tj .co.tj .edu.tj .int.tj .name.tj .net.tj .org.tj .web.tj .gov.tj .go.tj .mil.tj .com.tn .intl.tn .gov.tn .org.tn .ind.tn .nat.tn .tourism.tn .info.tn .ens.tn .fin.tn .net.tn .gov.to .gov.tp .com.tr .info.tr .biz.tr .net.tr .org.tr .web.tr .gen.tr .av.tr .dr.tr .bbs.tr .name.tr .tel.tr .gov.tr .bel.tr .pol.tr .mil.tr .k12.tr .edu.tr .co.tt .com.tt .org.tt .net.tt .biz.tt .info.tt .pro.tt .name.tt .edu.tt .gov.tt .gov.tv .edu.tw .gov.tw .mil.tw .com.tw .net.tw .org.tw .idv.tw .game.tw .ebiz.tw .club.tw .co.tz .ac.tz .go.tz .or.tz .ne.tz .com.ua .gov.ua .net.ua .edu.ua .org.ua .cherkassy.ua .ck.ua .chernigov.ua .cn.ua .chernovtsy.ua .cv.ua .crimea.ua .dnepropetrovsk.ua .dp.ua .donetsk.ua .dn.ua .if.ua .kharkov.ua .kh.ua .kherson.ua .ks.ua .khmelnitskiy.ua .km.ua .kiev.ua .kv.ua .kirovograd.ua .kr.ua .lugansk.ua .lg.ua .lutsk.ua .lviv.ua .nikolaev.ua .mk.ua .odessa.ua .od.ua .poltava.ua .pl.ua .rovno.ua .rv.ua .sebastopol.ua .sumy.ua .ternopil.ua .te.ua .uzhgorod.ua .vinnica.ua .vn.ua .zaporizhzhe.ua .zp.ua .zhitomir.ua .zt.ua .co.ug .ac.ug .sc.ug .go.ug .ne.ug .or.ug .ac.uk .co.uk .gov.uk .ltd.uk .me.uk .mil.uk .mod.uk .net.uk .nic.uk .nhs.uk .org.uk .plc.uk .police.uk .bl.uk .icnet.uk .jet.uk .nel.uk .nls.uk .parliament.uk .sch.uk .ak.us .al.us .ar.us .az.us .ca.us .co.us .ct.us .dc.us .de.us .dni.us .fed.us .fl.us .ga.us .hi.us .ia.us .id.us .il.us .in.us .isa.us .kids.us .ks.us .ky.us .la.us .ma.us .md.us .me.us .mi.us .mn.us .mo.us .ms.us .mt.us .nc.us .nd.us .ne.us .nh.us .nj.us .nm.us .nsn.us .nv.us .ny.us .oh.us .ok.us .or.us .pa.us .ri.us .sc.us .sd.us .tn.us .tx.us .ut.us 
.vt.us .va.us .wa.us .wi.us .wv.us .wy.us .edu.uy .gub.uy .org.uy .com.uy .net.uy .mil.uy .com.ve .net.ve .org.ve .info.ve .co.ve .web.ve .com.vi .org.vi .edu.vi .gov.vi .com.vn .net.vn .org.vn .edu.vn .gov.vn .int.vn .ac.vn .biz.vn .info.vn .name.vn .pro.vn .health.vn .com.ye .net.ye .ac.yu .co.yu .org.yu .edu.yu .ac.za .city.za .co.za .edu.za .gov.za .law.za .mil.za .nom.za .org.za .school.za .alt.za .net.za .ngo.za .tm.za .web.za .co.zm .org.zm .gov.zm .sch.zm .ac.zm .co.zw .org.zw .gov.zw .ac.zw .com.ac .edu.ac .gov.ac .net.ac .mil.ac .org.ac .nom.ad .net.ae .co.ae .gov.ae .ac.ae .sch.ae .org.ae .mil.ae .pro.ae .name.ae .com.ag .org.ag .net.ag .co.ag .nom.ag .off.ai .com.ai .net.ai .org.ai .gov.al .edu.al .org.al .com.al .net.al .com.am .net.am .org.am .com.ar .net.ar .org.ar .e164.arpa .ip6.arpa .uri.arpa .urn.arpa .gv.at .ac.at .co.at .or.at .com.au .net.au .asn.au .org.au .id.au .csiro.au .gov.au .edu.au .com.aw .com.az .net.az .org.az .com.bb .edu.bb .gov.bb .net.bb .org.bb .com.bd .edu.bd .net.bd .gov.bd .org.bd .mil.be .ac.be .gov.bf .com.bm .edu.bm .org.bm .gov.bm .net.bm .com.bn .edu.bn .org.bn .net.bn .com.bo .org.bo .net.bo .gov.bo .gob.bo .edu.bo .tv.bo .mil.bo .int.bo .agr.br .am.br .art.br .edu.br .com.br .coop.br .esp.br .far.br .fm.br .g12.br .gov.br .imb.br .ind.br .inf.br .mil.br .net.br .org.br .psi.br .rec.br .srv.br .tmp.br .tur.br .tv.br .etc.br .adm.br .adv.br .arq.br .ato.br .bio.br .bmd.br .cim.br .cng.br .cnt.br .ecn.br .eng.br .eti.br .fnd.br .fot.br .fst.br .ggf.br .jor.br .lel.br .mat.br .med.br .mus.br .not.br .ntr.br .odo.br .ppg.br .pro.br .psc.br .qsl.br .slg.br .trd.br .vet.br .zlg.br .dpn.br .nom.br .com.bs .net.bs .org.bs .com.bt .edu.bt .gov.bt .net.bt .org.bt .co.bw .org.bw .gov.by .mil.by .ac.cr .co.cr .ed.cr .fi.cr .go.cr .or.cr .sa.cr .com.cy .biz.cy .info.cy .ltd.cy .pro.cy .net.cy .org.cy .name.cy .tm.cy .ac.cy .ekloges.cy .press.cy .parliament.cy .com.dm .net.dm .org.dm .edu.dm .gov.dm .biz.fj .com.fj .info.fj .name.fj 
.net.fj .org.fj .pro.fj .ac.fj .gov.fj .mil.fj .school.fj .com.gh .edu.gh .gov.gh .org.gh .mil.gh .co.hu .info.hu .org.hu .priv.hu .sport.hu .tm.hu .2000.hu .agrar.hu .bolt.hu .casino.hu .city.hu .erotica.hu .erotika.hu .film.hu .forum.hu .games.hu .hotel.hu .ingatlan.hu .jogasz.hu .konyvelo.hu .lakas.hu .media.hu .news.hu .reklam.hu .sex.hu .shop.hu .suli.hu .szex.hu .tozsde.hu .utazas.hu .video.hu .ac.id .co.id .or.id .go.id .ac.il .co.il .org.il .net.il .k12.il .gov.il .muni.il .idf.il .co.im .net.im .gov.im .org.im .nic.im .ac.im .org.jm .ac.jp .ad.jp .co.jp .ed.jp .go.jp .gr.jp .lg.jp .ne.jp .or.jp .hokkaido.jp .aomori.jp .iwate.jp .miyagi.jp .akita.jp .yamagata.jp .fukushima.jp .ibaraki.jp .tochigi.jp .gunma.jp .saitama.jp .chiba.jp .tokyo.jp .kanagawa.jp .niigata.jp .toyama.jp .ishikawa.jp .fukui.jp .yamanashi.jp .nagano.jp .gifu.jp .shizuoka.jp .aichi.jp .mie.jp .shiga.jp .kyoto.jp .osaka.jp .hyogo.jp .nara.jp .wakayama.jp .tottori.jp .shimane.jp .okayama.jp .hiroshima.jp .yamaguchi.jp .tokushima.jp .kagawa.jp .ehime.jp .kochi.jp .fukuoka.jp .saga.jp .nagasaki.jp .kumamoto.jp .oita.jp .miyazaki.jp .kagoshima.jp .okinawa.jp .sapporo.jp .sendai.jp .yokohama.jp .kawasaki.jp .nagoya.jp .kobe.jp .kitakyushu.jp .per.kh .com.kh .edu.kh .gov.kh .mil.kh .net.kh .org.kh .net.lb .org.lb .gov.lb .edu.lb .com.lb .com.lc .org.lc .edu.lc .gov.lc .army.mil .navy.mil .weather.mobi .music.mobi .ac.mw .co.mw .com.mw .coop.mw .edu.mw .gov.mw .int.mw .museum.mw .net.mw .org.mw .mil.no .stat.no .kommune.no .herad.no .priv.no .vgs.no .fhs.no .museum.no .fylkesbibl.no .folkebibl.no .idrett.no .com.np .org.np .edu.np .net.np .gov.np .mil.np .org.nr .com.om .co.om .edu.om .ac.com .sch.om .gov.om .net.om .org.om .mil.om .museum.om .biz.om .pro.om .med.om .com.pa .ac.pa .sld.pa .gob.pa .edu.pa .org.pa .net.pa .abo.pa .ing.pa .med.pa .nom.pa .com.pe .org.pe .net.pe .edu.pe .mil.pe .gob.pe .nom.pe .law.pro .med.pro .cpa.pro .vatican.va .ac .ad .ae .aero .af .ag .ai .al .am .an .ao .aq 
.ar .arpa .as .at .au .aw .az .ba .bb .bd .be .bf .bg .bh .bi .biz .bj .bm .bn .bo .br .bs .bt .bv .bw .by .bz .ca .cat .cc .cd .cf .cg .ch .ci .ck .cl .cm .cn .co .com .coop .cr .cu .cv .cx .cy .cz .de .dj .dk .dm .do .dz .ec .edu .ee .eg .er .es .et .eu .fi .fj .fk .fm .fo .fr .ga .gb .gd .ge .gf .gg .gh .gi .gl .gm .gov .gp .gq .gr .gs .gt .gu .gw .gy .hk .hm .hn .hr .ht .hu .id .ie .il .im .in .info .int .io .iq .ir .is .it .je .jm .jo .jobs .jp .ke .kg .kh .ki .km .kn .kr .kw .ky .kz .la .lb .lc .li .lk .lr .ls .lt .lu .lv .ly .ma .mc .md .mg .mh .mil .mk .ml .mm .mn .mo .mobi .mp .mq .mr .ms .mt .mu .museum .mv .mw .na .name .nc .ne .net .nf .ng .ni .nl .no .np .nr .nu .nz .om .org .pa .pe .pf .pg .ph .pk .pl .pm .pn .post .pr .pro .ps .pt .pw .py .qa .re .ro .ru .rw .sa .sb .sc .sd .se .sg .sh .si .sj .sk .sl .sm .sn .so .sr .st .su .sv .sy .sz .tc .td .tf .tg .th .tj .tk .tl .tm .tn .to .tp .tr .travel .tt .tv .tw .tz .ua .ug .uk .um .us .uy .uz .va .vc .ve .vg .vi .vn .vuwf .ye .yt .yu .za .zm .zw .ca .cd .ch .cn .cu .cx .dm .dz .ec .ee .es .fr .ge .gg .gi .gr .hk .hn .hr .ht .hu .ie .in .ir .it .je .jo .jp .kr .ky .li .lk .lt .lu .lv .ly .ma .mc .mg .mk .mo .mt .mu .nl .no .nr .nr .pf .ph .pk .pl .pr .ps .pt .ro .ru .rw .sc .sd .se .sg .tj .to .to .tt .tv .tw .tw .tw .tw .ua .ug .us .vi .vn";



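    // Split the list on any whitespace (it spans several lines), escape each entry, and join into one alternation
    $tld_regex = '#(.*?)([^.]+)('.implode('|', array_map('preg_quote', preg_split('/\s+/', trim($valid_tlds)))).')$#';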

    //remove the extension
    preg_match($tld_regex,$host,$matches);

    if(!empty($matches) && sizeof($matches) > 2){
        $extension = array_pop($matches);
        $tld = array_pop($matches);
        return $tld.$extension;

    }else{ //change to "false" if you prefer
        return $host;
    }



}
answered Sep 30, 2015 by shegokar.anjeet
0 votes

A variant of Jonathan Sampson's answer:

function get_domain($url)   {   
    if ( !preg_match("/^http/", $url) )
        $url = 'http://' . $url;
    if ( $url[strlen($url)-1] != '/' )
        $url .= '/';
    $pieces = parse_url($url);
    $domain = isset($pieces['host']) ? $pieces['host'] : ''; 
    // Named group "domain": one label followed by a suffix of 2 to 6 letters/dots
    if ( preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs) ) { 
        $res = preg_replace('/^www\./', '', $regs['domain'] );
        return $res;
    }   
    return false;
}
answered Sep 30, 2015 by suyesh.lokhande
0 votes

Here is what I am using. It works well without needing any arrays of TLDs:

$split = array_reverse(explode(".", $_SERVER['HTTP_HOST']));
$domain = $split[1].".".$split[0];

if(function_exists('gethostbyname'))
{
    if(gethostbyname($domain) != $_SERVER['SERVER_ADDR'] && isset($split[2]))
    {   
        $domain = $split[2].".".$split[1].".".$split[0];
    }
}
answered Sep 30, 2015 by jekbishnoi
0 votes

It is not possible without a TLD list to compare against, as there exist many cases like http://www.db.de/ or http://bbc.co.uk/

But even with that you won't have success in every case, because of SLDs like http://big.uk.com/ or http://www.uk.com/

If you need a complete list you can use the public suffix list:

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1

Feel free to use my function. It won't use regex and it is fast:

http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm#3471878

answered Sep 30, 2015 by vijayshukla80
0 votes

There is no need to list every country-code TLD: they are all two letters long, apart from the special ones listed by IANA.

https://gist.github.com/pocesar/5366899

and the tests are here http://codepad.viper-7.com/QfueI0

Comprehensive test suite along with working code. The only caveat is that it won't work with Unicode domain names, but that's another level of data extraction.

From the list, I'm testing against:

$urls = array(
'www.example.com' => 'example.com',
'example.com' => 'example.com',
'example.com.br' => 'example.com.br',
'www.example.com.br' => 'example.com.br',
'www.example.gov.br' => 'example.gov.br',
'localhost' => 'localhost',
'www.localhost' => 'localhost',
'subdomain.localhost' => 'localhost',
'www.subdomain.example.com' => 'example.com',
'subdomain.example.com' => 'example.com',
'subdomain.example.com.br' => 'example.com.br',
'www.subdomain.example.com.br' => 'example.com.br',
'www.subdomain.example.biz.br' => 'example.biz.br',
'subdomain.example.biz.br' => 'example.biz.br',
'subdomain.example.net' => 'example.net',
'www.subdomain.example.net' => 'example.net',
'www.subdomain.example.co.kr' => 'example.co.kr',
'subdomain.example.co.kr' => 'example.co.kr',
'example.co.kr' => 'example.co.kr',
'example.jobs' => 'example.jobs',
'www.example.jobs' => 'example.jobs',
'subdomain.example.jobs' => 'example.jobs',
'insane.subdomain.example.jobs' => 'example.jobs',
'insane.subdomain.example.com.br' => 'example.com.br',
'www.doubleinsane.subdomain.example.com.br' => 'example.com.br',
'www.subdomain.example.jobs' => 'example.jobs',
'test' => 'test',
'www.test' => 'test',
'subdomain.test' => 'test',
'www.detran.sp.gov.br' => 'sp.gov.br',
'www.mp.sp.gov.br' => 'sp.gov.br',
'ny.library.museum' => 'library.museum',
'www.ny.library.museum' => 'library.museum',
'ny.ny.library.museum' => 'library.museum',
'www.library.museum' => 'library.museum',
'info.abril.com.br' => 'abril.com.br',
'127.0.0.1' => '127.0.0.1',
'::1' => '::1',
);
answered Sep 30, 2015 by yashwantpinge

...