LINKED DATA: WEB SCIENCE AND THE SEMANTIC WEB

Tim Berners-Lee [8.15.12]

A lot of people assume that Semantic Web consists only of the metadata, the data at the top of an article that indicates who it was written by. But no, it's the data. It's the government spending data. It's where the potholes are and where space ships are. It's where cars are. It's where taxis are and it is all the data that makes a map. It's the data that makes all the charts, and it's the data that makes industry run. It's the data that makes governments run. It's not just metadata, and it's not data just sucked from the Web.

TIM BERNERS-LEE is a British engineer and computer scientist and MIT professor credited with inventing the World Wide Web, making the first proposal for it in March 1989. Berners-Lee is the director of the World Wide Web Consortium (W3C), which oversees the Web's continued development. He is also the founder of the World Wide Web Foundation, and is a senior researcher and holder of the 3Com Founders Chair at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He is a director of The Web Science Research Initiative (WSRI), and a member of the advisory board of the MIT Center for Collective Intelligence. 

THE REALITY CLUB: Anonymous, George Dyson, Hans Ulrich Obrist, Dave Winer, Douglas Rushkoff, Esther Dyson, Nicholas Carr, Brian Eno, Craig Mundie



[55:58 minutes]


[TIM BERNERS-LEE:] The questions I'm asking myself vary depending on the hat I'm wearing at the time. I'm switching hats quite a lot these days. If I'm at the Web Foundation, then I'm thinking about what are the smallest, simplest things that we can do, what buttons can we push so that things change, so that the people who are the 80 percent of the world, who are not really members of the society of people using the web and not members of the information society, how can we get them up to speed, perhaps 15 years earlier than they would otherwise?      

When it comes to the standards, the impact of the open web platform, what missing pieces of the architecture are there? I have just come from a W3C Technical Advisory Group meeting where we were talking about the effect that every web page will basically become a computer. How is that going to change the world? What will we be able to build on top of that? What extra pieces in architecture do we have to put in now so that we'll be able to do really amazing things? There are other things where I wear research hats and hacking hats and there are lots of questions out there.

If you look back at the Internet, the Internet is a layer on which the Web was built. Since I built the Web on top of the Internet, the service the Internet provides to the Web, that of transmitting packets around, has remained basically the same. The code I wrote back then, 20 years ago, would still basically run today. You can write programs using this net in the same way. That is amazing because the speed at which those packets go, the bits per second when we connect to the Internet, has gone up by a million or a billion times, depending on where we are.

The Internet has changed massively when it comes to speed, but not in terms of the function that it provides. Even though lots of things inside it may have changed and the sorts of equipment would look very different, it's still called the Internet. I think the same may happen to the Web. The Web is also a platform. It's built on top of the Internet, but other things are built on top of that: social networking sites, search services, buying and selling, auctions. All kinds of things are built on top of the Web.


They will continue to be built on top of the Web. HTTP will probably go on being called HTTP for a long time. But it may change inside. It has already graduated from the very simple protocol it started off as, and it is still being developed. URLs will probably be around for a long time because they're a really crucial part of the architecture. But there will be new URL schemes introduced, and the way they're used and the things that they refer to will probably change. The Web, as an information space, is a very fundamental, basic idea: the fact that there is an information space in which everything has some kind of a name, which allows you to get at it. That will continue, even though the technology that it was built out of will change, as will the technology that is built on top of it.

The Semantic Web is changing things. The Web exploded as a space full of documents, basically Web pages that people would read. There was a frustration among the sort of people who actually deal with data. Anybody who needed a spreadsheet would be frustrated: you'd get a report as a Web page, and there would be a graph, but you would have to burrow down to find the data values used to make the graph, because even though that's an interesting graph, you'd actually like to be able to do something different with it. In particular, the sort of thing that is very powerful to do with data is to take that data and join it to other data.

The general rule is that data is much more interesting when it's joined with data from a very disparate space which actually connects up. Lately the simple idea of linked data has taken off as a meme. I'm very pleased with it. It started off with a few public data sets, and now there are hundreds of large public data sets that link data together. Those are being used by people who build apps on the data for companies that combine it with not only a lot of public data, but also internal data. People are starting to realize that the Web of data within that company or that enterprise is very important.

In the future, we're going to see a bit of the Web of data as a read-write medium becoming the slime out of which huge new really interesting social applications will emerge—social applications that will be distributed rather than just based in one particular Web site. The Semantic Web is very exciting. It's taking a long time for all of the pieces to be in place, but it is much more complicated. It's much more powerful than the Web of documents.

It's powerful because this is data that can be easily processed by computers, and there will be much more powerful machines using the Semantic Web. Recently, we've seen all kinds of companies doing lots of things with data in a very pleasing way.


There have been a lot of misunderstandings about the Semantic Web. People said that it won't take off because it's a very rigid ontology in which everybody has to agree on the same terms. That shows a lack of understanding of the nature of the Semantic Web as lots of little pieces loosely joined. There have been people who say it won't take off because you can't trust the data on the Web, because people will lie on the Web. But in fact, there's a huge amount of data we rely on all the time on the Web where we trust people not to lie.

There have been lots of people who for one reason or another have put down the Semantic Web idea. But now you find, for example, search engines are starting to move away from just taking your question and pointing you to documents to read. They're starting to move toward taking your question and looking at the question really as a question and trying to figure out what are the things you're talking about and what is it you want to know about them. You're getting more and more structured information coming back.

There are large databases of all the things that people may want to know about and how they interrelate, which is basically the Semantic Web. Whether or not that data is shared with the public as open data or not, it's clear that people are starting to realize that you need to have your machines building a very large amount of knowledge about everything that you need to deal with.

The Semantic Web is a big fancy term for a Web with data. We have HTML for the Web of documents that we can read. We have the Semantic Web for a Web of linked data. It's very similar in that there are pieces of information out there, except that they're put there in a way that it's easy for machines to use. People can use them, but they have to use tools like spreadsheets, and so on.

I know there's a lot of concern, and some valid concern, whenever in the course of human events we end up with a very strong monopoly. A lot of people felt they had to deter monopolies because, even though the monopolies did pour a lot of money into particular development labs, they restricted the innovation that actually happened on telephones: you couldn't just go and buy a telephone from anywhere. There wasn't really a market in interesting phones.


Coming on to the Internet age, people were initially very worried because Netscape had the complete monopoly of the browser, and they assumed that Netscape would completely dominate the Web. Then they woke up one day worrying that Microsoft now would have a monopoly of the browser, and therefore would dominate the Web. Then they worried that Google or Facebook would. Clearly, looking back historically, you can see that at any one time, there's a justified worry about monopolies, but these worries can shift very quickly.

 

The idea of the Open Web is the idea that anybody can publish something, and that when I publish something, I can make a link to absolutely anything out there, and in a way, the thing I put there is on a par with the thing that you've put there. You may be a famous writer. I may not be. I may just be writing a blog. But when I write my blog, I can point to your blog, and I can point to more or less everything. That is the wonderful medium for creativity that the Web is, and a lot of people are concerned that we keep that, we keep that openness, the ability for anybody to create stuff, the ability for anybody to be able to write the code to make a difference for the Website, and the ability to be able to point to anything. Anything that's out there for you to read you should be able to point to and say that you like it, to say that you don't like it, to evolve it in human discourse.

 

People often ask about the Web and the ups and downs, with the dot.com boom, the dot.com bust and with the current recession. The economic ups and downs show that we don't really have a very good model of the economy. We don't have switches and controls on it to prevent bad things from happening. We don't understand it.


A lot of people that I know who think about the Web—scientists of all different sorts—are looking at the Web and they see people interconnected and they realize that this is a very big, complex system. When people are interconnected by technology, when they believe things because of things they've seen on Twitter or looked up on a blog and then followed the links from various blogs, then a search engine will trace through those links and try to find out what are the current trending topics.

People individually will try to find out what's exciting and what's new, or maybe what they should invest in or what they should get out of. All these ways in which people interact form a large system, a large system that is becoming more and more interconnected. For example, social networking and investment are becoming interconnected. I've been calling "Web Science" the idea that we should be doing some serious analysis about the stability of this. We should be taking psychologists, we should be taking mathematicians who look to complexity theory, we should be taking economists, we should be taking computer scientists who build new systems, network scientists and so on and get them to come together.

In fact, this has been happening for the last few years and there have been Web Science conferences, and Web science journals, and there are a number of Web Science labs, and a number of people who call themselves Web Scientists. When people ask questions about the effect of this on the economy, I say, "That's a very good question." The question is about whether the Web will continue to be an income generator or whether it will flatten out and what different modes we may need to be stimulating the economy. Those are good questions and we should be doing more Web Science.


If you're a Web scientist, I salute you. I encourage you to come together in this very multi-disciplinary way that we've been doing of meeting across the original disciplines in order to master some sort of understanding of the processes going on in this massive system that we're part of, but we don't really understand.

Let's put big data in perspective. Most of the processes that you look at on the Web typically have been scale-free systems. They've got a few things that are very, very large. They've got quite a lot of things that are middle sized, and they've got a ridiculously long tail of things that are very small. When you look at the size of Web sites in general, it's reasonable to think that that's not only what you often find, but also something that is maybe optimal. Maybe it's good to have one leading producer of say, films, which a lot of people really like, and then there are a lot of medium producers which are threatening the leader, and then a long tail of really interesting different independent film producers. 

When you look at data, I suspect it's going to be a similar situation. Yes, there's going to be some really big data. There's going to be some datasets which are very large, like the meteorological data is just very large, and in a sense, rather boring because once you look at one weather balloon data point it's not very exciting to look at all the others. There's big data like that. There's big data that comes from accumulating masses of data about individuals, then looking at how a population moves. That's an interesting example of big data that is derived from individual data, but for now it looks more monolithic.

But then you find a little piece of data. Look at, for example, the UK pound/US dollar exchange rate. Your company might be doing things on both sides of the pond, and it may have lots and lots of data in dollars and lots of data in pounds, and that exchange rate is absolutely crucial and ties it all together. There are some interesting visualizations of open government data that were done at RPI: Jim Hendler's lot took the US data and took the UK data and, of course, they needed that crucial piece to connect the two together. So it's not all about size. There are pieces of data which are going to be really interesting.

One of the stories I've heard from consultants is that if you try to integrate the important data within a company, you'll find you can ask for all the databases. But then there is some spreadsheet on the boss's laptop that is really the key to the future of the company, which has got the parameters which are really guiding it and tying everything together. So it's not all about size.

Yes, big data is very exciting. People are concerned about big data, because they're worried about privacy in big data and whether the fact that there will be more data out about them will indirectly expose things about them. But privacy is just one aspect of data. There are lots of other ways in which data is used in which the data isn't personal at all.
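
The exchange-rate point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spending categories, the amounts, and the rate `USD_PER_GBP` are all made up, and the helper `to_usd` is a hypothetical name, not part of any dataset or tool mentioned in the interview.

```python
from decimal import Decimal

# Hypothetical rate: the one small number that ties the two datasets together.
USD_PER_GBP = Decimal("1.50")

# Made-up spending figures, each in its own local currency.
uk_spending_gbp = {"roads": Decimal("200"), "schools": Decimal("400")}
us_spending_usd = {"roads": Decimal("450"), "schools": Decimal("300")}


def to_usd(amounts_gbp, usd_per_gbp):
    """Convert a mapping of GBP amounts into USD using a single rate."""
    return {name: amount * usd_per_gbp for name, amount in amounts_gbp.items()}


uk_spending_usd = to_usd(uk_spending_gbp, USD_PER_GBP)

# Once both sides use the same unit, the two datasets can be compared row by row.
comparison = {
    name: (uk_spending_usd[name], us_spending_usd[name])
    for name in uk_spending_usd
}
```

Without the single rate value, the two tables cannot be compared at all, however large they are.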

When you look at the impact of data on an individual, for example, you'll find that the small data is always very important. One of the things that people are not talking about right now, but is really important, is personal data: my personal data, in the sense of everything that machines can or could know about me and could use to help me. I'm not talking about the issue of my selling that data to other people for cash. I don't think there's anybody out there to whom that data will be as interesting as it is to me.

I've got health data from when I carry around a gadget when I run. That gadget knows how I've been doing. It can watch me over time. There's all sorts of data about my health, about my finances, and they all connect. There's data in my calendar, my schedule. When you look at that data, if you think of the data that is important to you every day when you get up in the morning, for example, the temperature it's expected to be at lunchtime may be one key number that determines how you dress, and maybe also what you end up doing that day. That's a small detail but key, crucial data.


Your calendar, your schedule for today, is a relatively small number of bits; it's not big data, but it's key to you. Realize that data is going to be scale-free. Realize that the small data can have just as much impact as the big data. Realize that we're going to have to be very clever at building systems that will let us make the best use of all this data and leverage the small data with the big data.

Linked data is a technology, a more advanced technology than basic tables, than comma-separated values tables, in which you've got the possibility of linking. I mean linking in two senses. In the first sense, linking the columns: when we say population in this column, do we mean the same thing as they mean by population? Suddenly, that's very valuable. Suddenly, my data becomes comparable with that data. Suddenly, a query I do on my data becomes a query I can then move on to another dataset. In a second sense, linking the rows of the data. This row is about Boston, it tells us the population of Boston. When I say Boston, I mean the same thing as you mean by Boston. I mean the same thing as Wikipedia means by Boston. There is a URL out there that I can use to link data. I make links in data and that makes the data much more powerful.
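
A minimal sketch of those two kinds of linking, in plain Python rather than any real linked-data toolkit. Everything here is an assumption made for illustration: the `example.org` URLs are placeholders, the figures are invented, and `describe` is a hypothetical helper. The point is only that when two publishers use the same URL for a row (Boston) and agreed URLs for columns (population, area), their data can be merged without any further negotiation.

```python
# Placeholder identifiers; these example.org URLs do not point at real datasets.
BOSTON = "http://example.org/place/Boston"
POPULATION = "http://example.org/vocab/population"
AREA_KM2 = "http://example.org/vocab/areaKm2"

# Publisher A: each table cell written as (row URL, column URL, value).
dataset_a = {
    (BOSTON, POPULATION, 650000),  # invented figure, for illustration only
}

# Publisher B: a different table that happens to use the same URL for Boston.
dataset_b = {
    (BOSTON, AREA_KM2, 125),  # invented figure, for illustration only
}


def describe(subject, *datasets):
    """Collect every column value known about `subject` across all datasets."""
    merged = set().union(*datasets)
    return {column: value for row, column, value in merged if row == subject}


boston = describe(BOSTON, dataset_a, dataset_b)

# A query that neither dataset could answer on its own.
density_per_km2 = boston[POPULATION] / boston[AREA_KM2]
```

If either publisher had used a private name for Boston instead of the shared URL, the merge would silently produce two unrelated rows.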


The architecture for linked data is there, and there are lots and lots of people storing the data; some of it's linked and some of it isn't. Where people have gone to the trouble of linking it, they've linked it. But there are lots and lots of tools, and lots of tools developed independently. A data journalist may develop a tool for looking at government spending data. It's a pretty nifty tool. Then, a school child who's doing a report on volcanoes might actually find that using that tool they can do just what they need for their report. Lots of tools, more and more powerful tools, some of them aimed at professionals, some of them aimed at school children, teachers, data journalists, all give you a choice of the tremendous power that you can use when you point them at different sorts of data.


THE REALITY CLUB: Anonymous, George Dyson, Hans Ulrich Obrist, Dave Winer, Douglas Rushkoff, Esther Dyson, Nicholas Carr, Brian Eno, Craig Mundie

[Ed Note: In preparation for this interview I asked a number of Edgies to submit questions for Tim Berners-Lee, which he read on camera and answered.]


QUESTION

"I've always wondered if HTML was just a quick hack, or was it well thought out? It always struck me as a language as dumb as DOS. Done in the heat of battle just to have something in place. If so, have there been any regrets that such an opportunity was squandered. That could be wrong. Tim might not have had anything to do with it, anyway."

I did have a lot to do with the original design. I had that tremendous luxury of designing HTML without anybody looking over my shoulder. Was it a quick hack? Well, it was designed fairly rapidly; the whole process was just a few months. It's not a beautiful design from the ground up of a markup language. If you look at all of the bits of the Web, they're designed to look just like something else, which some developer somewhere is going to feel comfortable with.

In this particular case, the people at CERN, where I was working, they were using a particular form of SGML, Standard Generalized Markup Language. SGML comes in lots of flavors, but this one particular flavor had angle brackets and it had things like H1 for "heading one". All the people at CERN who had documents on the mainframe were used to writing H1 for the top-level heading. So obviously, if I put H1 in angle brackets, they're going to look at it and they're going to think that they can live with that. They can imagine themselves producing that stuff. The angle brackets were to look like a particular document type that was available at CERN.


When you look at HTTP, the protocol was designed to look like NNTP. NNTP is the Network News Transfer Protocol that had been very successful in distributing news all over the Internet. It was also supposed to look like SMTP, the mail transfer protocol, so that any developer that had been hacking on those protocols would look at the thing and say, "I've done SMTP, I can do HTTP." URLs similarly: the double slash came from the Apollo domain system. The HTTP colon (http:) came from lots of computer systems where you start off introducing a new sort of system with the colon. The whole thing wasn't designed from the ground up to be a beautiful new consistent system. Each bit was designed to look like something that already existed to make people feel comfortable.
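
One way to see the family resemblance with mail is that HTTP header lines have the same "Name: value" shape as mail headers, so a mail header parser can read them. The sketch below uses the parser from Python's standard `email` package on a made-up block of HTTP response headers; the header values are invented for illustration, and this is a demonstration of the shared format, not a description of how any particular server or browser works.

```python
from email.parser import Parser

# A made-up block of HTTP response headers, ending with the blank line
# that separates headers from the body, just as in a mail message.
raw_head = (
    "Content-Type: text/html\r\n"
    "Content-Length: 1024\r\n"
    "\r\n"
)

# Parse the HTTP headers with the mail header parser.
headers = Parser().parsestr(raw_head, headersonly=True)

content_type = headers["Content-Type"]
# Header names are matched case-insensitively, as they are in mail.
content_length = int(headers["content-length"])
```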


GEORGE DYSON

Science Historian; Author, Turing's Cathedral: The Origins of the Digital Universe; Darwin Among the Machines

"What is one thing each that (A), surprised you, (B), scares you, (C), that you would do differently." 

I was surprised that people would edit HTML files by hand. By 1989, we had WYSIWYG processors, or "what you see is what you get" processors. People expected to type in a letter on the screen and press a button and it should come out of the printer as it was on the screen. Yes, some people still used dot commands and mark up. But it was really out of fashion.

Certainly in, for example, the hypertext community, people would expect to edit hypertext with a WYSIWYG editor. My assumption was that you would have to have a WYSIWYG editor for HTML. It didn't really matter too much what sort of angle brackets you used. The original Web browser that I made on the NeXT was a browser editor, and you never saw the mark up. You didn't see the URL; you just saw a bit of hypertext if you were writing, and you could then go to another piece of hypertext if you were reading. You could highlight one and then highlight another and press some hot keys, and then you'd get a link between the two and you could save them.

The fact that people started looking at their Web pages and just writing them by hand was amazing. In fact, what happened was that Web pages took off and the editors didn't. So the people who wrote browsers just made browsers more and more sophisticated, which would handle more and more sophisticated types of HTML. But the editors never kept up, which is a shame.

Even now, a lot of people writing in Wikis and blogs have to write in a funny sort of mark up. Sometimes they have an editor, but it doesn't give them the full power of HTML. In general, most people don't use the powerful tools available today to write a Wiki. That's one thing that has surprised me.

The one thing that's always top of the list of what scares me is control by a large entity: the Internet being controlled by a large company or the Internet being controlled by a large government. Whether it's government or a company depends to a certain extent on what country you're in. But it might be good to look at some of your initial assumptions for that country. At the end of the day, perhaps what's most frightening is when you find that it's large governments and large companies, in fact, connected together. The information collected, for example, about what people do on the Internet by a large company, is then used and abused by large government. The scary thing is the control.

What would I have done differently? I would have removed the double slash from URLs. People respond to this because they try to work out how many seconds and how many broken fingers would have been saved if I had removed double slashes. In fact, there was a reason for the double slash: it meant you could start a URL with the double slash itself. If the Web page you were on was served over HTTP and the Web page you were linking to was also on HTTP, you didn't have to put the HTTP colon in front, so I put in the double slash.


In fact, people always write HTTP colon. They never start with the double slash. Because the HTTP colon is always there, the double slash becomes redundant. But do you know what? It may have been useful because a few people who used the Apollo domain file system (which used the double slash) may have looked at it and thought "I can use that". So maybe it was useful enough getting them on board.
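
The rarely used form described here, a link that starts with the double slash and inherits its scheme from the page it appears on, can be tried with Python's standard `urllib.parse`. The host names and paths below are placeholders chosen for illustration.

```python
from urllib.parse import urljoin, urlsplit

# Placeholder URLs, used only to illustrate the resolution rule.
current_page = "http://example.org/docs/index.html"
link = "//other.example.net/paper.html"  # starts with the double slash, no "http:"

# On its own the link has a host and a path but no scheme.
parts = urlsplit(link)

# Resolved against the current page, it picks up that page's scheme.
resolved = urljoin(current_page, link)

# The same link on a page served over a different scheme follows that scheme.
resolved_secure = urljoin("https://example.org/docs/index.html", link)
```

Here `resolved` comes out as `http://other.example.net/paper.html` and `resolved_secure` as `https://other.example.net/paper.html`: the double slash marks where the host name begins, which is what lets the scheme be left off.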

If I were to do the whole thing differently (the URL system for HTTP), I'd probably design it so that the domain name and the path were in fact all blurred into one.

But that's another rather long story.


HANS ULRICH OBRIST
Co-Director, Curator, Serpentine Gallery, London; Editor: A Brief History of Curating; Formulas for Now; Co-author (with Rem Koolhaas): Project Japan: Metabolism Talks

"What is your one unrealized project, your dream?"

Dave Winer says I always want to talk about Semantic Web, something I've been working on for ten years. For me, it's got a tremendous amount of potential. A lot of people have got the wrong idea about what it is, but now that things are taking off with the world of linked data, the fact that now, lots of companies are starting to think in a much more Semantic Web way inside, it's very exciting. Other people are getting on board.


I suppose my unrealized dream is seeing a huge amount of integration in my life with my personal data, with the data I share and with the groups that I work with: the enterprise data, for example, and the public data, and seeing all that integrated to very good effect. Given the spare time, I will spend it hacking together user interfaces for the Semantic Web, because I find they pay off. I've used them day to day as productivity tools. I've found that a small amount of coding, when I get time for it, is tremendously satisfying, because it gives me extra power to do things more rapidly and to do some things that I couldn't really think of doing before.


DAVE WINER
Founder of UserLand Software; pioneer of Internet standards in distributed computing, including SOAP, XML-RPC, RSS and OPML

"What do you think of RSS? Should we build on it, to make an independent news network to stand alongside the big corporate networks? Have we achieved his original vision of the web? Did it include something like Twitter or Facebook? Aside from the Semantic Web, which you are sure to want to talk about, what would you like us all to be doing?"

I'm only going to give you a few of those many questions, Dave, because they're all good.

No, I don't think we've achieved at all what I set out to do, even if you go back to the very beginning. Yes, I've talked a bit about the Data Web. The Data Web is something that is taking off. It's very exciting. The Semantic Web, which is the Data Web plus a lot of powerful tools, is going to be even more powerful. But if you just go back to the original goal with the Web at CERN, one of the things I wanted to use it for was because I was building projects like the Web, and other software projects as a collaborative project with teams right across the planet, and I wanted to be able to use the Web as a shared brainstorming space, a shared design space.

I wanted it to be a sandbox, if you like, where we could collaboratively brainstorm about ideas, where we could make decisions about which particular things we'd do. Then when other people joined the project later, they could come back and, without any briefing from a human being, be able to just get on board by following the hypertext links, understanding why the designs had been done, finding things that still needed to be done, fixing them, joining in and adding hypertext, donating their insight, their code, their solutions. Then when they left, they would not have to go through a debriefing because they'd added to the Hypertext Web. That was the idea.

One of the things I wanted from it was this very, very powerful collaborative space to be very, very intuitive. The sort of thing so that if you like, if we have a big problem we're trying to solve together, and part of the solution is in my head and part of the solution is in your head, how can we use the Web in a way that will allow us to put those pieces, those two half formed solutions together into a complete solution.


When people follow each other's blogs we get a bit of that. When people chat together on IRC, we get some bits of that. When people use the tools that we build around their software repositories, for example, we get a bit of that. We have just only scratched the surface of the sorts of things that we need to do, the things I call the social machinery that enable us together to work as humans much more effectively, to be able to solve a problem that involves connecting together ideas in different people's brains.


DOUGLAS RUSHKOFF
Media Analyst; Documentary Writer; Author, Program or Be Programmed; Life, Inc.

"What do you think of efforts to "fork" the Web? Does the fact that government or business can control the Web (through DNS and other means) neutralize the network's power, or is it okay with you that the openness of the Web is always dependent on the good graces of central authorities? Do we need a new Web, or should we carry on working to improve the one we have? "

If by forking the Web, you mean taking a Web and wrapping it up in a box where it's controlled, then I would say, as I've mentioned, the biggest fear is when governments or industry try to control, or succeed in controlling, the Internet, or spy on the Internet with repercussions depending on what you do, which effectively prevents you from using the Internet for lots of things. The fact that government or business can control the Web through DNS is really serious. It's something that we all have to be aware of all the time. We have to watch it.

Yes, we have to have law enforcement. It's very important that law enforcement has quite strong powers when it comes to being able to look into spying on the Web. But I feel we need to have agencies that have watchers to watch the watchers, and I don't see that. I don't see which agencies are checking to see whether the wire-tapping laws are being used appropriately, whether data that has been harnessed about individual users is being filtered down and only kept for people who have been demonstrated in the courts to have broken the laws.

I don't see people following up on how data is being used in the long term, going back and informing people and actually apologizing in the cases where the data has been abused. In general, this is a very serious case. I've spoken about it a lot, both that the spying has to be done under certain circumstances and in very controlled ways, and that there have to be agents watching the agents. With regard to "blocking", similarly, only under very severe situations perhaps can I imagine needing to block a site because of a very serious virus problem or because of the introduction into the network of a worm which is spreading and we have no other way of handling it. But I only see doing that in extreme situations, and having the keys to that nuclear capability in only very good hands and probably in a double lock system.


About whether we should stick to this Web and improve it, or whether we should use a new one: in general, it's difficult to make a new one. Look at the difficulty that people at the Internet layer have had trying to persuade people to switch from IPv4 to IPv6. It hasn't been easy. In general, the rule in the Internet has always been no flag days, no sudden day when everything changes; no, it's incremental change.

It will always be the Web. It will change. It is changing all the time. People are revising HTTP. They're revising HTML as we speak. New protocols are being invented; new ways of using the Web are being invented. The Web is evolving all the time. The idea, the concept of the Web as an information space in virtual space will continue to be very important. But what it actually means in terms of technology will evolve bit by bit.


ESTHER DYSON
Catalyst, Information Technology Startups, EDventure Holdings; Former Chairman, Electronic Frontier Foundation and ICANN; Author, Release 2.1

"What are your thoughts about the new Top-Level Domain Name Applications, being announced today (6.13.12) in London, and the overall approach being taken by ICANN." 

Esther, you and I talked when we were going on stage early on in this. I talked to you about the issue, the question as to whether a domain name should be owned or rented. You said it's really important; of course they should be rented, that's very important.

Of course, one of the problems we have to deal with is what happens when accidentally or on purpose or through bankruptcy or just through the course of events, somebody stops owning a domain name. What happens to all the material, all the poetry they wrote? What happens to all of the family photographs they took which are on that domain name? Whereas for a domain name used perhaps just for e-mail, it's reasonable for people to lose it after they die.

In a lot of cases there's a lot of damage that happens when things go "Error 404" because people don't look after them, or when the whole domain disappears, which in a way is worse. The "Error 404" at least gives you something you can chase after, ways of finding out where the thing might be. We should look at ways of making domain names move into an archive mode, where they will be preserved for humanity.

What do I think about the top-level domain names? One needs a balanced opinion about those things. But let me put across a point of view which maybe hasn't been expressed enough. Top-level domain names are happy as they are. People have said we should have alternatives to dot.com and dot.org. And dot.biz came out and dot.info. Did the world benefit? What I'd like to see in ICANN is the question asked of what's the value added for humanity? Not for particular players. If you come along and you start a top-level domain, then you can make lots of money. That is true for the person who gets a domain name at any level to a certain extent, but obviously the top level has hugely more impact. If you get a top-level domain, then you get to make a lot of money by then renting out things within that to other people. In a way, then, ICANN can print money. When ICANN prints money, is this actually adding to the value? There's an argument that if a new top-level domain is released, then everybody has to go out and not only invest in an existing one, but they have to go out and buy the others. Sometimes they do. Sometimes they don't. If they do, then effectively what you're doing is you're deflating the value of all the money. You already sold people domains, and now you're saying, "Yes, we know we sold you the domains, but now if you want to own that trademark then you really have to buy all these, too. And you have to buy these from different people, but you have to, to defend your trademark." What is the actual value added to society?


People understand dot.com and dot.org. Had dot.org been set up so that it was only available to non-profits, then it would have had a particular value. You would have been able to place a certain amount of trust that something was a non-profit. But the overall dot.org domain is not actually run like that. In a way, dot.com and dot.org are used interchangeably.

What's valuable is when you make a new top-level domain where, for example, if we had a dot.archive domain, anything put in that domain on the Web would be automatically licensed in such a way that if the domain owner went away, or was bankrupt, then automatically, libraries would take it on. When a new domain name introduces a socially different structure that is valuable to humanity, then yes, that's useful. When there's a new contract that provides new value, yes. When it's just a question of somebody managing to persuade ICANN that they should be allowed to make lots and lots of money at the expense of people who own trademarks, then it fails the test.


NICHOLAS CARR
Author, The Shallows and The Big Switch

"You have said that one of your desires in inventing the World Wide Web was 'to create a sort of human meta-brain—getting connected brains to function as a greater human brain.' One thing we know about the human brain is that, for all its wonderful capacities, it is also subject to all sorts of flaws and biases, which can distort our thoughts and perceptions. I wonder if you see analogous flaws and biases affecting the web's "meta-brain"? I'm thinking, for instance, of evidence suggesting that search engines, because they tend to amplify measures of popularity, may in some cases narrow the information people see rather than broadening it. Or the way popular new "social filters" may create information cascades that lead to groupthink or polarization. In general, as the web has evolved over the last 20 years, do you think the "meta-brain" has come to think more clearly or less clearly?"

The Web is humanity connected, so if you like, it's a lot of brains connected together. It's not at the moment a big brain. It doesn't itself have its own thoughts, but it is a system that allows us to solve problems more or less well, depending on how we manage to interact.

There are really interesting questions. I talked earlier about the need for Web Science, the need to have people who are analyzing this and analyzing this huge system of as many Web pages out there as there are neurons in the brain. This is a big system and it's one that we depend on. Therefore, we must analyze it.

Do I think it's always behaving well? Let me ask some questions. Do you think when you look at Twitter, that the Twitter system itself is promoting useful information rather than useless information? What evidence would you have to that effect? If that's the case at the moment, then in fact with regard to the way you use Twitter, it's pretty valuable in slightly useless things.

Have you seen anybody prove that there couldn't come a day when Twitter suddenly ends up being a medium that produces a mass conspiracy theory, for example? If you just look at a word on Twitter, the thing you will see is that people may be making all sorts of statements, but they're generally fairly strong. I wonder whether it's the strong statements that are re-tweeted, because the strong statements are the things that appeal to the soul of the person who is re-tweeting it, and that causes them to make that re-tweeting reflex.


Twitter then becomes an amplifying medium for strong emotion, but it becomes a damping medium for low emotion. If you look at Twitter and you take Twitter as your metric of humanity, in fact, you're seeing it through strong emotion glasses. What effect is this going to have on humanity when something very serious happens and people get really upset? Will it in fact, be more of a polarizing influence in this already over-polarized society? These are questions we need to ask.  


BRIAN ENO
Artist; Composer; Recording Producer: U2, Coldplay, Talking Heads, Paul Simon; Recording Artist

"How do you respond to Evgeny Morozov's fears about the possible dark futures of the net? (Perhaps after Stuxnet and Flame it's clear even to ardent siliconists that things could go less-than-Utopian, but I'd like to hear his take on it)."

Do I think that the dark will eventually overcome the light?

I suppose the most significant bit of the answer is no. But the light is going to have to fight really hard in order to continue. When you look at the Web, you see humanity connected. When you look at the links that are followed, when you look at the tweets that are sent, see the operations out there, a lot are done by humans, not all of them, but lots of them, and the critical side of the operations out there are done by people. You see humanity. You see the state of humanity. You see good. You see bad. You always have. Every powerful invention has been used for good and for bad. Information itself is a very powerful thing.

It can obviously be used for good and for bad. People can obviously take systems that have been designed for good purposes and turn them against themselves or use them for nasty reasons. But in general, I am an optimist when it comes to humanity. I believe that we've got in our DNA lots of things which make us, on balance, as individuals work towards the common good and keep a lookout for whether what I'm doing now will have an effect on humanity in general. As a result, the net effect of people all working together across the Net is, in the end, working towards using the Net for better health, better education, better democracy and more peace. I'm an optimist.

When we grew up, we weren't thinking of ourselves as commodities that corporations would trade on. But perhaps that was our mistake. Corporations didn't have Internet. But on the other hand, they did make sweeping assumptions about what your typical housewife, or your man about the house, would want. It would encourage you, and there was television advertising. To a certain extent, in a way the television advertising entity and the idea that one size fits all television had lots of problems, some of which to a certain extent have been solved by the Web, and at least in the Web, you have a huge choice.

Yes, each Web page now looks at you as a commodity, and perhaps now people are more in the sense of trying to figure out from what you've done, from the data it can get from its partners, where you fit in this multi-parameter space, where you are in the socio-economic spectrum. And therefore, what it should serve you in the way of ads.

It's reasonable for people to worry about privacy. There's work happening in W3C, where a lot of people have been thinking in other workshops and groups about privacy and about how privacy on the Web in general could evolve to be better.

At the end of the day, some people despaired when they found a lot of advertising on the Internet. But remember, there's a lot of stuff out there that is not advertising-funded, and you have a choice in what you read there. Personally, I'd like to have more choices. For example, for children, I'd like to be able to pay fees for educational material without seeing ads, and I'm prepared to pay for that. Then I would imagine that the ads could go away, so that in places where I pay I'm not subjected to advertising.

It is important that people who write music can put bread on the table. We have to find more ways of getting the money from the people who listen to the music, maybe not all of them, maybe just a subset of the people who listen to music, to the people who perform the music and the people who write the music. That may involve different sorts of payment protocols, different user interfaces. At the moment, there's a lot of creativity happening on the Web about new types of payment systems, new sorts of markets. I hope it will end up meeting everybody's needs without everybody feeling that the world is just horribly commercial.

The big trend in the Web in general is going to mobile. There are many more mobile Web browsers than there are Web browsers on laptops. I suppose one of the exciting ends of it is that as an executive with a tablet in one pocket, the phone in another, and a computer in your briefcase, you've got more ways of doing the same sorts of things. One of the challenges there is that your executive can optimally use this interface on the phone and then maybe move to take advantage of a huge screen.


The other interesting thing about moving to mobile though is that there are a huge number of people who can only afford mobile. Poor people who can't afford anything but a phone in the large parts of the developing countries, for example, are a huge population and getting them enfranchised is going to be very exciting, very important. It could change the look of the Web.

There are people who have mobile phones and don't have Web browsers on them. But for people who have mobile phones that have Web browsers, still a huge number of those people may go onto the Web and find nothing really there in their language. There are many more things than just mobile, but mobile is a key piece of getting the Web to the people who don't have it by now.


CRAIG MUNDIE
Chief Research and Strategy Officer, Microsoft Corporation

"I know that you have been advocating for more than ten years that the next phase of the World Wide Web would emerge from having more explicit semantics expressed in metadata. The W3C has worked toward this but with limited success so far in adoption. On the other hand, more metadata is emerging on the Web in unstructured ways, and may lead to a similar outcome. I would be interested to know how you see this evolution happening these days and whether you see the path now as more likely to come from the more-formalized approach that you and the W3C were pursuing, or from a more ad hoc development of these capabilities through only semi-structured metadata and the increasing capabilities of the search infrastructure."

First of all, Craig, I'm not sure whether we mean the same thing by metadata. When I say metadata, it means data about data. Metadata is a very important thing, because when we have data, we need to know its provenance, for example. The provenance, such as who wrote an article, is metadata. But the vast majority of data is actually data, not metadata. Government spending data is data. It's not metadata.

A lot of people assume that Semantic Web consists only of the metadata, the data at the top of an article that indicates who it was written by. But no, it's the data. It's the government spending data. It's where the potholes are and where space ships are. It's where cars are. It's where taxis are and it is all the data that makes a map. It's the data that makes all the charts, and it's the data that makes industry run. It's the data that makes governments run. It's not just metadata, and it's not data just sucked from the Web.

Another set of people imagine that all information is on the Web and that, therefore, the only way you're going to get data is by reading Web pages. In fact, now, if the Web pages have been done using RDFa technology, then you can reliably extract hard data from them. If you look at Best Buy, for example, Bestbuy.com, the Web pages about products have RDFa data, and there are lots of companies now that are putting data out there in RDFa, so that's taking off. But there's lots of data. For me, really most of the things that are out there are in relational databases.
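To make the point about RDFa concrete, here is a minimal Python sketch of pulling data out of annotated markup. It is an illustration under stated assumptions, not a conforming RDFa processor: the sample HTML, the `RDFaLiteSketch` class and the `extract_triples` helper are all hypothetical, and only the `typeof`, `resource`, `property` and `content` attributes are handled, with no nesting of subjects.

```python
from html.parser import HTMLParser

# Hypothetical product page fragment; vocabulary and values are illustrative only.
SAMPLE_HTML = """
<div vocab="http://schema.org/" typeof="Product" resource="#item1">
  <span property="name">Example Widget</span>
  <span property="price" content="19.99">$19.99</span>
</div>
"""


class RDFaLiteSketch(HTMLParser):
    """Collects (subject, property, value) triples from a tiny subset of RDFa."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subject = None
        self._pending = None  # property still waiting for its text value

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "typeof" in a:
            # A typed element starts a new subject (nesting is not tracked here).
            self.subject = a.get("resource", "_:node")
            self.triples.append((self.subject, "rdf:type", a["typeof"]))
        if "property" in a:
            if "content" in a:
                # An explicit machine-readable value wins over the visible text.
                self.triples.append((self.subject, a["property"], a["content"]))
            else:
                self._pending = a["property"]

    def handle_data(self, data):
        if self._pending and data.strip():
            self.triples.append((self.subject, self._pending, data.strip()))
            self._pending = None

    def handle_endtag(self, tag):
        self._pending = None


def extract_triples(html):
    """Return the list of triples found in the given HTML string."""
    parser = RDFaLiteSketch()
    parser.feed(html)
    return parser.triples


if __name__ == "__main__":
    for triple in extract_triples(SAMPLE_HTML):
        print(triple)
```

The design choice worth noting is that the page carries the data explicitly in attributes, so the extraction reads it directly rather than guessing from the visible text.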

The relational databases are sitting there in government agencies or in corporations, sitting there and going round and round and round, being used within that particular group and not being shared, and there's a huge lack of value. Now, with this coming explosion of open government data, for example, we're seeing people realize that hugging the data, as Hans Rosling calls it—"hugging disease"—is a losing game because actually you're going to get more kudos from giving the data to other people.

People are getting a lot of kudos from sharing their data. People are finding that, for example, when there's a Freedom of Information Act, if they just put the data on the Web, then they can close down the set of people whose job was constantly answering Freedom of Information Act requests, and just say "You know what? It's on the Web. Don't come to us." Life gets simpler. People are finding that within government departments, they are now getting data from different departments through the public interface. They had the right to get it before, but they didn't really realize it existed, or they could have got it but it would have involved some inter-government transfer. You know, life is too short and they never did it, so they never saw that data and they never used it in their calculations. People are finding that exposing this data, putting it out there, is very valuable in lots of different ways.


They're realizing something along the lines of the initial excitement about the Web: that people who had never met used it to put things out there in ways never imagined. The same thing is happening now with the world of linked data. It's a mixture of different sorts of data. In general, there is this fluffy data that people may have pulled by natural language processing from Web pages, and that may reach a reasonably high quality, but the bulk of the data I'm interested in is the solid data from databases that is now being linked together more and more. Sometimes it's curated by companies, sometimes it's curated by crowds. That is producing a revolution as we speak now, which is really exciting, and the more formalized approach is paying off.
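As a rough illustration of how solid data sitting in a relational database can be exposed as linked data, the following Python sketch turns table rows into subject, predicate, value triples with URI identifiers. Everything named here is an assumption made for the example: the `http://example.org/` namespace, the `spending` table and its columns, and the `rows_to_triples` helper. A real deployment would more likely use an established mapping approach and vocabulary.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace for the example


def rows_to_triples(conn, table, key):
    """Map each row to triples: the row becomes a URI, each column a predicate."""
    # The table name is interpolated directly, which is acceptable only
    # because this sketch uses a fixed, trusted name.
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(cols, row))
        subject = f"{BASE}{table}/{record[key]}"
        for col, value in record.items():
            if col != key and value is not None:
                triples.append((subject, f"{BASE}{table}#{col}", value))
    return triples


def demo():
    """Build a small in-memory table (made-up data) and convert it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spending (id INTEGER PRIMARY KEY, agency TEXT, amount REAL)"
    )
    conn.execute("INSERT INTO spending VALUES (1, 'Transport', 1200.0)")
    return rows_to_triples(conn, "spending", "id")


if __name__ == "__main__":
    for triple in demo():
        print(triple)
```

Because every row gets a URI, another dataset can refer to the same record simply by using that URI, which is the sense in which separate databases become linked.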