28 Nov

Java: guess content type of byte array

Maybe it is not a common use case, but I've just had the need to know the content type of a bare byte[].

...

After searching the whole Internet and evaluating dozens of utility libraries, I finally ended up with a couple of very simple but effective solutions.

First solution, 100% JDK-only, based on URLConnection.

Code:

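import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URLConnection;

byte[] value = ...;
String contentType = null;
try {
    // ByteArrayInputStream supports mark/reset, which the JDK needs to peek at the header
    contentType = URLConnection.guessContentTypeFromStream(
            new ByteArrayInputStream(value));
} catch (IOException e) {
    LOG.error("Could not guess content type", e);
}
// contentType stays null when the JDK does not recognize the bytes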
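For reference, here is a small, self-contained sketch wrapped around the JDK approach. The class name JdkContentTypeGuesser and the application/octet-stream fallback are my own assumptions (chosen to mirror what Tika returns for unknown data). The point to keep in mind is that URLConnection.guessContentTypeFromStream only inspects a handful of leading magic bytes and returns null, rather than throwing, when it does not recognize them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: the name and the fallback value are illustrative assumptions.
final class JdkContentTypeGuesser {

    static final String FALLBACK = "application/octet-stream";

    private JdkContentTypeGuesser() {
    }

    /** Returns the JDK's guess, or FALLBACK when the bytes are not recognized. */
    static String guess(byte[] value) {
        try (ByteArrayInputStream in = new ByteArrayInputStream(value)) {
            // ByteArrayInputStream supports mark/reset, so the JDK can peek at the header
            String type = URLConnection.guessContentTypeFromStream(in);
            return type != null ? type : FALLBACK;
        } catch (IOException e) {
            // Should not happen for an in-memory stream, but the JDK method declares it
            return FALLBACK;
        }
    }

    public static void main(String[] args) {
        // PNG files start with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
        byte[] gif = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        byte[] text = "hello".getBytes(StandardCharsets.US_ASCII);

        System.out.println(guess(png));  // image/png
        System.out.println(guess(gif));  // image/gif
        System.out.println(guess(text)); // application/octet-stream (the JDK returned null)
    }
}
```

Running main prints image/png, image/gif and application/octet-stream: plain text carries no magic bytes, so the JDK simply gives up.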

Second solution, suggested by Jukka Zitting: based on Apache Tika, with a very small footprint (the tika-core jar is about 450 KB).

Code:

import org.apache.tika.Tika;

byte[] value = ...;
String contentType = new Tika().detect(value); // "application/octet-stream" if unknown


3 comments

Comment from: Jukka Zitting [Visitor]
Here's how you'd do the same thing with Apache Tika:

import org.apache.tika.Tika;

byte[] value = ...;
String contentType = new Tika().detect(value);
// Tika falls back to "application/octet-stream" when no detection rule matches
if ("application/octet-stream".equals(contentType)) {
    LOG.error("Could not guess content type");
}

It's a bit more heavyweight than the plain JDK (you need the 450 kB tika-core jar as a dependency), but it covers a much wider range of content types than the JDK does.
11/28/12 @ 21:00
Comment from: ilgrosso [Member]
Hi Jukka, and thanks for your comment: as you can see, I've just updated the post as per your suggestion. I couldn't imagine that Tika's footprint was so small!
11/29/12 @ 08:27
Comment from: Gray [Visitor]
For posterity, I've recently released my SimpleMagic package, which uses the magic(5) Unix configs to determine the content type of a byte[]. More details, sources, docs, etc. here:
http://256.com/sources/simplemagic/
05/21/13 @ 18:06
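For the curious, the magic(5) approach boils down to comparing the first bytes of the data against a table of known signatures. Below is a hypothetical, deliberately tiny sketch of that idea: MagicSniffer and its five-entry table are my own illustration, not the SimpleMagic API. It only checks prefixes at offset 0, whereas real magic(5) rules also support offsets, masks and string tests.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal magic-number matcher in the spirit of magic(5).
// This is NOT the SimpleMagic API, just an illustration of the technique.
final class MagicSniffer {

    // Signatures are checked in insertion order; the first prefix match wins.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();

    static {
        SIGNATURES.put("image/png", new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'});
        SIGNATURES.put("image/gif", ascii("GIF8"));            // GIF87a / GIF89a
        SIGNATURES.put("image/jpeg", new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
        SIGNATURES.put("application/pdf", ascii("%PDF-"));
        SIGNATURES.put("application/zip", new byte[] {'P', 'K', 0x03, 0x04}); // local file header
    }

    private MagicSniffer() {
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // True if data begins with every byte of prefix
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the first matching type, or application/octet-stream if nothing matches. */
    static String detect(byte[] data) {
        for (Map.Entry<String, byte[]> e : SIGNATURES.entrySet()) {
            if (startsWith(data, e.getValue())) {
                return e.getKey();
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        System.out.println(detect(ascii("%PDF-1.4")));           // application/pdf
        System.out.println(detect(new byte[] {'P', 'K', 3, 4})); // application/zip
        System.out.println(detect(ascii("hello")));              // application/octet-stream
    }
}
```

Note that the ZIP signature also matches JAR, DOCX and other ZIP-based formats, so a real detector needs more rules than a single prefix to tell those apart.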