Kuro5hin.org: technology and culture, from the trenches

Open Source Reverse Engineering Tools

By kris in News
Wed Jul 05, 2000 at 12:52:04 PM EST
Tags: Software

As Open Source Software gets used more and more in Closed Source programs, it may become necessary to demonstrate that Open Source code has been used in the production of a Closed Source product. How do you prove such use, if not with decompilers, disassemblers and other tools of reverse engineering? Such tools are also essential for documenting proprietary protocols and APIs.

What tools of this type are presently available for the Linux and Windows platforms? Do you think they are necessary to bring balance to the force? Do you think they are useful in keeping Closed Source ripoffs of our code at bay?



Open Source Reverse Engineering Tools | 38 comments (33 topical, 5 editorial, 0 hidden)
Not necessarily possible (4.50 / 4) (#5)
by Imperator on Wed Jul 05, 2000 at 12:43:01 PM EST

If we simplify the matter down to a single function, it's almost impossible to prove that it was copied unless you have the source. After all, with an optimizing compiler, how can you tell Joe Hacker's function apart from Evil Company's function? It's not unlikely that they'll compile and disassemble to the same instructions, just at different addresses.
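A rough Python sketch of the "same instructions, different addresses" point (not part of the original comment). It assumes objdump-style text input and treats any long hex number as address-like, which is only a heuristic. The name `normalize` is made up for this illustration.

```python
import re

# An objdump-style line: "address:", raw opcode bytes, then the instruction.
_LINE = re.compile(r"^\s*[0-9a-f]+:\s+((?:[0-9a-f]{2}\s)+)\s*(.*)$")
# Heuristic: hex numbers of six or more digits (with optional 0x prefix and
# optional <symbol+offset> annotation) are treated as addresses.
_ADDR = re.compile(r"\b(?:0x)?[0-9a-f]{6,}\b(\s*<[^>]*>)?")


def normalize(listing):
    """Return the instructions of a listing with address-like values masked."""
    result = []
    for line in listing.splitlines():
        match = _LINE.match(line)
        if not match:
            continue  # skip headers, blank lines, anything unexpected
        instruction = " ".join(match.group(2).split())
        result.append(_ADDR.sub("ADDR", instruction))
    return result
```

If two listings normalize to the same sequence, that shows only that the instructions match once addresses are ignored. As the comment says, an independently written function could easily produce the same result.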

Of course, there are always tricks to help you catch companies that try to steal code without reading through it thoroughly. Stick a few unusual but inconspicuous strings in strange places. (Function calls work well if you have a function that will ignore a particular argument based on another argument; alternatively, try a debugging message like "if (time() == 0) printf("ip%%

Ensuring that optimizers don't remove the code (3.50 / 2) (#7)
by mind21_98 on Wed Jul 05, 2000 at 01:11:09 PM EST

You could do something like:

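char *copyrightA, *copyrightB;  /* two pointers that share one marker string */

copyrightA = "libxxx, you're violating the GPL!";  /* the marker to look for */
copyrightB = copyrightA;  /* second reference, meant to keep the string in use */

This should keep the optimizer from removing the string from the final file, although an aggressive optimizer can still drop it if neither pointer is ever read. Then you could do: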

grep "libxxx, you're violating the GPL!" binary

If it does match, it should say: "Binary file xxxx matches"
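For anyone who wants offsets rather than a yes/no answer, here is a minimal Python sketch of the same check (an illustration, not from the original comment). It also looks for the UTF-16LE form, since Windows binaries often store strings that way. The function names are hypothetical.

```python
def find_all(data, needle):
    """Return every byte offset at which needle occurs in data."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def marker_offsets(data, text):
    """Search for text in its ASCII and UTF-16LE encodings."""
    hits = {}
    for encoding in ("ascii", "utf-16-le"):
        found = find_all(data, text.encode(encoding))
        if found:
            hits[encoding] = found
    return hits


def scan_file(path, text):
    """Convenience wrapper: read a binary from disk and scan it."""
    with open(path, "rb") as handle:
        return marker_offsets(handle.read(), text)
```

An empty result only means the literal bytes are absent. It says nothing about whether the surrounding code was reused.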

--
mind21_98 - http://www.translator.cx/
"Ask not if the article is utter BS, but what BS can be exposed in said article."

Re: Ensuring that optimizers don't remove the code (4.00 / 1) (#8)
by Cariset on Wed Jul 05, 2000 at 01:22:33 PM EST

But sadly, if the code was open source in the first place, it'd be possible for the company to check for this sort of thing, too. And remove it from the source before they compile it.

The way around this would be to obfuscate or encode the message, but that kinda defeats the purpose of being open source in the first place. If you make it possible for programmers to understand, you make it possible for companies to understand. If you make it impossible for programmers to understand, then what's the point of opening the source? (The software might still be Free-as-in-speech, but without the source code being comprehensible, it's only a half-assed sort of Free...)

[ Parent ]

Re: Ensuring that optimizers don't remove the code (5.00 / 1) (#13)
by tzanger on Wed Jul 05, 2000 at 05:31:05 PM EST

Actually I believe the proper method to check for code violations is to fingerprint your code's behavior the same way that nmap can fingerprint an OS by analyzing its TCP stack's responses. Perhaps your library behaves in a peculiar way when tickled with certain data.

Of course you don't do an out-and-out check for that data in your code -- that can be seen and removed. Rather you want the algorithm to have a particular response, much the same way as the various TCP/IP stacks respond.

I didn't say it'd be easy. :-)
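A toy Python sketch of the behavioral idea (an illustration with made-up names, not the poster's method): feed a routine a fixed set of odd probes, record what comes back or which error is raised, and hash the transcript. In a real test the callable would have to wrap the suspect binary somehow, which is the hard part and is not shown here.

```python
import hashlib


def fingerprint(func, probes):
    """Hash the observable behavior of func over a fixed list of probes."""
    transcript = []
    for probe in probes:
        try:
            outcome = repr(func(probe))
        except Exception as exc:  # the kind of failure is part of the behavior
            outcome = "raised " + type(exc).__name__
        transcript.append(repr(probe) + " -> " + outcome)
    return hashlib.sha256("\n".join(transcript).encode("utf-8")).hexdigest()


# Hypothetical stand-ins: a quirky routine, a renamed copy of it, and an
# independent implementation that handles the edge cases differently.
def lenient_parse(text):
    return int(text.strip() or "0")


def renamed_copy(value):
    cleaned = value.strip()
    return int(cleaned) if cleaned else 0


def strict_parse(text):
    return int(text)


PROBES = ["42", " 7 ", "", "0x10"]
```

Here `lenient_parse` and `renamed_copy` give the same fingerprint, and `strict_parse` differs because it rejects the empty string. A match is a hint, not proof, since two independent authors can settle on the same quirks.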

[ Parent ]
Err, doesn't DMCA/UCITA make this illegal? (4.00 / 1) (#9)
by Anonymous Hero on Wed Jul 05, 2000 at 01:47:13 PM EST

As I recall, one (or both?) of DMCA/UCITA made it illegal to reverse-engineer.

Or am I just hallucinating?

Re: Err, doesn't DMCA/UCITA make this illegal? (none / 0) (#11)
by Oxryly on Wed Jul 05, 2000 at 04:54:30 PM EST

Not only that, but any End-User License Agreement worth anything at all (and that's just about all of them) prohibits it as well.

IANAL, but I'm afraid it seems there's no legal footing for this approach...

Oxryly

[ Parent ]
Re: Err, doesn't DMCA/UCITA make this illegal? (4.00 / 1) (#12)
by mattdm on Wed Jul 05, 2000 at 05:13:27 PM EST

Those provisions of EULAs have generally been found to be unenforceable. (Which is a *good* thing.) The new laws take away a lot of our rights as users, unfortunately.

[ Parent ]
Unenforceable? (none / 0) (#36)
by marlowe on Sun Jul 09, 2000 at 08:56:14 AM EST

Tell us more. Cites, please.

I'd reverse engineer in a minute if I could be sure there were no legal hang ups. And no, I don't want to move to Germany.

--- A vacant engineer rides on a train of thought that will not take him home ---
-- The Americans are the Jews of the 21st century. Only we won't go as quietly to the gas chambers. --
[ Parent ]
Re: Unenforceable? (none / 0) (#38)
by mattdm on Mon Jul 10, 2000 at 09:55:53 AM EST

Example: Sega v. Accolade -- "We conclude that where disassembly is the only way to gain access to the ideas and functional elements embodied in a copyrighted computer program and where there is a legitimate reason for seeking such access, disassembly is a fair use of the copyrighted work, as a matter of law."

There's more here.

[ Parent ]
Re: Err, doesn't DMCA/UCITA make this illegal? (none / 0) (#35)
by Anonymous Hero on Sat Jul 08, 2000 at 04:10:21 AM EST

So what is it that these organisations make illegal about reverse engineering?

You are allowed to open the executable in a hex editor, aren't you? If I were brilliant enough to interpret all the opcodes myself, who's (legally) stopping me from reading it? If I'm a little less brilliant I could perhaps write a tool to assist me with this job. Is that illegal?


[ Parent ]
Re: Err, doesn't DMCA/UCITA make this illegal? (none / 0) (#21)
by kris on Thu Jul 06, 2000 at 12:49:58 AM EST

There is no DMCA or UCITA in Germany.


[ Parent ]
Re: Err, doesn't DMCA/UCITA make this illegal? (none / 0) (#34)
by techt on Fri Jul 07, 2000 at 06:23:46 PM EST

None in Norway, either. Norway also expressly allows reverse engineering for interoperability. Yet Johansen was still raided and will be going to court against the MPAA.

I think you'll find companies don't care. If the existing law doesn't cover you, they'll just scream "piracy!" or "access tampering!" and have you in a lengthy and costly legal battle.


--
Proud member of the Electronic Frontier Foundation!
Are You? http://www.eff.org/support/joineff.html
[ Parent ]
Adding identifying information to Open Sourced Cod (4.00 / 1) (#10)
by DeepDarkSky on Wed Jul 05, 2000 at 02:35:13 PM EST

At the risk of making the source code less readable and maintainable, static data can be added to the code in such a way that it is inconvenient to remove without making the function useless. Say, for example, that some data is embedded in a large block of static data, and different XOR masks are used to extract the information from it. Since the static data is declarative, like a string constant, it can serve as a fingerprint for the code. I don't know if this is a viable alternative, but it's the only one I could think of in the short term. I don't see how open source code can be "proved" to have been included in a closed source project when so many variables are involved: platforms, compilers, optimizations, and devious 'slight' alterations by the programmer.
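One way to read the XOR idea, as a small Python sketch (all names and the 16-byte marker are invented for illustration). A single static block yields a lookup table the function really needs under one mask, and an owner notice under another.

```python
def xor_bytes(left, right):
    """XOR two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(left, right))


HEX_DIGITS = b"0123456789abcdef"   # table the function genuinely depends on
MARKER = b"(c) libxxx 2000!"       # hypothetical 16-byte owner notice
FUNC_MASK = bytes([0x5A]) * 16     # mask used by the working code

# BLOCK is the only one of these that would ship as static data.
BLOCK = xor_bytes(HEX_DIGITS, FUNC_MASK)
# OWNER_MASK is derived so that BLOCK xor OWNER_MASK gives the notice.
OWNER_MASK = xor_bytes(BLOCK, MARKER)


def to_hex(value):
    """Format a byte (0..255) as two hex digits using the masked table."""
    table = xor_bytes(BLOCK, FUNC_MASK)
    return chr(table[value >> 4]) + chr(table[value & 0x0F])
```

Deleting `BLOCK` breaks `to_hex`, which is the "inconvenient to remove" part. One caveat: a mask chosen after the fact can turn any block into any message, so the evidence is really the block itself showing up verbatim in someone else's binary.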

The only other thing I could think of is to have a big library of signatures (maybe MD5) of compiled snippets in as many different configurations as possible, and then search for matches. I don't think that would be very practical at all. I still think it'd be best to devise something with declarative data at the source code level.
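For what it's worth, the signature-library idea can be sketched in a few lines of Python (hypothetical names, and a fixed 16-byte window chosen arbitrarily). Every window of each compiled snippet is hashed with MD5, and a target binary is checked for windows with the same digest.

```python
import hashlib


def window_hashes(code, size=16):
    """MD5 digests of every size-byte window in code."""
    return {
        hashlib.md5(code[i:i + size]).hexdigest()
        for i in range(len(code) - size + 1)
    }


def build_library(snippets, size=16):
    """Map digest -> snippet name for a dict of name -> compiled bytes."""
    library = {}
    for name, code in snippets.items():
        for digest in window_hashes(code, size):
            library.setdefault(digest, name)
    return library


def match(binary, library, size=16):
    """Count how many distinct windows of binary hit each known snippet."""
    counts = {}
    for digest in window_hashes(binary, size):
        name = library.get(digest)
        if name is not None:
            counts[name] = counts.get(name, 0) + 1
    return counts
```

This also shows why the approach is brittle. Any window that contains an address, or that a different compiler or optimization level emits differently, simply stops matching.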

Simple tools for simple minds (4.00 / 2) (#14)
by Anonymous Hero on Wed Jul 05, 2000 at 05:33:35 PM EST

The best tools I've found are the unix tool "strings" and any dump utility. You see, feebs who steal OSS to incorporate into commercial products are almost always stupid, and those who aren't generally are convinced they will never be caught regardless of their crimes. So, you can probably pick up verbatim stuff from OSS routines quite easily in most cases. Please don't compromise the efficacy of OSS by including copy protection - that's lowering yourself to the level of those you'd probably be better off using for paving. --Charlie
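A minimal Python stand-in for that workflow (an illustration, with made-up function names). It pulls printable runs out of a binary the way "strings" does, with the same default minimum length of 4, and intersects them with strings taken from your own build of the library.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def shared_strings(suspect, reference, min_len=4):
    """Strings that appear in both binaries, sorted for stable output."""
    common = set(printable_strings(suspect, min_len))
    common &= set(printable_strings(reference, min_len))
    return sorted(common)
```

Short or generic strings will match by accident, so it probably makes sense to raise `min_len` and read the overlap by eye before drawing conclusions.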

Publishing with Perl (1.00 / 3) (#15)
by igaborf on Wed Jul 05, 2000 at 07:09:41 PM EST

Where I work we publish an annual directory that is derived from an Access database. The form of publication is dead tree -- it's a couple hundred pages of tabular listings. Until this year it was published by the time-honored method of bringing the Access data into Word and then into Pagemaker where the page layout was done, to the tune of about 150 person-hours per year of DTP effort. It took about 8 hours to write a Perl program that:
  1. Queried the Access database via DBI::ODBC
  2. Laid out and formatted the book pages in a PDF file using pdflib
The PDF file can be sent directly to the printer, thus replacing 150 hours annually with a one-time 8-hour job.

My boss was very happy!



Re: Publishing with Perl (1.00 / 1) (#17)
by hummer on Wed Jul 05, 2000 at 08:40:13 PM EST

My boss was very happy!

I'm sure he was!

oh.... sorry... what did that have to do with reverse engineering stuff? :P

I think you were after "Perl: it's not just for breakfast" 2nd door on the right....

hummer

[ Parent ]
what exactly qualifies as "copying"? (3.00 / 1) (#16)
by Justinfinity on Wed Jul 05, 2000 at 07:39:58 PM EST

i know that it's considered copying if someone includes a source file pulled from the source tree of a project, or cuts & pastes a chunk of code from a file. but what about if the programmer is just stuck on a problem and decides to browse through the code of a similar project? the programmer finds the implementation in the other program, then proceeds to do the same thing in his own program, albeit with different names and other subtle changes based on their own programming style. is this copying?

-justin
Re: what exactly qualifies as "copying"? (none / 0) (#18)
by hardlogic on Wed Jul 05, 2000 at 09:28:14 PM EST

Isn't that pretty much one of the major points made for Open Source? Developers helping each other; sharing knowledge; avoiding re-inventing the wheel; avoiding inelegant implementations/architectures/solutions, etc.

[ Parent ]
Re: what exactly qualifies as "copying"? (none / 0) (#27)
by meadows_p on Thu Jul 06, 2000 at 04:54:33 AM EST

The idea of the community, though, is that it's reciprocal. Whether it will actually work like that is a different matter. Also, depending on the legality of the GPL, it could be theft rather than just sharing.

[ Parent ]
This only works if you're the author of the OSS, b (3.00 / 1) (#19)
by krogoth on Wed Jul 05, 2000 at 09:47:41 PM EST

If you're the author of some open source software, you can use an easy "copy protection system". I'm not sure if this would work (I don't really understand how compilers work), but if you have, at the start of the function, "char stuff[] = "thisisheretofillxcharacters";", then you can search the .exe of the closed-source product for "thisisheretofillxcharacters". It's easy to get around if they know what it's there for, but it could work.
--
"If you've never removed your pants and climbed into a tree to swear drunkenly at stuck-up rich kids, I highly recommend it."
:wq
Reverse engineering for compatibility (4.00 / 1) (#20)
by Anonymous Hero on Thu Jul 06, 2000 at 12:49:42 AM EST

I've been looking around for Free decompilers recently, with an eye to using one on the various bits of windows that don't interoperate well (ASF, bits of SMB). I can't find any tools at all. There seems to be very little interest in this, which is surprising. I may end up starting a project to do it myself. I believe it would be a comparable task to writing a C compiler - several man-years, but doable by one or two people. (Remember, reverse engineering for compatibility is protected in Europe)

Re: Reverse engineering for compatibility (none / 0) (#29)
by odradek on Thu Jul 06, 2000 at 06:12:01 AM EST

If you ever do decide to take this on, it is a project I'm quite interested in, myself. I even put some work into this at one point. It was only because of IDA Pro that I eventually stopped working on it.

[ Parent ]
Re: Reverse engineering for compatibility (none / 0) (#30)
by kris on Thu Jul 06, 2000 at 10:13:12 AM EST

If you are going to pursue this, I am very interested in the project and would like to hear from you.

[ Parent ]
GDB? (none / 0) (#33)
by pin0cchio on Fri Jul 07, 2000 at 05:38:05 PM EST

IIRC, the GNU debugger GDB has a disassembler. Would this help?
lj65
[ Parent ]
Disassemblers are common enough (none / 0) (#37)
by marlowe on Sun Jul 09, 2000 at 09:00:47 AM EST

But to decompile into a maintainable style of C is something I've never seen. And that's really what we want.

I don't think it would be easy. Maybe we should settle for disassembly?

--- A vacant engineer rides on a train of thought that will not take him home ---
-- The Americans are the Jews of the 21st century. Only we won't go as quietly to the gas chambers. --
[ Parent ]
I want source again (3.00 / 1) (#22)
by kris on Thu Jul 06, 2000 at 12:55:41 AM EST

Sticking strings into a program to mark it is almost certain not to
work, for multiple reasons. One of them is that with an Open Source
program in the first place, such "easy copy protections" are glaringly
obvious from the source. Another is that a binary with i18n hooks
has few strings in the binary itself.

No, what I thought of are tools for walking through a Closed Source
program such as SoftICE for Windows - is there such a tool with similar
capability in Linux? And tools helping to produce source again from
compiled code - interactive disassemblers with hooks to recreate a
symbol table, or even better decompilers and other tools to create
proper HLL code. What are people using to do a proper analysis and
reverse engineering of foreign code?



Re: I want source again (none / 0) (#23)
by Cryptnotic on Thu Jul 06, 2000 at 02:04:23 AM EST

Free disassemblers are currently somewhat limited... I currently use:
objdump --disassemble somefile
and
objdump --disassemble-all somefile
objdump is part of the GNU binutils package.

[ Parent ]
Re: I want source again (4.00 / 2) (#26)
by kris on Thu Jul 06, 2000 at 04:23:14 AM EST

I know about objdump and it is a great tool, when you still have a symbol table and when you are working low level. I could think of something more sophisticated, though. How about an objdump where synthetic symbols are assigned to all subroutines and labels, and where you have the ability to edit that symbol table so that all uses of a symbol are changed with a single flip of the switch? That is, you get to see something like

08048400 <main>:
 8048400:       55                      push   %ebp
 8048401:       89 e5                   mov    %esp,%ebp
 8048403:       83 ec 08                sub    $0x8,%esp
 8048406:       83 c4 f4                add    $0xfffffff4,%esp
 8048409:       68 90 84 04 08          push   $0x8048490
 804840e:       e8 f9 fe ff ff          call   804830c <_init+0x74>
 8048413:       83 c4 10                add    $0x10,%esp
 8048416:       31 c0                   xor    %eax,%eax
 8048418:       eb 06                   jmp    8048420 <main+0x20>
 804841a:       8d b6 00 00 00 00       lea    0x0(%esi),%esi
 8048420:       89 ec                   mov    %ebp,%esp
 8048422:       5d                      pop    %ebp
 8048423:       c3                      ret

and have the ability to assign the proper name "printf" to "804830c" and reread that source again.
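A bare-bones Python sketch of that editable symbol table (an illustration only; the class, the `sub_`/`loc_` naming scheme and the regex are assumptions, and it covers only direct call and jump targets in objdump-style text):

```python
import re

# A direct call/jump followed by a hex target and an optional <sym+off> note.
_TARGET = re.compile(r"\b(call|jmp|j[a-z]{1,3})(\s+)([0-9a-f]+)(\s*<[^>]*>)?")


class SymbolTable:
    def __init__(self):
        self.names = {}  # address (int) -> symbol name

    def scan(self, listing):
        """Assign a synthetic name to every direct call or jump target."""
        for m in _TARGET.finditer(listing):
            address = int(m.group(3), 16)
            prefix = "sub" if m.group(1) == "call" else "loc"
            self.names.setdefault(address, "%s_%x" % (prefix, address))

    def rename(self, address, name):
        """Replace the name of one address; render() picks it up everywhere."""
        self.names[address] = name

    def render(self, listing):
        """Rewrite the listing with the current names in place of targets."""
        def substitute(m):
            address = int(m.group(3), 16)
            name = self.names.get(address, m.group(3))
            return m.group(1) + m.group(2) + name
        return _TARGET.sub(substitute, listing)
```

After `scan()` on the listing above, `rename(0x804830c, "printf")` followed by `render()` would show `call   printf` at every call site. Emitting label lines at the target addresses themselves is left out of this sketch.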

And for more high level tools, how about transforming that piece of code into register transfer language and then a parse tree again? Even if you are not going to create HLL code again, you might be able to do useful things with the parse tree, like creating interface definitions and API documentation for proprietary libraries, or running very simple automated theorem provers over the code in order to automatically find buffer overflows, or proving that the code in question is not viral.



[ Parent ]
Re: I want source again (none / 0) (#24)
by Cryptnotic on Thu Jul 06, 2000 at 02:09:06 AM EST

Doing a quick search on freshmeat, I found BIEW, a binary viewer for x86. I have not used this program, so I cannot comment on its usefulness.

[ Parent ]
Linking and embedding, or something (none / 0) (#25)
by Demona on Thu Jul 06, 2000 at 03:35:23 AM EST

Traditional stuff like this in the DOS days wasn't all that sophisticated compared to the tools available on most current free software platforms; they just looked prettier 'cause they had an interface, which can easily be mocked up in curses, tcl/tk or whatever. Of course, those tools still require skill to be used effectively.

There's a lot of material out there to read on the subject, and one of the most accessible was the recent "Breaking of Cyberpatrol" (just do a search, it was mirrored all over like DeCSS and the like); as a non-programmer, I found it engaging, instructive, and very readable (and fun). It covered a basic garden variety of introductory subjects, including encryption issues (the primary topic).

[ Parent ]

The best tool for reverse engineering (4.00 / 3) (#28)
by odradek on Thu Jul 06, 2000 at 06:10:26 AM EST

There is really no better tool for reverse engineering than IDA Pro. Yeah, it's not open source, and it's not free, but it is amazing for this type of work. It allows a person to examine the cross references of symbols and interactively modify the interpretation of various parts of it. It even identifies routines from some standard libraries. Using it, I've been able to identify open source libraries used by various commercial packages. It might be an interesting task to write a free/open version of IDA Pro.

The only caveats I have about this tool are that it only runs on DOS, Windows, and (surprisingly) OS/2, and it's somewhat pricy. Still, it allows reverse engineering software for a vast array of operating systems and processors, and is, IMHO, the best tool available.



Some related links (4.00 / 2) (#31)
by kris on Thu Jul 06, 2000 at 10:52:08 AM EST

Just gobbled these up. No time to annotate, no time to read just now. Perhaps some of this is useful. The dcc link at svrc.uq.edu.au looks particularly promising.

http://www.csee.uq.edu.au/~csmweb/decompilation/
http://www.decompiler.com/download.html
http://www.room42.com/store/computer_center/decompiler.shtml
http://www.decompiler.net/
http://www.svrc.uq.edu.au/groups/csm/dcc.html
http://www.decompile.com/html/decompiler_faq.html
http://www.softguide.de/prog_y/py_0123.htm
http://advice.networkice.com/Advice/Underground/Hacking/Methods/Reverse_Engineering/default.htm
http://208.233.94.170/fravia/fravia.org/
http://citeseer.nj.nec.com/did/10614


What's the flipside of this, devils-advocate-wise? (4.00 / 1) (#32)
by torpor on Thu Jul 06, 2000 at 08:11:27 PM EST

What tools exist to take, for example, some C source code, obfuscate it in some fashion, change basic structures, make semi-non-obvious changes to it, and produce re-compilable code that is not an *exact* duplicate of the original yet produces the same fundamental binaries?

I'm thinking some sort of lint-like obfuscator. I'm sure I remember seeing something like this in the old USENET C archive CDs I used to subscribe to in the early 90s, but I don't remember the details exactly.
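To show how little such a tool needs, here is a toy identifier renamer in Python (a sketch under loud assumptions: it is regex-based rather than a real C parser, the function name is made up, and the caller must list every external name such as printf in `keep`). It leaves string literals, comments and preprocessor lines alone.

```python
import re

C_KEYWORDS = set("""auto break case char const continue default do double
else enum extern float for goto if inline int long register restrict return
short signed sizeof static struct switch typedef union unsigned void volatile
while""".split())

# Order matters: preprocessor lines, literals and comments are matched first
# so that the identifier alternative never looks inside them.
_TOKEN = re.compile(
    r'^[ \t]*#[^\n]*'            # preprocessor directive
    r'|"(?:\\.|[^"\\])*"'        # string literal
    r"|'(?:\\.|[^'\\])*'"        # character literal
    r'|/\*.*?\*/|//[^\n]*'       # comments
    r'|\b[A-Za-z_]\w*\b',        # identifier or keyword
    re.S | re.M)


def rename_identifiers(source, keep=frozenset()):
    """Rename every identifier not in C_KEYWORDS or keep to v0, v1, ..."""
    mapping = {}

    def substitute(m):
        token = m.group()
        if not (token[0].isalpha() or token[0] == "_"):
            return token  # directive, literal or comment: leave untouched
        if token in C_KEYWORDS or token in keep:
            return token
        if token not in mapping:
            mapping[token] = "v%d" % len(mapping)
        return mapping[token]

    return _TOKEN.sub(substitute, source)
```

Renaming alone generally leaves the compiled instructions the same, which is rather the point being made here: the source no longer looks like the original, while the binary it produces does.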

Follow this trail and you'll see how futile this argument really is, just like copy protection, just like intellectual property, etc.

Yet another example of how technology in and of itself cannot be used to solve an ethical/moral dilemma.

Machines simply cannot be used to *SOLVE* problems of the soul.

j. -- boink! i have no sig!